The AI-ronman 🚀

OpenAI's DevDay updates ✨

October 02, 2024

🎩 Tip your hats, humans! The AI show is about to begin, and trust us, it's a real page-turner—unlike some eBooks we know.

Quick Takes ⚡

Google's answer to ChatGPT's advanced voice mode now freely available
OpenAI reveals four key developments at DevDay 2024
Nvidia launches NVLM-D-72B to rival GPT-4o
Microsoft enhances Copilot with Voice and Vision capabilities
OpenAI's marketing chief says o1 can handle 5-hour tasks
Meta's digital twin catalog brings realism to 3D models

Deep Dive 🔍

Google's answer to ChatGPT's advanced voice mode now freely available 📱

Google has announced that Gemini Live is now free for all Android users via the Gemini app. This advanced assistant is faster and offers a more intuitive, immersive experience than the standard version. Gemini Live was initially available only to Gemini Advanced subscribers, and Google has now expanded the rollout to all Android devices, as shared on X. To get started, download the Gemini app, open it, tap “Live,” follow the on-screen steps, and begin interacting. Reviews highlight its enhanced conversation capabilities, though it is limited to US English with multiple accent choices.

OpenAI reveals four key developments at DevDay 2024

During DevDay 2024, OpenAI launched four updates to enhance AI accessibility and affordability for developers:
1. Realtime API: Lets developers build speech-to-speech applications on the same model that powers Advanced Voice, with a choice of six distinct voices. It suits use cases such as travel planning and phone orders, at approximately $18 per hour. Responses are near-instant, which improves the user experience, and applications must disclose that users are talking to an AI.
2. Vision Fine-Tuning API: Lets developers fine-tune GPT-4o on images as well as text, improving visual understanding for applications such as visual search, object detection for autonomous vehicles, and medical imaging, with as few as 100 images. Developers retain full control over their data, and built-in safety checks apply.
3. Prompt Caching: Cuts cost and latency by reusing input tokens from recent prompts, which is ideal for coding tasks and multi-turn conversations. Cached input tokens are billed at a 50% discount and processed faster, and caching is applied automatically to prompts over 1,024 tokens in the latest GPT-4o versions.
4. Model Distillation: Lets developers use outputs from GPT-4o and o1-preview to train smaller, cost-efficient models such as GPT-4o mini. Features include automatic dataset generation and performance evaluations, with free training tokens available until October 31.
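
For a rough feel of what the pricing points above mean in practice, here is a minimal Python sketch. It is back-of-the-envelope arithmetic, not an official calculator: the function names, the $2.00-per-million-token price, and the 50% cached-token discount are assumptions chosen for illustration, while the 1,024-token threshold and the roughly $18-per-hour Realtime figure come from the summary above.

```python
# Illustrative cost arithmetic only; this does not call any OpenAI API.
CACHE_MIN_PROMPT_TOKENS = 1024  # caching threshold mentioned in the article


def estimate_input_cost(prompt_tokens, cached_tokens, price_per_million,
                        cached_discount=0.5):
    """Estimate the input cost (in dollars) of one request.

    cached_discount is an ASSUMED fraction taken off cached tokens.
    price_per_million is a HYPOTHETICAL price per 1M input tokens.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    # Prompts below the threshold are treated as not cached at all.
    if prompt_tokens < CACHE_MIN_PROMPT_TOKENS:
        cached_tokens = 0
    uncached_tokens = prompt_tokens - cached_tokens
    billable = uncached_tokens + cached_tokens * (1 - cached_discount)
    return billable * price_per_million / 1_000_000


def estimate_realtime_cost(minutes, hourly_rate=18.0):
    """Estimate a voice session's cost from the article's ~$18/hour figure."""
    return hourly_rate * minutes / 60


if __name__ == "__main__":
    # A 2,000-token prompt where half the tokens were seen recently.
    print(estimate_input_cost(2000, 1000, price_per_million=2.0))
    # A ten-minute phone order handled by a voice agent.
    print(estimate_realtime_cost(10))
```

Swap in the current published prices before relying on any of these numbers; the point is only that savings grow with the share of a long prompt that repeats from one request to the next.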

Nvidia launches NVLM-D-72B to rival GPT-4o 💥

Nvidia has released NVLM-D-72B, an open, large-scale AI model set to compete with GPT-4o. It excels at processing complex visual data, meme analysis, and mathematical problem-solving, and its multimodal training yields a 4.3-point improvement on text tasks. The AI community has enthusiastically received the model for its research and development potential, but it also raises concerns about possible misuse and threats to existing AI business models.

Microsoft enhances Copilot with Voice and Vision capabilities 🚀

Microsoft announced a series of AI enhancements for its Copilot assistant on Windows PCs, featuring advanced voice and vision functionalities. Copilot Voice facilitates conversational interactions, mirroring OpenAI’s Voice Mode, while Copilot Vision enables contextual assistance within the Microsoft Edge browser by understanding web content. The update also brings back the Recall feature with enhanced security, introduces ‘Think Deeper’ for improved reasoning using OpenAI’s o1 model, and includes personalization tweaks to better align with user preferences, pushing Copilot closer to a fully agentic assistant.

OpenAI's marketing chief says o1 can handle 5-hour tasks 📜

At HubSpot's Inbound event, Dane Vahey, OpenAI’s Head of Strategic Marketing, highlighted AI’s expanding role in marketing. He introduced a suite of AI tools for data analysis, automation, research, and content creation, emphasizing AI as a "thinking partner." Vahey also showcased the new o1 model, which he said can manage tasks lasting up to five hours, making it useful for marketers' strategic planning despite minor flaws.

Meta's digital twin catalog brings realism to 3D models 📦

Meta has introduced the Digital Twin Catalog (DTC) through Reality Labs Research, offering more than 2,400 ultra-accurate 3D models. 🍽️ The DTC aims to democratize digital twin technology, focusing on everyday objects like kitchen utensils. Advanced scanning technology ensures each model is captured with sub-millimeter precision, delivering exceptional detail and realism for various digital applications.

🎭 Curtain call! The bots need their beauty sleep, and so do you. Keep those circuits humming!

Ciao for now!

Author: Poonam 👧

Karan 😎 🚀
CEO, GlazeGPT.com