Episodes

  • GDM’s Pushmeet Kohli on solving science's biggest challenges with AI
    Sep 15 2025

    Pushmeet Kohli, Head of Science and Strategic Initiatives at Google DeepMind, joins host Logan Kilpatrick to explore the intersection of AI and scientific discovery. Learn how the team's unique problem-solving framework led to innovations like AlphaFold and AlphaEvolve, and how new tools like AI Co-scientist aim to democratize these types of breakthroughs for everyone.

    Watch on YouTube: https://www.youtube.com/watch?v=o7mdsL6BHsk

    Chapters:
    0:00 - Intro
    1:04 - Recent Alpha launches
    02:15 - Framework for selecting research domains
    06:21 - Scientific, commercial and social impact
    15:00 - Wielding AGI for breakthroughs
    16:48 - Tech transfer and team collaboration
    19:46 - IMO Gold Medal
    21:42 - Evaluating math proofs
    22:55 - From specialized models to Deep Think
    24:22 - Do math skills generalize?
    25:53 - Generalizing the IMO model
    27:43 - Democratizing AI science tools
    30:09 - AI Co-scientist
    35:17 - An API for science?

    37 m
  • Behind the scenes of Google's state-of-the-art "nano-banana" image model
    Aug 27 2025

    Join host Logan Kilpatrick in discussion with some of the minds behind Google's new state-of-the-art image model, Gemini 2.5 Flash Image. Product and research leads from the Gemini team break down the technology behind its key capabilities, including interleaved generation for complex edits and new approaches to achieving character consistency and pixel-perfect control. With Nicole Brichtova, Kaushik Shivakumar, Mostafa Dehghani and Robert Riachi.

    Watch on YouTube:

    Chapters:
    0:37 - New model introduction
    1:21 - Demo - Image Editing
    3:44 - Text rendering capabilities
    4:44 - Beyond human preference evals
    6:44 - Text rendering as a proxy for quality
    8:38 - Positive transfer between modalities
    11:25 - Demo - Multi-turn, context aware image generation
    13:54 - Pixel-perfect editing and character consistency
    15:51 - Interleaved image generation
    17:59 - Specialized vs. native models
    19:52 - Understanding nuanced prompts
    20:59 - User feedback shaping model development
    22:37 - Improvements in character consistency
    24:17 - More natural looking images from team collaboration
    26:41 - What’s next for image generation models

    31 m
  • Demis Hassabis on shipping momentum, better evals and world models
    Aug 11 2025

    Demis Hassabis, CEO of Google DeepMind, sits down with host Logan Kilpatrick. In this episode, learn about the evolution from game-playing AI to today's thinking models, how projects like Genie 3 are building world models to help AI understand reality, and why new testing grounds like Kaggle’s Game Arena are needed to evaluate progress on the path to AGI.

    Watch on YouTube: https://www.youtube.com/watch?v=njDochQ2zHs

    Chapters:
    00:00 - Intro
    01:16 - Recent GDM momentum
    02:07 - Deep Think and agent systems
    04:11 - Jagged intelligence
    07:02 - Genie 3 and world models
    10:21 - Future applications of Genie 3
    13:01 - The need for better benchmarks and Kaggle Game Arena
    19:03 - Evals beyond games
    21:47 - Tool use for expanding AI capabilities
    24:52 - Shift from models to systems
    27:38 - Roadmap for Genie 3 and the omni model
    29:25 - The quadrillion token club

    31 m
  • Building real-time voice applications with Live API
    Aug 6 2025

    Shrestha Basu Mallick, one of the product leads for the Gemini API, joins host Logan Kilpatrick for a deep dive into the Gemini Live API, Google’s real-time, multimodal interface for developers. Learn how native audio, alongside new capabilities like proactive audio and async function calling, unlocks the unique power of audio as an interface.

    Watch on YouTube: https://www.youtube.com/watch?v=4xlwlU6h-wM

    Chapters:
    0:00 - Intro
    1:18 - Live API Overview
    3:36 - Why audio is a special modality
    5:07 - Speed vs. precision in audio
    6:17 - Controllable and promptable TTS
    8:31 - What developers are building with the Live API
    11:14 - URL context and async calling features
    15:02 - Proactive audio and affective dialog
    16:55 - Addressing developer feedback
    21:54 - Live API roadmap
    23:49 - The role of long context
    24:57 - What’s next for the Live API
    26:41 - State of the AI audio market
    30:10 - Advice for developers getting started with the Live API
    31:16 - Live API demo
    38:10 - Demo wrap up and closing

    40 m
  • Building a frontier AI search experience
    Jul 23 2025

    Robby Stein, VP of Product for Google Search, joins host Logan Kilpatrick to explore how Search is evolving into a frontier AI product. Their conversation covers the shift from simple keywords to complex, conversational queries, the rise of agentic capabilities that can take action on your behalf, and the vision to help billions of users truly "ask anything." Learn more about the technology behind AI Overviews, AI Mode, Deep Search, and the future of multimodal interaction.

    Watch on YouTube: https://youtu.be/zUB5A_ezIOU

    Chapters:
    01:07 - Search as a Frontier AI Product
    02:38 - Reaching 1.5 Billion Users
    03:37 - What Is AI Mode?
    04:17 - Understanding Query Fan-Out
    05:18 - Balancing latency and performance with Gemini 2.5 Pro
    06:51 - How Deep Search works
    09:08 - Fine-tuning models for product experience
    11:24 - Shifting user behaviors
    14:07 - The rise of visual search
    16:52 - Speech and conversational AI in Search
    18:36 - Comparing Gemini and Search
    20:04 - Real-time tool use in Search
    22:52 - Evolving the Search interface
    26:03 - Making Search more personal
    29:15 - The agentic future of Search
    31:15 - Agents beyond booking tickets
    37:11 - On-the-fly software creation
    38:06 - Google DeepMind and Search collaboration
    40:08 - What's next for Search


    43 m
  • Gemini's Multimodality
    Jul 2 2025

    Ani Baddepudi, Gemini Model Behavior Product Lead, joins host Logan Kilpatrick for a deep dive into Gemini's multimodal capabilities. Their conversation explores why Gemini was built as a natively multimodal model from day one, the future of proactive AI assistants, and how we are moving towards a world where "everything is vision." Learn about the differences between video and image understanding and token representations, higher FPS video sampling, and more.

    Chapters:

    0:00 - Intro
    1:12 - Why Gemini is natively multimodal
    2:23 - The technology behind multimodal models
    5:15 - Video understanding with Gemini 2.5
    9:25 - Deciding what to build next
    13:23 - Building new product experiences with multimodal AI
    17:15 - The vision for proactive assistants
    24:13 - Improving video usability with variable FPS and frame tokenization
    27:35 - What’s next for Gemini’s multimodal development
    31:47 - Deep dive on Gemini’s document understanding capabilities
    37:56 - The teamwork and collaboration behind Gemini
    40:56 - What’s next with model behavior


    Watch on YouTube: https://www.youtube.com/watch?v=K4vXvaRV0dw

    44 m
  • Building Gemini's Coding Capabilities
    Jun 16 2025

    Connie Fan, Product Lead for Gemini's coding capabilities, and Danny Tarlow, Research Lead for Gemini's coding capabilities, join host Logan Kilpatrick for an in-depth discussion on how the team built one of the world's leading AI coding models. Learn more about the early goals that shaped Gemini's approach to code, the rise of 'vibe coding' and its impact on development, strategies for tackling large codebases with long context and agents, and the future of programming languages in the age of AI.

    Watch on YouTube: https://www.youtube.com/watch?v=jwbG_m-X-gE

    Chapters:

    0:00 - Intro
    1:10 - Defining Early Coding Goals
    6:23 - Ingredients of a Great Coding Model
    9:28 - Adapting to Developer Workflows
    11:40 - The Rise of Vibe Coding
    14:43 - Code as a Reasoning Tool
    17:20 - Code as a Universal Solver
    20:47 - Evaluating Coding Models
    24:30 - Leveraging Internal Googler Feedback
    26:52 - Winning Over AI Skeptics
    28:04 - Performance Across Programming Languages
    33:05 - The Future of Programming Languages
    36:16 - Strategies for Large Codebases
    41:06 - Hill Climbing New Benchmarks
    42:46 - Short-Term Improvements
    44:42 - Model Style and Taste
    47:43 - 2.5 Pro’s Breakthrough
    51:06 - Early AI Coding Experiences
    56:19 - Specialist vs. Generalist Models

    1 h
  • Sergey Brin on the Future of AI & Gemini
    Jun 16 2025

    A conversation with Sergey Brin, co-founder of Google and computer scientist working on Gemini, in reaction to a year of progress with Gemini.

    Watch on YouTube: https://www.youtube.com/watch?v=o7U4DV9Fkc0

    Chapters:

    0:20 - Initial reactions to I/O
    2:00 - Focus on Gemini’s core text model
    4:29 - Native audio in Gemini and Veo 3
    8:34 - Insights from model training runs
    10:07 - Surprises in current AI developments vs. past expectations
    14:20 - Evolution of model training
    16:40 - The future of reasoning and Deep Think
    20:19 - Google’s startup culture and accelerating AI innovation
    24:51 - Closing

    27 m