The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

De: Sam Charrington
  • Resumen

  • Machine learning and artificial intelligence are dramatically changing the way businesses operate and people live. The TWIML AI Podcast brings the top minds and ideas from the world of ML and AI to a broad and influential community of ML/AI researchers, data scientists, engineers and tech-savvy business and IT leaders. Hosted by Sam Charrington, a sought after industry analyst, speaker, commentator and thought leader. Technologies covered include machine learning, artificial intelligence, deep learning, natural language processing, neural networks, analytics, computer science, data science and more.
    All rights reserved
    Más Menos
Episodios
  • Generative Benchmarking with Kelly Hong - #728
    Apr 23 2025
    In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.
    Más Menos
    54 m
  • Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727
    Apr 14 2025
    In this episode, Emmanuel Ameisen, a research engineer at Anthropic, returns to discuss two recent papers: "Circuit Tracing: Revealing Language Model Computational Graphs" and "On the Biology of a Large Language Model." Emmanuel explains how his team developed mechanistic interpretability methods to understand the internal workings of Claude by replacing dense neural network components with sparse, interpretable alternatives. The conversation explores several fascinating discoveries about large language models, including how they plan ahead when writing poetry (selecting the rhyming word "rabbit" before crafting the sentence leading to it), perform mathematical calculations using unique algorithms, and process concepts across multiple languages using shared neural representations. Emmanuel details how the team can intervene in model behavior by manipulating specific neural pathways, revealing how concepts are distributed throughout the network's MLPs and attention mechanisms. The discussion highlights both capabilities and limitations of LLMs, showing how hallucinations occur through separate recognition and recall circuits, and demonstrates why chain-of-thought explanations aren't always faithful representations of the model's actual reasoning. This research ultimately supports Anthropic's safety strategy by providing a deeper understanding of how these AI systems actually work. The complete show notes for this episode can be found at https://twimlai.com/go/727.
    Más Menos
    1 h y 34 m
  • Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726
    Apr 7 2025
    Today, we're joined by Maohao Shen, PhD student at MIT to discuss his paper, “Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search.” We dig into how Satori leverages reinforcement learning to improve language model reasoning—enabling model self-reflection, self-correction, and exploration of alternative solutions. We explore the Chain-of-Action-Thought (COAT) approach, which uses special tokens—continue, reflect, and explore—to guide the model through distinct reasoning actions, allowing it to navigate complex reasoning tasks without external supervision. We also break down Satori’s two-stage training process: format tuning, which teaches the model to understand and utilize the special action tokens, and reinforcement learning, which optimizes reasoning through trial-and-error self-improvement. We cover key techniques such “restart and explore,” which allows the model to self-correct and generalize beyond its training domain. Finally, Maohao reviews Satori’s performance and how it compares to other models, the reward design, the benchmarks used, and the surprising observations made during the research. The complete show notes for this episode can be found at https://twimlai.com/go/726.
    Más Menos
    52 m
adbl_web_global_use_to_activate_webcro768_stickypopup

Lo que los oyentes dicen sobre The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Calificaciones medias de los clientes
Total
  • 5 out of 5 stars
  • 5 estrellas
    3
  • 4 estrellas
    0
  • 3 estrellas
    0
  • 2 estrellas
    0
  • 1 estrella
    0
Ejecución
  • 5 out of 5 stars
  • 5 estrellas
    3
  • 4 estrellas
    0
  • 3 estrellas
    0
  • 2 estrellas
    0
  • 1 estrella
    0
Historia
  • 5 out of 5 stars
  • 5 estrellas
    3
  • 4 estrellas
    0
  • 3 estrellas
    0
  • 2 estrellas
    0
  • 1 estrella
    0

Reseñas - Selecciona las pestañas a continuación para cambiar el origen de las reseñas.