The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

De: Sam Charrington
  • Resumen

  • Machine learning and artificial intelligence are dramatically changing the way businesses operate and people live. The TWIML AI Podcast brings the top minds and ideas from the world of ML and AI to a broad and influential community of ML/AI researchers, data scientists, engineers and tech-savvy business and IT leaders. Hosted by Sam Charrington, a sought after industry analyst, speaker, commentator and thought leader. Technologies covered include machine learning, artificial intelligence, deep learning, natural language processing, neural networks, analytics, computer science, data science and more.
    All rights reserved
    Más Menos
Episodios
  • CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729
    Apr 29 2025
    Today, we're joined by Nidhi Rastogi, assistant professor at Rochester Institute of Technology to discuss Cyber Threat Intelligence (CTI), focusing on her recent project CTIBench—a benchmark for evaluating LLMs on real-world CTI tasks. Nidhi explains the evolution of AI in cybersecurity, from rule-based systems to LLMs that accelerate analysis by providing critical context for threat detection and defense. We dig into the advantages and challenges of using LLMs in CTI, how techniques like Retrieval-Augmented Generation (RAG) are essential for keeping LLMs up-to-date with emerging threats, and how CTIBench measures LLMs’ ability to perform a set of real-world tasks of the cybersecurity analyst. We unpack the process of building the benchmark, the tasks it covers, and key findings from benchmarking various LLMs. Finally, Nidhi shares the importance of benchmarks in exposing model limitations and blind spots, the challenges of large-scale benchmarking, and the future directions of her AI4Sec Research Lab, including developing reliable mitigation techniques, monitoring "concept drift" in threat detection models, improving explainability in cybersecurity, and more. The complete show notes for this episode can be found at https://twimlai.com/go/729.
    Más Menos
    56 m
  • Generative Benchmarking with Kelly Hong - #728
    Apr 23 2025
    In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.
    Más Menos
    54 m
  • Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727
    Apr 14 2025
    In this episode, Emmanuel Ameisen, a research engineer at Anthropic, returns to discuss two recent papers: "Circuit Tracing: Revealing Language Model Computational Graphs" and "On the Biology of a Large Language Model." Emmanuel explains how his team developed mechanistic interpretability methods to understand the internal workings of Claude by replacing dense neural network components with sparse, interpretable alternatives. The conversation explores several fascinating discoveries about large language models, including how they plan ahead when writing poetry (selecting the rhyming word "rabbit" before crafting the sentence leading to it), perform mathematical calculations using unique algorithms, and process concepts across multiple languages using shared neural representations. Emmanuel details how the team can intervene in model behavior by manipulating specific neural pathways, revealing how concepts are distributed throughout the network's MLPs and attention mechanisms. The discussion highlights both capabilities and limitations of LLMs, showing how hallucinations occur through separate recognition and recall circuits, and demonstrates why chain-of-thought explanations aren't always faithful representations of the model's actual reasoning. This research ultimately supports Anthropic's safety strategy by providing a deeper understanding of how these AI systems actually work. The complete show notes for this episode can be found at https://twimlai.com/go/727.
    Más Menos
    1 h y 34 m
adbl_web_global_use_to_activate_webcro768_stickypopup

Lo que los oyentes dicen sobre The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

Calificaciones medias de los clientes
Total
  • 5 out of 5 stars
  • 5 estrellas
    3
  • 4 estrellas
    0
  • 3 estrellas
    0
  • 2 estrellas
    0
  • 1 estrella
    0
Ejecución
  • 5 out of 5 stars
  • 5 estrellas
    3
  • 4 estrellas
    0
  • 3 estrellas
    0
  • 2 estrellas
    0
  • 1 estrella
    0
Historia
  • 5 out of 5 stars
  • 5 estrellas
    3
  • 4 estrellas
    0
  • 3 estrellas
    0
  • 2 estrellas
    0
  • 1 estrella
    0

Reseñas - Selecciona las pestañas a continuación para cambiar el origen de las reseñas.