Episodes

  • Why validity beats scale when building multi‑step AI systems
    Jan 6 2026

    In this episode, Dr. Sebastian (Seb) Benthall joins us to discuss his and Andrew's paper, “Validity Is What You Need,” and what it takes to build agentic AI that actually works in the real world.

    Our discussion connects systems engineering, mechanism design, and requirements engineering to multi‑step AI that delivers measurable enterprise outcomes.

    • Defining agentic AI beyond LLM hype
    • Limits of scale and the need for multi‑step control
    • Tool use, compounding errors, and guardrails (see the sketch after this list)
    • Systems engineering patterns for AI reliability
    • Principal–agent framing for governance
    • Mechanism design for multi‑stakeholder alignment
    • Requirements engineering as the crux of validity
    • Hybrid stacks: LLM interface, deterministic solvers
    • Regression testing through model swaps and drift
    • Moving from universal copilots to fit‑for‑purpose agents
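
    A quick way to see why compounding errors motivate guardrails and deterministic checks, as referenced above: a minimal sketch, not from the paper; the per-step accuracy and step counts below are illustrative assumptions.

      # Illustrative only: if each step of a multi-step agent succeeds with
      # probability p, the chance the whole chain stays valid decays as p**n.
      def chain_success(p_step: float, n_steps: int) -> float:
          return p_step ** n_steps

      for n in (1, 5, 10, 20):
          print(f"{n:>2} steps at 95% per-step accuracy -> {chain_success(0.95, n):.0%} end-to-end")

    The numbers are assumptions; the point is that per-step accuracy that sounds high still collapses over long tool-use chains, which is why the episode pairs LLM interfaces with deterministic solvers, guardrails, and regression tests.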

    You can also catch more of Seb's research on our podcast. Tune in to Contextual integrity and differential privacy: Theory versus application.


    What did you think? Let us know.

    Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

    • LinkedIn - Episode summaries, shares of cited articles, and more.
    • YouTube - Was it something that we said? Good. Share your favorite quotes.
    • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
    40 m
  • 2025 AI review: Why LLMs stalled and the outlook for 2026
    Dec 22 2025

    Here it is! We review a year in which scaling large AI models hit a ceiling, Google reclaimed momentum with efficient vertical integration, and the market shifted from hype to viability.

    Join us as we talk about why human-in-the-loop oversight is failing, why having generative AI agents validate other agents compounds errors, and how small, expert-curated data quietly beat the big models.

    • Google’s resurgence with Gemini 3.0 and TPU-driven efficiency
    • Monetization pressures and ads in co-pilot assistants
    • Diminishing returns from LLM scaling
    • Human-in-the-loop pitfalls and incentives
    • Agents validating agents and compounding errors
    • Small, high-quality data outperforming synthetic
    • Expert systems, causality, and interpretability
    • Research trends return toward statistical rigor
    • 2026 outlook for ROI, governance, and trust

    We remain focused on the responsible use of AI. And while the market continues to adjust expectations for return on investment from AI, we're excited to see companies exploring “return on purpose” as a new path toward transformative AI systems for their business.


    What are you excited about for AI in 2026?


    What did you think? Let us know.

    Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

    • LinkedIn - Episode summaries, shares of cited articles, and more.
    • YouTube - Was it something that we said? Good. Share your favorite quotes.
    • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
    42 m
  • Big data, small data, and AI oversight with David Sandberg
    Dec 9 2025

    In this episode, we look at the actuarial principles that make models safer: parallel modeling, small data with provenance, and real-time human supervision. To help us, long-time insurtech and startup advisor David Sandberg, FSA, MAAA, CERA, joins us to share more about his actuarial expertise in data management and AI.

    We also challenge the hype around AI by reframing it as a prediction machine and putting human judgment at the beginning, middle, and end. By the end, you might think about “human-in-the-loop” in a whole new way.

    • Actuarial valuation debates and why parallel models win
    • AI’s real value: enhancing and accelerating the growth of human capital
    • Transparency, accountability, and enforceable standards
    • Prediction versus decision and learning from actual-to-expected (see the sketch after this list)
    • Small data as interpretable, traceable fuel for insight
    • Drift, regime shifts, and limits of regression and LLMs
    • Mapping decisions, setting risk appetite, and enterprise risk management (ERM) for AI
    • Where humans belong: the beginning, middle, and end of the system
    • Agentic AI complexity versus validated end-to-end systems
    • Training judgment with tools that force critique and citation
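
    For listeners newer to the actuarial vocabulary, here is a minimal sketch of the actual-to-expected (A/E) check mentioned above; the figures and the tolerance band are illustrative assumptions, not from the episode.

      # Illustrative actual-to-expected (A/E) monitoring: compare what a model
      # expected against what actually happened, and flag drift for human review.
      def actual_to_expected(actual: float, expected: float) -> float:
          return actual / expected

      observed_claims = 1_180   # hypothetical actual outcomes
      predicted_claims = 1_000  # hypothetical model expectation

      ratio = actual_to_expected(observed_claims, predicted_claims)
      print(f"A/E ratio: {ratio:.2f}")

      # A ratio drifting away from 1.0 (here beyond an assumed +/-10% band)
      # is a cue to revisit assumptions rather than keep trusting the prediction.
      if abs(ratio - 1.0) > 0.10:
          print("Outside tolerance: escalate to human review")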

    Cultural references:

    • Foundation, Apple TV+
    • The Feeling of Power, Isaac Asimov
    • Player Piano, Kurt Vonnegut

    For more information, see Actuarial and data science: Bridging the gap.



    What did you think? Let us know.

    Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

    • LinkedIn - Episode summaries, shares of cited articles, and more.
    • YouTube - Was it something that we said? Good. Share your favorite quotes.
    • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
    50 m
  • Metaphysics and modern AI: What is space and time?
    Nov 11 2025

    We explore how space and time form a single fabric, testing our daily beliefs through questions about free-fall, black holes, speed, and momentum to reveal what models get right and where they break.

    To help us, we’re excited to have our friend David Theriault, a science and sci-fi aficionado, and our resident astrophysicist, Rachel Losacco, to talk about practical exploration in space and time. They'll even unpack a few concerns they have about how space and time were depicted in the movie Interstellar (2014).

    Highlights:

    • Introduction: Why fundamentals beat shortcuts in science and AI
    • Time as experience versus physical parameter
    • Plato’s ideals versus Aristotle’s change as framing tools
    • Free-fall, G-forces, and what we actually feel
    • Gravity wells, curvature, and moving through space-time
    • Black holes, tidal forces, and spaghettification
    • Momentum and speed: Laser probe, photon momentum, and braking limits
    • Doppler shifts, time dilation, and length contraction (see the formulas after this list)
    • Why light’s speed stays constant across frames
    • Modeling causality and preparing for the next paradigm
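
    For reference, the textbook special-relativity relations behind the time dilation and length contraction discussion (standard formulas, not derived in the episode; v is the relative speed and c the speed of light):

      % Lorentz factor, time dilation, length contraction, and photon momentum
      \gamma = \frac{1}{\sqrt{1 - v^2/c^2}}, \qquad
      \Delta t' = \gamma\,\Delta t, \qquad
      L' = \frac{L}{\gamma}, \qquad
      p_{\text{photon}} = \frac{E}{c} = \frac{h}{\lambda}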

    This episode about space and time is the second in our series about metaphysics and modern AI. Each topic in the series is leading to the fundamental question, "Should AI try to think?"

    Step away from your keyboard and enjoy this journey with us.

    Previous episodes:

    • Introduction: Metaphysics and modern AI
    • What is reality?



    What did you think? Let us know.

    Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

    • LinkedIn - Episode summaries, shares of cited articles, and more.
    • YouTube - Was it something that we said? Good. Share your favorite quotes.
    • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
    38 m
  • Metaphysics and modern AI: What is reality?
    Oct 27 2025

    In the first episode of our series on metaphysics, Michael Herman, our guest from Episode #14, “What is consciousness?”, returns to discuss reality. More specifically, the question of objects in reality. The team explores Plato’s forms, Aristotle’s realism, emergence, and embodiment to determine whether AI models can approximate what humans uniquely experience.

    • Defining objects via properties, perception, and persistence
    • Banana and circle examples for identity and ideals
    • Plato versus Aristotle on forms and realism
    • Ship of Theseus and continuity through change
    • Simples, complexes, and emergence in systems
    • Embodiment, consciousness, and why LLMs lack lived unity
    • Existentialist focus on subjective reality and meaning
    • Why metaphysics matters for AI governance and safety

    Join us for the next part of the metaphysics series to explore space and time. Subscribe now.

    What we're reading:

    • Metaphysics: A Very Short Introduction, Stephen Mumford (Andrew)



    What did you think? Let us know.

    Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

    • LinkedIn - Episode summaries, shares of cited articles, and more.
    • YouTube - Was it something that we said? Good. Share your favorite quotes.
    • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
    39 m
  • Metaphysics and modern AI: What is thinking? - Series Intro
    Oct 7 2025

    This episode is the intro to a special project by The AI Fundamentalists’ hosts and friends. We hope you're ready for a metaphysics mini‑series to explore what thinking and reasoning really mean and how those definitions should shape AI research.

    Join us for thought-provoking discussions as we tackle basic questions: What is metaphysics and its relevance to AI? What constitutes reality? What defines thinking? How do we understand time? And perhaps most importantly, should AI systems attempt to "think," or are we approaching the entire concept incorrectly?

    Show notes:

    • Why metaphysics matters for AI foundations
    • Definitions of thinking from peers and what they imply
    • Mixture‑of‑experts, ranking, and the illusion of reasoning
    • Turing test limits versus deliberation and causality
    • Towers of Hanoi, agentic workflows, and brittle stepwise reasoning (see the sketch after this list)
    • Math, context, and multi‑component system failures
    • Proposed plan for the series and areas to explore
    • Invitation for resources, critiques, and future guests
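
    A quick illustration of why the Towers of Hanoi keeps coming up as a stress test for stepwise reasoning (a minimal sketch; the disk counts are arbitrary): solving n disks takes 2**n - 1 moves, every one of which must be exactly right, so even a small per-step error rate is fatal on longer instances.

      # Classic recursive solver: the move list grows as 2**n - 1.
      def hanoi(n, source="A", target="C", spare="B", moves=None):
          if moves is None:
              moves = []
          if n == 1:
              moves.append((source, target))
          else:
              hanoi(n - 1, source, spare, target, moves)
              moves.append((source, target))
              hanoi(n - 1, spare, target, source, moves)
          return moves

      for n in (3, 10, 20):
          print(f"{n} disks -> {len(hanoi(n)):,} moves")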

    We hope you enjoy this journey through the intersection of ancient philosophical questions and cutting-edge technology.


    What did you think? Let us know.

    Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

    • LinkedIn - Episode summaries, shares of cited articles, and more.
    • YouTube - Was it something that we said? Good. Share your favorite quotes.
    • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
    16 m
  • AI in practice: Guardrails and security for LLMs
    Sep 30 2025

    In this episode, we talk about practical guardrails for LLMs with data scientist Nicholas Brathwaite. We focus on how to stop PII leaks, secure data retrieval, and evaluate safety within real limits. We weigh managed solutions like AWS Bedrock against open-source approaches and discuss when to skip LLMs altogether.

    • Why guardrails matter for PII, secrets, and access control (see the sketch after this list)
    • Where to place controls across prompt, training, and output
    • Prompt injection, jailbreaks, and adversarial handling
    • RAG design with vector DB separation and permissions
    • Evaluation methods, risk scoring, and cost trade-offs
    • AWS Bedrock guardrails vs open-source customization
    • Domain-adapted safety models and policy matching
    • When deterministic systems beat LLM complexity
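
    As a concrete illustration of an output-side PII guardrail, here is a minimal regex-based redaction pass; the patterns and names are hypothetical and ours, not AWS Bedrock's API or Nic's implementation. Production systems layer this with entity recognition, policy checks, and access controls.

      import re

      # Hypothetical minimal output guardrail: redact obvious PII patterns
      # before an LLM response is displayed or logged.
      PII_PATTERNS = {
          "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
          "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
          "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
      }

      def redact(text: str) -> str:
          for label, pattern in PII_PATTERNS.items():
              text = pattern.sub(f"[REDACTED {label.upper()}]", text)
          return text

      print(redact("Contact Jane at jane.doe@example.com or 555-867-5309."))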

    This episode is part of our “AI in Practice” series, where we invite guests to talk about the reality of their work in AI. From hands-on development to scientific research, be sure to check out other episodes under this heading in our listings.

    Related research:

    • Building trustworthy AI: Guardrail technologies and strategies (N. Brathwaite)
    • Nic's GitHub


    What did you think? Let us know.

    Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

    • LinkedIn - Episode summaries, shares of cited articles, and more.
    • YouTube - Was it something that we said? Good. Share your favorite quotes.
    • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
    35 m
  • AI in practice: LLMs, psychology research, and mental health
    Sep 4 2025

    We’re excited to have Adi Ganesan, a PhD researcher at Stony Brook University, the University of Pennsylvania, and Vanderbilt, on the show. We’ll talk about how large language models (LLMs) are being tested and used in psychology, citing examples from mental health research. Fun fact: Adi was Sid's research partner during his PhD program.

    Discussion highlights:

    • Language models struggle with certain aspects of therapy, including being over-eager to solve problems rather than building understanding
    • Current models are poor at detecting psychomotor symptoms from text alone but are oversensitive to suicidality markers
    • Cognitive reframing assistance represents a promising application where LLMs can help identify thought traps
    • Proper evaluation frameworks must include privacy, security, effectiveness, and appropriate engagement levels
    • Theory of mind remains a significant challenge for LLMs in therapeutic contexts; example: The Sally-Anne Test.
    • Responsible implementation requires staged evaluation before patient-facing deployment

    Resources:

    To learn more about Adi's research and topics discussed in this episode, check out the following resources:

    • Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation
    • Therapist Behaviors paper: [2401.00820] A Computational Framework for Behavioral Assessment of LLM Therapists
    • Cognitive reframing paper: Cognitive Reframing of Negative Thoughts through Human-Language Model Interaction - ACL Anthology
    • Faux Pas paper: Testing theory of mind in large language models and humans | Nature Human Behaviour
    • READI: Readiness Evaluation for Artificial Intelligence-Mental Health Deployment and Implementation (READI): A Review and Proposed Framework
    • GPT-4’s Schema of Depression: Explaining GPT-4’s Schema of Depression Using Machine Behavior Analysis
    • Adi’s Profile: Adithya V Ganesan - Google Scholar




    What did you think? Let us know.

    Do you have a question or a discussion topic for the AI Fundamentalists? Connect with them to comment on your favorite topics:

    • LinkedIn - Episode summaries, shares of cited articles, and more.
    • YouTube - Was it something that we said? Good. Share your favorite quotes.
    • Visit our page - see past episodes and submit your feedback! It continues to inspire future episodes.
    42 m