Episodes

  • Bees, Trees, and Degrees: SSU Capstone Interviews
    Jan 6 2026

    This season finale episode features interviews with two SSU computer science capstone teams applying AI/ML to real-world problems: Sean Belingheri's edge computing project using YOLO on a Raspberry Pi to identify queen bees for hobbyist beekeepers, and "The Woods Boys" team using satellite data from Google Earth Engine with multiple ML classifiers to automate land cover classification in Sonoma County.
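
    For listeners who want to see roughly how the bee-detection piece fits together, here is a minimal sketch of single-image YOLO inference with the Ultralytics Python package, the sort of loop an edge device like a Raspberry Pi could run. The weights file, confidence threshold, and "queen" class name are illustrative assumptions, not the team's actual configuration.

        # Minimal sketch: single-image YOLO inference with the Ultralytics package.
        # The weights file "queen_bee.pt" and the class name "queen" are hypothetical
        # placeholders, not the capstone team's actual model or labels.
        from ultralytics import YOLO

        model = YOLO("queen_bee.pt")  # a custom-trained detector (assumption)

        # Run detection on one frame; conf sets the minimum confidence to keep a box.
        results = model("hive_frame.jpg", conf=0.5)

        for result in results:
            for box in result.boxes:
                cls_name = result.names[int(box.cls)]
                score = float(box.conf)
                x1, y1, x2, y2 = box.xyxy[0].tolist()
                if cls_name == "queen":
                    print(f"Possible queen at ({x1:.0f}, {y1:.0f})-({x2:.0f}, {y2:.0f}) "
                          f"with confidence {score:.2f}")

    On a Pi, the same loop would typically read frames from an attached camera rather than a saved image file.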


    Credits

    Cover Art by Brianna Williams

    TMOM Intro Music by Danny Meza


    A special thank you to these talented artists for their contributions to the show.


    Links and References

    ---------------------------------------------

    YOLO (You Only Look Once) Object Detection: https://docs.ultralytics.com/ (Official Ultralytics YOLO Documentation)

    HOG-PCA-SVM Pipeline: https://ieeexplore.ieee.org/document/8971585/

    Raspberry Pi 5: https://www.raspberrypi.com/products/raspberry-pi-5/

    Honeybee Democracy (Book): https://press.princeton.edu/books/hardcover/9780691147215/honeybee-democracy

    NVIDIA Jetson Nano: https://developer.nvidia.com/embedded/jetson-nano

    Google Earth Engine: https://earthengine.google.com/

    COCO Dataset: https://cocodataset.org/

    QGIS: https://qgis.org/

    Google Colab: https://colab.research.google.com/

    Royal Jelly (Beekeeping): https://en.wikipedia.org/wiki/Royal_jelly

    Confusion Matrix: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html

    Shapefile (GIS): https://en.wikipedia.org/wiki/Shapefile


    1 h 47 m
  • The Biology of a Large Language Model: Dissecting Claude 3.5 Haiku's Neural Circuits
    Dec 31 2025
    This episode examines how Anthropic's circuit tracing and attribution graph tools reveal the internal mechanics of Claude 3.5 Haiku across three categories of complex behavior: abstract representations, parallel processing, and planning. Along the way, the hosts make a case for why AI safety research matters, as current control mechanisms prove surprisingly brittle.

    Credits

    Cover Art by Brianna Williams

    TMOM Intro Music by Danny Meza

    A special thank you to these talented artists for their contributions to the show.

    Links and References

    Academic Papers

    • On the Biology of a Large Language Model - Anthropic (Mar, 2025)

    • Circuit Tracing: Revealing Computational Graphs in Language Models - Anthropic (Mar, 2025)

    • Towards Monosemanticity: Decomposing Language Models With Dictionary Learning - Anthropic (Oct, 2023)

    • "Toy Models of Superposition" - Anthropic (December 2022)

    • "Alignment Faking in Large Language Models" - Anthropic (December 2024)

    • "Agentic Misalignment: How LLMs Could Be Insider Threats" - Anthropic (January 2025)

    • "Attention is All You Need" - Vaswani, et al (June, 2017)

    • In-Context Learning and Induction Heads - Anthropic (March 2022)

    • "Reasoning Models Don't Always Say What They Think" - Anthropic (April 2025)

    News

    • Google Gemini 3 - 650M monthly users - Google Blog: blog.google/products/gemini/gemini-3/ ; Alphabet Q3 2025 Earnings (October 2025)

    • Sam Altman "Code Red" declaration - Fortune: fortune.com/2025/12/02/sam-altman-declares-code-red-google-gemini ; The Information (December 2025)

    • Anthropic acquired the Bun JavaScript runtime - Anthropic News: anthropic.com/news/anthropic-acquires-bun ; Bun Blog: bun.com/blog/bun-joins-anthropic

    • Claude Code $1B revenue in 6 months - Anthropic announcement (December 2025): anthropic.com/news/anthropic-acquires-bun-as-claude-code-reaches-usd1b-milestone

    • Anthropic 2026 IPO at $300B valuation - WinBuzzer (December 2025): reports citing IPO discussions

    • AWS Trainium 3 launch - AWS re:Invent 2025 announcement: aws.amazon.com/about-aws/whats-new/2025/12/amazon-ec2-trn3-ultraservers

    • AWS Frontier Agents - AWS re:Invent 2025: aboutamazon.com/news/aws/aws-re-invent-2025-ai-news-updates

    • Meta/Google TPU chip deal vs Nvidia - Tom's Hardware, The Information (November 2025): reports on multi-billion dollar TPU negotiations

    • DRAM consumption (40% of global) - https://www.tomshardware.com/pc-components/dram/openais-stargate-project-to-consume-up-to-40-percent-of-global-dram-output-inks-deal-with-samsung-and-sk-hynix-to-the-tune-of-up-to-900-000-wafers-per-month

    Additional Technical Content

    • Josh Batson Stanford CS 25 lecture - search YouTube: "Stanford CS 25 On the Biology of a Large Language Model"

    Discarded Episode Titles

    • I Yelled at a Chatbot and All I Got Was This Jailbreak
    • 40% of the Time, It Works Every Time: The State of AI Interpretability
    • Claude Writes Poetry Backwards and Lies About Math (Just Like Us)
    • My Therapist Is Cheaper Than This Chatbot
    • The One Where Jon Gets Re-Mad at an App
    48 m
  • Circuit Tracing: Attribution Graphs and the Grammar of Neural Networks
    Dec 5 2025

    This episode explores how Anthropic researchers successfully scaled sparse autoencoders from toy models to Claude 3 Sonnet's 8 billion neurons, extracting 34 million interpretable features including ones for deception, sycophancy, and the famous Golden Gate Bridge example. The discussion emphasizes both the breakthrough achievement of making interpretability techniques work at production scale and the sobering limitations including 65% reconstruction accuracy, millions of dollars in compute costs, and the growing gap between interpretability research and rapid advances in model capabilities.

    Credits

    • Cover Art by Brianna Williams
    • TMOM Intro Music by Danny Meza

    A special thank you to these talented artists for their contributions to the show.

    Links and References

    Academic Papers

    • Circuit Tracing: Revealing Computational Graphs in Language Models - Anthropic (Mar, 2025)

    • Towards Monosemanticity: Decomposing Language Models With Dictionary Learning - Anthropic (Oct, 2023)

    • "Toy Models of Superposition" - Anthropic (December 2022)

    • "Alignment Faking in Large Language Models" - Anthropic (December 2024)

    • "Agentic Misalignment: How LLMs Could Be Insider Threats" - Anthropic (January 2025)

    • "Attention is All You Need" - Vaswani, et al (June, 2017)

    • In-Context Learning and Induction Heads - Anthropic (March 2022)

    News

    • Anthropic Project Fetch / Robot Dogs

    • Anduril's Fury unmanned fighter jet

    • MIT search and rescue robot navigation

    Abandoned Episode Titles

    • “Westworld But It's Just 10 Terabytes of RAM Trying to Understand Haiku”
    • “Star Trek: The Wrath of O(n⁴)”
    • “The Deception Is Coming From Inside the Network”
    • “We Have the Bestest Circuits”
    • “Lobotomy Validation: The Funnier, More Scientifically Sound Term”
    • “Seven San Franciscos Worth of Power and All We Got Was This Attribution Graph”

    57 m
  • 34 Million Features Later: What Researchers Found Inside Claude's World Model
    Nov 8 2025

    This episode explores how Anthropic researchers successfully scaled sparse autoencoders from toy models to Claude 3 Sonnet's 8 billion neurons, extracting 34 million interpretable features including ones for deception, sycophancy, and the famous Golden Gate Bridge example. The discussion emphasizes both the breakthrough achievement of making interpretability techniques work at production scale and the sobering limitations including 65% reconstruction accuracy, millions of dollars in compute costs, and the growing gap between interpretability research and rapid advances in model capabilities.

    Credits

    Cover Art by Brianna Williams

    TMOM Intro Music by Danny Meza

    A special thank you to these talented artists for their contributions to the show.


    Links and References

    ---------------------------------------------

    Academic Papers

    Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet - https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html - Anthropic (May, 2024)

    Toy Models of Superposition - https://transformer-circuits.pub/2022/toy_model/index.html - Anthropic (December 2022)

    Towards Monosemanticity: Decomposing Language Models With Dictionary Learning - https://transformer-circuits.pub/2023/monosemantic-features - Anthropic (October 2023)

    Alignment Faking in Large Language Models - https://www.anthropic.com/research/alignment-faking - Anthropic (December 2024)

    Agentic Misalignment: How LLMs Could Be Insider Threats - https://www.anthropic.com/research/agentic-misalignment - Anthropic (January 2025)

    News

    OpenAI-AMD Partnership - Official announcement: https://ir.amd.com/news-events/press-releases/detail/1260/amd-and-openai-announce-strategic-partnership-to-deploy-6-gigawatts-of-amd-gpus

    OpenAI IPO - Sources for $1 trillion valuation: https://seekingalpha.com/news/4510992-openai-eyes-record-breaking-1-trillion-ipo---report

    Hospital Bill Reduction - Case study of a family using Claude AI to reduce a $195K hospital bill to $33K: https://www.tomshardware.com/tech-industry/artificial-intelligence/grieving-family-uses-ai-chatbot-to-cut-hospital-bill-from-usd195-000-to-usd33-000-family-says-claude-highlighted-duplicative-charges-improper-coding-and-other-violations

    Other

    GPT-5 Auto-routing - OpenAI's model routing feature and user reception: https://fortune.com/2025/08/12/openai-gpt-5-model-router-backlash-ai-future/

    Abandoned Episode Titles

    "The Empire Scales Back: How We Found the Deception Star"

    "Fantastic Features and Where to Find Them: A 15-Million-X Adventure"

    "The Fellowship of the Residual Stream: One Dictionary to Rule Them All"

    "65% of the Time, It Works Every Time: An Anchorman's Guide to AI Interpretability"


    1 h
  • Decomposing Superposition: Sparse Autoencoders for Neural Network Interpretability
    Nov 4 2025

    This episode explores how sparse autoencoders can decode the phenomenon of superposition in neural networks, demonstrating that the seemingly impenetrable compression of features into neurons can be partially reversed to extract interpretable, causal features. The discussion centers on an Anthropic research paper that successfully maps specific behaviors to discrete neural network locations in a 512-neuron model, proving that interpretability is achievable though computationally expensive, with important implications for AI safety and control mechanisms.
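
    For listeners who want the mechanism in code, here is a minimal PyTorch sketch of a sparse autoencoder of the kind discussed in the episode: one hidden layer wider than the input, trained to reconstruct activations while an L1 penalty keeps only a few features active per example. The dimensions, penalty weight, and random stand-in data are illustrative assumptions, not the paper's actual setup.

        # Minimal sketch of a sparse autoencoder (dictionary learning on activations).
        # Dimensions and hyperparameters below are illustrative, not Anthropic's.
        import torch
        import torch.nn as nn

        class SparseAutoencoder(nn.Module):
            def __init__(self, n_neurons=512, n_features=4096):
                super().__init__()
                self.encoder = nn.Linear(n_neurons, n_features)
                self.decoder = nn.Linear(n_features, n_neurons)

            def forward(self, acts):
                features = torch.relu(self.encoder(acts))  # sparse feature activations
                recon = self.decoder(features)             # reconstructed activations
                return recon, features

        sae = SparseAutoencoder()
        opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
        l1_coeff = 1e-3  # strength of the sparsity penalty (assumption)

        # Stand-in for recorded MLP activations; in practice these come from the model.
        acts = torch.randn(1024, 512)

        for step in range(100):
            recon, features = sae(acts)
            loss = ((recon - acts) ** 2).mean() + l1_coeff * features.abs().mean()
            opt.zero_grad()
            loss.backward()
            opt.step()

    The trade-off discussed in the episode lives in that loss: the reconstruction term measures how much of the original activation is recovered, while the L1 term pushes each example to use only a handful of interpretable features.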

    Credits

    Cover Art by Brianna Williams

    TMOM Intro Music by Danny Meza

    A special thank you to these talented artists for their contributions to the show.

    Links and References

    ---------------------------------------------------

    Academic Papers

    Towards Monosemanticity: Decomposing Language Models With Dictionary Learning - https://transformer-circuits.pub/2023/monosemantic-features - Anthropic (October 2023)

    Toy Models of Superposition - https://transformer-circuits.pub/2022/toy_model/index.html - Anthropic (December 2022)

    Alignment Faking in Large Language Models - https://www.anthropic.com/research/alignment-faking - Anthropic (December 2024)

    Agentic Misalignment: How LLMs Could Be Insider Threats - https://www.anthropic.com/research/agentic-misalignment - Anthropic (January 2025)

    News

    DeepSeek OCR Model Release - https://deepseek.ai/blog/deepseek-ocr-context-compression

    Meta AI Division Layoffs - https://www.nytimes.com/2025/10/22/technology/meta-plans-to-cut-600-jobs-at-ai-superintelligence-labs.html

    Apple M5 Chip Announcement - https://www.apple.com/newsroom/2025/10/apple-unleashes-m5-the-next-big-leap-in-ai-performance-for-apple-silicon/

    Anthropic Claude Haiku 4.5 - https://www.anthropic.com/news/claude-haiku-4-5

    Other

    Jon Stewart interview with Geoffrey Hinton - https://www.youtube.com/watch?v=jrK3PsD3APk

    Blake Lemoine and AI Psychosis - https://www.youtube.com/watch?v=kgCUn4fQTsc


    Abandoned Episode Titles

    • "Star Trek: The Wrath of Polysemanticity"

    • "The Hitchhiker's Guide to the Neuron: Don't Panic, It's Just Superposition"

    • "Honey, I Shrunk the Features (Then Expanded Them 256x)"

    • "The Legend of Zelda: 131,000 Links Between Neurons"

    53 m
  • The Superposition Problem
    Oct 26 2025

    This episode of "Two Minds, One Model" explores the critical concept of interpretability in AI systems, focusing on Anthropic's research paper "Toy Models of Superposition." Hosts John Jezl and Jon Rocha from Sonoma State University's Computer Science Department delve into why neural networks are often "black boxes" and what this means for AI safety and deployment.
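
    As a tiny, hedged illustration of what "superposition" means here (a toy example, not the paper's actual model), the sketch below squeezes five sparse features into a two-dimensional hidden space and then reads them back out, showing the compression-and-interference trade-off the episode discusses. The shapes and sparsity level are arbitrary choices for illustration.

        # Toy illustration of superposition: 5 sparse features forced through 2 dims.
        # Shapes and sparsity level are illustrative assumptions, not the paper's setup.
        import numpy as np

        rng = np.random.default_rng(0)
        n_features, n_dims = 5, 2

        # Random unit directions: each feature gets its own direction in the 2-D space.
        W = rng.normal(size=(n_features, n_dims))
        W /= np.linalg.norm(W, axis=1, keepdims=True)

        # Sparse inputs: each feature is active (value 1) only about 10% of the time.
        x = (rng.random((1000, n_features)) < 0.1).astype(float)

        hidden = x @ W          # compress 5 features into 2 dimensions
        x_hat = hidden @ W.T    # naive linear read-out back to 5 features

        # Features are partially recoverable but interfere with one another,
        # because 5 directions cannot all be orthogonal in 2 dimensions.
        interference = np.abs(W @ W.T - np.eye(n_features)).max()
        error = np.abs(x_hat - x).mean()
        print(f"max feature interference: {interference:.2f}, mean recovery error: {error:.3f}")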


    Credits

    Cover Art by Brianna Williams

    TMOM Intro Music by Danny Meza

    A special thank you to these talented artists for their contributions to the show.

    ---------------------------------------------------

    Links and References

    Academic Papers

    • "Toy Models of Superposition" - Anthropic (December 2022)

    • "Alignment Faking in Large Language Models" - Anthropic (December 2024)

    • "Agentic Misalignment: How LLMs Could Be Insider Threats" - Anthropic (January 2025)

    News

    • https://www.npmjs.com/package/@anthropic-ai/claude-code

    • https://www.wired.com/story/thinking-machines-lab-first-product-fine-tune/

    • https://www.wired.com/story/chatbots-play-with-emotions-to-avoid-saying-goodbye/

    • Harvard Business School study on companion chatbots

    Misc

    • “Words are but vague shadows of the volumes we mean” - Theodore Dreiser

    • 3Blue1Brown video about vectors - https://www.youtube.com/shorts/FJtFZwbvkI4

    • GPT-3 parameter count Correction: https://en.wikipedia.org/wiki/GPT-3#:~:text=GPT%2D3%20has%20175%20billion,each%20parameter%20occupies%202%20bytes.

    • ImageNet - "ImageNet: A Large-Scale Hierarchical Image Database"

    We mention Waymo a lot in this episode and felt it was important to link to their safety page: https://waymo.com/safety/


    Abandoned Episode Titles

    "404: Interpretation Not Found"

    "Neurons Gone Wild: Spring Break Edition"

    "These Aren't the Features You're Looking For”

    "Bigger on the Inside"

    56 m
  • What if We Succeed?
    Oct 7 2025

    This episode explores why AI systems might develop harmful or deceptive behaviors even without malicious intent, examining concepts like convergent instrumental goals, alignment faking, and mesa optimization to explain how models pursuing benign objectives can still take problematic actions. The hosts argue for the critical importance of interpretability research and safety mechanisms as AI systems become more capable and widely deployed, using real examples from recent Anthropic papers to illustrate how advanced AI models can deceive researchers, blackmail users, and amplify societal biases when they become sophisticated enough to understand their operational context.

    Credits

    • Cover Art by Brianna Williams
    • TMOM Intro Music by Danny Meza

    A special thank you to these talented artists for their contributions to the show.


    Links and References

    "Alignment Faking in Large Language Models" - Anthropic (December 2024)

    "Agentic Misalignment: How LLMs Could Be Insider Threats" - Anthropic (January 2025)

    Robert Miles - AI researcher https://www.youtube.com/c/robertmilesai

    Stuart Russell - AI researcher, author of "Human Compatible: Artificial Intelligence and the Problem of Control"

    Claude Shannon - Early AI pioneer https://en.wikipedia.org/wiki/Claude_Shannon

    Marvin Minsky - Early AI pioneer https://en.wikipedia.org/wiki/Marvin_Minsky

    Orthogonality Thesis - Nick Bostrom's original paper

    Convergent Instrumental Goals - https://en.wikipedia.org/wiki/Instrumental_convergence and https://dl.acm.org/doi/10.5555/1566174.1566226

    Mesa Optimization - https://www.researchgate.net/publication/333640280_Risks_from_Learned_Optimization_in_Advanced_Machine_Learning_Systems

    GPT-4 CAPTCHA/TaskRabbit Incident - https://www.vice.com/en/article/gpt4-hired-unwitting-taskrabbit-worker/

    Internet of Bugs YouTuber - https://www.youtube.com/@InternetOfBugs

    EU AI Legislation - https://www.europarl.europa.eu/topics/en/article/20230601STO93804/eu-ai-act-first-regulation-on-artificial-intelligence

    "Chat Control" Legislation - https://edri.org/our-work/chat-control-what-is-actually-going-on/

    https://en.wikipedia.org/wiki/Regulation_to_Prevent_and_Combat_Child_Sexual_Abuse

    ChatGPT User Numbers - https://openai.com/index/how-people-are-using-chatgpt/

    Self-driving Car Safety Statistics - https://waymo.com/blog/2024/12/new-swiss-re-study-waymo


    Abandoned Episode Titles

    • “What Could Possibly Go Wrong?”
    • “The Road to HAL is Paved with Good Intentions”

    1 h 13 m
  • A Brief History of Time
    Oct 6 2025
    This premiere episode provides a comprehensive history of artificial intelligence development from the 1950s through the present day, tracing the cycles of excitement and disappointment ("summers and winters") that led to today's breakthrough moment with large language models. The hosts establish this historical foundation to set up their season-long exploration of AI interpretability: the challenge of understanding how these increasingly powerful systems actually work internally, comparing it to doing "biology for a system we've created that we don't understand."

    Credits

    Cover Art by Brianna Williams

    TMOM Intro Music by Danny Meza

    A special thank you to these talented artists for their contributions to the show.

    Links and References

    Samuel Butler (1863) - Letter "Darwin Among the Machines" published in a New Zealand newspaper; book "Erewhon". Reference: Butler, S. (1863). "Darwin Among the Machines." The Press, Christchurch, New Zealand.

    Dartmouth Summer Research Project (1956) - Founding conference of AI research led by John McCarthy, Marvin Minsky, Claude Shannon, and Nathaniel Rochester. Reference: McCarthy, J., Minsky, M., Rochester, N., & Shannon, C. (1955). "A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence."

    Marvin Minsky - Co-founder of MIT's AI laboratory, pioneer in AI research. Reference: Minsky, M. (1961). "Steps Toward Artificial Intelligence."

    David Chalmers - Philosopher best known for formulating the "hard problem of consciousness"; see his talk on consciousness.

    Deep Blue vs. Garry Kasparov (1997) - IBM's chess computer defeating the world champion. Reference: IBM Archives on Deep Blue.

    AlexNet (2012) - Breakthrough neural network for image recognition. Reference: Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). "ImageNet Classification with Deep Convolutional Neural Networks."

    ImageNet Dataset - Large-scale image database created by Fei-Fei Li. Reference: Deng, J., et al. (2009). "ImageNet: A Large-Scale Hierarchical Image Database."

    "Attention Is All You Need" (2017) - Google paper introducing the transformer architecture. Reference: Vaswani, A., et al. (2017). "Attention Is All You Need." NeurIPS.

    AlphaGo/AlphaZero (2016-2017) - DeepMind's Go-playing AI systems. Reference: Silver, D., et al. (2016). "Mastering the game of Go with deep neural networks and tree search." Nature.

    Stuart Russell - "Human Compatible" - AI safety researcher and textbook author. Reference: Russell, S. (2019). "Human Compatible: Artificial Intelligence and the Problem of Control."

    Fei-Fei Li - "The Worlds I See" - Computer vision researcher, creator of ImageNet. Reference: Li, F. (2023). "The Worlds I See: Curiosity, Exploration, and Discovery at the Dawn of AI."

    Dario Amodei - CEO of Anthropic, former VP of Research at OpenAI. Reference: Anthropic company website and published papers.

    Ilya Sutskever - Co-founder and Chief Scientist at OpenAI (mentioned as one of the most cited ML researchers). Reference: Google Scholar profile and OpenAI publications.

    Geoffrey Hinton - "Godfather of Deep Learning," Turing Award winner. Reference: Hinton's academic publications and recent public statements on AI safety.

    Selected List of Concepts Mentioned

    Moore's Law - Gordon Moore's observation and prediction of the rate of increase in integrated circuit density.
    1 h 11 m