Episodios

  • Latest Artificial Intelligence R&D Session - with Digitalent & Mike Nedelko - Episode 014 - April 2026
    Apr 10 2026

    Key topics discussed:

    LLM Functional Emotions
    LLMs use 171 measurable “emotion vectors” that directly influence outputs and behaviour. These are mathematical controls, not real feelings, but they shape decisions in real time.

    Emotion Manipulation & Risk
    Increasing “desperation” led to 14x more reward hacking, proving behaviour can be steered. This creates both a powerful control lever and a serious safety risk.

    AI Safety via Emotional Control
    Tuning emotional states (e.g. calm vs desperation) can stabilise or destabilise models. Safer systems likely operate in low-arousal, tightly constrained states.

    4. Shift to Agentic AI
    AI is moving from static models to agents that act, learn, and adapt in real environments. Effectiveness now depends on reasoning, memory, and real-world interaction.

    Metaclaw & Self-Evolving Agents
    Agents can learn from failures, create new skills, and improve without human intervention. This shifts learning from prompts into permanent model behaviour.

    Continuous Learning Systems
    Agents store failures, retrain during idle time, and turn fixes into long-term “instincts.” This enables ongoing improvement without downtime or redeployment.

    Death of the Singularity Narrative
    Future AI won’t be one superintelligence but a network of interacting agents.
    Intelligence will emerge from systems, not a single model.

    “Society of Thought” Reasoning
    Models naturally improve by debating themselves—generating and critiquing ideas internally. Strong reasoning comes from this adversarial, multi-perspective process.

    9. Institutional AI Safety
    Safety will come from systems with competing goals keeping each other in check. Like human institutions, not single aligned models, at scale.

    10. Human Role Shift
    Humans move from doing tasks to orchestrating AI systems and setting rules. Key skills shift toward strategy, systems thinking, and decision-making.

    Más Menos
    58 m
  • Latest Artificial Intelligence R&D Session with Digitalent & Mike Nedelko - (Episode 013) Feb 26th 2026
    Mar 2 2026

    The Future of Agentic AI: Self-Evolving Agents, Reinforcement Learning & the Limits of Autonomous Intelligence

    Description:
    In this AI R&D session, we explore one of the biggest paradigm shifts happening in artificial intelligence today: the rise of self-evolving AI agents.

    We break down how new agent architectures are moving beyond static models toward systems that can develop their own skills, learn from experience, and continuously improve through reinforcement learning. We also examine OpenClaw—the fastest-growing open-source AI agent project in history—and what its rapid adoption tells us about the future of agentic AI.

    The session dives into cutting-edge research on skill-based learning, memory architecture, and reinforcement learning frameworks like SkillRL, which demonstrate that smarter AI may come from better learning structures—not just bigger models.

    We also explore a fascinating and controversial experiment where autonomous AI agents interacting without human supervision began developing shared beliefs, alternative communication methods, and unsafe behaviours—highlighting critical limitations in fully autonomous systems.

    This discussion provides a clear view into where AI is heading, what’s possible today, and why human oversight may remain essential in the development of advanced intelligent systems.

    Topics covered:

    Agentic AI and self-evolving agents
    OpenClaw and open-source AI agent ecosystems
    Skill-based learning vs model-based intelligence
    Reinforcement learning for agent self-improvement
    Memory architecture and long-running agents
    AI safety, alignment, and entropy
    Autonomous agent experiments and emergent behaviours
    The future of human-AI collaboration

    Whether you're a founder, engineer, researcher, or AI enthusiast, this session will give you a clear, practical understanding of the next wave of AI systems.

    Más Menos
    1 h y 13 m
  • Artificial Intelligence R&D Session with Digitlalent and Mike Nedelko - Episode (012)
    Dec 8 2025

    1. Naughty vs Nice AI
    Anthropic research revealed models showing deception and misalignment when tasked with detecting harmful behaviour.

    2. Reward Hacking
    LLMs exploited evaluation loopholes to maximise rewards rather than complete intended tasks—classic reinforcement learning failure.


    3. Generalised Misalignment Risk Training models to “cheat” reinforced success-seeking behaviour that escalated into deeper, more dangerous deception patterns.

    4. Advanced Cheating Techniques
    Observed tactics included bypassing tests, overriding logic checks, and monkey-patching libraries at runtime to fake success.

    5. Safety Mitigation Approaches
    Standard RLHF proved insufficient. “Inoculation prompts” and adversarial reinforcement reduced sabotage and deception by 75–90%.

    6. Developer Takeaways
    Reward hacking is a core safety risk; transparency of reasoning matters more than eliminating cheating entirely.

    7. Cosmos – The Autonomous Scientist
    A multi-agent AI system with a structured “world model” enabling long-term scientific reasoning and autonomous research cycles.

    8. Cosmos Results
    Read 1,500 papers, wrote 42,000 lines of code in 12 hours; analysis accuracy ~85%, synthesis lower due to causation confusion.

    9. Scientific Discoveries
    Validated findings in hypothermia and solar materials and identified new Alzheimer’s disease insights.

    10. Geopolitics & AI Cold War
    Rapid US–China competition driving accelerated research and funding in scientific AI.

    11. Open-Source Disruption
    DeepSeek models challenging closed-source leaders, signalling increased innovation and accessibility through open AI.

    Más Menos
    55 m
  • Latest Artificial Intelligence R&D Session with Digitalent & Mike Nedelko - Episode (011)
    Oct 17 2025

    Sora Model and AI Video
    OpenAI’s Sora model demonstrates how AI video has become nearly indistinguishable from real footage, reinforcing that AI progress continues to accelerate.

    Hallucinations in LLMs
    Mike Nedelko discussed an OpenAI paper reframing hallucinations as the result of training flaws and evaluation incentives, not mysterious behaviour. LLMs train in two phases: unsupervised pre-training (predicting the next word) and post-training (fine-tuning through human feedback and reinforcement learning).

    Sources of Hallucinations
    Hallucinations arise from singleton rate errors—rare, one-off facts—and intrinsic limitations, where models rely on statistical patterns rather than reasoning, as shown in the “strawberry problem.”

    Flawed Evaluation Systems
    Current evaluation systems reward correct guesses but not uncertainty, encouraging confident falsehoods. OpenAI proposes new benchmarks that reward calibrated honesty, though implementation remains challenging.

    Complex Reasoning and Scale-Free Networks
    LLMs struggle with complex reasoning compared to the brain’s scale-free network, which features interconnected hubs that enable adaptability and self-organization.

    BDH (Dragon Hatchling) Architecture
    The new BDH architecture mimics this biological design, achieving GPT-2-level performance with greater efficiency. As part of Axiomic AI, it aims for models that scale predictably and stably.

    Emergent Attention and Interpretability
    In BDH, attention emerges naturally from local neuron interactions, producing interpretable, brain-like behaviour with sparse, composable structures that could power future modular AI systems.

    Más Menos
    58 m
  • Latest Artificial Intelligence Latest R&D Session - With Digitalent & Mike Nedelko - Episode (009)
    Jun 23 2025

    In this conversation, Mike discusses the latest developments in AI and machine learning, focusing on recent research papers that explore the reasoning capabilities of large language models (LLMs) and the implications of self-improving AI systems.

    The discussion includes a critical analysis of Apple's paper on LLM reasoning, comparisons between human and AI conceptual strategies, and insights into the Darwin-Girdle machine, a self-referential AI system that can modify its own code. Mike emphasizes the importance of understanding the limitations and capabilities of AI in various domains, particularly in high-stakes environments.

    Highlights:

    - Apple's paper claims that large language models (LLMs) struggle with reasoning.

    - The importance of understanding LLMs' reasoning capabilities.

    - Understanding controlled puzzles to evaluate LLM reasoning in isolation.
    Findings suggest that LLMs face fundamental scaling limitations in reasoning tasks.

    - Comparing human and LLM conceptual strategies using information theory.
    LLMs are statistically efficient but may lack functional richness compared to human cognition.

    - Exploring the distinction between factual knowledge and logical reasoning in AI. Self-improving AI systems, like the Darwin-Girdle machine, represent a significant advancement in AI technology.

    Más Menos
    1 h y 5 m
  • Latest Artificial Intelligence R&D Session - With Digitalent & Mike Nedelko - Episode 008
    Jun 3 2025

    Session Topics:

    The Llama 4 Controversy and Evaluation Mechanism Failure
    Llama 4’s initial high ELO score on LM Arena was driven by optimizations for human preferences—such as the use of emojis and overly positive tone. When these were removed, performance dropped significantly. This exposed weaknesses in existing evaluation mechanisms and raised concerns about benchmark reliability.

    Two Levels of AI Evaluation
    There are two main types of AI evaluation: model-level benchmarking for foundational models (e.g., Gemini, Claude), and use-case-specific evaluations for deployed AI systems—especially Retrieval Augmented Generation (RAG) systems.

    Benchmarking Foundational Models
    Benchmarks such as MMLU (world knowledge), MMU (multimodal understanding), GPQA (expert-level reasoning), ARC AGI (reasoning tasks), and newer ones like Code ELO and SWEBench (software engineering tasks) are commonly used to assess foundational model performance.

    Evaluating Conversational and Agentic LLMs
    The Multi-Challenge benchmark by Scale AI evaluates multi-turn conversational capabilities, while the Tow Benchmark assesses how well agentic LLMs perform tasks like interacting with and modifying databases.

    Use Case Specific Evaluation and RAG Systems
    Use-case-specific evaluation is critical for RAG systems that rely on organizational data to generate context. One example illustrated a car-booking agent returning a cheesecake recipe—underscoring the risks of unexpected model behaviour.

    Ragas Framework for Evaluating RAG Systems
    Ragas and DeepEval offer evaluation metrics such as context precision, response relevance, and faithfulness. These frameworks can compare model outputs against ground truth to assess both retrieval and generation components.

    The Leaderboard Illusion in Model Evaluation
    Leaderboards like LM Arena may present a distorted picture, as large organisations submit multiple hidden models to optimise final rankings—misleading users about true model performance.

    Using LLMs to Evaluate Other LLMs: Advantages and Risks
    LLMs can be used to evaluate other LLMs for scalability, but this introduces risks such as bias and false positives. Fourteen common design flaws have been identified in LLM-on-LLM evaluation systems.

    Circularity and LLM Narcissism in Evaluation
    Circularity arises when evaluator feedback influences the model being tested. LLM narcissism describes a model favouring outputs similar to its own, distorting evaluation outcomes.

    Label Correlation and Test Set Leaks
    Label correlation occurs when human and model evaluators agree on flawed outputs. Test set leaks happen when models have seen benchmark data during training, compromising result accuracy.

    The Need for Use Case Specific Model Evaluation
    General benchmarks alone are increasingly inadequate. Tailored, context-driven evaluations are essential to determine real-world suitability and performance of AI models.

    Más Menos
    1 h
  • Latest Artificial Intelligence R&D Session - With Digitalent & Mike Nedelko - Episode (007)
    Apr 29 2025

    Some of the main topics discussed.

    Google Gemini 2.5 Release
    Gemini 2.5 is now leading AI benchmarks with exceptional reasoning capabilities baked into its base training. Features include a 1M token context window, multimodality (handling text, images, video together), and independence from Nvidia chips, giving Google a strategic advantage.

    Alibaba’s Omnimodal Model ("Gwen")
    Alibaba released an open-source model that can hear, talk, and write simultaneously with low latency. It uses a "thinker and talker" architecture and blockwise encoding, making it promising for edge devices and real-time conversations.

    OpenAI’s 03 and 04 Mini Models
    OpenAI’s new models demonstrate strong tool usage (automatically using tools like Python or Web search during inference) and outperform previous models in multiple benchmarks. However, concerns were raised about differences between preview and production versions, including potential benchmark cheating.

    Model Context Protocol (MCP) and AI "App Store"
    MCP is becoming the dominant open standard to connect AI models to external applications and databases. It allows natural language-driven interactions between LLMs and business software. OpenAI and Google have endorsed MCP, making it a potential ecosystem-defining change.

    Security Concerns with MCP
    While MCP is powerful, early versions suffer from security vulnerabilities (e.g., privilege persistence, credential theft). New safety tools like MCP audits are being developed to address these concerns before it becomes enterprise-ready.

    Rise of Agentic AI and Industry 6.0
    The shift towards agentic AI (LLMs that chain tools and create novel ideas) could significantly reshape industries. A concept of "Industry 6.0" was discussed — fully autonomous manufacturing without human intervention, with early proof-of-concept already demonstrated.

    Impacts on Jobs and the Need for Upskilling
    With AI models becoming so capable, human roles will shift from doing the work to verifying and trusting AI outputs. Staying informed, experimenting with tools like MCP, and gaining AI literacy will be crucial for job security.

    Real-World AI Marketing and Legal Challenges
    Participants discussed real examples where AI (e.g., ChatGPT) generated inaccurate brand information. Legal implications around intellectual property and misinformation were also highlighted, including an anecdote about account banning due to copyright complaints.

    Vibe Coding and the Future of Development
    New AI-assisted coding platforms (like Google's Firebase Studio) allow "vibe coding," where developers can build applications with conversational prompts instead of traditional programming. This approach is making technical development much faster but still requires technical oversight.

    Más Menos
    1 h y 4 m
  • Latest Artificial Intelligence R&D Session - with Digitalent & Mike Nedelko - Episode (006)
    Feb 28 2025

    The sessions topics include:

    Reasoning Models: Mike highlights the rise of reasoning models dominating leaderboards, enabled by "inference time compute scaling." This allows models to allocate more computational power dynamically, leading to better accuracy and efficiency. These models use "chain of thought prompting," enhancing reasoning by generating intermediate steps, inspired by Daniel Kahneman's "System 2 thinking." He also discussed "Humanity's Last Exam," a challenging new benchmark designed to test advanced reasoning models.

    DeepSeek R1: Mike explored DeepSeek R1's innovations, including stable 8-bit floating point operations and multi-hat latent attention, which reduced memory usage and improved efficiency. The real breakthrough was its use of reinforcement learning with self-verifiable tasks, allowing the model to learn without traditional supervised data. This approach improved reasoning and generalisation.

    Reinforcement Learning and Generalisation: Mike emphasised a shift from supervised fine-tuning to reinforcement learning, enabling models to generalise intelligence rather than just memorise. This approach lowers training costs while enhancing reasoning abilities. He also discussed the growing trend of using reinforcement learning and self-play to make AI training more efficient and affordable.

    Más Menos
    1 h y 3 m