ThursdAI - The top AI news from the past week

By: Weights & Biases

Every ThursdAI, Alex Volkov hosts a panel of experts, AI engineers, data scientists, and prompt spellcasters on Twitter Spaces to discuss everything major and important that happened in the world of AI over the past week. Topics include LLMs, open source, new capabilities, OpenAI, competitors in the AI space, new LLM models, AI art and diffusion, and much more.

sub.thursdai.news
Alex Volkov
Politics & Government
Episodes
  • 📆 ThursdAI - Aug 14 - A week with GPT5, OSS world models, VLMs in OSS, Tiny Gemma & more AI news
    Aug 15 2025
Hey everyone, Alex here 👋

Last week, I tried to test GPT-5 and got surprisingly bad results, but it turns out, as you'll see below, that's partly because they had a bug in the router, and partly because... well, the router itself! See below for an introduction written by GPT-5; it's actually not bad?

Last week was a whirlwind. We live-streamed GPT-5's "birthday," ran long, and then promptly spent the next seven days poking every corner of the new router-driven universe. This week looked quieter on the surface, but it actually delivered a ton: two open-source world models you can drive in real time, a lean vision-language model built for edge devices, a 4B local search assistant that tops Perplexity Pro on SimpleQA, a base model "extraction" from GPT-OSS that reverses alignment, fresh memory features landing across the big labs, and a practical prompting guide to unlock GPT-5's reasoning reliably. We also had Alan Dao join to talk about Jan-v1 and what it takes to train a small model that consistently finds the right answers on the open web, locally.

Not bad eh? Much better than last time 👏 Ok, let's dive in, there's a lot to talk about in this "chill" AI week (show notes at the end as always): first open source, then GPT-5 reactions, and then... world models!

00:00 Introduction and Welcome
00:33 Host Introductions and Health Updates
01:26 Recap of Last Week's AI News
01:46 Discussion on GPT-5 and Prompt Techniques
03:03 World Models and Genie 3
03:28 Interview with Alan Dao from Jan
04:59 Open Source AI Releases
06:55 Big Companies and APIs
10:14 New Features and Tools
14:09 Liquid Vision Language Model
26:18 Focusing on the Task at Hand
26:18 Reinforcement Learning and Reward Functions
26:35 Offline AI and Privacy
27:13 Web Retrieval and API Integration
30:34 Breaking News: New AI Models
30:41 Google's New Model: Gemma 3
33:53 Meta's DINOv3: Advancements in Computer Vision
38:50 Open Source Model Updates
45:56 Weights & Biases: New Features and Updates
51:32 GPT-5: A Week in Review
55:12 Community Outcry Over AI Model Changes
56:06 OpenAI's Response to User Feedback
56:38 Emotional Attachment to AI Models
57:52 GPT-5's Performance in Coding and Writing
59:55 Challenges with GPT-5's Custom Instructions
01:01:45 New Prompting Techniques for GPT-5
01:04:10 Evaluating GPT-5's Reasoning Capabilities
01:20:01 Open Source World Models and Video Generation
01:27:54 Conclusion and Future Expectations

Open Source AI

We've had quite a lot of open source on the show this week, including breaking news from the Gemma team!

Liquid AI drops LFM2-VL (X, blog, HF)

Let's kick things off with our friends at Liquid AI, who released LFM2-VL, their new vision-language models coming in at a tiny 440M and 1.6B parameters. The Liquid folks continue to surprise with speedy, mobile-device-ready models that run 2x faster than top VLM peers. With native 512x512 resolution (larger images get broken into 512x512 smart tiles) and an OCRBench score of 74, this tiny model beats SmolVLM2 while being half the size.
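To make the tiling idea above concrete, here's a minimal sketch of how a native-resolution VLM preprocessor might split oversized images into 512x512 tiles. The tile size comes from the LFM2-VL description; the non-overlapping grid and the pass-through for small images are illustrative assumptions, not Liquid AI's actual preprocessing code.

```python
# Minimal sketch: split images larger than the encoder's native resolution
# into non-overlapping 512x512 tiles (assumed behavior, for illustration).
from PIL import Image

TILE = 512  # assumed native encoder resolution

def to_tiles(img: Image.Image, tile: int = TILE) -> list[Image.Image]:
    """Return tile x tile crops; images that already fit pass through whole."""
    w, h = img.size
    if w <= tile and h <= tile:
        return [img]  # encoded natively, no tiling needed
    crops = []
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            crops.append(img.crop((left, top, min(left + tile, w), min(top + tile, h))))
    return crops

tiles = to_tiles(Image.open("screenshot.png"))  # any local test image
print(f"{len(tiles)} tile(s) to encode")
```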
We've chatted with Maxime from Liquid about LFM2 back in July, and it's great to see they're making these multimodal as well, with the same efficiency gains!

Zhipu (z.ai) unleashes GLM-4.5V - 106B VLM (X, Hugging Face)

In another "previous good model that now has eyes" release, the fine folks from Zhipu continued training their recently released (and excellent) GLM-4.5-Air with a vision encoder, resulting in probably one of the top vision models in open source! It's an MoE with only 12B active parameters (106B total), gets SOTA across 42 public vision-language benchmarks, and has a "thinking mode" that reasons about what it sees. Given that GLM-4.5-Air is a really strong model, this is de facto the best visual intelligence in open source, able, for example, to rebuild websites from a picture and identify statues and locations!

Jan V1 - a tiny (4B) local search assistant Qwen finetune (X, Hugging Face)

This release got a lot of attention: the folks at Menlo Research (Alan Dao, who came on the pod today to chat with us about Jan) released an Apache 2 finetune of Qwen3-4B-Thinking that's focused on SimpleQA, and showed that their tiny model beats Perplexity Pro on that benchmark. Alan told us on the pod that Jan (the open source Jan app) was born to be an open source alternative for search with local models! The trick is, you have to enable some source of search data (Exa, Serper, Tavily) via MCP, then enable tools in Jan, and then... you have a tiny, completely local Perplexity clone running on a 4B model! (A rough sketch of this search-then-answer loop follows these notes.)

Google drops Gemma 3 270M (blog)

In some #breakingNews, Google open sourced a tiny (270M parameter), "good at instruction following" Gemma variant. It joins models like SmolLM and LFM2 in the "smol models" arena; at only about 300MB, you can run this... on a toaster. It apparently also fine-tunes very well while being very energy efficient!

Big Companies (AKA the OpenAI corner these past 2 weeks)

Ok ok, we're finally here, a week with GPT-5! ...
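For anyone who wants to try the Jan pattern described above, here's a minimal sketch of the search-then-answer loop: a locally served model receives search snippets and must answer only from them. The base URL, model id, and web_search helper are placeholder assumptions (Jan exposes an OpenAI-compatible local server, and in the real app the search side is wired through an MCP tool like Exa, Serper, or Tavily, not a Python function).

```python
# Sketch of a "local Perplexity" loop against a locally served 4B model.
# base_url and model id are assumptions; web_search is a stand-in for the
# MCP search tool (Exa/Serper/Tavily) that Jan would actually call.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1337/v1", api_key="not-needed")

def web_search(query: str) -> list[str]:
    # Placeholder: return canned snippets instead of hitting a real provider.
    return [f"Snippet about {query} from source A",
            f"Snippet about {query} from source B"]

def answer(question: str) -> str:
    snippets = "\n".join(f"- {s}" for s in web_search(question))
    resp = client.chat.completions.create(
        model="jan-v1-4b",  # assumed local model id
        messages=[
            {"role": "system", "content": "Answer using ONLY the provided snippets and cite them."},
            {"role": "user", "content": f"Snippets:\n{snippets}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content

print(answer("What is Jan-v1?"))
```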
    1 hr 30 min
  • 📅 ThursdAI - GPT5 is here
    Aug 7 2025
Hey folks 👋 Alex here, writing to you from a makeshift recording studio in an Eastern European hookah bar, where I spent the last 7 hours. Why, you ask? Well, when GPT-5 drops the same week as OpenAI shipping the long-awaited OSS models, plus Google shipping a perfect-memory world model (Genie 3) and tons of other AI drops, well, I just couldn't stay away from the stream. Vacation or not, ThursdAI is keeping you up to date (for 32 months straight, which is also the time since the original GPT-4 release that gave this show its name!)

So, what did we have today on the stream? Well, we started as usual, talking about the AI releases of the week, as if OpenAI dropping OSS models (Apache 2) at 120B and 20B is "usual". We then covered incredible releases like Google's world model Genie 3 (more on this next week!) and Qwen-Image plus a few small Qwens. We then were VERY excited to tune in and watch the (very long) announcement stream from OpenAI, in which they spent an hour telling us about GPT-5.

This was our longest stream by far (3.5 hours, one of which was just OpenAI's live stream), and I'm putting it here mostly unedited, but chapters are up, so feel free to skip to the parts that interest you most.

00:00 Introduction and Special Guests
00:56 Twitter Space and Live Streaming Plans
02:12 Open Source AI Models Overview
03:44 Qwen and Other New AI Models
08:59 Community Interaction and Comments
10:01 Technical Deep Dive into AI Models
25:06 OpenAI's New Releases and Benchmarks
38:49 Expectations and Use Cases for AI Models
40:03 Tool Use vs. Deep Knowledge in AI
41:02 Evaluating GPT OSS and OpenAI Critique
42:29 Historical and Medical Knowledge in AI
51:16 Opus 4.1 and Coding Models
55:38 Google's Genie 3: A New World Model
01:00:43 Kitten TTS: A Lightweight Text-to-Speech Model
01:02:07 ElevenLabs' Music Generation AI
01:08:51 OpenAI's GPT-5 Launch Event
01:24:33 Building a French Learning Web App
01:26:22 Exploring the Web App Features
01:29:19 Introducing Enhanced Voice Features
01:30:02 Voice Model Demonstrations
01:32:32 Personalizing ChatGPT
01:33:23 Memory and Scheduling Features
01:35:06 Safety and Training Enhancements
01:39:17 Health Applications of GPT-5
01:45:07 Coding with GPT-5
01:46:57 Advanced Coding Capabilities
01:52:59 Real-World Coding Demonstrations
02:10:26 Enterprise Applications of GPT-5
02:11:49 Amgen's Use of GPT-5 in Drug Design
02:12:09 BBVA's Financial Analysis with GPT-5
02:12:33 Healthcare Applications of GPT-5
02:12:52 Government Adoption of GPT-5
02:13:22 Pricing and Availability of GPT-5
02:13:51 Closing Remarks by Chief Scientist Jakub
02:16:03 Live Reactions and Discussions
02:16:41 Technical Demonstrations and Comparisons
02:33:53 Healthcare and Scientific Advancements with GPT-5
02:47:09 Final Thoughts and Wrap-Up

---

My first reactions to GPT-5

Look, I gotta keep it real with you: my first gut reaction was, hey, I'm on vacation, I don't have time to edit and write the newsletter (EU timezone), so let's see how GPT-5 handles this task. After all, OpenAI has removed all other models from the dropdown; it's all GPT-5 now. (Pricing from the incredible writeup by Simon Willison, available here.)

And to tell you the truth, I was really disappointed! GPT-5 seems to be incredible at coding benchmarks; with a 400K token context and incredible pricing (just $1.25/$10 per million input/output tokens, compared to Opus at $15/$75), this model, per the many friends who got to test it early, is a beast at coding!
Readily beating Opus on affordability per token, and switching from thinking to less thinking as needed, it definitely seems like a great improvement for coding and agentic tasks. (A back-of-the-envelope cost comparison follows these notes.)

But for my very much honed prompt of "hey, help me with ThursdAI drafts, here are previous drafts that I wrote myself, mimic my tone", it failed... spectacularly! Here's just a funny example, after I replied that it did a bad job: it literally wrote "I'm Alex, I build the mind, not the vibe" 🤦‍♂️ What... the actual... For comparison, here's o3, with the same prompt, producing a fairly true-to-tone draft.

High-taste testers take on GPT-5

But hey, I have tons of previous speakers in our group chats, and many of them who got early access (I didn't... OpenAI, I can be trusted lol) rave about this model. They are saying that this is a huge jump in intelligence. Folks like Dr. Derya Unutmaz, who jumped on the live show and described how GPT-5 does incredible things with fewer hallucinations; folks like Swyx from Latent.Space, who had early access and even got invited to give first reactions at the OpenAI office; and Pietro Schirano, who also showed up in an OpenAI video. So definitely, definitely check out their vibes, as we all try to wrap our heads around this new intelligence king we got!

Other GPT-5 updates

OpenAI definitely cooked, don't get me wrong. With this model plugging into everything else in their platform, like memory, voice (which was upgraded and now works in custom GPTs, yay!), canvas, and study mode, this will definitely be an upgrade for many folks using the models. They have now also opened ...
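Here's the promised back-of-the-envelope cost comparison. It assumes the prices quoted above are per million input/output tokens and uses a made-up agentic coding workload (2M input tokens, 200K output tokens):

```python
# Rough cost comparison at the per-million-token prices quoted above.
PRICES = {  # (input $/Mtok, output $/Mtok)
    "GPT-5": (1.25, 10.00),
    "Claude Opus": (15.00, 75.00),
}
in_mtok, out_mtok = 2.0, 0.2  # assumed workload, in millions of tokens

for model, (p_in, p_out) in PRICES.items():
    print(f"{model}: ${in_mtok * p_in + out_mtok * p_out:.2f}")
# GPT-5: $4.50
# Claude Opus: $45.00
```

At these list prices the same workload comes out roughly 10x cheaper on GPT-5, which is the "affordability per token" point above.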
    2 hr 56 min
  • 📆 ThursdAI – Jul 31, 2025 – Qwen’s Small Models Go Big, StepFun’s Multimodal Leap, GLM-4.5’s Chart Crimes, and Runway’s Mind‑Bending Video Edits + GPT-5 soon?
    Aug 1 2025
This is a free preview of a paid episode. To hear more, visit sub.thursdai.news

Woohoo, we're almost done with July (my favorite month), and the open source AI world decided to go out with some fireworks 🎉

Hey everyone, Alex here, writing this without my own personal superintelligence (more: later), and this week has been VERY BUSY with many new open source releases. Just 1 hour before the show we already had 4 breaking news releases: a tiny Qwen3-Coder, multimodal SOTAs from both Cohere and StepFun, and our friends from Krea dropped a combined model with BFL called Flux [Krea] 👏 This is on top of a very, very busy week, with Runway adding conversation to their video model Aleph, Zuck's superintelligence vision, and a new SOTA open video model, Wan 2.2. So let's dive straight into this (as always, all show notes and links are at the end).

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Open Source LLMs & VLMs

Tons of new stuff here; I'll try to be brief, but each one of these releases deserves a deeper dive for sure.

Alibaba is on 🔥 with 3 new Qwen models this week

Yes, this is very similar to last week, when they also dropped 3 new SOTA models, but these are additional ones. It seems that someone at Alibaba figured out that after splitting away from the hybrid models, they can release each model separately and get a lot of attention per model! Here's the timeline:

* Friday (just after our show): Qwen3-235B-Thinking-2507 drops (235B total, 22B active, HF)
* Tuesday: Qwen3-30B-Thinking-2507 (30B total, 3B active, HF)
* Today: Qwen3-Coder-Flash-2507 lands (30B total, 3B active, for coding, HF)

Let's start with the SOTA reasoner: the 235B (A22B) 2507 is absolutely the best reasoner among open source models. We've put it on our inference service (at crazy prices: $0.10/$0.10) and it's performing absolutely incredibly on reasoning tasks. It also jumped to the top OSS spot on Artificial Analysis, EQ-Bench, long context, and more evals. It's a really, really good reasoning model!

Smaller Qwens for local use

Just a week ago, we asked Junyang on our show about smaller models that folks can run on their devices, and he deflected, saying "we're focusing on the larger models". This week, they delivered not 1 but 2 smaller versions of the bigger models, perfect as speculative decoding drafts if you can host the larger ones (see the toy sketch after these notes). The most interesting one is Qwen3-Coder-Flash, which came out today with very, very impressive stats and the ability to run locally at almost 80 tok/s on a MacBook! So over the last two weeks, we now have 3 Qwens (Instruct, Thinking, Coder) and 2 sizes of each (all three now have a 30B/A3B version for local use) 👏

Z.ai GLM and StepFun Step3

As we've said previously, Chinese companies completely dominate the open source AI field right now, and this week we saw yet another crazy testament to how stark the difference is! A rebranded Zhipu (Z.ai, previously THUDM) released their new GLM-4.5, which gives Qwen3-Thinking a run for its money. Not quite at that level, but definitely very close. I personally didn't love the release aesthetics; showing a blended eval score that nobody can replicate feels a bit off.

We also talked about how StepFun has stepped in (sorry for the pun) with a new SOTA in multimodality, called Step3.
It's a 321B MoE (with a huge 38B active param count) that achieves very significant multimodal scores (the benchmarks look incredible: 74% on MMMU, 64% on MathVision).

Big Companies APIs & LLMs

Well, we were definitely thinking we'd get GPT-5 or the open source AI model from OpenAI this week, but alas, the tea-leaf readers were misled (or were being misleading). We 100% know that GPT-5 is coming, as multiple screenshots were blurred and then deleted showing companies already testing it. And it looks like August is going to be even hotter than July, with multiple sightings of anonymous test models on WebDev Arena, like Zenith, Summit, and Lobster, plus a new mystery model on OpenRouter, which some claim are the different thinking modes of GPT-5 and the open source model.

Zuck shares vision for personalized superintelligence (Meta)

In a very "Nat Friedman"-like post, Mark Zuckerberg finally shared the vision behind his latest push to assemble the most cracked AI engineers. In his vision, Meta is the right place to provide each person with personalized superintelligence, enhancing individual abilities with user agency, according to their own values (as opposed to a centralized model, which feels like his shot across the bow at the other frontier labs). A few highlights: Zuck leans heavily into the rise of personal devices on top of which humans will interact with this superintelligence, including AR glasses, and a departure from the complete "let's open source everything" dogma of the past; now there ...
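Since speculative decoding came up alongside the new small Qwens, here's a toy sketch of the idea: a cheap draft model (say, a 30B/A3B Qwen) proposes a few tokens, and the big target model (the 235B/A22B) verifies them, so the expensive model advances several tokens per verification pass. This greedy-acceptance version is a simplification of the real algorithm (which accepts or rejects drafts against token probabilities), and the model callables are stand-ins, not real Qwen inference.

```python
# Toy greedy speculative decoding: draft proposes k tokens, target verifies
# and keeps the longest agreeing prefix (plus its own correction on mismatch).
from typing import Callable, List

def speculative_step(
    prefix: List[int],
    draft_next: Callable[[List[int]], int],   # cheap model, e.g. 30B/A3B
    target_next: Callable[[List[int]], int],  # big model, e.g. 235B/A22B
    k: int = 4,
) -> List[int]:
    # 1) Draft model cheaply proposes k tokens autoregressively.
    ctx, proposed = list(prefix), []
    for _ in range(k):
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)
    # 2) Target verifies each proposal (in practice one batched forward pass).
    ctx, accepted = list(prefix), []
    for tok in proposed:
        expected = target_next(ctx)
        if expected != tok:
            accepted.append(expected)  # take the target's token and stop
            break
        accepted.append(tok)
        ctx.append(tok)
    return prefix + accepted  # always advances >= 1 token per target pass

# Demo with trivial stand-ins: both models count up; target diverges after 7.
draft = lambda ctx: ctx[-1] + 1
target = lambda ctx: ctx[-1] + 1 if ctx[-1] < 7 else 99
print(speculative_step([0, 1, 2], draft, target))  # [0, 1, 2, 3, 4, 5, 6]
```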
    1 hr 38 min
No reviews yet