The Right Way to Do AI Evals (ft Freddie Vargus)
Are your AI agents unreliable? In this guide, we reveal a professional system for AI evals to help you build and ship better AI products, faster. Learn how to systematically test LLM performance, evaluate complex tool use, and improve multi-turn conversations. We break down the exact process for building a high-quality eval dataset, using milestones and minefields to track agent behaviour, and how to properly use an LLM as a judge without compromising quality. Stop guessing and start making real, measurable improvements to your AI today.
Check out Quotient AI
https://www.quotientai.co/
Sign up for A.I. coaching for professionals at: https://www.anetic.co
Get FREE AI tools
pip install tool-use-ai
Connect with us https://x.com/ToolUseAI
https://x.com/MikeBirdTech
https://x.com/freddie_v4
00:00:00 - Intro
00:02:54 - Why You Need AI Evals
00:06:13 - How to Evaluate AI Agent Tool Use
00:29:24 - The Process for Building Your First Eval Dataset
00:42:44 - Using an LLM as a Judge The Right Way
Subscribe for more insights on AI tools, productivity, and AI evals.
Tool Use is a weekly conversation with AI experts brought to you by Anetic.