“Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study” by Adam Karvonen
Since the GPT-4 release, I've evaluated frontier models on a basic manufacturing task, which tests both visual perception and physical reasoning. While Gemini 2.5 Pro recently showed progress on the visual front, all models tested continue to fail significantly on physical reasoning. They still perform terribly overall. Because of this, I think that there will be an interim period where a significant [...]
---
Outline:
(01:28) The Evaluation
(02:29) Visual Errors
(04:03) Physical Reasoning Errors
(06:09) Why do LLMs struggle with physical tasks?
(07:37) Improving on physical tasks may be difficult
(10:14) Potential Implications of Uneven Automation
(11:48) Conclusion
(12:24) Appendix
(12:44) Visual Errors
(14:36) Physical Reasoning Errors
---
First published:
April 14th, 2025
Source:
https://www.lesswrong.com/posts/r3NeiHAEWyToers4F/frontier-ai-models-still-fail-at-basic-physical-tasks-a
---
Narrated by TYPE III AUDIO.
---