Arxiv paper - Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models Podcast Por  arte de portada

Arxiv paper - Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models

Arxiv paper - Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models

Escúchala gratis

Ver detalles del espectáculo

Acerca de esta escucha

In this episode, we discuss Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models by Piotr Padlewski, Max Bain, Matthew Henderson, Zhongkai Zhu, Nishant Relan, Hai Pham, Donovan Ong, Kaloyan Aleksiev, Aitor Ormazabal, Samuel Phua, Ethan Yeo, Eugenie Lamprecht, Qi Liu, Yuqi Wang, Eric Chen, Deyu Fu, Lei Li, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Mikel Artetxe, Yi Tay. The paper introduces Vibe-Eval, an open benchmark and framework with 269 visual understanding prompts designed to evaluate multimodal chat models on everyday and challenging tasks. It highlights that over half of the hardest prompts are incorrectly answered by current frontier models, emphasizing the benchmark's difficulty. The authors discuss evaluation methods, demonstrate correlation between automatic and human assessments, provide free API access, and release all code and data publicly. Github: https://github.com/reka-ai/reka-vibe-eval

Todavía no hay opiniones