• Reward Models | Data Brew | Episode 40

  • Mar 20 2025
  • Duration: 40 min
  • Podcast

  • Summary

  • In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on Reward Models and Reinforcement Learning from Human Feedback (RLHF).

    Highlights include:
    - How synthetic data and RLHF enable fine-tuning models to produce preferred outputs.
    - Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality (see the sketch after this list).
    - The role of reward models in improving coding, math, reasoning, and other NLP tasks.
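
    For readers who want to map these topics to code, here is a minimal, illustrative sketch (not from the episode) of the two preference-learning objectives mentioned above: the pairwise Bradley-Terry loss commonly used to train reward models for RLHF, and the DPO loss, which optimizes preferences directly without a separate reward model. The function names and the beta default are assumptions for illustration.

    ```python
    import torch
    import torch.nn.functional as F

    def reward_model_loss(chosen_rewards: torch.Tensor,
                          rejected_rewards: torch.Tensor) -> torch.Tensor:
        # Pairwise (Bradley-Terry) objective for training a reward model:
        # push the score of the human-preferred response above the
        # dispreferred one. Inputs are scalar scores per comparison pair.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    def dpo_loss(policy_chosen_logps: torch.Tensor,
                 policy_rejected_logps: torch.Tensor,
                 ref_chosen_logps: torch.Tensor,
                 ref_rejected_logps: torch.Tensor,
                 beta: float = 0.1) -> torch.Tensor:
        # Direct Preference Optimization: inputs are summed token
        # log-probabilities of each response under the trainable policy
        # and a frozen reference model. DPO skips the explicit reward
        # model and optimizes the preference margin directly.
        # beta=0.1 is an assumed default, not from the episode.
        chosen_logratio = policy_chosen_logps - ref_chosen_logps
        rejected_logratio = policy_rejected_logps - ref_rejected_logps
        return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
    ```

    Both functions take tensors of shape (batch,) and return a scalar loss, so either can drop into a standard PyTorch training loop over preference pairs.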

    Connect with Brandon Cui:
    https://www.linkedin.com/in/bcui19/
