Reward Models | Data Brew | Episode 40

About this episode

In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on Reward Models and Reinforcement Learning from Human Feedback (RLHF).

Highlights include:
- How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes.
- Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality.
- The role of reward models in improving coding, math, reasoning, and other NLP tasks.
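To make the DPO technique mentioned above concrete, here is a minimal sketch of its per-pair loss. The function name, arguments, and numeric values are illustrative assumptions, not code from the episode: DPO trains the policy directly on preference pairs, using the log-probability gap between the policy and a frozen reference model as an implicit reward, with no separate reward model or PPO loop.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a single preference pair (illustrative sketch).

    Each argument is the summed log-probability of a full response
    under the policy being trained (logp_*) or under a frozen
    reference model (ref_logp_*). beta controls how strongly the
    policy is allowed to drift from the reference.
    """
    # Implicit reward: beta * (policy log-prob minus reference log-prob).
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    # Negative log-sigmoid of the reward margin: the loss falls as the
    # policy favors the chosen response more than the reference does.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree exactly, the margin is zero and the loss is log 2; pushing probability mass toward the chosen response drives the loss below that baseline.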

Connect with Brandon Cui:
https://www.linkedin.com/in/bcui19/
