
180: Reinforcement Learning

About this listen

Intro topic: Grills

News/Links:

  • You can’t call yourself a senior until you’ve worked on a legacy project
    • https://www.infobip.com/developers/blog/seniors-working-on-a-legacy-project
  • Recraft might be the most powerful AI image platform I’ve ever used — here’s why
    • https://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-why
  • NASA has a list of 10 rules for software development
    • https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htm
  • AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
    • https://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre

Book of the Show

  • Patrick:
    • The Player of Games (Iain M. Banks)
      • https://a.co/d/1ZpUhGl (non-affiliate)
  • Jason:
    • Basic Roleplaying Universal Game Engine
      • https://amzn.to/3ES4p5i


Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h


Tool of the Show

  • Patrick:
    • Pokemon Sword and Shield
  • Jason:
    • Features and Labels ( https://fal.ai )

Topic: Reinforcement Learning

  • Three types of AI
    • Supervised Learning
    • Unsupervised Learning
    • Reinforcement Learning
  • Online vs Offline RL
  • Optimization algorithms (Q-learning and policy-gradient sketches below)
    • Value optimization
      • SARSA
      • Q-Learning
    • Policy optimization
      • Policy Gradients
      • Actor-Critic
      • Proximal Policy Optimization
  • Value vs Policy Optimization
    • Value optimization is more intuitive (Value loss)
    • Policy optimization is less intuitive at first (policy gradients)
    • Converting values to policies in deep learning is difficult
  • Imitation Learning (behavior-cloning sketch below)
    • Supervised policy learning
    • Often used to bootstrap reinforcement learning
  • Policy Evaluation (propensity-scoring sketch below)
    • Propensity scoring versus model-based
  • Challenges of training RL models
    • Two optimization loops
      • Collecting feedback vs updating the model
    • Difficult optimization target
      • Policy evaluation
  • RLHF & GRPO (reinforcement learning from human feedback and group relative policy optimization)
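
For listeners who want to see what value optimization looks like, here is a minimal tabular Q-learning sketch on a made-up five-state chain (the environment, rewards, and hyperparameters are illustrative and not from the episode). SARSA differs only in the target: it uses the action the exploring policy actually takes next rather than the max.

```python
import random

# Toy chain environment (hypothetical): 5 states, reward 1 for reaching the end.
N_STATES, ACTIONS = 5, [0, 1]          # action 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]

def epsilon_greedy(state):
    # Explore with probability EPSILON, and break exact ties randomly.
    if random.random() < EPSILON or Q[state][0] == Q[state][1]:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[state][a])

for episode in range(500):
    state, done = 0, False
    while not done:
        action = epsilon_greedy(state)
        nxt, reward, done = step(state, action)
        # Q-learning bootstraps off the best next action (off-policy).
        # SARSA would instead bootstrap off the action it actually takes next.
        target = reward + (0.0 if done else GAMMA * max(Q[nxt]))
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = nxt

print(Q)  # "right" (action 1) should score highest in states 0-3
```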
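
Policy optimization can be sketched just as briefly: below is a rough REINFORCE-style policy-gradient example on a hypothetical two-armed bandit with a softmax policy. Actor-critic replaces the running baseline used here with a learned value function, and PPO further clips how far each update can move the policy.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_MEANS = np.array([0.2, 0.8])  # made-up average payoff of each arm
theta = np.zeros(2)                # one logit per arm; softmax(theta) is the policy
LR, baseline = 0.1, 0.0

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for t in range(2000):
    probs = softmax(theta)
    action = rng.choice(2, p=probs)
    reward = rng.normal(TRUE_MEANS[action], 0.1)

    # REINFORCE: for a softmax policy, grad log pi(a) = one_hot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    baseline += 0.01 * (reward - baseline)           # running baseline cuts variance
    theta += LR * (reward - baseline) * grad_log_pi  # ascend expected reward

print(softmax(theta))  # probability mass should concentrate on arm 1
```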
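
Imitation learning in its simplest form, behavior cloning, is just supervised learning on expert state-action pairs, and the cloned policy is then often used to bootstrap RL. A small sketch, assuming NumPy and scikit-learn are available and using a synthetic stand-in for the expert demonstrations:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical expert demonstrations: 2-D states, binary actions.
states = rng.normal(size=(1000, 2))
expert_actions = (states[:, 0] + states[:, 1] > 0).astype(int)  # stand-in "expert"

# Behavior cloning = fit a classifier that maps states to the expert's actions.
policy = LogisticRegression().fit(states, expert_actions)

def act(state):
    """Cloned policy; in practice this would warm-start an RL policy."""
    return int(policy.predict(state.reshape(1, -1))[0])

print(act(np.array([0.5, 0.5])))  # should mimic the expert and return 1
```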
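
On policy evaluation: propensity scoring estimates how a new policy would have done by reweighting logged rewards by the ratio of the new policy's action probabilities to the logging policy's, while the model-based alternative learns a reward/environment model and queries it. A minimal inverse-propensity-scoring sketch over hypothetical logged bandit data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical logs from an old (logging) policy: context-free bandit, 2 actions.
logging_probs = np.array([0.7, 0.3])           # P(action) under the old policy
actions = rng.choice(2, size=5000, p=logging_probs)
rewards = rng.binomial(1, np.where(actions == 1, 0.8, 0.2))  # arm 1 pays off more

# New policy we want to evaluate offline without deploying it.
target_probs = np.array([0.1, 0.9])

# Inverse propensity scoring: reweight each logged reward by target/logging prob.
weights = target_probs[actions] / logging_probs[actions]
ips_estimate = (weights * rewards).mean()

print(ips_estimate)  # should land near 0.1*0.2 + 0.9*0.8 = 0.74
```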

★ Support this podcast on Patreon ★