
180: Reinforcement Learning
About this listen
Intro topic: Grills
News/Links:
- You can’t call yourself a senior until you’ve worked on a legacy project
  - https://www.infobip.com/developers/blog/seniors-working-on-a-legacy-project
- Recraft might be the most powerful AI image platform I’ve ever used — here’s why
  - https://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-why
- NASA has a list of 10 rules for software development
  - https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htm
- AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
  - https://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre
Book of the Show
- Patrick:
  - The Player of Games (Iain M. Banks)
  - https://a.co/d/1ZpUhGl (non-affiliate)
- Jason:
  - Basic Roleplaying Universal Game Engine
  - https://amzn.to/3ES4p5i
Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h
Tool of the Show
- Patrick:
  - Pokémon Sword and Shield
- Jason:
  - Features and Labels ( https://fal.ai )
Topic: Reinforcement Learning
- Three types of AI
  - Supervised Learning
  - Unsupervised Learning
  - Reinforcement Learning
- Online vs Offline RL (see the interaction-loop sketch below)
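The loop below is a minimal, illustrative Python sketch (all names and numbers are made up) of what separates reinforcement learning from supervised and unsupervised learning: there are no labels, only rewards observed by interacting. The comments also mark where the online vs offline distinction shows up: online RL keeps generating fresh transitions with the current policy, while offline RL trains from a fixed logged dataset.

```python
import random

class TwoArmedBandit:
    """Toy one-step environment: action 1 pays off 70% of the time, action 0 only 30%."""
    def step(self, action):
        p = 0.7 if action == 1 else 0.3
        return 1.0 if random.random() < p else 0.0

env = TwoArmedBandit()
log = []  # logged (action, reward) transitions

# Online RL: training data is generated by the current policy as it acts
# (here just a random policy standing in for the learner).
for _ in range(1000):
    action = random.choice([0, 1])
    reward = env.step(action)
    log.append((action, reward))

# Offline RL would skip the loop above and train purely from a fixed dataset
# like `log`, collected earlier by some other policy, with no new interaction.
for action in (0, 1):
    rewards = [r for a, r in log if a == action]
    print(action, sum(rewards) / max(len(rewards), 1))
```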
- Optimization algorithms (sketches of each family below)
  - Value optimization
    - SARSA
    - Q-Learning
  - Policy optimization
    - Policy Gradients
    - Actor-Critic
    - Proximal Policy Optimization
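As a concrete example of value optimization, here is a small tabular Q-learning sketch on a made-up five-state chain (purely illustrative, not from the episode). The commented line marks the single place SARSA differs: it bootstraps from the action it will actually take rather than the greedy one. Deep RL variants keep the same update target but replace the table with a neural network.

```python
import random
from collections import defaultdict

# Toy chain MDP: states 0..4, actions left/right, reward 1 for reaching
# state 4, which ends the episode.
GOAL = 4
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
Q = defaultdict(float)                      # Q[(state, action)], defaults to 0

def step(state, action):                    # action: 0 = left, 1 = right
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def epsilon_greedy(state):
    if random.random() < EPS or Q[(state, 0)] == Q[(state, 1)]:
        return random.choice([0, 1])        # explore, or break ties randomly
    return 0 if Q[(state, 0)] > Q[(state, 1)] else 1

for _ in range(500):                        # episodes
    s, done = 0, False
    while not done:
        a = epsilon_greedy(s)
        s2, r, done = step(s, a)
        # Q-learning bootstraps from the best next action (off-policy).
        target = r if done else r + GAMMA * max(Q[(s2, 0)], Q[(s2, 1)])
        # SARSA would bootstrap from the action it actually takes next
        # (on-policy): target = r + GAMMA * Q[(s2, epsilon_greedy(s2))]
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

print([round(Q[(s, 1)], 2) for s in range(GOAL + 1)])  # learned "go right" values
```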
- Value vs Policy Optimization
  - Value optimization is more intuitive (value loss)
  - Policy optimization is less intuitive at first (policy gradients)
  - Converting values to policies in deep learning is difficult
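For contrast, here is a REINFORCE-style policy-gradient sketch on a toy two-armed bandit (again, illustrative values). Instead of learning Q-values and then extracting a policy via an argmax or softmax over them, the policy's action probabilities are parameterized directly and the log-probability of each sampled action is pushed up in proportion to the reward it earned.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])        # arm 1 is the better arm
theta = np.zeros(2)                      # policy logits
lr = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for _ in range(2000):
    probs = softmax(theta)               # pi(a | theta)
    a = rng.choice(2, p=probs)
    reward = rng.normal(true_means[a], 0.1)

    grad_log_pi = -probs                 # d log pi(a) / d theta = one_hot(a) - pi
    grad_log_pi[a] += 1.0
    theta += lr * reward * grad_log_pi   # REINFORCE: reward-weighted score ascent

print(softmax(theta))                    # most probability mass should land on arm 1
```

Actor-critic methods build on this same gradient by adding a learned value baseline, and PPO further constrains each update with a clipped surrogate objective so the new policy stays close to the one that collected the data.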
- Imitation Learning
  - Supervised policy learning
  - Often used to bootstrap reinforcement learning
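A sketch of imitation learning as plain supervised learning (behavioral cloning) on hypothetical expert data: fit a classifier from logged expert states to expert actions, and use the result as the starting policy that RL later fine-tunes.

```python
import numpy as np

rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 4))                              # logged expert observations
expert_actions = (states[:, 0] + states[:, 1] > 0).astype(int)   # the expert's (unknown) rule

# Logistic-regression policy trained with plain gradient descent on cross-entropy.
w, b, lr = np.zeros(4), 0.0, 0.1
for _ in range(500):
    probs = 1.0 / (1.0 + np.exp(-(states @ w + b)))
    grad = probs - expert_actions                    # dLoss/dlogits for cross-entropy
    w -= lr * states.T @ grad / len(states)
    b -= lr * grad.mean()

probs = 1.0 / (1.0 + np.exp(-(states @ w + b)))
accuracy = ((probs > 0.5).astype(int) == expert_actions).mean()
print(f"imitation accuracy: {accuracy:.2f}")
```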
- Policy Evaluation
  - Propensity scoring versus model-based
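A toy sketch of the two off-policy evaluation styles named above, on made-up bandit data: inverse propensity scoring reweights logged rewards by the ratio of new-policy to logging-policy action probabilities, while the model-based (direct) method fits a reward model from the log and averages its predictions under the new policy.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_actions = 10_000, 2
logging_probs = np.array([0.5, 0.5])     # behavior policy that produced the log
target_probs = np.array([0.1, 0.9])      # new policy we want to evaluate

actions = rng.choice(n_actions, size=n, p=logging_probs)
rewards = rng.binomial(1, np.where(actions == 1, 0.8, 0.3))   # true means 0.3 / 0.8

# Inverse propensity scoring: reweight logged rewards by pi_new(a) / pi_log(a).
weights = target_probs[actions] / logging_probs[actions]
ips_estimate = np.mean(weights * rewards)

# Model-based (direct method): fit a per-action reward model from the log,
# then average its predictions under the new policy.
reward_model = np.array([rewards[actions == a].mean() for a in range(n_actions)])
dm_estimate = np.dot(target_probs, reward_model)

print(ips_estimate, dm_estimate)   # both should be near 0.1*0.3 + 0.9*0.8 = 0.75
```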
- Challenges to training RL models
  - Two optimization loops
    - Collecting feedback vs updating the model
  - Difficult optimization target
  - Policy evaluation
- RLHF & GRPO
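In RLHF a language model is fine-tuned against a reward model; GRPO (Group Relative Policy Optimization, introduced in DeepSeek's work) drops PPO's learned value network and instead scores each sampled completion against the rest of its group. The sketch below shows only that group-normalized advantage computation, with made-up reward numbers.

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages: reward minus group mean, scaled by group std."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Hypothetical reward-model scores for 4 sampled completions of one prompt.
rewards = [0.1, 0.7, 0.4, 0.9]
advantages = grpo_advantages(rewards)
print(advantages)   # above-average completions get positive advantage

# These advantages then weight a PPO-style clipped objective on the token
# log-probabilities, plus a KL penalty toward the reference model.
```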