
180: Reinforcement Learning

About this listen

Intro topic: Grills

News/Links:

  • You can’t call yourself a senior until you’ve worked on a legacy project
    • https://www.infobip.com/developers/blog/seniors-working-on-a-legacy-project
  • Recraft might be the most powerful AI image platform I’ve ever used — here’s why
    • https://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-why
  • NASA has a list of 10 rules for software development
    • https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htm
  • AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
    • https://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre

Book of the Show

  • Patrick:
    • The Player of Games (Iain M. Banks)
      • https://a.co/d/1ZpUhGl (non-affiliate)
  • Jason:
    • Basic Roleplaying Universal Game Engine
      • https://amzn.to/3ES4p5i


Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h


Tool of the Show

  • Patrick:
    • Pokemon Sword and Shield
  • Jason:
    • Features and Labels ( https://fal.ai )

Topic: Reinforcement Learning

  • Three types of AI
    • Supervised Learning
    • Unsupervised Learning
    • Reinforcement Learning
  • Online vs Offline RL
  • Optimization algorithms (Q-learning and policy-gradient sketches below)
    • Value optimization
      • SARSA
      • Q-Learning
    • Policy optimization
      • Policy Gradients
      • Actor-Critic
      • Proximal Policy Optimization
  • Value vs Policy Optimization
    • Value optimization is more intuitive (Value loss)
    • Policy optimization is less intuitive at first (policy gradients)
    • Converting values to policies in deep learning is difficult
  • Imitation Learning (behavior-cloning sketch below)
    • Supervised policy learning
    • Often used to bootstrap reinforcement learning
  • Policy Evaluation (propensity-scoring sketch below)
    • Propensity scoring versus model-based
  • Challenges of training RL models
    • Two optimization loops
      • Collecting feedback vs updating the model
    • Difficult optimization target
      • Policy evaluation
  • RLHF & GRPO (reinforcement learning from human feedback and group relative policy optimization)
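
For listeners who want to see what value optimization looks like, here is a minimal tabular Q-learning sketch on a made-up five-state chain (the environment, rewards, and hyperparameters are illustrative and not from the episode). SARSA differs only in the target: it uses the action the exploring policy actually takes next rather than the max.

```python
import random

# Toy chain environment (hypothetical): 5 states, reward 1 for reaching the end.
N_STATES, ACTIONS = 5, [0, 1]          # action 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]

def epsilon_greedy(state):
    # Explore with probability EPSILON, and break exact ties randomly.
    if random.random() < EPSILON or Q[state][0] == Q[state][1]:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[state][a])

for episode in range(500):
    state, done = 0, False
    while not done:
        action = epsilon_greedy(state)
        nxt, reward, done = step(state, action)
        # Q-learning bootstraps off the best next action (off-policy).
        # SARSA would instead bootstrap off the action it actually takes next.
        target = reward + (0.0 if done else GAMMA * max(Q[nxt]))
        Q[state][action] += ALPHA * (target - Q[state][action])
        state = nxt

print(Q)  # "right" (action 1) should score highest in states 0-3
```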
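
Policy optimization can be sketched just as briefly: below is a rough REINFORCE-style policy-gradient example on a hypothetical two-armed bandit with a softmax policy. Actor-critic replaces the running baseline used here with a learned value function, and PPO further clips how far each update can move the policy.

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_MEANS = np.array([0.2, 0.8])  # made-up average payoff of each arm
theta = np.zeros(2)                # one logit per arm; softmax(theta) is the policy
LR, baseline = 0.1, 0.0

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for t in range(2000):
    probs = softmax(theta)
    action = rng.choice(2, p=probs)
    reward = rng.normal(TRUE_MEANS[action], 0.1)

    # REINFORCE: for a softmax policy, grad log pi(a) = one_hot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    baseline += 0.01 * (reward - baseline)           # running baseline cuts variance
    theta += LR * (reward - baseline) * grad_log_pi  # ascend expected reward

print(softmax(theta))  # probability mass should concentrate on arm 1
```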
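
Imitation learning in its simplest form, behavior cloning, is just supervised learning on expert state-action pairs, and the cloned policy is then often used to bootstrap RL. A small sketch, assuming NumPy and scikit-learn are available and using a synthetic stand-in for the expert demonstrations:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical expert demonstrations: 2-D states, binary actions.
states = rng.normal(size=(1000, 2))
expert_actions = (states[:, 0] + states[:, 1] > 0).astype(int)  # stand-in "expert"

# Behavior cloning = fit a classifier that maps states to the expert's actions.
policy = LogisticRegression().fit(states, expert_actions)

def act(state):
    """Cloned policy; in practice this would warm-start an RL policy."""
    return int(policy.predict(state.reshape(1, -1))[0])

print(act(np.array([0.5, 0.5])))  # should mimic the expert and return 1
```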
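
On policy evaluation: propensity scoring estimates how a new policy would have done by reweighting logged rewards by the ratio of the new policy's action probabilities to the logging policy's, while the model-based alternative learns a reward/environment model and queries it. A minimal inverse-propensity-scoring sketch over hypothetical logged bandit data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical logs from an old (logging) policy: context-free bandit, 2 actions.
logging_probs = np.array([0.7, 0.3])           # P(action) under the old policy
actions = rng.choice(2, size=5000, p=logging_probs)
rewards = rng.binomial(1, np.where(actions == 1, 0.8, 0.2))  # arm 1 pays off more

# New policy we want to evaluate offline without deploying it.
target_probs = np.array([0.1, 0.9])

# Inverse propensity scoring: reweight each logged reward by target/logging prob.
weights = target_probs[actions] / logging_probs[actions]
ips_estimate = (weights * rewards).mean()

print(ips_estimate)  # should land near 0.1*0.2 + 0.9*0.8 = 0.74
```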

★ Support this podcast on Patreon ★