Zero Bubble Pipeline Parallelism


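The core idea is to split the backward pass into two flows: one that computes the gradient with respect to the parameters, and one that computes the gradient with respect to the output of the previous layer (that is, the current layer's input). Only the second flow is needed by the preceding pipeline stage, so the parameter gradients can be scheduled into the gaps where a device would otherwise sit idle waiting (the "bubble").

Read the full paper: https://arxiv.org/abs/2401.10241

Tags: Systems and Performance, Deep Learning, Machine Learning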
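To make the split concrete, here is a minimal sketch in plain Python for a single linear layer y = xW. It is an illustration under stated assumptions, not the paper's implementation: the class `Linear` and the method names `backward_input` and `backward_weight` are hypothetical, chosen only to show that the two gradients can be computed at different times.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


class Linear:
    """Hypothetical layer y = x @ W whose backward pass is split in two."""

    def __init__(self, weight):
        self.weight = weight
        self.saved_input = None
        self.saved_grad_output = None

    def forward(self, x):
        self.saved_input = x  # kept for the deferred weight-gradient step
        return matmul(x, self.weight)

    def backward_input(self, grad_output):
        """Gradient w.r.t. the input: the previous stage is waiting for this."""
        self.saved_grad_output = grad_output  # kept for the deferred step
        return matmul(grad_output, transpose(self.weight))

    def backward_weight(self):
        """Gradient w.r.t. the parameters: no other stage depends on it,
        so it can run later, whenever the device would otherwise be idle."""
        return matmul(transpose(self.saved_input), self.saved_grad_output)


layer = Linear([[1.0, 2.0], [3.0, 4.0]])
y = layer.forward([[1.0, 1.0]])

# Step 1: compute the input gradient right away and hand it upstream.
grad_x = layer.backward_input([[1.0, 1.0]])

# Step 2: compute the weight gradient later, filling an idle slot.
grad_w = layer.backward_weight()
```

The only ordering constraint in this sketch is that `backward_weight` runs after `backward_input` for the same micro-batch; where exactly it is placed is the scheduling freedom described above.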