Distributed Training

A frontier training run composes a 4D or 5D parallelism mesh: data × tensor × pipeline × sequence × expert. This module walks through each axis, quantifies what it costs in communication and memory, and ends with how production frameworks (TorchTitan, Megatron-Core) compose them.
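To make the mesh idea concrete, here is a minimal framework-neutral sketch (plain Python, not the API of TorchTitan or Megatron-Core) of how a flat global rank maps to a coordinate along each named parallelism axis. The axis names and degrees are illustrative assumptions, not values from the module.

```python
from math import prod

def mesh_coords(rank, axes):
    """Map a flat global rank to a coordinate along each parallelism axis.

    `axes` is an ordered list of (name, degree) pairs, e.g.
    [("data", 2), ("pipeline", 2), ("tensor", 4)] for 16 GPUs.
    The last axis varies fastest, matching the common convention of
    placing tensor parallelism on the fastest (intra-node) links.
    """
    coords = {}
    for name, degree in reversed(axes):
        coords[name] = rank % degree
        rank //= degree
    return coords

# Illustrative 3D mesh over 16 ranks: 2 (data) x 2 (pipeline) x 4 (tensor).
axes = [("data", 2), ("pipeline", 2), ("tensor", 4)]
world_size = prod(degree for _, degree in axes)
assert world_size == 16

# Rank 5: tensor = 5 % 4 = 1, pipeline = (5 // 4) % 2 = 1, data = 0
print(mesh_coords(5, axes))
```

Adding sequence or expert axes is just more `(name, degree)` pairs; the product of all degrees must always equal the world size, which is exactly the constraint frameworks enforce when building the device mesh.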