Track 03 · Training & RLHF
How modern LLMs are actually trained, end to end.
The full pipeline behind a frontier model — backprop as a graph problem, the optimizer lineage from AdamW to Muon, FP8 training, the 4D parallelism mesh, SFT, LoRA/QLoRA, DPO, and the GRPO recipe that made DeepSeek-R1 work.
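Backprop really is a graph problem: the forward pass builds a DAG of operations, and the backward pass is one reverse-topological traversal of that DAG, accumulating gradients along each edge. A minimal scalar sketch (a hypothetical toy `Value` class, not any framework's API):

```python
class Value:
    """One node in the computation graph."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents      # edges of the DAG
        self._grad_fn = None         # pushes this node's grad to its parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def grad_fn():
            self.grad += out.grad    # d(a+b)/da = 1
            other.grad += out.grad   # d(a+b)/db = 1
        out._grad_fn = grad_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def grad_fn():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._grad_fn = grad_fn
        return out

    def backward(self):
        # Topologically sort the DAG, then replay it in reverse:
        # that reverse traversal *is* backprop.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            if v._grad_fn is not None:
                v._grad_fn()

x = Value(3.0)
y = x * x + x       # y = x^2 + x, so dy/dx = 2x + 1
y.backward()
print(x.grad)       # 7.0
```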
- Backprop, optimizers, LR schedules, FP8 training (the AdamW step is sketched right after this list)
- Data / tensor / pipeline parallelism, ZeRO + FSDP2
- SFT, LoRA/QLoRA, DPO, GRPO on reasoning tasks (GRPO's advantage trick is sketched below)
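The optimizer story starts from an update small enough to write out by hand. Here is a from-scratch AdamW step in NumPy (the hyperparameter defaults are illustrative); Muon's departure, by contrast, is to drop the per-coordinate second-moment scaling and orthogonalize the momentum update for matrix-shaped parameters.

```python
import numpy as np

def adamw_step(p, grad, m, v, t, lr=3e-4, beta1=0.9, beta2=0.95,
               eps=1e-8, weight_decay=0.1):
    """One AdamW update for parameter tensor p at step t (1-indexed)."""
    m = beta1 * m + (1 - beta1) * grad            # EMA of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2       # EMA of squared gradients
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)   # Adam step
    p = p - lr * weight_decay * p                 # decoupled weight decay
    return p, m, v
```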
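The GRPO piece is equally compact at its core: sample a group of completions per prompt, score them, and normalize each reward against its own group, so no learned value function is needed. A sketch of the advantage computation (the tensor shape is an assumption):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: (num_prompts, group_size) scalar reward per sampled completion."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)  # z-score each completion within its group
```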
- Read a frontier-model tech report (DeepSeek-V3, Llama-3) and follow every word
- Fine-tune a 7B model on your own domain with QLoRA in a Colab notebook (setup sketched below)
- Reason about why a given training stack picks the parallelism config it does
- Implement DPO from scratch (loss sketched below) and explain when to use it over PPO
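For the QLoRA outcome, the setup is mostly configuration. A sketch assuming the Hugging Face transformers + peft + bitsandbytes stack; the model id and LoRA target modules are illustrative choices, not prescribed by this track:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # 4-bit base weights
    bnb_4bit_quant_type="nf4",               # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,          # quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,   # matmuls run in bf16
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",             # illustrative 7B base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)   # only the LoRA adapters train
model.print_trainable_parameters()
```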
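And the DPO loss itself is a one-function affair once you have the summed log-probs of each chosen/rejected response under the policy and a frozen reference model. A minimal sketch:

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """All inputs: (batch,) summed sequence log-probs. Returns mean loss."""
    # How much more the policy prefers "chosen" over "rejected",
    # relative to the reference model's preference.
    margin = (policy_chosen_logps - ref_chosen_logps) \
           - (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(beta * margin).mean()
```

No reward model, no rollouts, no value head: with a static preference dataset this is usually the simpler choice, while PPO-style RL earns its complexity when you need online exploration against a reward signal.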