RL Systems & Infrastructure
A GRPO update fits on a postcard. Running that update over millions of rollouts on 64 GPUs is the hard part, and it is why RL infrastructure engineers are scarce and well paid. This module covers the kind of engineering done on teams like Anthropic’s RL Engineering team, OpenAI post-training, or DeepSeek: rollout engines, distributed orchestration, the production frameworks, and the subtle off-policy bugs that bite at scale.
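To make the postcard claim concrete, here is a minimal sketch of the core of a GRPO update for one prompt's group of sampled completions. It is a simplification under stated assumptions: it works on sequence-level log-probabilities rather than per-token ones, it leaves out the KL penalty against a reference policy, and the names `group_advantages` and `grpo_loss` are illustrative, not taken from any framework.

```python
import math
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def grpo_loss(logp_new, logp_old, rewards, clip=0.2):
    """Clipped surrogate loss over one group of completions.

    Assumption: logp_new and logp_old are sequence-level log-probabilities
    under the current policy and the policy that generated the rollouts.
    The KL penalty term is omitted in this sketch.
    """
    advantages = group_advantages(rewards)
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)              # importance ratio
        clipped = min(max(ratio, 1 - clip), 1 + clip)  # keep ratio in [1-clip, 1+clip]
        total += min(ratio * adv, clipped * adv)       # pessimistic (clipped) objective
    return -total / len(rewards)                       # negate: we minimize the loss


# Four completions for one prompt, two rewarded and two not.
rewards = [1.0, 0.0, 0.0, 1.0]
logp_old = [-12.0, -15.0, -14.0, -11.0]
print(group_advantages(rewards))
print(grpo_loss(logp_old, logp_old, rewards))
```

The importance ratio `exp(lp_new - lp_old)` is where the off-policy problems start: it is only meaningful if `logp_old` really comes from the policy that produced the rollouts, which is easy to get wrong once generation and training run on separate engines.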