Track 05 · ML Compilers & Hardware
From a high-level graph to optimized hardware.
Nobody hand-writes a CUDA kernel for every new model on every new GPU. Compilers generate the kernels — and compiler skill is what turns one engineer into a force multiplier for an entire hardware vendor or model team.
- LLVM IR, passes, MLIR, dialects, lowering. The substrate.
- torch.compile, JAX/Pallas, IREE/ExecuTorch, operator fusion. The compilers people actually run.
- Triton, CUTLASS, ThunderKittens, the 2026 hardware landscape. Where you drop down when the compiler isn't enough.
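The payoff of operator fusion is fewer passes over memory: instead of materializing an intermediate buffer between each op, the compiler emits one kernel that does all the work in a single pass. A minimal pure-Python sketch of the idea (illustrative only, not any real compiler's output):

```python
def unfused(xs):
    # Two separate "kernels": two passes over the data,
    # with an intermediate buffer materialized in between.
    doubled = [x * 2 for x in xs]       # kernel 1: multiply
    return [d + 1 for d in doubled]     # kernel 2: add

def fused(xs):
    # One fused kernel: a single pass, no intermediate buffer.
    return [x * 2 + 1 for x in xs]

# Same result, half the memory traffic.
assert unfused([1, 2, 3]) == fused([1, 2, 3]) == [3, 5, 7]
```

This is exactly the transformation torch.compile and friends automate across a whole graph of elementwise ops.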
What you'll be able to do:
- Read MLIR dumps and understand what each dialect is doing
- Write a Triton kernel and a small MLIR pass
- Recognize when a workload calls for compiler-generated vs hand-written kernels
- Tell which hardware (Blackwell · MI355X · TPU v6 · Cerebras · Groq) is right for which workload
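Triton's programming model is worth internalizing before you write your first kernel: each program instance owns one BLOCK-sized tile of the data, and a mask guards the ragged tail where the tile runs past the array. Here is that model sketched in plain Python (a conceptual sketch, not the real `triton.language` API; in actual Triton the loop body would be `tl.load`/`tl.store` with a mask and the grid would run on the GPU):

```python
def add_kernel(x, y, out, n, BLOCK):
    # One "program" per tile, analogous to tl.program_id(0) in Triton.
    num_programs = (n + BLOCK - 1) // BLOCK  # ceil-divide the grid
    for pid in range(num_programs):
        # This tile's offsets, like pid * BLOCK + tl.arange(0, BLOCK).
        for i in range(pid * BLOCK, (pid + 1) * BLOCK):
            if i < n:  # the mask: skip lanes past the end of the data
                out[i] = x[i] + y[i]

x, y = [1, 2, 3, 4, 5], [10, 20, 30, 40, 50]
out = [0] * 5
add_kernel(x, y, out, n=5, BLOCK=4)  # 2 programs: tiles [0..3] and [4..7]
assert out == [11, 22, 33, 44, 55]
```

The real kernel replaces the inner loop with vectorized block loads and stores, but the grid/tile/mask structure is the same.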