Track 05 · ML Compilers & Hardware

From a high-level graph to optimized code on real hardware.

Nobody hand-writes a CUDA kernel for every new model on every new GPU. Compilers generate the kernels — and compiler skill is what turns one engineer into a force multiplier for an entire hardware vendor or model team.

Modules in this track

  • Foundation — LLVM IR, passes, MLIR, dialects, lowering. The substrate.
  • Production — torch.compile, JAX/Pallas, IREE/ExecuTorch, operator fusion. The compilers people actually run.
  • Kernels & Hardware — Triton, CUTLASS, ThunderKittens, the 2026 hardware landscape. Where you drop down when the compiler isn’t enough.
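As a taste of the Kernels & Hardware module, here is a minimal Triton vector-add kernel — a sketch, not track material, and it assumes the `triton` package plus a CUDA GPU:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    # Launch enough program instances to cover all n elements.
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

The point of the exercise: you write blocked, masked array code in Python, and Triton's compiler handles the registers, vectorization, and memory coalescing a hand-written CUDA kernel would make you manage yourself.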

What you’ll be able to do after

  • Read MLIR dumps and understand what each dialect is doing
  • Write a Triton kernel and a small MLIR pass
  • Recognize when a workload calls for torch.compile vs hand-written kernels
  • Tell which hardware (Blackwell · MI355X · TPU v6 · Cerebras · Groq) is right for which workload