Track 03 · LLM Architecture

Transformers from a systems lens.

A Transformer isn't just a math object — it's a memory allocator, a state machine, and a data pipeline. This track looks at LLMs through the lens of what data lives where and what computation happens at each step, because those two things determine whether your model fits in memory, whether it serves under load, and whether it's fast.

Modules in this track

What you’ll be able to do after

  • Read a Transformer implementation and trace where every byte of memory goes
  • Reason about why a “small” model can OOM at long contexts
  • Pick the right attention variant for a given memory / quality tradeoff
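As a taste of the second bullet, here is a back-of-the-envelope sketch of why long contexts blow up memory: the KV cache grows linearly with sequence length, and at long contexts it can dwarf the weights themselves. The config below is hypothetical (chosen to resemble a 7B-class model with full multi-head attention); the formula is the standard one: 2 tensors (K and V) × layers × heads × head dim × sequence length × bytes per element.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2) -> int:
    """Bytes of KV cache for one sequence: K and V per layer, per head, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * dtype_bytes

# Hypothetical 7B-class config: 32 layers, 32 KV heads, head_dim 128, fp16.
# At a 128k-token context, the KV cache for a single sequence is:
cache = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128,
                       seq_len=128 * 1024, dtype_bytes=2)
print(cache / 2**30)  # → 64.0 GiB — far more than the ~14 GB of fp16 weights
```

This is exactly the kind of accounting the track teaches: the "small" 7B model fits comfortably, but its cache at long context does not, which is why variants like grouped-query attention (fewer KV heads) exist.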