Reading Orders

Mosaic isn’t meant to be read end-to-end (though you can). The lessons are designed to compose. Below are the three orders that form the most coherent arcs through the material — each is a thread you can pull and end up with a real working understanding of one corner of modern AI systems.

There is no required order. Lessons stand alone. Most learners follow a thread for a while, branch into a related one, come back. Use the course map to see the whole graph.


AI Systems

Attention, the KV cache, the inference pipeline, the serving stack — and the contributor-level depth on each that turns “I read about it” into “I’ve shipped a PR on it.”

Architecture

  1. Multi-Head Attention
  2. GQA, MQA & MLA
  3. RoPE + YaRN / LongRoPE
  4. FlashAttention-3
  5. FlashAttention-3 Internals

KV Cache

  6. KV Cache Basics
  7. PagedAttention
  8. Prefix & RadixAttention
  9. Disaggregated Serving

Inference-time

  10. Sampling
  11. Structured Output
  12. Chunked Prefill
  13. Speculative Decoding
  14. Speculative Decoding Internals

Serving stack

  15. vLLM & SGLang
  16. vLLM Internals
  17. SGLang Internals
  18. Cost & Latency
  19. Observability

Contributor track

  20. OSS Contribution Playbook

The capstone in Inference Internals — landing a perf-cited PR in vLLM — is the artifact this thread is built around.


ML Compilers

GPUs from the silicon up; the bedrock-meets-tooling layer (roofline, Tensor Core shapes, NCU); the compilers and DSLs that target them.

Hardware mental model

  1. SM Architecture
  2. Thread Hierarchy
  3. Shared Memory
  4. GEMM (Hopper / Blackwell)
  5. Strides & Layout
  6. TMA & cp.async

Roofline & Profiling (the predict-then-verify discipline)

  7. Roofline as a Predictive Tool
  8. Tensor Core SHAPE Constraints
  9. Nsight Compute: The Metric Tree

Compiler theory

  10. LLVM IR Tour
  11. Passes & Pipelines
  12. MLIR Overview
  13. Dialects & Lowering

Production compilers + kernel DSLs

  14. torch.compile
  15. Inductor Fusion Heuristics
  16. Operator Fusion
  17. Triton
  18. CuTe & CUTLASS 4
  19. ThunderKittens & TileLang
  20. JAX & Pallas
  21. IREE & ExecuTorch

Distributed kernels + landscape

  22. NCCL & AllReduce Internals
  23. Hardware Landscape 2026

The capstone — a fused Triton kernel + roofline writeup — is the artifact. The build guide on that page walks you through it step by step (the same artifact Atlas Capstone 1 ships).


Edge AI

Quantization, on-device runtimes, NPUs, the browser, and swarm inference. This is the part of the field that runs without the cloud, including a LAN of phones running a 70B-parameter model together.

Quantization schemes + calibration

  1. FP8 Inference
  2. INT4 / AWQ / GPTQ
  3. MXFP4 / NVFP4
  4. Rotation Quantization
  5. Calibration & KV Cache Quantization
  6. On-Device Inference

On-Device runtimes (the four production paths)

  1. llama.cpp Internals
  2. ExecuTorch
  3. Core ML & ANE (intro)
  4. TFLite & LiteRT
  5. WebGPU & WebLLM

NPU programming

  1. Qualcomm Hexagon
  2. Apple Neural Engine — deep dive

Edge formats + small-model recipes

  1. GGUF & i-matrix
  2. Distillation for Edge
  3. Speculative Decoding

Multimodal + distributed at the edge

  1. Whisper.cpp & on-device speech
  2. Mobile VLMs
  3. EXO & Swarm Inference

The capstone arc here is one of the strongest in Mosaic: three runtimes side-by-side on iOS (On-Device Runtimes), a fully-offline voice assistant (Multimodal Edge), and a 4-device LAN running a 70B-parameter model (Distributed Edge). Together they form three end-to-end Edge AI artifacts that compose.


A few notes on how to use this

  • Lessons stand alone. If you are missing a prerequisite, the lesson links to it at the top. Read the prereq, then come back.
  • Capstones are optional but worth it. Each module has one. They’re sized for a focused weekend and produce a working artifact (a kernel, a benchmark, an app, a server). The build guide is on the module index page.
  • The cheatsheet is the speedrun. Every lesson’s TL;DR aggregates into /cheatsheet — useful for refreshing a concept without re-reading.
  • The map is the index. /map shows every lesson, completed and remaining, organized by track.

If a lesson is unclear or wrong, open an issue. Mosaic is built openly; corrections land within days.