Reading Orders
Mosaic isn’t meant to be read end-to-end (though you can). The lessons are designed to compose. Below are the three orders that form the most coherent arcs through the material; each is a thread you can pull to come away with a real working understanding of one corner of modern AI systems.
There is no required order. Lessons stand alone. Most learners follow a thread for a while, branch into a related one, come back. Use the course map to see the whole graph.
I. AI Systems
Attention, the KV cache, the inference pipeline, the serving stack.
- Multi-Head Attention
- GQA, MQA & MLA
- RoPE + YaRN / LongRoPE
- FlashAttention-3
- KV Cache Basics
- PagedAttention
- Prefix & RadixAttention
- Disaggregated Serving
- Sampling
- Structured Output
- Chunked Prefill
- Speculative Decoding
- vLLM & SGLang
- Cost & Latency
- Observability
The capstone in Inference-Time Architecture — a 200-line continuous-batching server — is the artifact this thread is built around.
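To make that capstone concrete, here is a toy continuous-batching loop. The shape is what matters: new requests join the running batch between decode steps instead of waiting for the batch to drain. `Request`, `fake_decode`, and `serve` are placeholders invented for illustration, not the course's API; a real server replaces `fake_decode` with a batched model forward pass.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    rid: int
    prompt: list                       # already-tokenized prompt
    max_new: int                       # generation budget
    out: list = field(default_factory=list)

def fake_decode(batch):
    # Stand-in for one batched forward pass: emit one "token" per request.
    return [len(r.prompt) + len(r.out) for r in batch]

def serve(requests, max_batch=4):
    waiting, running, done = deque(requests), [], []
    while waiting or running:
        # Admit waiting requests up to the batch limit (the "continuous"
        # part: admission happens every step, not once per drained batch).
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        # One decode step for everything currently running.
        for req, tok in zip(running, fake_decode(running)):
            req.out.append(tok)
        # Retire finished requests immediately, freeing their batch slots.
        done += [r for r in running if len(r.out) >= r.max_new]
        running = [r for r in running if len(r.out) < r.max_new]
    return done

if __name__ == "__main__":
    reqs = [Request(i, list(range(i + 1)), max_new=2 + i) for i in range(6)]
    for r in serve(reqs):
        print(r.rid, r.out)
```

The real server adds what this toy omits: KV-cache management for each admitted request, prefill for new arrivals, and streaming tokens back over HTTP.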
II. ML Compilers & Kernels
GPUs from the silicon up; the compilers and DSLs that target them.
- SM Architecture
- Thread Hierarchy
- Shared Memory
- GEMM (Hopper / Blackwell)
- Strides & Layout
- TMA & cp.async
- LLVM IR Tour
- Passes & Pipelines
- MLIR Overview
- Dialects & Lowering
- torch.compile
- Triton
- CuTe & CUTLASS 4
- ThunderKittens & TileLang
- Operator Fusion
- JAX & Pallas
- IREE & ExecuTorch
- Hardware Landscape 2026
The capstone — a Triton kernel that beats cuBLAS at small-N GEMM — is the artifact. The build guide on that page walks you through it step by step.
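For a feel of what that kernel looks like, here is a minimal Triton GEMM sketch specialized for small N: the whole output width fits in one tile, so each program streams over K for a block of rows. The kernel name, tile sizes, and wrapper are illustrative assumptions, not the course's solution, and actually beating cuBLAS takes autotuning beyond this sketch.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def small_n_gemm(a_ptr, b_ptr, c_ptr, M, N, K,
                 stride_am, stride_ak, stride_bk, stride_bn,
                 stride_cm, stride_cn,
                 BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                 BLOCK_K: tl.constexpr):
    pid = tl.program_id(0)                       # one program per BLOCK_M rows
    rm = pid * BLOCK_M + tl.arange(0, BLOCK_M)   # row indices of this tile
    rn = tl.arange(0, BLOCK_N)                   # all N columns in one tile
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    for k in range(0, K, BLOCK_K):               # stream over the K dimension
        rk = k + tl.arange(0, BLOCK_K)
        a = tl.load(a_ptr + rm[:, None] * stride_am + rk[None, :] * stride_ak,
                    mask=(rm[:, None] < M) & (rk[None, :] < K), other=0.0)
        b = tl.load(b_ptr + rk[:, None] * stride_bk + rn[None, :] * stride_bn,
                    mask=(rk[:, None] < K) & (rn[None, :] < N), other=0.0)
        acc += tl.dot(a, b)                      # tensor-core tile multiply
    mask = (rm[:, None] < M) & (rn[None, :] < N)
    tl.store(c_ptr + rm[:, None] * stride_cm + rn[None, :] * stride_cn,
             acc, mask=mask)

def matmul_small_n(a, b, BLOCK_M=64, BLOCK_N=16, BLOCK_K=32):
    M, K = a.shape
    _, N = b.shape
    assert N <= BLOCK_N, "this sketch assumes N fits a single tile"
    c = torch.empty((M, N), device=a.device, dtype=torch.float32)
    small_n_gemm[(triton.cdiv(M, BLOCK_M),)](
        a, b, c, M, N, K,
        a.stride(0), a.stride(1), b.stride(0), b.stride(1),
        c.stride(0), c.stride(1),
        BLOCK_M=BLOCK_M, BLOCK_N=BLOCK_N, BLOCK_K=BLOCK_K)
    return c
```

Holding all of N in one tile means each program launches across a 1-D grid over M, which keeps the GPU full precisely in the regime where square-tiled kernels run out of parallelism.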
III. Edge AI
Quantization, on-device runtimes, NPUs, the browser, and swarm inference. The part of the field that runs without the cloud, including a LAN of phones running a 70B model together.
- Foundations
- On-Device Runtimes (the four production paths)
- NPU programming
- Edge formats + small-model recipes
- Multimodal + distributed at the edge
The capstone arc here is one of the strongest in Mosaic: three runtimes side-by-side on iOS (On-Device Runtimes), a fully offline voice assistant (Multimodal Edge), and a 4-device LAN running a 70B model (Distributed Edge). Three end-to-end Edge AI artifacts that compose.
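Why four devices is plausible is just arithmetic. A back-of-envelope check, under an assumption of 4-bit weights and an even split (the course's exact recipe may differ):

```python
# Rough memory math for sharding a 70B model across a LAN of phones.
# The 4-bit figure and the even split are assumptions for illustration.
PARAMS = 70e9            # 70B parameters
BYTES_PER_WEIGHT = 0.5   # 4-bit quantized weights
DEVICES = 4

weights_gb = PARAMS * BYTES_PER_WEIGHT / 1e9   # ~35 GB of weights in total
per_device_gb = weights_gb / DEVICES           # ~8.75 GB per device

print(f"{weights_gb:.0f} GB total -> {per_device_gb:.2f} GB per device")
```

That is roughly 8.75 GB per device before KV cache and activations, which is why the capstone needs four high-memory phones rather than one.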
A few notes on how to use this
- Lessons stand alone. If a prerequisite is missing in your head, the lesson links to it at the top. Read the prereq, then come back.
- Capstones are optional but worth it. Each module has one. They’re sized for a focused weekend and produce a working artifact (a kernel, a benchmark, an app, a server). The build guide is on the module index page.
- The cheatsheet is the speedrun. Every lesson’s TL;DR aggregates into /cheatsheet — useful for refreshing a concept without re-reading.
- The map is the index. /map shows every lesson, completed and remaining, organized by track.
If a lesson is unclear or wrong, open an issue. Mosaic is built openly; corrections land within days.