Free · Open · Current
Mosaic
The systems behind modern AI — from the C++ memory model up to MLA, FlashAttention-3, on-device inference, and a LAN of phones running 70B.
Seven tracks. Eighty-eight lessons. Every concept the field actually uses today, written the way one engineer would explain it to another. Real numbers, runnable code, nothing hand-wavy. Each lesson takes 15 minutes or less.
Three reading orders
Pick a thread to follow.
Reading order I
AI Systems
Attention, KV cache, paged attention, prefix caching, disaggregated serving, sampling, vLLM, observability. The full inference pipeline.
Reading order II
ML Compilers
SM architecture, shared memory, TMA, GEMM, LLVM, MLIR, Triton, CUTLASS, ThunderKittens. From transistors to kernel DSLs.
Reading order III
Edge AI
Quantization, llama.cpp, ExecuTorch, Core ML, Hexagon NPU, GGUF, distillation. Running models off the cloud.
The course map
88 lessons across 7 tracks. Pick any tile.
Every tile is one lesson. Hover or tap a tile to see what it teaches; completed lessons glow in their track's accent color.
Free, forever
Built openly on GitHub. No signup, no paywall, no email gate. Edit any lesson and send a PR.
Modular by design
Tracks → modules → 10–15 minute lessons. Finish each piece in one sitting. Progress saves locally.
Built for revision
Every lesson has a TL;DR pinned at the top. The cheatsheet aggregates them all for fast re-skimming.
Current to the field
Blackwell, DeepSeek-V3, vLLM v1, MLA, FP8. Re-validated quarterly. Each lesson stamps its last review date.
Open a lesson.
The first track teaches the memory model that everything else builds on. Or jump anywhere — the lessons stand alone.
Stack vs Heap →