Free · Open · Current
Mosaic
The systems behind modern AI — from the C++ memory model up to MLA, FlashAttention-3, on-device inference, and a LAN of phones running 70B.
7 tracks. 117 lessons. Every concept the field actually uses today, written the way one engineer would explain it to another. Real numbers, runnable code, nothing hand-wavy. Each lesson takes 10 to 15 minutes.
Three reading orders
Pick a thread to follow.
Reading order I
AI Systems
Attention, FlashAttention-3 internals, KV cache, PagedAttention, prefix caching, disaggregated serving, vLLM/SGLang internals, speculative decoding kernels. The full inference pipeline at contributor depth.
Reading order II
ML Compilers
SM architecture, GEMM, roofline as a predictive tool, Tensor Core shape constraints, NCU profiling, LLVM, MLIR, Triton, CUTLASS, ThunderKittens, Inductor fusion. From transistors to kernel DSLs.
Reading order III
Edge AI
Quantization schemes, calibration methodology, KV cache quantization, llama.cpp, ExecuTorch, Core ML, Hexagon NPU, GGUF, distillation. Running models off the cloud.
The course map
117 lessons across 7 tracks. Pick any tile.
Every tile is one lesson. Hover or tap a tile to see what it teaches; completed lessons glow in their track's accent color.
Free, forever
Built openly on GitHub. No signup, no paywall, no email gate. Edit any lesson and send a PR.
Modular by design
Tracks → modules → 10–15 minute lessons. Finish each piece in one sitting. Progress saves locally.
Built for revision
Every lesson has a TL;DR pinned at the top. The cheatsheet aggregates them all for fast re-skimming.
Current to the field
Blackwell, DeepSeek-V3, vLLM v1, MLA, FP8. Re-validated quarterly. Each lesson stamps its last review date.
Open a lesson.
The first track teaches the memory model that everything else builds on. Or jump anywhere — the lessons stand alone.
Stack vs Heap →