Free · Open · Current

Mosaic

The systems behind modern AI, from the C++ memory model up to MLA, FlashAttention-3, on-device inference, and a LAN of phones running a 70B model.

7 tracks. 117 lessons. Every concept the field actually uses today, written the way one engineer would explain it to another. Real numbers, runnable code, nothing hand-wavy. Each lesson takes 10–15 minutes.

Reading orders · Or the full map

7 tracks · 32 modules · 117 lessons

Three reading orders

Pick a thread to follow.

Reading order I

AI Systems

Attention, FlashAttention-3 internals, KV cache, PagedAttention, prefix caching, disaggregated serving, vLLM/SGLang internals, speculative decoding kernels. The full inference pipeline at contributor depth.

Reading order II

ML Compilers

SM architecture, GEMM, roofline as a predictive tool, Tensor Core shape constraints, NCU profiling, LLVM, MLIR, Triton, CUTLASS, ThunderKittens, Inductor fusion. From transistors to kernel DSLs.

Reading order III

Edge AI

Quantization schemes, calibration methodology, KV cache quantization, llama.cpp, ExecuTorch, Core ML, Hexagon NPU, GGUF, distillation. Running models off the cloud.

The course map

117 lessons across 7 tracks. Pick any tile.

Every tile is one lesson. Hover or tap a tile to see what it teaches; completed lessons glow in their track's accent color.

01 · Systems Foundations · 11 lessons
02 · ML Execution & Quantization · 17 lessons
03 · Training & RLHF · 31 lessons
04 · LLM Architecture · 12 lessons
05 · ML Compilers & Hardware · 12 lessons
06 · Applied AI · Build & Ship · 21 lessons
07 · Edge AI · On-Device · 13 lessons

Free, forever

Built openly on GitHub. No signup, no paywall, no email gate. Edit any lesson and send a PR.

Modular by design

Tracks → modules → 10–15 minute lessons. Finish each piece in one sitting. Progress saves locally.

Built for revision

Every lesson has a TL;DR pinned at the top. The cheatsheet aggregates them all for fast re-skimming.

Current to the field

Blackwell, DeepSeek-V3, vLLM v1, MLA, FP8. Re-validated quarterly. Each lesson stamps its last review date.

Open a lesson.

The first track teaches the memory model that everything else builds on. Or jump anywhere — the lessons stand alone.

Stack vs Heap →