Free · Open · Current
Mosaic
The systems behind modern AI — from the C++ memory model up to MLA, FlashAttention-3, on-device inference, and a LAN of phones running 70B.
Seven tracks. Eighty-eight lessons. Every concept the field actually uses today, written the way one engineer would explain it to another. Real numbers, runnable code, nothing hand-wavy. Each lesson takes 15 minutes or less.
Three reading orders
Pick a thread to follow.
Reading order I
AI Systems
Attention, KV cache, paged attention, prefix caching, disaggregated serving, sampling, vLLM, observability. The full inference pipeline.
Reading order II
ML Compilers
SM architecture, shared memory, TMA, GEMM, LLVM, MLIR, Triton, CUTLASS, ThunderKittens. From transistors to kernel DSLs.
Reading order III
Edge AI
Quantization, llama.cpp, ExecuTorch, Core ML, Hexagon NPU, GGUF, distillation. Running models off the cloud.
The course map
88 lessons across 7 tracks. Pick any tile.
Every tile is one lesson. Hover or tap a tile to see what it teaches; completed lessons glow in their track's accent color.
Free, forever
Built openly on GitHub. No signup, no paywall, no email gate. Edit any lesson and send a PR.
Modular by design
Tracks → modules → 10–15 minute lessons. Finish each piece in one sitting. Progress saves locally.
Built for revision
Every lesson has a TL;DR pinned at the top. The cheatsheet aggregates them all for fast re-skimming.
Current to the field
Blackwell, DeepSeek-V3, vLLM v1, MLA, FP8. Re-validated quarterly. Each lesson stamps its last review date.
Open a lesson.
The first track teaches the memory model that everything else builds on. Or jump anywhere — the lessons stand alone.
Stack vs Heap →