LLVM IR Tour
In a managed-runtime language, you write code, the runtime executes it, and somewhere in the middle a JIT or interpreter does the work — but you never see the boundary. In a compiled language, the boundary is a real artifact: a textual intermediate representation the compiler emits and the optimizer chews on. In the LLVM-based world (Clang, Rustc, Swift, every modern AI compiler) that artifact is LLVM IR.
LLVM IR is a small, RISC-like, strongly-typed virtual instruction set. It has the shape of assembly but the rigor of a typed language: every value is in SSA form (defined exactly once), every integer carries its width, every memory access carries its type. It’s the language all the front-ends speak in common — and the language every modern AI compiler eventually lowers to. Read a .ll dump from clang -O0 -S -emit-llvm, then again from -O2, and you can see the optimizer at work.
This lesson is the reading-fluency version: enough IR to read what torch.compile, IREE, Triton, and any modern AI compiler are actually generating.
TL;DR
- LLVM IR is a strongly-typed, SSA, RISC-like virtual instruction set. The “language” between your source compiler and the actual machine. Once you can read it, you can read what every modern compiler is actually doing.
- It has three forms: the `.ll` text format (human-readable), the `.bc` bitcode binary, and the in-memory C++ `Module` API. They are all the same IR — round-trip lossless.
- Three building blocks: Module → Function → BasicBlock → Instruction. SSA means each value is defined exactly once. PHI nodes pick up control-flow merges.
- The type system is small but rigid: `i1`, `i8`, `i32`, `i64`, `float`, `double`, `half`, `bfloat`, plus pointers, vectors, structs, arrays. No bare ints — every integer carries its width.
- Almost every AI compiler eventually lowers to LLVM IR. MLIR’s `llvm` dialect is literally this IR. Triton emits LLVM IR. JAX/XLA emits LLVM IR through StableHLO. Knowing LLVM is the universal floor.
Mental model
Every box on the left and right is a different language. The middle is one IR, and it’s the only language all the boxes have in common. Learn it once, read every compiler.
Hello, IR
The smallest interesting program — adding two ints — in C and its LLVM IR:
```c
// add.c
int add(int a, int b) { return a + b; }
```

```shell
$ clang -O0 -S -emit-llvm add.c
```

```llvm
define i32 @add(i32 %0, i32 %1) {
  %3 = alloca i32
  %4 = alloca i32
  store i32 %0, ptr %3
  store i32 %1, ptr %4
  %5 = load i32, ptr %3
  %6 = load i32, ptr %4
  %7 = add nsw i32 %5, %6
  ret i32 %7
}
```

Things to notice immediately:

- `define i32 @add(i32 %0, i32 %1)` — the function takes two `i32`s, returns one.
- `%3 = alloca i32` — stack-allocate space; this is what `-O0` does for every parameter.
- `store` and `load` move data into and out of memory.
- `add nsw i32 %5, %6` — `nsw` = “no signed wrap”; the front-end is asserting the C semantics that signed overflow is UB.
- `ret i32 %7` — return.
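To make `nsw` and fixed-width integers concrete, here is a toy Python sketch (the `add_i32` helper is invented here, not an LLVM API) that mimics `add i32`: wrap the result to 32 bits in two's complement, and report whether a signed wrap occurred, the event that `nsw` asserts never happens.

```python
def add_i32(a, b):
    """Simulate LLVM's `add i32`: keep the low 32 bits, two's complement."""
    MASK = 0xFFFFFFFF
    raw = (a + b) & MASK
    # Reinterpret the low 32 bits as a signed value.
    result = raw - 0x100000000 if raw & 0x80000000 else raw
    # `nsw` ("no signed wrap") asserts this flag is never True; when it is,
    # the LLVM result is poison (matching C's UB on signed overflow).
    signed_wrap = not (-2**31 <= a + b <= 2**31 - 1)
    return result, signed_wrap

print(add_i32(3, 4))          # → (7, False)
print(add_i32(2**31 - 1, 1))  # → (-2147483648, True): wrapped, so nsw makes it poison
```

This is why the IR has no bare `int`: the wrap behavior above only makes sense once the width (here 32 bits) is part of the type.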
Now compile with -O2:
```llvm
define i32 @add(i32 %0, i32 %1) {
  %3 = add nsw i32 %1, %0
  ret i32 %3
}
```

The optimizer eliminated the allocas and store/loads — those were never doing useful work. This is what compiler passes (next lesson) do all day.
SSA: every value defined exactly once
LLVM IR is in Static Single Assignment (SSA) form. Every register-like value (%3, %5, %7, ...) is the result of exactly one instruction. You never re-assign %5. If a variable changes across blocks, you write a PHI node at the merge point that picks the right version based on which block you came from.
```llvm
define i32 @abs(i32 %x) {
entry:
  %neg = icmp slt i32 %x, 0
  br i1 %neg, label %negate, label %done

negate:
  %nx = sub i32 0, %x
  br label %done

done:
  %result = phi i32 [ %x, %entry ], [ %nx, %negate ]
  ret i32 %result
}
```

The PHI says: “if you came from `entry`, take `%x`; if from `negate`, take `%nx`.” Reading PHI nodes is a 5-minute skill that unlocks reading any optimized IR.
The type system
LLVM types are explicit and narrow.
| Family | Examples |
|---|---|
| Integers | i1, i8, i16, i32, i64, i128 |
| Floats | half, bfloat, float, double, fp128 |
| Pointers | ptr (opaque by default since LLVM 15; before that, i32*, float*, etc.) |
| Vectors | <4 x float>, <8 x i16> (SIMD lanes) |
| Aggregates | [ 16 x i32 ] (array), { i32, float } (struct) |
| Function types | i32 (i32, i32) (a pointer to one is just ptr) |
Pointers are opaque by default since LLVM 15, and typed pointers were removed entirely in LLVM 17. This was a big change: the pointed-to type is no longer encoded in the pointer; the loads and stores carry the type. So load i32, ptr %p instead of load i32, i32* %p.
Reading a real ML compiler dump
Every modern AI compiler emits LLVM IR at the bottom. Triton’s:
```shell
$ python -c "import triton; ..." 2>/dev/null  # generates a .ll
$ cat /tmp/triton-*/your_kernel.ll
```

Or for torch.compile’s lowering chain, run `TORCH_LOGS=output_code python script.py` to dump the generated code and see the C++ wrapper; the underlying CUDA kernels emit PTX (which is itself an LLVM-IR-shaped target language). Every step is a transformation of basically what you’ve seen above.
The single most useful debugging move when a generated kernel is wrong: dump the LLVM IR at every pass boundary (`opt -print-after-all`) and find where it stopped looking like what you expected.
opt — your IR Swiss army knife
The opt tool runs passes on .ll files. You’ll use it constantly:
```shell
# Run all -O2 passes
opt -O2 add.ll -S -o add.opt.ll

# Run just one pass
opt -passes='instcombine' add.ll -S -o add.combined.ll

# Print IR after every pass (on stderr) — slow but reveals everything
opt -O2 -print-after-all add.ll -o /dev/null 2>&1 | less

# Verify IR is well-formed (catches malformed handwritten IR)
opt -passes='verify' add.ll -o /dev/null
```

The next lesson is exactly about which passes exist, how they compose, and how to write your own.
Run it in your browser — IR mini-interpreter
We can’t run actual clang in Pyodide, but we can build a toy SSA interpreter that demonstrates the model.
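Here is one way to sketch that interpreter. The instruction encoding, block layout, and `run` helper are all inventions of this sketch (real LLVM IR has hundreds of instructions; this handles just enough to run the `@abs` function from the SSA section), and constants are passed in through the initial environment to keep the dispatcher short.

```python
# Toy SSA interpreter: blocks are lists of (op, dest, operands) tuples.
# `phi` picks its value based on the predecessor block — the heart of SSA.
def run(blocks, entry, env):
    block, prev = entry, None
    while True:
        for instr in blocks[block]:
            op, dest = instr[0], instr[1]
            if op == "icmp_slt":              # signed less-than compare
                env[dest] = env[instr[2]] < env[instr[3]]
            elif op == "sub":                 # integer subtraction
                env[dest] = env[instr[2]] - env[instr[3]]
            elif op == "phi":
                # instr[2] maps predecessor-block name -> value name
                env[dest] = env[instr[2][prev]]
            elif op == "br_cond":             # conditional branch on `dest`
                block, prev = (instr[2] if env[dest] else instr[3]), block
                break
            elif op == "br":                  # unconditional branch
                block, prev = dest, block
                break
            elif op == "ret":
                return env[dest]

# define i32 @abs(i32 %x) from the SSA section, encoded as data:
abs_fn = {
    "entry": [
        ("icmp_slt", "%neg", "%x", "%zero"),
        ("br_cond", "%neg", "negate", "done"),
    ],
    "negate": [
        ("sub", "%nx", "%zero", "%x"),
        ("br", "done"),
    ],
    "done": [
        ("phi", "%result", {"entry": "%x", "negate": "%nx"}),
        ("ret", "%result"),
    ],
}

print(run(abs_fn, "entry", {"%x": -7, "%zero": 0}))  # → 7
print(run(abs_fn, "entry", {"%x": 5, "%zero": 0}))   # → 5
```

Note that the `phi` case is the only place `prev` (the predecessor block) is consulted: exactly the “which block did you come from” rule described above.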
The phi-node selection logic is the heart of SSA. Once you can write that interpreter, real LLVM IR reads like English.
Quick check
Key takeaways
- LLVM IR is the universal compiler floor. Read it once; read every modern compiler.
- Module → Function → BasicBlock → Instruction. SSA, with PHI at merge points.
- Strict typing — no bare ints. Pointers are opaque by default since LLVM 15.
- `-O0` is verbose; `-O2` is what you actually study. The instructions the optimizer removed are the lesson.
- Every AI compiler eventually emits LLVM IR. Knowing this language is non-optional for serious compiler work.
Go deeper
- Docs: LLVM Language Reference Manual. Authoritative; Sections 4 (high-level structure) and 9 (instruction reference) are the canonical reference. Bookmark this.
- Docs: LLVM Tutorial — Implementing Kaleidoscope. The classic introduction; builds a small language end-to-end through the LLVM C++ API.
- Blog: A LLVM IR API tutorial. Practical walkthrough of generating IR programmatically (rather than via clang).
- Blog: LLVM for Grad Students. The clearest non-tutorial overview of why LLVM IR looks the way it does.
- Video: LLVM in 100 Seconds. Fastest possible motivation for the IR. Watch this before reading anything else.
- Repo: llvm/llvm-project. The source; `llvm/lib/IR/` is where the C++ class hierarchy for the IR lives.
Why this matters
If you want to read what torch.compile, IREE, Triton, or any modern AI compiler actually generates — you read LLVM IR. Compiler vendors publish papers in terms of “passes on LLVM IR.” Optimization tools (opt, llc) speak it natively. When a kernel is slow, the path from “I think it’s slow” to “I see why it’s slow” goes through clang -S -emit-llvm or mlir-opt --print-ir-before-all. Knowing the language is non-optional for compiler work.