Passes & Pipelines
Prereq: LLVM IR Tour. Passes operate on the IR.
If LLVM IR is the language of compilers, passes are the verbs. A pass is a function that reads or rewrites IR — mem2reg promotes stack slots into SSA registers, instcombine simplifies x+0 to x, inliner replaces a function call with the called function’s body. Modern compilers don’t optimize in one big sweep; they apply ~150 small passes in sequence, each unlocking the next.
This is the lesson where “the compiler optimized my code” stops being magic. The optimizer is literally a list of passes, run in a specific order, each one assuming the previous ones happened. Inlining unlocks constant folding, which unlocks dead-code elimination, which unlocks register allocation. Skip inlining and the rest doesn’t fire. Knowing the pipeline is what separates “the compiler is magic” from “I can debug why this particular thing didn’t get optimized.”
TL;DR
- An LLVM pass is a function that reads or rewrites IR. Two flavors: analysis passes (compute information about the IR, like dominator trees or alias info) and transformation passes (rewrite IR, like inlining or dead-code elimination).
- Passes compose into a pipeline. `-O0` is empty; `-O2` is ~150 passes; `-O3` adds aggressive vectorization. Each pass assumes the previous ones ran.
- LLVM 14+ uses the New Pass Manager (NPM). Old `legacy::PassManager` is deprecated; new code uses `PassBuilder` + `ModulePassManager`. The migration matters because tutorials older than 2022 are usually wrong.
- Analysis passes are cached and invalidated: if a transformation changes the IR, dependent analyses get re-run lazily. This is what makes pipelines fast.
- The single most useful flag for understanding a pipeline: `opt --print-after-all` — dumps IR after every pass. Scary the first time; indispensable forever after.
Mental model
Each pass takes IR, returns IR (slightly different). Some are no-ops on a given input; some unlock major changes. The whole -O2 pipeline is around 150 such steps.
Two kinds of passes
Analysis passes compute information without changing IR:
| Analysis | What it computes |
|---|---|
| DominatorTree | Which blocks dominate which (used by SSA construction, GVN) |
| LoopAnalysis | The loop nest structure |
| AliasAnalysis | Whether two pointers may point to the same memory |
| ScalarEvolution | Closed-form expressions for loop induction variables |
| BranchProbabilityInfo | Likely-taken edges (for layout) |
Transformation passes rewrite IR using analyses:
| Transformation | What it does |
|---|---|
| mem2reg / sroa | Promote allocas into SSA registers |
| instcombine | Local algebraic simplification (e.g. x*1 → x, x+0 → x) |
| simplifycfg | Merge basic blocks, eliminate trivial branches |
| gvn | Global value numbering — eliminate redundant computations |
| licm | Loop-invariant code motion (hoist out of loops) |
| inliner | Inline function calls |
| loop-vectorize | Turn loops into SIMD vector ops |
| dce, adce | Dead code elimination |
The pipeline is essentially: promote memory to registers → simplify → inline → propagate → simplify again → vectorize → simplify again → emit code.
How -O0 becomes -O2
Recall the -O0 add from the previous lesson:
define i32 @add(i32 %0, i32 %1) {
%3 = alloca i32
%4 = alloca i32
store i32 %0, ptr %3
store i32 %1, ptr %4
%5 = load i32, ptr %3
%6 = load i32, ptr %4
%7 = add nsw i32 %5, %6
ret i32 %7
}
The optimizer turns this into:
define i32 @add(i32 %0, i32 %1) {
%3 = add nsw i32 %1, %0
ret i32 %3
}
The exact pass sequence:
1. `mem2reg` sees each `alloca i32` whose only uses are `store`/`load` and promotes both to SSA registers. Now there are no `alloca`s.
2. `instcombine` eliminates dead `store`/`load` pairs (the value flows directly).
3. `simplifycfg` has nothing to do here (no branches), but in larger functions it cleans up.
4. `adce` sweeps any leftover dead instructions (none).
You can watch this happen yourself: opt -O2 -print-after-all add.ll 2>&1 | less — every pass prints “IR after <pass>” so you see exactly which one fired.
The New Pass Manager (NPM)
Every LLVM tutorial older than 2022 uses the legacy pass manager. Don’t follow them. The current way:
#include "llvm/Passes/PassBuilder.h"
PassBuilder PB;
ModuleAnalysisManager MAM;
FunctionAnalysisManager FAM;
LoopAnalysisManager LAM;
CGSCCAnalysisManager CGAM;
PB.registerModuleAnalyses(MAM);
PB.registerFunctionAnalyses(FAM);
PB.registerLoopAnalyses(LAM);
PB.registerCGSCCAnalyses(CGAM);
PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);
ModulePassManager MPM = PB.buildPerModuleDefaultPipeline(OptimizationLevel::O2);
MPM.run(*module, MAM);
Or equivalently from the command line: `opt -passes='default<O2>'`. Specify individual passes: `opt -passes='instcombine,simplifycfg,gvn'`.
The NPM is faster, has cleaner pass dependencies, and supports adaptors (run a function pass over every function in a module, etc.). Memorize the PassBuilder ↔ opt -passes= correspondence — every modern LLVM tutorial uses one or the other.
Writing your first transformation pass
Skeleton of an out-of-tree NPM pass:
#include "llvm/IR/Constants.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Passes/PassPlugin.h"
using namespace llvm;
struct MultiplyByZeroPass : PassInfoMixin<MultiplyByZeroPass> {
PreservedAnalyses run(Function &F, FunctionAnalysisManager &FAM) {
bool changed = false;
for (auto &BB : F)
for (auto It = BB.begin(); It != BB.end(); ) {
auto *Inst = &*It++;
// Match "%x = mul i32 %a, 0" and replace all uses of %x with 0
if (auto *Mul = dyn_cast<BinaryOperator>(Inst);
Mul && Mul->getOpcode() == Instruction::Mul) {
if (auto *C = dyn_cast<ConstantInt>(Mul->getOperand(1));
C && C->isZero()) {
Mul->replaceAllUsesWith(C);
Mul->eraseFromParent();
changed = true;
}
}
}
return changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
}
};
extern "C" PassPluginLibraryInfo llvmGetPassPluginInfo() {
return {LLVM_PLUGIN_API_VERSION, "MultiplyByZero", "v0.1",
[](PassBuilder &PB) {
PB.registerPipelineParsingCallback(
        [](StringRef Name, FunctionPassManager &FPM,
           ArrayRef<PassBuilder::PipelineElement>) {
if (Name == "mult-zero") { FPM.addPass(MultiplyByZeroPass()); return true; }
return false;
});
}};
}
Build it as a shared library and run: `opt -load-pass-plugin=./MultZero.so -passes='mult-zero' input.ll -S`. This is the same pattern as the module capstone — only the matched pattern differs.
Why instcombine would have caught this anyway
instcombine already implements x*0 → 0. Real production passes look for transformations the standard pipeline misses — domain-specific opportunities (e.g., recognizing a particular LLM-kernel idiom and lowering it differently), or new hardware idioms.
The lesson: passes compose. The interesting work is finding combinations the existing pipeline misses, or finding optimizations that need cross-function/cross-module visibility.
Run it in your browser — toy pass pipeline
You should see the program shrink dramatically as each pass fires — exactly what watching opt -print-after-all looks like, in miniature.
Key takeaways
- A pass takes IR, returns IR. Two kinds: analyses (read-only) and transformations (rewrite).
- Pipelines compose. `-O2` is ~150 passes; each enables the next.
- New Pass Manager (NPM) is the current API. Tutorials before 2022 use the legacy one — don’t.
- `opt -print-after-all` is the universal debugging tool for any “why did/didn’t this optimize” question.
- Most interesting work is at pass interactions — finding sequences the standard pipeline misses or domain-specific rewrites the pipeline doesn’t know about.
Go deeper
- Docs: LLVM — Using the New Pass Manager. Authoritative. Includes the migration guide from legacy passes.
- Docs: LLVM — All Passes. Reference for every pass in the tree. Skim once to know what's on offer.
- Docs: Writing an LLVM Pass (NPM). The official walkthrough for an out-of-tree NPM pass — what the module capstone is built around.
- Blog: The New Pass Manager. Why NPM exists; the design tradeoffs vs legacy. Useful background.
- Blog: LLVM Optimizations You Should Know. Concise tour of the most-impactful passes with worked examples.
- Repo: banach-space/llvm-tutor. A maintained, current (2024+) collection of out-of-tree NPM pass examples. The best place to copy from.