
Passes & Pipelines

Prereq: LLVM IR Tour. Passes operate on the IR.

If LLVM IR is the language of compilers, passes are the verbs. A pass is a function that reads or rewrites IR — mem2reg promotes stack slots into SSA registers, instcombine simplifies x+0 to x, the inliner replaces a function call with the called function’s body. Modern compilers don’t optimize in one big sweep; they apply ~150 small passes in sequence, each unlocking the next.

This is the lesson where “the compiler optimized my code” stops being magic. The optimizer is literally a list of passes, run in a specific order, each one assuming the previous ones happened. Inlining unlocks constant folding, which unlocks dead-code elimination, which unlocks register allocation. Skip inlining and the rest doesn’t fire. Knowing the pipeline is what separates “the compiler is magic” from “I can debug why this particular thing didn’t get optimized.”

TL;DR

  • An LLVM pass is a function that reads or rewrites IR. Two flavors: analysis passes (compute information about the IR, like dominator trees or alias info) and transformation passes (rewrite IR, like inlining or dead-code elimination).
  • Passes compose into a pipeline. -O0 is empty; -O2 is ~150 passes; -O3 adds aggressive vectorization. Each pass assumes the previous ones ran.
  • LLVM 14+ uses the New Pass Manager (NPM). Old legacy::PassManager is deprecated; new code uses PassBuilder + ModulePassManager. The migration matters because tutorials older than 2022 are usually wrong.
  • Analysis passes are cached and invalidated: if a transformation changes the IR, dependent analyses get re-run lazily. This is what makes pipelines fast.
  • The single most useful flag for understanding a pipeline: opt --print-after-all — dumps IR after every pass. Scary the first time; indispensable forever after.

Mental model

Each pass takes IR, returns IR (slightly different). Some are no-ops on a given input; some unlock major changes. The whole -O2 pipeline is around 150 such steps.
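In code, this mental model is just function composition. Here is a minimal sketch — plain Python over a made-up list-of-strings IR, not LLVM’s actual API:

```python
# A "pass" is any function that takes IR and returns (possibly rewritten) IR.
# Here the IR is just a list of instruction strings -- a toy stand-in.

def strip_nops(ir):
    """A tiny transformation pass: delete no-op instructions."""
    return [inst for inst in ir if inst != "nop"]

def run_passes(ir, passes):
    """A pipeline is nothing more than passes applied in order."""
    for p in passes:
        ir = p(ir)          # each pass sees the output of the previous one
    return ir

ir = ["nop", "x = add a, b", "nop", "ret x"]
print(run_passes(ir, [strip_nops]))
# -> ['x = add a, b', 'ret x']
```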

Two kinds of passes

Analysis passes compute information without changing IR:

  • DominatorTree: which blocks dominate which (used by SSA construction, GVN)
  • LoopAnalysis: the loop nest structure
  • AliasAnalysis: whether two pointers may point to the same memory
  • ScalarEvolution: closed-form expressions for loop induction variables
  • BranchProbabilityInfo: likely-taken edges (for layout)
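To make “analysis pass” concrete, here is the textbook iterative dominator computation over a toy CFG (a dict of successor lists) — an illustration of what DominatorTree computes, not LLVM’s implementation:

```python
def dominators(succs, entry):
    """Block d dominates block b if every path from entry to b goes
    through d. Classic iterative dataflow formulation."""
    preds = {n: set() for n in succs}
    for n, ss in succs.items():
        for s in ss:
            preds[s].add(n)
    dom = {n: set(succs) for n in succs}   # start with "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in succs:
            if n == entry:
                continue
            new = {n} | (set.intersection(*(dom[p] for p in preds[n]))
                         if preds[n] else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

# A diamond: entry branches to a or b; both rejoin at c.
cfg = {"entry": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
doms = dominators(cfg, "entry")
print(sorted(doms["c"]))   # only entry and c itself dominate c
```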

Transformation passes rewrite IR using analyses:

  • mem2reg / sroa: promote allocas into SSA registers
  • instcombine: local algebraic simplification (e.g. x*1 → x, x+0 → x)
  • simplifycfg: merge basic blocks, eliminate trivial branches
  • gvn: global value numbering — eliminate redundant computations
  • licm: loop-invariant code motion (hoist out of loops)
  • inliner: inline function calls
  • loop-vectorize: turn loops into SIMD vector ops
  • dce, adce: dead code elimination

The pipeline is essentially: promote memory to registers → simplify → inline → propagate → simplify again → vectorize → simplify again → emit code.
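The “each pass unlocks the next” claim is easy to demonstrate on a toy IR of (dest, op, args) triples: DCE alone removes nothing, but after constant folding it can delete the constants feeding the folded add. (Toy passes, hypothetical names — not LLVM’s real implementations.)

```python
def const_fold(ir):
    """Rewrite an 'add' of two known constants into a 'const'."""
    env, out = {}, []
    for dest, op, args in ir:
        if op == "const":
            env[dest] = args[0]
        elif op == "add" and all(a in env for a in args):
            env[dest] = env[args[0]] + env[args[1]]
            out.append((dest, "const", [env[dest]]))
            continue
        out.append((dest, op, args))
    return out

def dce(ir):
    """Walk backwards, keeping only instructions whose result is used."""
    live, out = set(), []
    for dest, op, args in reversed(ir):
        if op == "ret" or dest in live:
            out.append((dest, op, args))
            if op != "const":
                live.update(args)
    return list(reversed(out))

ir = [
    ("a", "const", [2]),
    ("b", "const", [3]),
    ("c", "add", ["a", "b"]),
    (None, "ret", ["c"]),
]
print(len(dce(ir)))               # 4 -- nothing is dead yet
print(len(dce(const_fold(ir))))   # 2 -- folding unlocked the deletion
```

Run in the wrong order and nothing fires; run fold-then-DCE and the function collapses — the same dynamic the real -O2 pipeline exploits ~150 times over.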

How -O0 becomes -O2

Recall the -O0 add from the previous lesson:

define i32 @add(i32 %0, i32 %1) {
  %3 = alloca i32
  %4 = alloca i32
  store i32 %0, ptr %3
  store i32 %1, ptr %4
  %5 = load i32, ptr %3
  %6 = load i32, ptr %4
  %7 = add nsw i32 %5, %6
  ret i32 %7
}

The optimizer turns this into:

define i32 @add(i32 %0, i32 %1) {
  %3 = add nsw i32 %1, %0
  ret i32 %3
}

The exact pass sequence:

  1. mem2reg sees each alloca i32 whose only uses are store/load and promotes both to registers, rewriting every load into the value last stored. The allocas, stores, and loads all disappear.
  2. instcombine canonicalizes and simplifies what remains; in this tiny function there is little left to do.
  3. simplifycfg has nothing to do here (no branches), but in larger functions it merges blocks and prunes trivial edges.
  4. adce sweeps any leftover dead instructions (here, none).
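Step 1 is easy to simulate. Below is a mini “mem2reg” for straight-line code over a toy (dest, op, args) IR — since there are no branches, no phi nodes are needed, and every load from a slot is just the value most recently stored there. (An illustration of the idea, not LLVM’s algorithm.)

```python
def mem2reg(ir):
    """Promote stack slots: loads become the last stored value, after
    which the alloca/store/load instructions all die."""
    slot = {}          # slot name -> value currently stored there
    rename = {}        # load result -> the value it really is
    out = []
    for dest, op, args in ir:
        args = [rename.get(a, a) for a in args]   # apply earlier renames
        if op == "alloca":
            continue                              # slot promoted away
        if op == "store":                         # store (value, slot)
            slot[args[1]] = args[0]
            continue
        if op == "load":                          # load (slot,)
            rename[dest] = slot[args[0]]
            continue
        out.append((dest, op, args))
    return out

# The -O0 'add' from above, in toy form:
ir = [
    ("p", "alloca", []), ("q", "alloca", []),
    (None, "store", ["x", "p"]), (None, "store", ["y", "q"]),
    ("t1", "load", ["p"]), ("t2", "load", ["q"]),
    ("r", "add", ["t1", "t2"]),
    (None, "ret", ["r"]),
]
print(mem2reg(ir))
# -> [('r', 'add', ['x', 'y']), (None, 'ret', ['r'])]
```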

You can watch this happen yourself: opt -O2 -print-after-all add.ll 2>&1 | less — every pass prints “IR after <pass>” so you see exactly which one fired.

The New Pass Manager (NPM)

Every LLVM tutorial older than 2022 uses the legacy pass manager. Don’t follow them. The current way:

#include "llvm/Passes/PassBuilder.h"

PassBuilder PB;
ModuleAnalysisManager MAM;
FunctionAnalysisManager FAM;
LoopAnalysisManager LAM;
CGSCCAnalysisManager CGAM;

PB.registerModuleAnalyses(MAM);
PB.registerFunctionAnalyses(FAM);
PB.registerLoopAnalyses(LAM);
PB.registerCGSCCAnalyses(CGAM);
PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

ModulePassManager MPM = PB.buildPerModuleDefaultPipeline(OptimizationLevel::O2);
MPM.run(*module, MAM);

Or equivalently from the command line: opt -passes='default<O2>'. Specify individual passes: opt -passes='instcombine,simplifycfg,gvn'.

The NPM is faster, has cleaner pass dependencies, and supports adaptors (run a function pass over every function in a module, etc.). Memorize the PassBuilder ↔ opt -passes= correspondence — every modern LLVM tutorial uses one or the other.

Writing your first transformation pass

Skeleton of an out-of-tree NPM pass:

#include "llvm/IR/Constants.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Passes/PassPlugin.h"

using namespace llvm;

struct MultiplyByZeroPass : PassInfoMixin<MultiplyByZeroPass> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &FAM) {
    bool Changed = false;
    for (auto &BB : F)
      for (auto It = BB.begin(); It != BB.end();) {
        Instruction *Inst = &*It++;   // advance before possibly erasing
        // Match: %x = mul i32 %a, 0  →  replace all uses of %x with i32 0
        if (auto *Mul = dyn_cast<BinaryOperator>(Inst);
            Mul && Mul->getOpcode() == Instruction::Mul) {
          if (auto *C = dyn_cast<ConstantInt>(Mul->getOperand(1));
              C && C->isZero()) {
            Mul->replaceAllUsesWith(C);
            Mul->eraseFromParent();
            Changed = true;
          }
        }
      }
    return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
  }
};

extern "C" PassPluginLibraryInfo llvmGetPassPluginInfo() {
  return {LLVM_PLUGIN_API_VERSION, "MultiplyByZero", "v0.1",
          [](PassBuilder &PB) {
            PB.registerPipelineParsingCallback(
                [](StringRef Name, FunctionPassManager &FPM,
                   ArrayRef<PassBuilder::PipelineElement>) {
                  if (Name == "mult-zero") {
                    FPM.addPass(MultiplyByZeroPass());
                    return true;
                  }
                  return false;
                });
          }};
}

Build it as a shared library and run: opt -load-pass-plugin=./MultZero.so -passes='mult-zero' input.ll -S. This is the same pattern as the module capstone — only the matched pattern differs.

Why instcombine would have caught this anyway

instcombine already implements x*0 → 0. Real production passes look for transformations the standard pipeline misses — domain-specific opportunities (e.g., recognizing a particular LLM-kernel idiom and lowering it differently), or new hardware idioms.

The lesson: passes compose. The interesting work is finding combinations the existing pipeline misses, or finding optimizations that need cross-function/cross-module visibility.

Run it in your browser — toy pass pipeline

Python, editable — a mini optimizer running three passes (constant fold → DCE → algebraic simplify) over a list-of-instructions IR. Ctrl+Enter to run.

You should see the program shrink dramatically as each pass fires — exactly what watching opt -print-after-all looks like, in miniature.
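If you can’t run the widget, here is a condensed sketch of the same idea: toy passes over a (dest, op, args) IR whose operands are names (str) or constants (int), with a driver that prints the IR after every pass — a pocket-sized -print-after-all. (Pass names and IR shape are invented for illustration.)

```python
def simplify(ir):
    """Algebraic simplification: x + 0 -> x, x * 1 -> x."""
    rename, out = {}, []
    for dest, op, args in ir:
        args = [rename.get(a, a) if isinstance(a, str) else a for a in args]
        if op == "add" and 0 in args:
            rename[dest] = args[1 - args.index(0)]   # keep the other operand
        elif op == "mul" and 1 in args:
            rename[dest] = args[1 - args.index(1)]
        else:
            out.append((dest, op, args))
    return out

def dce(ir):
    """Drop instructions whose results are never used."""
    live, out = set(), []
    for dest, op, args in reversed(ir):
        if op == "ret" or dest in live:
            out.append((dest, op, args))
            live.update(a for a in args if isinstance(a, str))
    return list(reversed(out))

def run_pipeline(ir, passes):
    for p in passes:
        ir = p(ir)
        print(f"; IR after {p.__name__}")   # same idea as -print-after-all
        for inst in ir:
            print("  ", inst)
    return ir

ir = [
    ("a", "add", ["x", 0]),      # x + 0  -> x
    ("b", "mul", ["a", 1]),      # a * 1  -> a, i.e. x
    ("c", "mul", ["b", "y"]),
    (None, "ret", ["c"]),
]
final = run_pipeline(ir, [simplify, dce])
# final == [('c', 'mul', ['x', 'y']), (None, 'ret', ['c'])]
```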

Quick check

Fill in the blank
The flag that prints LLVM IR after every pass:
It's the canonical debugging flag for any compiler-IR question.
Quick check
A compiler engineer writes a new transformation pass. They run it directly on `clang -O0` output and see no improvement. Most likely cause:

Key takeaways

  1. A pass takes IR, returns IR. Two kinds: analyses (read-only) and transformations (rewrite).
  2. Pipelines compose. -O2 is ~150 passes; each enables the next.
  3. New Pass Manager (NPM) is the current API. Tutorials before 2022 use the legacy one — don’t.
  4. opt -print-after-all is the universal debugging tool for any “why did/didn’t this optimize” question.
  5. Most interesting work is at pass interactions — finding sequences the standard pipeline misses or domain-specific rewrites the pipeline doesn’t know about.

Go deeper
