
Passes & Pipelines

Prereq: LLVM IR Tour. Passes operate on the IR.

If LLVM IR is the language of compilers, passes are the verbs. A pass is a function that reads or rewrites IR — mem2reg promotes stack slots into SSA registers, instcombine simplifies x+0 to x, the inliner replaces a function call with the called function’s body. Modern compilers don’t optimize in one big sweep; they apply ~150 small passes in sequence, each unlocking the next.

This is the lesson where “the compiler optimized my code” stops being magic. The optimizer is literally a list of passes, run in a specific order, each one assuming the previous ones happened. Inlining unlocks constant folding, which unlocks dead-code elimination, which unlocks register allocation. Skip inlining and the rest doesn’t fire. Knowing the pipeline is what separates “the compiler is magic” from “I can debug why this particular thing didn’t get optimized.”

TL;DR

  • An LLVM pass is a function that reads or rewrites IR. Two flavors: analysis passes (compute information about the IR, like dominator trees or alias info) and transformation passes (rewrite IR, like inlining or dead-code elimination).
  • Passes compose into a pipeline. -O0 is empty; -O2 is ~150 passes; -O3 adds aggressive vectorization. Each pass assumes the previous ones ran.
  • LLVM 14+ uses the New Pass Manager (NPM). Old legacy::PassManager is deprecated; new code uses PassBuilder + ModulePassManager. The migration matters because tutorials older than 2022 are usually wrong.
  • Analysis passes are cached and invalidated: if a transformation changes the IR, dependent analyses get re-run lazily. This is what makes pipelines fast.
  • The single most useful flag for understanding a pipeline: opt --print-after-all — dumps IR after every pass. Scary the first time; indispensable forever after.

Mental model

Each pass takes IR, returns IR (slightly different). Some are no-ops on a given input; some unlock major changes. The whole -O2 pipeline is around 150 such steps.
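In code, this mental model is just function composition. Here is a minimal sketch — plain Python over a made-up list-of-strings IR, not LLVM’s actual API:

```python
# A "pass" is any function that takes IR and returns (possibly rewritten) IR.
# Here the IR is just a list of instruction strings -- a toy stand-in.

def strip_nops(ir):
    """A tiny transformation pass: delete no-op instructions."""
    return [inst for inst in ir if inst != "nop"]

def run_passes(ir, passes):
    """A pipeline is nothing more than passes applied in order."""
    for p in passes:
        ir = p(ir)          # each pass sees the output of the previous one
    return ir

ir = ["nop", "x = add a, b", "nop", "ret x"]
print(run_passes(ir, [strip_nops]))
# -> ['x = add a, b', 'ret x']
```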

Two kinds of passes

Analysis passes compute information without changing IR:

  • DominatorTree: which blocks dominate which (used by SSA construction, GVN)
  • LoopAnalysis: the loop nest structure
  • AliasAnalysis: whether two pointers may point to the same memory
  • ScalarEvolution: closed-form expressions for loop induction variables
  • BranchProbabilityInfo: likely-taken edges (for layout)
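To make “analysis pass” concrete, here is the textbook iterative dominator computation over a toy CFG (a dict of successor lists) — an illustration of what DominatorTree computes, not LLVM’s implementation:

```python
def dominators(succs, entry):
    """Block d dominates block b if every path from entry to b goes
    through d. Classic iterative dataflow formulation."""
    preds = {n: set() for n in succs}
    for n, ss in succs.items():
        for s in ss:
            preds[s].add(n)
    dom = {n: set(succs) for n in succs}   # start with "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in succs:
            if n == entry:
                continue
            new = {n} | (set.intersection(*(dom[p] for p in preds[n]))
                         if preds[n] else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom

# A diamond: entry branches to a or b; both rejoin at c.
cfg = {"entry": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
doms = dominators(cfg, "entry")
print(sorted(doms["c"]))   # only entry and c itself dominate c
```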

Transformation passes rewrite IR using analyses:

  • mem2reg / sroa: promote allocas into SSA registers
  • instcombine: local algebraic simplification (e.g. x*1 → x, x+0 → x)
  • simplifycfg: merge basic blocks, eliminate trivial branches
  • gvn: global value numbering — eliminate redundant computations
  • licm: loop-invariant code motion (hoist out of loops)
  • inliner: inline function calls
  • loop-vectorize: turn loops into SIMD vector ops
  • dce, adce: dead code elimination

The pipeline is essentially: promote memory to registers → simplify → inline → propagate → simplify again → vectorize → simplify again → emit code.
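The “each pass unlocks the next” claim is easy to demonstrate on a toy IR of (dest, op, args) triples: DCE alone removes nothing, but after constant folding it can delete the constants feeding the folded add. (Toy passes, hypothetical names — not LLVM’s real implementations.)

```python
def const_fold(ir):
    """Rewrite an 'add' of two known constants into a 'const'."""
    env, out = {}, []
    for dest, op, args in ir:
        if op == "const":
            env[dest] = args[0]
        elif op == "add" and all(a in env for a in args):
            env[dest] = env[args[0]] + env[args[1]]
            out.append((dest, "const", [env[dest]]))
            continue
        out.append((dest, op, args))
    return out

def dce(ir):
    """Walk backwards, keeping only instructions whose result is used."""
    live, out = set(), []
    for dest, op, args in reversed(ir):
        if op == "ret" or dest in live:
            out.append((dest, op, args))
            if op != "const":
                live.update(args)
    return list(reversed(out))

ir = [
    ("a", "const", [2]),
    ("b", "const", [3]),
    ("c", "add", ["a", "b"]),
    (None, "ret", ["c"]),
]
print(len(dce(ir)))               # 4 -- nothing is dead yet
print(len(dce(const_fold(ir))))   # 2 -- folding unlocked the deletion
```

Run in the wrong order and nothing fires; run fold-then-DCE and the function collapses — the same dynamic the real -O2 pipeline exploits ~150 times over.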

How -O0 becomes -O2

Recall the -O0 add from the previous lesson:

define i32 @add(i32 %0, i32 %1) {
  %3 = alloca i32
  %4 = alloca i32
  store i32 %0, ptr %3
  store i32 %1, ptr %4
  %5 = load i32, ptr %3
  %6 = load i32, ptr %4
  %7 = add nsw i32 %5, %6
  ret i32 %7
}

The optimizer turns this into:

define i32 @add(i32 %0, i32 %1) {
  %3 = add nsw i32 %1, %0
  ret i32 %3
}

The exact pass sequence:

  1. mem2reg sees each alloca i32 whose only uses are store/load and promotes both to registers, rewriting every load into the value last stored. The allocas, stores, and loads all disappear.
  2. instcombine canonicalizes and simplifies what remains; in this tiny function there is little left to do.
  3. simplifycfg has nothing to do here (no branches), but in larger functions it merges blocks and prunes trivial edges.
  4. adce sweeps any leftover dead instructions (here, none).
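Step 1 is easy to simulate. Below is a mini “mem2reg” for straight-line code over a toy (dest, op, args) IR — since there are no branches, no phi nodes are needed, and every load from a slot is just the value most recently stored there. (An illustration of the idea, not LLVM’s algorithm.)

```python
def mem2reg(ir):
    """Promote stack slots: loads become the last stored value, after
    which the alloca/store/load instructions all die."""
    slot = {}          # slot name -> value currently stored there
    rename = {}        # load result -> the value it really is
    out = []
    for dest, op, args in ir:
        args = [rename.get(a, a) for a in args]   # apply earlier renames
        if op == "alloca":
            continue                              # slot promoted away
        if op == "store":                         # store (value, slot)
            slot[args[1]] = args[0]
            continue
        if op == "load":                          # load (slot,)
            rename[dest] = slot[args[0]]
            continue
        out.append((dest, op, args))
    return out

# The -O0 'add' from above, in toy form:
ir = [
    ("p", "alloca", []), ("q", "alloca", []),
    (None, "store", ["x", "p"]), (None, "store", ["y", "q"]),
    ("t1", "load", ["p"]), ("t2", "load", ["q"]),
    ("r", "add", ["t1", "t2"]),
    (None, "ret", ["r"]),
]
print(mem2reg(ir))
# -> [('r', 'add', ['x', 'y']), (None, 'ret', ['r'])]
```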

You can watch this happen yourself: opt -O2 -print-after-all add.ll 2>&1 | less — every pass prints “IR after <pass>” so you see exactly which one fired.

The New Pass Manager (NPM)

Every LLVM tutorial older than 2022 uses the legacy pass manager. Don’t follow them. The current way:

#include "llvm/Passes/PassBuilder.h"

PassBuilder PB;
ModuleAnalysisManager MAM;
FunctionAnalysisManager FAM;
LoopAnalysisManager LAM;
CGSCCAnalysisManager CGAM;

PB.registerModuleAnalyses(MAM);
PB.registerFunctionAnalyses(FAM);
PB.registerLoopAnalyses(LAM);
PB.registerCGSCCAnalyses(CGAM);
PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

ModulePassManager MPM = PB.buildPerModuleDefaultPipeline(OptimizationLevel::O2);
MPM.run(*module, MAM);

Or equivalently from the command line: opt -passes='default<O2>'. Specify individual passes: opt -passes='instcombine,simplifycfg,gvn'.

The NPM is faster, has cleaner pass dependencies, and supports adaptors (run a function pass over every function in a module, etc.). Memorize the PassBuilder ↔ opt -passes= correspondence — every modern LLVM tutorial uses one or the other.

Writing your first transformation pass

Skeleton of an out-of-tree NPM pass:

#include "llvm/IR/Constants.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/PassManager.h"
#include "llvm/Passes/PassBuilder.h"
#include "llvm/Passes/PassPlugin.h"

using namespace llvm;

struct MultiplyByZeroPass : PassInfoMixin<MultiplyByZeroPass> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &FAM) {
    bool Changed = false;
    for (auto &BB : F)
      for (auto It = BB.begin(); It != BB.end();) {
        Instruction *Inst = &*It++;   // advance before possibly erasing
        // Match: %x = mul i32 %a, 0  →  replace all uses of %x with i32 0
        if (auto *Mul = dyn_cast<BinaryOperator>(Inst);
            Mul && Mul->getOpcode() == Instruction::Mul) {
          if (auto *C = dyn_cast<ConstantInt>(Mul->getOperand(1));
              C && C->isZero()) {
            Mul->replaceAllUsesWith(C);
            Mul->eraseFromParent();
            Changed = true;
          }
        }
      }
    return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
  }
};

extern "C" PassPluginLibraryInfo llvmGetPassPluginInfo() {
  return {LLVM_PLUGIN_API_VERSION, "MultiplyByZero", "v0.1",
          [](PassBuilder &PB) {
            PB.registerPipelineParsingCallback(
                [](StringRef Name, FunctionPassManager &FPM,
                   ArrayRef<PassBuilder::PipelineElement>) {
                  if (Name == "mult-zero") {
                    FPM.addPass(MultiplyByZeroPass());
                    return true;
                  }
                  return false;
                });
          }};
}

Build it as a shared library and run: opt -load-pass-plugin=./MultZero.so -passes='mult-zero' input.ll -S. This is the same pattern as the module capstone — only the matched pattern differs.

Why instcombine would have caught this anyway

instcombine already implements x*0 → 0. Real production passes look for transformations the standard pipeline misses — domain-specific opportunities (e.g., recognizing a particular LLM-kernel idiom and lowering it differently), or new hardware idioms.

The lesson: passes compose. The interesting work is finding combinations the existing pipeline misses, or finding optimizations that need cross-function/cross-module visibility.

Run it in your browser — toy pass pipeline

Python, editable — a mini optimizer running three passes (constant fold → DCE → algebraic simplify) over a list-of-instructions IR. Ctrl+Enter to run.

You should see the program shrink dramatically as each pass fires — exactly what watching opt -print-after-all looks like, in miniature.
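If you can’t run the widget, here is a condensed sketch of the same idea: toy passes over a (dest, op, args) IR whose operands are names (str) or constants (int), with a driver that prints the IR after every pass — a pocket-sized -print-after-all. (Pass names and IR shape are invented for illustration.)

```python
def simplify(ir):
    """Algebraic simplification: x + 0 -> x, x * 1 -> x."""
    rename, out = {}, []
    for dest, op, args in ir:
        args = [rename.get(a, a) if isinstance(a, str) else a for a in args]
        if op == "add" and 0 in args:
            rename[dest] = args[1 - args.index(0)]   # keep the other operand
        elif op == "mul" and 1 in args:
            rename[dest] = args[1 - args.index(1)]
        else:
            out.append((dest, op, args))
    return out

def dce(ir):
    """Drop instructions whose results are never used."""
    live, out = set(), []
    for dest, op, args in reversed(ir):
        if op == "ret" or dest in live:
            out.append((dest, op, args))
            live.update(a for a in args if isinstance(a, str))
    return list(reversed(out))

def run_pipeline(ir, passes):
    for p in passes:
        ir = p(ir)
        print(f"; IR after {p.__name__}")   # same idea as -print-after-all
        for inst in ir:
            print("  ", inst)
    return ir

ir = [
    ("a", "add", ["x", 0]),      # x + 0  -> x
    ("b", "mul", ["a", 1]),      # a * 1  -> a, i.e. x
    ("c", "mul", ["b", "y"]),
    (None, "ret", ["c"]),
]
final = run_pipeline(ir, [simplify, dce])
# final == [('c', 'mul', ['x', 'y']), (None, 'ret', ['c'])]
```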

Quick check

Fill in the blank
The flag that prints LLVM IR after every pass:
It's the canonical debugging flag for any compiler-IR question.
Quick check
A compiler engineer writes a new transformation pass. They run it directly on `clang -O0` output and see no improvement. Most likely cause:

Key takeaways

  1. A pass takes IR, returns IR. Two kinds: analyses (read-only) and transformations (rewrite).
  2. Pipelines compose. -O2 is ~150 passes; each enables the next.
  3. New Pass Manager (NPM) is the current API. Tutorials before 2022 use the legacy one — don’t.
  4. opt -print-after-all is the universal debugging tool for any “why did/didn’t this optimize” question.
  5. Most interesting work is at pass interactions — finding sequences the standard pipeline misses or domain-specific rewrites the pipeline doesn’t know about.

Go deeper
