KV Cache

Recomputing attention over the entire prefix for every generated token is prohibitively expensive; the KV cache avoids that by storing each past token’s key and value vectors so they are computed once. But the cache also dominates LLM serving memory — which is why PagedAttention, prefix caching, and KV quantization are central topics in modern serving systems.
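The idea can be sketched in a few lines: a toy single-head attention decode loop that appends each new token's key and value to a cache, so every step attends over cached tensors instead of re-projecting the whole prefix. The projection matrices, head dimension, and function names here are illustrative assumptions, not any particular model's implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

d = 8  # toy head dimension
rng = np.random.default_rng(0)
# hypothetical fixed projections for one attention head
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

k_cache, v_cache = [], []  # the KV cache: one (k, v) pair per past token

def decode_step(x):
    """One decode step: project the new token, reuse cached K/V for the rest."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    k_cache.append(k)   # store this token's key and value once...
    v_cache.append(v)   # ...so later steps never recompute them
    K = np.stack(k_cache)
    V = np.stack(v_cache)
    attn = softmax(q @ K.T / np.sqrt(d))  # O(seq_len) work per step
    return attn @ V

for _ in range(5):  # generate 5 toy tokens
    out = decode_step(rng.standard_normal(d))
print(len(k_cache))  # 5 cached key vectors, one per generated token
```

The memory cost follows directly from this structure: the cache grows with layers × heads × sequence length × head dimension × 2 (keys and values), which is what makes it the dominant memory consumer in serving.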