
breccia

Block-scaled tensors as a first-class type. The Triton FP8 scaled matmul is validated on H100 (cosine similarity 0.9993 vs FP32). Works on NumPy, PyTorch, MLX, and JAX.


import numpy as np
import breccia

# Quantize to FP8 with per-block-K scaling (DeepSeek-v3 style)
x = np.random.randn(8, 256).astype(np.float32)
st = breccia.cast(x, breccia.Float8BlockScaling(block_k=128))

# Scaled matmul: data stays in FP8, scales fold into the FP32 accumulator
A = breccia.cast(np.random.randn(16, 256).astype(np.float32), breccia.Float8CurrentScaling())
W = breccia.cast(np.random.randn(256, 128).astype(np.float32), breccia.Float8BlockScaling(block_k=128))
y = breccia.matmul(A, W)
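
To make the idea behind the snippet above concrete, here is a minimal NumPy-only sketch of per-block-K scaling and a block-scaled matmul. It is not breccia's implementation: the helper names (`fake_e4m3`, `quantize_along_k`, `scaled_matmul`, `cosine_similarity`) are hypothetical, FP8 rounding is only crudely simulated in float32 (4 significant bits, no subnormals), and both operands are block-scaled along K for simplicity, which differs from the `Float8CurrentScaling` recipe used for `A` above.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format


def fake_e4m3(v):
    """Crude E4M3 simulation (assumption): keep 4 significant bits, ignore subnormals."""
    mantissa, exponent = np.frexp(v)  # v == mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    return np.ldexp(np.round(mantissa * 16.0) / 16.0, exponent).astype(np.float32)


def quantize_along_k(x, block_k=128):
    """x: (rows, K). Returns simulated-FP8 data (rows, K) and scales (rows, K // block_k)."""
    rows, k = x.shape
    assert k % block_k == 0, "this sketch requires K to be a multiple of block_k"
    blocks = x.reshape(rows, k // block_k, block_k)
    amax = np.abs(blocks).max(axis=-1, keepdims=True)
    # One scale per block maps that block's largest magnitude onto the FP8 range.
    scale = np.where(amax > 0, amax / FP8_E4M3_MAX, 1.0).astype(np.float32)
    data = fake_e4m3(np.clip(blocks / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX))
    return data.reshape(rows, k), scale[..., 0]


def scaled_matmul(a_q, a_scale, w_q, w_scale, block_k=128):
    """a_q: (M, K); w_q: (N, K), i.e. W transposed. Scales: (M, K/bk) and (N, K/bk)."""
    m, k = a_q.shape
    n = w_q.shape[0]
    acc = np.zeros((m, n), dtype=np.float32)
    for b in range(k // block_k):
        cols = slice(b * block_k, (b + 1) * block_k)
        partial = a_q[:, cols] @ w_q[:, cols].T  # product of the low-precision payloads
        # Fold both scales into the FP32 accumulator, one K-block at a time.
        acc += partial * a_scale[:, b][:, None] * w_scale[:, b][None, :]
    return acc


def cosine_similarity(u, v):
    u, v = u.ravel(), v.ravel()
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


rng = np.random.default_rng(0)
A = rng.standard_normal((16, 256)).astype(np.float32)
W = rng.standard_normal((256, 128)).astype(np.float32)

a_q, a_scale = quantize_along_k(A)
w_q, w_scale = quantize_along_k(np.ascontiguousarray(W.T))
y = scaled_matmul(a_q, a_scale, w_q, w_scale)
sim = cosine_similarity(y, A @ W)
print(f"cosine similarity vs FP32 reference: {sim:.4f}")
```

The payloads are never dequantized into a full-precision copy of `A` or `W`; only the per-block partial products are rescaled, which is the property the "scales fold into the FP32 accumulator" comment refers to.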


At a glance

  • 0.9993: cosine similarity vs FP32 (Triton on H100)
  • Memory savings vs FP32 (FP8 / FP4 / INT4)
  • 6: recipes covering today's fragmentation
  • 4: backends (NumPy, PyTorch, MLX, JAX)
  • 5: bridges (TE, torchao, HF, DLPack, DeepSeek)
  • 250+: tests across all backends

Why breccia

Block-scaled low-precision formats are everywhere in modern ML, but every framework carries its own incompatible representation. breccia is the typed primitive that bridges them:

  • NVIDIA TransformerEngine: 4 non-composable recipe classes, NVIDIA-only → breccia.bridges.from_transformer_engine
  • PyTorch torchao: AffineQuantizedTensor, PyTorch-only → breccia.bridges.from_torchao
  • DeepSeek-v3 FP8 weights: private block-scaled format → breccia.bridges.from_deepseek_v3
  • HuggingFace safetensors: no native scale metadata → breccia.bridges.save_safetensors with recipe + layout preserved
  • AMD MI355 / Trainium2 / TPU v6: incompatible scale semantics → one type, four backends today

The cross-vendor gap is widening through 2026–2027 with FP4. No vendor can be the neutral substrate. breccia is the "safetensors of low-precision."

What you can do today

| Workflow | Use case | Status |
| --- | --- | --- |
| FP8 inference | Quantize + scaled matmul end-to-end | native torch.float8_e4m3fn |
| FP8 training | Forward + STE for gradient flow on PyTorch / JAX | cast_ste shipped |
| DeepSeek-v3 weight loading | Bit-exact from_deepseek_v3 round-trip | v0.1 |
| Asymmetric INT4 (GPTQ / AWQ) | INT4Scaling(symmetric=False) + zero_point | v0.1 |
| NVFP4 / MXFP8 quantize | Hardware-spec-locked block sizes (16 / 32) | v0.1 |
| Triton FP8 scaled matmul on Hopper / Ada / Blackwell | DeepSeek-pattern block-scaled GEMM | H100 validated |
| Cross-framework prototyping | Same ScaledTensor on NumPy / PyTorch / MLX / JAX | 250+ tests verify |

Examples
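
As one illustration of the asymmetric INT4 row in the table above, the following NumPy sketch shows what a scale plus zero point does for data that is not centered on zero. It is not taken from the breccia codebase: the function names and `group_size=32` are hypothetical choices, and the codes are stored one per `uint8` rather than packed two per byte.

```python
import numpy as np


def quantize_int4_asymmetric(x, group_size=32):
    """x: (rows, K). Returns 4-bit codes in [0, 15] plus per-group scale and zero_point."""
    rows, k = x.shape
    assert k % group_size == 0, "this sketch requires K to be a multiple of group_size"
    groups = x.reshape(rows, k // group_size, group_size)
    # Extend the range to include 0 so that zero is representable in every group.
    lo = np.minimum(groups.min(axis=-1, keepdims=True), 0.0)
    hi = np.maximum(groups.max(axis=-1, keepdims=True), 0.0)
    scale = np.where(hi > lo, (hi - lo) / 15.0, 1.0).astype(np.float32)
    zero_point = np.clip(np.round(-lo / scale), 0, 15).astype(np.uint8)
    codes = np.clip(np.round(groups / scale) + zero_point, 0, 15).astype(np.uint8)
    return codes.reshape(rows, k), scale[..., 0], zero_point[..., 0]


def dequantize_int4_asymmetric(codes, scale, zero_point, group_size=32):
    """Inverse mapping: x_hat = (code - zero_point) * scale, applied per group."""
    rows, k = codes.shape
    groups = codes.reshape(rows, k // group_size, group_size).astype(np.float32)
    zp = zero_point[..., None].astype(np.float32)
    return ((groups - zp) * scale[..., None]).reshape(rows, k)


rng = np.random.default_rng(0)
# Shifted data: a symmetric scheme would waste codes on the mostly unused negative range.
x = (rng.standard_normal((4, 64)) + 1.0).astype(np.float32)

codes, scale, zero_point = quantize_int4_asymmetric(x)
x_hat = dequantize_int4_asymmetric(codes, scale, zero_point)
max_err = float(np.abs(x_hat - x).max())
print(f"max abs reconstruction error: {max_err:.4f}")
```

The zero point shifts the 16 available codes onto the range the data actually occupies, so the step size (`scale`) is set by `hi - lo` rather than by twice the largest magnitude.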

The name

A breccia is a sedimentary rock made of broken angular fragments held together by a cementing matrix. Low-precision data fragments + the scale tensor that gives them meaning — same structure.

It's the natural geological successor to scree: loose fragments (scree) become breccia when cemented together.


v0.1.1 on PyPI. Apache-2.0. Source on GitHub · FAQ · Discussions · Issues