breccia¶
Block-scaled tensors as a first-class type. Triton FP8 scaled matmul validated on H100 (cosine similarity 0.9993 against an FP32 reference). Works with NumPy, PyTorch, MLX, and JAX.
```python
import numpy as np
import breccia

# Quantize to FP8 with per-block-K scaling (DeepSeek-v3 style)
x = np.random.randn(8, 256).astype(np.float32)
st = breccia.cast(x, breccia.Float8BlockScaling(block_k=128))

# Scaled matmul: data stays in FP8, scales fold into the FP32 accumulator
A = breccia.cast(np.random.randn(16, 256).astype(np.float32), breccia.Float8CurrentScaling())
W = breccia.cast(np.random.randn(256, 128).astype(np.float32), breccia.Float8BlockScaling(block_k=128))
y = breccia.matmul(A, W)
```
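
To make the arithmetic behind the quickstart concrete, here is a minimal NumPy-only sketch of per-block-K scaling and a matmul that folds the scales into an FP32 accumulator. It does not use breccia and is not its implementation: `fake_e4m3`, `quantize_block_k`, and `scaled_matmul` are hypothetical helper names, both operands use block scaling for simplicity, and the FP8 rounding is a simplified model (4-bit significand, saturation at 448, no subnormals) because NumPy has no native FP8 dtype.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude in FP8 E4M3


def fake_e4m3(v):
    """Simplified E4M3 rounding: 4-bit significand, saturate at +-448.

    Assumption: subnormals and the minimum exponent are ignored.
    """
    mant, exp = np.frexp(v)              # v == mant * 2**exp, 0.5 <= |mant| < 1
    mant = np.round(mant * 16.0) / 16.0  # keep 1 implicit + 3 stored bits
    out = np.ldexp(mant, exp)
    return np.clip(out, -FP8_E4M3_MAX, FP8_E4M3_MAX).astype(np.float32)


def quantize_block_k(x, block_k):
    """Give every run of block_k elements along the last axis its own scale.

    Returns (q, scales) with q.shape == x.shape and
    scales.shape == (rows, K // block_k).
    """
    rows, k = x.shape
    assert k % block_k == 0, "K must be a multiple of block_k"
    blocks = x.reshape(rows, k // block_k, block_k)
    amax = np.abs(blocks).max(axis=-1)
    scales = np.where(amax > 0, amax / FP8_E4M3_MAX, 1.0).astype(np.float32)
    q = fake_e4m3(blocks / scales[..., None])
    return q.reshape(rows, k), scales


def scaled_matmul(qa, sa, qwt, swt, block_k):
    """Compute A @ W from quantized A (M, K) and quantized W.T (N, K).

    Each K block yields a partial product of the low-precision payloads;
    the two block scales are applied as it is added to the FP32 accumulator.
    """
    m, k = qa.shape
    n = qwt.shape[0]
    acc = np.zeros((m, n), dtype=np.float32)
    for b in range(k // block_k):
        ks = slice(b * block_k, (b + 1) * block_k)
        partial = qa[:, ks] @ qwt[:, ks].T
        acc += partial * np.outer(sa[:, b], swt[:, b])
    return acc


rng = np.random.default_rng(0)
a = rng.standard_normal((16, 256)).astype(np.float32)
w = rng.standard_normal((256, 128)).astype(np.float32)

qa, sa = quantize_block_k(a, block_k=128)
qwt, swt = quantize_block_k(w.T, block_k=128)  # block along K, so quantize W.T
y_sketch = scaled_matmul(qa, sa, qwt, swt, block_k=128)

ref = a @ w
cos_sim = float((y_sketch * ref).sum() / (np.linalg.norm(y_sketch) * np.linalg.norm(ref)))
print("cosine similarity vs FP32 reference:", cos_sim)
```

Because the scales are constant within a K block, they factor out of each partial dot product. That is why the payload can stay in low precision until the accumulate step.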
At a glance¶
Why breccia¶
Block-scaled low-precision is everywhere in modern ML — but every framework carries its own incompatible representation. breccia is the typed primitive that bridges them:
- NVIDIA TransformerEngine: 4 non-composable recipe classes, NVIDIA-only → breccia.bridges.from_transformer_engine
- PyTorch torchao: AffineQuantizedTensor, PyTorch-only → breccia.bridges.from_torchao
- DeepSeek-v3 FP8 weights: private block-scaled format → breccia.bridges.from_deepseek_v3
- HuggingFace safetensors: no native scale metadata → breccia.bridges.save_safetensors with recipe + layout preserved
- AMD MI355 / Trainium2 / TPU v6: incompatible scale semantics → one type, four backends today
The cross-vendor gap is widening through 2026–2027 with FP4. No vendor can be the neutral substrate. breccia is the "safetensors of low-precision."
What you can do today¶
| Workflow | Use case | Status |
|---|---|---|
| FP8 inference | Quantize + scaled matmul end-to-end | native torch.float8_e4m3fn |
| FP8 training | Forward + STE for gradient flow on PyTorch / JAX | cast_ste shipped |
| DeepSeek-v3 weight loading | Bit-exact from_deepseek_v3 round-trip | v0.1 |
| Asymmetric INT4 (GPTQ / AWQ) | INT4Scaling(symmetric=False) + zero_point | v0.1 |
| NVFP4 / MXFP8 quantize | Hardware-spec-locked block sizes (16 / 32) | v0.1 |
| Triton FP8 scaled matmul on Hopper / Ada / Blackwell | DeepSeek-pattern block-scaled GEMM | H100 validated |
| Cross-framework prototyping | Same ScaledTensor on NumPy / PyTorch / MLX / JAX | verified by 250+ tests |
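
The asymmetric INT4 row in the table pairs a scale with a zero point. As a rough illustration of what that means numerically, the sketch below quantizes groups of weights to unsigned 4-bit codes in plain NumPy. It does not call breccia: `quantize_int4_asymmetric`, `dequantize_int4`, and the group size of 64 are assumptions made for this example, and the codes are stored one per byte rather than packed two per byte.

```python
import numpy as np


def quantize_int4_asymmetric(w, group_size):
    """Per-group asymmetric quantization to codes in [0, 15].

    Each group of `group_size` values along the last axis gets its own
    scale and integer zero point, so the code range covers [min, max].
    """
    rows, cols = w.shape
    assert cols % group_size == 0, "cols must be a multiple of group_size"
    g = w.reshape(rows, cols // group_size, group_size)
    # Widen the range to include 0 so that zero is exactly representable.
    lo = np.minimum(g.min(axis=-1, keepdims=True), 0.0)
    hi = np.maximum(g.max(axis=-1, keepdims=True), 0.0)
    scale = (np.maximum(hi - lo, 1e-8) / 15.0).astype(np.float32)
    zero_point = np.clip(np.round(-lo / scale), 0, 15)
    q = np.clip(np.round(g / scale) + zero_point, 0, 15).astype(np.uint8)
    return q, scale, zero_point.astype(np.uint8)


def dequantize_int4(q, scale, zero_point, shape):
    """Invert the mapping: value = (code - zero_point) * scale."""
    centered = q.astype(np.float32) - zero_point.astype(np.float32)
    return (centered * scale).reshape(shape)


rng = np.random.default_rng(0)
w = rng.standard_normal((8, 256)).astype(np.float32)

q, scale, zero_point = quantize_int4_asymmetric(w, group_size=64)
w_hat = dequantize_int4(q, scale, zero_point, w.shape)

max_err = float(np.abs(w - w_hat).max())
print("max abs reconstruction error:", max_err)
print("largest quantization step:", float(scale.max()))
```

A symmetric scheme would fix the zero point at the middle of the code range. The asymmetric form spends all 16 codes on the observed range of each group, which helps when a group's values are skewed to one side of zero.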
Examples¶
- 01 — Quickstart: cast + scaled matmul in 15 lines
- 02 — Recipe-portable training: train MXFP8, ship NVFP4 (same model code)
- 03 — Checkpoint with scale: safetensors round-trip preserving recipe + layout
- 04 — TE migration: bridge TransformerEngine Float8Tensor → ScaledTensor
The name¶
A breccia is a sedimentary rock made of broken angular fragments held together by a cementing matrix. Low-precision data fragments + the scale tensor that gives them meaning — same structure.
It's the natural geological successor to scree: loose fragments (scree) become breccia when cemented together.
v0.1.1 on PyPI. Apache-2.0. Source on GitHub · FAQ · Discussions · Issues