Bridges & migration¶
breccia ships five bridges, one per major external convention. Each bridge handles a specific interop direction and degrades gracefully when its optional dependency is missing.
| Bridge | Direction | External dep | Status |
|---|---|---|---|
| TransformerEngine | TE Float8Tensor ↔ ScaledTensor | transformer-engine (Linux + CUDA) | ✅ v0.0.1 |
| torchao | AffineQuantizedTensor ↔ ScaledTensor | torchao | ✅ v0.0.1 (symmetric only) |
| HuggingFace safetensors | safetensors file ↔ ScaledTensor dict | safetensors | ✅ v0.0.1 |
| DLPack | zero-copy across NumPy / PyTorch / MLX / JAX | none (built-in) | ✅ v0.0.1 |
| DeepSeek-v3 | (data, scale) buffers ↔ ScaledTensor | none | ✅ v0.0.1 |
TransformerEngine¶
from breccia.bridges import from_transformer_engine, to_transformer_engine
# TE → breccia
st = from_transformer_engine(te_float8tensor)
# breccia → TE (per-tensor recipes only in v0.0.1)
te_t = to_transformer_engine(st)
The mapping is essentially a buffer pass-through: TE's _data becomes
st.data, and TE's _scale_inv becomes st.scale (same dequantization
convention).
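As a minimal sketch of what "same dequantization convention" means here, assume values are recovered as data * scale, with the scale being TE's inverse scale. Plain Python floats stand in for FP8 storage, and the quantize and dequantize helpers are hypothetical illustrations, not breccia or TransformerEngine API:

```python
# Hypothetical illustration: plain floats stand in for FP8 storage.
def quantize(values, scale_inv):
    """Store each value divided by the inverse scale."""
    return [v / scale_inv for v in values]


def dequantize(data, scale_inv):
    """Recover values as data * scale_inv (the assumed shared convention)."""
    return [d * scale_inv for d in data]


original = [0.5, -1.25, 3.0]
scale_inv = 0.25  # plays the role of TE's _scale_inv / st.scale

data = quantize(original, scale_inv)
restored = dequantize(data, scale_inv)
```

Because both sides multiply by the same buffer to dequantize, the bridge can hand the scale over unchanged instead of inverting it.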
Recipe support: from_transformer_engine defaults to the DelayedScaling
recipe. The to_transformer_engine direction supports DelayedScaling and
Float8CurrentScaling in v0.0.1.
Installation: TransformerEngine installs on Linux + CUDA only. On
other platforms, calling the bridge functions raises a clear
ImportError with install instructions.
torchao¶
from breccia.bridges import from_torchao, to_torchao
# torchao AffineQuantizedTensor → breccia
st = from_torchao(aqt)
# breccia → torchao (INT4 symmetric only)
aqt = to_torchao(st)
v0.0.1 supports symmetric quantization only (zero_point = 0). Asymmetric INT4/INT8 lands in v0.1.
from_torchao infers the group size from int_data.shape[-1] // scale.shape[-1]
when scale is 2-D. Override with the recipe= argument if your layout
differs.
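The inference rule can be sketched on shapes alone. The helper name infer_group_size and its divisibility check are assumptions made for illustration; only the integer division comes from the description above:

```python
def infer_group_size(int_data_shape, scale_shape):
    """Return int_data.shape[-1] // scale.shape[-1] for a 2-D scale."""
    if len(scale_shape) != 2:
        raise ValueError("group size is only inferred when scale is 2-D")
    last_dim, num_groups = int_data_shape[-1], scale_shape[-1]
    # Assumed sanity check: groups must tile the last dimension exactly.
    if num_groups == 0 or last_dim % num_groups != 0:
        raise ValueError("scale groups do not evenly divide the last dimension")
    return last_dim // num_groups


# A 4096-wide weight with 32 scale columns per row implies groups of 128.
group_size = infer_group_size((4096, 4096), (4096, 32))
```

If the inferred value does not match how the tensor was actually quantized, that is the case where passing recipe= explicitly is the safer choice.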
HuggingFace safetensors¶
from breccia.bridges import save_safetensors, load_safetensors
# Save a dict of ScaledTensors to a single file
save_safetensors(
    {"w_q": w_quantized, "w_k": k_quantized},
    "model.safetensors",
    extra_metadata={"model_version": "v0.0.1"},
)
# Load back, recipes + layouts reconstructed from metadata
loaded = load_safetensors("model.safetensors")
# loaded["w_q"] is a ScaledTensor with the original recipe/layout
Format convention¶
For each name in the input dict:
- f"{name}.data": the data buffer (a torch tensor in the safetensors file)
- f"{name}.scale": the scale buffer
- f"{name}.config" (in metadata): JSON of recipe + layout
The breccia metadata lives in safetensors' metadata dict, which other
safetensors readers silently ignore. So a breccia safetensors file is
backwards-compatible with any safetensors loader (it just gets raw
data/scale tensors, no recipe info).
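A rough sketch of how one entry might be laid out under this convention. The pack_entry helper and the exact JSON fields ("recipe", "layout") are hypothetical; the real config schema may differ. The one firm constraint is that safetensors metadata maps strings to strings, which is why the config is serialized with json.dumps:

```python
import json


def pack_entry(name, data, scale, recipe, layout):
    """Build the tensor keys and metadata key for one named ScaledTensor."""
    tensors = {
        f"{name}.data": data,
        f"{name}.scale": scale,
    }
    # Metadata values must be strings, so the config is stored as JSON text.
    metadata = {
        f"{name}.config": json.dumps({"recipe": recipe, "layout": layout}),
    }
    return tensors, metadata


# Lists stand in for real tensors; recipe/layout names are placeholders.
tensors, metadata = pack_entry(
    "w_q", [1, 2, 3], [0.5], "Float8BlockScaling", "PerBlockK"
)
```

A loader that knows nothing about breccia would see only the two tensor keys, which is what keeps the file readable elsewhere.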
Multiple tensors per file¶
The function packs as many ScaledTensors as you pass. Each name gets its
own .data / .scale / .config triple.
Skipping tensors without config¶
load_safetensors only returns tensors that have all three of
.data, .scale, and .config. Plain tensors in the same file are
silently ignored.
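The selection rule can be sketched as a filter over key names. This is an illustration under the naming convention above, not the actual load_safetensors implementation; complete_names is a hypothetical helper:

```python
def complete_names(tensor_keys, metadata_keys):
    """Return names that have .data and .scale tensors plus a .config entry."""
    names = set()
    for key in tensor_keys:
        if key.endswith(".data"):
            name = key[: -len(".data")]
            has_scale = f"{name}.scale" in tensor_keys
            has_config = f"{name}.config" in metadata_keys
            if has_scale and has_config:
                names.add(name)
    return sorted(names)


# "bias" is a plain tensor; "w_k" is missing its scale buffer.
tensor_keys = {"w_q.data", "w_q.scale", "bias", "w_k.data"}
metadata_keys = {"w_q.config", "w_k.config"}
found = complete_names(tensor_keys, metadata_keys)
```

Here only w_q survives: bias has no triple at all, and w_k is incomplete.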
DLPack¶
from breccia.bridges import to_dlpack, from_dlpack
# Move a ScaledTensor's buffers to another framework (zero-copy when possible)
st_torch = from_dlpack(st_numpy, framework="torch")
st_mlx = from_dlpack(st_torch, framework="mlx")
# Raw capsules (for advanced use)
data_capsule, scale_capsule = to_dlpack(st_torch)
The framework argument accepts "numpy", "torch", "mlx", or "jax". Recipe
and layout are unchanged; only the data and scale buffers are moved.
DLPack is the standard cross-framework zero-copy protocol. Most
framework from_dlpack implementations want the source tensor (with a
__dlpack__ method) rather than the raw capsule; the from_dlpack
helper above passes the tensor directly.
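To illustrate the "pass the tensor, not the capsule" point with NumPy alone (assuming NumPy 1.22 or newer, where np.from_dlpack is available), the consumer is given the producer object and calls its __dlpack__ method itself:

```python
import numpy as np

source = np.arange(6, dtype=np.float32).reshape(2, 3)

# np.from_dlpack receives the producer object, not a raw capsule;
# it calls source.__dlpack__() internally.
view = np.from_dlpack(source)

# No copy was made: both arrays refer to the same memory.
shares = np.shares_memory(source, view)
```

This only demonstrates the protocol within one framework; crossing frameworks follows the same shape but depends on device and dtype support on both sides.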
DeepSeek-v3¶
from breccia.bridges import from_deepseek_v3, to_deepseek_v3
# Raw DeepSeek-v3 FP8 buffers (block_k=128) → ScaledTensor
st = from_deepseek_v3(data, scale, block_k=128, fp8_format="E4M3")
# Inverse
data, scale = to_deepseek_v3(st)
DeepSeek-v3 ships FP8 E4M3 weights with per-128-element block scaling.
That's exactly Float8BlockScaling(block_k=128) + PerBlockK(128).
The bridge is a thin wrapper around from_buffer that picks the right
recipe and layout.
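A small NumPy sketch of what per-block scaling along K means when dequantizing. It assumes a 2-D data buffer of shape (M, K), a scale buffer of shape (M, K // block_k), and dequantization as data * scale; float32 stands in for FP8, and dequantize_per_block_k is a hypothetical helper rather than part of breccia:

```python
import numpy as np


def dequantize_per_block_k(data, scale, block_k):
    """Multiply each run of block_k elements along K by its own scale."""
    m, k = data.shape
    if k % block_k != 0 or scale.shape != (m, k // block_k):
        raise ValueError("scale shape must be (M, K // block_k)")
    # Expand each scale to cover its block, then multiply elementwise.
    return data * np.repeat(scale, block_k, axis=-1)


# block_k=4 keeps the example readable; DeepSeek-v3 uses 128.
data = np.ones((2, 8), dtype=np.float32)
scale = np.array([[0.5, 2.0], [1.0, 4.0]], dtype=np.float32)
full = dequantize_per_block_k(data, scale, block_k=4)
```

Each row ends up with two distinct scale regions, one per block of four elements.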
When to write a new bridge¶
If you're integrating breccia with a library that has its own quantized tensor type, follow these steps to add a bridge:
- Identify which breccia recipe matches (or extend the recipe set if yours doesn't fit any). See recipes.md.
- Identify which layout matches (or add a new one to breccia/layouts.py).
- Write a _yourlib.py in breccia/bridges/ with from_yourlib(...) and to_yourlib(...) functions. Use lazy imports: breccia.bridges should be importable even if your library isn't installed.
- Add the new functions to the bridges/__init__.py exports.
- Add tests in tests/test_bridges.py. Skip when the external dependency is absent (pytest.importorskip).
- Document the bridge in this file.
See _deepseek.py for the simplest example (no external dependency) and _huggingface.py for the most complex (custom file format + metadata).
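The lazy-import step might look roughly like the following. The lazy_import helper, the yourlib module name, and the install hint are all placeholders for illustration; the existing bridges may structure this differently:

```python
import importlib


def lazy_import(module_name, install_hint):
    """Import an optional dependency on first use, with a helpful error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this bridge. "
            f"Install it with: {install_hint}"
        ) from exc


def from_yourlib(external_tensor):
    # The import happens inside the function, so importing the bridge
    # module itself never fails when yourlib (hypothetical) is absent.
    yourlib = lazy_import("yourlib", "pip install yourlib")
    raise NotImplementedError("map yourlib buffers onto a ScaledTensor here")
```

Keeping the import inside the function body is what lets the package import cleanly on machines without the optional dependency, while still failing loudly at the point of use.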