scree¶
Variable-length tensors as a first-class type. Triton kernels run a full training step at 1.61× of FlashAttention-2 on H100 (1.30× forward-only). Works on NumPy, PyTorch, MLX, and JAX.
import scree, numpy as np
# Three sequences of different lengths — no padding.
arr = scree.pack([np.random.randn(n, 8).astype(np.float32) for n in [4, 2, 7]])
# arr.values: (13, 8) ; arr.offsets: [0, 4, 6, 13]
# Run varlen attention. Each sequence attends only to itself.
from scree.kernels.reference import varlen_attention
out = varlen_attention(arr, arr, arr, causal=True)
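To make the packed layout concrete, here is a minimal NumPy-only sketch of the idea behind the snippet above. It is an illustration under stated assumptions, not scree's implementation: `pack` and `varlen_attention_ref` are hypothetical helpers written for this example, and the only facts taken from the quickstart are the `values` / `offsets` layout and the rule that each sequence attends only to itself.

```python
import numpy as np

def pack(seqs):
    """Concatenate (n_i, d) arrays into one values array plus row offsets (hypothetical helper)."""
    lengths = [s.shape[0] for s in seqs]
    offsets = np.concatenate([[0], np.cumsum(lengths)]).astype(np.int64)
    values = np.concatenate(seqs, axis=0)
    return values, offsets

def varlen_attention_ref(q, k, v, offsets, causal=True):
    """Softmax attention applied independently to each packed segment (illustrative only)."""
    out = np.empty_like(v)
    for start, stop in zip(offsets[:-1], offsets[1:]):
        qs, ks, vs = q[start:stop], k[start:stop], v[start:stop]
        scores = qs @ ks.T / (qs.shape[-1] ** 0.5)
        if causal:
            n = stop - start
            # Mask out positions strictly above the diagonal (future tokens).
            future = np.triu(np.ones((n, n), dtype=bool), k=1)
            scores = np.where(future, -np.inf, scores)
        scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
        weights = np.exp(scores)
        weights = weights / weights.sum(axis=-1, keepdims=True)
        out[start:stop] = weights @ vs
    return out

rng = np.random.default_rng(0)
seqs = [rng.standard_normal((n, 8)).astype(np.float32) for n in [4, 2, 7]]
values, offsets = pack(seqs)   # values: (13, 8), offsets: [0, 4, 6, 13]
out = varlen_attention_ref(values, values, values, offsets, causal=True)
```

Because the loop slices `values[offsets[i]:offsets[i + 1]]`, no token ever sees a neighbouring sequence, and with `causal=True` the first token of each segment can only attend to itself. A fused kernel would avoid the Python loop, but the indexing contract is the same.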
At a glance¶
- 1.30×: forward pass vs FlashAttention-2 on H100
- 1.61×: full training step vs FA-2
- 85%: memory saved vs HF padded (inference-style)
- 4 backends: NumPy, PyTorch, MLX, JAX
- 68 tests passing across all backends
- License: Apache-2.0
Why scree¶
Variable-length sequences are everywhere in ML, but every team carries its own incompatible representation. scree is the typed primitive that bridges them:
- torch.nested: PyTorch-only, beta since 2021 → scree.bridges.to_torch_nested
- FlashAttention cu_seqlens: convention, not a primitive → zero-copy scree.from_cu_seqlens
- HuggingFace attention_mask: pads then masks → bit-exact scree.bridges.from_hf_padded
- vLLM / SGLang packed batches: internal data structures → planned typed adapter
- TF RaggedTensor: TensorFlow-only → scree.Array is the cross-framework version
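The relationship between these representations can be sketched in plain NumPy. The helpers below are hypothetical and do not reproduce the `scree.bridges` functions. They assume a right-padded `(batch, time, dim)` array with a 0/1 `attention_mask`, and they store offsets as int32 because FlashAttention's varlen functions expect `cu_seqlens` as int32 cumulative lengths of shape `(batch + 1,)`.

```python
import numpy as np

def from_padded(padded, attention_mask):
    """(B, T, d) right-padded batch + (B, T) 0/1 mask -> (values, offsets). Hypothetical helper."""
    lengths = attention_mask.sum(axis=1)
    # Same layout as a cu_seqlens vector: [0, n0, n0 + n1, ...]
    offsets = np.concatenate([[0], np.cumsum(lengths)]).astype(np.int32)
    # Boolean indexing walks the batch in row-major order, so sequences stay contiguous.
    values = padded[attention_mask.astype(bool)]
    return values, offsets

def to_padded(values, offsets, pad_value=0.0):
    """Inverse of from_padded for right-padded batches. Hypothetical helper."""
    lengths = np.diff(offsets)
    batch, max_len = len(lengths), int(lengths.max())
    padded = np.full((batch, max_len) + values.shape[1:], pad_value, dtype=values.dtype)
    mask = np.arange(max_len)[None, :] < lengths[:, None]
    padded[mask] = values
    return padded, mask.astype(np.int64)

# Build a small right-padded batch with lengths 4, 2, 7.
rng = np.random.default_rng(0)
padded = np.zeros((3, 7, 8), dtype=np.float32)
attention_mask = np.zeros((3, 7), dtype=np.int64)
for i, n in enumerate([4, 2, 7]):
    padded[i, :n] = rng.standard_normal((n, 8))
    attention_mask[i, :n] = 1

values, offsets = from_padded(padded, attention_mask)
restored, restored_mask = to_padded(values, offsets)
```

The round trip only moves elements and never recomputes them, which is why a conversion of this kind can be bit-exact: `restored` equals `padded` element for element as long as the original padding slots held `pad_value`.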
What you can do today¶
| Workflow | Use case | Status |
|---|---|---|
| Inference forward | Drop into your varlen attention path | ✅ 1.30× of FA-2 |
| Training step | Full backward via Triton (FA-2 style) | ✅ 1.61× of FA-2 |
| HF Transformers migration | Convert at the boundary, save 70–85% memory | ✅ Bit-exact round-trip |
| Apple Silicon training | MLX backend, native Metal kernels | ✅ |
| Cross-framework prototyping | Run the same scree.Array on NumPy/PyTorch/MLX/JAX | ✅ 68 tests verify agreement |
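The memory figure in the migration row follows from simple counting: a padded batch allocates `batch × max_len` slots, while a packed array allocates only the sum of the lengths. The sketch below shows that arithmetic. The length distributions are made up for illustration and are not the workloads behind the numbers quoted on this page.

```python
def padding_savings(lengths):
    """Fraction of token slots saved by packing instead of padding to the longest sequence."""
    padded_slots = len(lengths) * max(lengths)
    packed_slots = sum(lengths)
    return 1.0 - packed_slots / padded_slots

# The three quickstart sequences: 13 real tokens in 3 * 7 = 21 padded slots.
small = padding_savings([4, 2, 7])

# A hypothetical skewed batch: seven short prompts and one long one.
skewed = padding_savings([16] * 7 + [512])

print(f"{small:.1%} saved, {skewed:.1%} saved")
```

The saving grows with the gap between the typical and the longest sequence in a batch, so batches with one long outlier benefit most from packing.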
Examples¶
- 01 — Quickstart: pack/unpack + varlen attention in 6 lines
- 02 — No-pad transformer: full pre-norm block, zero padding
- 03 — Training step with autograd: loss drops 80× over 30 steps
- 04 — HuggingFace compat: bit-exact migration recipe
- 05 — Multimodal interleaved: text + image-patch sequences in one Array
The name¶
A scree is the irregular pile of rock fragments accumulated on a mountain slope. Variable-length sequences pack against each other the same way: irregular shapes, fitted by their irregularity, not despite it.
v0.0.1 on PyPI. Apache-2.0. Source on GitHub · FAQ · Discussions · Issues