scree

Variable-length tensors as a first-class type. Triton kernels at 1.6× of FlashAttention-2 on H100. Works on NumPy, PyTorch, MLX, JAX.

import scree, numpy as np

# Three sequences of different lengths — no padding.
arr = scree.pack([np.random.randn(n, 8).astype(np.float32) for n in [4, 2, 7]])
# arr.values: (13, 8) ; arr.offsets: [0, 4, 6, 13]

# Run varlen attention. Each sequence attends only to itself.
from scree.kernels.reference import varlen_attention
out = varlen_attention(arr, arr, arr, causal=True)
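
To make the packed layout concrete, here is a minimal NumPy-only sketch of what the snippet above describes: a flat `values` array plus an `offsets` array, and a single-head attention loop in which each sequence attends only to itself. The names `pack_sketch` and `naive_varlen_attention` are hypothetical and written for illustration; this is not scree's implementation, and the real kernels may differ in layout details and numerics.

```python
import numpy as np

def pack_sketch(seqs):
    """Concatenate (n_i, d) arrays into one (sum n_i, d) array plus offsets."""
    lengths = np.array([s.shape[0] for s in seqs], dtype=np.int64)
    offsets = np.concatenate([[0], np.cumsum(lengths)])  # leading 0, then running totals
    values = np.concatenate(seqs, axis=0)
    return values, offsets

def naive_varlen_attention(q, k, v, offsets, causal=True):
    """Single-head softmax attention applied independently to each packed sequence."""
    out = np.empty_like(v)
    scale = q.shape[-1] ** -0.5
    for start, end in zip(offsets[:-1], offsets[1:]):
        scores = (q[start:end] @ k[start:end].T) * scale  # (n, n) for this sequence only
        if causal:
            n = end - start
            future = np.triu(np.ones((n, n), dtype=bool), k=1)  # True above the diagonal
            scores = np.where(future, -np.inf, scores)
        scores = scores - scores.max(axis=-1, keepdims=True)  # stabilise the softmax
        weights = np.exp(scores)
        weights = weights / weights.sum(axis=-1, keepdims=True)
        out[start:end] = weights @ v[start:end]
    return out

rng = np.random.default_rng(0)
seqs = [rng.standard_normal((n, 8)).astype(np.float32) for n in [4, 2, 7]]
values, offsets = pack_sketch(seqs)
out = naive_varlen_attention(values, values, values, offsets)
print(values.shape, offsets.tolist())
```

Because the loop never forms scores across a sequence boundary, no padding and no batch-level mask are needed; under the causal mask, the first token of each sequence attends only to itself.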


At a glance

  • 1.30× forward vs FlashAttention-2 on H100
  • 1.61× full training step vs FA-2
  • 85% memory saved vs HF padded (inference-style)
  • 4 backends: NumPy, PyTorch, MLX, JAX
  • 68 tests passing across all backends
  • License: Apache-2.0

Why scree

Variable-length sequences are everywhere in ML — but every team carries their own incompatible representation. scree is the typed primitive that bridges them:

  • torch.nested — PyTorch-only, beta since 2021 → scree.bridges.to_torch_nested
  • FlashAttention cu_seqlens — convention, not a primitive → zero-copy scree.from_cu_seqlens
  • HuggingFace attention_mask — pads then masks → bit-exact scree.bridges.from_hf_padded
  • vLLM / SGLang packed batches — internal data structures → planned typed adapter
  • TF RaggedTensor — TensorFlow-only → scree.Array is the cross-framework version
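
As an illustration of what converting at these boundaries involves, the following NumPy sketch turns a right-padded batch plus a 0/1 `attention_mask` into packed values and offsets, then back again. The function names are hypothetical and this is not the code behind `scree.bridges.from_hf_padded`. The offsets array uses the same layout as FlashAttention's `cu_seqlens` (int32 cumulative lengths with a leading zero), which is presumably what lets that conversion avoid a copy.

```python
import numpy as np

def padded_to_packed(padded, attention_mask):
    """(batch, max_len, d) values + (batch, max_len) 0/1 mask -> packed values, offsets."""
    mask = attention_mask.astype(bool)
    lengths = mask.sum(axis=1)
    offsets = np.concatenate([[0], np.cumsum(lengths)]).astype(np.int32)
    values = padded[mask]  # boolean indexing walks rows in order, dropping padded slots
    return values, offsets

def packed_to_padded(values, offsets, pad_value=0.0):
    """Inverse of padded_to_packed, assuming right-padding."""
    lengths = np.diff(offsets)
    batch, max_len = len(lengths), int(lengths.max())
    padded = np.full((batch, max_len) + values.shape[1:], pad_value, dtype=values.dtype)
    mask = np.arange(max_len)[None, :] < lengths[:, None]
    padded[mask] = values
    return padded, mask.astype(np.int64)

# Build a right-padded batch with lengths 4, 2, 7 and zeros in the padded slots.
lengths = np.array([4, 2, 7])
mask = (np.arange(7)[None, :] < lengths[:, None]).astype(np.int64)
padded = np.random.default_rng(0).standard_normal((3, 7, 8)).astype(np.float32)
padded[mask == 0] = 0.0

values, offsets = padded_to_packed(padded, mask)
restored, restored_mask = packed_to_padded(values, offsets)
print(np.array_equal(restored, padded), offsets.tolist())
```

The round trip only moves elements and never does arithmetic on them, so it reproduces the input exactly as long as the original padding value matches `pad_value`.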

What you can do today

| Workflow | Use case | Status |
| --- | --- | --- |
| Inference forward | Drop into your varlen attention path | ✅ 1.30× of FA-2 |
| Training step | Full backward via Triton (FA-2 style) | ✅ 1.61× of FA-2 |
| HF Transformers migration | Convert at the boundary, save 70–85% memory | ✅ Bit-exact round-trip |
| Apple Silicon training | MLX backend, native Metal kernels | |
| Cross-framework prototyping | Run the same scree.Array on NumPy/PyTorch/MLX/JAX | ✅ 68 tests verify agreement |
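
The memory figure depends on how uneven the sequence lengths in a batch are. As a rough back-of-the-envelope sketch (a hypothetical helper, not part of scree, and ignoring the small cost of the offsets array and the attention mask), the fraction of a padded batch spent on padding can be estimated like this:

```python
def padding_savings(lengths):
    """Fraction of slots in a padded (batch, max_len) layout that hold only padding."""
    padded_slots = len(lengths) * max(lengths)
    real_tokens = sum(lengths)
    return 1.0 - real_tokens / padded_slots

# The three-sequence toy batch: 13 real tokens in 3 * 7 = 21 padded slots.
print(f"{padding_savings([4, 2, 7]):.0%}")           # 38%

# One long sequence among short ones: 3840 real tokens in 8 * 2048 slots.
print(f"{padding_savings([2048] + [256] * 7):.0%}")  # 77%
```

The example lengths are made up; the 70–85% range quoted above would correspond to whatever length distribution the project's own measurements used.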

The name

A scree is the irregular pile of rock fragments accumulated on a mountain slope. Variable-length sequences pack against each other the same way: irregular shapes, fitted by their irregularity, not despite it.


v0.0.1 on PyPI. Apache-2.0. Source on GitHub · FAQ · Discussions · Issues