On-Device Runtimes

As of 2026, three runtimes handle roughly 95% of on-device LLM workloads: llama.cpp (cross-platform, GGUF format), ExecuTorch (PyTorch-native, mobile-first), and Core ML (Apple Neural Engine). Each has a different sweet spot. This module walks through all three end to end.