Distributed Edge

A single phone fits a 7B-parameter model; an iPad fits a 13B; an M2 Air fits a 30B at 4-bit. None of them fits a 70B. But four of them on a Wi-Fi network do, and the math says the LAN is fast enough for pipeline-parallel inference to work. EXO (the open-source clustering framework) and Petals (the BitTorrent-style swarm-inference network) make this practical today. This module covers how the work is sharded, what the network has to be like, and where the limits are.
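To make "the math says" concrete, here is a rough back-of-envelope sketch. It is not taken from EXO or Petals; the function names are hypothetical, and the inputs are assumptions: an even split of weights across devices, a hidden size of 8192 (typical of 70B-class models), fp16 activations, and 100 Mbit/s of effective Wi-Fi throughput. It ignores KV cache, runtime overhead, and network round-trip latency, so treat the numbers as a lower bound on memory and transfer time.

```python
# Back-of-envelope sizing for pipeline-parallel inference over a LAN.
# All names and input values here are illustrative assumptions.

def weight_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Bytes needed to hold the quantized weights alone."""
    return params_billion * 1e9 * bits_per_weight / 8


def per_device_gb(params_billion: float, bits_per_weight: float, n_devices: int) -> float:
    """Weights per device in GB, assuming layers are split evenly."""
    return weight_bytes(params_billion, bits_per_weight) / n_devices / 1e9


def hop_transfer_ms(hidden_size: int, bytes_per_value: int, link_mbit_per_s: float) -> float:
    """Time to send one token's activation vector to the next pipeline stage."""
    payload_bits = hidden_size * bytes_per_value * 8
    return payload_bits / (link_mbit_per_s * 1e6) * 1e3


if __name__ == "__main__":
    total_gb = weight_bytes(70, 4) / 1e9
    shard_gb = per_device_gb(70, 4, n_devices=4)
    hop_ms = hop_transfer_ms(hidden_size=8192, bytes_per_value=2, link_mbit_per_s=100)
    print(f"70B at 4-bit: {total_gb:.2f} GB total, {shard_gb:.2f} GB per device")
    print(f"Activation hand-off per token per hop: {hop_ms:.2f} ms")
```

Under these assumptions the weights come to 35 GB in total, or 8.75 GB per device across four devices, while each pipeline hop moves only 16 KB per generated token. That asymmetry is why sharding by layer can tolerate a home network: the large data (weights) never moves, and only a small activation vector crosses the LAN.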

1 lesson, ~16 min total