The Day Local AI Caught the Cloud: ds4, DeepSeek V4 Flash, and What Just Changed for Devs
If you write code for a living and you’ve been watching the local-AI space, May 9, 2026 is the date to circle. Salvatore Sanfilippo (yes, the guy who wrote Redis) shipped ds4 — a few thousand lines of hand-written C with Metal compute kernels, built for exactly one model: DeepSeek V4 Flash . I ran the same prompt through three engines on the same 128 GB MacBook Pro: DeepSeek V4 Flash via ds4 — fully local, off-cloud Cloud Claude through my Max plan Gemma 4 31B via MLX, also local Local DeepSeek beat cloud Claude on wall-clock time. That sentence used to be science fiction. ▶ Watch the companion video — three engines, one prompt, three completely different aurora animations rendered in real time on the same machine. The benchmark, for people who don’t want filler Engine Time Output Where it ran DeepSeek V4 Flash ( ds4 local) 103 s 3,259 tokens Apple Silicon GPU Cloud Claude (Max plan) 192 s ~3,500 tokens Anthropic data center Gemma 4 ...