HumanEval on a MacBook — 81.7% pass@1, Wi-Fi off
The M5 Max MacBook Pro with 128 GB of unified memory is the first laptop that can hold a frontier-class coding agent entirely in RAM. No GPU rack. No cloud. No subscription.

I just ran HumanEval on it. Wi-Fi off the entire run.

- 81.7% pass@1 on the full 164-problem benchmark
- Qwen3-Coder-30B-A3B-Instruct (8-bit MLX)
- 14 minutes wall-clock, $0/month after the model download

YouTube walkthrough (three real problems, code streaming live, tests going green): https://www.youtube.com/watch?v=muq7VdgxqRk

## Why this number matters

The Qwen team didn't publish HumanEval scores for any Qwen3-Coder variant: they consider the benchmark saturated and went straight to agentic ones (SWE-bench Verified, BFCL, Aider-Polyglot). For the 30B variant, the one that actually fits on a laptop, there were no published HumanEval/MBPP numbers. Until this run.

I also ran MBPP (sanitized): 83.3% pass@1 on a 168-problem sample. The pass rate was stable from n=120 onward; the full 427-problem run was impractical because a fe...