On-device AI agents hit a hard memory limit. Apple's new architecture routes around it.
On-device AI models have stayed small because the entire weight set has to live in DRAM, capping practical parameter counts well below what server-side deployments use.
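To make the constraint concrete, here is a back-of-the-envelope sketch of the arithmetic: weight memory is roughly parameter count times bytes per weight, and that total has to fit in whatever DRAM the device can spare. The function names, parameter counts, bit widths, and the 4 GiB budget are illustrative assumptions, not figures from Apple or from this article, and the estimate ignores activations, KV cache, and the rest of the system.

```python
GIB = 2**30  # bytes per gibibyte


def weight_footprint_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed to hold every weight in DRAM, in GiB."""
    return num_params * bits_per_weight / 8 / GIB


def fits_in_dram(num_params: float, bits_per_weight: int, dram_budget_gib: float) -> bool:
    """True if the full weight set fits inside the given DRAM budget."""
    return weight_footprint_gib(num_params, bits_per_weight) <= dram_budget_gib


if __name__ == "__main__":
    # Hypothetical budget: the share of device DRAM a model might be allowed to use.
    budget_gib = 4.0
    # Hypothetical model sizes, each stored at an assumed 4 bits per weight.
    for label, params in [("3B", 3e9), ("70B", 70e9)]:
        size = weight_footprint_gib(params, 4)
        verdict = "fits" if fits_in_dram(params, 4, budget_gib) else "does not fit"
        print(f"{label}: {size:.1f} GiB of weights, {verdict} in {budget_gib} GiB")
```

Under these assumed numbers, the smaller model fits comfortably while the larger one exceeds the budget several times over, which is the gap the sentence above describes.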