Cloudflare's Infire inference engine runs the prefill and decode phases of LLM inference on separate node pools, so each phase can be scaled and matched to hardware independently. The split follows from how the two phases behave. Prefill processes every prompt token in parallel and is compute-bound. Decode generates one token per step, rereading the model weights and KV cache each time, and is memory-bandwidth-bound. Alongside Infire, Cloudflare developed Unweight, a compression approach that shrinks model weights by 15-22% without measurable accuracy regression. In production, Kimi K2.5 (over one trillion parameters) runs on eight-H100 nodes and Llama 4 Scout on two-H200 nodes, distributed across Cloudflare's global network.

For platform engineers evaluating edge inference deployments, the architecture matters because it shows that disaggregated prefill is operationally viable at hyperscale without a centralized GPU cluster. If Unweight is made available externally, it could reduce both VRAM requirements and inter-node transfer costs. Teams designing multi-region inference pipelines should watch Cloudflare's developer documentation for SDK or Workers AI API updates that expose these capabilities.
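The compute-bound versus memory-bandwidth-bound distinction between prefill and decode can be made concrete with a simple roofline estimate. The sketch below is not Infire code and does not reflect how Cloudflare sizes its pools. It assumes a dense model with BF16 weights, ignores KV-cache and activation traffic, and uses approximate H100 figures as stand-ins. The names `Accelerator`, `weight_matmul_intensity`, and `classify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    peak_flops: float     # FLOP/s, dense
    mem_bandwidth: float  # bytes/s

    @property
    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOP/byte) above which work is compute-bound."""
        return self.peak_flops / self.mem_bandwidth


def weight_matmul_intensity(tokens_per_pass: int, bytes_per_param: float) -> float:
    """Approximate FLOP/byte for one forward pass over the weights.

    Each parameter costs ~2 FLOPs (multiply + add) per token, while the
    weights are read from memory once per pass no matter how many tokens
    share that pass. KV-cache and activation traffic are ignored here.
    """
    return 2.0 * tokens_per_pass / bytes_per_param


def classify(intensity: float, gpu: Accelerator) -> str:
    """Place a workload on the roofline relative to the GPU's ridge point."""
    return "compute-bound" if intensity >= gpu.ridge_point else "memory-bandwidth-bound"


# Assumption: approximate figures in the range of published H100 SXM specs
# (dense BF16 throughput, HBM3 bandwidth). Illustrative only.
H100 = Accelerator("H100 (approx.)", peak_flops=989e12, mem_bandwidth=3.35e12)

BYTES_PER_PARAM = 2.0  # assumption: BF16 weights

# Prefill: one 4,096-token prompt shares a single pass over the weights.
prefill = weight_matmul_intensity(tokens_per_pass=4096, bytes_per_param=BYTES_PER_PARAM)
# Decode: a batch of 32 sequences, each producing 1 new token per pass.
decode = weight_matmul_intensity(tokens_per_pass=32, bytes_per_param=BYTES_PER_PARAM)

if __name__ == "__main__":
    print(f"ridge point: {H100.ridge_point:.0f} FLOP/byte")
    print(f"prefill: {prefill:.0f} FLOP/byte -> {classify(prefill, H100)}")
    print(f"decode:  {decode:.0f} FLOP/byte -> {classify(decode, H100)}")
```

Under these assumptions the ridge point is roughly 295 FLOP/byte. The prefill pass lands far above it and the decode batch far below it, which is why the two phases reward different hardware and scaling policies. Mixture-of-experts routing, long-context attention, and batching strategy all shift the real numbers.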