Google Releases Gemini 3.1 Flash Lite to General Availability

Shipped
High importance

Google promoted Gemini 3.1 Flash Lite from preview to general availability, making it accessible through Google Cloud AI Platform. The model delivers a 2.5x improvement in time-to-first-token and 45% faster output throughput compared to Gemini 2.5 Flash, while supporting a 1M token context window. Pricing is set at $0.25 per 1M input tokens and $1.50 per 1M output tokens, placing it at the cost-efficient end of the Gemini 3.1 family. Highlighted use cases include translation pipelines, content moderation, UI generation, and simulation workloads. Flash Lite fills the gap between the heavier Gemini 3.1 Pro (shipped May 6) and latency-sensitive production scenarios. The combination of GA status, aggressive cost, and high throughput makes it a strong candidate for high-volume inference workloads. Developers already on Gemini 2.5 Flash should benchmark the upgrade path given the speed gains.

└─Google Blog / Google Cloud

May 8