Gemini 3.1 Flash-Lite

Stable Gemini 3.1 Flash-Lite model for high-volume agentic tasks, translation, and simple data processing. 1M-token context window, 65,536-token maximum output.

About Gemini 3.1 Flash-Lite

Gemini 3.1 Flash-Lite is built for workloads where throughput and cost per token drive the decision. Released in March 2026, it delivers roughly 363 tokens per second, fast enough to run classification, labeling, translation, and tool-calling pipelines at production scale without the budget pressure of heavier models. Developers who have tested it describe it as the most cost-effective Gemini model available and note a clear quality step-up over earlier Flash-Lite generations. Adjustable thinking levels, from minimal through high, let you tune reasoning depth per request, which helps when routing cost-sensitive workloads. The main caveat is latency: time-to-first-token is 5.27 seconds, well above the cross-model median of 2.09 seconds, so it is a poor fit for real-time conversational interfaces. For batch, background, and high-volume tasks where first-token latency is not user-facing, it combines high throughput, multimodal input, and a 1M-token context window.

Best for

  • High-volume translation of chat messages, support tickets, and user reviews at scale
  • Content classification, intent detection, and customer support ticket routing
  • Multimodal labeling — image annotation, video scene classification, and audio transcription at scale
  • Code exploration, API documentation retrieval, and lightweight tool-calling pipelines
  • Production inference workloads where cost per token is a primary constraint and batch latency is acceptable

Specs & capabilities

How Gemini 3.1 Flash-Lite stacks up — intelligence, speed, context, and modalities.

  • Intelligence: Medium
  • Speed: Fast
  • Context window: 1,048,576 tokens
  • Max output: 65,536 tokens
  • Knowledge cutoff: January 2025

Frequently asked questions

How much does Gemini 3.1 Flash-Lite cost?

Input is $0.25 per 1M tokens and output is $1.50 per 1M tokens. With caching (90% discount on cache hits), the blended rate drops to roughly $0.22 per 1M tokens for workloads with repeated context.
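
To see how the listed prices combine for a given workload, the arithmetic can be sketched as below. This is a rough estimator, not an official billing formula. It assumes cache hits are billed at 10% of the input price (the 90% discount quoted above) and ignores any cache storage fees. The function name and the `cached_fraction` parameter are hypothetical, introduced only for illustration.

```python
# Prices quoted on this page, in USD per 1M tokens.
INPUT_PRICE = 0.25
OUTPUT_PRICE = 1.50
CACHE_DISCOUNT = 0.90  # assumption: cache hits cost 10% of the input price


def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_fraction: float = 0.0) -> float:
    """Estimate the USD cost of one workload.

    cached_fraction is the share of input tokens served from cache (0.0 to 1.0).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0.0 and 1.0")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    # Cached tokens are billed at the discounted input rate.
    billable_input = uncached + cached * (1.0 - CACHE_DISCOUNT)
    input_cost = billable_input * INPUT_PRICE / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE / 1_000_000
    return input_cost + output_cost


if __name__ == "__main__":
    # 1M input tokens, 100k output tokens, no caching vs. 80% cache hits.
    print(f"{estimate_cost(1_000_000, 100_000):.4f}")
    print(f"{estimate_cost(1_000_000, 100_000, cached_fraction=0.8):.4f}")
```

The page does not state which input/output mix or cache hit rate its blended figure assumes, so treat any number this sketch produces as an estimate for your own workload rather than a reproduction of that figure.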

What is the context window?

1,048,576 tokens (approximately 1 million), with a maximum output of 65,536 tokens (64K) per response.

What input types does it support?

Text, images, audio, and video. Output is text only.

Is it suitable for real-time chat applications?

Not ideally. Time-to-first-token averages 5.27 seconds, which is notably higher than the cross-model median of 2.09 seconds. It is better suited to batch and background workflows where that initial delay is not user-facing.
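
A back-of-the-envelope estimate shows why the first-token delay dominates short replies but matters less for long ones. The sketch below assumes a simple model, total time = time-to-first-token + output tokens / throughput, using the 5.27-second and roughly 363 tokens-per-second figures quoted on this page. Real latency varies with load, prompt size, and thinking level, and the function names are hypothetical.

```python
TTFT_SECONDS = 5.27        # time-to-first-token quoted on this page
TOKENS_PER_SECOND = 363.0  # approximate output throughput quoted on this page


def estimate_response_seconds(output_tokens: int) -> float:
    """Estimated wall-clock time for one response under the simple model."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SECOND


def ttft_share(output_tokens: int) -> float:
    """Fraction of the estimated total time spent waiting for the first token."""
    return TTFT_SECONDS / estimate_response_seconds(output_tokens)


if __name__ == "__main__":
    for n in (50, 500, 5_000):
        total = estimate_response_seconds(n)
        print(f"{n} tokens: {total:.2f}s total, {ttft_share(n):.0%} waiting for first token")
```

Under this model a short chat reply spends most of its time waiting for the first token, while a batch job that produces thousands of tokens per request amortizes the delay.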

How does it compare to Gemini 2.5 Flash?

It matches Gemini 2.5 Flash on many common tasks at a lower cost, but gives up some raw reasoning capability. Its AIME 2025 score of 16.7% indicates it is not suited to advanced mathematics or frontier reasoning; use a larger model for those cases.

What makes the reasoning feature different from a simple 'thinking mode'?

Rather than a binary on/off thinking toggle, it offers four granular levels (minimal, low, medium, and high), so you can set reasoning depth, and with it cost, separately for each request.
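
One way to use per-request levels is a small routing table that picks a level from the task type before the request is sent. The sketch below is illustrative only. The four level names come from this page, but the task categories, the mapping, and the function name are assumptions, and the code deliberately stops short of calling any SDK because this page does not show how the level is passed to the API.

```python
from enum import Enum


class ThinkingLevel(str, Enum):
    """The four levels named on this page."""
    MINIMAL = "minimal"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical mapping: cheaper levels for simple, high-volume tasks.
TASK_LEVELS = {
    "classification": ThinkingLevel.MINIMAL,
    "translation": ThinkingLevel.LOW,
    "tool_calling": ThinkingLevel.MEDIUM,
    "code_exploration": ThinkingLevel.HIGH,
}


def pick_thinking_level(task: str,
                        default: ThinkingLevel = ThinkingLevel.LOW) -> ThinkingLevel:
    """Return the level configured for a task, or the default for unknown tasks."""
    return TASK_LEVELS.get(task, default)


if __name__ == "__main__":
    for task in ("classification", "code_exploration", "summarization"):
        print(task, "->", pick_thinking_level(task).value)
```

Keeping the mapping in one table makes it easy to adjust after measuring quality and cost per task, rather than hard-coding a level at each call site.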
