Question 1

How much does Gemini 3.1 Flash-Lite cost?

Accepted Answer

Input costs $0.25 per 1M tokens and output costs $1.50 per 1M tokens. Cache hits receive a 90% discount, which brings the blended rate down to roughly $0.22 per 1M tokens for workloads with repeated context.
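The arithmetic behind these prices can be sketched as below. This is a rough estimate, assuming the 90% discount applies only to input tokens served from cache and that output is always billed at the full rate. The names `estimate_cost` and `blended_rate` are illustrative helpers, not part of any SDK.

```python
# Prices quoted in the answer above, in USD per 1M tokens.
INPUT_PER_M = 0.25
OUTPUT_PER_M = 1.50
CACHE_DISCOUNT = 0.90  # 90% off input tokens that hit the cache


def estimate_cost(input_tokens: int, output_tokens: int,
                  cache_hit_ratio: float = 0.0) -> float:
    """Estimated USD cost of a workload.

    Assumption: the discount applies only to cached input tokens.
    """
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    cached = input_tokens * cache_hit_ratio
    uncached = input_tokens - cached
    # Cached tokens are billed at 10% of the normal input price.
    billable_input = uncached + cached * (1 - CACHE_DISCOUNT)
    input_cost = billable_input * INPUT_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_PER_M / 1_000_000
    return input_cost + output_cost


def blended_rate(input_tokens: int, output_tokens: int,
                 cache_hit_ratio: float = 0.0) -> float:
    """USD per 1M tokens, averaged over input and output together."""
    total = input_tokens + output_tokens
    if total <= 0:
        raise ValueError("need at least one token")
    cost = estimate_cost(input_tokens, output_tokens, cache_hit_ratio)
    return cost / total * 1_000_000
```

The blended figure depends heavily on the input-to-output ratio and the cache hit rate, so it is worth plugging in your own traffic mix rather than relying on a single quoted number.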

Question 2

What is the context window?

Accepted Answer

1,048,576 tokens (approximately 1 million), with a maximum output of 64K tokens per response.
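A pre-flight check against these limits might look like the sketch below. It assumes "64K" means 65,536 tokens and that the input and output limits are checked separately. Neither assumption is confirmed by the answer above, and `fits_limits` is a hypothetical helper name.

```python
CONTEXT_WINDOW = 1_048_576  # input limit quoted in the answer above
MAX_OUTPUT = 64 * 1024      # assumption: "64K" read as 65,536 tokens


def fits_limits(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within both quoted limits.

    Assumption: the prompt and the requested output are limited
    independently; adjust if output also counts toward the window.
    """
    prompt_ok = 0 <= prompt_tokens <= CONTEXT_WINDOW
    output_ok = 0 <= requested_output_tokens <= MAX_OUTPUT
    return prompt_ok and output_ok
```

Token counts here are whatever your tokenizer or the provider's counting endpoint reports; character counts are not a safe substitute.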

Question 3

What input types does it support?

Accepted Answer

Text, images, audio, and video. Output is text only.

Question 4

Is it suitable for real-time chat applications?

Accepted Answer

Not ideally. Time-to-first-token averages 5.27 seconds, which is notably higher than the cross-model median of 2.09 seconds. It is better suited to batch and background workflows where that initial delay is not user-facing.

Question 5

How does it compare to Gemini 2.5 Flash?

Accepted Answer

It matches Gemini 2.5 Flash on many common tasks at a lower cost, but trades off raw reasoning capability. Its AIME 2025 score of 16.7% indicates it is not suited to advanced mathematics or frontier reasoning; use a larger model for those cases.

Question 6

What makes the reasoning feature different from a simple 'thinking mode'?

Accepted Answer

Rather than a binary on/off thinking toggle, it offers four granular levels (minimal, low, medium, and high), so you can tune reasoning depth, and with it cost, on a per-request basis.
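One way to use per-request levels is a small routing policy in your own code, sketched below. The four level names come from the answer above; the task categories, the mapping, and the `pick_level` helper are hypothetical, and how the chosen level is passed to the API is deliberately left out.

```python
from enum import Enum


class ThinkingLevel(Enum):
    # The four levels named in the answer above.
    MINIMAL = "minimal"
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Hypothetical policy: cheaper levels for simple tasks, deeper ones
# for tasks that benefit from multi-step reasoning.
TASK_LEVELS = {
    "classification": ThinkingLevel.MINIMAL,
    "extraction": ThinkingLevel.LOW,
    "summarization": ThinkingLevel.MEDIUM,
    "multi_step_analysis": ThinkingLevel.HIGH,
}


def pick_level(task_kind: str) -> ThinkingLevel:
    """Choose a reasoning level for one request.

    Unknown task kinds fall back to LOW to keep cost predictable.
    """
    return TASK_LEVELS.get(task_kind, ThinkingLevel.LOW)
```

Keeping the policy in one table makes it easy to adjust after measuring quality and cost on real traffic.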
