Model page

Gemini 2.5 Flash-Lite

Our smallest and most cost effective model, built for at scale usage.

About Gemini 2.5 Flash-Lite

At 262 tokens per second with a 0.37-second time-to-first-token, Gemini 2.5 Flash-Lite is built for one thing above all else: going fast without draining your budget. It delivers roughly 75% of Gemini 2.5 Flash's capability at 30% of the cost — $0.10 per million input tokens and $0.40 per million output tokens — making it the go-to choice for high-volume production workloads where per-call economics matter. Users consistently highlight the million-token context window as a practical win, letting entire codebases or lengthy documents land in a single request. The multimodal support (text, image, audio, video) is a bonus for pipelines that mix content types. The honest trade-off: reasoning is limited by design. Thinking mode is off by default, and the model isn't suited for complex multi-step logic without external scaffolding. For real-time, scale-first applications where you need speed and cost discipline, it hits the mark.

Best for

  • High-volume content classification, moderation, and document processing where per-call cost control is critical
  • Real-time chatbots and customer support assistants that need sub-half-second response times
  • Multimodal pipelines processing mixed text, image, audio, or video inputs at scale
  • Multilingual applications requiring broad language coverage (84.5% on Multilingual MMLU)
  • Agentic automation workflows with tool calling and Google Search grounding where latency compounds across steps

Specs & capabilities

How Gemini 2.5 Flash-Lite stacks up — intelligence, speed, context, and modalities.

Capability

Intelligence

Low

Capability

Speed

Fast

Capability

Context window

1,048,576 tokens

Capability

Max output

65,535 tokens

Capability

Knowledge cutoff

January 2025

Frequently asked questions

How much does Gemini 2.5 Flash-Lite cost?

Input is $0.10 per million tokens for text, image, and video, and $0.30 per million for audio. Output is $0.40 per million tokens. Prompt caching drops cached input cost to $0.01 per million tokens — a 90% discount for repeated context.

What is the context window?

One million tokens (1,048,576). This lets you pass entire books, long codebases, or large document sets in a single request without chunking.

Is it good at reasoning and complex tasks?

Not by default. Thinking mode is disabled to prioritize speed and cost. For straightforward classification, extraction, or generation tasks it performs well, but complex multi-step reasoning requires enabling optional thinking budgets or using a more capable model like Gemini 2.5 Flash.

How does it compare to Gemini 2.5 Flash?

Flash-Lite delivers approximately 75% of Gemini 2.5 Flash's capability at 30% of the price. It's faster in raw throughput but trades away deeper reasoning depth. If your workload is latency- or cost-sensitive and doesn't demand heavy logic, Flash-Lite is the better pick.

What are the main limitations to know about?

Output is capped at 65,535 tokens despite the million-token input window, which limits long-form generation. The knowledge cutoff is January 2025. There is no fine-tuning support. Some users have reported occasional mid-sentence response cutoffs, tracked as a known issue.

Who should choose this model?

Teams running high-volume inference pipelines, real-time assistants, or cost-constrained applications where throughput and price per call are the primary constraints — and where tasks are well-defined enough that deep reasoning isn't required.

Related models