Gemini 2.5 Flash
Our first hybrid reasoning model which supports a 1M token context window and has thinking budgets.
About Gemini 2.5 Flash
Gemini 2.5 Flash is Google's answer to the classic production dilemma: how much reasoning quality can you get before the cost and latency become a problem? Quite a lot, it turns out. At $0.30 per million input tokens and 208 tokens per second, it delivers near-Pro analytical depth for a fraction of the price — developers on Hacker News consistently describe it as the go-to choice for production systems where budgets matter. It handles text, images, video, and audio natively, and its one-million-token context window makes it practical for large document analysis or multi-file code work without chunking. The optional "thinking" mode lets you dial up reasoning depth when a task warrants it. That said, Flash is not a substitute for Pro on genuinely complex problems: users note it can sound confident while missing architectural or security-level nuances. And a persistent quirk — responses occasionally stop mid-sentence for no clear reason. For teams building customer-facing features, agentic pipelines, or high-volume document workflows where speed and cost are non-negotiable, it is a well-calibrated choice.
Best for
- Document processing at scale — the 1M-token context window handles contract review, legal analysis, and financial report extraction without chunking
- Code generation and refactoring — particularly strong for boilerplate, format conversions, and multi-file transformations in developer tooling
- Customer-facing chatbots and agents — 208 tokens/second output and 0.59s latency keep response times under two seconds
- Agentic and automated workflows — reliable function calling and tool integration make it well-suited to iterative, multi-step processes
- High-volume classification and content moderation — fast throughput and batch pricing ($0.15/$1.25 per million tokens) keep costs manageable at scale
Specs & capabilities
How Gemini 2.5 Flash stacks up — intelligence, speed, context, and modalities.
Intelligence
Low
Speed
Fast
Context window
1,048,576 tokens
Max output
65,535 tokens
Knowledge cutoff
January 1, 2025
Frequently asked questions
What does Gemini 2.5 Flash cost?
Standard pay-as-you-go pricing is $0.30 per million input tokens and $2.50 per million output tokens. A batch/flex tier halves those rates to $0.15 input and $1.25 output. A free tier exists but Google's terms permit human review of free-tier prompts for up to three years.
How large is the context window?
One million tokens (1,048,576). Maximum output per request is 64K tokens.
What input types does it support?
Text, images, video, and audio. Output is text, with native audio output available in recent updates.
How does it compare to Gemini 2.5 Pro?
Flash is significantly faster (208 tok/s vs. Pro's lower throughput) and cheaper, but trades off depth on complex reasoning tasks. Users report Flash can miss architectural or security nuances that Pro catches. For straightforward production workloads the cost-performance tradeoff typically favors Flash.
What is the thinking mode and when should I use it?
Flash supports an optional reasoning mode controlled via a thinking budget parameter, similar in concept to o1-style step-by-step reasoning. It's useful for math-heavy or multi-step problems where you need more than the model's default one-pass output, without paying for a full Pro call.
Is this model still current?
Gemini 2.5 Flash reached general availability in June 2025 and remains available, but Google has since released Gemini 3.x Flash generations that supersede it in both performance and speed. It is a prior-generation model.