Gemini 3.5 Flash
Gemini 3.5 Flash combines frontier intelligence with fast responses, search grounding, and multimodal strengths. Uses 2 premium requests per send before length multipliers.
About Gemini 3.5 Flash
The first Flash-tier model to outperform a Pro on coding and agentic benchmarks, Gemini 3.5 Flash rewrites expectations for what a speed-optimized model can do. At over 280 tokens per second — roughly 4x faster than comparable frontier models — it sustains the throughput that production agent loops demand, while benchmark results on Terminal-Bench 2.1 (76.2%) and MCP Atlas (83.6%) put it ahead of Gemini 3.1 Pro on the tasks developers actually care about. Early users call it "an insane value" for delivering near-frontier intelligence at roughly a third of Pro's cost. The 31-point drop in hallucination rate over its predecessor makes it meaningfully more reliable in practice. The honest caveat: time to first token sits around 19 seconds, which stings in latency-sensitive interactions, and aggressive rate limiting has frustrated users hitting it hard. Deep reasoning, hard analytical problems, and ultra-long context retrieval still favor the Pro. But for teams running iterative coding agents, structured data pipelines, or high-throughput chatbots where cost and speed are the binding constraints, Flash 3.5 is the practical choice.
Best for
- Production agent loops and multi-step tool-use workflows where sustained throughput matters
- Agentic coding — boilerplate generation, unit test writing, pseudocode conversion, and automated refactoring
- Structured data extraction and JSON-mode pipelines with multimodal inputs (text, image, audio, video)
- High-volume enterprise chatbots and conversational AI needing cost-efficient multi-turn reasoning
- Video and audio processing tasks requiring long-context media handling at scale
Specs & capabilities
How Gemini 3.5 Flash stacks up — intelligence, speed, context, and modalities.
Intelligence
High
Speed
Fast
Context window
1,048,576 tokens
Max output
65,536 tokens
Knowledge cutoff
January 2026
Availability notes
2 premium requests per send before length multipliers · Search grounding support · 1M context
Frequently asked questions
What does Gemini 3.5 Flash cost?
Input is $1.50 per million tokens and output is $9.00 per million tokens. Cached input drops to $0.15 per million tokens — a 90% discount that makes repeated-context agent loops significantly cheaper.
How large is the context window?
1,048,576 input tokens (roughly 1 million tokens), with a maximum output of 65,536 tokens (64K).
How does it compare to Gemini 3.1 Pro?
Flash 3.5 actually beats Gemini 3.1 Pro on coding and agentic benchmarks — for example, 76.2% vs 70.3% on Terminal-Bench 2.1. Pro still leads by 3–8 points on academic reasoning tasks, deep analytical problems, and ultra-long context retrieval.
What are the known weaknesses?
Time to first token averages about 19 seconds, which can feel slow in chat-style interactions. It also hits aggressive rate limits under heavy load, and it is not the right pick for deep reasoning, hard math, or precision-critical long-context retrieval.
What modalities does it support?
It accepts text, image, audio, and video inputs. Output is text only — there is no image generation, audio generation, or Computer Use support.
When should I pick Flash 3.5 over a cheaper Flash model?
When your workload involves agentic coding, multi-step tool use, or structured multimodal extraction. The 31-point hallucination reduction and superior agentic benchmark scores justify the higher cost over Gemini 3 Flash for reliability-sensitive production use cases.