
Gemini 3.5 Flash

Gemini 3.5 Flash combines frontier intelligence with fast responses, search grounding, and multimodal strengths. Each send uses 2 premium requests before length multipliers are applied.

About Gemini 3.5 Flash

Gemini 3.5 Flash is the first Flash-tier model to outperform a Pro model on coding and agentic benchmarks, which changes expectations for what a speed-optimized model can do. At over 280 tokens per second, roughly 4x faster than comparable frontier models, it sustains the throughput that production agent loops demand. Its results on Terminal-Bench 2.1 (76.2%) and MCP Atlas (83.6%) put it ahead of Gemini 3.1 Pro on coding and agentic tasks. Early users call it "an insane value" for delivering near-frontier intelligence at roughly a third of Pro's cost, and a 31-point drop in hallucination rate relative to its predecessor makes it more reliable in practice.

The caveats: time to first token is around 19 seconds, which hurts in latency-sensitive interactions, and aggressive rate limiting has frustrated heavy users. Deep reasoning, hard analytical problems, and ultra-long context retrieval still favor Pro. For teams running iterative coding agents, structured data pipelines, or high-throughput chatbots where cost and speed are the binding constraints, Gemini 3.5 Flash is the practical choice.

Best for

  • Production agent loops and multi-step tool-use workflows where sustained throughput matters
  • Agentic coding — boilerplate generation, unit test writing, pseudocode conversion, and automated refactoring
  • Structured data extraction and JSON-mode pipelines with multimodal inputs (text, image, audio, video)
  • High-volume enterprise chatbots and conversational AI needing cost-efficient multi-turn reasoning
  • Video and audio processing tasks requiring long-context media handling at scale

Specs & capabilities

How Gemini 3.5 Flash stacks up — intelligence, speed, context, and modalities.

  • Intelligence: High
  • Speed: Fast
  • Context window: 1,048,576 tokens
  • Max output: 65,536 tokens
  • Knowledge cutoff: January 2026

Features

Availability notes

2 premium requests per send before length multipliers · Search grounding support · 1M context

Frequently asked questions

What does Gemini 3.5 Flash cost?

Input is $1.50 per million tokens and output is $9.00 per million tokens. Cached input drops to $0.15 per million tokens — a 90% discount that makes repeated-context agent loops significantly cheaper.
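
To make the pricing concrete, here is a minimal cost-estimate sketch in Python using the three per-million-token prices quoted above. The function name `estimate_cost` and the `cached_tokens` parameter are hypothetical, and the sketch assumes cached input is billed at the cached rate in place of the normal input rate. It ignores any other charges such as cache storage, and it does not model premium requests or length multipliers.

```python
# USD per million tokens, as quoted in the answer above.
INPUT_PRICE = 1.50
OUTPUT_PRICE = 9.00
CACHED_INPUT_PRICE = 0.15


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    cached_tokens is the portion of input_tokens served from cache.
    Assumption: those tokens are billed at the cached rate instead of
    the normal input rate.
    """
    if cached_tokens < 0 or cached_tokens > input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_PRICE
        + cached_tokens * CACHED_INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
    )
    return total / 1_000_000


if __name__ == "__main__":
    # A 100,000-token prompt with a 2,000-token reply, without and with caching.
    print(f"uncached: ${estimate_cost(100_000, 2_000):.4f}")
    print(f"90k cached: ${estimate_cost(100_000, 2_000, cached_tokens=90_000):.4f}")
```

Under these assumptions, a 100,000-token prompt with a 2,000-token reply costs about $0.168 uncached and about $0.0465 when 90,000 of the input tokens are cached. This is why agent loops that resend the same context benefit most from caching.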

How large is the context window?

1,048,576 input tokens (roughly 1 million tokens), with a maximum output of 65,536 tokens (64K).
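
A pre-flight check against these two limits could look like the sketch below. The helper name `fits_limits` is hypothetical. Because the answer above describes the window as input tokens, the sketch treats the input and output limits as independent. Whether output tokens also count against the context window is not stated here, so treat that as an assumption.

```python
CONTEXT_WINDOW_TOKENS = 1_048_576  # 2**20, input limit quoted above
MAX_OUTPUT_TOKENS = 65_536         # 2**16, output limit quoted above


def fits_limits(input_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within both quoted limits.

    Assumption: input and output limits are checked independently.
    """
    return (
        0 <= input_tokens <= CONTEXT_WINDOW_TOKENS
        and 0 <= requested_output_tokens <= MAX_OUTPUT_TOKENS
    )


if __name__ == "__main__":
    print(fits_limits(900_000, 8_192))    # within both limits
    print(fits_limits(1_200_000, 8_192))  # input too large
```

Token counts depend on the model's tokenizer, so the inputs to this check should come from an actual token count rather than a character-based guess.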

How does it compare to Gemini 3.1 Pro?

Gemini 3.5 Flash beats Gemini 3.1 Pro on coding and agentic benchmarks, for example 76.2% vs 70.3% on Terminal-Bench 2.1. Pro still leads by 3–8 points on academic reasoning tasks, deep analytical problems, and ultra-long context retrieval.

What are the known weaknesses?

Time to first token averages about 19 seconds, which can feel slow in chat-style interactions. It also hits aggressive rate limits under heavy load, and it is not the right pick for deep reasoning, hard math, or precision-critical long-context retrieval.
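
To see how the time to first token interacts with throughput, the sketch below estimates end-to-end response time from the two figures quoted on this page (about 19 seconds to first token, roughly 280 output tokens per second). It assumes a simple model, total time = time to first token + output tokens / throughput, which ignores network overhead, queueing, and rate-limit delays. The function name is hypothetical.

```python
TTFT_SECONDS = 19.0        # approximate time to first token, from this page
TOKENS_PER_SECOND = 280.0  # approximate output throughput, from this page


def estimate_response_seconds(output_tokens: int) -> float:
    """Rough end-to-end latency: fixed startup delay plus streaming time."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SECOND


if __name__ == "__main__":
    for tokens in (200, 2_800, 28_000):
        print(f"{tokens:>6} output tokens: ~{estimate_response_seconds(tokens):.1f} s")
```

Under this model a 200-token chat reply takes roughly 19.7 seconds, almost all of it waiting for the first token. A 28,000-token generation takes roughly 119 seconds, most of it streaming. That pattern is consistent with the model suiting long agentic outputs better than short interactive turns.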

What modalities does it support?

It accepts text, image, audio, and video inputs. Output is text only — there is no image generation, audio generation, or Computer Use support.

When should I pick Gemini 3.5 Flash over a cheaper Flash model?

Pick it when your workload involves agentic coding, multi-step tool use, or structured multimodal extraction. The 31-point hallucination reduction and higher agentic benchmark scores justify the higher cost over Gemini 3 Flash for reliability-sensitive production use cases.