Gemini 3 Flash Preview

Preview release of Gemini 3 Flash. 1M-token context window, 64k-token max output. Knowledge cutoff: January 2025.

About Gemini 3 Flash Preview

Google's Gemini 3 Flash Preview is the rare case where "cheaper and faster" also means "better at the hard stuff." It scores 78% on SWE-bench Verified, outperforming even Gemini 3 Pro on software engineering, while costing 4–6x less and generating roughly 218 tokens per second. Users on Hacker News called the experience "truly stunning," and a 99.7% score on AIME 2025 backs up the enthusiasm for math-heavy and agentic work. The 1M-token context window can hold an entire codebase or a lengthy document in a single session.

The catch is real and worth knowing: a reported 91% hallucination rate on knowledge benchmarks means the model tends to invent answers confidently rather than admit uncertainty. That rules it out for fact-sensitive domains such as medical, legal, or research tooling. For coding, prototyping, creative work, and high-volume multimodal pipelines where raw speed and price-performance matter, it is a strong default choice. The model is still in preview, so features and behavior may change.

Best for

  • Code generation and software engineering — 78% on SWE-bench Verified, beating Gemini 3 Pro on real-world coding tasks
  • Mathematical reasoning and problem-solving — 99.7% on AIME 2025 makes it reliable for complex quantitative work
  • Long document and codebase analysis — 1M-token context handles entire books, repos, or large datasets in one pass
  • High-volume multimodal pipelines — native text, image, audio, and video input at low cost per token
  • Rapid prototyping and creative content — speed and affordability make it ideal for iterative drafting and brainstorming where perfect factual accuracy is not required

Specs & capabilities

How Gemini 3 Flash Preview stacks up — intelligence, speed, context, and modalities.

  • Intelligence: High
  • Speed: Fast
  • Context window: 1,048,576 tokens
  • Max output: 65,536 tokens
  • Knowledge cutoff: January 2025

Frequently asked questions

What does Gemini 3 Flash Preview cost?

Standard pay-as-you-go pricing is $0.50 per 1M input tokens (text/image/video) and $3.00 per 1M output tokens. Cached input drops to $0.05 per 1M tokens — a 90% discount. Batch processing cuts costs roughly in half.
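
The arithmetic behind these rates can be sketched in a few lines of Python. This is a rough estimator, not an official billing formula. The function name `estimate_cost` and its parameters are hypothetical, "roughly in half" is treated as exactly 50%, and the batch discount is assumed to apply to the whole bill, which the pricing above does not spell out.

```python
# Prices in USD per 1M tokens, taken from the answer above.
INPUT_PER_M = 0.50
CACHED_INPUT_PER_M = 0.05
OUTPUT_PER_M = 3.00

# Assumption: "roughly in half" is modeled as exactly 50% off the total.
BATCH_FACTOR = 0.5


def estimate_cost(input_tokens, output_tokens, cached_tokens=0, batch=False):
    """Estimate the USD cost of a workload (hypothetical helper).

    cached_tokens is the part of input_tokens served from cache.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    fresh_tokens = input_tokens - cached_tokens
    cost = (
        fresh_tokens * INPUT_PER_M
        + cached_tokens * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    ) / 1_000_000
    return cost * BATCH_FACTOR if batch else cost


if __name__ == "__main__":
    # 1M input tokens and 100k output tokens, no caching, no batch.
    print(f"${estimate_cost(1_000_000, 100_000):.2f}")
    # Same workload with 800k of the input served from cache.
    print(f"${estimate_cost(1_000_000, 100_000, cached_tokens=800_000):.2f}")
```

Under these assumptions, the first workload comes to about $0.80 and the cached variant to about $0.44, which shows how much of the bill output tokens account for once input is cached.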

How large is the context window?

1,048,576 tokens (approximately 1 million tokens), with a maximum output of 65,536 tokens per response.
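
As a minimal sketch, a pre-flight check against these two limits might look like the following. The helper name `check_request` is hypothetical, the token counts are assumed to come from whatever token-counting tool you already use, and the two limits are assumed to be enforced independently of each other.

```python
CONTEXT_WINDOW_TOKENS = 1_048_576  # 2**20, from the answer above
MAX_OUTPUT_TOKENS = 65_536         # 2**16, from the answer above


def check_request(prompt_tokens: int, requested_output_tokens: int) -> list:
    """Return a list of problems; an empty list means the request fits.

    Hypothetical helper. Assumes the caller already knows both token counts.
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")

    problems = []
    if prompt_tokens > CONTEXT_WINDOW_TOKENS:
        over = prompt_tokens - CONTEXT_WINDOW_TOKENS
        problems.append(f"prompt exceeds the context window by {over} tokens")
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        over = requested_output_tokens - MAX_OUTPUT_TOKENS
        problems.append(f"requested output exceeds the cap by {over} tokens")
    return problems


if __name__ == "__main__":
    print(check_request(1_000_000, 8_192))
    print(check_request(1_100_000, 70_000))
```

Returning a list of problems rather than a boolean lets the caller report both violations at once, for example before deciding whether to split a document into chunks.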

What is the hallucination problem people mention?

On knowledge-heavy benchmarks, the model has a reported 91% hallucination rate — it tends to generate confident-sounding but incorrect answers rather than flagging uncertainty. It is not recommended for medical, legal, or research applications that require factual reliability.

How does it compare to Gemini 3 Pro?

Gemini 3 Flash Preview is 4–6x cheaper and roughly 3x faster than Gemini 3 Pro, and its 78% on SWE-bench Verified is higher than Pro's score on the same benchmark. For most coding and reasoning tasks, that makes Flash the better value.

Is this model ready for production use?

It is a preview release, so features and stability may change. That said, Google reports it is already processing over 1 trillion tokens per day on the API, indicating substantial real-world deployment.

What modalities does it support?

It accepts text, images, audio, and video as input and produces text output. Knowledge cutoff is January 2025.