Gemini 3 Flash Preview
Preview of Gemini 3 Flash. 1M/64k context. Jan 2025.
About Gemini 3 Flash Preview
Google's Gemini 3 Flash Preview is the rare case where "cheaper and faster" also means "better at the hard stuff." It scores 78% on SWE-bench Verified — outperforming even Gemini 3 Pro on software engineering — while costing 4–6x less and generating responses at roughly 218 tokens per second. Users on Hacker News called the experience "truly stunning," and the 99.7% AIME 2025 score backs up the enthusiasm for math-heavy and agentic work. A 1M-token context window means it can hold an entire codebase or lengthy document in a single session. The catch is real and worth knowing: a 91% hallucination rate on knowledge benchmarks means the model confidently invents answers rather than admitting uncertainty. That disqualifies it for fact-sensitive domains like medical, legal, or research tooling — but for coding, prototyping, creative work, and high-volume multimodal pipelines where raw speed and price-performance matter, it's a strong default choice. Still in preview, so expect continued evolution.
Best for
- Code generation and software engineering — 78% on SWE-bench Verified, beating Gemini 3 Pro on real-world coding tasks
- Mathematical reasoning and problem-solving — 99.7% on AIME 2025 makes it reliable for complex quantitative work
- Long document and codebase analysis — 1M-token context handles entire books, repos, or large datasets in one pass
- High-volume multimodal pipelines — native text, image, audio, and video input at low cost per token
- Rapid prototyping and creative content — speed and affordability make it ideal for iterative drafting and brainstorming where perfect factual accuracy is not required
Specs & capabilities
How Gemini 3 Flash Preview stacks up — intelligence, speed, context, and modalities.
Intelligence
High
Speed
Fast
Context window
1,048,576 tokens
Max output
65,536 tokens
Knowledge cutoff
January 2025
Frequently asked questions
What does Gemini 3 Flash Preview cost?
Standard pay-as-you-go pricing is $0.50 per 1M input tokens (text/image/video) and $3.00 per 1M output tokens. Cached input drops to $0.05 per 1M tokens — a 90% discount. Batch processing cuts costs roughly in half.
How large is the context window?
1,048,576 tokens (approximately 1 million tokens), with a maximum output of 65,536 tokens per response.
What is the hallucination problem people mention?
On knowledge-heavy benchmarks, the model has a reported 91% hallucination rate — it tends to generate confident-sounding but incorrect answers rather than flagging uncertainty. It is not recommended for medical, legal, or research applications that require factual reliability.
How does it compare to Gemini 3 Pro?
Gemini 3 Flash Preview is 4–6x cheaper and roughly 3x faster than Gemini 3 Pro, and it actually outperforms Pro on SWE-bench Verified (78% vs. lower Pro scores). For most coding and reasoning tasks, Flash is the better value.
Is this model ready for production use?
It is a preview release, so features and stability may change. That said, Google reports it is already processing over 1 trillion tokens per day on the API, indicating substantial real-world deployment.
What modalities does it support?
It accepts text, images, audio, and video as input and produces text and image output. Knowledge cutoff is January 2025.