Gemini 3.1 Pro Preview
The next iteration of Gemini 3 Pro, with improvements in performance, behavior, and intelligence. 1M-token context window, 64K-token max output. Built for agentic workflows, autonomous coding, and complex multimodal tasks. Knowledge cutoff: January 2025.
About Gemini 3.1 Pro Preview
Gemini 3.1 Pro Preview sits at the top of the Artificial Analysis Intelligence Index, ahead of every other model evaluated, and it earns that ranking on economics as well as raw ability. It ran a full benchmark suite at less than half the cost of comparable frontier models, which makes it strong evidence that top-tier intelligence does not require top-tier spend. A hallucination rate 38 percentage points lower than its predecessor's and a 94.3% score on GPQA Diamond (graduate-level scientific reasoning) make it a serious tool for complex research, deep software engineering, and agentic workflows that need to get things right. Its 1-million-token context window handles entire codebases or lengthy document sets without batching. The honest caveat: time-to-first-token averages nearly 25 seconds, well above the median for comparable models, and some developers report extended waits or reliability issues under high API load. If your work is iterative and deep rather than real-time and conversational, the trade-off is usually worth it.
Best for
- Complex software engineering — code generation, debugging, and whole-repository analysis backed by an 80.6% SWE-bench Verified score
- Graduate-level scientific and technical reasoning in physics, chemistry, and biology
- Agentic and multi-step workflows where reliable tool use and sustained reasoning matter
- Large document and codebase analysis using the full 1-million-token context in a single request
- Cost-sensitive enterprise deployments that need frontier-class accuracy without frontier-class pricing
Specs & capabilities
How Gemini 3.1 Pro Preview stacks up — intelligence, speed, context, and modalities.
Intelligence: High
Speed: Medium
Context window: 1,048,576 tokens
Max output: 65,536 tokens
Knowledge cutoff: January 2025
Frequently asked questions
What does it cost?
Standard pricing is $2.00 per million input tokens and $12.00 per million output tokens for requests up to 200K tokens. Long-context requests (over 200K tokens) cost $4.00 input and $18.00 output per million tokens. With context caching, the effective blended cost can drop to around $1.74 per million tokens.
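To make the tiered pricing concrete, here is a minimal Python sketch of a per-request cost estimate. It uses only the rates quoted above and ignores context caching. Two things are assumptions rather than facts from this page: that "200K" means exactly 200,000 tokens measured on the input side, and that a request over the threshold is billed entirely at the long-context rates. The name `estimate_cost` is hypothetical.

```python
# Rates in USD per million tokens, taken from the pricing answer above.
STANDARD_RATES = {"input": 2.00, "output": 12.00}
LONG_CONTEXT_RATES = {"input": 4.00, "output": 18.00}

# Assumption: "200K" means 200,000 tokens and the tier is chosen by input size.
LONG_CONTEXT_THRESHOLD = 200_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request, ignoring context caching."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Assumption: once over the threshold, the whole request uses long-context rates.
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        rates = LONG_CONTEXT_RATES
    else:
        rates = STANDARD_RATES
    total = input_tokens * rates["input"] + output_tokens * rates["output"]
    return total / 1_000_000


if __name__ == "__main__":
    print(f"${estimate_cost(100_000, 10_000):.2f}")  # standard tier: $0.32
    print(f"${estimate_cost(500_000, 10_000):.2f}")  # long-context tier: $2.18
```

Under these assumptions, a 100K-token prompt with a 10K-token reply costs about $0.32, while a 500K-token prompt with the same reply costs about $2.18. Check the provider's current pricing page before relying on the tier rule.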
How large is the context window?
1,048,576 tokens — roughly one million tokens. You can fit an entire large codebase or a book-length document set in a single request.
How fast does it respond?
Output speed is around 132 tokens per second once it starts, but time-to-first-token averages 24.84 seconds — significantly slower than most comparable models. It is not suited for real-time or latency-sensitive applications.
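The two figures above combine into a rough end-to-end latency estimate: the wait for the first token plus the time to stream the rest. The Python sketch below assumes the quoted averages hold for a given request and that output speed is constant, which real traffic will not guarantee. The name `estimate_response_seconds` is hypothetical.

```python
# Averages quoted in the answer above; actual latency varies with load.
TTFT_SECONDS = 24.84              # average time-to-first-token
OUTPUT_TOKENS_PER_SECOND = 132.0  # approximate streaming speed


def estimate_response_seconds(output_tokens: int) -> float:
    """Estimate total wall-clock seconds for a reply of the given length."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    # Assumption: constant streaming speed after the first token arrives.
    return TTFT_SECONDS + output_tokens / OUTPUT_TOKENS_PER_SECOND


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"{n:>6} tokens: about {estimate_response_seconds(n):.1f} s")
```

For a 1,000-token reply this comes to roughly 32 seconds, most of it spent waiting for the first token, which is why short conversational exchanges feel slow while long generations amortize the delay.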
What kinds of tasks is it weakest at?
It cannot generate images or audio — output is text only. It also shows performance degradation in very long iterative sessions, and API reliability can be inconsistent during high-demand periods.
How does it compare to Gemini 3 Pro?
Gemini 3.1 Pro Preview's hallucination rate is 38 percentage points lower than Gemini 3 Pro's, and it improved significantly on benchmarks including GPQA Diamond (94.3%) and SWE-bench Verified (80.6%).
Is this a stable production model?
Not yet. It was released as a public preview in February 2026, so the API may change, and Google may deprecate this endpoint when a stable version ships. A variant endpoint (gemini-3.1-pro-preview-customtools) exists for workflows that rely heavily on custom tools.