Question 1

What does it cost?

Accepted Answer

Standard pricing is $2.00 per million input tokens and $12.00 per million output tokens for requests up to 200K tokens. Long-context requests (over 200K tokens) cost $4.00 per million input tokens and $18.00 per million output tokens. With context caching, the effective blended cost can drop to around $1.74 per million tokens.
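To make the tiering concrete, here is a minimal sketch of a per-request cost estimate using the rates quoted above. It assumes that "200K" means 200,000 tokens and that the tier is chosen by the input (prompt) size alone; the answer does not spell out either detail. The names Rates and estimate_cost are illustrative, and caching discounts are ignored.

```python
from dataclasses import dataclass

# Assumption: "200K" means 200,000 tokens, measured on the input side.
LONG_CONTEXT_THRESHOLD = 200_000


@dataclass(frozen=True)
class Rates:
    input_per_million: float   # USD per million input tokens
    output_per_million: float  # USD per million output tokens


STANDARD = Rates(input_per_million=2.00, output_per_million=12.00)
LONG_CONTEXT = Rates(input_per_million=4.00, output_per_million=18.00)


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request, ignoring context caching."""
    # Assumption: crossing the threshold reprices the whole request.
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        rates = LONG_CONTEXT
    else:
        rates = STANDARD
    total = (input_tokens * rates.input_per_million
             + output_tokens * rates.output_per_million)
    return total / 1_000_000
```

Under these assumptions, a 100,000-token prompt with a 2,000-token reply comes to about $0.22, while a 300,000-token prompt with the same reply comes to about $1.24.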

Question 2

How large is the context window?

Accepted Answer

1,048,576 tokens (2^20), or roughly one million. That is enough to fit an entire large codebase or a book-length document set in a single request.
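As a rough pre-flight check, the sketch below estimates whether a set of documents would fit in that window. The ratio of 4 characters per token is only a common rule of thumb for English text, not this model's tokenizer, and whether reserved output counts against the same window is also an assumption. Both function names are illustrative.

```python
CONTEXT_WINDOW_TOKENS = 1_048_576  # 2**20, from the answer above

# Assumption: about 4 characters per token, a rough heuristic for English.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(texts, reserved_output_tokens: int = 0) -> bool:
    """True if the estimated prompt plus reserved output fits the window."""
    prompt_tokens = sum(estimate_tokens(t) for t in texts)
    return prompt_tokens + reserved_output_tokens <= CONTEXT_WINDOW_TOKENS
```

For a real request, an exact count from the provider's tokenizer is a safer guide than this estimate.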

Question 3

How fast does it respond?

Accepted Answer

Output speed is around 132 tokens per second once generation starts, but time-to-first-token averages 24.84 seconds, significantly slower than most comparable models. It is not suited to real-time or latency-sensitive applications.
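Those two figures combine into a simple back-of-the-envelope estimate of total wait time: time-to-first-token plus output length divided by throughput. The sketch below treats both numbers as constants, which is a simplification, since real latency varies with load and prompt size. The function name is illustrative.

```python
TTFT_SECONDS = 24.84              # average time-to-first-token, from above
OUTPUT_TOKENS_PER_SECOND = 132.0  # approximate streaming speed, from above


def estimate_response_seconds(output_tokens: int) -> float:
    """Approximate wall-clock time for a full response of the given length."""
    return TTFT_SECONDS + output_tokens / OUTPUT_TOKENS_PER_SECOND
```

For a 500-token reply this gives roughly 28.6 seconds, of which almost 25 are spent waiting for the first token. That is why the delay dominates for short, interactive exchanges.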

Question 4

What kinds of tasks is it weakest at?

Accepted Answer

It cannot generate images or audio; output is text only. It also shows performance degradation in very long iterative sessions, and API reliability can be inconsistent during high-demand periods.

Question 5

How does it compare to Gemini 3 Pro?

Accepted Answer

Gemini 3.1 Pro Preview reduced its hallucination rate by 38 percentage points compared to Gemini 3 Pro and improved significantly on benchmarks including GPQA Diamond and SWE-bench Verified.

Question 6

Is this a stable production model?

Accepted Answer

It was released as a public preview in February 2026. The API may change, and Google may deprecate this endpoint when a stable version ships. A variant endpoint (gemini-3.1-pro-preview-customtools) exists for custom tool-heavy workflows.
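Because the endpoint may be deprecated, it can help to keep the model ID in one place rather than scattering it through the code. The sketch below shows one way to do that. The base ID "gemini-3.1-pro-preview" is an assumption inferred from the variant name above, and both the MODEL_ID_OVERRIDE environment variable and the resolve_model_id helper are hypothetical.

```python
import os

# Assumption: base model ID inferred from the variant name; verify before use.
DEFAULT_MODEL_ID = "gemini-3.1-pro-preview"
# Variant endpoint named in the answer above.
CUSTOM_TOOLS_MODEL_ID = "gemini-3.1-pro-preview-customtools"


def resolve_model_id(custom_tools: bool = False, env=os.environ) -> str:
    """Pick the model ID, letting a (hypothetical) env var override it."""
    override = env.get("MODEL_ID_OVERRIDE")
    if override:
        return override
    return CUSTOM_TOOLS_MODEL_ID if custom_tools else DEFAULT_MODEL_ID
```

With this in place, moving to a stable release later would mean changing one constant or setting one environment variable.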

Gemini 3.1 Pro Preview
