
GPT-5.1 latest

GPT-5.1 model used in ChatGPT. Continuously updated for the latest chat improvements.

About GPT-5.1 latest

Where GPT-5.1 earns its keep is efficiency: it dynamically allocates compute based on query complexity, running simple tasks 2-3x faster than its predecessor by skipping unnecessary reasoning steps. The result is a conversational model with a noticeably warmer personality than GPT-5 and markedly improved instruction-following. Users consistently report that it feels more natural to talk to, and it handles document-heavy work such as extraction, comparison, and code review precisely. Extended prompt caching (up to 24 hours at a 90% discount on cached input) makes it cost-effective for workflows with repeated context. Benchmarks are solid but not top-of-class: it scores 94% on AIME 2025 and holds a strong SWE-bench position, though it trails GPT-5 slightly on raw reasoning scores. Two frustrations are worth knowing: reasoning is disabled by default and must be enabled manually, and the model takes instructions literally, so vague prompts produce literal outputs without self-correction. At $1.25 per million input tokens, it sits at a reasonable price point for teams that need reliable, fast, multimodal chat at scale.

Best for

  • Document analysis workflows — extraction, summarization, comparison, and risk flagging across large document sets
  • Code review and software engineering tasks, where its SWE-bench performance and instruction-following translate to practical output
  • High-volume conversational applications where the 2-3x speed boost on simple queries keeps latency low
  • Cost-sensitive API integrations that benefit from 24-hour prompt caching at a 90% input discount
  • Image-grounded tasks such as document processing, equipment inspection, or visual content analysis

Specs & capabilities

How GPT-5.1 latest stacks up — intelligence, speed, context, and modalities.

  • Intelligence: High
  • Speed: Medium
  • Context window: 128,000 tokens
  • Max output: 16,384 tokens
  • Knowledge cutoff: September 30, 2024

Frequently asked questions

What does 'gpt-5.1-chat-latest' refer to?

It is the chat-optimized alias for GPT-5.1, a family OpenAI released in two waves in November 2025. The alias always points to the current chat variant; a separate Codex variant (gpt-5.1-codex) exists with a larger 400,000-token context window.

What is the context window and output limit?

The chat variant supports a 128,000-token context window with a maximum output of 16,384 tokens per response.
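As a rough illustration of how those two limits interact, the sketch below computes how many prompt tokens remain once room for the response is reserved. It assumes that prompt and output tokens share the same 128,000-token window, which the page does not state explicitly; the function names are hypothetical and not part of any SDK.

```python
# Limits quoted on this page for the chat variant.
CONTEXT_WINDOW_TOKENS = 128_000
MAX_OUTPUT_TOKENS = 16_384


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT_TOKENS) -> int:
    """Prompt budget left after reserving space for the response.

    Assumption: prompt and output tokens count against one shared window.
    """
    if not 0 < reserved_output <= MAX_OUTPUT_TOKENS:
        raise ValueError("reserved_output must be between 1 and MAX_OUTPUT_TOKENS")
    return CONTEXT_WINDOW_TOKENS - reserved_output


def fits_in_window(prompt_tokens: int, reserved_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """True if a prompt of this size leaves room for the reserved output."""
    return prompt_tokens <= max_prompt_tokens(reserved_output)


if __name__ == "__main__":
    print(max_prompt_tokens())       # budget when the full output limit is reserved
    print(fits_in_window(120_000))   # a 120k-token prompt with full output reserved
```

Under that assumption, reserving the full 16,384-token output leaves 111,616 tokens for the prompt, so very large document sets may need to be chunked.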

How is it priced?

Input is $1.25 per million tokens; output is $10.00 per million tokens. Cached input drops to $0.125 per million tokens — a 90% discount — with cache retention up to 24 hours.
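To make the arithmetic concrete, here is a minimal cost estimator that uses only the rates quoted above. The function name and the example token counts are hypothetical, and it assumes that cached tokens are billed at the cached rate in place of the standard input rate.

```python
# Rates quoted on this page, in USD per million tokens.
INPUT_USD_PER_M = 1.25
CACHED_INPUT_USD_PER_M = 0.125
OUTPUT_USD_PER_M = 10.00


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cached_input_tokens: int = 0) -> float:
    """Estimate the cost of one request.

    cached_input_tokens is the portion of input_tokens served from the
    prompt cache (assumed to be billed at the cached rate instead).
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached_tokens = input_tokens - cached_input_tokens
    total = (uncached_tokens * INPUT_USD_PER_M
             + cached_input_tokens * CACHED_INPUT_USD_PER_M
             + output_tokens * OUTPUT_USD_PER_M)
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1M input tokens and 100k output tokens.
    print(f"no cache:   ${estimate_cost_usd(1_000_000, 100_000):.2f}")
    print(f"80% cached: ${estimate_cost_usd(1_000_000, 100_000, 800_000):.2f}")
```

For that hypothetical workload the estimate falls from $2.25 to $1.35 when 80% of the input is cached. Output tokens are not discounted, so caching helps most when prompts are long and responses are short.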

Is GPT-5.1 a major capability upgrade over GPT-5?

Not in terms of raw capability. OpenAI characterizes it as an efficiency improvement: faster adaptive reasoning, better instruction-following, and a warmer conversational tone rather than a benchmark leap. GPT-5's AIME score (94.6%) actually edges out GPT-5.1's 94%.

What is the knowledge cutoff?

September 30, 2024, roughly 13 months before the model's November 2025 release, so events after that date may not be reflected in its responses.

Who should choose GPT-5.1 over other models?

Teams building document-heavy or conversational products that prioritize speed, cost efficiency via caching, and reliable instruction-following. If raw reasoning depth is the priority, set the reasoning_effort parameter or consider the Thinking variant, since reasoning is off by default in the chat build.
