GPT-5.1 Codex

GPT-5.1 optimized for agentic coding in Codex. 400K context, 128K max output.

About GPT-5.1 Codex

GPT-5.1 Codex is built for software engineering work rather than general conversation. It scores 77.9% on SWE-bench Verified, which puts it at the top of single-model results for real-world coding tasks, and users consistently single out its code review and bug detection as useful in practice. Compared with GPT-5, it is more token-efficient, using 93.7% fewer tokens on lightweight interactions, and it reasons across multi-file projects, tracking dependencies through refactors and framework migrations without losing context. It also follows instructions literally, delivering what is asked without embellishment.

There are two caveats. It still hallucinates APIs and file paths, so generated code needs validation before it ships. In agentic workflows, it expects interactive steering mid-run rather than operating fully autonomously. Standard pricing of $1.25 per million input tokens and $10.00 per million output tokens makes it a mid-tier option for serious engineering use, with a cheaper Mini variant available for routine tasks.

Best for

  • Code review and bug detection across large, multi-file codebases
  • Refactoring projects and framework migrations where cross-file context matters
  • Production debugging with visual inspection support (screenshots, UI analysis)
  • Agentic coding workflows via the OpenAI Responses API with interactive steering
  • Cost-sensitive routine coding tasks using the GPT-5.1-Codex-Mini variant

Specs & capabilities

How GPT-5.1 Codex stacks up: intelligence, speed, context, and modalities.

  • Intelligence: Medium
  • Speed: Fast
  • Context window: 400,000 tokens
  • Max output: 128,000 tokens
  • Knowledge cutoff: September 30, 2024

Frequently asked questions

How much does GPT-5.1 Codex cost?

Standard pricing is $1.25 per million input tokens and $10.00 per million output tokens. The Mini variant is significantly cheaper at $0.25 input / $2.00 output per million tokens, making it practical for high-volume or routine tasks.
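
As a rough illustration of how these rates translate into a per-request cost, the sketch below applies the prices quoted above to a made-up workload. The token counts, the `Pricing` class, and the `estimate_cost` helper are hypothetical names chosen for this example, and the calculation ignores any discounts (such as cached input) that may apply in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per one million tokens, as quoted on this page."""
    input_per_million: float
    output_per_million: float


# Rates copied from the answer above.
CODEX = Pricing(input_per_million=1.25, output_per_million=10.00)
CODEX_MINI = Pricing(input_per_million=0.25, output_per_million=2.00)


def estimate_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request (no discounts applied)."""
    total = (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 20,000 output tokens.
    print(f"standard: ${estimate_cost(CODEX, 200_000, 20_000):.2f}")
    print(f"mini:     ${estimate_cost(CODEX_MINI, 200_000, 20_000):.2f}")
```

For this assumed workload the standard model comes to $0.45 and the Mini variant to $0.09, so output tokens account for a large share of the bill even when the prompt is ten times longer than the response.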

What is the context window?

The base GPT-5.1 Codex supports a 400K token context window, with a maximum output of 128K tokens. The Mini variant has a smaller 272K context window.
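
One way to reason about these limits is to budget output tokens against what the prompt leaves free. The sketch below assumes that prompt and output tokens share the same context window, which this page does not state explicitly; `max_output_budget` is a hypothetical helper, not part of any SDK.

```python
# Limits quoted above for the base model.
CONTEXT_WINDOW = 400_000
MAX_OUTPUT = 128_000


def max_output_budget(
    prompt_tokens: int,
    context_window: int = CONTEXT_WINDOW,
    max_output: int = MAX_OUTPUT,
) -> int:
    """Largest output budget that still fits alongside the prompt.

    Assumes prompt and output share one window. Returns 0 when the
    prompt alone already fills or exceeds the window.
    """
    remaining = context_window - prompt_tokens
    return max(0, min(max_output, remaining))


if __name__ == "__main__":
    for prompt in (100_000, 300_000, 450_000):
        print(prompt, max_output_budget(prompt))
```

Under that assumption, a 100,000-token prompt leaves room for the full 128,000-token output, a 300,000-token prompt leaves 100,000, and a 450,000-token prompt does not fit at all.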

What does it actually excel at?

Real-world coding tasks — particularly code review, bug detection, multi-file refactoring, and framework migrations. It scored 77.9% on SWE-bench Verified, which measures performance on genuine software engineering problems.

What are its known weaknesses?

It can hallucinate API names, file paths, and test coverage, so outputs should be validated. It also works best with interactive human steering during agentic runs and is strongest in mainstream languages like Python, JavaScript, TypeScript, and Java.
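
A lightweight pre-merge check can catch some of these hallucinations automatically. The sketch below is one possible approach for a Python project, not a complete validator: it only verifies that referenced file paths exist and that named modules can be located. The helper names `missing_paths` and `unresolved_modules` are hypothetical, and extracting paths and module names from the model's output is left to the caller.

```python
import importlib.util
from pathlib import Path


def missing_paths(paths, root="."):
    """Return the referenced paths that do not exist under root."""
    root_path = Path(root)
    return [p for p in paths if not (root_path / p).exists()]


def unresolved_modules(module_names):
    """Return module names that cannot be located in this environment."""
    unresolved = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised, for example, when a parent package of a dotted
            # name is missing.
            spec = None
        if spec is None:
            unresolved.append(name)
    return unresolved


if __name__ == "__main__":
    # Hypothetical references pulled from a generated patch.
    print(missing_paths(["src/app.py", "README.md"]))
    print(unresolved_modules(["json", "not_a_real_module_xyz"]))
```

Checks like these do not confirm that an API is used correctly or that claimed tests exist, so they complement running the test suite and reviewing the diff rather than replacing either.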

How does it compare to GPT-5.1 (the general model)?

GPT-5.1 Codex is purpose-built for coding workflows, trading breadth for depth in software engineering tasks. It is also far more token-efficient: the published figure, measured against GPT-5 rather than GPT-5.1, is 93.7% fewer tokens on lightweight interactions.

Is this model still current?

GPT-5.1 Codex was released November 13, 2025 and has since been succeeded by GPT-5.3-Codex (February 2026) and GPT-5.5 Codex variants. It remains available but is no longer the latest in the Codex line.

Related models