
GPT-5 Codex

Enhanced code reasoning while staying conversation-friendly.

About GPT-5 Codex

Built for engineering workflows, GPT-5 Codex scores 74.5% on SWE-Bench Verified, a benchmark of real-world coding issues rather than synthetic puzzles. At 152.4 tokens per second it is fast enough for tight iteration cycles, and its 400K-token context window keeps large codebases in view without chunking. Developers describe it as a sharp collaborator for agentic coding: long-running refactors, test generation, security-focused code review, and multi-language debugging. Its training specialization shows in vulnerability detection, where it was explicitly optimized to find and flag critical flaws.

The trade-offs are latency and price. Time to first token is nearly 13 seconds, a noticeable wait before the stream starts, and output at $10 per million tokens is steep relative to comparable reasoning models. For deep, multi-phase reasoning across massive codebases, alternatives may edge it out. For fast, focused engineering work where security and code quality matter, it is a natural fit.

Best for

  • Security-focused code review and vulnerability detection in production codebases
  • Agentic engineering sessions: long-running refactors, feature scaffolding, and tool-using workflows
  • Automated test generation across multiple programming languages
  • Rapid bug isolation and targeted debugging with fast token throughput
  • Documentation generation and structured knowledge work alongside coding tasks

Specs & capabilities

How GPT-5 Codex stacks up on intelligence, speed, context, and output limits.

  • Intelligence: High
  • Speed: Fast
  • Context window: 400,000 tokens
  • Max output: 128,000 tokens
  • Knowledge cutoff: September 2024

Frequently asked questions

What does GPT-5 Codex cost?

Input tokens are priced at $1.25 per million and output tokens at $10.00 per million — affordable on the input side, but output costs are on the higher end for this tier of reasoning model.
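
The per-token arithmetic behind those prices can be sketched as follows. This is a minimal estimate that assumes flat per-token pricing at the two rates quoted above and ignores any discounts or surcharges a provider might apply; the function name and the example token counts are hypothetical.

```python
# Prices quoted above, in US dollars per one million tokens.
INPUT_PRICE_PER_MILLION = 1.25
OUTPUT_PRICE_PER_MILLION = 10.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars (flat pricing assumed)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical request: 200,000 input tokens and 8,000 output tokens.
print(f"${estimate_cost_usd(200_000, 8_000):.2f}")  # prints $0.33
```

In this example the 8,000 output tokens account for about a quarter of the bill despite being 4% of the input volume, which is why output-heavy workloads feel the higher output rate most.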

How large is the context window?

400,000 tokens. That is large enough to hold a substantial codebase in a single session, though newer Codex variants and some alternatives offer up to 1M tokens.
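
A rough way to budget a prompt against these limits is sketched below. It assumes the 400,000-token window is shared between input and output, so that reserving room for the reply reduces the space left for the prompt; that sharing rule and the helper names are assumptions, and token counts must come from whatever tokenizer your client uses.

```python
CONTEXT_WINDOW = 400_000  # total tokens, from the specs above
MAX_OUTPUT = 128_000      # maximum output tokens, from the specs above


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Largest prompt that still leaves `reserved_output` tokens for the reply.

    Assumes input and output share the same context window.
    """
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT")
    return CONTEXT_WINDOW - reserved_output


def fits_in_context(prompt_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """Return True if the prompt fits alongside the reserved output budget."""
    return prompt_tokens <= max_prompt_tokens(reserved_output)


print(max_prompt_tokens())               # 272000
print(fits_in_context(300_000))          # False
print(fits_in_context(300_000, 50_000))  # True
```

Reserving less than the full output limit frees room for a larger prompt, at the cost of capping how long the reply can be.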

Is this the most current Codex model?

No. OpenAI has since released GPT-5.1-Codex, GPT-5.2-Codex, and GPT-5.3-Codex with iterative improvements. Artificial Analysis recommends the newer variants for most use cases.

What is the main drawback compared to other reasoning models?

Time to first token averages nearly 13 seconds, which is high relative to other models in the same price bracket. Verbosity in outputs has also been noted as a minor friction point.
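
To see what that latency means for a whole response, the sketch below combines the time to first token with the 152.4 tokens-per-second throughput quoted in the overview. It is a back-of-the-envelope model that rounds the latency to 13 seconds and assumes a constant streaming rate; real timings vary with load and prompt size, and the function name is hypothetical.

```python
TIME_TO_FIRST_TOKEN_S = 13.0  # "nearly 13 seconds", rounded for illustration
TOKENS_PER_SECOND = 152.4     # throughput figure from the overview


def estimate_response_seconds(output_tokens: int) -> float:
    """Estimate wall-clock time for a streamed reply of `output_tokens` tokens."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TIME_TO_FIRST_TOKEN_S + output_tokens / TOKENS_PER_SECOND


for tokens in (200, 2_000, 20_000):
    print(f"{tokens:>6} tokens: about {estimate_response_seconds(tokens):.1f} s")
```

Under these assumptions the fixed startup delay dominates short replies, while for long generations the streaming time takes over and the high throughput pays off.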

How does it compare to Claude Opus for coding tasks?

GPT-5 Codex is faster and uses significantly fewer output tokens (roughly 72% fewer than Opus on similar tasks), making it more economical at volume. Opus tends to outperform it on deeply contextual, multi-phase reasoning tasks that span very long chains of logic.

What modalities does it support?

It accepts text and image input and produces text output. It also supports function calling, structured output, and a reasoning mode.
