
o3

Reasoning-focused o-series model optimized for long-horizon tasks.

About o3

Where previous reasoning models traded speed for depth, o3 arrives 87% cheaper than its predecessor o1 while posting benchmark scores that made the research community take notice: 96.7% on AIME 2024, 71.7% on SWE-Bench Verified, and a new record on Frontier Math. It earns those numbers by deliberating before it answers: hidden "thinking" steps let it decompose multi-stage problems in STEM, law, finance, and software engineering that models without a reasoning phase tend to trip over. People who work with o3 daily point to its software engineering instincts as the clearest differentiator, citing its ability to catch architectural issues and write production-quality code.

The caveats: o3 occasionally fabricates facts and inserts confident-sounding false details, so outputs in factual domains need verification. And because reasoning tokens are billed as output, real-world costs run 3–10x higher than the per-token list price implies. If you need deliberate, step-by-step problem-solving and can work with those tradeoffs, o3 is a strong choice.

Best for

  • Complex software engineering — code generation, debugging, and architecture decisions that benefit from multi-step reasoning
  • Advanced STEM problem-solving across mathematics, physics, and chemistry, including hypothesis generation for research
  • Multi-step business analysis — financial modeling, legal reasoning, and strategic planning that requires chaining many logical steps
  • Large document analysis — with a 200K-token context window, it handles lengthy codebases, contracts, or research corpora in a single pass
  • Multimodal technical tasks combining code execution, file analysis, and image interpretation via tool use

Specs & capabilities

How o3 stacks up — intelligence, speed, context, and modalities.

Capability          Value
Intelligence        Medium
Speed               Medium
Context window      200,000 tokens
Max output          100,000 tokens
Knowledge cutoff    Approximately June 2024

Frequently asked questions

What does o3 cost?

List price is $2.00 per million input tokens and $8.00 per million output tokens. However, hidden reasoning tokens are billed as output, so effective cost per task is typically 3–10x higher than those rates suggest. Prompt caching can reduce input costs by 60–80%.
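To make the gap between list price and effective cost concrete, here is a minimal sketch of the arithmetic. It uses the list prices quoted above; the function name `estimate_cost` and the token counts in the example are hypothetical, chosen only for illustration, and the sketch ignores prompt caching.

```python
# List prices quoted on this page (USD per million tokens).
INPUT_PRICE_PER_M = 2.00
OUTPUT_PRICE_PER_M = 8.00


def estimate_cost(input_tokens: int,
                  visible_output_tokens: int,
                  reasoning_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    Hypothetical helper: hidden reasoning tokens are added to the
    visible output tokens because both are billed at the output rate.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    total = input_tokens * INPUT_PRICE_PER_M + billed_output * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical task: 10,000 input tokens, 1,000 visible output tokens,
# and 8,000 hidden reasoning tokens (illustrative numbers, not measurements).
naive = estimate_cost(10_000, 1_000)
effective = estimate_cost(10_000, 1_000, reasoning_tokens=8_000)

print(f"naive: ${naive:.3f}, effective: ${effective:.3f}, "
      f"ratio: {effective / naive:.1f}x")
# prints "naive: $0.028, effective: $0.092, ratio: 3.3x"
```

In this example the reasoning tokens alone account for most of the bill, which is how a task can land at the low end of the 3–10x range even with a short visible answer. Actual reasoning-token counts vary by task and should be read from the usage data returned with each response rather than assumed.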

How large is the context window?

200,000 tokens input, with up to 100,000 tokens of output.

What is o3 genuinely best at?

Step-by-step reasoning in STEM and software engineering. It scored 71.7% on SWE-Bench Verified and 96.7% on AIME 2024, and users consistently highlight its code quality and ability to handle multi-stage technical problems.

Does o3 hallucinate?

Yes — users report that o3 sometimes invents false facts and inserts fabricated details with apparent confidence. Outputs in factual or research contexts should be verified. The o3-pro variant shows even higher hallucination rates.

How does o3 compare to o3-mini?

o3 is the full-size model with higher benchmark scores and a 200K context window. o3-mini is a distilled, lower-cost variant with reduced latency but lower overall capability. Choose o3-mini when speed and cost matter more than peak reasoning depth.

Who should skip o3?

Users who need fast responses or up-to-the-minute information should look elsewhere. o3's knowledge cuts off around June 2024, and its reasoning latency can be significant: the pro variant can take up to 15 minutes to respond.
