GPT-4.1

GPT-4 refinement designed for coding with broad tool compatibility.

About GPT-4.1

GPT-4.1 was built with developers in mind. On partial code edits, where GPT-4o reached 18.3% accuracy, GPT-4.1 reaches 52.9%, and its SWE-Bench Verified score of 54.6% is a 21-point jump over GPT-4o's. Paired with a 1 million token context window, it can hold entire codebases in a single conversation without losing the thread across files or modules. Developers report needing fewer re-prompts, less manual correction, and tighter instruction-following in agentic workflows.

It is also fast: output runs at roughly 130 tokens per second, more than double the field median, with latency under 0.8 seconds. At $2 input / $8 output per million tokens, it is noticeably cheaper than previous-generation equivalents.

Two caveats apply. Hallucinations remain a documented limitation, and like prior GPT models it can fabricate sources. The full 1M context is also API-only; the ChatGPT web interface caps at 32K tokens. For teams building code agents or processing large technical documents, GPT-4.1 is a meaningful step up in both capability and economics.

Best for

  • Software engineering agents — resolving GitHub issues, refactoring, and generating optimized code across multiple languages
  • Long-context document analysis — processing full codebases, lengthy specs, or multi-document research without chunking
  • Rapid prototyping — cost-effective option for developers adding AI-powered features to their products
  • Enterprise automation workflows — autonomous task execution in operations and engineering pipelines
  • Multilingual coding — debugging and generating code across programming languages with strong instruction adherence

Specs & capabilities

How GPT-4.1 stacks up — intelligence, speed, context, and modalities.

  • Intelligence: Low
  • Speed: Medium
  • Context window: 1,000,000 tokens
  • Max output: 32,768 tokens
  • Knowledge cutoff: May 2024

Frequently asked questions

What does GPT-4.1 cost?

$2.00 per million input tokens and $8.00 per million output tokens via the OpenAI API.
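
To make the pricing concrete, here is a minimal sketch of a per-request cost estimate at the rates quoted above. The function name and the example token counts are hypothetical, and the sketch ignores any discounts, caching, or batch pricing that may apply in practice.

```python
# Per-token prices in micro-dollars, derived from the listed rates:
# $2.00 and $8.00 per million tokens => 2 and 8 micro-dollars per token.
INPUT_MICRO_USD_PER_TOKEN = 2
OUTPUT_MICRO_USD_PER_TOKEN = 8


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in US dollars of one request at the listed rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Integer arithmetic in micro-dollars avoids accumulating float error.
    micro_usd = (input_tokens * INPUT_MICRO_USD_PER_TOKEN
                 + output_tokens * OUTPUT_MICRO_USD_PER_TOKEN)
    return micro_usd / 1_000_000


# Hypothetical request: 200,000 prompt tokens and 5,000 generated tokens.
print(f"${estimate_cost_usd(200_000, 5_000):.2f}")  # prints $0.44
```

Because input dominates most long-context requests, the input rate usually drives the bill: in the example, $0.40 of the $0.44 comes from the prompt.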

How large is the context window?

1 million tokens (1,047,576) via the API. Note: the ChatGPT web interface is capped at 32,000 tokens, so the full window is only available programmatically.
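
As a rough illustration of what the two limits mean in practice, the sketch below checks whether a prompt of a given size fits in each window. The helper name and the example sizes are hypothetical, and the sketch assumes that prompt tokens and reserved output tokens count against the same limit, which is worth confirming against the provider's documentation. Token counts are taken as inputs; counting them requires a tokenizer and is out of scope here.

```python
API_CONTEXT_TOKENS = 1_047_576    # API context window quoted above
CHATGPT_CONTEXT_TOKENS = 32_000   # ChatGPT web interface cap quoted above
MAX_OUTPUT_TOKENS = 32_768        # max output listed in the specs


def fits_in_window(prompt_tokens: int, context_limit: int,
                   reserved_output: int = 0) -> bool:
    """Return True if the prompt plus reserved output fits in the window.

    Assumption: prompt and output tokens share one limit.
    """
    if prompt_tokens < 0 or reserved_output < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + reserved_output <= context_limit


# Hypothetical 900,000-token codebase, reserving the full output budget.
print(fits_in_window(900_000, API_CONTEXT_TOKENS, MAX_OUTPUT_TOKENS))  # True
# The same prompt against the ChatGPT web cap.
print(fits_in_window(900_000, CHATGPT_CONTEXT_TOKENS, 1_000))          # False
```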

How does it compare to GPT-4o for coding?

It scores 54.6% on SWE-Bench Verified versus GPT-4o's 33.2% — a 21-percentage-point improvement — and hits 52.9% accuracy on partial code edits compared to GPT-4o's 18.3%.
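
A percentage-point gap and a relative improvement are different measures, and the figures above support both. The short sketch below derives each from the quoted scores; the function name is hypothetical, and the relative figures are plain arithmetic on the numbers above, not separately published results.

```python
def compare_scores(new_pct: float, old_pct: float) -> tuple[float, float]:
    """Return (percentage-point gain, relative gain in percent)."""
    if old_pct <= 0:
        raise ValueError("old_pct must be positive")
    points = new_pct - old_pct
    relative = points / old_pct * 100
    # Round to one decimal place to match the precision of the quoted scores.
    return round(points, 1), round(relative, 1)


# SWE-Bench Verified: GPT-4.1 at 54.6% versus GPT-4o at 33.2%.
print(compare_scores(54.6, 33.2))
# Partial code edits: GPT-4.1 at 52.9% versus GPT-4o at 18.3%.
print(compare_scores(52.9, 18.3))
```

The first value in each pair is the gap in percentage points (21.4 and 34.6); the second expresses the same gap relative to GPT-4o's score.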

What are its main limitations?

Hallucinations and source fabrication remain documented issues. Some users also report inconsistent latency under heavy workloads.

Who should choose GPT-4.1 over GPT-4.1 mini?

Teams that need maximum coding accuracy and long-context reasoning at scale. GPT-4.1 mini offers a balanced trade-off for lighter workloads at lower cost.

What is the knowledge cutoff?

Approximately May 2024, with some sources indicating June 2024 for certain variants in the family.

Related models