GPT-5.4 mini
Higher-capability GPT-5.4 mini for high-volume coding, computer use, and subagent workflows.
About GPT-5.4 mini
Speed and economy are the defining story of GPT-5.4 mini. Running more than twice as fast as its predecessor GPT-5 mini, this March 2026 model scores 54.4% on SWE-Bench Pro — within striking distance of the full GPT-5.4 at 57.7% — while costing roughly six times less. Its 400,000-token context window and strong computer-use performance (72.1% on OSWorld-Verified) make it a credible workhorse for agentic pipelines that need to interpret dense UIs and handle long documents without expensive truncation. Users consistently praise its throughput and its ability to punch well above its price class on coding tasks. The honest trade-off: instruction-following can be inconsistent under sustained load, with the model sometimes ignoring prompt rules or losing thread of constraints mid-conversation. It is also capped at reasoning_effort "high," so workloads requiring deep chain-of-thought should route to GPT-5.4 instead. For high-volume production systems where cost discipline matters and most tasks do not require peak reasoning, GPT-5.4 mini sets the current bar.
Best for
- High-volume coding assistance and terminal agents — 54.4% SWE-Bench Pro at a fraction of flagship cost
- Computer use and UI automation where visual understanding of screenshots and dense interfaces is required
- Production chat systems needing low latency and competitive quality without premium per-token spend
- Long-document analysis and retrieval tasks that benefit from the 400,000-token context window
- Multi-model routing as the default tier, escalating to GPT-5.4 only for reasoning-heavy exceptions
Specs & capabilities
How GPT-5.4 mini stacks up — intelligence, speed, context, and modalities.
Intelligence
High
Speed
Fast
Context window
400,000 tokens
Max output
128,000 tokens
Knowledge cutoff
August 31, 2025
Supported endpoints
v1/chat/completions · v1/responses · v1/realtime · v1/assistants · v1/batch
Input and output
Input: Text, Image
Output: Text
Availability notes
Cached input: $0.075 / 1M tokens · Web search, file search, image generation, code interpreter, hosted shell, apply patch, skills, computer use, MCP, and tool search supported · Fine-tuning not supported; distillation supported
Frequently asked questions
What does GPT-5.4 mini cost?
$0.75 per million input tokens and $4.50 per million output tokens. Cached input drops to $0.075 per million — a 90% discount — making repeated-context workloads significantly cheaper.
How large is the context window?
400,000 tokens input with up to 128,000 tokens of output, suitable for long documents, large codebases, or extended multi-turn sessions.
How does it compare to GPT-5.4?
GPT-5.4 mini scores 54.4% on SWE-Bench Pro versus GPT-5.4's 57.7%, while costing roughly six times less. The main gap is maximum reasoning depth — GPT-5.4 mini caps at reasoning_effort 'high' while GPT-5.4 supports 'pro'.
What is it genuinely not good at?
Deep reasoning tasks with little margin for error, complex spatial reasoning (such as interpreting 3D shapes from 2D patterns), and strict instruction-following under high load — it can inconsistently apply or ignore prompt rules.
Does it support images and tool use?
Yes. It accepts text and image inputs, and supports tool use, function calling, web search, file search, computer use, and extended thinking up to reasoning_effort 'high'.
Who should choose GPT-5.4 mini over GPT-5 mini?
Anyone upgrading from GPT-5 mini will get meaningfully better quality — the Arena ELO gap is 61 points — at similar or lower cost with noticeably faster output speeds (around 180 tokens per second).