GPT-5 mini
Responsive, budget-friendly member of the GPT-5 family.
About GPT-5 mini
GPT-5 mini occupies a deliberate middle position in OpenAI's August 2025 lineup. It is more capable than GPT-4o mini, far cheaper than full GPT-5, and built for high-volume workloads where every API dollar is accounted for. At $0.25 per million input tokens, with a 90% discount on cached input ($0.025), it brings GPT-5-class instruction-following and multimodal reasoning to applications that must run at scale without blowing their budget. A 400,000-token context window and adjustable reasoning effort (minimal, low, medium, or high) let you trade cost against depth on each request. Developers building production chatbots, content moderation pipelines, and subagent backends consistently praise its throughput of over 100 tokens per second and its reliable safety tuning.
That said, its time-to-first-token of nearly 75 seconds, far above the typical reasoning-model median, makes it a poor fit for interactive, real-time endpoints. Its May 2024 knowledge cutoff is also showing its age. And with GPT-5.4 mini now available at roughly 2x the speed, teams starting a new project should weigh whether the newer model fits their needs better.
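Because reasoning effort is set per request, one deployment can mix cheap and deep calls. The sketch below assembles keyword arguments for the OpenAI Python SDK's `client.responses.create` method. It assumes the Responses API's `reasoning={"effort": ...}` parameter and the `gpt-5-mini` model id. The task names and the task-to-effort mapping are hypothetical and would need tuning against your own traffic.

```python
# Reasoning-effort levels accepted by GPT-5-family models, cheapest first.
EFFORT_LEVELS = ("minimal", "low", "medium", "high")

# Hypothetical mapping from task type to effort: shallow, well-defined work
# gets "minimal"; multi-step analysis gets "high".
TASK_EFFORT = {
    "classify": "minimal",
    "summarize": "low",
    "support_reply": "medium",
    "debug_code": "high",
}


def build_request(task: str, prompt: str) -> dict:
    """Return kwargs for client.responses.create(**kwargs).

    Unknown task types fall back to "medium".
    """
    effort = TASK_EFFORT.get(task, "medium")
    return {
        "model": "gpt-5-mini",
        "input": prompt,
        "reasoning": {"effort": effort},
    }


if __name__ == "__main__":
    print(build_request("classify", "Is this message spam? ..."))
```

Passing the result as `client.responses.create(**build_request(task, prompt))` sends a single request. Keeping the mapping in one table makes it easy to audit which request types pay for deeper reasoning.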
Best for
- High-volume API workloads: customer support, content moderation, and summarization pipelines where cost per request matters more than peak capability
- Production chatbots and conversational interfaces that need predictable costs and GPT-5-class instruction-following at scale
- Subagent and tool-use workflows where a reliable, well-priced backend model handles well-defined tasks and hands off results
- Code generation and debugging assistants that benefit from multimodal input and structured output support without the cost of full GPT-5
- Cost-sensitive batch processing with cache-friendly workloads that exploit the 90% cached-input discount
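How much the cached-input discount saves depends on the share of input tokens that actually hit the cache. OpenAI applies cached pricing when a request reuses a recently seen prompt prefix, so workloads with a long static prefix benefit most. Here is a minimal sketch of the effective input price at a given hit rate, using the $0.25 and $0.025 per-million rates quoted on this page. The hit rates in the loop are illustrative, not measured.

```python
UNCACHED_INPUT = 0.25  # USD per 1M uncached input tokens
CACHED_INPUT = 0.025   # USD per 1M cached input tokens (90% off)


def effective_input_price(cache_hit_rate: float) -> float:
    """Average USD per 1M input tokens when `cache_hit_rate` of them are cached."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return cache_hit_rate * CACHED_INPUT + (1 - cache_hit_rate) * UNCACHED_INPUT


for rate in (0.0, 0.5, 0.8, 0.95):
    print(f"{rate:.0%} cached -> ${effective_input_price(rate):.4f} per 1M input tokens")
```

The relationship is linear, so each additional 10 points of hit rate trims about $0.0225 per million input tokens.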
Specs & capabilities
How GPT-5 mini stacks up — intelligence, speed, context, and modalities.
- Intelligence: Medium
- Speed: Medium
- Context window: 400,000 tokens
- Max output: 128,000 tokens
- Knowledge cutoff: May 31, 2024
Frequently asked questions
How much does GPT-5 mini cost?
Input is $0.25 per million tokens ($0.025 with cached input, a 90% discount) and output is $2.00 per million tokens. For a workload that mixes cached input, uncached input, and output tokens in a 7:2:1 ratio, the blended rate works out to roughly $0.27 per million tokens.
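The blended figure is a weighted average of the three list prices. This sketch reproduces it, assuming the 7:2:1 ratio means cached input : uncached input : output, which is the reading that yields roughly $0.27. Swap in your own token mix to estimate a different workload.

```python
# GPT-5 mini list prices in USD per 1M tokens, as quoted above.
PRICES = {"cached_input": 0.025, "input": 0.25, "output": 2.00}


def blended_rate(mix: dict) -> float:
    """Weighted average USD per 1M tokens for a mix given as relative weights."""
    total = sum(mix.values())
    return sum(PRICES[kind] * weight for kind, weight in mix.items()) / total


# 7 parts cached input : 2 parts uncached input : 1 part output
rate = blended_rate({"cached_input": 7, "input": 2, "output": 1})
print(f"${rate:.4f} per 1M tokens")
```

This prints $0.2675 per 1M tokens. Output tokens are a tenth of the mix yet account for about 75% of the cost, which is why output pricing matters so much for this model.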
What is the context window?
400,000 tokens, roughly equivalent to 800 pages of text. Maximum output per response is 128,000 tokens.
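When sizing prompts, it helps to reserve room for the reply up front. The sketch below assumes the 400,000-token window is shared by input and output, so a full 128,000-token reply leaves 272,000 tokens for the prompt. It also assumes reasoning tokens, which are billed as output, draw on the same output budget. The function name is illustrative.

```python
CONTEXT_WINDOW = 400_000  # total tokens per request (input + output, assumed shared)
MAX_OUTPUT = 128_000      # cap on output tokens per response


def max_input_tokens(reserved_output: int) -> int:
    """Largest prompt that still leaves `reserved_output` tokens for the reply."""
    if not 0 < reserved_output <= MAX_OUTPUT:
        raise ValueError(f"reserved_output must be in 1..{MAX_OUTPUT}")
    return CONTEXT_WINDOW - reserved_output


print(max_input_tokens(MAX_OUTPUT))  # 272000
print(max_input_tokens(4_000))       # 396000
```

Reserving a smaller output budget frees more of the window for input. With reasoning enabled, though, leave headroom above the visible answer length.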
What is GPT-5 mini good at?
High-throughput batch work, production chat applications, content moderation, subagent task execution, and code generation — scenarios where cost efficiency and reliability matter more than maximum reasoning depth.
What are the main limitations?
Time-to-first-token is around 75 seconds, making it unsuitable for interactive real-time endpoints. The knowledge cutoff is May 2024, and fine-tuning is not supported. Output pricing at $2.00 per million tokens is also higher than some mid-tier alternatives.
How does it compare to GPT-5.4 mini?
GPT-5.4 mini (released March 2026) is approximately 2x faster and performs better on several coding and agent benchmarks. GPT-5 mini is roughly 2.3x cheaper, so it remains competitive for cache-heavy or batch workloads where latency is not critical.
Should I still choose GPT-5 mini for a new project?
For interactive or latency-sensitive applications, GPT-5.4 mini is generally the better starting point. GPT-5 mini is still a practical choice for high-volume, non-real-time workflows where its lower cost per token outweighs the speed difference.