Question 1

How much does GPT-5 mini cost?

Accepted Answer

Input is $0.25 per million tokens ($0.025 for cached input, a 90% discount) and output is $2.00 per million tokens. At a 7:2:1 mix of cached input, uncached input, and output tokens, the blended rate works out to roughly $0.27 per million tokens.
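The blended figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the 7:2:1 ratio means cached input : uncached input : output tokens, which is the reading that reproduces the quoted figure. The function name `blended_rate` is made up for illustration.

```python
# Prices in USD per million tokens, as quoted in the answer above.
PRICE_CACHED_INPUT = 0.025
PRICE_UNCACHED_INPUT = 0.25
PRICE_OUTPUT = 2.00


def blended_rate(cached_parts, uncached_parts, output_parts):
    """Return the weighted average price per million tokens.

    The three arguments are relative shares of the token mix
    (assumed order: cached input, uncached input, output).
    """
    total_parts = cached_parts + uncached_parts + output_parts
    if total_parts <= 0:
        raise ValueError("the token mix must have a positive total")
    weighted_cost = (
        cached_parts * PRICE_CACHED_INPUT
        + uncached_parts * PRICE_UNCACHED_INPUT
        + output_parts * PRICE_OUTPUT
    )
    return weighted_cost / total_parts


# Discount for cached input relative to uncached input.
cached_discount = 1 - PRICE_CACHED_INPUT / PRICE_UNCACHED_INPUT

rate = blended_rate(7, 2, 1)
print(f"cached-input discount: {cached_discount:.0%}")
print(f"blended rate: ${rate:.4f} per million tokens")
```

With a different traffic mix, only the three arguments change. Output tokens dominate the result even at a 10% share, because they cost eight times as much as uncached input.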

Question 2

What is the context window?

Accepted Answer

400,000 tokens, roughly equivalent to 800 pages of text. Maximum output per response is 128,000 tokens.
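A rough pre-flight check against these limits might look like the sketch below. It assumes that input and output tokens share the 400,000-token window, which the answer above does not state explicitly, and that token counts are already known. The helper name `fits_in_window` is hypothetical.

```python
# Limits quoted in the answer above.
CONTEXT_WINDOW = 400_000
MAX_OUTPUT = 128_000


def fits_in_window(input_tokens, requested_output_tokens):
    """Return True if a request stays within both limits.

    Assumption: input and output tokens count against the same
    context window, so their sum must not exceed CONTEXT_WINDOW.
    """
    if input_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if requested_output_tokens > MAX_OUTPUT:
        return False
    return input_tokens + requested_output_tokens <= CONTEXT_WINDOW


print(fits_in_window(272_000, 128_000))
print(fits_in_window(300_000, 128_000))
```

Under that assumption, a request that reserves the full 128,000 output tokens leaves 272,000 tokens for input.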

Question 3

What is GPT-5 mini good at?

Accepted Answer

High-throughput batch work, production chat applications, content moderation, subagent task execution, and code generation — scenarios where cost efficiency and reliability matter more than maximum reasoning depth.

Question 4

What are the main limitations?

Accepted Answer

Time-to-first-token is around 75 seconds, making it unsuitable for interactive real-time endpoints. The knowledge cutoff is May 2024, and fine-tuning is not supported. Output pricing at $2.00 per million tokens is also higher than some mid-tier alternatives.

Question 5

How does it compare to GPT-5.4 mini?

Accepted Answer

GPT-5.4 mini (released March 2026) is approximately 2x faster and performs better on several coding and agent benchmarks. GPT-5 mini is roughly 2.3x cheaper, so it remains competitive for cache-heavy or batch workloads where latency is not critical.

Question 6

Should I still choose GPT-5 mini for a new project?

Accepted Answer

For interactive or latency-sensitive applications, GPT-5.4 mini is generally the better starting point. GPT-5 mini is still a practical choice for high-volume, non-real-time workflows where its lower cost per token outweighs the speed difference.
