GPT audio mini · Oct. 2025
Pinned October 2025 snapshot of GPT Audio mini for stable behavior.
You always get the exact model you pick — we never silently route you to another.
About GPT audio mini · Oct. 2025
When voice is the interface and per-request cost is the constraint, GPT Audio Mini earns its place. This October 2025 snapshot slashes audio processing costs to $0.60 per million input tokens — a 98% reduction from the standard GPT Audio model at $32 — making production-scale voice applications genuinely viable without compromising on natural-sounding output. The upgraded decoder generates human-like speech, and native speech-to-speech processing eliminates the cascading errors that come from chaining separate transcription and synthesis steps. Developers building customer support hotlines, IVR flows, and high-volume voice bots have found the cost efficiency transformative. Custom voice consistency is notably improved, and acoustic robustness holds up well against background noise. The honest tradeoff: some developers have hit HTTP 500 errors in standard Chat Completion calls despite its Generally Available status, and the knowledge cutoff is frozen at October 2023. It is a delivery mechanism for audio interaction, not a reasoning engine — deep inference tasks belong elsewhere. Note that gpt-audio-mini-2025-12-15 is a newer snapshot with further reliability improvements.
Best for
- Customer support voice hotlines and IVR systems where natural speech quality and low per-call cost both matter
- High-volume AI phone bots and appointment or FAQ voice bots at production scale
- Voice assistants embedded in applications where audio cost is a critical budget factor
- Speech-to-speech pipelines that benefit from native multimodal processing over chained STT and TTS steps
- Audio summarization — converting text content into spoken output at scale
Specifications
| Provider | OpenAI |
|---|---|
| Released | 2025-10 |
| Context window | 128,000 tokens |
| Max output | 16,384 tokens |
| Knowledge cutoff | October 1, 2023 |
| Input price | $0.60 / 1M tokens |
| Output price | $2.40 / 1M tokens |
| Request cost | 3 base requests |
| Plan tier | Base |
| Model ID | gpt-audio-mini-2025-10-06 |
Frequently asked questions
$0.60 per million audio input tokens and $2.40 per million audio output tokens at OpenAI list pricing — roughly 98% cheaper than the standard GPT Audio model.
128,000 tokens, with a maximum of 16,384 output tokens per response.
It accepts both audio and text inputs and can generate both audio and text outputs. It works through the Chat Completions, Responses, and Realtime APIs.
Streaming, structured outputs, and fine-tuning are not available. It is also not designed for complex multi-step reasoning — that belongs to general-purpose models.
They are different models. gpt-4o-mini is a text-focused model; gpt-audio-mini is optimized specifically for audio input and output and should not be used as a text-only model.
No. gpt-audio-mini-2025-10-06 is an October 2025 snapshot. A newer snapshot, gpt-audio-mini-2025-12-15, was released in December 2025 with improved voice quality and reliability fixes.