GPT audio mini · Dec. 2025
Pinned December 2025 snapshot of GPT Audio mini for stable behavior.
You always get the exact model you pick — we never silently route you to another.
About GPT audio mini · Dec. 2025
GPT audio mini is OpenAI's budget-tier native audio model — the first in the mini class to handle both audio input and output directly, without stitching together separate speech-to-text and text-to-speech pipelines. That integrated approach matters: the December 2025 snapshot cuts transcription hallucinations by 89% compared to Whisper and shaves 35% off word error rates on Common Voice and FLEURS benchmarks, making it a credible workhorse for batch audio processing, voiceover generation, and spoken summarization at $0.60 per million input tokens. Developers building scaled async workflows consistently point to the cost-efficiency and improved instruction following (up 22% in the December release) as the main draws. The honest caveat: production deployments have hit real HTTP 500 errors, and the same December snapshot that improved accuracy also introduced regressions in tone and style adherence that have forced some teams to revise their prompts. It is an affordable, capable audio model that rewards careful error handling — not a drop-in solution for latency-sensitive or mission-critical voice apps.
Best for
- Batch audio summarization and transcription where real-time response is not required
- Voiceover and podcast post-production pipelines that benefit from native audio-in/audio-out processing
- Cost-sensitive voice applications that need to scale without the expense of the full gpt-audio model
- Multi-language audio processing, with optimized support for Chinese, Japanese, Indonesian, Hindi, Bengali, and Italian
- Asynchronous audio sentiment analysis and spoken summary generation from text or audio sources
Specifications
| Provider | OpenAI |
|---|---|
| Released | 2025-12 |
| Context window | 128,000 tokens |
| Max output | 16,384 tokens |
| Knowledge cutoff | October 1, 2023 |
| Input price | $0.60 / 1M tokens |
| Output price | $2.40 / 1M tokens |
| Request cost | 3 base requests |
| Plan tier | Base |
| Model ID | gpt-audio-mini-2025-12-15 |
Frequently asked questions
$0.60 per million input tokens and $2.40 per million output tokens — roughly 10x cheaper than the full gpt-audio model.
128,000 tokens, with a maximum of 16,384 output tokens per request.
Text and audio for both input and output. It does not support image or video inputs, and does not offer streaming, structured outputs, fine-tuning, or predicted outputs.
No. GPT audio mini targets asynchronous batch workflows. For low-latency, real-time voice applications, OpenAI's gpt-realtime-mini is the more appropriate choice.
October 1, 2023. Applications requiring current information must implement retrieval augmentation — the model cannot reason about events after that date on its own.
The gpt-audio-mini alias automatically points to the 2025-12-15 snapshot, which is the current recommended version. This snapshot delivers the hallucination and word error rate improvements noted above, though it also introduced some tone and style adherence regressions worth testing against your prompts.