Question 1

What does GPT audio mini cost?

Accepted Answer

Input is priced at $0.60 per million tokens and output at $2.40 per million tokens. Note that audio tokens carry roughly a 6.4x premium over standard text tokens in OpenAI's pricing model, so factor that in for high-volume audio workloads.

Question 2

What is the context window?

Accepted Answer

128,000 tokens with a maximum of 16,384 output tokens per request.

Question 3

What modalities does it support?

Accepted Answer

It accepts audio and text as inputs and produces both audio and text as outputs. It does not support image, video, or structured output formats, and does not offer fine-tuning.

Question 4

Is it suitable for real-time conversations?

Accepted Answer

It is optimized for low-latency back-and-forth voice exchanges and works with the Chat Completions and Responses APIs, but it does not support full-duplex audio (simultaneous listening and speaking) and cannot be used with the Realtime API streaming endpoint.

Question 5

What is the knowledge cutoff?

Accepted Answer

October 1, 2023 — meaning it has no awareness of events or developments after that date. This limits its usefulness for tasks requiring current information.

Question 6

How does it compare to the full GPT Audio model?

Accepted Answer

GPT audio mini is the cost-efficient variant, trading some capability for significantly lower pricing. It targets scaled voice applications and routine transcription rather than complex or nuanced audio reasoning tasks.

Provider	OpenAI
Released	2025-12
Context window	128,000 tokens
Max output	16,384 tokens
Knowledge cutoff	October 1, 2023
Input price	$0.60 / 1M tokens
Output price	$2.40 / 1M tokens
Request cost	3 base requests
Plan tier	Base
Model ID	gpt-audio-mini

GPT audio mini

About GPT audio mini

Best for

Specifications

Frequently asked questions

Related models