About GPT-4o
Speed is GPT-4o's defining trait. Where comparable models average 61 tokens per second, GPT-4o delivers nearly 200, and its native audio pipeline hits 320ms response latency, making it the practical choice for voice interfaces and real-time chat. It also collapses text, image, and audio processing into a single unified model rather than routing across separate systems, which produces more coherent multimodal reasoning without the awkward handoffs.
Users feel this difference acutely. When OpenAI tried to retire GPT-4o in early 2026, the backlash (petitions, mass unsubscribe threats, and user surveys suggesting 95% found no adequate replacement) was fierce enough to reverse the decision. That kind of loyalty comes from how the model feels in practice: snappy, versatile, fluent across 50+ languages, and capable of web search that reasoning-focused models like o1 lack.
The honest caveat: GPT-4o trades raw reasoning depth for speed. It scores below average on Artificial Analysis's Intelligence Index and struggles with complex multi-step logic. For hard reasoning or large-document tasks, newer models outclass it. For fast, general-purpose, multimodal work, few match it.
Best for
- Real-time voice assistants and live chat applications where sub-400ms response latency matters
- Multimodal workflows combining text, images, and audio in a single coherent pipeline
- General-purpose tasks — creative writing, translation, summarization, and Q&A across 50+ languages
- Web-enabled research and browsing tasks (unavailable in the o1 reasoning model)
- Rapid prototyping and iteration cycles where response speed accelerates the feedback loop
Specs & capabilities
How GPT-4o stacks up — intelligence, speed, context, and modalities.
- Intelligence: Low
- Speed: Fast
- Context window: 128,000 tokens
- Knowledge cutoff: October 2023
GPT-4o retirement (ChatGPT)
OpenAI retired GPT-4o inside ChatGPT on February 13, 2026. It remains available through the OpenAI API.
Frequently asked questions
What does GPT-4o cost?
Input is $2.50 per million tokens, and output is $10.00 per million tokens. Cached input drops to $1.25/M. A blended real-world rate works out to roughly $2.46/M tokens, though the exact figure depends on the mix of uncached input, cached input, and output tokens.
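To see how the traffic mix drives a blended rate, here is a minimal sketch of the arithmetic using the three list prices quoted above. The function name and the example token counts are illustrative assumptions, not measured usage.

```python
# Per-million-token list prices quoted in the answer above (USD).
INPUT_PRICE = 2.50
CACHED_INPUT_PRICE = 1.25
OUTPUT_PRICE = 10.00


def blended_rate(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Return the average cost in USD per million tokens for a given mix."""
    total = input_tokens + cached_tokens + output_tokens
    if total == 0:
        raise ValueError("token counts must not all be zero")
    # Each term is tokens * (USD per million tokens); dividing by the total
    # token count yields USD per million tokens for the whole mix.
    weighted = (
        input_tokens * INPUT_PRICE
        + cached_tokens * CACHED_INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
    )
    return weighted / total


if __name__ == "__main__":
    # Hypothetical mix: three uncached input tokens per output token, no caching.
    print(round(blended_rate(3_000_000, 0, 1_000_000), 3))
```

With that hypothetical 3:1 uncached mix the result is $4.375/M. Any blend below $2.50/M requires some cached input, because uncached input is the cheapest non-cached token type.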
How large is the context window?
128,000 tokens. That covers long documents and extended conversations, though newer models like GPT-4.1 offer up to 1 million tokens if you need to process very large files.
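A rough pre-flight check for whether a document fits can be sketched as follows. It assumes about four characters per token, a common rule of thumb for English text rather than an exact tokenizer count. The function name and the reserved-output default are hypothetical.

```python
CONTEXT_WINDOW = 128_000  # GPT-4o context window in tokens, as stated above
CHARS_PER_TOKEN = 4       # assumption: rough average for English text


def fits_in_context(text: str, reserved_for_output: int = 4_000) -> bool:
    """Estimate whether `text` plus a reserved output budget fits in the window."""
    estimated_tokens = -(-len(text) // CHARS_PER_TOKEN)  # ceiling division
    return estimated_tokens + reserved_for_output <= CONTEXT_WINDOW


if __name__ == "__main__":
    print(fits_in_context("word " * 50_000))   # 250,000 characters
    print(fits_in_context("word " * 200_000))  # 1,000,000 characters
```

For borderline cases, count tokens with the model's actual tokenizer instead of relying on the heuristic.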
Is GPT-4o good at coding?
It handles everyday coding tasks well, but it is not OpenAI's strongest model for complex multi-step code and automation workflows — GPT-4.5 or GPT-5 are better choices for those.
Can GPT-4o browse the web?
Yes. Unlike OpenAI's o1 reasoning model, GPT-4o supports web search, making it suitable for tasks that require current information beyond its October 2023 knowledge cutoff.
How does GPT-4o compare to GPT-4.1?
GPT-4o is significantly faster but scores below average on reasoning benchmarks. GPT-4.1 offers a 1M-token context window and stronger performance on complex tasks, and at list API prices of $2.00 input and $8.00 output per million tokens it is also cheaper than GPT-4o.
Was GPT-4o really almost retired?
Yes. OpenAI scheduled it for removal in February 2026 but reversed course after widespread user backlash. It remains available in both ChatGPT and the API alongside newer models.