Model page

GPT-4o · Aug. 2024

August 2024 checkpoint of gpt-4o with enhanced capabilities.

About GPT-4o · Aug. 2024

The August 2024 checkpoint of GPT-4o is the version that made structured outputs — native JSON Schema enforcement — a first-class API feature, making it the go-to choice for developers building typed, reliable integrations rather than parsing freeform text. Running at roughly 198 tokens per second with a 16,400-token output ceiling, it handles long-form generation tasks that would have choked GPT-4 Turbo, at half the price. Users consistently single out the combination of speed, output length, and multimodal breadth as genuinely practical rather than merely impressive: processing text, images, audio, and video through a single model simplifies pipelines considerably. The honest caveat is that output consistency isn't ironclad — repeating the same task can yield meaningfully different results — so workflows that demand deterministic precision may need additional scaffolding. For cost-sensitive production systems or any application that benefits from structured data extraction, this checkpoint remains a well-calibrated middle ground between economy and capability.

Best for

  • Structured data extraction and typed API responses using native JSON Schema support
  • Long-form content generation taking advantage of the 16,400-token output limit
  • Multimodal pipelines that process images, audio, or video alongside text in a single call
  • High-volume production workloads where the 50% Batch API discount makes scale economics work
  • Multilingual applications and global products requiring strong non-English reasoning

Specs & capabilities

How GPT-4o · Aug. 2024 stacks up — intelligence, speed, context, and modalities.

Capability

Intelligence

Low

Capability

Speed

Fast

Capability

Context window

128,000 tokens

Capability

Max output

16,400 tokens

Capability

Knowledge cutoff

October 2023

ChatGPT

GPT‑4o retirement (ChatGPT)

OpenAI retired GPT‑4o inside ChatGPT on February 13, 2026. It remains available through the OpenAI API.

Frequently asked questions

What makes the 2024-08-06 checkpoint different from other GPT-4o versions?

This specific checkpoint introduced native structured outputs (JSON Schema enforcement) and was fine-tuned for API reliability, including improved function calling and instruction following. It also became the first GPT-4o checkpoint to support full fine-tuning.

What does it cost?

$2.50 per million input tokens and $10.00 per million output tokens. Non-urgent workloads can use the Batch API for a 50% discount: $1.25 input / $5.00 output per million tokens.

How large is the context window?

128,000 tokens input context with a maximum of 16,400 output tokens per response — significantly higher than the 4,000-token output cap on GPT-4 Turbo.

What are its weaknesses?

Output consistency can vary when the same prompt is run multiple times, which matters for deterministic workflows. For extremely complex analytical reasoning, users have noted it is a step below GPT-4 Turbo in nuanced depth.

Is it slower than earlier GPT-4o releases?

Yes — community benchmarks found this checkpoint 50–80% slower than the May 2024 original release, a trade-off from fine-tuning for structured outputs and API reliability rather than raw throughput.

Who should choose this model over a cheaper option like GPT-4o mini?

Teams that need GPT-4-level reasoning, multimodal inputs, or native structured outputs at scale. GPT-4o mini is faster and cheaper for simple text tasks, but this checkpoint is the better fit when output quality, format reliability, or long document generation is the priority.

Related models