
GPT-5 nano

Ultra light-touch assistant for simple interactions.

About GPT-5 nano

When the goal is to process millions of tokens at the lowest possible cost, GPT-5 nano delivers. At $0.05 per million input tokens, one-fifth the input price of GPT-5 mini, it is purpose-built for high-volume, latency-tolerant workloads: classifying support tickets, extracting structured fields from documents, routing messages, and running bulk summarization over contexts of up to 400K tokens without truncation. Two things keep these pipelines fast and cheap: throughput of 163.8 tokens per second, above average for this tier, and an 80% prompt-caching discount that cuts the cost of repeated-context jobs further.

The tradeoff is real. Time-to-first-token sits around 84 seconds, far above the median of roughly one second, so interactive use is a poor fit. The model also tends toward verbosity, generating far more tokens than similarly sized peers, which adds to output cost. If your workload is batch-oriented and unit cost per operation matters more than conversational snappiness, nano earns its place. For tasks requiring nuanced reasoning or real-time responses, step up to GPT-5 mini.

Best for

  • High-volume document classification and content labeling at scale
  • Multi-document summarization using its full 400K-token context window
  • Structured data extraction from invoices, contracts, and unstructured text
  • Automated content routing and support-ticket triage
  • Bulk translation and text transformation pipelines where cost per token is the primary constraint

Specs & capabilities

How GPT-5 nano stacks up: intelligence, speed, context window, output limit, and knowledge cutoff.

  • Intelligence: Low
  • Speed: Fast
  • Context window: 400,000 tokens
  • Max output: 128,000 tokens
  • Knowledge cutoff: May 30, 2024

Frequently asked questions

How much does GPT-5 nano cost?

Input is $0.05 per million tokens and output is $0.40 per million tokens. Cached input tokens receive an 80% discount, dropping to $0.01 per million tokens, a significant saving for workflows that reuse the same long context repeatedly.
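To make the arithmetic concrete, here is a minimal sketch of a per-request cost estimate using the three prices quoted in this answer. The function name and the example token counts are hypothetical, chosen only for illustration. The sketch assumes that cached tokens are a subset of the input tokens and are billed at the cached rate instead of the standard input rate.

```python
# Prices in USD per million tokens, as quoted in the answer above.
INPUT_PER_M = 0.05
CACHED_INPUT_PER_M = 0.01
OUTPUT_PER_M = 0.40


def estimate_cost_usd(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the cost of one request in USD.

    Assumption: cached_input_tokens is the part of input_tokens that is
    billed at the cached rate; the remainder is billed at the full rate.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    uncached = input_tokens - cached_input_tokens
    total = (
        uncached * INPUT_PER_M
        + cached_input_tokens * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000


# Hypothetical workload: 2,000 input tokens per request, 1,500 of them
# cached, and 200 output tokens.
per_request = estimate_cost_usd(2_000, 200, cached_input_tokens=1_500)
print(f"{per_request:.6f}")              # 0.000120
print(f"{per_request * 1_000_000:.2f}")  # 120.00 for one million requests
```

In this example the 200 output tokens account for two-thirds of the cost, so for short classification or routing jobs output length is usually the number to watch.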

What is the context window?

GPT-5 nano has a 400,000-token context window and can generate up to 128,000 output tokens per call, which makes it well suited to processing full documents or long conversation histories in a single request.
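When planning a pipeline, it can help to check up front whether a job fits in one call. The sketch below is a conservative budget check built on the two limits above. It assumes that prompt tokens and requested output tokens share the 400,000-token window, which this page does not state explicitly, and it takes token counts as plain integers that you obtain from your own tokenizer. The function names are hypothetical.

```python
# Limits as listed on this page.
CONTEXT_WINDOW = 400_000
MAX_OUTPUT = 128_000


def fits_in_one_call(prompt_tokens, max_output_tokens):
    """Return True if the request fits under both limits.

    Conservative assumption: prompt and output share the context window.
    """
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (
        max_output_tokens <= MAX_OUTPUT
        and prompt_tokens + max_output_tokens <= CONTEXT_WINDOW
    )


def max_prompt_tokens(max_output_tokens):
    """Largest prompt that still leaves room for the requested output."""
    if not 0 <= max_output_tokens <= MAX_OUTPUT:
        raise ValueError("max_output_tokens must be between 0 and MAX_OUTPUT")
    return CONTEXT_WINDOW - max_output_tokens


print(fits_in_one_call(300_000, 4_000))  # True
print(fits_in_one_call(399_000, 4_000))  # False: prompt plus output exceeds the window
print(max_prompt_tokens(8_000))          # 392000
```

If a document set does not fit, split it into chunks that each pass the check and summarize the chunk summaries in a final call.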

Is GPT-5 nano good for real-time chat applications?

No. Its time-to-first-token averages around 84 seconds, which is far above the median of about one second for comparable models. It is better suited to batch and asynchronous workloads than interactive, low-latency use cases.
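For a rough sense of what this means in a batch job, the sketch below combines the roughly 84-second time-to-first-token with the 163.8 tokens-per-second throughput quoted in the overview. It is a simplified model, not a benchmark: it assumes tokens stream at a constant rate after the first one, and that requests run in equal-sized waves at a fixed concurrency. The function names and the example workload are hypothetical.

```python
import math

# Figures quoted on this page.
TTFT_SECONDS = 84.0
TOKENS_PER_SECOND = 163.8


def estimate_request_seconds(output_tokens):
    """Wait for the first token, then stream the rest at a constant rate."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SECOND


def estimate_batch_seconds(num_requests, output_tokens, concurrency):
    """Simplified wall-clock estimate: requests run in waves of `concurrency`."""
    if num_requests < 0 or concurrency < 1:
        raise ValueError("num_requests must be >= 0 and concurrency >= 1")
    waves = math.ceil(num_requests / concurrency)
    return waves * estimate_request_seconds(output_tokens)


# Hypothetical job: 1,000 requests, 1,638 output tokens each, 50 at a time.
print(f"{estimate_request_seconds(1_638):.1f}")          # 94.0
print(f"{estimate_batch_seconds(1_000, 1_638, 50):.0f}")  # 1880
```

In this model the fixed wait dominates each request, so higher concurrency shortens a batch far more than trimming output length does.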

How does it compare to GPT-5 mini?

GPT-5 nano is roughly 5x cheaper on blended pricing and faster in raw token throughput, but trails GPT-5 mini by around 10 percentage points on structured tasks and is meaningfully weaker at multi-step reasoning. Choose nano for bulk pipelines where cost dominates; choose mini when accuracy or reasoning depth matters more.

What is GPT-5 nano's knowledge cutoff?

Training data ends May 30, 2024. Events and developments from mid-2024 onward are outside its knowledge, which is worth considering for tasks that depend on current information.

Can I use GPT-5 nano in the ChatGPT web interface?

No. Unlike GPT-5 mini, nano is only accessible via the OpenAI API. It is not available in the ChatGPT consumer interface.

Related models