
Grok-4.20 Reasoning

xAI's flagship Grok 4.20 reasoning model with a 2M-token context window, stronger multi-step reasoning, and native tool support.

About Grok-4.20 Reasoning

Grok-4.20 Reasoning is built on xAI's experimental multi-agent architecture, in which four AI instances debate, specialize, and synthesize results before returning an answer. That makes it unusual among reasoning models in how it arrives at conclusions. Users consistently praise its precision on complex logic, multi-step math, and scientific problems, and its factual accuracy stands out: it scored 78% on the AA-Omniscience benchmark, a record among tested models at launch. It also defies the usual reasoning-model speed penalty, delivering 170–197 tokens per second against an industry median near 62. The 2 million token context window lets you feed entire codebases or multi-year document archives in a single request.

The honest caveat: reasoning tokens accumulate quickly and there is no user-side control over thinking depth, so even simple queries carry the full reasoning cost. Responses also run verbose by default; expect more words than you'd get from Claude or ChatGPT on the same prompt. At $1.25 per million input tokens, it is competitively priced for a frontier reasoning model, though at $2.50 per million output tokens the bill climbs when extended thinking kicks in, because reasoning tokens are billed as output.

Best for

  • Graduate-level math and scientific reasoning: multi-step proofs, physics problems, and chemical simulations where precision matters more than brevity
  • Large-codebase analysis: loading entire repositories into the 2M token context window for architectural review, debugging, or cross-file refactoring
  • Legal and technical document review: processing complete contracts or compliance specifications in a single pass without chunking
  • Agentic workflows: function calling and structured outputs for orchestrating complex, multi-step task pipelines
  • Research synthesis: analyzing long papers, policy documents, or datasets with full context retention across up to 2 million tokens

Specs & capabilities

How Grok-4.20 Reasoning stacks up — intelligence, speed, context, and modalities.

Intelligence: High
Speed: Fast
Context window: 2,000,000 tokens
Max output: 2,000,000 tokens
Knowledge cutoff: Not disclosed

API

Supported endpoints

v1/chat/completions · v1/responses · v1/batch

Modalities

Input and output

Input: Text, Image
Output: Text

Features

Structured outputs, web search, X search, function calling, and code execution supported

Availability notes

10M TPM (tokens per minute) · 1,800 RPM (requests per minute) · Cached input: $0.20 / 1M tokens · Higher context pricing applies above 200K context
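
The two rate limits above interact: small requests hit the request cap first, while large ones hit the token cap first. The sketch below is a back-of-the-envelope calculation, not xAI's accounting. It assumes the token limit counts each request's total tokens (input plus output), which this page does not state, and the function name is hypothetical.

```python
# Rate limits quoted on this page.
TPM_LIMIT = 10_000_000  # tokens per minute
RPM_LIMIT = 1_800       # requests per minute


def max_requests_per_minute(tokens_per_request: int) -> int:
    """Return the sustainable request rate for a given average request size.

    Assumption: tokens_per_request is the total (input + output) that
    counts against the token limit for one request.
    """
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    # How many requests of this size fit inside the token budget.
    allowed_by_tokens = TPM_LIMIT // tokens_per_request
    # The tighter of the two limits wins.
    return min(RPM_LIMIT, allowed_by_tokens)
```

Under that assumption the crossover sits at roughly 5,555 tokens per request (10,000,000 / 1,800). Below it the 1,800 RPM cap is the bottleneck; above it the token budget is.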

Frequently asked questions

What does the reasoning variant do differently from standard Grok-4.20?

It adds extended thinking: the model works through a reasoning trace before responding, improving accuracy on complex problems at the cost of higher token usage and longer time-to-first-token (around 10 seconds).

How much does it cost?

xAI lists $1.25 per million input tokens and $2.50 per million output tokens, with cached input at $0.20/M. Because reasoning traces add output tokens automatically, real costs on hard problems run higher than the headline rate suggests.
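
To see how reasoning tokens move the bill, here is a minimal cost estimate using only the rates quoted above. It is a sketch under stated assumptions: reasoning tokens are billed at the output rate, cached tokens are a subset of the input tokens, and prompts above 200K tokens are rejected because the higher context rate is not given here. The function and parameter names are illustrative, not part of any xAI SDK.

```python
# Rates quoted on this page, in USD per 1M tokens.
INPUT_USD_PER_M = 1.25
OUTPUT_USD_PER_M = 2.50
CACHED_INPUT_USD_PER_M = 0.20

# Assumption: prompts above this size fall under the higher context pricing,
# whose rate is not stated here, so this sketch refuses to estimate them.
STANDARD_PRICING_MAX_INPUT = 200_000


def estimate_cost_usd(input_tokens, visible_output_tokens,
                      reasoning_tokens=0, cached_input_tokens=0):
    """Estimate the cost of one request in USD at the standard rates."""
    counts = (input_tokens, visible_output_tokens,
              reasoning_tokens, cached_input_tokens)
    if min(counts) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached tokens cannot exceed input tokens")
    if input_tokens > STANDARD_PRICING_MAX_INPUT:
        raise ValueError("higher context pricing applies; rate not known")

    # Cached input is billed at the cached rate instead of the input rate.
    fresh_input = input_tokens - cached_input_tokens
    # Assumption: the reasoning trace is billed like visible output.
    billed_output = visible_output_tokens + reasoning_tokens

    total_micro_usd = (fresh_input * INPUT_USD_PER_M
                       + cached_input_tokens * CACHED_INPUT_USD_PER_M
                       + billed_output * OUTPUT_USD_PER_M)
    return total_micro_usd / 1_000_000
```

For example, a 10,000-token prompt with a 1,000-token answer comes to about $0.015. If the model also spends 4,000 tokens on reasoning, the same request costs about $0.025.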

What is the context window?

2 million tokens for input. Maximum output per query is capped at 131,000 tokens in practice, even though the model spec lists up to 2M output.

Is it fast for a reasoning model?

Yes — 170 to 197 tokens per second, roughly three times the median speed of comparable reasoning models benchmarked by Artificial Analysis.
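
Throughput is only half the picture, since the roughly 10-second time-to-first-token mentioned earlier applies to every request. A rough way to combine the two figures is sketched below. It uses a simple model (a fixed wait, then streaming at a constant rate) that is an assumption for illustration, not a measured behavior, and the function names are hypothetical.

```python
# Figures quoted on this page.
TTFT_SECONDS = 10.0   # approximate time-to-first-token
TPS_LOW = 170.0       # lower bound of quoted output speed
TPS_HIGH = 197.0      # upper bound of quoted output speed


def estimate_response_seconds(output_tokens, tokens_per_second,
                              ttft_seconds=TTFT_SECONDS):
    """Fixed wait for the first token, then constant-rate streaming."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_seconds + output_tokens / tokens_per_second


def response_time_range(output_tokens):
    """Return (fastest, slowest) estimates across the quoted speed range."""
    fastest = estimate_response_seconds(output_tokens, TPS_HIGH)
    slowest = estimate_response_seconds(output_tokens, TPS_LOW)
    return fastest, slowest
```

Under this model a 1,970-token answer at 197 tokens per second takes about 20 seconds end to end, half of it spent waiting for the first token. For short answers, then, the fixed wait dominates and the high throughput matters less.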

What should I NOT use it for?

Simple, low-stakes queries where reasoning overhead is wasteful — you can't dial down thinking depth, so routine tasks will cost more than they need to. Its verbosity also makes it a poor fit when concise output matters.

How does it compare to the non-reasoning Grok-4.20?

The reasoning variant adds the extended thinking layer and carries higher latency and cost. Choose it when accuracy on hard problems is the priority; use the non-reasoning variant when speed and cost efficiency matter more.
