Model page

Grok-4.20 Non-Reasoning

Latency-optimized Grok 4.20 variant with a 2M-token context window, image understanding, and native tool support.

About Grok-4.20 Non-Reasoning

Speed without the wait. Grok-4.20 Non-Reasoning is xAI's fastest production model — built for workloads where latency matters more than visible deliberation. At over 235 tokens per second, it handles real-time chat, high-volume content generation, and structured data extraction at a pace few models can match. What sets it apart further is factual accuracy: a 22% hallucination rate, the lowest recorded among leading models tested in 2026, driven by its native 4-agent cross-checking architecture that validates answers internally even when no reasoning trace is shown. Users consistently praise its speed and reliability for customer-facing APIs and live news commentary through its Harper agent integration with X data. The trade-off is real: skip chain-of-thought and you skip explainability — debugging a complex logical failure means working without breadcrumbs. Complex coding tasks also draw criticism; community consensus is that it trails Claude-family models for code generation. For teams building production systems that demand throughput and factual grounding over step-by-step transparency, Grok-4.20 Non-Reasoning delivers a compelling balance.

Best for

  • Real-time conversational AI and customer-facing chat APIs where sub-200ms response times are required
  • High-throughput content generation — blog posts, product descriptions, summaries — at production scale
  • Structured data extraction and classification via tool-calling in API-driven workflows
  • Live news analysis and event commentary using the Harper agent's real-time X (Twitter) data integration
  • Fact-sensitive research and report generation where hallucination rate is a key selection criterion

Specs & capabilities

How Grok-4.20 Non-Reasoning stacks up — intelligence, speed, context, and modalities.

Capability

Intelligence

High

Capability

Speed

Fast

Capability

Context window

2,000,000 tokens

Capability

Knowledge cutoff

November 2024 + live X

API

Supported endpoints

v1/chat/completions · v1/responses · v1/batch

Modalities

Input and output

Input: Text, Image
Output: Text

Features

Availability notes

10M TPM · 1,800 RPM · Cached input: $0.20 / 1M tokens · Higher context pricing applies above 200K context · Structured outputs, web search, X search, function calling, and code execution supported

Frequently asked questions

What does it cost?

Pricing ranges from $1.25–$2.00 per 1M input tokens and $2.50–$6.00 per 1M output tokens depending on tier. Cached input is $0.20 per 1M tokens. Check xAI's current documentation, as pricing has shifted since the March 2026 launch.

How large is the context window?

The official xAI spec for the non-reasoning variant lists 1 million tokens. Some sources cite 2M for the full Grok 4.20 architecture, but 1M is the conservative documented figure for this specific model.

What is the model genuinely bad at?

Multi-step logical reasoning is a documented weakness — a long-context reasoning score of 18% and limited chain-of-thought transparency make it a poor fit for problems that require auditable step-by-step logic. Community feedback also highlights gaps in complex code generation.

How does it differ from the Grok-4.20 reasoning variant?

The reasoning variant runs extended chain-of-thought and is better suited for difficult logical or mathematical problems. This non-reasoning variant disables that process entirely, trading explainability for significantly higher output speed and lower cost per token.

Does it have access to current information?

Yes. The Harper agent provides live integration with X (Twitter) data, allowing the model to draw on real-time event feeds. Its base knowledge cutoff is November 2024.

Who should choose this model?

Teams building latency-sensitive production systems — streaming APIs, real-time chat, live content pipelines — who need strong factual accuracy but do not require visible reasoning traces or complex multi-step logical outputs.

Related models