Model page

Claude Haiku 4.5

Anthropic's fastest Claude model, optimized for speed and cost efficiency.

About Claude Haiku 4.5

Speed and cost are the headline, but Claude Haiku 4.5 earns them without gutting capability. At 94.8 tokens per second — more than twice as fast as Claude Sonnet 4 — and priced at one-third of its sibling, it's the obvious choice when you need to run many turns without burning through your budget. What makes this generation of Haiku different is breadth: it's the first in the Haiku line to support extended thinking (with adjustable reasoning depth) and computer use, features that used to require a heavier model. A 73.3% score on SWE-bench Verified puts its coding ability genuinely close to Sonnet. Users consistently praise how fast and cost-effective it feels for real-time chat, pair programming, and agentic loops. The honest caveat: its 64,000-token output ceiling can be limiting for long document synthesis, and it trails on some graduate-level academic benchmarks compared to the full Sonnet class. For most product use cases, those tradeoffs are easy to live with.

Best for

  • Real-time chat and conversational AI where low latency (0.76s to first token) is critical
  • Agentic and multi-agent workflows where many turns make cost-per-call the deciding factor
  • Pair programming and code generation at scale, with SWE-bench Verified performance approaching Sonnet 4.5
  • Computer use and GUI automation tasks such as form filling and screenshot-driven workflows
  • High-volume batch inference and customer support automation where throughput and economics matter

Specs & capabilities

How Claude Haiku 4.5 stacks up — intelligence, speed, context, and modalities.

Capability

Intelligence

Medium

Capability

Speed

Medium

Capability

Context window

200,000 tokens

Capability

Max output

64,000 tokens

Capability

Knowledge cutoff

February 2025

Frequently asked questions

How much does Claude Haiku 4.5 cost?

Input is $1.00 per million tokens and output is $5.00 per million tokens. Cached input drops to $0.10 per million tokens — a 90% discount that makes repeated or long-context queries significantly cheaper.

What is the context window and output limit?

It supports a 200,000-token context window. Maximum output per response is 64,000 tokens, which may be a constraint for very long document generation tasks.

How does it compare to Claude Sonnet 4.5?

Haiku 4.5 delivers roughly 90% of Sonnet 4.5's agentic coding capability at one-third the price and more than twice the speed. For most product workloads, the gap is hard to notice; for graduate-level reasoning benchmarks, Sonnet 4.5 still leads.

Does it support reasoning and vision?

Yes. It's the first Haiku model to support extended thinking (controllable reasoning depth) and computer use. It also accepts image inputs alongside text.

What is the knowledge cutoff?

February 2025. For most practical use cases this is recent enough, though a small number of competing models have a slightly more recent cutoff.

Who should choose Haiku 4.5 over a larger model?

It's the right pick for teams building high-throughput products — chat, customer service, coding assistants, agentic pipelines — where response speed and token cost are primary constraints and raw academic benchmark scores are secondary.

Related models