
GLM 5.1

Z.ai's GLM-5.1 via Fireworks: next-generation flagship for agentic engineering, stronger coding, and sustained long-horizon task performance. 202k context. Uses 2 premium requests per send before length multipliers. This deployment does not support web search, image input, or function calling.

About GLM 5.1

GLM-5.1 from Z.ai is built for one thing above all else: software engineering that runs on its own. A 754-billion-parameter Mixture-of-Experts model, it tops the SWE-Bench Pro leaderboard at 58.4%, edging out both GPT-5.4 and Claude Opus 4.6 on real-world coding tasks. What sets it apart in practice is stamina: it can pursue a single engineering goal autonomously for up to eight hours, sustaining hundreds of iterations and thousands of tool calls without human intervention. Users consistently praise this long-horizon execution in agent-based workflows where other models stall. It is also quick to respond, with a time-to-first-token of 1.33 seconds against a class median of 2.37 seconds.

The trade-offs: GLM-5.1 accepts text only, with no image input, which makes it a poor fit for visual debugging or UI-centric tasks. It also tends toward verbose output, which can inflate token costs. For teams building autonomous coding pipelines, though, its leaderboard-topping SWE-Bench Pro result makes it a strong candidate.

Best for

  • Autonomous software engineering agents that run multi-step tasks for hours without human checkpoints
  • Resolving real-world GitHub issues and handling pull-request workflows (SWE-Bench-class tasks)
  • Large codebase analysis and multi-file repository generation from natural language descriptions
  • Security research and terminal-based coding challenges requiring sustained iterative problem-solving
  • Long-context document processing and retrieval across sessions of up to roughly 200K tokens

Specs & capabilities

How GLM 5.1 stacks up — intelligence, speed, context, and modalities.

Capability

Intelligence: High
Speed: Medium
Context window: 202,800 tokens
Max output: 128,000 tokens

Modalities

Input and output

Input: Text
Output: Text

Features

Availability notes

  • Cached input: $0.26 / 1M tokens
  • 2 premium requests per send before length multipliers
  • Function calling not supported on Fireworks serverless
  • Fine-tuning not supported on Fireworks serverless

Frequently asked questions

What does GLM-5.1 cost?

Via OpenRouter, pricing is $0.98 per million input tokens and $3.08 per million output tokens. With prompt caching enabled (supported on DeepInfra), effective input costs can drop 60–80%. Some providers charge more, such as FriendliAI at $1.40 per million input tokens and $4.40 per million output tokens, so provider choice matters.
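As a rough illustration of how these rates add up, the sketch below estimates the cost of a single request from the OpenRouter prices quoted above. The function name `estimate_cost_usd` and the `input_discount` parameter are hypothetical, and applying the caching discount as a flat fraction of the whole input cost is a simplifying assumption, not a description of how any provider actually bills.

```python
# Rates quoted above for OpenRouter, in USD per million tokens.
INPUT_USD_PER_M = 0.98
OUTPUT_USD_PER_M = 3.08


def estimate_cost_usd(input_tokens, output_tokens, input_discount=0.0):
    """Estimate the cost of one request in USD.

    input_discount is the assumed fractional reduction in input cost from
    prompt caching (0.6 to 0.8 for the 60-80% range quoted above).
    """
    if not 0.0 <= input_discount <= 1.0:
        raise ValueError("input_discount must be between 0 and 1")
    input_cost = input_tokens * INPUT_USD_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_M / 1_000_000
    return input_cost * (1.0 - input_discount) + output_cost


if __name__ == "__main__":
    # Hypothetical agent step: 150,000 tokens in, 20,000 tokens out.
    for discount in (0.0, 0.6, 0.8):
        cost = estimate_cost_usd(150_000, 20_000, input_discount=discount)
        print(f"input discount {discount:.0%}: ${cost:.4f}")
```

Because output tokens cost roughly three times as much as input tokens at these rates, verbose responses affect the total more than the same number of extra prompt tokens would.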

How large is the context window?

GLM-5.1 supports a context window of roughly 200,000 tokens by default (202,800 tokens on the deployment listed here), with some providers offering up to 1 million tokens. Maximum output length is 128,000 tokens.
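To see how the two limits interact, here is a minimal sketch that computes how much room is left for the prompt once output space is reserved. It assumes the prompt and the generated output share a single context window, which is common but not confirmed for this deployment, and it uses the 202,800-token figure from the specs above. The helper name `max_prompt_tokens` is hypothetical.

```python
# Figures from the specs section; the shared-window behavior is an assumption.
CONTEXT_WINDOW = 202_800
MAX_OUTPUT = 128_000


def max_prompt_tokens(reserved_output):
    """Return the largest prompt that still leaves reserved_output tokens free."""
    if not 0 < reserved_output <= MAX_OUTPUT:
        raise ValueError("reserved_output must be between 1 and MAX_OUTPUT")
    return CONTEXT_WINDOW - reserved_output


if __name__ == "__main__":
    print(max_prompt_tokens(8_000))    # short answer: most of the window is prompt
    print(max_prompt_tokens(128_000))  # maximum-length output
```

Under that assumption, reserving the full 128,000-token output leaves 74,800 tokens for the prompt, so very long inputs and very long outputs cannot be combined in one request.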

What is GLM-5.1 best at?

It is purpose-built for agentic software engineering. It leads the SWE-Bench Pro leaderboard (58.4%), scores 63.5% on Terminal-Bench 2.0, and is designed to run autonomously on complex coding tasks for up to eight hours with continuous tool calls and iterative refinement.

What are its main limitations?

GLM-5.1 accepts text input only — no images — which rules it out for visual debugging, UI review, or diagram analysis. It also tends toward verbosity, which can increase token consumption and API costs in practice. On reasoning-heavy benchmarks like GPQA Diamond, it scores 86.2% versus Claude Opus 4.6's 94.3%.

How does it compare to Claude Opus 4.6?

GLM-5.1 edges out Claude Opus 4.6 on SWE-Bench Pro (58.4% vs. 57.3%) and matches it broadly on engineering tasks. Claude Opus 4.6 holds the advantage on scientific reasoning (GPQA Diamond: 94.3% vs. 86.2%) and supports image input, which GLM-5.1 does not.

Is GLM-5.1 open source?

Yes. Z.ai released GLM-5.1 as open source on April 7, 2026, making it available for self-hosted and on-premises deployment in addition to the hosted API.
