
MiniMax M2.7

MiniMax M2.7 via Fireworks: 228B MoE model for complex agent harnesses, productivity tasks, Agent Teams, Skills, and dynamic tool search. 196k context. Supports image input and function calling. Does not support web search.

About MiniMax M2.7

M2.7 is the model that partially trained itself. MiniMax's self-evolving architecture has the model handling 30–50% of its own training loop, which shows up most clearly in iterative code refinement and autonomous agent chains. On SWE-bench Verified it scores 78%, well above Claude Opus at 55%, making it a credible choice for real-world software engineering tasks at a fraction of flagship pricing: $0.30 per million input tokens and $1.20 per million output tokens.

That value proposition comes with a meaningful catch. The model is remarkably verbose, generating roughly four times the output tokens of comparable models, which can quietly erode the pricing advantage on longer tasks. Output speed also trails the field at 44.8 tokens per second. Teams that meter cost per completed task rather than per token should profile their actual workloads before committing.

That said, users working in agentic coding harnesses consistently find it earns its place: one real-world evaluation found that it delivers about 90% of top-tier output quality at roughly 7% of the cost.

Best for

  • Agentic coding workflows and multi-turn tool-calling chains where the model drives autonomous, multi-step execution
  • Large-scale software engineering tasks and bug triage, where its 78% SWE-bench Verified score translates to real-world repo work
  • Feature scaffolding and boilerplate generation for teams with high code-output volume and cost pressure
  • Iterative code refinement where the model's self-evolving training produces strong self-correction mid-generation
  • Office document automation involving Excel, PowerPoint, and Word manipulation

Specs & capabilities

How MiniMax M2.7 stacks up — intelligence, speed, context, and modalities.

Intelligence: High
Speed: Slow
Context window: 196,600 tokens
Max output: 131,072 tokens
Knowledge cutoff: Not specified in available sources

Modalities

Input and output

Input: Text, Image
Output: Text

Features

Availability notes

Cached input: $0.06 / 1M tokens · Function calling supported · Fine-tuning not supported on Fireworks serverless

Frequently asked questions

What does it cost?

Direct MiniMax API pricing is $0.30 per million input tokens and $1.20 per million output tokens. Cache reads drop to $0.06 per million. OpenRouter offers slightly lower rates at $0.25 input / $1.00 output. Note that the model's verbosity can make effective per-task costs higher than raw per-token rates suggest.
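To make the per-token versus per-task distinction concrete, here is a minimal Python sketch that prices a single task using the direct MiniMax rates quoted above. The workload (20,000 input tokens, 2,000 output tokens at typical verbosity) is a made-up example, and the 4x multiplier is the rough verbosity figure cited on this page rather than a measured value for any particular workload.

```python
# Prices in USD per one million tokens, taken from the figures quoted on this page.
INPUT_PRICE = 0.30
CACHED_INPUT_PRICE = 0.06
OUTPUT_PRICE = 1.20


def task_cost(input_tokens, output_tokens, cached_tokens=0):
    """Return the USD cost of one task.

    cached_tokens is the portion of input_tokens billed at the cache-read rate.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_PRICE
        + cached_tokens * CACHED_INPUT_PRICE
        + output_tokens * OUTPUT_PRICE
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload; replace with token counts from your own logs.
    VERBOSITY = 4  # assumed output multiplier, from the "roughly 4x" claim
    baseline = task_cost(20_000, 2_000)
    verbose = task_cost(20_000, 2_000 * VERBOSITY)
    print(f"at typical verbosity: ${baseline:.4f}")
    print(f"at 4x output tokens:  ${verbose:.4f}")

    # On output cost alone, a model that emits a quarter of the tokens
    # breaks even at four times the output price.
    print(f"break-even output price: ${OUTPUT_PRICE * VERBOSITY:.2f} per 1M tokens")
```

Under these assumptions the same task costs $0.0084 at typical verbosity and $0.0156 at four times the output, so the output side of the bill dominates quickly. Passing `cached_tokens` shows how much of the input cost cache reads can recover.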

How large is the context window?

Sources report a native context window of between 192,000 and 205,000 tokens; the specs on this page list 196,600. The model uses full attention across the entire window, which creates a performance bottleneck on very long inputs, so the practical effective context is considerably lower for latency-sensitive use cases.
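A simple budget check can catch oversized requests before they are sent. The sketch below uses the 196,600-token window and 131,072-token output cap listed in the specs on this page, and it assumes that prompt and output tokens share one window, which is common for chat APIs but is not confirmed here. The function names are illustrative.

```python
CONTEXT_WINDOW = 196_600  # tokens, as listed in the specs on this page
MAX_OUTPUT = 131_072      # tokens, as listed in the specs on this page


def fits_in_context(prompt_tokens, reserved_output_tokens):
    """Check whether a prompt plus a reserved output budget fits the window.

    Assumption: prompt and output tokens are counted against the same window.
    """
    if reserved_output_tokens > MAX_OUTPUT:
        return False
    return prompt_tokens + reserved_output_tokens <= CONTEXT_WINDOW


def max_prompt_tokens(reserved_output_tokens):
    """Largest prompt that still leaves room for the reserved output."""
    return max(0, CONTEXT_WINDOW - reserved_output_tokens)


if __name__ == "__main__":
    print(fits_in_context(150_000, 32_000))
    print(fits_in_context(180_000, 32_000))
    print(max_prompt_tokens(32_000))
```

Because the model is verbose, it is worth reserving a generous output budget. For latency-sensitive work, a self-imposed ceiling well below the native window may be the more useful constant.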

What is M2.7 best at?

Agentic and software engineering tasks. It scores 78% on SWE-bench Verified (versus 55% for Claude Opus) and 93.2% on HumanEval. Its self-evolving training makes it particularly strong at multi-step reasoning, tool-calling chains, and correcting its own code mid-generation.

What are its main limitations?

Output verbosity is the biggest practical issue: the model generates about four times the output tokens of a median model, which offsets the low per-token pricing. Output speed is also below average at 44.8 tokens per second. Earlier versions had tool-calling quirks, including premature task halts and reasoning loops.
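Verbosity and slow output compound each other, and a rough estimate shows by how much. The sketch below divides an output length by the 44.8 tokens-per-second figure quoted here. It ignores time to first token, queueing, and network overhead, and the 2,000-token response is a hypothetical example.

```python
TOKENS_PER_SECOND = 44.8  # output speed quoted on this page


def generation_seconds(output_tokens, tokens_per_second=TOKENS_PER_SECOND):
    """Estimate streaming time for a response, excluding time to first token."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    typical = 2_000        # hypothetical response length at typical verbosity
    verbose = typical * 4  # assumed 4x output multiplier
    print(f"{typical} tokens: {generation_seconds(typical):.1f} s")
    print(f"{verbose} tokens: {generation_seconds(verbose):.1f} s")
```

Under these assumptions a 2,000-token answer streams in about 45 seconds, while the four-times-longer equivalent takes close to three minutes, which matters for interactive use far more than for batch agent runs.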

How does M2.7 compare to M2.5?

M2.7 pulls ahead of M2.5 on iterative refinement and agentic tasks due to its self-evolving training. It also posts higher scores than M2.5 on MMLU (89.4%) and HumanEval (93.2%). The key tradeoff is that M2.5 and the earlier M2 were open-weight under Apache 2.0, while M2.7 is proprietary with no self-hosting option.

Who should choose M2.7 over a frontier model?

Teams running high-volume coding or agentic pipelines where cost per completed task matters more than raw speed or breadth. It is not the right pick for general-purpose chat, creative writing, or tasks requiring fast token throughput. Evaluate it on your actual workload, because the verbosity tax can make it more expensive than it looks.
