Question 1

What does MiniMax M2.7 cost?

Accepted Answer

Direct MiniMax API pricing is $0.30 per million input tokens and $1.20 per million output tokens. Cache reads cost $0.06 per million tokens. OpenRouter lists slightly lower rates of $0.25 per million input tokens and $1.00 per million output tokens. Because the model is verbose, the effective cost per task can be higher than the raw per-token rates suggest.
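To make the arithmetic concrete, here is a minimal sketch that turns the quoted rates into a per-request estimate. The function name, the provider keys, and the fallback of billing cached tokens at the normal input rate when no cache rate is listed (as for OpenRouter here) are assumptions for illustration, not part of any provider's API.

```python
# Rates in USD per million tokens, copied from the answer above.
# The dictionary keys are hypothetical labels, not official identifiers.
RATES = {
    "minimax_direct": {"input": 0.30, "output": 1.20, "cache_read": 0.06},
    "openrouter": {"input": 0.25, "output": 1.00},
}


def request_cost(provider, input_tokens, output_tokens, cached_tokens=0):
    """Estimate the USD cost of one request.

    cached_tokens is the portion of input_tokens served from cache.
    Assumption: if a provider has no listed cache rate, cached tokens
    are billed at the normal input rate.
    """
    rates = RATES[provider]
    cache_rate = rates.get("cache_read", rates["input"])
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * rates["input"]
        + cached_tokens * cache_rate
        + output_tokens * rates["output"]
    )
    return total / 1_000_000
```

For example, `request_cost("minimax_direct", 20_000, 4_000)` estimates a request with 20,000 input tokens and 4,000 output tokens at the direct API rates.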

Question 2

How large is the context window?

Accepted Answer

The native context window is reported as between 192,000 and 205,000 tokens, depending on the source. However, the model uses full attention across the entire window, which makes very long inputs slow to process. For latency-sensitive use cases, the practical effective context is considerably lower.
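Since sources disagree on the exact window size, a conservative budget check can default to the lower figure. The sketch below assumes that the prompt and the reserved output share one window, which the answer does not confirm; the function and constant names are hypothetical.

```python
# Reported bounds for the native context window, in tokens.
CONTEXT_WINDOW_LOW = 192_000
CONTEXT_WINDOW_HIGH = 205_000


def fits_context(prompt_tokens, max_output_tokens, window=CONTEXT_WINDOW_LOW):
    """Return True if the prompt plus the reserved output fits in the window.

    Assumption: input and output tokens count against the same window.
    Defaults to the lower reported figure to stay on the safe side.
    """
    return prompt_tokens + max_output_tokens <= window
```

This only checks whether a request fits. It says nothing about latency, which is the practical constraint on very long inputs.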

Question 3

What is M2.7 best at?

Accepted Answer

Agentic and software engineering tasks. It scores 78% on SWE-bench Verified (versus 55% for Claude Opus) and 93.2% on HumanEval. Its self-evolving training makes it particularly strong at multi-step reasoning, tool-calling chains, and correcting its own code mid-generation.

Question 4

What are its main limitations?

Accepted Answer

Output verbosity is the biggest practical issue: the model generates about four times as many output tokens as a median model, which offsets the low per-token pricing. Output speed is also below average at 44.8 tokens per second. Earlier versions had tool-calling quirks, including premature task halts and reasoning loops.
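A rough sketch of how the verbosity multiplier and output speed play out, using only the figures quoted in this FAQ ($1.20 per million output tokens, roughly 4x output volume, 44.8 tokens per second). The function names are hypothetical, input-token cost and time to first token are ignored, and the 4x multiplier is an approximate average rather than a guarantee for any one task.

```python
OUTPUT_RATE_USD_PER_M = 1.20   # direct API output price quoted in this FAQ
VERBOSITY_MULTIPLIER = 4.0     # approx. output volume relative to a median model
OUTPUT_SPEED_TPS = 44.8        # quoted output speed, tokens per second


def verbose_output_cost(median_output_tokens):
    """Output cost in USD for a task a median model answers in median_output_tokens."""
    tokens = median_output_tokens * VERBOSITY_MULTIPLIER
    return tokens * OUTPUT_RATE_USD_PER_M / 1_000_000


def break_even_output_rate():
    """Output price (USD per million) at which a median-verbosity model costs the same."""
    return OUTPUT_RATE_USD_PER_M * VERBOSITY_MULTIPLIER


def generation_seconds(output_tokens):
    """Approximate time to stream output_tokens at the quoted speed."""
    return output_tokens / OUTPUT_SPEED_TPS
```

Under these assumptions, the output side of a task costs about the same as it would on a median-verbosity model priced at $4.80 per million output tokens.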

Question 5

How does M2.7 compare to M2.5?

Accepted Answer

M2.7 pulls ahead of M2.5 on iterative refinement and agentic tasks due to self-evolving training. It scores higher on MMLU (89.4%) and HumanEval (93.2%). The key tradeoff is that M2.5 and the earlier M2 were open-weight under Apache 2.0, while M2.7 is proprietary with no self-hosting option.

Question 6

Who should choose M2.7 over a frontier model?

Accepted Answer

Teams running high-volume coding or agentic pipelines, where cost per completed task matters more than raw speed or breadth, are the best fit. It is not the right pick for general-purpose chat, creative writing, or tasks requiring fast token throughput. Evaluate it on your actual workload: the verbosity tax can make it more expensive than it looks.
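One way to compare models on cost per completed task is to divide the average cost of an attempt by the measured success rate. This is a simplified sketch: it assumes failed attempts are retried independently until one succeeds, and that both numbers are measured on your own workload. The function name is hypothetical.

```python
def cost_per_completed_task(cost_per_attempt, success_rate):
    """Expected USD spent per successfully completed task.

    Assumption: attempts are independent and retried until success,
    so the expected number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate
```

A cheaper model with a lower success rate can still lose this comparison, so the cost per attempt should include the extra output tokens a verbose model produces.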
