Model guide · updated 2026
Fastest AI models
When latency matters — real-time chat, agent loops, high-volume pipelines — raw output speed wins. These models are ranked by measured output throughput in tokens per second.
1. GPT OSS 120B Fast (OpenAI, top pick), 1846 tokens/sec: Agentic workflows and multi-tool automation requiring reliable function calling and chain-of-thought reasoning
2. GLM 4.7 Fast (Zhipu AI), 1000 tokens/sec: Code generation and bug fixing
3. GPT OSS 120b (OpenAI), 344.3 tokens/sec: Agentic coding and engineering workflows
4. Gemini 3.1 Flash-Lite (Google), 325.5 tokens/sec: High-volume translation of chat messages
5. Gemini 3.1 Flash-Lite Preview (Google), 325.5 tokens/sec: High-volume content classification and moderation
6. Gemini 3.5 Flash (Google), 280 tokens/sec: Production agent loops and multi-step tool-use workflows where sustained throughput matters
7. Gemini 2.5 Flash-Lite (Google), 262.1 tokens/sec: High-volume content classification
8. MiniMax M2.5 (MiniMax), 234.2 tokens/sec: High-volume software engineering and code generation across Web
Ranked by measured output speed (tokens per second) from Artificial Analysis, fastest first.
Frequently asked questions
Two things determine how fast a model feels: output speed (tokens generated per second) and time-to-first-token (how quickly the first word appears). This list ranks by output speed only; reasoning models can be fast at generating but slow to start because they "think" first.
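To see how the two measures combine, here is a minimal sketch that estimates response time as time-to-first-token plus the time to stream the output. The function name and the time-to-first-token values are hypothetical, chosen only for illustration; the two throughput figures are speeds that appear in the ranking, not claims about any specific model's start-up delay.

```python
def estimate_response_seconds(ttft_s: float, output_tokens: int, tokens_per_sec: float) -> float:
    """Rough estimate: wait for the first token, then stream the rest."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return ttft_s + output_tokens / tokens_per_sec


# Hypothetical time-to-first-token values (assumptions, not measurements).
quick_start = estimate_response_seconds(ttft_s=0.5, output_tokens=500, tokens_per_sec=280.0)
slow_start = estimate_response_seconds(ttft_s=5.0, output_tokens=500, tokens_per_sec=1000.0)

print(f"quick start, lower throughput: {quick_start:.2f} s")
print(f"slow start, higher throughput: {slow_start:.2f} s")
```

Under these assumed numbers, the model with lower throughput still finishes a 500-token reply first, which is why a ranking by output speed alone does not settle which model feels faster for short replies.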
Often the fastest models are smaller or "flash"/"mini" variants tuned for throughput, so they trade some peak reasoning for speed. For chat, drafting and high-volume tasks that trade-off is usually worth it.
Agentic workflows make many model calls in a loop, so per-call latency compounds. A faster model can cut an agent run from minutes to seconds.
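The compounding effect can be sketched with simple arithmetic. The example below assumes a hypothetical agent run of 30 sequential model calls producing 800 output tokens each, and counts generation time only (no time-to-first-token, tool execution, or network overhead unless passed in). The step count, token count, and function name are illustrative assumptions; the two throughput values are the fastest and slowest speeds in the ranking.

```python
def agent_run_seconds(
    steps: int,
    tokens_per_step: int,
    tokens_per_sec: float,
    ttft_s: float = 0.0,
    tool_overhead_s: float = 0.0,
) -> float:
    """Total time for sequential model calls in an agent loop."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    per_call = ttft_s + tokens_per_step / tokens_per_sec
    return steps * (per_call + tool_overhead_s)


# Hypothetical workload: 30 calls, 800 output tokens per call.
at_fastest = agent_run_seconds(steps=30, tokens_per_step=800, tokens_per_sec=1846.0)
at_slowest = agent_run_seconds(steps=30, tokens_per_step=800, tokens_per_sec=234.2)

print(f"at 1846 tokens/sec:  {at_fastest:.1f} s")
print(f"at 234.2 tokens/sec: {at_slowest:.1f} s")
```

With these assumptions the same run takes roughly 13 seconds at the top speed and over 100 seconds at the lowest speed on the list. Real runs also pay time-to-first-token and tool latency on every step, which the optional parameters let you add.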