Model guide · updated 2026

Fastest AI models

When latency matters — real-time chat, agent loops, high-volume pipelines — raw output speed wins. These models are ranked by measured output throughput in tokens per second.

1. GPT OSS 120B Fast (OpenAI, top pick)
   1846 tokens/sec
   Agentic workflows and multi-tool automation requiring reliable function calling and chain-of-thought reasoning

2. GLM 4.7 Fast (Zhipu AI)
   1000 tokens/sec
   Code generation and bug fixing

3. GPT OSS 120b (OpenAI)
   344.3 tokens/sec
   Agentic coding and engineering workflows

4. Gemini 3.1 Flash-Lite (Google)
   325.5 tokens/sec
   High-volume translation of chat messages

5. Gemini 3.1 Flash-Lite Preview (Google)
   325.5 tokens/sec
   High-volume content classification and moderation

6. Gemini 3.5 Flash (Google)
   280 tokens/sec
   Production agent loops and multi-step tool-use workflows where sustained throughput matters

7. Gemini 2.5 Flash-Lite (Google)
   262.1 tokens/sec
   High-volume content classification

8. MiniMax M2.5 (MiniMax)
   234.2 tokens/sec
   High-volume software engineering and code generation across Web

Ranked by measured output speed (tokens per second) from Artificial Analysis, fastest first.


Frequently asked questions

Model speed has two components: output speed (tokens generated per second) and time-to-first-token (how quickly the first word appears). This list ranks by output speed only; reasoning models can generate quickly yet start slowly because they "think" first.
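As a rough illustration of how the two measures combine, the sketch below estimates the wall-clock time of a single response as time-to-first-token plus streaming time. The function name `estimated_latency_s`, the 0.5 s time-to-first-token, and the 500-token response length are assumptions made for the example; only the throughput figures come from the ranking. Real latency also depends on network, queueing, and prompt length, none of which are modeled here.

```python
def estimated_latency_s(ttft_s: float, output_tokens: int, tokens_per_s: float) -> float:
    """Rough response time: wait for the first token, then stream the rest."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + output_tokens / tokens_per_s


# Throughput values are taken from the ranking; the TTFT (0.5 s) and the
# response length (500 tokens) are hypothetical placeholders.
fast = estimated_latency_s(ttft_s=0.5, output_tokens=500, tokens_per_s=1846)
slow = estimated_latency_s(ttft_s=0.5, output_tokens=500, tokens_per_s=234.2)
print(f"fastest listed model: {fast:.2f}s, slowest listed model: {slow:.2f}s")
```

With an identical time-to-first-token, the gap between the two estimates comes entirely from output speed, which is the quantity this list ranks by.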

The fastest models are often smaller or "flash"/"mini" variants tuned for throughput, so they trade some peak reasoning ability for speed. For chat, drafting, and high-volume tasks, that trade-off is usually worth it.

Agentic workflows make many model calls in a loop, so per-call latency compounds. A faster model can cut an agent run from minutes to seconds.
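The compounding effect can be sketched with simple arithmetic: total run time is roughly the number of steps multiplied by the per-call latency, plus any time spent in tools. The helper `agent_run_seconds` and all of its inputs except the throughput figures (30 steps, 400 output tokens per step, 0.5 s time-to-first-token) are hypothetical values chosen for illustration, and the model assumes strictly sequential calls.

```python
def agent_run_seconds(
    steps: int,
    ttft_s: float,
    tokens_per_step: int,
    tokens_per_s: float,
    tool_s: float = 0.0,
) -> float:
    """Estimate a sequential agent run: each step is one model call plus tool time."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    per_call = ttft_s + tokens_per_step / tokens_per_s
    return steps * (per_call + tool_s)


# Hypothetical workload: 30 sequential steps, 400 output tokens each, 0.5 s TTFT.
# Only the throughput figures (1846 and 234.2 tokens/sec) come from the ranking.
fast_run = agent_run_seconds(steps=30, ttft_s=0.5, tokens_per_step=400, tokens_per_s=1846)
slow_run = agent_run_seconds(steps=30, ttft_s=0.5, tokens_per_step=400, tokens_per_s=234.2)
print(f"fast model: {fast_run:.1f}s, slow model: {slow_run:.1f}s")
```

Under these assumptions the slower model's run takes about three times as long, and the gap widens as steps or tokens per step increase.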
