Model guide · updated 2026

Fastest AI models

When latency matters (real-time chat, agent loops, high-volume pipelines), raw output speed is often the deciding factor. The models below are ranked by measured output throughput in tokens per second.

  1. GPT OSS 120B Fast
    OpenAI · Top pick
    Agentic workflows and multi-tool automation requiring reliable function calling and chain-of-thought reasoning
    Output speed: 1,846 tokens/sec

  2. GLM 4.7 Fast
    Zhipu AI
    Code generation and bug fixing
    Output speed: 1,000 tokens/sec

  3. GPT OSS 120B
    OpenAI
    Agentic coding and engineering workflows
    Output speed: 344.3 tokens/sec

  4. Gemini 3.1 Flash-Lite
    Google
    High-volume translation of chat messages
    Output speed: 325.5 tokens/sec

  5. Gemini 3.1 Flash-Lite Preview
    Google
    High-volume content classification and moderation
    Output speed: 325.5 tokens/sec

  6. Gemini 3.5 Flash
    Google
    Production agent loops and multi-step tool-use workflows where sustained throughput matters
    Output speed: 280 tokens/sec

  7. Gemini 2.5 Flash-Lite
    Google
    High-volume content classification
    Output speed: 262.1 tokens/sec

  8. MiniMax M2.5
    MiniMax
    High-volume software engineering and code generation across Web
    Output speed: 234.2 tokens/sec

Ranked by output speed (tokens per second) as measured by Artificial Analysis, fastest first.

Frequently asked questions

How fast a model feels depends on two things: output speed (tokens generated per second) and time-to-first-token (TTFT, how quickly the first token appears). This list ranks by output speed only. Reasoning models can generate quickly yet be slow to start, because they "think" before producing visible output.
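A rough way to combine the two metrics is: response time ≈ TTFT + output tokens ÷ output speed. The sketch below assumes that simple model (constant streaming rate, no network overhead); the function name and all numbers are hypothetical, chosen only to show how a slow start can outweigh high throughput.

```python
def response_time_s(ttft_s: float, output_tokens: int, tokens_per_sec: float) -> float:
    """Estimate wall-clock seconds for one response.

    Simplified model (an assumption): wait ttft_s for the first token,
    then stream the output at a constant tokens_per_sec.
    """
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return ttft_s + output_tokens / tokens_per_sec


# Hypothetical models producing the same 500-token reply.
quick_start = response_time_s(ttft_s=0.3, output_tokens=500, tokens_per_sec=250)
slow_start = response_time_s(ttft_s=4.0, output_tokens=500, tokens_per_sec=1800)
print(f"quick start: {quick_start:.2f} s, slow start: {slow_start:.2f} s")
```

Under these assumptions the model with roughly 7× the throughput still finishes later (about 4.28 s vs 2.30 s), which is why TTFT matters for short interactive replies.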

The fastest models are often smaller or "flash"/"mini" variants tuned for throughput, so they trade some peak reasoning ability for speed. For chat, drafting and high-volume tasks, that trade-off is usually worth it.

Agentic workflows make many model calls in a loop, so per-call latency compounds. Across dozens of sequential calls, a faster model can cut an agent run from minutes to seconds.
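To illustrate how per-call latency compounds, the sketch below multiplies a per-call estimate (TTFT plus output tokens ÷ output speed) by the number of calls in a loop. It assumes calls run strictly one after another, each with the same TTFT and output length, and it ignores tool-execution time; the function name, call count, token counts and TTFT are all hypothetical.

```python
def agent_run_time_s(num_calls: int, ttft_s: float,
                     tokens_per_call: int, tokens_per_sec: float) -> float:
    """Total model time for an agent loop whose calls run in sequence.

    Assumes identical TTFT and output length for every call and no time
    spent executing tools between calls.
    """
    per_call_s = ttft_s + tokens_per_call / tokens_per_sec
    return num_calls * per_call_s


# Hypothetical run: 40 sequential calls, 800 output tokens each, 0.5 s TTFT.
for speed in (100, 1000):
    total = agent_run_time_s(num_calls=40, ttft_s=0.5,
                             tokens_per_call=800, tokens_per_sec=speed)
    print(f"{speed:>5} tok/s -> {total:.0f} s")
```

With these assumptions the run takes 340 s at 100 tokens/sec but 52 s at 1,000 tokens/sec; note that the fixed TTFT (20 s across 40 calls) becomes a larger share of the total as throughput rises.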