GPT OSS 20b
OpenAI's open-weight 21B-parameter Mixture-of-Experts model, served via Fireworks. Suited to lower-latency, local, or specialized use cases. 131k context. As served here, it does not support web search, image input, or function calling.
About GPT OSS 20b
OpenAI's first open-weight language model since GPT-2, GPT OSS 20b is a 21-billion-parameter Mixture-of-Experts model that activates only 3.6 billion parameters per forward pass. Sparse activation keeps per-token compute and latency low; the quantized weights (about 12.8 GB) are what let it run within 16 GB of memory, while still scoring 98.7% on AIME 2025 with tools. It punches well above its weight on math, function calling, and code: LiveCodeBench at 80.4% and SWE-bench at 60.7% are genuine frontier-adjacent numbers for a model of this size and cost. Developers appreciate the Apache 2.0 license most of all: it allows local deployment, fine-tuning, and commercial use without API key lock-in or per-token fees. Reasoning effort is configurable, and the chain-of-thought is fully visible, which makes debugging agentic pipelines far easier than with closed models. The honest caveat: long-context reasoning is a real weak spot (14% on AA-LCR), and in extended tool-use conversations it can lose track of which functions are available. Text only, no vision.
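The size figures quoted on this page can be sanity-checked with a little arithmetic. The sketch below uses only numbers that appear here (21B total parameters from the summary line, 3.6B active, a 12.8 GB quantized weight file, a 16 GB memory budget) and assumes 1 GB = 1e9 bytes; the function names are illustrative, not part of any library.

```python
# Figures quoted on this page; treat them as approximate.
TOTAL_PARAMS = 21e9        # total parameters (the "21B MoE" figure)
ACTIVE_PARAMS = 3.6e9      # parameters activated per forward pass
WEIGHT_FILE_GB = 12.8      # quantized weight file size
MEMORY_BUDGET_GB = 16.0    # memory budget quoted for a laptop


def active_fraction(active: float, total: float) -> float:
    """Share of the weights that take part in one forward pass."""
    return active / total


def bits_per_parameter(file_gb: float, total: float) -> float:
    """Average storage per parameter, assuming 1 GB = 1e9 bytes."""
    return file_gb * 1e9 * 8 / total


def headroom_gb(memory_gb: float, file_gb: float) -> float:
    """Memory left for activations, KV cache, and the OS after loading weights."""
    return memory_gb - file_gb


if __name__ == "__main__":
    print(f"active fraction:    {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
    print(f"bits per parameter: {bits_per_parameter(WEIGHT_FILE_GB, TOTAL_PARAMS):.2f}")
    print(f"headroom:           {headroom_gb(MEMORY_BUDGET_GB, WEIGHT_FILE_GB):.1f} GB")
```

Roughly one sixth of the weights are used per token, which governs compute, while all of them must still be resident in memory, which is why the quantized file size rather than the active-parameter count determines whether the model fits.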
Best for
- Mathematical and technical reasoning — AIME and GPQA scores put it among the best small models for proofs, derivations, and STEM problem-solving
- Agentic and function-calling workflows — strong tool-use benchmark results make it a cost-effective backbone for agent pipelines where per-call inference cost matters
- Local and on-device deployment — 12.8 GB quantized weight file runs on consumer hardware; ideal for privacy-sensitive or air-gapped applications
- Code generation and review — LiveCodeBench (80.4%) and SWE-bench (60.7%) make it a solid choice for automated PR review, code completion, and debugging
- Cost-sensitive production at scale — at roughly $0.05 per million input tokens via third-party providers, it covers a wide range of general-purpose tasks at a fraction of frontier model pricing
Specs & capabilities
How GPT OSS 20b stacks up — intelligence, speed, context, and modalities.
Intelligence: Low
Speed: Fast
Context window: 131,072 tokens
Max output: 20,000 tokens
Knowledge cutoff: May 2024
Frequently asked questions
What does GPT OSS 20b cost?
OpenAI does not publish official API pricing for this model. Third-party providers charge roughly $0.05 per million input tokens and $0.14–$0.20 per million output tokens. The Apache 2.0 license also allows free local deployment with no per-token cost.
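To turn those per-million-token rates into a budget, a small estimator is enough. This is a minimal sketch using the approximate third-party rates quoted above; actual provider pricing varies, and the function names and the example workload are hypothetical.

```python
# Approximate third-party rates quoted above, in USD per million tokens.
INPUT_USD_PER_M = 0.05
OUTPUT_USD_PER_M_LOW = 0.14
OUTPUT_USD_PER_M_HIGH = 0.20


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Cost in USD for a workload at the given per-million-token rates."""
    return input_tokens / 1e6 * input_rate + output_tokens / 1e6 * output_rate


def cost_range(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Low and high estimates across the quoted output-price range."""
    low = estimate_cost(input_tokens, output_tokens,
                        INPUT_USD_PER_M, OUTPUT_USD_PER_M_LOW)
    high = estimate_cost(input_tokens, output_tokens,
                         INPUT_USD_PER_M, OUTPUT_USD_PER_M_HIGH)
    return low, high


if __name__ == "__main__":
    # Hypothetical monthly workload: 10M input tokens, 2M output tokens.
    low, high = cost_range(10_000_000, 2_000_000)
    print(f"estimated cost: ${low:.2f} to ${high:.2f}")
```

For the hypothetical workload in the example, input costs $0.50 and output costs between $0.28 and $0.40, so the total lands between $0.78 and $0.90.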
What is the context window?
131,072 tokens (128 × 1,024, usually written as 128k and sometimes rounded to 131k). Note that long-context reasoning is a documented weak point: the model scores only 14% on the AA-LCR benchmark, so it is poorly suited to complex multi-step reasoning spread across large documents.
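When sizing prompts, it helps to reserve room for the reply. The sketch below assumes that output tokens share the 131,072-token window with the prompt, which is common but depends on the provider, and uses the 20,000-token max output listed in the specs; the helper names are illustrative.

```python
CONTEXT_WINDOW = 131_072   # tokens, equal to 128 * 1024
MAX_OUTPUT = 20_000        # max output tokens listed in the specs


def max_prompt_tokens(reserved_output: int = MAX_OUTPUT) -> int:
    """Largest prompt that still leaves `reserved_output` tokens for the reply.

    Assumes prompt and output share one context window (provider-dependent).
    """
    if not 0 <= reserved_output <= MAX_OUTPUT:
        raise ValueError("reserved_output must be between 0 and MAX_OUTPUT")
    return CONTEXT_WINDOW - reserved_output


def fits(prompt_tokens: int, reserved_output: int = MAX_OUTPUT) -> bool:
    """True if the prompt fits alongside the reserved output budget."""
    return 0 <= prompt_tokens <= max_prompt_tokens(reserved_output)


if __name__ == "__main__":
    print(max_prompt_tokens())   # prompt budget with the full output reserved
    print(fits(120_000))         # too large once 20,000 tokens are reserved
    print(fits(120_000, 4_000))  # fits with a smaller reserved reply
```

Under that assumption, reserving the full 20,000-token output leaves 111,072 tokens for the prompt. Fitting in the window is a separate question from reasoning well over it, given the AA-LCR result above.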
Does it support images or other media?
No. GPT OSS 20b is text-only. There is no vision, audio, or file-upload support.
How does it compare to o3-mini?
On mathematical benchmarks and TauBench tool use it matches or approaches o3-mini, but in complex multi-step agentic workflows — especially extended conversations with many available tools — o3-mini remains more reliable. GPT OSS 20b's advantage is that it can run locally and costs far less.
What is the knowledge cutoff?
May 2024. The model has no built-in knowledge of events after that date, so anything more recent must be supplied in the prompt.
Can I fine-tune or modify the weights?
Yes. The model is released under Apache 2.0, which permits fine-tuning, modification, and commercial redistribution without copyleft restrictions.