GPT OSS 120b
OpenAI's open-weight 117B MoE via Fireworks. Production-grade reasoning, agentic tasks, function calling. 131k context. Does not support web search or image input.
About GPT OSS 120b
OpenAI's first open-weight reasoning model, GPT OSS 120b brings near-o4-mini reasoning performance under an Apache 2.0 license — a combination that didn't exist in the market before August 2025. Its Mixture-of-Experts architecture activates only 5.1 billion of its 117 billion parameters per token, letting it run on a single 80GB GPU while hitting 97.9% on AIME 2025 with tools and 62.4% on SWE-Bench Verified at high reasoning level. Three configurable reasoning depths (low, medium, high) let you tune compute spend against accuracy on a per-request basis. Users consistently highlight the combination of strong math and coding ability with the freedom to self-host, fine-tune, and deploy without vendor lock-in — making it a genuine option for organizations with data sovereignty requirements. The honest caveat: inference isn't plug-and-play. The model requires the Harmony response format, and running it without an optimized framework like vLLM or SGLang produces degraded output. Plan your stack accordingly. For teams that can handle the setup, the price-to-reasoning ratio across third-party providers is hard to match.
Best for
- Agentic coding and engineering workflows — high-reasoning level with tool use scores 62.4% on SWE-Bench Verified
- Competitive math and scientific reasoning — 97.9% on AIME 2025 with tools, among the highest publicly reported scores
- On-premises and air-gapped deployments where data cannot leave the organization
- Fine-tuning and custom model development under Apache 2.0 without OpenAI dependency
- Long-context document and codebase analysis across a 128k token context window
Specs & capabilities
How GPT OSS 120b stacks up — intelligence, speed, context, and modalities.
Intelligence
Medium
Speed
Fast
Context window
131,072 tokens
Max output
20,000 tokens
Knowledge cutoff
June 2024
Frequently asked questions
What does it cost?
OpenAI's official pricing is $0.15 per million input tokens and $0.60 per million output tokens. Third-party providers vary widely — OpenRouter routes as low as $0.039 input / $0.18 output per million tokens, making it one of the cheaper high-reasoning options available.
What is the context window?
128k tokens (131,072 tokens native). There is no publicly confirmed maximum output token figure.
Does it support images or other modalities?
No. GPT OSS 120b is text-only. There is no image input support, which rules it out for multimodal tasks.
What is the knowledge cutoff?
June 2024, which is notably older than some proprietary alternatives released around the same period. Factor this in for tasks requiring recent world knowledge.
How does it compare to o4-mini?
On core reasoning benchmarks, performance is competitive — near-parity in several evaluations. The key difference is that GPT OSS 120b is open-weight and can be self-hosted or fine-tuned, while o4-mini is a closed API-only model.
Is there anything tricky about deploying it?
Yes. The model requires OpenAI's Harmony response format; standard inference produces degraded or nonsensical output. Optimized frameworks like vLLM or SGLang are strongly recommended. Full-precision weights need roughly 80GB VRAM, though MXFP4 quantization reduces the footprint to around 60GB for single H100 deployment.