Question 1

What does it cost?

Accepted Answer

OpenAI's official pricing is $0.15 per million input tokens and $0.60 per million output tokens. Third-party pricing varies widely: through OpenRouter, providers charge as little as $0.039 per million input tokens and $0.18 per million output tokens, which makes it one of the cheaper high-reasoning options available.
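To turn per-million-token rates into a per-request figure, multiply each token count by its rate and divide by one million. The Python sketch below does that arithmetic. The 2,000-input / 500-output workload is a hypothetical example, and the prices are simply the figures quoted in this answer; providers change them often.

```python
# Example per-million-token prices (USD) taken from the answer above.
# Illustrative inputs only; check each provider for current rates.
PRICES = {
    "openai": (0.15, 0.60),              # (input, output) per 1M tokens
    "openrouter_lowest": (0.039, 0.18),
}


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float, output_per_m: float) -> float:
    """Return the USD cost of one request given per-million-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2,000 prompt tokens and 500 completion tokens.
    for name, (inp, out) in PRICES.items():
        cost = request_cost(2_000, 500, inp, out)
        print(f"{name}: ${cost:.6f} per request, ${cost * 1_000:.2f} per 1,000 requests")
```

With those inputs, the OpenAI rate works out to $0.0006 per request and the lowest OpenRouter rate to $0.000168.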

Question 2

What is the context window?

Accepted Answer

128k tokens (131,072 tokens natively). No maximum output length has been publicly confirmed.
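Because no maximum output figure is published, a conservative assumption is that the prompt and the generated tokens share the same 131,072-token window (128 × 1,024). The sketch below budgets output under that assumption. The function names are hypothetical, and a serving framework may enforce a lower limit of its own.

```python
# Native context window from the answer above: 128k = 128 * 1024 tokens.
CONTEXT_WINDOW = 131_072


def max_output_budget(prompt_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for generation, assuming prompt and output share one window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_window - prompt_tokens, 0)


def fits(prompt_tokens: int, requested_output: int,
         context_window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the requested output stays within the window."""
    return prompt_tokens + requested_output <= context_window


if __name__ == "__main__":
    print(max_output_budget(100_000))   # room left after a long prompt
    print(fits(120_000, 16_000))        # would this request overflow?
```

For example, a 100,000-token prompt leaves at most 31,072 tokens for the response.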

Question 3

Does it support images or other modalities?

Accepted Answer

No. GPT OSS 120b is text-only. There is no image input support, which rules it out for multimodal tasks.

Question 4

What is the knowledge cutoff?

Accepted Answer

June 2024, which is notably older than some proprietary alternatives released around the same period. Factor this in for tasks requiring recent world knowledge.

Question 5

How does it compare to o4-mini?

Accepted Answer

On core reasoning benchmarks, it is competitive with o4-mini, reaching near-parity in several evaluations. The key difference is that GPT OSS 120b is open-weight and can be self-hosted or fine-tuned, while o4-mini is a closed, API-only model.

Question 6

Is there anything tricky about deploying it?

Accepted Answer

Yes. The model requires OpenAI's Harmony response format; prompts that are not rendered in Harmony produce degraded or nonsensical output. Optimized serving frameworks such as vLLM or SGLang are strongly recommended. The released weights store the MoE layers in MXFP4, which brings the checkpoint to roughly 60GB so the model fits on a single 80GB H100.
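As a rough sanity check on the ~60GB figure, weight memory is parameter count times bits per parameter. This sketch assumes OpenAI's published total of about 117 billion parameters and MXFP4's 4.25 bits per weight (4-bit values plus one shared 8-bit scale per block of 32). Treating every weight as MXFP4 is a simplification, since the non-MoE layers are kept at higher precision, and the estimate excludes KV cache and activations.

```python
# Assumptions (not stated in the answer above):
# - about 117e9 total parameters (OpenAI's published figure);
# - every weight in MXFP4: 4-bit values + one 8-bit scale per 32 values.
# Only the MoE weights are MXFP4 in practice, so this is a ballpark figure.
TOTAL_PARAMS = 117e9
MXFP4_BITS = 4 + 8 / 32  # = 4.25 bits per weight
GPU_MEMORY_GB = 80       # single H100 (80GB variant)


def weight_gb(params: float, bits_per_param: float) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes); excludes KV cache and activations."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    est = weight_gb(TOTAL_PARAMS, MXFP4_BITS)
    print(f"MXFP4 weights: ~{est:.1f} GB")
    print(f"Left on an {GPU_MEMORY_GB} GB GPU: ~{GPU_MEMORY_GB - est:.1f} GB")
```

The estimate comes to roughly 62 GB, in line with the ~60GB figure, leaving under 18 GB of an 80GB H100 for the KV cache and runtime overhead.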