Question 1

What does DeepSeek V4 Flash cost?

Accepted Answer

Input tokens are $0.14 per million and output tokens are $0.28 per million. Input tokens served from the cache (cache hits) cost $0.003 per million, about 98% less than uncached input. With a typical mix of cached input, uncached input, and output, this brings the blended rate to roughly $0.06 per million tokens.
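A minimal sketch of the cost arithmetic, using the per-million prices quoted above. The function names and the example workload (9M input tokens at a 90% cache-hit ratio plus 1M output tokens) are hypothetical. The result shifts with the mix you assume, so it is not meant to reproduce the $0.06 figure exactly.

```python
# Per-million-token prices in USD, as quoted above (subject to change).
INPUT_CACHE_MISS = 0.14
INPUT_CACHE_HIT = 0.003
OUTPUT = 0.28


def cost_usd(input_tokens: int, output_tokens: int, cache_hit_ratio: float) -> float:
    """Estimated bill for a workload, assuming a fixed fraction of input tokens hit the cache."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    hit_tokens = input_tokens * cache_hit_ratio
    miss_tokens = input_tokens - hit_tokens
    total = (
        hit_tokens * INPUT_CACHE_HIT
        + miss_tokens * INPUT_CACHE_MISS
        + output_tokens * OUTPUT
    )
    return total / 1_000_000


def blended_rate(input_tokens: int, output_tokens: int, cache_hit_ratio: float) -> float:
    """Average USD per million tokens across all input and output tokens."""
    total_tokens = input_tokens + output_tokens
    if total_tokens == 0:
        raise ValueError("workload has no tokens")
    return cost_usd(input_tokens, output_tokens, cache_hit_ratio) * 1_000_000 / total_tokens


if __name__ == "__main__":
    discount = 1 - INPUT_CACHE_HIT / INPUT_CACHE_MISS
    print(f"cache-hit discount: {discount:.1%}")  # 97.9%, i.e. roughly 98%
    rate = blended_rate(9_000_000, 1_000_000, cache_hit_ratio=0.9)
    print(f"blended rate: ${rate:.3f} per million tokens")
```

Raising the cache-hit ratio or lowering the output share pulls the blended rate down. Output tokens weigh heavily because they cost twice the uncached input rate.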

Question 2

How large is the context window?

Accepted Answer

One million tokens, roughly equivalent to 1,500 pages of standard text. Long-context recall scores 78.7% on MRCR 1M in Think Max mode.
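The 1,500-page figure follows from common rules of thumb, not from the tokenizer itself. The sketch below assumes about 0.75 English words per token and 500 words per page. For the fit check, it assumes about 4 characters per token. All three constants are assumptions, and real counts depend on DeepSeek's tokenizer and the text. The helper names are hypothetical.

```python
CONTEXT_WINDOW = 1_000_000  # tokens, per the answer above

# Rule-of-thumb constants (assumptions, not tokenizer measurements).
WORDS_PER_TOKEN = 0.75
WORDS_PER_PAGE = 500
CHARS_PER_TOKEN = 4.0


def pages_for_tokens(tokens: int) -> float:
    """Approximate pages of standard text that fit in `tokens`."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE


def rough_fits(text: str, reserved_for_output: int = 0) -> bool:
    """Crude check that a prompt plus an output budget stays within the window.

    Assumes the output budget shares the window with the prompt; confirm this
    against the provider's documentation and use a real tokenizer for exact counts.
    """
    estimated_prompt_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_prompt_tokens + reserved_for_output <= CONTEXT_WINDOW


if __name__ == "__main__":
    print(pages_for_tokens(CONTEXT_WINDOW))  # 1500.0
    print(rough_fits("x" * 3_000_000, reserved_for_output=100_000))
```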

Question 3

How does Flash compare to DeepSeek V4 Pro?

Accepted Answer

Flash and Pro sit within 1.6 percentage points of each other on coding benchmarks. Flash is faster and cheaper; Pro is the better choice for tasks requiring deeper or more sustained reasoning.

Question 4

What are the model's main weaknesses?

Accepted Answer

Real-world testers describe it as 'benchmark maxed' — strong on standard tests but prone to subtle logical flaws in complex algorithms. It also generates unusually verbose output, which can inflate costs on high-volume workloads despite the low per-token rate.
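Output tokens are billed at $0.28 per million, so verbosity scales the output bill linearly. Here is a minimal sketch with a hypothetical workload: 100,000 requests per day, 500 expected output tokens each, and a 30-day month. A hypothetical `verbosity_factor` represents how many times longer the replies run than you budgeted.

```python
OUTPUT_PRICE_PER_MILLION = 0.28  # USD, as quoted above


def monthly_output_cost(
    requests_per_day: int,
    expected_output_tokens: int,
    verbosity_factor: float = 1.0,
    days: int = 30,
) -> float:
    """Output-token spend in USD when replies run `verbosity_factor` times longer than expected."""
    tokens = requests_per_day * expected_output_tokens * verbosity_factor * days
    return tokens * OUTPUT_PRICE_PER_MILLION / 1_000_000


if __name__ == "__main__":
    for factor in (1.0, 1.5, 2.0):
        cost = monthly_output_cost(100_000, 500, factor)
        print(f"verbosity x{factor}: ${cost:,.2f} per month")
```

At this assumed volume, doubling reply length moves output spend from about $420 to about $840 a month. Prompting for concise answers, or setting an output-length limit where the API allows it, is the direct lever.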

Question 5

Is V4 Flash stable for production use?

Accepted Answer

It is explicitly labeled a preview release on the Hugging Face model card. API behavior and pricing are subject to change without notice, so production deployments should build in tolerance for breaking changes.
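One way to build in that tolerance is to route calls through a thin wrapper that retries and then falls back to another provider. This is a minimal, provider-agnostic sketch. The `call_with_fallback` helper and the callables it receives are hypothetical stand-ins for whatever client code you use. They are not part of any DeepSeek SDK.

```python
import time
from typing import Callable, List, Sequence, Tuple

# A provider is a (name, call) pair; `call` takes a prompt and returns text.
Provider = Tuple[str, Callable[[str], str]]


def call_with_fallback(
    prompt: str,
    providers: Sequence[Provider],
    retries: int = 2,
    backoff_s: float = 0.5,
) -> Tuple[str, str]:
    """Try each provider in order, retrying with exponential backoff.

    Returns (provider_name, response). Raises RuntimeError if every attempt fails.
    """
    errors: List[str] = []
    for name, call in providers:
        for attempt in range(retries + 1):
            try:
                return name, call(prompt)
            except Exception as exc:  # broad on purpose: preview APIs may fail in new ways
                errors.append(f"{name} attempt {attempt + 1}: {exc!r}")
                if attempt < retries:
                    time.sleep(backoff_s * 2 ** attempt)
    raise RuntimeError("all providers failed:\n" + "\n".join(errors))


if __name__ == "__main__":
    def flaky_preview(prompt: str) -> str:
        raise ConnectionError("endpoint changed")

    def stable_backup(prompt: str) -> str:
        return f"backup answer to: {prompt}"

    providers = [("preview", flaky_preview), ("backup", stable_backup)]
    print(call_with_fallback("hello", providers, backoff_s=0.0))
```

Keeping model identifiers and prices in configuration, instead of hard-coding them, also makes it cheaper to respond when the preview changes.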

Question 6

Can I self-host or fine-tune it?

Accepted Answer

Yes. V4 Flash is open-weights under an MIT license and available on Hugging Face. The MIT license permits commercial and research use; its main condition is that the copyright and license notice be kept with copies of the software.
