
Developers report DeepSeek V4 Flash handles 32M-token coding runs for $0.25

Users reported moving long coding sessions from Claude to DeepSeek V4 Flash and seeing tens of millions of tokens cost only cents. Hacker News discussion also leaned toward Flash over Pro for day-to-day use, so teams should test whether the low published prices hold in their own workflows.


TL;DR

  • DeepSeek's official V4 preview shipped two 1M-context models: DeepSeek-V4-Pro at 1.6T total and 49B active params, and DeepSeek-V4-Flash at 284B total and 13B active params, both with OpenAI- and Anthropic-style API compatibility baked in from day one, according to the release summary.
  • DeepSeek's pricing page puts V4 Flash at $0.14 per million cache-miss input tokens, $0.28 per million output tokens, and just $0.0028 per million cache-hit input tokens after the April 26 cut, while V4 Pro is still discounted through May 31, according to Models & Pricing and WesRoth's promo screenshot.
  • In the clearest real-world cost report so far, jbhuang0604's follow-up says an existing codebase burned 32 million tokens on V4 Flash for about $0.25, after their earlier post reported a smaller 10M-plus-token run.
  • Community reaction has split the lineup pretty cleanly: the HN discussion summary says commenters kept calling Flash the practical model because Pro was slow or rate-limited, while teortaxesTex's hands-on note and another long-context test both describe Flash as steerable enough for big coding sessions.

DeepSeek buried the most interesting detail in Models & Pricing: cache-hit input is now one tenth of launch pricing. You can also read the release note, skim the technical report, and dig through the huge HN thread, where the practical argument quickly turned from benchmarks to whether Flash is the better daily driver.

What shipped

DeepSeek's official release note is straightforward: V4 Preview is live, open-sourced, and centered on a 1M-token context window. The company positions Pro for agentic coding and reasoning, and Flash as the fast, cheap tier.

Hacker News: DeepSeek V4 Preview Release (2.1k upvotes, 1.6k comments)

A few deployment details matter more than the marketing copy:

  • Both models expose OpenAI-format and Anthropic-format base URLs, per Models & Pricing (see the client sketch after this list).
  • Both support tool calls, JSON output, chat prefix completion, and beta FIM completion, again per Models & Pricing.
  • The legacy deepseek-chat and deepseek-reasoner slugs now map to the non-thinking and thinking modes of deepseek-v4-flash, per the pricing doc, and are headed for deprecation.
  • The release note says older model IDs retire on July 24, 2026, which makes this more than a side-by-side preview.
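
Wiring up a client is mostly a matter of swapping the base URL. Here is a minimal sketch against the OpenAI-format endpoint in Python; the deepseek-v4-flash slug comes from the pricing doc, while the exact base URL and the Anthropic-format path are assumptions worth verifying against DeepSeek's current docs:

    # Minimal sketch: an OpenAI-style client pointed at DeepSeek.
    # Assumptions: base_url and model slug follow the article's reading
    # of Models & Pricing; check DeepSeek's docs before relying on this.
    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_DEEPSEEK_API_KEY",      # issued by DeepSeek, not OpenAI
        base_url="https://api.deepseek.com",  # OpenAI-format endpoint (assumed)
    )

    resp = client.chat.completions.create(
        model="deepseek-v4-flash",            # slug per the pricing doc
        messages=[{"role": "user", "content": "Summarize this diff..."}],
    )
    print(resp.choices[0].message.content)

    # The Anthropic-format endpoint would be the same idea with Anthropic's
    # SDK and a DeepSeek base URL (exact path assumed, not confirmed here).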

Cache economics

The headline price gap is large, but the cache-hit numbers are the real story. DeepSeek's pricing page says V4 Flash costs $0.0028 per million cache-hit input tokens, versus $0.14 for cache misses and $0.28 for output, after the April 26 change that cut cache-hit pricing to one tenth of launch levels.

That pricing model is what makes the cents-level screenshots plausible. In teortaxesTex's cost breakdown, 58.3 million of 60.5 million tokens were cache hits, which is why the bill landed at $0.47 instead of something much uglier. OpenRouter's retweet points to another Hermes-agent user claiming roughly one cent per million tokens on V4 Flash.
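
Those rates make the $0.47 bill easy to reproduce. A quick back-of-the-envelope in Python, with one stated assumption: the 2.2 million non-cached tokens are treated as cache-miss input, since the screenshot does not break out output tokens:

    # Rechecking teortaxesTex's bill from the published Flash rates.
    # Assumption: the 2.2M non-cached tokens are all cache-miss input;
    # the actual input/output split isn't published.
    CACHE_HIT = 0.0028   # $ per million cache-hit input tokens
    CACHE_MISS = 0.14    # $ per million cache-miss input tokens

    hits_m = 58.3                 # million tokens served from cache
    misses_m = 60.5 - hits_m      # 2.2M tokens, assumed cache-miss input

    bill = hits_m * CACHE_HIT + misses_m * CACHE_MISS
    print(f"${bill:.2f}")         # -> $0.47, matching the screenshot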

Pro is cheaper than it first looked, but only because the discount is doing heavy lifting. Models & Pricing lists V4 Pro at $0.435 per million cache-miss input tokens and $0.87 per million output tokens during a 75 percent promo, with the same page saying the offer now runs through May 31. WesRoth's screenshot matches that extension window.

Flash over Pro

The benchmark story is still muddy, but the usability story is not. The HN summary says one top commenter flagged DeepSeek's own benchmark claims as something to treat cautiously until more third-party testing lands, while another said Pro was hard to test because of rate limits and timeouts.

Hacker News: Discussion around DeepSeek v4 (2.1k upvotes, 1.6k comments)

The same thread kept circling back to Flash. According to that HN summary, one commenter called Flash the model to pay attention to because it was cheap, effective, and fast, while Pro felt slow and unreliable. Another HN commenter, surfaced in the HN core summary, reported V4 handling a gnarly refactoring task well inside a homemade coding harness.

That lines up with the broader open-model mood. mbusigin argued that DeepSeek helped push open-weight models into the "substitutable" bucket for many applications, especially when you can buy more test-time compute instead of paying frontier-model prices.

Coding runs in practice

The viral data point came from one developer bailing on Claude rate limits. In jbhuang0604's first post, the switch to DeepSeek V4 produced a 10M-plus-token coding run that looked shockingly cheap; their follow-up then gave the fuller number: 32 million tokens for about a quarter on an existing codebase, with roughly the same quality and no more rate limits.
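
Run the arithmetic in reverse and the $0.25 figure shows how much it depends on the cache. Assuming, for simplicity, that all 32 million tokens were billed as Flash input (output tokens cost more, so this understates the required hit rate), the bill implies roughly a 96 percent cache-hit share:

    # What cache-hit share would make 32M tokens cost $0.25 on Flash?
    # Assumption: every token is billed as input (none as $0.28/M output).
    total_m, bill = 32.0, 0.25
    hit, miss = 0.0028, 0.14      # $ per million tokens

    # bill = hits*hit + (total_m - hits)*miss  ->  solve for hits
    hits_m = (total_m * miss - bill) / (miss - hit)
    print(f"{hits_m:.1f}M hits, {hits_m / total_m:.0%} of the run")
    # -> ~30.8M hits, about 96% served from cache

In other words, the headline is less about the raw token price than about how much of a long session the context cache absorbs.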

Other hands-on reports paint Flash as a model you can lean on for long, noisy agent loops, as long as you tolerate some weirdness.

Those reports do not settle quality. They do explain why engineers are suddenly talking about token volume first.

Vision and throughput

DeepSeek's V4 story is already spilling beyond text. teortaxesTex's vision note pointed out that a V4-Flash-priced multimodal model would make image-heavy and even short video workloads look unusually cheap at 1M context, right as DeepSeek published its Thinking with Visual Primitives paper.

The catch is availability. ZhihuFrontier's summary of a limited gray-release test describes a native vision model that handles calligraphy, code screenshots, posters, formulas, and handwriting, but also hallucinates plausible text on dense, blurry layouts. teortaxesTex speculated that DeepSeek has not opened Vision broadly yet because demand could spike too hard, which fits the pricing-and-throughput tension already showing up around Pro.
