DeepSeek cuts input cache-hit price 90% to $0.003625 per 1M tokens
DeepSeek said cache-hit pricing across its API series is now one-tenth of launch levels, on top of the temporary V4-Pro discount through May 5, 2026. The cut lowers costs for cache-heavy long-context and agent workloads, so teams should recheck their spend assumptions.

TL;DR
- DeepSeek said in deepseek_ai's pricing post that input cache hits across its API lineup now cost one-tenth of their launch price, putting DeepSeek-V4-Pro cache hits at $0.003625 per 1M tokens during the current promo and DeepSeek-V4-Flash cache hits at $0.0028 per 1M tokens.
- The cut stacks with the temporary V4-Pro discount through May 5, 2026, as DeepSeek's graphic and victor207755822's breakdown both note.
- In one cache-heavy agent run, teortaxesTex's usage screenshot showed 6.2M cache-hit tokens costing about $0.0224, versus roughly ten times more under the previous cache-hit price.
- According to the main Hacker News thread and its fresh comment roundup, practitioners already see V4-Flash as the more deployable model, while V4-Pro is described as slower and more rate-limited when it works.
- DeepSeek's own infrastructure story matters here too: teortaxesTex's screenshots of Liang Wenfeng remarks and V4 docs tie the lower prices to reduced serving cost, including sharply smaller KV-cache footprints than V3.2.
DeepSeek's announcement page and the live Hacker News thread are both worth checking directly, especially the practitioner comments on benchmark cost/performance, Flash vs Pro reliability, and active-parameter efficiency.
Input cache pricing
The concrete change is simple: cache-hit input pricing across the series drops to 10 percent of launch levels, according to deepseek_ai's post. AiBattle_'s comparison graphic shows the same footnote in product terms, saying the older deepseek-chat and deepseek-reasoner names will eventually map to the non-thinking and thinking modes of deepseek-v4-flash.
For the currently advertised V4 models, the table in DeepSeek's graphic breaks out like this:
- DeepSeek-V4-Pro, input cache hit: $0.003625 per 1M tokens during the promo, down from $0.0145 and far below the $0.145 launch reference.
- DeepSeek-V4-Pro, input cache miss: $0.435 per 1M tokens during the promo, down from $1.74.
- DeepSeek-V4-Pro, output: $0.87 per 1M tokens during the promo, down from $3.48.
- DeepSeek-V4-Flash, input cache hit: $0.0028 per 1M tokens, down from $0.028.
- DeepSeek-V4-Flash, input cache miss: $0.14 per 1M tokens.
- DeepSeek-V4-Flash, output: $0.28 per 1M tokens.
Only the V4-Pro 75 percent promotion is temporary; the cache-hit cut is permanent, as victor207755822's post spells out.
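The listed V4-Pro numbers are consistent with the two discounts composing multiplicatively: the permanent one-tenth cache-hit cut applied to the launch reference, then the temporary 75 percent promo on top. A quick sketch of that arithmetic (the composition order is an inference from the figures, not something DeepSeek states explicitly):

```python
# V4-Pro cache-hit price composition, assuming the permanent cut and the
# temporary promo multiply. Launch reference taken from DeepSeek's graphic.
LAUNCH_CACHE_HIT = 0.145  # $/1M tokens, V4-Pro launch reference
CACHE_CUT = 0.10          # permanent: cache hits now cost one-tenth of launch
PROMO = 0.25              # temporary: V4-Pro billed at 25% of list through May 5, 2026

list_price = LAUNCH_CACHE_HIT * CACHE_CUT  # new permanent list price
promo_price = list_price * PROMO           # advertised promo price

print(f"list: ${list_price:.4f}/1M, promo: ${promo_price:.6f}/1M")
```

Running this reproduces the $0.0145 post-cut list price and the $0.003625 promo price from the table above.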
Cache-heavy agent runs
The interesting bit is not the headline number. It is how quickly cache-hit tokens dominate long runs once an agent keeps reusing a swollen context.
In the usage snapshot from teortaxesTex, 6,199,808 of 6,612,473 total tokens were cache hits. That post prices the cache-hit portion at roughly $0.0224 after the cut, versus about $0.22 under the prior cache-hit price, and says the old bill would have been more than half the total run cost.
That lines up with the broader reaction in scaling01's post, which fixates on the new $0.003625 figure rather than the already-discounted miss and output rates. For agent workloads that repeatedly resend history, the cheap part of the bill just got much cheaper.
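The snapshot's numbers can be rechecked directly. The token count comes from teortaxesTex's screenshot; the prior effective price of $0.03625 per 1M is inferred from the post's roughly-ten-times comparison (launch cache-hit price under the same 75 percent promo), not quoted from DeepSeek:

```python
# Cache-hit cost of the agent run, before and after the cut.
cache_hit_tokens = 6_199_808            # from the usage screenshot
new_price = 0.003625 / 1e6              # current promo cache-hit price, $/token
old_price = 0.03625 / 1e6               # inferred prior price (~10x higher)

new_cost = cache_hit_tokens * new_price
old_cost = cache_hit_tokens * old_price

print(f"cache-hit cost: ${new_cost:.4f} now vs ${old_cost:.4f} before")
```

The result lands within rounding of the post's figures: about $0.022 now versus about $0.22 before.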
Flash and Pro in practice
DeepSeek v4
2.1k upvotes · 1.6k comments
The strongest community signal is that V4-Flash looks like the easy model to love, while V4-Pro still carries serving friction. The summary in the main HN thread says commenters are reporting strong cost-performance and fast agentic behavior from Flash, while Pro is harder to serve and test because of rate limits and timeouts.
The thread's linked comments add the concrete examples:
- In cmitsakis's benchmark comment, V4-Flash is described as scoring 90.2 percent on a customer-support benchmark, ahead of Qwen3.5-27B and roughly level with Gemini-3-Flash-Preview at much lower cost.
- In gertlabs's hands-on comment, V4-Flash is called cheap, effective, and really fast, while V4-Pro is described as slow, unreliable, and too rate-limited to be very useful.
- In wolttam's coding comment, a messy refactoring run reportedly ended with a significantly nicer codebase.
Fresh discussion on DeepSeek v4
The fresher roundup in impact_sy's delta report says the new comments mostly reinforce that same split. Flash is emerging as the practical deployment choice, while Pro is still the model people want to benchmark more than the one they want to sit behind a harness all day.
Cost structure behind the cut
One reason the price move looks less like a one-off promo is the infrastructure claim attached to it. In the interview excerpt captured by teortaxesTex, Liang Wenfeng says the company's pricing principle is neither to subsidize nor to take excessive margin, adding that the lower API price still keeps a slight profit margin on top of cost.
The second screenshot in the same post ties that to V4's serving profile. It says DeepSeek-V4-Pro reaches 27 percent of V3.2's single-token FLOPs and 10 percent of its KV-cache size, while V4-Flash drops to 10 percent of the FLOPs and 7 percent of the KV-cache size.
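A back-of-envelope KV-cache estimate shows why a roughly 10x smaller footprint matters for long-context serving. The architecture numbers below are hypothetical placeholders for illustration, not DeepSeek's published specs; only the per-token sizing formula (keys plus values, per layer, per KV head) is standard:

```python
# Rough KV-cache sizing: K and V each store n_kv_heads * head_dim values
# per layer per token. All architecture numbers here are made up.
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, dtype_bytes=2):
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes

baseline = kv_bytes_per_token(n_layers=60, n_kv_heads=8, head_dim=128)
tokens_per_gib = (1 << 30) // baseline  # context a single GiB can hold

print(f"{baseline} bytes/token, {tokens_per_gib} tokens per GiB")
```

Shrink that per-token footprint to a tenth and the same GPU memory holds ten times the cached context, which is what makes selling cache hits at $0.003625 per 1M tokens plausible as a margin-positive price rather than a subsidy.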
That is also why the HN thread keeps returning to active parameters and served efficiency. In latentframe's comment on active parameters, the headline 1.6T parameter count matters less than how few parameters are active in practice. The cheaper cache-hit line item is the visible pricing change, but the more durable story is that DeepSeek is arguing the underlying memory bill fell first.