DeepSeek V4 releases MIT weights and 1M context, retires old IDs on July 24
HN coverage added deployment details for DeepSeek V4: published weights, 1M context on both Pro and Flash, and legacy model IDs retiring on July 24. Watch for rate limits and timeouts on V4-Pro if you're planning self-hosted or API use.

TL;DR
- DeepSeek shipped two preview V4 models, and the release page says both DeepSeek-V4-Pro and DeepSeek-V4-Flash come with a standard 1M-token context window.
- According to the release page, API users need to move to deepseek-v4-pro or deepseek-v4-flash, while the HN coverage notes the old IDs retire on July 24.
- The main HN thread surfaced the practical deployment angle fast: published Hugging Face weights under MIT, plus early questions about hosting, rate limits, and whether Flash is the better default.
- In the discussion roundup, one HN commenter reported V4-Pro timeouts and heavy rate limiting, while another said V4 handled a gnarly refactoring task well in a DIY coding harness.
You can read DeepSeek's official release, check the models and pricing page, and pull the open weights collection. The changelog also hides a migration detail worth knowing: DeepSeek's updates page says deepseek-chat and deepseek-reasoner currently map to V4-Flash modes before they disappear in July.
What shipped
DeepSeek-V4 Preview Release and 1M Context Standard Announced
DeepSeek has launched and open-sourced its DeepSeek-V4 preview, featuring two new models—DeepSeek-V4-Pro (1.6T total/49B active parameters) and DeepSeek-V4-Flash (284B total/13B active parameters)—both supporting a 1M context window as standard. The models utilize a new attention mechanism combining token-wise compression and DeepSeek Sparse Attention (DSA) to improve long-context efficiency. Users can access these models via API or chat; API integration requires updating to the new model names while maintaining the current base URL. Support for legacy model identifiers (deepseek-chat and deepseek-reasoner) will be discontinued on July 24, 2026.
DeepSeek's preview release shipped two MoE models:
- deepseek-v4-pro: 1.6T total parameters, 49B active, per the official release
- deepseek-v4-flash: 284B total parameters, 13B active, per the official release
- both expose a 1,000,000-token context window, per the official release
- both are available through chat.deepseek.com and the API, according to the release page
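To put the parameter counts above in perspective, here is a small arithmetic sketch of how much of each model is active per token. The dictionary and function names are illustrative, and the figures are just the ones quoted from the release, with 1.6T written as 1,600B.

```python
# Parameter counts in billions, as quoted from the official release.
# The structure and names here are illustrative, not part of any API.
MODEL_PARAMS_B = {
    "deepseek-v4-pro": {"total": 1600, "active": 49},
    "deepseek-v4-flash": {"total": 284, "active": 13},
}


def active_fraction(model: str) -> float:
    """Return the share of total parameters that is active per token."""
    params = MODEL_PARAMS_B[model]
    return params["active"] / params["total"]


if __name__ == "__main__":
    for name in MODEL_PARAMS_B:
        print(f"{name}: {active_fraction(name):.1%} of parameters active")
```

By this arithmetic, Pro activates about 3.1% of its weights per token and Flash about 4.6%, which is the usual MoE trade: a large total footprint to host, but far less compute per token.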
DeepSeek v4
Relevant for engineers building with frontier models: the release changes API model names, adds a standard 1M-token context, and appears to target agentic and coding workflows. The discussion also highlights practical questions around benchmark validity, rate limits, hosting, and whether Pro vs Flash is the better choice in real deployments.
The official docs say the weights are open-sourced, and the HN thread added the practical confirmation most engineers actually wanted: the Hugging Face repos are published and MIT-licensed.
API migration
DeepSeek kept the base URL the same and changed the model slugs. The new names are deepseek-v4-pro and deepseek-v4-flash, according to the official release and the changelog.
The retirement date is concrete. As the release page states, deepseek-chat and deepseek-reasoner become inaccessible after July 24, 2026 at 15:59 UTC.
A small gotcha lives in the changelog: the updates page says those legacy names currently map to V4-Flash modes, not to V4-Pro. deepseek-chat points at V4-Flash non-thinking mode, while deepseek-reasoner points at V4-Flash thinking mode.
Context and early usage reports
Discussion around DeepSeek v4
Thread discussion highlights:
- simonw on model usage and comparison: I like the pelican I got out of deepseek-v4-flash more than the one I got from deepseek-v4-pro... Both generated using OpenRouter.
- XCSme on benchmark skepticism and serving limits: Something is odd with this model, their blog posts shows REALLY good results, but in most other third-party benchmarks, people realize it's not really SOTA... V4-Pro is heavily rate-limited and gives a lot of timeout errors when I try to test it.
- wolttam on coding workflow success: I'm impressed! I've been giving the various open-weight models a particularly gnarly... refactoring/cleanup task in my DIY coding harness... DS V4 went hard, fixed its issues along the way, and left me with a significantly nicer codebase!
The headline spec is the 1M context window, but the more interesting bit is what DeepSeek says it did to make that usable. The Hugging Face model card says V4 uses a hybrid attention design, Compressed Sparse Attention plus Heavily Compressed Attention, and claims big efficiency gains versus V3.2 at 1M tokens.
Early hands-on reports were mixed. In the HN discussion roundup, one commenter said V4-Pro was heavily rate-limited and prone to timeouts, while another said DeepSeek V4 performed well on a difficult refactoring and cleanup task in a DIY coding harness. The main HN thread also captured a smaller but telling split: Simon Willison preferred the pelican he got from Flash over the one from Pro, with both generated through OpenRouter.
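If you hit the rate limiting and timeouts reported for V4-Pro, the usual mitigation is retrying with exponential backoff and jitter. The sketch below is generic and makes no network calls: TransientAPIError is a hypothetical stand-in for whatever rate-limit or timeout exception your client raises, and the attempt count and delays are arbitrary defaults, not values DeepSeek recommends.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientAPIError(Exception):
    """Hypothetical stand-in for a client's rate-limit or timeout error."""


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient failures with full-jitter backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientAPIError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Exponential ceiling (1s, 2s, 4s, ...) capped at max_delay,
            # then a random delay below it so clients do not retry in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be at least 1")
```

Wrap the actual request in a zero-argument function or lambda and pass it as `call`. The `sleep` parameter is injectable so the retry logic can be tested without waiting.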
Pricing and limits
The pricing page adds a few deployment details the announcement skipped:
- both models support OpenAI-format and Anthropic-format endpoints, per the models and pricing page
- both list a maximum output of 384K tokens, per the models and pricing page
- Flash is capped at 2,500 concurrency, while Pro is capped at 500, per the models and pricing page
- listed output pricing is $0.28 per 1M tokens for Flash and $0.87 for Pro, per the models and pricing page
Those limits line up with the HN reports better than the launch post does. The discussion summary already had one commenter calling out heavy rate limiting and timeouts on Pro.
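For a rough sense of what the listed output prices mean in practice, here is a small estimator. It covers output tokens only, because input pricing is not quoted above. The prices and concurrency caps are copied from the pricing-page figures in this article, and the table and function names are illustrative.

```python
from decimal import Decimal

# Listed output price in USD per 1M tokens, as quoted from the pricing page.
OUTPUT_PRICE_PER_MILLION = {
    "deepseek-v4-flash": Decimal("0.28"),
    "deepseek-v4-pro": Decimal("0.87"),
}

# Listed concurrency caps, as quoted from the pricing page.
CONCURRENCY_CAP = {
    "deepseek-v4-flash": 2500,
    "deepseek-v4-pro": 500,
}


def output_cost_usd(model: str, output_tokens: int) -> Decimal:
    """Estimate output-token cost in USD (input tokens are not included)."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return OUTPUT_PRICE_PER_MILLION[model] * output_tokens / Decimal(1_000_000)


if __name__ == "__main__":
    for name in OUTPUT_PRICE_PER_MILLION:
        cost = output_cost_usd(name, 2_000_000)
        print(f"{name}: ${cost} per 2M output tokens, cap {CONCURRENCY_CAP[name]}")
```

On these figures Pro costs roughly 3.1 times as much per output token as Flash and allows a fifth of the concurrency, which is worth weighing before making Pro the default.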