Skip to content
AI Primer
release

DeepSeek V4 releases MIT weights and 1M context, retires old IDs on July 24

HN coverage added deployment details for DeepSeek V4: published weights, 1M context on both Pro and Flash, and legacy model IDs retiring on July 24. Watch for rate limits and timeouts on V4-Pro if you're planning self-hosted or API use.

3 min read
DeepSeek V4 releases MIT weights and 1M context, retires old IDs on July 24
DeepSeek V4 releases MIT weights and 1M context, retires old IDs on July 24

TL;DR

  • DeepSeek shipped two preview V4 models, and the release page says both DeepSeek-V4-Pro and DeepSeek-V4-Flash come with a standard 1M-token context window.
  • According to the release page, API users need to move to deepseek-v4-pro or deepseek-v4-flash, while the HN coverage notes the old IDs retire on July 24.
  • The main HN thread surfaced the practical deployment angle fast: published Hugging Face weights under MIT, plus early questions about hosting, rate limits, and whether Flash is the better default.
  • In the discussion roundup, one HN commenter reported V4-Pro timeouts and heavy rate limiting, while another said V4 handled a gnarly refactoring task well in a DIY coding harness.

You can read DeepSeek's official release, check the models and pricing page, and pull the open weights collection. The changelog also hides a migration detail worth knowing: DeepSeek's updates page says deepseek-chat and deepseek-reasoner currently map to V4-Flash modes before they disappear in July.

What shipped

DeepSeek-V4 Preview Release and 1M Context Standard Announced

DeepSeek has launched and open-sourced its DeepSeek-V4 preview, featuring two new models—DeepSeek-V4-Pro (1.6T total/49B active parameters) and DeepSeek-V4-Flash (284B total/13B active parameters)—both supporting a 1M context window as standard. The models utilize a new attention mechanism combining token-wise compression and DeepSeek Sparse Attention (DSA) to improve long-context efficiency. Users can access these models via API or chat; API integration requires updating to the new model names while maintaining the current base URL. Support for legacy model identifiers (deepseek-chat and deepseek-reasoner) will be discontinued on July 24, 2026.

DeepSeek's preview release shipped two MoE models:

DeepSeek v4

Relevant for engineers building with frontier models: the release changes API model names, adds a standard 1M-token context, and appears to target agentic and coding workflows. The discussion also highlights practical questions around benchmark validity, rate limits, hosting, and whether Pro vs Flash is the better choice in real deployments.

The official docs say the weights are open-sourced, and the HN thread added the practical confirmation most engineers actually wanted: the Hugging Face repos are published and MIT-licensed.

API migration

DeepSeek-V4 Preview Release and 1M Context Standard Announced

DeepSeek has launched and open-sourced its DeepSeek-V4 preview, featuring two new models—DeepSeek-V4-Pro (1.6T total/49B active parameters) and DeepSeek-V4-Flash (284B total/13B active parameters)—both supporting a 1M context window as standard. The models utilize a new attention mechanism combining token-wise compression and DeepSeek Sparse Attention (DSA) to improve long-context efficiency. Users can access these models via API or chat; API integration requires updating to the new model names while maintaining the current base URL. Support for legacy model identifiers (deepseek-chat and deepseek-reasoner) will be discontinued on July 24, 2026.

DeepSeek kept the base URL the same and changed the model slugs. The new names are deepseek-v4-pro and deepseek-v4-flash, according to the official release and the changelog.

The retirement date is concrete. As the release page states, deepseek-chat and deepseek-reasoner become inaccessible after July 24, 2026 at 15:59 UTC.

A small gotcha lives in the changelog: the updates page says those legacy names currently map to V4-Flash modes, not to V4-Pro. deepseek-chat points at V4-Flash non-thinking mode, while deepseek-reasoner points at V4-Flash thinking mode.

Context and early usage reports

Discussion around DeepSeek v4

Thread discussion highlights: - simonw on model usage and comparison: I like the pelican I got out of deepseek-v4-flash more than the one I got from deepseek-v4-pro... Both generated using OpenRouter. - XCSme on benchmark skepticism and serving limits: Something is odd with this model, their blog posts shows REALLY good results, but in most other third-party benchmarks, people realize it's not really SOTA... V4-Pro is heavily rate-limited and gives a lot of timeout errors when I try to test it. - wolttam on coding workflow success: I'm impressed! I've been giving the various open-weight models a particularly gnarly... refactoring/cleanup task in my DIY coding harness... DS V4 went hard, fixed its issues along the way, and left me with a significantly nicer codebase!

The headline spec is the 1M context window, but the more interesting bit is what DeepSeek says it did to make that usable. The Hugging Face model card says V4 uses a hybrid attention design, Compressed Sparse Attention plus Heavily Compressed Attention, and claims big efficiency gains versus V3.2 at 1M tokens.

Early hands-on reports were mixed. In the HN discussion roundup, one commenter said V4-Pro was heavily rate-limited and prone to timeouts, while another said DeepSeek V4 performed well on a difficult refactoring and cleanup task in a coding harness. The main HN thread also captured a smaller but telling split: Simon Willison preferred a sample from Flash over Pro for one image-generation prompt relayed through OpenRouter.

Pricing and limits

The pricing page adds a few deployment details the announcement skipped:

Those limits line up with the HN reports better than the launch post does. The discussion summary already had users calling out Pro slowdowns on day one.

Share on X