
DeepSeek releases Vision beta for image understanding in DeepSeek Chat

DeepSeek began rolling out Vision beta as a new image-understanding mode in Chat, and early testers reported fast OCR and strong object recognition. The rollout appears limited or staggered, so watch for broader access and formal docs before relying on it.


TL;DR

You can see the new chat entrypoint, browse reasoning-trace screenshots, and compare that to DeepSeek's still text-only API model list. The odd bit is that some early screenshots suggest a more agent-shaped internal workflow than a basic upload-and-caption feature. The equally odd bit is that the cleanest official evidence is still UI leakage and employee teasing, not a formal product post.

Vision mode

DeepSeek Chat now has a dedicated Vision mode in the UI, separate from its existing fast and expert options.

The screenshots agree on the same surface: Vision appears as a first-class tab, and the label reads "Image understanding (Beta)." That makes this a product rollout inside Chat, not just a hidden model slug or an API-only experiment.

DeepSeek's own public site is looser. The homepage says V4 Preview is available on web, app, and API, but it does not mention image understanding or Vision by name.

Early tests

The first public reports were messy, which is what a staggered rollout usually looks like.

In one early attempt, the system still answered like a text model that could read prompt text but not inspect the uploaded image. A little later, teortaxesTex said reports so far suggested V4-Vision was strong, not perfect, and very fast.

That timing lines up with niallohiggins's rollout note, which frames vision as the missing piece for frontend and design-heavy work and says image recognition had started rolling out in beta.

Reasoning traces

The most revealing screenshots are the ones that show how the model explains itself.

Those traces break work into explicit stages, including image analysis, knowledge-base lookup, and connection steps. In one screenshot, the model recognized the test account and produced a long profile-style summary tied to the handle.
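For concreteness, here is a purely illustrative sketch of what a trace broken into those stages could look like. The stage names and structure are assumptions drawn from the screenshots described above, not a documented DeepSeek trace format.

```python
# Purely illustrative: a reasoning trace shaped the way the screenshots
# describe, broken into explicit stages. The stage names are assumptions
# taken from this article, not a documented DeepSeek format.
trace = [
    {"stage": "image_analysis",
     "note": "Identify visible elements: avatar, handle, banner text."},
    {"stage": "knowledge_base_lookup",
     "note": "Match the recognized handle against known accounts."},
    {"stage": "connection",
     "note": "Tie the visual evidence to the retrieved profile and summarize."},
]

for step in trace:
    print(f"[{step['stage']}] {step['note']}")
```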

That is why teortaxesTex's thread keeps describing V4-Vision as something built for DeepSeek's own agent stack, with structured reasoning traces that look more like routine tool use than a generic consumer-facing VLM. The same user later cooled the hype, predicting it would turn out to be a good cheap VLM rather than anything magical.

Official gaps

The public documentation still stops short of a real launch note.

The closest official signal in the evidence pool is victor207755822's post, which says "The little whale can now see" and shows the DeepSeek whale logo with a glowing eye. That matches the product direction, but it is still a teaser, not a spec.

The harder gap is in docs. DeepSeek's chat completion API reference still exposes only deepseek-v4-flash and deepseek-v4-pro, and DeepSeek's V4 technical report is a text-model report that frames multimodal capability as future work rather than day-one V4 scope. So the feature is visible in Chat before it is clearly documented as a supported model surface.
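To see the gap concretely, here is a minimal sketch of the documented surface, assuming DeepSeek keeps its OpenAI-compatible chat completions endpoint and its existing base URL; the two model slugs are the ones in the API reference, and everything else is an assumption.

```python
# Minimal sketch, assuming DeepSeek's OpenAI-compatible chat completions
# endpoint. The API reference lists only the two text model slugs below;
# no vision-capable slug is documented.
from openai import OpenAI

client = OpenAI(
    api_key="<DEEPSEEK_API_KEY>",
    base_url="https://api.deepseek.com",  # base URL from DeepSeek's existing docs
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # or "deepseek-v4-pro"; neither is documented to take images
    messages=[{
        "role": "user",
        # Text only: an OpenAI-style image_url content part would be
        # speculative here, since the reference exposes no image input.
        "content": "Summarize the key points of DeepSeek's V4 release.",
    }],
)
print(resp.choices[0].message.content)
```

Until the reference lists a vision slug, anything image-shaped sent through this endpoint is undocumented territory, which is exactly the Chat-before-docs gap described above.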
