AI Primer
release

Google launches Deep Research Max with MCP, native charts, and 85.9% BrowseComp

Google added Deep Research and Deep Research Max to the Gemini API with collaborative planning, multimodal inputs, MCP support, and native charts. The agents push cited web-plus-private-data reports into developer workflows, and Max is tuned for slower overnight runs.


TL;DR

You can read the official blog post, skim the API docs, and inspect Phil Schmid's code sample for the new deep-research-preview-04-2026 and deep-research-max-preview-04-2026 model IDs. Google's thread also makes the product split unusually explicit: one agent for interactive surfaces, another for jobs that can keep searching and reasoning for hours.

What shipped

Google turned its internal deep research stack into a developer-facing API surface. Google's availability post says the same autonomous infrastructure already powers Gemini, NotebookLM, Search, and Google Finance, and now sits behind a single API call.

The launch splits into two agents, plus a few shared properties:

  • Deep Research: lower latency, lower cost, intended for interactive product surfaces, per Google's mode split.
  • Deep Research Max: slower, more exhaustive, built to spend extra time searching and reasoning, per Google's mode split.
  • Grounded output: reports are cited and can draw from the open web plus private files and data sources, according to Google's launch thread.
  • Public preview: available now via the Gemini API, with Google Cloud startup and enterprise availability coming later, per Google's availability post.

The notable bit is not just that Google added another “research agent.” It exposed the knobs that were mostly hidden in consumer deep research products.

Planning and tools

Google's capability list lays out five concrete controls the API now exposes:

  1. Collaborative planning: review and edit the research plan before execution.
  2. Extended tooling: run Google Search, MCP servers, URL context, code execution, and file search together.
  3. Web-off mode: disable web access and search only custom data.
  4. Multimodal inputs: start from PDFs, CSVs, audio, video, and images.
  5. Native visuals and streaming: generate charts or infographics during the run, and stream reasoning plus text and image output in real time.

That combination makes the agent look less like a fixed report writer and more like a configurable research harness. rohanpaul_ai's summary usefully frames MCP as the connective tissue for company docs, file stores, and specialist data feeds.
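As a rough sketch of how those five knobs might map onto a single request config: the field names below are invented for illustration and are not Google's actual schema, which the launch posts do not spell out.

```python
# Hypothetical request config exercising the five controls from the launch.
# Field names are invented for illustration; only the model slug and the
# tool list come from the announcement itself.

def build_research_config(*, web_access: bool = True,
                          review_plan: bool = True,
                          stream: bool = True) -> dict:
    return {
        "model": "deep-research-preview-04-2026",  # slug from the launch
        # 1. Collaborative planning: pause so the caller can edit the plan.
        "plan_review": review_plan,
        # 2. Extended tooling: search, MCP servers, URL context, code
        #    execution, and file search run together.
        "tools": ["google_search", "mcp", "url_context",
                  "code_execution", "file_search"],
        # 3. Web-off mode: restrict the agent to custom data only.
        "web_access": web_access,
        # 4. Multimodal inputs: PDFs, CSVs, audio, video, and images.
        "input_mime_types": ["application/pdf", "text/csv",
                             "audio/*", "video/*", "image/*"],
        # 5. Native visuals and streaming output during the run.
        "generate_charts": True,
        "stream": stream,
    }

# A web-off run that searches only private data sources:
private_only = build_research_config(web_access=False)
```

The point of the sketch is the shape, not the names: every consumer-hidden behavior becomes an explicit, independently settable field.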

Two modes, two workloads

Google is positioning the two tiers around runtime budgets rather than around different UI skins.

  • Deep Research is the faster tier for interactive experiences where latency matters, according to Google's mode split.
  • Deep Research Max is the overnight tier for comprehensive background jobs, according to Google's mode split.
  • Phil Schmid's summary thread adds two operating details missing from Google's main thread: a run can consult more than 100 sources, and Max uses roughly 160 search queries per task.
  • The Rundown AI's recap attaches a rough price band to those longer runs, about $2 to $5 per report, though Google did not include pricing details in the primary launch tweets.

The overnight framing is straight from Google's own examples. Google's launch thread describes kicking off a task before bed and getting a finished report with visuals by morning.
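Putting the secondhand numbers together gives a back-of-envelope sense of what a nightly Max job costs at scale. The ~160 queries per task and the $2 to $5 band come from the recaps cited above, not from Google, so treat this strictly as illustrative arithmetic.

```python
# Back-of-envelope sizing for a nightly Deep Research Max job, using the
# secondhand figures in this post: ~160 search queries per Max task and a
# rough $2-$5 per-report band. None of this is official Google pricing.

QUERIES_PER_TASK = 160          # per Phil Schmid's summary thread
COST_LOW, COST_HIGH = 2.0, 5.0  # USD per report, per The Rundown AI's recap

def monthly_estimate(reports_per_night: int, nights: int = 30) -> dict:
    """Reports, total search queries, and a (low, high) cost band."""
    reports = reports_per_night * nights
    return {
        "reports": reports,
        "search_queries": reports * QUERIES_PER_TASK,
        "cost_usd": (reports * COST_LOW, reports * COST_HIGH),
    }

est = monthly_estimate(1)
# One report a night for a month: 30 reports, 4,800 queries, roughly $60-$150.
```

Even at the high end of the band, a nightly report runs well under typical analyst time, which is presumably the comparison Google wants developers to make.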

Benchmarks

Google's benchmark slide compares the April 2026 Deep Research agents against the December 2025 version, GPT-5.4 Thinking (xhigh), and Opus 4.6 Thinking (Max).

  • DeepSearchQA: Deep Research Max scored 93.3%, while GoogleDeepMind's chart puts the faster Deep Research tier at 81.8%, the December release at 66.1%, GPT-5.4 at 88.5%, and Opus 4.6 at 76.8%.
  • Humanity's Last Exam: GoogleDeepMind's chart shows Deep Research Max at 54.6%, Deep Research at 50.4%, December Deep Research at 46.4%, GPT-5.4 at 53.4%, and Opus 4.6 at 43.3%.
  • BrowseComp: GoogleDeepMind's chart shows the biggest spread, with Deep Research Max at 85.9%, Deep Research at 61.9%, December Deep Research at 59.2%, GPT-5.4 at 58.9%, and Opus 4.6 at 45.1%.

One buried caveat sits in Google's own benchmark footer: the results were evaluated by Google DeepMind using public model APIs, and the Humanity's Last Exam result used the full text-plus-multimodal set, according to GoogleDeepMind's benchmark post.
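Restating the chart as data makes the BrowseComp gap easy to check. The scores below are copied from the figures above as reported; the helper is just arithmetic on Max's margin over the stronger external model.

```python
# The benchmark numbers above, restated as data so the gaps are explicit.
# Scores are percentages, copied from GoogleDeepMind's chart as reported here.

SCORES = {
    #                (Max,  DR,   Dec'25, GPT-5.4, Opus 4.6)
    "DeepSearchQA": (93.3, 81.8, 66.1, 88.5, 76.8),
    "HLE":          (54.6, 50.4, 46.4, 53.4, 43.3),
    "BrowseComp":   (85.9, 61.9, 59.2, 58.9, 45.1),
}

def max_lead_over_best_rival(bench: str) -> float:
    """Max's margin over the stronger of GPT-5.4 and Opus 4.6."""
    max_score, _, _, gpt, opus = SCORES[bench]
    return round(max_score - max(gpt, opus), 1)

lead = max_lead_over_best_rival("BrowseComp")  # 27.0 points, the widest margin
```

By this measure the spread is 27.0 points on BrowseComp versus 4.8 on DeepSearchQA and 1.2 on Humanity's Last Exam, which is why the browsing benchmark headlines the launch.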

Availability and model IDs

The launch landed on several surfaces on day one: the official blog post, the API docs, and Phil Schmid's code sample all went live together.

The model naming is unusually concrete for a launch this early. Between Phil Schmid's code sample and the API docs, developers can already target the preview slugs directly instead of waiting for a vague “soon” rollout.
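A minimal routing sketch around the two preview slugs: the model IDs are the ones named in the docs and Phil Schmid's sample, while the latency-budget threshold is an arbitrary assumption for illustration.

```python
# Routing helper around the two preview slugs from the launch. The IDs are
# real per the docs; the 30-minute threshold is an invented assumption.

INTERACTIVE_MODEL = "deep-research-preview-04-2026"
BATCH_MODEL = "deep-research-max-preview-04-2026"

def pick_model(latency_budget_minutes: float) -> str:
    """Send interactive surfaces to the fast tier, long jobs to Max."""
    return INTERACTIVE_MODEL if latency_budget_minutes <= 30 else BATCH_MODEL

fast = pick_model(5)        # chat-style surface -> fast tier
overnight = pick_model(480) # eight-hour background job -> Max
```

Because the split is framed around runtime budgets rather than features, this kind of one-line router is plausibly all the mode logic an application needs.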

Further reading

Discussion across the web

Where this story is being discussed, in original context.

On X (4 threads):

  • Planning and tools (1 post)
  • Two modes, two workloads (1 post)
  • Benchmarks (1 post)
  • Availability and model IDs (1 post)