Anthropic added a beta advisor tool to the Messages API so Sonnet or Haiku can call Opus mid-run inside one request. Anthropic says Sonnet plus Opus scored 2.7 points higher on SWE-bench Multilingual while cutting per-task cost 11.9%.

The beta is gated behind an anthropic-beta: advisor-tool-2026-03-01 header in Anthropic's API release notes, with advisor usage billed at Opus rates and executor usage billed at Sonnet or Haiku rates in the pricing docs. You can read the full launch post and check the API release note, but the interesting bit is how little machinery Anthropic is asking for. The product thread frames it as one tool added to a Messages call, TestingCatalog's diagram shows the shared-context handoff, and scaling01's benchmark screenshots surface the weirdest number in the launch: Haiku more than doubling its BrowseComp score when Opus gets pulled in selectively.
Anthropic is packaging the advisor pattern as a server-side tool, not as a separate orchestration product. In the launch post, Sonnet or Haiku stays in the driver's seat as the executor, handling tools, reading results, and iterating until it hits a decision point it cannot clear cheaply.
At that point, according to Anthropic's API description, the executor consults Opus and continues inside the same /v1/messages request. The blog's example request adds advisor_20260301 to the tools array with an Opus model name and a max_uses cap, which makes this look more like a targeted escalation hook than a new agent framework.
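That request shape can be sketched as follows. The tool type (advisor_20260301), the max_uses cap, and the beta header come from the launch material quoted above; the exact model identifier strings and the surrounding field values here are illustrative assumptions, not copied from Anthropic's docs.

```python
import json

# Sketch of a Messages request body with the advisor tool added, following
# the launch post's example. advisor_20260301 and max_uses are from the post;
# the model name strings are assumptions for illustration.
request_body = {
    "model": "claude-sonnet-4-6",        # executor: stays in the driver's seat
    "max_tokens": 4096,
    "tools": [
        {
            "type": "advisor_20260301",  # server-side advisor tool
            "model": "claude-opus-4-6",  # advisor consulted at decision points
            "max_uses": 3,               # cap on escalations per request
        }
    ],
    "messages": [
        {"role": "user", "content": "Fix the failing test in this repo."}
    ],
}

# The beta header from the API release notes gates the feature.
headers = {"anthropic-beta": "advisor-tool-2026-03-01"}

print(json.dumps(request_body["tools"][0], indent=2))
```

Everything rides in one /v1/messages call; there is no second request or side channel to manage.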
Anthropic also leans hard on the contrast with the usual sub-agent pattern in the official post: instead of a larger model decomposing work and delegating downward, a cheaper executor runs end to end and only escalates upward when it gets stuck.
The diagram circulating with the launch captures the important implementation detail. As TestingCatalog's screenshot shows, the executor and advisor read the same conversation, tool state, and history, so Anthropic is not asking developers to serialize state into a second request or manage a side channel for planning.
That matches the official wording in the blog post: Opus reviews curated shared context, returns a plan, correction, or stop signal, and then the executor resumes. Opus does not call tools itself and does not produce the user-facing answer.
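The control flow the post describes can be sketched as a simple loop. This all happens server-side inside one request; every name in this sketch is hypothetical, a simulation of the described behavior rather than Anthropic's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class SharedContext:
    """Conversation, tool results, and history visible to BOTH models."""
    history: list = field(default_factory=list)

def run(task, executor_step, advise, max_uses=3):
    """Executor drives end to end; advisor is consulted only when stuck."""
    ctx = SharedContext(history=[task])
    uses = 0
    while True:
        step = executor_step(ctx)          # cheap model: tools, iteration
        ctx.history.append(step)
        if step == "done":                 # executor writes the final answer
            return ctx.history, uses
        if step == "stuck":
            if uses >= max_uses:           # max_uses cap from the API
                return ctx.history, uses
            uses += 1
            guidance = advise(ctx)         # advisor reads the SAME context,
            ctx.history.append(guidance)   # returns plan / correction / stop
            if guidance == "stop":
                return ctx.history, uses

# Toy run: the executor gets stuck once, and the advisor's plan unblocks it.
script = iter(["tool_call", "stuck", "tool_call", "done"])
history, advisor_calls = run(
    "fix failing test",
    executor_step=lambda ctx: next(script),
    advise=lambda ctx: "plan: bisect the regression",
)
print(advisor_calls)  # advisor consulted exactly once
```

The key property the diagram emphasizes is that `advise` receives the same `ctx` object the executor has been writing to, so no state is serialized across a boundary.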
That design explains why the cost story can work at all. As Lance Martin's thread puts it, the expensive model gets used for the hard branch points instead of grinding through the entire trajectory.

Anthropic led with the SWE-bench Multilingual number, but the launch material actually shows two different stories depending on executor.
For Sonnet 4.6 plus Opus advisor, Anthropic reports a 2.7-point gain on SWE-bench Multilingual alongside an 11.9% reduction in per-task cost versus Sonnet solo.
For Haiku 4.5 plus Opus advisor, the gains are larger in absolute score, but so is the added spend versus Haiku solo. scaling01's screenshots show Haiku more than doubling its BrowseComp score when Opus is pulled in selectively.
The official post adds one more useful comparison: Haiku plus Opus still trails Sonnet solo in score, but Anthropic says it costs 85% less per task. That gives the launch a second persona beyond coding agents: cheap high-volume workloads that occasionally need a smarter plan.
The API release note is more specific than the tweet thread about rollout. In Anthropic's release notes, the advisor tool is listed as a public beta feature launched on April 9, with the required header anthropic-beta: advisor-tool-2026-03-01.
The launch post says advisor tokens are reported separately in the usage block and that developers can cap invocations with max_uses. Pricing follows the underlying models rather than a separate tool fee: Anthropic's pricing page lists Opus 4.6 at $5 per million input tokens and $25 per million output tokens, Sonnet 4.6 at $3 and $15, and Haiku 4.5 at $1 and $5.
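The per-model rates above are enough for a back-of-envelope check on when selective escalation beats running the cheap model alone. The rates in this sketch are the published per-million-token prices; the token counts are made-up illustrative values, not launch benchmarks.

```python
# Blended cost per task using the per-Mtok rates from the pricing page.
# Token counts below are hypothetical, chosen only to illustrate the shape
# of the trade-off, not taken from Anthropic's benchmarks.
RATES = {  # model: (input $/Mtok, output $/Mtok)
    "opus-4.6":   (5.0, 25.0),
    "sonnet-4.6": (3.0, 15.0),
    "haiku-4.5":  (1.0, 5.0),
}

def cost(model, input_tokens, output_tokens):
    rin, rout = RATES[model]
    return (input_tokens * rin + output_tokens * rout) / 1_000_000

# Hypothetical task: Sonnet grinds through the whole trajectory alone...
sonnet_only = cost("sonnet-4.6", 400_000, 40_000)

# ...versus Sonnet executing, with two short Opus consultations at branch
# points (advisor tokens billed at Opus rates, reported separately in usage).
blended = cost("sonnet-4.6", 300_000, 25_000) + cost("opus-4.6", 60_000, 3_000)

print(f"sonnet solo: ${sonnet_only:.2f}  blended: ${blended:.2f}")
```

Under these made-up counts the blended run comes out cheaper because the Opus consultations are short while the wasted Sonnet turns they prevent are long; whether that holds for a real workload depends entirely on how often the executor gets stuck.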
One small but revealing line in Lance Martin's follow-up is that Anthropic is selling the advisor pattern as token efficiency, not just model mixing. The bet is that a more capable model can be cheaper when it only gets called for the forks where a weaker model would otherwise waste turns.