Anthropic put Claude Managed Agents into public beta with hosted sandboxes, vaults, memory filesystems, and long-running sessions. Use the managed setup if you want explicit controls for tools, credentials, and completion criteria instead of custom harness code.

Getting in takes a claude update and a /claude-api managed-agents-onboarding subcommand. You can jump straight to the engineering writeup, skim the managed agents overview, and then fan out into the docs for environments, vaults, and outcomes. The weirdly useful bit is that Anthropic already documents a grader running in a separate context window for outcomes, and Box's launch-day demo shows partners treating this as background automation infrastructure, not a toy agent wrapper.
Anthropic's launch post sells Managed Agents as the way to stop rebuilding the same scaffolding around every serious agent. The announcement promises a path from prototype to launch, while the tweet trailing the engineering blog points to a deeper argument: harnesses encode assumptions that go stale as models improve.
The engineering post says Anthropic tried to design around stable interfaces rather than a fixed harness implementation. That is the interesting claim here. Managed Agents is less a new model feature than Anthropic productizing the control plane around long-horizon runs.
The diagram gives the four nouns that matter: sandboxes, vaults, memory filesystems, and sessions.
The overview docs make the product split explicit. Messages API is still the direct model interface. Managed Agents is the pre-built harness for long-running and asynchronous work.
Anthropic exposed the runtime as a reusable environment object instead of burying it inside each request. trq212 called out package selection and networking controls, and the environment docs say you create an environment once, then reference its ID when you start sessions.
That same page adds two practical details that will matter to anyone comparing this with self-hosted agent loops:
- Calls against the beta go out with the managed-agents-2026-04-01 beta header.
- The docs ship ant beta:environments create CLI examples.

The release notes also mention server-sent event streaming, which suggests Anthropic expects developers to watch long jobs as streams rather than poll a thin completion endpoint.
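None of this is pinned down in public client libraries yet, so here is a sketch of what a one-time environment create call could look like under the details above. The beta header value comes from the release notes; the endpoint shape, field names, and helper function are my assumptions, not documented API.

```python
import json

# Beta header named in the release notes; everything else below
# (field names, payload layout) is an illustrative assumption.
BETA_HEADER = "managed-agents-2026-04-01"

def build_environment_request(name, packages, allowed_hosts):
    """Assemble headers and body for a hypothetical one-time
    environment-create call: package selection plus networking
    controls, declared once instead of per request."""
    headers = {
        "anthropic-beta": BETA_HEADER,  # opt in to the Managed Agents beta
        "content-type": "application/json",
    }
    body = {
        "name": name,
        "packages": packages,                          # sandbox packages
        "network": {"allowed_hosts": allowed_hosts},   # egress controls
    }
    return headers, json.dumps(body)

headers, body = build_environment_request(
    "report-runner", ["pandas"], ["api.example.com"]
)
```

The point of the shape, per the environment docs, is that the returned environment ID gets referenced when you start sessions, rather than re-declaring the runtime on every call.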
The docs push two state primitives alongside the sandbox: credentials and files. Vaults are Anthropic's answer to user-scoped secrets, and trq212's thread context also points readers to filesystem-backed memory for persistence across sessions.
The vaults docs describe vaults as a place to register third-party credentials once, then attach them to sessions by ID. Anthropic explicitly says this avoids sending tokens on every call and helps track which end user an agent is acting for.
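To make the register-once, attach-by-ID pattern concrete, here is a toy in-memory stand-in. The class and method names are invented for illustration; what it models is the property the docs claim, that sessions carry a credential ID and an end-user attribution, never the raw token.

```python
import uuid

class Vault:
    """Toy stand-in for the vault pattern: secrets live server-side,
    callers hold only opaque IDs."""

    def __init__(self):
        self._secrets = {}

    def register(self, end_user, provider, token):
        """Store a third-party credential once; return its ID."""
        cred_id = f"cred_{uuid.uuid4().hex[:8]}"
        self._secrets[cred_id] = {
            "end_user": end_user, "provider": provider, "token": token,
        }
        return cred_id

    def attach(self, cred_id):
        """What a session start would send: the ID and attribution,
        never the token itself."""
        record = self._secrets[cred_id]
        return {"credential_id": cred_id, "acting_for": record["end_user"]}

vault = Vault()
cred = vault.register("user_42", "box", "s3cret-token")
session_ref = vault.attach(cred)
```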
The separate memory tool docs describe a persistent directory where Claude can create, read, update, and delete files across sessions. That gives Managed Agents a native way to carry forward learned context without stuffing everything back into the prompt.
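The persistence semantics are easy to mimic locally. This is not Anthropic's implementation, just a minimal sketch of the contract the memory tool docs describe: a directory of files that one session writes and a later session reads back, with no prompt stuffing in between.

```python
from pathlib import Path
import tempfile

class MemoryDir:
    """Minimal stand-in for a filesystem-backed memory directory:
    files an agent can create, read, update, and delete across runs."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, name, text):
        (self.root / name).write_text(text)

    def read(self, name):
        return (self.root / name).read_text()

    def delete(self, name):
        (self.root / name).unlink()

# "Session one" records a note; "session two" reopens the same root
# and finds it — the cross-session persistence the docs promise.
root = tempfile.mkdtemp()
MemoryDir(root).write("notes.md", "customer prefers CSV exports")
carried_over = MemoryDir(root).read("notes.md")
```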
The most opinionated part of the launch is outcomes. The docs linked in trq212's thread describe an outcome as a target artifact plus a rubric, not another chat turn.
The outcomes page says the harness provisions a grader automatically, and that grader runs in a separate context window from the main agent. It returns a criterion-by-criterion breakdown of what passed and what is still missing.
That moves the product a step closer to workflow execution than chat orchestration. Anthropic is not just hosting an agent loop, it is hosting the judge for when the loop can stop.
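The outcome contract sketches out like this. The real grader is a model running in its own context window; this deterministic toy only shows the shape of the result, a criterion-by-criterion breakdown plus a stop signal. The rubric entries and field names here are invented for illustration.

```python
def grade(artifact, rubric):
    """Run every rubric criterion against the artifact and return a
    per-criterion breakdown plus an overall completion flag — the
    judge that decides when the loop can stop."""
    breakdown = {name: check(artifact) for name, check in rubric.items()}
    return {"criteria": breakdown, "complete": all(breakdown.values())}

# Hypothetical rubric: two checks over a draft report artifact.
rubric = {
    "has_summary": lambda a: "summary" in a,
    "cites_source": lambda a: "source" in a,
}
report = grade({"summary": "Q3 revenue up 4%"}, rubric)
```

An agent loop driven by this would keep working while report["complete"] is false, using the failed criteria as its next to-do list.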
Anthropic also used Claude Code as the front door. According to Lance Martin's launch thread, the easiest starting path is updating Claude Code and running a managed-agents onboarding command from inside the CLI.
The quickstart mirrors that product shape with three core objects: environments, sessions, and outcomes.
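Stitching the pieces together, the whole flow would read something like the sketch below. Every call, field, and the stub client are assumptions of mine, not a documented SDK; the point is only the sequencing: environment once, then sessions that reference it and carry an outcome.

```python
def run_job(client):
    """Hypothetical end-to-end flow: reusable environment, then a
    session bound to that environment and a target outcome."""
    env = client.create_environment(name="docs-agent")  # one-time setup
    outcome = {"artifact": "report.md", "rubric": ["has_summary"]}
    session = client.start_session(environment_id=env["id"], outcome=outcome)
    # Long-running work would stream back as server-sent events here.
    return session

class FakeClient:
    """Stub transport so the sequencing is runnable without a network."""

    def create_environment(self, name):
        return {"id": "env_123", "name": name}

    def start_session(self, environment_id, outcome):
        return {"id": "sess_456", "environment_id": environment_id,
                "outcome": outcome, "status": "running"}

session = run_job(FakeClient())
```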
That is a neat packaging move. Anthropic is turning the same agent ergonomics it taught in Claude Code into an API product with hosted runtime underneath.
The first partner demo already points at the workload Anthropic wants. Aaron Levie's post shows Box wiring Managed Agents into document review, data extraction, and content workflows through the Box API or MCP.
That matters because it is more specific than the launch post's generic "agents at scale" line. The earliest public example is background knowledge work over enterprise content, with Box pitching setup in minutes rather than months of infrastructure buildout.