Agent Infrastructure
Backend primitives and platform services designed for autonomous agents as the primary consumer — agent-native storage, sandboxes, queues, and runtime infra.
Stories
TanStack AI added MCP support for single or multiple servers, standalone clients or pooled servers, and a CLI for type generation. The release gives app builders a typed integration path for MCP-managed tools inside chat and agent workflows.
Vercel made the skills.sh API generally available, exposing more than 600,000 skills as a registry-style service for agents and platforms. The launch gives teams a discoverable capability layer for reuse across agent surfaces.
Browser Use launched synced cloud profiles for logged-in sessions, added geo-targeted proxies, and showed a 484-browser startup demo that finished in under two seconds. The update matters because hosted browser agents can now keep authenticated state and regional routing without custom session-management work.
Weaviate introduced Engram, a dedicated agent memory service with async writes, semantic topic grouping, tenant scopes, and composable pipelines. It matters because teams can add a hosted memory layer for agent stacks without stitching custom memory workflows into each application.
LangSmith added sandboxed execution, spend-aware gateway routing, and Engine to surface recurring agent failures from traces. The bundle gives teams one place to run agents, control token spend, and turn production issues into debugging and eval loops.
CopilotKit shipped v1.59.2 with threads, Vue packages, a React Native SDK, and updated AG-UI building blocks for fullstack agent apps. The release makes it easier to ship Cursor- and Claude-like interfaces, with new work extending generative UI into Slack, Teams, and other chat surfaces.
Multiple agent-infra vendors shipped copy-on-write branches, checkpoints, snapshots, forks, or rollback primitives on the same day. That matters because long-running agents can now explore, retry, and recover state without relying only on Git or full sandbox rebuilds.
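As a rough illustration of the checkpoint, fork, and rollback idea, here is a minimal in-memory sketch. It is not any vendor's API: the `Workspace` class and its method names are hypothetical, and it uses Python's `collections.ChainMap` so that forks and checkpoints share earlier layers and only new writes land in a fresh top layer.

```python
from collections import ChainMap


class Workspace:
    """Hypothetical toy workspace with layered, copy-on-write style state."""

    def __init__(self, state=None):
        # The first map in a ChainMap is the only one that receives writes.
        self._state = state if state is not None else ChainMap()
        self._checkpoints = []

    def write(self, path, content):
        self._state[path] = content  # lands in the top layer only

    def read(self, path):
        return self._state.get(path)  # falls through to older layers

    def checkpoint(self):
        # Freeze the current layers and continue writing in a new top layer.
        self._checkpoints.append(self._state)
        self._state = self._state.new_child()
        return len(self._checkpoints) - 1

    def rollback(self, checkpoint_id):
        # Resume from a frozen checkpoint and drop any later checkpoints.
        self._state = self._checkpoints[checkpoint_id].new_child()
        del self._checkpoints[checkpoint_id + 1:]

    def fork(self):
        # Parent and child share the existing layers; each gets its own top layer.
        base = self._state
        self._state = base.new_child()
        child = Workspace(base.new_child())
        child._checkpoints = list(self._checkpoints)
        return child


ws = Workspace()
ws.write("a.txt", "v1")
cp = ws.checkpoint()
ws.write("a.txt", "v2")
branch = ws.fork()
branch.write("a.txt", "branch edit")
ws.rollback(cp)
```

After this runs, `ws` reads `a.txt` as `v1` while `branch` keeps its own edit. Real systems apply the same idea to filesystems or VM snapshots; this sketch does not handle deletions.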
NVIDIA released Cosmos 3 as an open omnimodel family with 16B and 64B variants, plus code, datasets, and a coalition around physical AI. The release matters because it ships with serving support and top open-weight image and video rankings, so teams can deploy it rather than treat it as a research teaser.
OpenAI made GPT-5.4, GPT-5.5, and Codex generally available through Amazon Bedrock. AWS shops can now use OpenAI models inside existing IAM, compliance, and procurement workflows instead of adopting a separate vendor stack.
Microsoft and NVIDIA unveiled RTX Spark systems, including Surface Laptop Ultra and DGX-class Windows hardware, with 128GB unified memory and 1 PFLOP FP4 local AI. Day-one support from Hermes Agent, vLLM, Ollama, and Unsloth makes the launch useful for local inference and fine-tuning, not just a PC refresh.
Perplexity replaced one-shot search calls with Search as Code, a Python-based search runtime in its Agent API that is also now the default in Computer. The change matters because agents can batch, rank, filter, and aggregate search steps inside code, and Perplexity says the system scored 0.386 on WANDR versus 0.152 for the next-best system.
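To make the batch, filter, aggregate, and rank pattern concrete, here is a generic sketch. It does not use Perplexity's API: `search` is a hypothetical stand-in backed by a tiny canned corpus with toy word-overlap scoring, and `search_program` is an assumed name for the kind of code an agent might write.

```python
from collections import defaultdict

# Hypothetical canned corpus standing in for a real search backend.
CORPUS = [
    {"url": "https://example.com/a", "text": "agent sandbox cold start benchmarks"},
    {"url": "https://example.com/b", "text": "browser agent sandbox pricing"},
    {"url": "https://example.com/c", "text": "vector database pricing guide"},
]


def search(query):
    """Toy search: score each document by how many query words it contains."""
    terms = set(query.lower().split())
    hits = []
    for doc in CORPUS:
        score = len(terms & set(doc["text"].lower().split()))
        if score:
            hits.append({"url": doc["url"], "score": score})
    return hits


def search_program(queries, min_score=2, top_k=2):
    """Run several queries, drop weak hits, sum scores per URL, return the top results."""
    totals = defaultdict(int)
    for query in queries:                       # batch
        for hit in search(query):
            if hit["score"] >= min_score:       # filter
                totals[hit["url"]] += hit["score"]  # aggregate
    # Rank by total score (descending), then URL for a stable order.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


results = search_program(["agent sandbox", "sandbox pricing"])
```

The point of the pattern is that intermediate hits stay inside the program, so only the final ranked list needs to return to the model's context.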
OpenAI shipped a Python SDK and app-server support for Codex with thread creation, streamed turns, session resume, image inputs, and sandbox controls. That gives teams a supported way to embed Codex inside internal tools and automation instead of driving it only through the CLI or desktop app.
Browser Use rebuilt its runtime around a custom Chromium fork, Firecracker fork, and custom Linux kernel, claiming $0.02 per hour pricing with subsecond cold starts. The shift targets the infrastructure bottlenecks behind browser agents rather than model quality alone.
Independent developers shipped sidecars that let Claude Code, Cursor, and Codex share memory, hot-swap model providers, package local projects as apps, and automate browser QA. Try these reusable tools if you want memory, routing, QA automation, and app packaging outside editor-specific features.
CopilotKit shipped an AG-UI integration that streams Claude Agent SDK agents into web and mobile frontends with generative UI and approval checkpoints. The adapter lets teams embed terminal-first Claude agents in React, Vue, Angular, and React Native without rewriting transport or state plumbing.
Builders released a chat-first Web UI and a multi-agent Control Room template around Hermes Agent, while core updates cut read_file input tokens by 14% and fixed TUI startup hangs. Use the new controls to manage local multi-agent setups while reducing routine token burn.
Prime Intellect launched Hosted Evaluations to manage harnesses, sandboxes, and rollout inspection for model testing. The service packages eval infrastructure while still supporting local runs against arbitrary engines, so teams can centralize testing without losing flexibility.
Gemini Managed Agents can spin up a sandboxed Linux environment with code execution, web access, and file I/O from one API call, and early examples now include W&B and LlamaIndex workflows. That gives builders a higher-level runtime for long tasks while third-party templates start to define the first production use cases.
Vercel Sandbox can now build and run Docker containers, persist images and installs across sessions, and host databases or full apps inside the sandbox. That broadens what coding agents and preview environments can validate without leaving Vercel.
Nous Research released Hermes Agent v0.15.0 with skill bundles, MCP Catalog, new model support, and major performance and security work. The update cuts load times 50%, speeds session search 750x, and adds Bitwarden plus prompt-injection defenses.
Hermes Agent added a built-in MCP Catalog while separate builders shipped Qwen3.7 Max support, Venice private-model workflows, and Krea 2 image generation. The cluster shows Hermes moving beyond a single-model assistant toward a broader agent shell with tool, model, and media providers.
Trajectory launched a platform that turns agent traces and user corrections into post-deployment model updates instead of prompt-only fixes. Baseten and Tinker described live A/B post-training, 397B-model deployment work, and an off-policy recipe for stabilizing the loop.
Cua Driver said its Windows backend is now stable, letting Claude Code, Codex, Hermes, or custom agents drive real Windows apps through MCP or CLI. The release targets Windows-only line-of-business software while keeping the desktop usable with multi-pointer support.
Firecrawl is now available through Vercel Marketplace and Agent Marketplace for apps and agents that need live web data. The integration reduces setup friction for teams adding scraping, search, and structured retrieval to deployed AI workflows.
A new MeMo paper and several community memory systems converged on keeping knowledge outside the base model through recipe files, semantic and autobiographical stores, and background reconsolidation. The pattern matters because engineers are treating context loss as a systems problem instead of only asking for larger context windows.
Datasette 1.0a30 introduced a slash-triggered Jump To menu plus a hook for plugin-supplied search items. Simon Willison used it in datasette-agent 0.1a4 to start agent chats from the same menu, so plugin authors can wire in their own actions.
LangChain opened a private beta for Managed Deep Agents, a model-agnostic deployment layer built on deepagents with durable execution, sandboxes, and a context hub. The release turns deep-agent rollout into a single config-and-deploy flow and adds an auth proxy boundary for agent actions.
Anthropic added self-hosted sandboxes in public beta and MCP tunnels in research preview to Claude Managed Agents. Use the new options to keep agent execution inside your perimeter or private cloud and reach internal MCP servers without public exposure.
Google launched Antigravity 2.0 as a desktop app plus CLI/SDK stack for multi-agent workflows, and added Managed Agents to the Gemini API with persistent Linux sandboxes. Try it for agent orchestration and API-based sandboxing, but verify harness costs and runtime fit.
Warp launched Oz orchestration across Claude Code, Codex, and Warp Agent, with subagent delegation, isolated worktrees or containers, and beta multi-harness control. Try the new '&' handoff and Agent Memory if you run long sessions that need cloud continuation.
A day after leaks previewed Spark, Google officially launched Gemini Spark as a persistent personal agent that runs on dedicated cloud VMs and will connect to MCP tools. It matters because Google is moving Gemini from chat responses toward long-running delegated work across consumer and enterprise surfaces.
Anthropic said it is acquiring Stainless, the SDK and MCP server platform behind Anthropic’s own official SDKs across major languages. The deal matters because Anthropic is bringing a key part of its API and agent-connectivity toolchain in-house while developers reassess alternative codegen stacks.
Files SDK 1.4 shipped nine new storage adapters, a CLI for agents, an installable skill, and optional peer dependencies. The update broadens storage coverage while sharply shrinking install weight, though adapter dependencies now need explicit installation.
Practitioners said skills and workflows were porting from OpenClaw to Hermes Agent with fewer surprises around approvals, job control, and mobile use. That matters because teams choosing a self-hosted agent stack are now comparing operational clarity and migration friction, not just model support.
OpenClaw added end-to-end RTT tests and new auditable guardrails while community builders shipped Clawpatch, credential brokers, and ARC harnesses. The stack now has clearer safety and benchmarking primitives for long-lived coding agents.
Files SDK 1.3 shipped 12 new storage adapters, an exists() helper, and a Files.file(key) handler. It expands the number of storage backends agents and sandboxed jobs can address through one file abstraction.
Anthropic will move Claude Agent SDK, claude -p, GitHub Actions, and third-party agent apps onto separate monthly credits on June 15. Watch the new bucket closely, since it changes the cost model for autonomous runs and subscription-backed harnesses.
LangChain unveiled SmithDB, LangSmith Engine, Managed Deep Agents, and GA sandboxes at Interrupt. The stack gives agent teams a purpose-built trace database, autonomous failure triage, and managed execution environments for production workflows.
Notion opened a developer platform with an External Agents API plus Workers, webhooks, and a headless CLI. The release lets external agents query Notion, extend workflows, and stay in sync with other systems.
holaOS shipped Beta 0.1, adding Multi Workspaces, Sub Agents, a dashboard, and a kickoff flow on top of its agent-computer base. The release targets long-running workstreams that need persistent context instead of one-chat sessions.
Cursor added reusable cloud development environments for agents with multi-repo setup, rollback, and scoped secrets. The update moves cloud agents closer to laptop-style setups while keeping long-running work isolated and auditable.
OpenAI launched the OpenAI Deployment Company and tied it to Tomoro’s acquisition, giving the unit 150 forward-deployed engineers and $4 billion in initial backing from 19 partners. It matters because OpenAI is packaging services, deployment help, and organizational integration as part of the product stack instead of leaving enterprise rollout to outside consultancies.
Anthropic made Claude Platform on AWS generally available, exposing the native Claude API with AWS authentication, billing, CloudTrail, and commitment retirement. It lets teams use Managed Agents and related Claude features inside existing AWS governance workflows.
Independent developers shipped new control-plane tools for long-running coding agents, including Agent FM audio monitoring, Mate phone-first remote control, and ntm for provider-agnostic multi-agent workflows. It matters because teams running many Claude Code and Codex sessions still need better visibility, handoff, and checkpointing than a single built-in session list provides.
Files SDK launched a unified storage API across 18 backends including S3, R2, Vercel Blob, and Google Drive. It also ships tool bindings for OpenAI, Vercel AI, and Claude agent SDKs across Node, Bun, Deno, edge runtimes, and browsers.
Builders shipped pi-treebase, a Miko voice mode for pi-listens, devrage support, and a Japanese OpenCode Go guide after the first Pi extension burst. The releases arrive as Pi’s provider abstraction gets stress-tested by OpenClaw-scale multi-provider use.
Hyperbrowser shipped a CLI that exposes sandbox lifecycle, web fetch/search/crawl, and snapshotting from the terminal. The tool matters because it turns browser automation and forkable state into shell primitives for agent workflows.
Google is replacing the Gemini Interactions API’s older outputs-and-roles structure with a steps schema for multi-step agent workflows. The change matters because existing tooling that assumes the old schema may break, and teams may need SDK upgrades and migration work before the new interface reaches GA.
OpenAI updated its Agents SDK with TypeScript support, sandbox agents, and an open-source harness. The release broadens support for JS workflows and gives teams a standard way to run isolated agents.
Raindrop launched Triage, a Slack-based agent that finds traces, summarizes recurring failures, runs recurring briefs, and opens experiments from production conversations. Teams using Claude Code, Cursor, or Devin can plug it into agent ops to shorten debugging loops.