Durable Execution
Checkpointing, resumability, long-running agent workflows.
Stories
Builders released reusable loop artifacts this week, including a Loop Library Skill, repo templates, and published control-loop definitions for docs sweeps, onboarding checks, and error triage. It matters because teams are turning one-shot prompting into persistent agent runs with explicit stop conditions and shared repo state.
Codex can now hand off an in-progress thread between local and remote machines and bring it back later. It matters because the handoff carries Git history, branches, and uncommitted changes while leaving the destination checkout untouched.
Vercel introduced eve in public preview with durable workflows, sandboxed compute, subagents, and evals. It also added Connect and Passport for scoped tokens and identity-gated deployments, giving teams one path for runtime, auth, and enterprise access control.
Flue 1.0 Beta reorganizes the framework around workflows, autonomous agents, and channel connectors while keeping model-agnostic deployment. The release gives TypeScript teams a more opinionated base for durable, long-running agents.
Vercel opened a preview that lets Functions run for up to 30 minutes on its Fluid microVM compute platform. Use it for longer-running server tasks without moving to a separate runtime product.
OpenAI said it will acquire Ona and fold its secure cloud execution and orchestration stack into Codex. The change targets agent jobs that need to keep running for hours or days after the original laptop session ends.
Anthropic opened scheduled deployments and environment-variable vaults in Claude Managed Agents public beta, and Dynamic Workflows is now generally available in Claude Code. The update adds cron-style jobs, secret injection, and deeper parallel orchestration for long-running agents.
LangSmith added sandboxed execution, spend-aware gateway routing, and Engine to surface recurring agent failures from traces. The bundle gives teams one place to run agents, control token spend, and turn production issues into debugging and eval loops.
Multiple agent-infra vendors shipped copy-on-write branches, checkpoints, snapshots, forks, or rollback primitives on the same day. That matters because long-running agents can now explore, retry, and recover state without relying only on Git or full sandbox rebuilds.
Conductor moved its parallel coding agents from local-only execution onto Vercel Sandboxes. That matters because teams can run isolated remote agent workspaces with near-local startup and feedback instead of depending on a developer laptop.
Cognition added a desktop control surface that can run Devin, Codex, Claude, and other ACP-compatible agents across local and cloud contexts. The app turns Devin from a single hosted agent into a broader orchestration surface.
Independent Codex users published Obsidian memory setups, reusable skill prompts, auto-triage flows, and Cloudflare-backed runners for longer jobs. That matters because Codex is being wrapped into persistent workspaces and operator-defined subagents instead of one-shot chats.
LangChain opened a private beta for Managed Deep Agents, a model-agnostic deployment layer built on deepagents with durable execution, sandboxes, and a context hub. The release turns deep-agent rollout into a single config-and-deploy flow and adds an auth proxy boundary for agent actions.
Manus upgraded scheduled work so recurring jobs can continue inside the same task and drive background updates in Manus-built web apps. That matters because long-lived automations can retain context between runs instead of rebuilding state each time.
Kilo Code posted two cloud-agent automations: a webhook-driven CVE patch flow that opens PRs in parallel and a post-deploy smoke test that checks health, 2xx responses, and latency under 2 seconds. This matters because the examples show coding agents moving into CI-style remediation and production verification loops.
OpenAI documented Codex remote connections, letting the ChatGPT app point at a separate Codex host such as a Mac mini or rented VPS. Try it for long runs that need to stay alive off-device or for phone-first coding sessions.
Days after `/goal` workflows first surfaced, users showed the command also works in the Codex app and shared runs for SSH setup, mech-interp scripts, and recurring work that lasted hours or days. The evidence points to Codex being used as a long-running research and ops agent, though the app still lacks explicit `/goal` UI.
LangChain unveiled SmithDB, LangSmith Engine, Managed Deep Agents, and GA sandboxes at Interrupt. The stack gives agent teams a purpose-built trace database, autonomous failure triage, and managed execution environments for production workflows.
holaOS shipped Beta 0.1, adding Multi Workspaces, Sub Agents, a dashboard, and a kickoff flow on top of its agent-computer base. The release targets long-running workstreams that need persistent context instead of single-chat sessions.
OpenAI staff said `/goal` is now available in the Codex app, and users posted long runs that fixed React Doctor scores, built iOS features, and queued weekend tasks. The update moves Codex from CLI-only planning to persistent, steerable work sessions.
Crabbox 0.11.0 shipped a Google Cloud provider, repo-local job workflows, AWS Windows WSL2 hydration, and a Blacksmith sync-stall guard. Recent Codex and OpenClaw posts show Crabbox already being used for reproducible bug repro and recorded QA before-and-after runs.
OpenAI reports that Codex can now keep pursuing a goal until it reaches an end state, and that remote control and a usage tab are being added. The update matters because Codex sessions can span longer tasks and be managed across devices with less manual babysitting.
Manus introduced Cloud Computer, an always-on cloud machine available on web and mobile for paid personal plans. It lets agents keep running Slack, Discord, and Telegram bots, databases, and scheduled jobs after the user's laptop is offline.
ElectricSQL launched Electric Agents, treating agents as long-lived data entities that sync across shared coding sessions, swarms, and branches. The release matters for teams building collaborative agent systems that need durable state and coordination primitives, not just one-shot task runners.
Mistral Studio added a Workflows orchestration layer that tracks state, retries, branches, and human approvals in public preview. That lets long-running agent flows resume after failures instead of restarting from scratch.
OpenCode 1.4.11 beta lets sessions run inside git worktrees or remote environments, with a remote server that keeps sessions alive and resyncs locally after reconnects. Use it if you run multi-session agent work across machines or plugin-defined runtimes.
OpenAI updated the Agents SDK with sandbox execution, memory controls and run snapshotting, and launch partners Vercel, Modal, E2B and Daytona shipped integrations. Long-running agents can now keep files, credentials and execution state in isolated runtimes instead of wiring harness, compute and storage layers together manually.
Windsurf 2.0 launched with Devin embedded into the product, combining local agents with cloud agents that can continue across codebases after you close the laptop. The IDE now acts as a handoff layer between interactive edits and long-running remote execution.
Anthropic introduced Claude Code Routines, a cloud-run automation layer that can execute on schedules, API calls, and GitHub events. The rollout moves scheduling from local runs to hosted, persistent automation and adds new trigger surfaces for plan-wide use.
Open Agents open-sources a browser-based cloud coding platform that keeps sessions running in parallel after a laptop closes. Use the reference stack if you want sandboxed VMs, model routing, and durable execution for internal coding-agent systems.
LangChain launched Deep Agents Deploy in beta as a production path for open, model-agnostic agent harnesses configured with AGENTS.md, skills, and mcp.json. Deployments run on LangSmith and can expose MCP, A2A, and agent protocol while teams choose models and sandbox providers.
Rivet introduced agentOS, an embedded agent runtime built on WASM and V8 isolates with backend embedding, mounted filesystems, and built-in orchestration. If you run agents in production, compare it against separate sandbox infrastructure.
Claude Code can now run recurring prompts and background pull-request work on Anthropic-managed cloud environments from the web, desktop, or `/schedule`. That makes long-running repo tasks less dependent on a local machine, but users report task caps and restricted egress.
OpenAI says Responses API requests can reuse warm containers for skills, shell, and code interpreter, cutting startup times by about 10x. Faster execution matters more now that Codex is spreading to free users, students, and subagent-heavy workflows.
Cognition now lets Devin turn a one-off task into a recurring workflow on a schedule. It pushes Devin further from ad hoc sessions toward unattended maintenance jobs, which is useful for teams already trusting it with repetitive repo work.
Hankweave shipped budget controls that cap spend, tokens, and elapsed time globally or per step, including loop budgets and shared pools. Use them to prototype or productionize long agent runs without hand-managing model switches and failure states.