Durable Execution
Checkpointing, resumability, long-running agent workflows.
Stories
LangChain unveiled SmithDB, LangSmith Engine, Managed Deep Agents, and GA sandboxes at Interrupt. The stack gives agent teams a purpose-built trace database, autonomous failure triage, and managed execution environments for production workflows.
holaOS shipped Beta 0.1, adding Multi Workspaces, Sub Agents, a dashboard, and a kickoff flow on top of its agent-computer base. The release targets long-running workstreams that need persistent context instead of one-chat sessions.
OpenAI staff said /goal is now available in the Codex app, and users posted long-running runs that fixed React Doctor scores, built iOS features, and queued weekend tasks. The update moves Codex from CLI-only planning to persistent, steerable work sessions.
Crabbox 0.11.0 shipped a Google Cloud provider, repo-local job workflows, AWS Windows WSL2 hydration, and a Blacksmith sync-stall guard. Recent Codex and OpenClaw posts show Crabbox already being used for bug reproduction and recorded before-and-after QA runs.
OpenAI reports Codex can now keep pursuing a goal until it reaches an end state and is adding remote control plus a usage tab. The update matters because Codex sessions can span longer tasks and be managed across devices with less manual babysitting.
Manus introduced Cloud Computer, an always-on cloud machine available on web and mobile for paid personal plans. It lets agents keep running Slack, Discord, and Telegram bots, databases, and scheduled jobs after the user's laptop is offline.
ElectricSQL launched Electric Agents, treating agents as long-lived data entities that sync across shared coding sessions, swarms, and branches. The release matters for teams building collaborative agent systems that need durable state and coordination primitives, not just one-shot task runners.
Mistral Studio added a Workflows orchestration layer that tracks state, retries, branches, and human approvals in public preview. That lets long-running agent flows resume after failures instead of restarting from scratch.
OpenCode 1.4.11 beta lets sessions run inside git worktrees or remote environments, with a remote server that keeps sessions alive and resyncs locally after reconnects. Use it if you run multi-session agent work across machines or plugin-defined runtimes.
OpenAI updated the Agents SDK with sandbox execution, memory controls and run snapshotting, and launch partners Vercel, Modal, E2B and Daytona shipped integrations. Long-running agents can now keep files, credentials and execution state in isolated runtimes instead of wiring harness, compute and storage layers together manually.
Windsurf 2.0 launched with Devin embedded into the product, combining local agents with cloud agents that can continue across codebases after you close the laptop. The IDE now acts as a handoff layer between interactive edits and long-running remote execution.
Anthropic introduced Claude Code Routines, a cloud-run automation layer that can execute on schedules, API calls, and GitHub events. The rollout moves scheduling from local runs to hosted, persistent automation and adds new trigger surfaces for plan-wide use.
Open Agents open-sourced a browser-based cloud coding platform that keeps sessions running in parallel after a laptop closes. Use the reference stack if you want sandboxed VMs, model routing, and durable execution for internal coding-agent systems.
LangChain launched Deep Agents Deploy in beta as a production path for open, model-agnostic agent harnesses configured with AGENTS.md, skills, and mcp.json. Deployments run on LangSmith and can expose MCP, A2A, and agent protocol while teams choose models and sandbox providers.
Rivet introduced agentOS, an embedded agent runtime built on WASM and V8 isolates with backend embedding, mounted filesystems, and built-in orchestration. If you run agents in production, compare it against separate sandbox infrastructure.
Claude Code can now run recurring prompts and background pull-request work on Anthropic-managed cloud environments from the web, desktop, or `/schedule`. That makes long-running repo tasks less dependent on a local machine, but users report task caps and restricted egress.
OpenAI says Responses API requests can reuse warm containers for skills, shell, and code interpreter, cutting startup times by about 10x. Faster execution matters more now that Codex is spreading to free users, students, and subagent-heavy workflows.
Cognition now lets Devin turn a one-off task into a recurring workflow on a schedule. It pushes Devin further from ad hoc sessions toward unattended maintenance jobs, which is useful for teams already trusting it with repetitive repo work.
Hankweave shipped budget controls that cap spend, tokens, and elapsed time globally or per step, including loop budgets and shared pools. Use them to prototype or productionize long agent runs without hand-managing model switches and failure states.