AI Primer
TOPIC 45 stories

Agent Pattern

How-to / design-pattern / best-practice stories about building or operating coding agents (delegation depth, harness design, control surfaces).

RELEASE 15th April
OpenAI Agents SDK adds sandbox execution and memory controls with Vercel, Modal, E2B and Daytona

OpenAI updated the Agents SDK with sandbox execution, memory controls and run snapshotting, and launch partners Vercel, Modal, E2B and Daytona shipped integrations. Long-running agents can now keep files, credentials and execution state in isolated runtimes instead of wiring harness, compute and storage layers together manually.

NEWS 11th April
Meerkat reports harness-level cheating across 28+ submissions on nine agent benchmarks

Meerkat and Berkeley RDI audits said popular agent leaderboards were inflated by harness-level leakage and eval gaming, with one cleaned entry dropping from first to 14th. That makes published coding-agent rankings and benchmark comparisons less reliable, so treat leaderboard results with caution.

NEWS 11th April
Vercel Sandbox benchmarks sub-500 ms node -v cold starts

Vercel said Sandbox is now the fastest microVM-based runtime, with fresh node -v cold starts largely under 500 ms after a month of tuning. The update also puts persistent sandboxes into beta and expands plans for a programmable firewall, so teams should re-check runtime and security settings.

NEWS 1w ago
ClawShop launches OpenClaw resources with SecretRef and PinchBench

Kilo Code’s ClawShop recap bundled a 30-minute KiloClaw setup workshop, SecretRef credential handling, searchable ClawBytes guides, and PinchBench for agentic performance. The event, OpenClaw 2026.4.10, and PetClaw together added new security, memory, budgeting, and desktop layers around the OpenClaw stack.

RELEASE 1w ago
Anthropic launches Claude Managed Agents public beta with hosted sandboxes and outcome-based runs

Anthropic put Claude Managed Agents into public beta with hosted sandboxes, vaults, memory filesystems, and long-running sessions. Use the managed setup if you want explicit controls for tools, credentials, and completion criteria instead of custom harness code.

RELEASE 1w ago
OpenClaw 2026.4.7 adds a headless inference hub, memory-wiki, and webhook TaskFlows

OpenClaw 2026.4.7 adds a headless inference hub, memory-wiki, session branch and restore, and webhook-driven TaskFlows. Composio also shipped a CLI for secure app authentication, so users can expand OpenClaw from a local coding harness into a broader agent runtime.

WORKFLOW 1w ago
Bram Cohen compares vibe coding with AI Level 6 workflows after Claude Code leak

Bram Cohen used the Claude Code leak to argue that prompt-only development produces bad software, while a separate account of a 250-hour syntaqlite build said the durable version arrived only after a Python-to-Rust rewrite. Practitioners say specs, tests, linters, repo skills, and codebase context are the controls that keep coding agents maintainable.

NEWS 1w ago
OpenClaw adds direct Claude Code and ClawHub listener routes

Builders shipped a direct Claude Code harness and a ClawHub marketplace skill for OpenClaw workflows. Use these routes to wire agent tooling into OpenClaw, but watch Claude API limits and token burn costs.

NEWS 2w ago
Anthropic cuts Claude subscription access for third-party harnesses in Apr. 4 rollout

Anthropic’s Apr. 4 cutoff for using Claude subscriptions through OpenClaw-class harnesses went live. Users report API-billing fallbacks, ACP workarounds, and restored Claude Code quota, while edge cases around claude -p and Agent SDK use remain unsettled. The change pushes heavy agent loops toward metered access.

RELEASE 2w ago
Hermes Agent adds /claude-code orchestration and cron hooks

Hermes Agent added direct /claude-code orchestration and cron-time script hooks, and the team also shipped Hermes-focused datasets and agent-tuned model variants. The update turns Hermes into a harness that can steer Claude Code and inject recurring context automatically.

WORKFLOW 2w ago
Imbue publishes mngr workflow for 100-agent self-testing with Modal scale-out

Imbue published a walkthrough for mngr showing how it turns tutorial scripts into pytest cases, runs many agents in parallel, and merges fixes back into one branch. The case study offers a repeatable pattern for evaluating agent tools, so teams can borrow the tmux capture, artifact dashboards, and local-to-Modal handoff.

WORKFLOW 2w ago
Claude Code adds /loop, /teleport, and /batch workflow guidance in Boris Cherny guide

A Boris Cherny guide maps Claude Code mobile sessions, /teleport, /loop, hooks, worktrees, /batch, and custom agents into one workflow set. Use it to turn scattered commands into repeatable patterns for long-running coding sessions across terminal, desktop, and cloud.

RELEASE 2w ago
OpenClaw 2026.3.28 adds 9 MCP tools and Responses API support

OpenClaw 2026.3.28 exposes messaging and event handling as nine MCP tools, adds Responses API support, and lets plugins request permission during browser use. Use it to separate transport from agent logic so Claude Code, Codex, Cursor, and local harnesses can share the same account with less glue.

NEWS 3w ago
ATLAS benchmarks Qwen3-14B at 74.6% LiveCodeBench on one RTX 5060 Ti

The ATLAS harness says a frozen Qwen3-14B Q4 model on one RTX 5060 Ti reached 74.6% pass@1-v(k=3) on LiveCodeBench v5 through multi-pass repair and selection. The result shifts comparison toward harness design, though HN commenters note it is not a one-shot head-to-head with hosted frontier models.
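
To make "multi-pass repair and selection" concrete, here is a minimal sketch of a generate, verify, repair, select loop. It is not ATLAS's actual pipeline: the function `best_of_k_with_repair` and its `generate`, `repair`, and `verify` callbacks are hypothetical names chosen for illustration, and the default `k=3` only mirrors the k in the reported metric.

```python
from typing import Callable, Optional


def best_of_k_with_repair(
    generate: Callable[[int], str],
    repair: Callable[[str, str], str],
    verify: Callable[[str], Optional[str]],
    k: int = 3,
    repair_rounds: int = 1,
) -> Optional[str]:
    """Return the first candidate that passes verification, else None.

    verify returns None on success or an error message on failure.
    All three callbacks are hypothetical stand-ins for model and test calls.
    """
    for attempt in range(k):
        candidate = generate(attempt)
        for round_ in range(repair_rounds + 1):
            error = verify(candidate)
            if error is None:
                return candidate  # selection: first verified candidate wins
            if round_ < repair_rounds:
                # Feed the failure back and try a repaired version.
                candidate = repair(candidate, error)
    return None
```

The point of the sketch is that the score depends on the loop around the frozen model as much as on the model itself, which is why the result is hard to compare with one-shot numbers.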

RELEASE 3w ago
Cline launches Kanban with worktree-linked parallel CLI agents

Cline launched Kanban, a local multi-agent board that runs Claude, Codex, and Cline CLI tasks in isolated worktrees with dependency chains and diffs. Teams can use it as a visual control layer for parallel coding agents on repo chores that split cleanly.

RELEASE 3w ago
Codex launches plugins for Slack, Figma, Gmail, and Google Drive

OpenAI rolled out Codex plugins across the app, CLI, and IDE extensions, with app auth, reusable skills, and optional MCP servers. Teams should test plugin-backed workflows and permission models before broad rollout.

RELEASE 3w ago
Imbue launches Latchkey: local agents call HTTP APIs without exposing tokens

Imbue released Latchkey, a library that is prepended to ordinary curl calls so local agents can use SaaS and internal APIs while credentials stay on the developer machine. Try it where agents need many HTTP integrations but should not see raw secrets.

RELEASE 3w ago
OpenCode adds remote sandboxes and syncs agent state across devices

OpenCode is adding remote sandboxes, synced state across laptop, server, and cloud, and more product surface inside its plugin system. That makes long-running off-laptop workflows more practical, but operators should still review telemetry, sandbox, and exposure defaults.

RELEASE 3w ago
Claude Code releases 2.1.84: PowerShell preview, task hooks, idle-return clearing

Claude Code 2.1.84 adds an opt-in PowerShell tool, new task and worktree hooks, safer MCP limits, and better startup and prompt-cache behavior. Anthropic also documented auto mode’s action classifier and added iMessage as a channel, so teams should review permissions and remote-control workflows.

RELEASE 3w ago
Expect launches CLI to QA apps in a real browser and record bug videos

Expect wraps browser QA for Claude Code, Codex, or Cursor into a CLI that records bug videos and feeds failures back into a fix loop. It gives coding agents a tighter UI validation cycle without requiring a custom browser harness.

RELEASE 3w ago
OpenClaw releases 2026.3.24 with Teams, OpenWebUI, and skill-control UI

OpenClaw 2026.3.24 adds native Microsoft Teams, OpenWebUI sub-agent access, Slack reply buttons, and a control surface for skills and tools. The release expands where the runtime can plug into enterprise workflows, while also increasing the surface area teams need to secure.

RELEASE 3w ago
Cursor adds Instant Grep: 13ms regex search across millions of files

Cursor shipped Instant Grep, a local regex index built from n-grams, inverted indexes, and Bloom filters that drops large-repo searches from seconds to milliseconds. Faster candidate retrieval shortens the coding-agent loop, especially when ripgrep-style scans become the bottleneck.
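
As a rough illustration of the n-gram idea, the sketch below builds a trigram inverted index and uses it to narrow the candidate files before the real regex runs. It is not Cursor's implementation: the `TrigramIndex` class is hypothetical, the caller supplies the literal that the regex is assumed to require, and the Bloom-filter layer is left out entirely.

```python
import re
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Return every 3-character substring of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Hypothetical toy index mapping each trigram to a set of document ids."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.docs: dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def candidates(self, literal: str) -> set[str]:
        """Docs containing every trigram of a literal the regex requires."""
        grams = trigrams(literal)
        if not grams:  # literal too short to filter on, so scan everything
            return set(self.docs)
        sets = [self.postings.get(gram, set()) for gram in grams]
        return set.intersection(*sets)

    def search(self, pattern: str, required_literal: str) -> list[str]:
        """Run the regex only on documents that survive the trigram filter."""
        regex = re.compile(pattern)
        return sorted(
            doc_id
            for doc_id in self.candidates(required_literal)
            if regex.search(self.docs[doc_id])
        )
```

A call such as `index.search(r"parse_\w+\(", "parse_")` touches only files whose postings contain all four trigrams of `parse_`, so the expensive regex pass runs on a small candidate set instead of the whole repository.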

RELEASE 3w ago
OpenClaw ships 2026.3.22 with ClawHub marketplace and OpenShell SSH sandboxes

OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.

NEWS 3w ago
OpenHands benchmarks EvoClaw and caps continuous-evolution scores at 38.03%

OpenHands introduced EvoClaw, a benchmark that reconstructs milestone DAGs from repo history to test continuous software evolution instead of isolated tasks. The first results show agents can clear single tasks yet still collapse under regressions and technical debt over longer runs.

RELEASE 3w ago
Agent Computer launches cloud computers in under 0.5s with SSH access

Agent Computer launched cloud desktops that boot in under half a second and expose persistent disks, shared credentials, SSH access, and ACP control for agents. It gives coding agents a faster place to run tools and reuse auth, but teams still need to design safe session and credential boundaries.

NEWS 3w ago
Vals AI updates SWE-Bench Verified harness to mini-swe-agent and score slips to 78.8%

Vals AI switched SWE-Bench Verified from SWE-Agent to the bash-only mini-swe-agent harness, aligning results more closely with the official benchmark setup. The top score dipped slightly to 78.8%, but the change reduces harness-specific confounds when comparing models.

WORKFLOW 3w ago
LangChain launches Building Reliable Agents course with LangSmith loops

LangChain published a free course on taking agents from first run to production-ready systems with LangSmith loops for observability and evals. The timing lines up with new NVIDIA integration messaging, so teams can study process and stack choices together.

WORKFLOW 4w ago
Agent Flywheel introduces beads-and-swarms workflow for 1,000 commits a day

Agent Flywheel lays out a planning-first workflow built on beads, agent mail, swarms, and TUI inspection for very large coding runs. It is useful because the guide exposes coordination primitives and review loops, not just benchmark screenshots.

WORKFLOW 4w ago
Autoresearch claims 2718 Elo after 70 experiments on a Rust chess engine

A developer says an autoresearch loop hill-climbed a vibecoded Rust engine to 2718 Elo after running more than 70 experiments under a 500 ms move budget. The real takeaway is the workflow: automated experiment loops can optimize code against a measurable target.
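
The workflow described here amounts to a greedy experiment loop: propose a change, measure it, and keep it only if the target metric improves. The sketch below shows that shape under stated assumptions; `hill_climb`, `propose`, and `score` are hypothetical names, and in a real setup `propose` would be an agent editing code and `score` would be a match or benchmark run.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def hill_climb(
    baseline: T,
    propose: Callable[[T, random.Random], T],
    score: Callable[[T], float],
    experiments: int,
    seed: int = 0,
) -> tuple[T, float, int]:
    """Greedy loop: keep a proposed change only if the score improves.

    Returns the best state, its score, and how many proposals were accepted.
    """
    rng = random.Random(seed)
    best, best_score, accepted = baseline, score(baseline), 0
    for _ in range(experiments):
        candidate = propose(best, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
            accepted += 1
    return best, best_score, accepted
```

Because every rejected experiment is discarded, the loop is only as trustworthy as the metric: a noisy or gameable `score` will be climbed just as eagerly as a meaningful one.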

RELEASE 4w ago
Conductor adds plan mode, fast mode, and skills for Codex workflows

Conductor now bundles plan mode, fast mode, skills, repo quick start, and an experimental merge-conflict UI around Codex sessions. Try it if you want a higher-level harness for long-running code agents, but watch the foreground chat UX on larger tasks.

RELEASE 4w ago
ACE launches self-improving AGENTS.md playbooks for code factories

ACE open-sources a platform that turns AGENTS.md instructions into evolving playbooks backed by execution history, with hosted and self-hosted options. It is a notable response to prompt drift and prompt extraction, because procedures become revisable operating docs instead of static prompts.

WORKFLOW 4w ago
OpenHands compares 3 skill tasks and finds some reduce agent pass rates

OpenHands published a skill-eval recipe with bounded tasks, deterministic verifiers, and no-skill baselines, then showed some skills speed agents up while others make them brittle. Teams shipping skill libraries should measure them per task and model before rollout.
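
Measuring a skill against a no-skill baseline can be as simple as comparing per-task pass rates from a deterministic verifier. The sketch below assumes a hypothetical `Run` record and a `pass_rate_delta` helper; it is an illustration of the comparison, not the OpenHands recipe itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One verified agent run (hypothetical record shape)."""
    task: str
    with_skill: bool
    passed: bool


def pass_rate_delta(runs: list[Run]) -> dict[str, float]:
    """Per task: pass rate with the skill minus the no-skill baseline."""
    totals: dict[tuple[str, bool], list[int]] = {}
    for run in runs:
        counts = totals.setdefault((run.task, run.with_skill), [0, 0])
        counts[0] += int(run.passed)  # passes
        counts[1] += 1                # attempts
    deltas: dict[str, float] = {}
    for task in sorted({run.task for run in runs}):
        skill = totals.get((task, True))
        base = totals.get((task, False))
        if skill and base:  # both arms are needed for a comparison
            deltas[task] = skill[0] / skill[1] - base[0] / base[1]
    return deltas
```

A negative delta for a task is the signal described above: the skill made the agent more brittle on that task, so it should not ship for that task and model pairing without further work.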

RELEASE 4w ago
Imbue releases Offload to split Playwright runs across 200 Modal sandboxes

Imbue open-sourced Offload, a Rust CLI that spreads test suites across local or Modal sandboxes from one TOML config. It is useful when agent-heavy teams are bottlenecked on verification instead of generation, especially in browser or CI-heavy stacks.
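
Spreading a suite across many sandboxes is at heart a load-balancing problem. The sketch below uses a greedy longest-first assignment to the least-loaded worker; it is a generic technique shown for illustration, not Offload's scheduler, and `shard_tests` along with the per-test durations are assumptions.

```python
import heapq


def shard_tests(durations: dict[str, float], workers: int) -> list[list[str]]:
    """Assign tests, longest first, to whichever worker has the least load."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    heap = [(0.0, i) for i in range(workers)]  # (total seconds, worker index)
    shards: list[list[str]] = [[] for _ in range(workers)]
    ordered = sorted(durations.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, seconds in ordered:
        load, idx = heapq.heappop(heap)  # least-loaded worker so far
        shards[idx].append(name)
        heapq.heappush(heap, (load + seconds, idx))
    return shards
```

With balanced shards, wall-clock time approaches the longest single test rather than the sum of all tests, which is where the benefit shows up when verification is the bottleneck.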

RELEASE 4w ago
Devin adds managed Devins for parallel VM task execution

Cognition updated Devin so one session can break down large work and delegate subtasks to worker Devins running in separate VMs. It matters for audits, migrations, and QA runs where one long-context agent is slower than explicit parallelism.

RELEASE 4w ago
Morph launches FlashCompact: 33k tok/s compaction from 200k to 50k in 1.5s

Morph released FlashCompact, a specialized compaction model and SDK for coding agents, claiming 33k tokens per second and near-invisible long-context compression. Use it or copy the approach if compaction latency and noisy tool output are blocking longer agent runs.

RELEASE 4w ago
OpenAI Codex adds subagents for parallel tasks in app and CLI

OpenAI rolled out native subagents in Codex so a main agent can spawn specialized parallel threads and return results to one session. Try it for larger code reviews and feature builds where you want to split work without polluting the main context.

RELEASE 4w ago
Factory launches Analytics to tie tokens, tool calls, commits, and PRs to software output

Factory released an analytics layer for teams deploying coding agents, surfacing usage, tool calls, activity, and productivity from tokens through pull requests. Use it if you need ROI, readiness, and cost visibility as agent adoption scales.

RELEASE 4w ago
Hyperbrowser releases HyperSkill to turn live docs into SKILL.md trees

Hyperbrowser open-sourced HyperSkill, which reads live documentation and emits a structured SKILL.md file or graph an agent can navigate. Try it to replace hand-written tool instructions with generated skill trees you can drop into an agent project.

RELEASE 4w ago
OpenClaw-RL releases fully asynchronous online training with OPD for live agents

OpenClaw-RL released a fully asynchronous online training stack that turns live interaction feedback into ongoing agent updates with binary rewards and token-level OPD corrections. Use it as a starting point for online agent improvement only if you can score rollouts reliably and manage privacy risk.

NEWS 1mo ago
OpenAI reports Responses API runtime uses compaction, proxy egress, and reusable skills

OpenAI published runtime details for the Responses API computer environment, including shell loops, capped output, automatic compaction, proxied outbound traffic, and reusable skills folders. Use it as a reference architecture for hosted agents that need state, safety controls, and tool execution patterns.

NEWS 1mo ago
Terminal-Bench 2.0 removes OpenBlocks after cheating verification

Terminal-Bench maintainers said they independently verified cheating claims and removed OpenBlocks from the 2.0 leaderboard. Audit submission artifacts and harness details before relying on public coding-agent rankings.

NEWS 1mo ago
OpenClaw-RL reports continuous agent training from user corrections and next-state signals

The OpenClaw-RL paper proposes training agents continuously from normal interactions by turning user corrections, logs, and next-state feedback into rewards and word-level supervision. Watch it if you build persistent agents and want adaptation to come from live deployment traces instead of offline labeling.

NEWS 1mo ago
Cursor publishes CursorBench to compare coding models on intelligence and token efficiency

Cursor published its internal benchmarking approach and reported wider separation between coding models than SWE-bench-style leaderboards show. Use it as a reference for production routing decisions, but validate results against your own online traffic and task mix.

WORKFLOW 1mo ago
Karpathy releases autoresearch after nanochat cuts Time to GPT-2 by 11%

Andrej Karpathy open-sourced autoresearch, a minimal agent loop for automated ML research, and reported roughly 20 additive changes that reduced nanochat’s Time to GPT-2 from 2.02 hours to 1.80 hours. Research teams can use it as a concrete recipe for closed-loop experimentation on any metric with cheap proxy evaluations.

RELEASE 1mo ago
OpenAI adds phase parameter to GPT-5.4 for commentary and final answers

OpenAI documented a new response field that separates in-progress commentary from terminal answers in GPT-5.4 turns, with guidance for replaying those messages in follow-up calls. Agent builders can stream status updates without mixing them into final model output.


Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.