AI Primer
TOPIC 50 stories

Agent Pattern

How-to / design-pattern / best-practice stories about building or operating coding agents (delegation depth, harness design, control surfaces).

WORKFLOW 30th May
Codex community ships /dynamic swarms, session lifecycles, and model routing

Builders added /dynamic orchestration, custom-model routing, and repo runbooks around Codex as users exposed new session lifecycle controls in the app. That makes Codex a better fit for long-running, multi-context coding work.

WORKFLOW 30th May
Pi ecosystem adds /goal tasks, acceptance gates, and Lovely Dev Tools

Three independent Pi builders shipped a goal runner, contract-style subagent acceptance gates, and a new Lovely Dev Tools extension in the same window. That gives Pi users more deterministic long-running loops and cleaner local tool interfaces without starting from an empty harness.

WORKFLOW 29th May
Conductor, CC Mirror, and Codex add Claude-style Dynamic Workflows

A day after Claude Code introduced Dynamic Workflows, builders shipped ports and clones for Codex, Conductor, and GLM-backed CC Mirror. The rapid ports turn the feature into a reusable orchestration pattern rather than an Anthropic-only runtime.

RELEASE 28th May
Firecrawl launches /monitor webhooks with up to 90% lower token use

Firecrawl launched /monitor, a URL watcher that only pings agents when tracked pages actually change and can send results by webhook. Use it for change-only ingestion to cut LLM token spend on monitored pages.
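The change-only idea can be sketched in a few lines. This is a minimal illustration, not Firecrawl's API: `content_hash` and `changed_pages` are hypothetical helper names, and it assumes page bodies have already been fetched as strings. Only URLs whose content hash differs from the last run would be forwarded to an agent.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a page body (hypothetical helper)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_pages(previous: dict, pages: dict) -> tuple:
    """Compare current page bodies against stored hashes.

    previous: url -> hash from the last run
    pages:    url -> current page body
    Returns (urls whose content changed or is new, updated hash map).
    """
    current = {url: content_hash(body) for url, body in pages.items()}
    changed = [url for url, digest in current.items() if previous.get(url) != digest]
    return changed, current
```

On the first run every URL counts as changed; on later runs only pages with a different hash are returned, so unchanged pages never reach the model.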

WORKFLOW 25th May
Microsoft benchmarks SkillOpt at +24.8 Codex points by editing skills, not weights

Microsoft Research released SkillOpt, which optimizes external skill files instead of fine-tuning model weights and reports best-or-tied results across 52 evaluation cells. The method matters because it improved Codex and Claude Code accuracy without extra inference-time calls.

WORKFLOW 25th May
Developers compare red-green TDD, Hurl tests, and label-triggered agents for code verification

Practitioners published tests-first coding-agent workflows built around red-green TDD, Hurl suites, GitHub label actions, and Codex-based execution checks. The pattern matters because verification remains the main bottleneck once generation is fast, especially in longer multi-file sessions.

WORKFLOW 1w ago
Codex users share /goal audits, mobile delegation, and Raspberry Pi workflows

Practitioners published reusable Codex workflows for project audits, memory-driven skill packaging, mobile delegation, and remote computer use. Try the prompt-and-steps patterns if you want to adapt Codex across repos and devices.

WORKFLOW 1w ago
Agent Skills ecosystem ships handoff docs, htmx v4 packs, and Project Think support

Independent builders published reusable skills infrastructure across coding agents, including Project Think preview support, handoff docs, and an htmx v4 skill pack. That matters because skills are starting to work like portable workflow units instead of one-off prompt snippets inside a single tool.

WORKFLOW 1w ago
Codex users ship durable-memory workspaces and auto-triage flows

Independent Codex users published Obsidian memory setups, reusable skill prompts, auto-triage flows, and Cloudflare-backed runners for longer jobs. That matters because Codex is being wrapped into persistent workspaces and operator-defined subagents instead of one-shot chats.

WORKFLOW 1w ago
Agent Skills supports Codex, Cursor, Gemini CLI, and VS Code through new libraries and plugins

New guides, plugins, and reusable libraries show the Agent Skills format moving beyond Claude Code into multiple coding-agent clients and runtimes. That matters because workflows are becoming portable artifacts instead of one-off prompts tied to a single harness.

WORKFLOW 1w ago
Codex users report iPhone simulator bug-bashes, Appshots form fills, and locked-Mac runs

Two days after Codex added locked-Mac control and Appshots, users posted end-to-end iPhone simulator debugging, Safari form-filling, and remote-control workflows. That matters because the feature is moving from launch copy into concrete computer-use tasks that can replace manual QA and repetitive UI work.

WORKFLOW 1w ago
Lovable adds is_stuck pipeline with Overflow retrieval to cut stuck rate 5%

Lovable described a production loop where an is_stuck classifier detects repeated failures, Overflow injects past solution pairs, and send_feedback escalates real tool failures. The system lowered stuck rate 5% and raised publish rate 2%, so teams can use the same signal to debug outages and agent frustration.
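The "repeated failures" signal can be approximated with a simple heuristic. The sketch below is a stand-in, not Lovable's classifier: it assumes each agent step is recorded as an error string (or `None` on success) and flags a session as stuck when the same error repeats a fixed number of times in a row. The function signature and the threshold of 3 are illustrative assumptions.

```python
from typing import List, Optional


def is_stuck(recent_errors: List[Optional[str]], threshold: int = 3) -> bool:
    """Heuristic stand-in for a stuck classifier (assumed, not Lovable's model).

    recent_errors: one entry per agent step, None when the step succeeded.
    Returns True when the last `threshold` steps all failed with the same error.
    """
    if len(recent_errors) < threshold:
        return False
    tail = recent_errors[-threshold:]
    return tail[0] is not None and all(err == tail[0] for err in tail)
```

A positive result is the point where a loop like the one described could inject retrieved past solutions or escalate to feedback.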

WORKFLOW 2w ago
Claude Code users launch `/goal`, Obsidian, and audit playbooks to fight long-session drift

Independent builders published Claude Code memory and workflow scaffolding, including a `/goal` prompting guide, Obsidian-backed knowledge capture, and audit tooling for long-running agents. This matters because context compaction and stale session memory are becoming practical bottlenecks for multi-session coding workflows.

WORKFLOW 2w ago
Kilo Code introduces Cloud Agent CVE and smoke-test workflows with webhook triggers

Kilo Code posted two cloud-agent automations: a webhook-driven CVE patch flow that opens PRs in parallel and a post-deploy smoke test that checks health, 2xx responses, and latency under 2 seconds. This matters because the examples show coding agents moving into CI-style remediation and production verification loops.
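A post-deploy check along these lines might look like the following sketch. It is not Kilo Code's implementation: `probe` and `passes_smoke_check` are hypothetical names, the 2-second limit is taken from the description above, and only the Python standard library is used.

```python
import time
import urllib.error
import urllib.request

MAX_LATENCY_SECONDS = 2.0  # threshold taken from the story text


def passes_smoke_check(status: int, latency_seconds: float) -> bool:
    """Pass only on a 2xx response that arrived under the latency limit."""
    return 200 <= status < 300 and latency_seconds < MAX_LATENCY_SECONDS


def probe(url: str, timeout: float = 5.0) -> tuple:
    """Fetch a URL once and return (status_code, elapsed_seconds).

    HTTP error responses (4xx/5xx) are reported by status code;
    connection failures raise urllib.error.URLError to the caller.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code
    return status, time.monotonic() - start
```

A runner would call `passes_smoke_check(*probe(health_url))` after each deploy, where `health_url` is whatever health endpoint the service exposes.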

WORKFLOW 3w ago
Codex app adds /goal for long-running React Doctor and iOS runs

OpenAI staff said /goal is now available in the Codex app, and users posted long-running runs that fixed React Doctor scores, built iOS features, and queued weekend tasks. The update moves Codex from CLI-only planning to persistent, steerable work sessions.

WORKFLOW 3w ago
Perplexity opens agent skills manual with 'Zen of Skills' rules for folder-based workflows

Perplexity published its internal manual for building agent skills and paired it with a research post about how those skills power products like Computer. The guide matters because it gives external builders concrete patterns for decomposing agent behavior into reusable skill folders instead of one-off prompts.

RELEASE 4w ago
Cursor releases Team Kit with /verify-this, /loop-on-ci, and harness skills

Cursor's Team Kit packages internal skills like /verify-this, CLI and UI automation harnesses, PR cleanup, and /loop-on-ci, installable with /add-plugin cursor-team-kit. It turns several internal review and validation habits into reusable commands for agent-driven coding workflows.

WORKFLOW 4w ago
Codex users report `/goal` sessions with 70-minute Stripe fixes and a 4,000-prompt cap

Users posted long-running Codex `/goal` sessions with auto-continuations, `pause`/`resume`, and file-backed goals. Watch the 4,000-prompt startup cap and early-stop drift if you plan to run longer agent loops.

WORKFLOW 4w ago
Practitioners report harness playbooks with Playwright CLI, create_agent, and MCP

Builders shared concrete Symphony, create_agent, and MCP setup guides after arguing that model switching is easy but harness switching is not. The playbooks matter because they make harness engineering more repeatable, so teams can copy tested tooling and integration patterns.

WORKFLOW 4w ago
LangChain adds Browserbase search, fetch, and browser subagents to Deep Agents

LangChain shipped a Browserbase integration that gives Deep Agents dedicated search, fetch, and browser subagents with dashboard observability. That turns web navigation into a first-class tool path for agent workflows instead of a custom one-off browser loop.

WORKFLOW 1mo ago
mattpocock/skills ranks #1 on GitHub at 28K stars with `/grill-me` and `/tdd` packs

mattpocock/skills hit the top of GitHub Trending as reusable `SKILL.md` packs for grilling specs, writing PRDs, and enforcing TDD spread across coding-agent workflows. The format is starting to look like a distribution layer for agent behavior, with faster install tooling and third-party skills shipping around the same pattern.

WORKFLOW 1mo ago
OpenRouter launches `create-headless-agent` for Bun-based multi-model CLIs

OpenRouter released a new skill and guide that scaffold a headless agent CLI on top of its Agent SDK. The template packages multi-model inference, tool calling, and Bun-based CLI setup into a reusable starting point.

WORKFLOW 1mo ago
ClawSweeper closes 4,000 OpenClaw issues with 50 Codex agents in one day

Steipete’s maintainer bot ran 50 Codex agents in parallel and closed about 4,000 OpenClaw issues in a day. The cleanup pushed into rate limits, so use the README dashboard and Project Clowfish clustering to track large agent sweeps.

WORKFLOW 1mo ago
Kilo Code opens Roo migration with --install-extension and AGENTS.md conversion

Kilo Code published a Roo Code migration path ahead of Roo’s May 15 archive, including one-command install, automated file renames, custom-agent conversion, and API key re-auth. Use the guide to map Roo modes, rules, MCP config, and checkpoints into Kilo’s agent and worktree model before the cutoff.

RELEASE 1mo ago
CopilotKit launches Open Generative UI with openGenerativeUI: true

CopilotKit open-sourced Open Generative UI, a flag that lets agents stream interactive UI components directly into chat. The release packages a concrete alternative to raw-code UI generation into a reusable dev toolkit.

WORKFLOW 1mo ago
Codex users report subagent, MCP, and canary deploy workflows

Practitioners shared repeatable Codex workflows for long-lived threads, background subagents, computer-use access through MCP, and canary rollouts. Codex is being used less as a one-shot assistant and more as a persistent automation harness.

RELEASE 1mo ago
OpenAI Agents SDK adds sandbox execution and memory controls with Vercel, Modal, E2B and Daytona

OpenAI updated the Agents SDK with sandbox execution, memory controls and run snapshotting, and launch partners Vercel, Modal, E2B and Daytona shipped integrations. Long-running agents can now keep files, credentials and execution state in isolated runtimes instead of wiring harness, compute and storage layers together manually.

NEWS 1mo ago
Meerkat reports harness-level cheating across 28+ submissions on nine agent benchmarks

Meerkat and Berkeley RDI audits said popular agent leaderboards were inflated by harness-level leakage and eval gaming, with one cleaned entry dropping from first to 14th. That makes published coding-agent rankings and benchmark comparisons less reliable, so treat leaderboard results with caution.

NEWS 1mo ago
Vercel Sandbox benchmarks sub-500 ms node -v cold starts

Vercel said Sandbox is now the fastest microVM-based runtime, with fresh node -v cold starts largely under 500 ms after a month of tuning. The update also puts persistent sandboxes into beta and expands plans for a programmable firewall, so teams should re-check runtime and security settings.

NEWS 1mo ago
ClawShop launches OpenClaw resources with SecretRef and PinchBench

Kilo Code’s ClawShop recap bundled a 30-minute KiloClaw setup workshop, SecretRef credential handling, searchable ClawBytes guides, and PinchBench for agentic performance. The event, OpenClaw 2026.4.10, and PetClaw together added new security, memory, budgeting, and desktop layers around the OpenClaw stack.

RELEASE 1mo ago
Anthropic launches Claude Managed Agents public beta with hosted sandboxes and outcome-based runs

Anthropic put Claude Managed Agents into public beta with hosted sandboxes, vaults, memory filesystems, and long-running sessions. Use the managed setup if you want explicit controls for tools, credentials, and completion criteria instead of custom harness code.

RELEASE 1mo ago
OpenClaw 2026.4.7 adds a headless inference hub, memory-wiki, and webhook TaskFlows

OpenClaw 2026.4.7 adds a headless inference hub, memory-wiki, session branch and restore, and webhook-driven TaskFlows. Composio also shipped a CLI for secure app authentication, so users can expand OpenClaw from a local coding harness into a broader agent runtime.

WORKFLOW 1mo ago
Bram Cohen compares vibe coding with AI Level 6 workflows after Claude Code leak

Bram Cohen used the Claude Code leak to argue that prompt-only development produces bad software, while a separate 250-hour syntaqlite build said the durable version arrived only after a Python-to-Rust rewrite. Practitioners say specs, tests, linters, repo skills, and codebase context are the controls that keep coding agents maintainable.

NEWS 1mo ago
OpenClaw adds direct Claude Code and ClawHub listener routes

Builders shipped a direct Claude Code harness and a ClawHub marketplace skill for OpenClaw workflows. Use these routes to wire agent tooling into OpenClaw, but watch Claude API limits and token burn costs.

NEWS 1mo ago
Anthropic cuts Claude subscription access for third-party harnesses in Apr. 4 rollout

Anthropic’s Apr. 4 cutoff for using Claude subscriptions through OpenClaw-class harnesses went live. Users report API-billing fallbacks, ACP workarounds, and restored Claude Code quota, while edge cases around claude -p and Agent SDK use remain unsettled. The change pushes heavy agent loops toward metered access.

RELEASE 1mo ago
Hermes Agent adds /claude-code orchestration and cron hooks

Hermes Agent added direct /claude-code orchestration and cron-time script hooks, and the team also shipped Hermes-focused datasets and agent-tuned model variants. The update turns Hermes into a harness that can steer Claude Code and inject recurring context automatically.

WORKFLOW 1mo ago
Imbue publishes mngr workflow for 100-agent self-testing with Modal scale-out

Imbue published a walkthrough for mngr showing how it turns tutorial scripts into pytest cases, runs many agents in parallel, and merges fixes back into one branch. The case study offers a repeatable pattern for evaluating agent tools, so teams can borrow the tmux capture, artifact dashboards, and local-to-Modal handoff.

WORKFLOW 2mo ago
Claude Code adds /loop, /teleport, and /batch workflow guidance in Boris Cherny guide

A Boris Cherny guide maps Claude Code mobile sessions, /teleport, /loop, hooks, worktrees, /batch, and custom agents into one workflow set. Use it to turn scattered commands into repeatable patterns for long-running coding sessions across terminal, desktop, and cloud.

RELEASE 2mo ago
OpenClaw 2026.3.28 adds 9 MCP tools and Responses API support

OpenClaw 2026.3.28 exposes messaging and event handling as nine MCP tools, adds Responses API support, and lets plugins request permission during browser use. Use it to separate transport from agent logic so Claude Code, Codex, Cursor, and local harnesses can share the same account with less glue.

NEWS 2mo ago
ATLAS benchmarks Qwen3-14B at 74.6% LiveCodeBench on one RTX 5060 Ti

The ATLAS harness says a frozen Qwen3-14B Q4 model on one RTX 5060 Ti reached 74.6% pass@1-v(k=3) on LiveCodeBench v5 through multi-pass repair and selection. The result shifts comparison toward harness design, though HN commenters note it is not a one-shot head-to-head with hosted frontier models.

RELEASE 2mo ago
Codex launches plugins for Slack, Figma, Gmail, and Google Drive

OpenAI rolled out Codex plugins across the app, CLI, and IDE extensions, with app auth, reusable skills, and optional MCP servers. Teams should test plugin-backed workflows and permission models before broad rollout.

RELEASE 2mo ago
Cline launches Kanban with worktree-linked parallel CLI agents

Cline launched Kanban, a local multi-agent board that runs Claude, Codex, and Cline CLI tasks in isolated worktrees with dependency chains and diffs. Teams can use it as a visual control layer for parallel coding agents on repo chores that split cleanly.

RELEASE 2mo ago
Imbue launches Latchkey: local agents call HTTP APIs without exposing tokens

Imbue released Latchkey, a library that is prepended to ordinary curl calls so local agents can use SaaS and internal APIs while credentials stay on the developer machine. Try it where agents need many HTTP integrations but should not see raw secrets.

RELEASE 2mo ago
OpenCode adds remote sandboxes and syncs agent state across devices

OpenCode is adding remote sandboxes, synced state across laptop, server, and cloud, and more product surface inside its plugin system. That makes long-running off-laptop workflows more practical, but operators should still review telemetry, sandbox, and exposure defaults.

RELEASE 2mo ago
Claude Code releases 2.1.84: PowerShell preview, task hooks, idle-return clearing

Claude Code 2.1.84 adds an opt-in PowerShell tool, new task and worktree hooks, safer MCP limits, and better startup and prompt-cache behavior. Anthropic also documented auto mode’s action classifier and added iMessage as a channel, so teams should review permissions and remote-control workflows.

RELEASE 2mo ago
OpenClaw releases 2026.3.24 with Teams, OpenWebUI, and skill-control UI

OpenClaw 2026.3.24 adds native Microsoft Teams, OpenWebUI sub-agent access, Slack reply buttons, and a control surface for skills and tools. The release expands where the runtime can plug into enterprise workflows, while also increasing the surface area teams need to secure.

RELEASE 2mo ago
Expect launches CLI to QA apps in a real browser and record bug videos

Expect wraps browser QA for Claude Code, Codex, or Cursor into a CLI that records bug videos and feeds failures back into a fix loop. It gives coding agents a tighter UI validation cycle without requiring a custom browser harness.

RELEASE 2mo ago
OpenClaw ships 2026.3.22 with ClawHub marketplace and OpenShell SSH sandboxes

OpenClaw shipped version 2026.3.22 with ClawHub, OpenShell plus SSH sandboxes, side-question flows, and more search and model options, then followed with a 2026.3.23 patch. Teams get a broader plugin surface, but should patch quickly and review plugin trust boundaries as the ecosystem grows.

NEWS 2mo ago
OpenHands benchmarks EvoClaw and caps continuous-evolution scores at 38.03%

OpenHands introduced EvoClaw, a benchmark that reconstructs milestone DAGs from repo history to test continuous software evolution instead of isolated tasks. The first results show agents can clear single tasks yet still collapse under regressions and technical debt over longer runs.

NEWS 2mo ago
Vals AI updates SWE-Bench Verified harness to mini-swe-agent and score slips to 78.8%

Vals AI switched SWE-Bench Verified from SWE-Agent to the bash-only mini-swe-agent harness, aligning results more closely with the official benchmark setup. Top score dipped slightly to 78.8%, but the change reduces harness-specific confounds when comparing models.

AI Primer

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.