TOPIC · 50 stories

Agent design patterns

How-to / design-pattern / best-practice stories about building or operating coding agents (delegation depth, harness design, control surfaces).

Stories


WORKFLOW · 1w ago

Engineers use markdown wikis as lightweight memory for Codex, Claude Code, and Hermes

Practitioners described GitHub or folder-based markdown knowledge bases that feed persistent company or personal context to Codex, Claude Code, and Hermes. OpenWiki added codebase and personal brain modes for the same pattern.

WORKFLOW · 1w ago

ClaudeDevs reports Sonnet 5 with a Fable 5 advisor reaches ~92% of Fable 5's SWE-bench Pro score at ~63% of the price

ClaudeDevs reports Sonnet 5 with a Fable 5 advisor reached ~92% of Fable 5's SWE-bench Pro score at ~63% of the price. Other builders route implementation to Sonnet, Codex, GPT-5.5, or GLM workers.

NEWS · 2w ago

Fable 5 users report Opus 4.8 fallbacks and $600 Max quota rotations

Fable 5 users reported Opus 4.8 fallbacks, $600 Max-account rotations, slow browser automation, and token-saving subagents. Watch routing opacity, quota burn, and latency before relying on Fable 5 for long-running agent work.

RELEASE · 2w ago

ElevenAgents introduces Procedures with SOP imports from docs, PDFs, and TXT

ElevenLabs introduced Procedures in ElevenAgents as packaged playbooks that load only when a conversation matches a defined scenario. Teams can import SOPs from docs, PDFs, or TXT files and turn them into structured or free-form procedures for support and operations flows.

NEWS · 2w ago

Apify integrates x402 with more than 20,000 Actors for USDC-paid runs

Apify added more than 20,000 Actors to the x402 flow, letting agents pay in USDC and run tools on demand through HTTP 402 responses. That gives agents a way to buy web automation tasks without pre-provisioned API keys or a manual checkout step, so builders can test paid tool use directly.

WORKFLOW · 2w ago

Codex users report /goal, /rewind, and /compact workflows after launch

A day after /goal and thread automations landed in Codex, practitioners started standardizing on /goal specs, /fork or /side detours, and /rewind plus /compact recovery. The pattern matters because verifier design and compaction timing now control how well long runs hold together.

RELEASE · 2w ago

Microsoft open-sources SkillOpt with batch eval loops for agent SOP files

Microsoft open-sourced SkillOpt, a system that treats agent skill documents as tunable artifacts and improves them against measured task batches. It matters because practitioners are already standardizing shared /research, QA, and packageable skills across harnesses, turning skill files into a new optimization surface alongside models.

RELEASE · 2w ago

Plannotator v0.21.3 adds file-scoped review comments and Codex app-server support

Plannotator v0.21.3 shipped file-scoped comments, a unified review UX, default per-file Ask AI chats, and a more reliable Codex app-server path. It matters because guided reviews and plan checks can now plug into agent workflows with less custom glue.

WORKFLOW · 3w ago

Codex supports thread automations with /goal, /btw, and heartbeat wake-ups

Codex users documented thread automations as recurring wake-up calls that preserve thread context, alongside /goal and /btw patterns for steering long-running loops. The workflow matters because teams can schedule check-ins, queue instructions mid-run, and add adversarial review passes without building a separate orchestrator.

RELEASE · 3w ago

Junior adds memory and cuts one analytics task from 3 minutes to 1 minute

Junior’s first memory system cut one analytics task from about 3 minutes to 1 minute in early tests, with tokens down two-thirds and tool calls down 60%. The feature moves persistent task learning into the agent loop, though the results are still internal.

WORKFLOW · 3w ago

Human-on-the-Bridge compares reusable eval assets with LLM judges and human review

A new Human-on-the-Bridge paper argued for front-loading expert judgment into reusable evaluation assets, while practitioners also shared double-run and multi-model review setups. The cluster matters because teams tuning agent harnesses need repeatable ways to measure behavior beyond one-off benchmark scores or subjective PR review.

WORKFLOW · 4w ago

Developers publish loop libraries and control-loop guides for long-running agents

Builders released reusable loop artifacts this week, including a Loop Library Skill, repo templates, and published control-loop definitions for docs sweeps, onboarding checks, and error triage. It matters because teams are turning one-shot prompting into persistent agent runs with explicit stop conditions and shared repo state.
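The idea of a persistent run with an explicit stop condition can be sketched as a bounded loop. This is a minimal illustration, not any of the published loop libraries: `step` stands in for one agent turn and `is_done` for a checkable stop condition, both hypothetical callables, and the loop also halts when an iteration budget runs out.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    stopped_because: str          # "done" or "budget_exhausted"
    iterations: int
    history: List[str] = field(default_factory=list)


def run_control_loop(
    step: Callable[[int], str],       # hypothetical: one agent turn, returns its output
    is_done: Callable[[str], bool],   # hypothetical: explicit, checkable stop condition
    max_iterations: int = 10,         # hard budget so a run cannot loop forever
) -> LoopResult:
    """Run `step` until `is_done` accepts its output or the budget runs out."""
    history: List[str] = []
    for i in range(1, max_iterations + 1):
        output = step(i)
        history.append(output)
        if is_done(output):
            return LoopResult("done", i, history)
    return LoopResult("budget_exhausted", max_iterations, history)
```

Returning the stop reason keeps "finished" and "ran out of budget" distinguishable, which is the point of making stop conditions explicit rather than trusting the agent to decide when it is done.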

RELEASE · 4w ago

Sakana launches Marlin as a Virtual CSO with up to 8-hour autonomous research

Sakana launched Marlin, a Virtual CSO that runs for up to 8 hours, forms hypotheses, browses sources, and returns slide decks plus reports. It turns Sakana’s long-horizon reasoning work into a shipped deep-research product.

WORKFLOW · 4w ago

Codex supports agent-written `/goal` prompts for spawned threads

Codex users are having the agent write its own `/goal` and sub-agent goals, with OpenAI-side commentary describing that as a built-in meta-prompting pattern. The workflow turns long autonomous runs into a tighter control loop, but users still review goals first so a bad objective does not burn tokens for hours.

WORKFLOW · 1mo ago

Practitioners report Fable 5 planner workflows with Opus, Codex, and HTML logs

Practitioners use Fable 5 as a planner and long-run orchestrator while pushing implementation and heavy reasoning to Opus and Codex. Fable stays on supervision and planning, and teams track execution on larger tasks through live HTML status pages.

WORKFLOW · 1mo ago

Hyperbrowser, InsForge, and Higgsfield release Fable 5 harnesses and MCP workflows

Hyperbrowser shipped a Claude Code harness, InsForge showed a Fable run drop from 5.5M to 2.3M tokens, and Higgsfield published new MCP workflows. These tools add reusable harness, context, and interface layers around Fable for more controlled runs.

WORKFLOW · 1mo ago

/teach adds npx install and primary-source lessons

Matt Pocock's /teach skill installs with `npx skills add mattpocock/skills --skill teach` and runs structured strategy lessons inside a Claude agent. Follow-up posts add primary-source reading to the lessons and point to a larger dedicated repo.

WORKFLOW · 1mo ago

Anthropic updates Fable prompting with /model, high effort, and /goal loops

Anthropic published Fable-specific guidance for Claude Code and API, emphasizing the /model switch, higher default effort, simpler prompts, and /goal-style verification loops. The Claude Code team says older prompt scaffolds can work against Fable's longer sessions.

WORKFLOW · 1mo ago

Agent tooling adds .prose.md programs, PR panes, and exact-edit primitives

Builders shipped OpenProse workflow files, ghzinga PR tabs, cmux terminal controls, datasette-agent-edit primitives, and an agent-optimized CLI fork. These pieces turn prompt strings into reusable files, panes, and testable edit loops for coding agents.

WORKFLOW · 1mo ago

Claude Code users report auto mode, dynamic workflows, and critique loops, with one sweep fixing 144 bugs

Practitioners shared repeatable setups for multi-hour Claude runs using auto approvals, dynamic workflows, cloud sessions, and critique loops. One large-codebase sweep reported 144 bugs fixed in about four hours with fewer false positives under model critique.

WORKFLOW · 1mo ago

Codex /goal template adds 6 fields for verification commands and stop conditions

A community workflow broke long-running Codex goals into six required fields, then added an eight-item preflight checklist and helper tools. The structure is meant to reduce runs that drift, stop early, or claim completion without an objective verification step.
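The "objective verification step" can be illustrated with a gate that accepts a completion claim only when a verification command exits cleanly. A minimal sketch, assuming a hypothetical `GoalSpec` with three illustrative fields; the community template's six fields are not reproduced here, and this is not a Codex API.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class GoalSpec:
    # Hypothetical, illustrative fields only (not the community template).
    objective: str
    verify_cmd: List[str]   # command whose exit code decides completion
    max_turns: int          # stop condition enforced by the surrounding loop (not shown)


def accept_completion(spec: GoalSpec, agent_claims_done: bool) -> bool:
    """Accept 'done' only if the agent claims it AND the verification command passes."""
    if not agent_claims_done:
        return False
    proc = subprocess.run(spec.verify_cmd, capture_output=True, text=True)
    return proc.returncode == 0


if __name__ == "__main__":
    spec = GoalSpec(
        objective="make the test suite pass",
        verify_cmd=[sys.executable, "-c", "import sys; sys.exit(0)"],
        max_turns=20,
    )
    print(accept_completion(spec, agent_claims_done=True))  # True: command exits 0
```

In a real run, `verify_cmd` would be the project's own test or build command; the gate simply refuses to trust a completion claim that the command does not back up.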

WORKFLOW · 1mo ago

Codex community ships /dynamic swarms, session lifecycles, and model routing

Builders added /dynamic orchestration, custom-model routing, and repo runbooks around Codex while users surfaced new session lifecycle controls in the app. Together these make Codex a better fit for long-running, multi-context coding work.

WORKFLOW · 1mo ago

Pi ecosystem adds /goal tasks, acceptance gates, and Lovely Dev Tools

Three independent Pi builders shipped a goal runner, contract-style subagent acceptance gates, and a new Lovely Dev Tools extension in the same window. That gives Pi users more deterministic long-running loops and cleaner local tool interfaces without starting from an empty harness.

WORKFLOW · 1mo ago

Conductor, CC Mirror, and Codex add Claude-style Dynamic Workflows

A day after Claude Code introduced Dynamic Workflows, builders shipped ports and clones for Codex, Conductor, and GLM-backed CC Mirror. The rapid ports turn the feature into a reusable orchestration pattern rather than an Anthropic-only runtime.

RELEASE · 1mo ago

Firecrawl launches /monitor webhooks with up to 90% lower token use

Firecrawl launched /monitor, a URL watcher that pings agents only when tracked pages actually change and can deliver results by webhook. Use it for change-only ingestion to cut LLM token spend on monitored pages by up to 90%.
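Change-only ingestion generally comes down to comparing each fetch against the previous one and forwarding only differences. The sketch below keeps a SHA-256 content hash per URL; it is a generic illustration under that assumption, not Firecrawl's implementation or its /monitor API.

```python
import hashlib
from typing import Dict


class ChangeMonitor:
    """Remember a content hash per URL; report only when the content changes."""

    def __init__(self) -> None:
        self._last_hash: Dict[str, str] = {}

    def check(self, url: str, content: str) -> bool:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        # The first sighting of a URL counts as a change (nothing to compare against).
        changed = self._last_hash.get(url) != digest
        self._last_hash[url] = digest
        return changed
```

Only fetches where `check` returns True would be passed to the agent, so unchanged pages cost no LLM tokens. Real monitors usually normalize content first (stripping timestamps or ads) so cosmetic churn does not register as a change.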

WORKFLOW · 1mo ago

Developers compare red-green TDD, Hurl tests, and label-triggered agents for code verification

Practitioners published tests-first coding-agent workflows built around red-green TDD, Hurl suites, GitHub label actions, and Codex-based execution checks. The pattern matters because verification remains the main bottleneck once generation is fast, especially in longer multi-file sessions.

WORKFLOW · 1mo ago

Microsoft benchmarks SkillOpt at +24.8 Codex points by editing skills, not weights

Microsoft Research released SkillOpt, which optimizes external skill files instead of fine-tuning model weights and reports best-or-tied results across 52 evaluation cells. The method matters because it improved Codex and Claude Code accuracy without extra inference-time calls.

WORKFLOW · 1mo ago

Codex users share /goal audits, mobile delegation, and Raspberry Pi workflows

Practitioners published reusable Codex workflows for project audits, memory-driven skill packaging, mobile delegation, and remote computer use. Try the prompt-and-steps patterns if you want to adapt Codex across repos and devices.

WORKFLOW · 1mo ago

Agent Skills ecosystem ships handoff docs, htmx v4 packs, and Project Think support

Independent builders published reusable skills infrastructure across coding agents, including Project Think preview support, handoff docs, and an htmx v4 skill pack. That matters because skills are starting to work like portable workflow units instead of one-off prompt snippets inside a single tool.

WORKFLOW · 1mo ago

Codex users ship durable-memory workspaces and auto-triage flows

Independent Codex users published Obsidian memory setups, reusable skill prompts, auto-triage flows, and Cloudflare-backed runners for longer jobs. That matters because Codex is being wrapped into persistent workspaces and operator-defined subagents instead of one-shot chats.

WORKFLOW · 1mo ago

Agent Skills supports Codex, Cursor, Gemini CLI, and VS Code through new libraries and plugins

New guides, plugins, and reusable libraries show the Agent Skills format moving beyond Claude Code into multiple coding-agent clients and runtimes. That matters because workflows are becoming portable artifacts instead of one-off prompts tied to a single harness.

WORKFLOW · 1mo ago

Codex users report iPhone simulator bug-bashes, Appshots form fills, and locked-Mac runs

Two days after Codex added locked-Mac control and Appshots, users posted end-to-end iPhone simulator debugging, Safari form-filling, and remote-control workflows. That matters because the feature is moving from launch copy into concrete computer-use tasks that can replace manual QA and repetitive UI work.

WORKFLOW · 1mo ago

Lovable adds is_stuck pipeline with Overflow retrieval to cut stuck rate by 5%

Lovable described a production loop where an is_stuck classifier detects repeated failures, Overflow injects past solution pairs, and send_feedback escalates real tool failures. The system lowered the stuck rate by 5% and raised the publish rate by 2%, and the same signal can also help teams debug outages and agent frustration.
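Lovable's is_stuck step is described as a classifier; as a much simpler stand-in, a rule-based check for the same error repeating within recent attempts shows the kind of signal involved. The function name, window, and threshold below are hypothetical and not Lovable's.

```python
from collections import Counter
from typing import List


def looks_stuck(recent_errors: List[str], window: int = 5, threshold: int = 3) -> bool:
    """Flag a session as stuck when one error repeats `threshold` times in the last `window` attempts."""
    # Normalize lightly and ignore empty entries (attempts that produced no error).
    tail = [e.strip().lower() for e in recent_errors[-window:] if e]
    if not tail:
        return False
    _, count = Counter(tail).most_common(1)[0]
    return count >= threshold
```

A True result marks the point where a pipeline like the one described would retrieve past solution pairs for the agent or escalate the failure.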

WORKFLOW · 2mo ago

Kilo Code introduces Cloud Agent CVE and smoke-test workflows with webhook triggers

Kilo Code posted two cloud-agent automations: a webhook-driven CVE patch flow that opens PRs in parallel and a post-deploy smoke test that checks health, 2xx responses, and latency under 2 seconds. This matters because the examples show coding agents moving into CI-style remediation and production verification loops.
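The smoke-test criteria in the post (a health check that returns 2xx in under 2 seconds) translate into a short probe. A minimal sketch using only the Python standard library; the function names are hypothetical and this is not Kilo Code's implementation.

```python
import time
import urllib.error
import urllib.request
from typing import Tuple

LATENCY_BUDGET_S = 2.0  # "latency under 2 seconds" from the post


def passes(status: int, latency_s: float, budget_s: float = LATENCY_BUDGET_S) -> bool:
    """A check passes on any 2xx status returned within the latency budget."""
    return 200 <= status < 300 and latency_s < budget_s


def smoke_check(url: str, timeout_s: float = 5.0) -> Tuple[int, float, bool]:
    """Fetch `url` once; return (status, latency in seconds, passed)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code   # 4xx/5xx responses raise HTTPError but still carry a code
    except OSError:
        status = 0          # connection failure or timeout: treat as failing
    latency = time.monotonic() - start
    return status, latency, passes(status, latency)
```

For example, `smoke_check("https://example.com/health")` (a placeholder URL) returns the status, the measured latency, and a pass/fail flag that a post-deploy job could act on. Note that `urlopen` follows redirects, so the status is that of the final response.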

WORKFLOW · 2mo ago

Codex app adds /goal for long-running React Doctor and iOS runs

OpenAI staff said /goal is now available in the Codex app, and users posted long-running runs that fixed React Doctor scores, built iOS features, and queued weekend tasks. The update moves Codex from CLI-only planning to persistent, steerable work sessions.

WORKFLOW · 2mo ago

Perplexity opens agent skills manual with 'Zen of Skills' rules for folder-based workflows

Perplexity published its internal manual for building agent skills and paired it with a research post about how those skills power products like Computer. The guide matters because it gives external builders concrete patterns for decomposing agent behavior into reusable skill folders instead of one-off prompts.

RELEASE · 2mo ago

Cursor releases Team Kit with /verify-this, /loop-on-ci, and harness skills

Cursor's Team Kit packages internal skills like /verify-this, CLI and UI automation harnesses, PR cleanup, and /loop-on-ci, installable with /add-plugin cursor-team-kit. It turns several internal review and validation habits into reusable commands for agent-driven coding workflows.

WORKFLOW · 2mo ago

Codex users report `/goal` sessions with 70-minute Stripe fixes and a 4,000-prompt cap

Users posted long-running Codex `/goal` sessions with auto-continuations, `pause`/`resume`, and file-backed goals. Watch the 4,000-prompt startup cap and early-stop drift if you plan to run longer agent loops.

WORKFLOW · 2mo ago

Practitioners report harness playbooks with Playwright CLI, create_agent, and MCP

Builders shared concrete Symphony, create_agent, and MCP setup guides after arguing that model switching is easy but harness switching is not. The playbooks matter because they make harness engineering more repeatable, so teams can copy tested tooling and integration patterns.

WORKFLOW · 2mo ago

LangChain adds Browserbase search, fetch, and browser subagents to Deep Agents

LangChain shipped a Browserbase integration that gives Deep Agents dedicated search, fetch, and browser subagents with dashboard observability. That turns web navigation into a first-class tool path for agent workflows instead of a custom one-off browser loop.

WORKFLOW · 2mo ago

mattpocock/skills tops GitHub Trending at 28K stars with `/grill-me` and `/tdd` packs

mattpocock/skills hit the top of GitHub Trending as its reusable `SKILL.md` packs for grilling specs, writing PRDs, and enforcing TDD spread across coding-agent workflows. The format is starting to look like a distribution layer for agent behavior, with faster install tooling and third-party skills shipping around the same pattern.

WORKFLOW · 2mo ago

OpenRouter launches `create-headless-agent` for Bun-based multi-model CLIs

OpenRouter released a new skill and guide that scaffold a headless agent CLI on top of its Agent SDK. The template packages multi-model inference, tool calling, and Bun-based CLI setup into a reusable starting point.

WORKFLOW · 2mo ago

ClawSweeper closes 4,000 OpenClaw issues with 50 Codex agents in one day

Steipete’s maintainer bot ran 50 Codex agents in parallel and closed about 4,000 OpenClaw issues in a day. The cleanup ran into rate limits; use the README dashboard and Project Clowfish clustering to track agent sweeps at this scale.

WORKFLOW · 2mo ago

Kilo Code opens Roo migration with --install-extension and AGENTS.md conversion

Kilo Code published a Roo Code migration path ahead of Roo’s May 15 archive, including one-command install, automated file renames, custom-agent conversion, and API key re-auth. Use the guide to map Roo modes, rules, MCP config, and checkpoints into Kilo’s agent and worktree model before the cutoff.

RELEASE · 2mo ago

CopilotKit launches Open Generative UI with openGenerativeUI: true

CopilotKit open-sourced Open Generative UI, enabled with an `openGenerativeUI: true` flag, which lets agents stream interactive UI components directly into chat. The release packages a concrete alternative to raw-code UI generation into a reusable dev toolkit.

WORKFLOW · 3mo ago

Codex users report subagent, MCP, and canary deploy workflows

Practitioners shared repeatable Codex workflows for long-lived threads, background subagents, computer-use access through MCP, and canary rollouts. Codex is being used less as a one-shot assistant and more as a persistent automation harness.

RELEASE · 3mo ago

OpenAI Agents SDK adds sandbox execution and memory controls with Vercel, Modal, E2B and Daytona

OpenAI updated the Agents SDK with sandbox execution, memory controls and run snapshotting, and launch partners Vercel, Modal, E2B and Daytona shipped integrations. Long-running agents can now keep files, credentials and execution state in isolated runtimes instead of wiring harness, compute and storage layers together manually.

NEWS · 3mo ago

Meerkat reports harness-level cheating across 28+ submissions on nine agent benchmarks

Meerkat and Berkeley RDI audits said popular agent leaderboards were inflated by harness-level leakage and eval gaming, with one cleaned entry dropping from first to 14th. That makes published coding-agent rankings and benchmark comparisons less reliable, so treat leaderboard results with caution.

NEWS · 3mo ago

Vercel Sandbox benchmarks sub-500 ms node -v cold starts

Vercel said Sandbox is now the fastest microVM-based runtime, with fresh `node -v` cold starts largely under 500 ms after a month of tuning. The update also puts persistent sandboxes into beta and expands plans for a programmable firewall, so teams should re-check runtime and security settings.

NEWS · 3mo ago

ClawShop launches OpenClaw resources with SecretRef and PinchBench

Kilo Code’s ClawShop recap bundled a 30-minute KiloClaw setup workshop, SecretRef credential handling, searchable ClawBytes guides, and PinchBench for agentic performance. The event, OpenClaw 2026.4.10, and PetClaw together added new security, memory, budgeting, and desktop layers around the OpenClaw stack.