workflow | March 21, 2026

Agent Flywheel introduces beads-and-swarms workflow for 1,000 commits a day

Agent Flywheel lays out a planning-first workflow built on beads, agent mail, swarms, and TUI inspection for very large coding runs. It is useful because the guide exposes coordination primitives and review loops, not just benchmark screenshots.

TL;DR

Agent Flywheel's complete guide packages a planning-first coding workflow around "beads," agent swarms, and iterative review, with the core claim that integrated tooling and prompts can support "1,000 high-quality commits a day."
A follow-up thread shows the loop in use: Claude Code studies a polished Go TUI project, extracts design patterns, and folds them back into a reusable building-glamorous-tuis skill.
The most concrete implementation detail comes from the swarm plan, where a TUI upgrade is split into 10 beads and executed by "5 Claude Code + 5 Codex agents" with file reservations, dependency layers, and regression checks.
Early demos in the ntm HUD post frame the end state as agent-operated orchestration, not just a nicer human dashboard: the system pulls in Agent Mail, Beads, and tmux context data so the coordinating agent can monitor and steer the swarm.

What the flywheel actually adds

The new piece in Agent Flywheel's guide is not a single model or benchmark. It is a coordination stack: heavy up-front planning, converting plans into self-contained "beads," polishing those artifacts, then dispatching swarms across tools like Claude, Codex, and Gemini via Agent Mail. The linked writeup (Flywheel guide) describes a self-reinforcing loop where each iteration improves the planning artifacts before more code is generated.

That makes the story more operational than most "agentic coding" posts. Instead of jumping straight to codegen, the method treats planning docs, dependency graphs, and task packaging as first-class assets. The useful engineering idea is that the workflow tries to scale by improving the inputs to agents, not only the agents themselves.
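To make "improving the inputs to agents" concrete, here is a minimal sketch of what linting a set of beads before dispatch might look like. The `Bead` fields and the `lint_beads` helper are hypothetical names chosen for illustration; they are not the schema or tooling the guide uses.

```python
from dataclasses import dataclass, field


@dataclass
class Bead:
    """Hypothetical shape of one self-contained work item (not the real Beads schema)."""
    bead_id: str
    title: str
    description: str
    acceptance: list[str] = field(default_factory=list)   # checks that define "done"
    depends_on: list[str] = field(default_factory=list)   # IDs of prerequisite beads


def lint_beads(beads: list[Bead]) -> list[str]:
    """Return readable problems that would make a bead unsafe to hand to an agent."""
    known = {b.bead_id for b in beads}
    problems: list[str] = []
    for b in beads:
        if not b.description.strip():
            problems.append(f"{b.bead_id}: empty description")
        if not b.acceptance:
            problems.append(f"{b.bead_id}: no acceptance criteria")
        for dep in b.depends_on:
            if dep not in known:
                problems.append(f"{b.bead_id}: unknown dependency {dep}")
    return problems


# Illustrative data only; the bead contents are invented for this example.
example = [
    Bead("b1", "Progress bars", "Add progress bars to the dashboard.", ["go build passes"]),
    Bead("b2", "Regression pass", "", [], ["b1", "b9"]),
]
for problem in lint_beads(example):
    print(problem)
```

The point of a check like this is that a bead with a missing description or a dangling dependency gets fixed in the planning artifact, before any agent spends tokens on it.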

How the skill-improvement loop works

The follow-up thread turns that abstract flywheel into a reproducible pattern. In the examples thread, doodlestein has Claude Code first read AGENTS.md and README.md, then "fully understand the code" of an existing TUI-heavy project, and finally update a reusable skill so those patterns are "fully embodied" in building-glamorous-tuis.

The key claim is that models can generalize from a "golden exemplar" if the exemplar is concrete enough. The same thread says you can search prior coding-agent sessions with a cass tool, while the skills catalog at Skills.md is presented as the place where those refined workflows get stored and reused. This is less a new SDK than a method for turning successful project-specific work into portable agent instructions.

What the swarm runtime looks like in practice

The most detailed runtime evidence comes from the swarm plan. The screenshots lay out a 10-bead execution plan for an ntm TUI upgrade, including progress bars, a bubbles/table pane list, a Huh-based spawn wizard, scroll indicators, sparklines, spring transitions, animated gradients, six new TUI Inspector profiles, and a final regression pass with go build and go test -short.
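The dependency layers in a plan like this can be derived mechanically. The sketch below groups tasks into layers where each layer depends only on earlier ones, using a simple variant of Kahn's topological sort. The function name and the dependency edges in the example are assumptions made for illustration; the screenshots do not spell out which beads depend on which.

```python
def dependency_layers(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group task IDs into layers; every task depends only on tasks in earlier layers."""
    remaining = {task: set(d) for task, d in deps.items()}
    unknown = {d for ds in remaining.values() for d in ds} - set(remaining)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    layers: list[list[str]] = []
    while remaining:
        # Tasks with no unmet dependencies can run in parallel in this layer.
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle detected")
        layers.append(ready)
        for t in ready:
            del remaining[t]
        for d in remaining.values():
            d.difference_update(ready)
    return layers


# Hypothetical edges: bead names echo the plan, the dependencies are invented.
plan = {
    "progress-bars": set(),
    "pane-table": set(),
    "sparklines": set(),
    "scroll-indicators": {"pane-table"},
    "regression-pass": {"progress-bars", "pane-table", "sparklines", "scroll-indicators"},
}
print(dependency_layers(plan))
```

Under these assumed edges, the independent UI beads land in the first layer and the regression pass lands last, which matches the shape of a plan that ends with go build and go test -short.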

A second screenshot, covering file reservations, shows how the swarm is coordinated: each agent gets exclusive-write files, shared files are tagged for merge safety, and the spawn command launches "5 CC + 5 COD" with staggered starts. The thread explicitly calls this "in-context recursive self-improvement," because the improved TUI skill is then used to upgrade ntm, and ntm itself helps manage the swarm doing the work.
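As a rough illustration of the exclusive-write idea, the sketch below checks a reservation table for files claimed by more than one agent and computes staggered start offsets. It is not how ntm or Agent Mail implement reservations; the function names, agent labels, file paths, and the 30-second gap are all assumptions.

```python
from collections import defaultdict


def find_write_conflicts(
    reservations: dict[str, set[str]], shared: set[str]
) -> dict[str, list[str]]:
    """Map each file claimed by several agents (and not tagged shared) to its claimants."""
    owners: dict[str, list[str]] = defaultdict(list)
    for agent, files in reservations.items():
        for path in files:
            owners[path].append(agent)
    return {
        path: sorted(agents)
        for path, agents in owners.items()
        if len(agents) > 1 and path not in shared
    }


def staggered_starts(agents: list[str], gap_seconds: float) -> dict[str, float]:
    """Give each agent a start offset so they do not all launch at the same moment."""
    return {agent: i * gap_seconds for i, agent in enumerate(agents)}


# Hypothetical swarm: 5 Claude Code ("cc") and 5 Codex ("cod") agents.
agents = [f"cc-{i}" for i in range(1, 6)] + [f"cod-{i}" for i in range(1, 6)]

# Hypothetical reservations; the paths are invented for this example.
reservations = {
    "cc-1": {"tui/progress.go", "tui/theme.go"},
    "cc-2": {"tui/table.go", "tui/theme.go"},
    "cod-1": {"tui/wizard.go", "tui/progress.go"},
}
shared = {"tui/theme.go"}  # tagged as shared, so overlapping claims are allowed

print(find_write_conflicts(reservations, shared))
print(staggered_starts(agents, gap_seconds=30.0)["cod-5"])
```

Running the conflict check before launch turns a likely merge collision into a planning-time error, which is the same trade the reservation screenshot appears to make.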

The demo in the ntm HUD post adds the UI layer. Pressing F12 opens an ntm HUD that aggregates Agent Mail, Beads, and tmux-derived context data, while the accompanying video (ntm HUD demo) shows a monitorable control surface rather than raw tmux panes. The author says "99%+" of current beads_viewer usage is now indirect, with agents using it on behalf of a human, which is the clearest statement of the project's scaling thesis: the dashboard is becoming agent-facing infrastructure, not just operator convenience.
