AI Primer

Imbue publishes mngr workflow for 100-agent self-testing with Modal scale-out

Imbue published a walkthrough for mngr showing how it turns tutorial scripts into pytest cases, runs many agents in parallel, and merges fixes back into one branch. The case study offers a repeatable pattern for evaluating agent tools, so teams can borrow the tmux capture, artifact dashboards, and local-to-Modal handoff.


TL;DR

You can read the full case study, skim the original mngr launch post, and inspect the open source repository. The nicest implementation detail is the linked skill definition, which shows the tutorial matcher and test-generation workflow in public. The other standout is Imbue's artifact page pattern, where tmux sessions get routed through asciinema so the test run preserves both terminal output and live TUI playback.

Tutorial blocks become pytest functions

Imbue's setup starts with tutorial.sh, split into blocks of consecutive non-empty lines. As the workflow outline puts it, each block gets one or more pytest functions, and then one agent is assigned to run, debug, and improve each test.
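The block-splitting rule is simple enough to sketch. The following is a minimal illustration, not Imbue's code: the function name split_blocks and the sample script text are assumptions made up for this example.

```python
def split_blocks(script_text: str) -> list[list[str]]:
    """Split a script into blocks of consecutive non-empty lines."""
    blocks: list[list[str]] = []
    current: list[str] = []
    for line in script_text.splitlines():
        if line.strip():
            # Non-empty line: it belongs to the block being built.
            current.append(line)
        elif current:
            # Blank line: close the block in progress, if any.
            blocks.append(current)
            current = []
    if current:
        # The script may end without a trailing blank line.
        blocks.append(current)
    return blocks


# Made-up sample input standing in for tutorial.sh.
example = "# Managing snapshots\nmngr create foo\n\nmngr list\n"
blocks = split_blocks(example)
```

Each returned block is then a unit that one or more pytest functions can cite.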

The interesting part is the 1:N mapping. In the case study, Imbue says tutorial commands are often too terse for serious end-to-end coverage, so agents expand one example into multiple tests for happy and unhappy paths while still citing the original block through the test fixture.
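One way to picture the 1:N mapping is a registry that records which tests cite which block. The case study does this through a test fixture. The sketch below swaps in a plain decorator so it runs without pytest, and every name in it (cites_block, TESTS_BY_BLOCK, uncovered_blocks, the test names) is hypothetical.

```python
from collections import defaultdict

# Hypothetical registry: tutorial block index -> names of tests citing it.
TESTS_BY_BLOCK: dict[int, list[str]] = defaultdict(list)


def cites_block(block_id: int):
    """Record that the decorated test was derived from the given block."""
    def decorator(func):
        TESTS_BY_BLOCK[block_id].append(func.__name__)
        return func
    return decorator


@cites_block(0)
def test_block0_happy_path():
    pass  # test body elided in this sketch


@cites_block(0)
def test_block0_unhappy_path():
    pass  # test body elided in this sketch


def uncovered_blocks(block_count: int) -> list[int]:
    """Return the indices of blocks that no test cites yet."""
    return [i for i in range(block_count) if i not in TESTS_BY_BLOCK]
```

With two tutorial blocks and the two tests above, uncovered_blocks(2) reports that block 1 still needs a test.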

The tutorial-authoring step fills in another piece: Imbue seeds the script with comments like # Managing snapshots, lets a coding agent draft examples, and keeps the ones that survive review. Bad examples are still useful because they expose confusing CLI affordances.

tmux capture and the artifact page

mngr runs agents inside tmux, which breaks the usual trick of treating everything as a plain subprocess transcript. Imbue's thread says the fix is a custom connect_command in the test config that redirects the attach step into a helper script.

The case study says that helper uses asciinema to record the attach session into the test output directory. That gives the harness two parallel artifacts:

  • a CLI transcript from the test wrapper
  • a TUI recording from the tmux attach path
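A rough sketch of what such a helper might run is below. It assumes the asciinema and tmux CLIs, using asciinema rec with --command and tmux attach-session with -t. The function name, session name, and .cast file naming are hypothetical, and the sketch does not show how mngr's connect_command invokes the helper.

```python
import shlex
from pathlib import Path


def build_record_command(session: str, output_dir: Path) -> list[str]:
    """Build an argv that records a tmux attach session with asciinema."""
    # asciinema takes the command to record as a single shell string.
    attach = shlex.join(["tmux", "attach-session", "-t", session])
    # Hypothetical naming scheme: one .cast file per tmux session.
    cast_path = output_dir / f"{session}.cast"
    return ["asciinema", "rec", "--command", attach, str(cast_path)]


# "agent-1" and "test-output" are placeholder values for this example.
command = build_record_command("agent-1", Path("test-output"))
```

The resulting list could be passed to subprocess.run by the helper script, so the recording lands next to the CLI transcript for the same test.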

Imbue then surfaces both in a web page. For anyone trying to evaluate agent tools, that is the part worth stealing.
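The page itself can be very plain. This sketch, which is an illustration and not Imbue's dashboard, renders one list item per test with links to both artifacts. The dictionary keys and file names are assumptions.

```python
from html import escape


def render_artifact_page(runs: list[dict]) -> str:
    """Render an HTML list linking each test's transcript and recording."""
    items = []
    for run in runs:
        items.append(
            '<li>{name}: <a href="{transcript}">CLI transcript</a> | '
            '<a href="{recording}">TUI recording</a></li>'.format(
                name=escape(run["name"]),
                transcript=escape(run["transcript"]),
                recording=escape(run["recording"]),
            )
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical artifact layout: one directory per test.
page = render_artifact_page([
    {
        "name": "test_block0_happy_path",
        "transcript": "test_block0_happy_path/transcript.txt",
        "recording": "test_block0_happy_path/session.cast",
    }
])
```

A real page would likely embed a player for the recording. This sketch only links to the file.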

The integrator keeps implementation fixes separate

After pytest --collect-only enumerates the suite, the orchestration loop uses mngr create, mngr list, mngr pull, and mngr stop as its core primitives, according to the case study. The merge step is where the workflow stops looking like a demo and starts looking production-minded.
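The enumeration step can be approximated by parsing pytest's quiet collection output, where each collected test appears as a node ID containing "::". This is a sketch under assumptions: the -q flag is added here for easier parsing, the sample output is made up, and the hand-off of each node ID to mngr create is not shown because the article does not give those arguments.

```python
def parse_collected(output: str) -> list[str]:
    """Extract test node IDs from `pytest --collect-only -q` output."""
    # Node IDs look like path/to/test_file.py::test_name. The trailing
    # summary line ("N tests collected ...") has no "::" and is skipped.
    return [line.strip() for line in output.splitlines() if "::" in line]


# Made-up sample of what the quiet collection output can look like.
sample_output = """\
tests/test_tutorial.py::test_block0_happy_path
tests/test_tutorial.py::test_block0_unhappy_path

2 tests collected in 0.01s
"""
node_ids = parse_collected(sample_output)
```

Each node ID then becomes the work item for one agent.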

Imbue says each testing agent splits its work into two buckets:

  1. implementation fixes
  2. non-implementation changes, like test improvements or docs fixes

The integrator merges the non-implementation bucket together, then ranks implementation fixes by importance and preserves them as distinct commits on one linear branch. The same post adds one more useful detail: agents that get stuck emit a blocked outcome into the final report instead of forcing a bad merge.
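The integrator's decision logic can be sketched as a small planning function. Everything here is an assumption for illustration: the AgentResult fields, the numeric importance score, and the "done" and "blocked" labels are not taken from Imbue's code.

```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    agent: str
    outcome: str                    # "done" or "blocked" (assumed labels)
    importance: int = 0             # hypothetical ranking score
    has_impl_fix: bool = False
    has_other_changes: bool = False


def plan_integration(results: list[AgentResult]) -> dict[str, list[str]]:
    """Decide what to merge together, what to keep distinct, what to report."""
    done = [r for r in results if r.outcome != "blocked"]
    # Non-implementation changes (tests, docs) are merged together.
    merged_together = [r.agent for r in done if r.has_other_changes]
    # Implementation fixes stay as distinct commits, most important first.
    ranked = sorted(
        (r for r in done if r.has_impl_fix),
        key=lambda r: r.importance,
        reverse=True,
    )
    # Blocked agents are reported, never merged.
    blocked = [r.agent for r in results if r.outcome == "blocked"]
    return {
        "merged_together": merged_together,
        "distinct_commits": [r.agent for r in ranked],
        "blocked": blocked,
    }


plan = plan_integration([
    AgentResult("a1", "done", importance=2, has_impl_fix=True, has_other_changes=True),
    AgentResult("a2", "blocked", importance=9, has_impl_fix=True),
    AgentResult("a3", "done", importance=5, has_impl_fix=True),
    AgentResult("a4", "done", has_other_changes=True),
])
```

Keeping blocked agents out of both merge lists is what lets the final report stay honest about what was not fixed.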

The local-to-remote handoff is almost comically small. In Imbue's example, scaling from about 10 local agents to 100 remote ones means changing mngr create foo to mngr create foo@.modal.
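In a harness, that difference could be isolated in one small function. The sketch below uses only the two command forms quoted above. The function name and the remote flag are hypothetical.

```python
def create_command(name: str, remote: bool = False) -> list[str]:
    """Build the mngr create argv for a local or Modal-hosted agent."""
    # Per Imbue's example, the remote form only appends "@.modal".
    target = f"{name}@.modal" if remote else name
    return ["mngr", "create", target]


local_cmd = create_command("foo")
remote_cmd = create_command("foo", remote=True)
```

The rest of the orchestration loop can then stay identical between local and remote runs.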

The new detail from the case study is what had to change under the hood to make that believable. Imbue says early local development used Git worktrees because merges are easy when every agent already has a branch in the original repo, but remote agents cannot use worktrees and instead rely on Git mirror mode. So when they wanted parity between local and Modal runs, they reconfigured local agents to use mirror mode too, then pulled changes back explicitly before merging. That is a small systems detail, but it is the kind that usually decides whether a 100-agent demo survives contact with reality.
