Imbue published a walkthrough for mngr showing how it turns tutorial scripts into pytest cases, runs many agents in parallel, and merges fixes back into one branch. The case study offers a repeatable pattern for evaluating agent tools, so teams can borrow the tmux capture, artifact dashboards, and local-to-Modal handoff.

mngr turns its own tutorial script into end-to-end tests, then sends one agent per test to debug or improve them. According to the sync-tutorial-to-e2e-tests post and the linked skill file, tutorial blocks are matched to pytest functions with an explicit citation API plus a matcher script that checks coverage. A custom connect_command redirects agent sessions into recordings, so CLI transcripts and TUI playback land in the same test output flow. Scaling out means changing mngr create foo to mngr create foo@.modal, matching the broader mngr launch post and GitHub repo pitch that the same primitives work locally or remotely.

You can read the full case study, skim the original mngr launch post, and inspect the open source repository. The nicest implementation detail is the linked skill definition, which shows the tutorial matcher and test-generation workflow in public. The other standout is Imbue's artifact page pattern, where tmux sessions get routed through asciinema so the test run preserves both terminal output and live TUI playback.
Imbue's setup starts with tutorial.sh, split into blocks of consecutive non-empty lines. As the workflow outline puts it, each block gets one or more pytest functions, and then one agent is assigned to run, debug, and improve each test.
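The splitting rule is simple enough to sketch. This is a minimal reading of "blocks of consecutive non-empty lines"; the function name and exact parsing are my assumptions, not Imbue's code.

```python
def split_into_blocks(script_text: str) -> list[list[str]]:
    """Split a tutorial script into blocks of consecutive non-empty lines.

    Blank lines act as block separators, matching the description
    in Imbue's workflow outline (an assumption about the exact rule).
    """
    blocks: list[list[str]] = []
    current: list[str] = []
    for line in script_text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


tutorial = """# Managing snapshots
mngr create foo

mngr list
"""
print(split_into_blocks(tutorial))
# → [['# Managing snapshots', 'mngr create foo'], ['mngr list']]
```

Each element of the returned list is one candidate unit for the 1:N test mapping described next.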
The interesting part is the 1:N mapping. In the case study, Imbue says tutorial commands are often too terse for serious end-to-end coverage, so agents expand one example into multiple tests for happy and unhappy paths while still citing the original block through the test fixture.
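One way to picture that citation API: each test declares which tutorial block it covers, and a matcher script diffs declared citations against the block list to find gaps. The names below (cites, CITATIONS, uncovered) are hypothetical; Imbue's actual skill file defines its own interface.

```python
# Hypothetical sketch of a block-citation API for pytest tests.
CITATIONS: dict[str, int] = {}  # test name -> tutorial block index


def cites(block_index: int):
    """Mark a pytest function as covering one tutorial block."""
    def decorator(fn):
        CITATIONS[fn.__name__] = block_index
        return fn
    return decorator


# One terse tutorial example expands into multiple tests,
# all citing the same source block.
@cites(0)
def test_create_happy_path():
    assert True  # would run `mngr create foo` and check output


@cites(0)
def test_create_rejects_duplicate_name():
    assert True  # unhappy path grown from the same example


def uncovered(blocks: list) -> list[int]:
    """The matcher step: which tutorial blocks have no citing test?"""
    covered = set(CITATIONS.values())
    return [i for i in range(len(blocks)) if i not in covered]
```

The matcher then fails CI (or flags an agent) for any index that uncovered returns, which is how the 1:N mapping stays auditable.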
The tutorial-authoring step fills in another piece: Imbue seeds the script with section comments like # Managing snapshots, lets a coding agent draft examples under each one, and keeps the ones that survive review. Bad examples are still useful because they expose confusing CLI affordances.
mngr runs agents inside tmux, which breaks the usual trick of treating everything as a plain subprocess transcript. Imbue's thread says the fix is a custom connect_command in test config that redirects attachment into a helper script.
The case study says that helper uses asciinema to record the attach session into the test output directory. That gives the harness two parallel artifacts: a plain-text transcript of the CLI session, and an asciinema recording that replays the live TUI.
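A minimal sketch of such a helper, using asciinema's real rec -c interface to wrap a tmux attach. The function names, output layout, and session-naming convention are assumptions for illustration, not Imbue's actual script.

```python
import pathlib
import subprocess


def build_record_command(session: str, cast_path) -> list[str]:
    # asciinema's recording interface: `asciinema rec <file> -c "<command>"`.
    # Here the recorded command is the tmux attach itself.
    return ["asciinema", "rec", str(cast_path),
            "-c", f"tmux attach -t {session}"]


def record_attach(session: str, output_dir: str) -> int:
    """Redirect an agent-session attach into a recording in the
    test output directory (hypothetical connect_command target)."""
    out = pathlib.Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    cast = out / f"{session}.cast"
    return subprocess.call(build_record_command(session, cast))
```

The resulting .cast file carries both the raw terminal text and the timing data that asciinema's player needs, which is what lets one artifact serve as transcript and TUI replay at once.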
Imbue then surfaces both in a web page. For anyone trying to evaluate agent tools, that is the part worth stealing.
After pytest --collect-only enumerates the suite, the orchestration loop uses mngr create, mngr list, mngr pull, and mngr stop as its core primitives, according to the case study. The merge step is where the workflow stops looking like a demo and starts looking production-minded.
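The loop's shape can be sketched from those four primitives. The pytest flag and the mngr subcommand names come from the case study; the agent-naming scheme and polling structure are my assumptions.

```python
import subprocess


def agent_name(index: int, target: str = "") -> str:
    # target="" runs locally; target="@.modal" routes the same
    # create call to Modal (the one-string change noted below)
    return f"fix-{index}{target}"


def collect_test_ids() -> list[str]:
    """Enumerate the suite via pytest's collect-only output."""
    out = subprocess.run(
        ["pytest", "--collect-only", "-q"],
        capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if "::" in line]


def launch_agents(test_ids: list[str], target: str = "") -> list[tuple[str, str]]:
    """One agent per test; later steps would poll with `mngr list`,
    fetch changes with `mngr pull`, and tear down with `mngr stop`."""
    launched = []
    for i, test_id in enumerate(test_ids):
        name = agent_name(i, target)
        subprocess.run(["mngr", "create", name])
        launched.append((name, test_id))
    return launched
```

Nothing here is clever, which is the point: the orchestration layer is a thin loop over CLI primitives, so the interesting logic stays in the merge step.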
Imbue says each testing agent splits its work into two buckets: changes to the tests and harness themselves, and fixes to the mngr implementation.
The integrator merges the non-implementation bucket together, then ranks implementation fixes by importance and preserves them as distinct commits on one linear branch. The same post adds one more useful detail: agents that get stuck emit a blocked outcome into the final report instead of forcing a bad merge.
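That ordering policy is easy to state as code. The dict shape and field names below are invented for illustration; only the policy itself (merge non-implementation work first, rank implementation fixes, keep blocked agents out of the merge) comes from the post.

```python
def order_for_merge(results: list[dict]) -> tuple[list[dict], list[dict]]:
    """results: dicts with keys bucket ('impl' or 'tests'),
    importance (higher = more important), outcome ('done' or 'blocked').

    Returns (merge_order, blocked): the linear sequence of changes to
    land as commits, and the agents that only appear in the report.
    """
    done = [r for r in results if r["outcome"] != "blocked"]
    blocked = [r for r in results if r["outcome"] == "blocked"]
    non_impl = [r for r in done if r["bucket"] != "impl"]
    impl = sorted(
        (r for r in done if r["bucket"] == "impl"),
        key=lambda r: r["importance"],
        reverse=True,
    )
    # non-implementation changes merge together first; implementation
    # fixes follow as distinct commits in importance order
    return non_impl + impl, blocked
```

Keeping implementation fixes as separate, ranked commits means a human reviewer can drop the tail of the branch without untangling a squashed merge.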
The local-to-remote handoff is almost comically small. In Imbue's example, scaling from about 10 local agents to 100 remote ones means changing mngr create foo to mngr create foo@.modal.
The new detail from the case study is what had to change under the hood to make that believable. Imbue says early local development used Git worktrees because merges are easy when every agent already has a branch in the original repo, but remote agents cannot use worktrees and instead rely on Git mirror mode. So when they wanted parity between local and Modal runs, they reconfigured local agents to use mirror mode too, then pulled changes back explicitly before merging. That is a small systems detail, but it is the kind that usually decides whether a 100-agent demo survives contact with reality.
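The two checkout strategies can be contrasted with plain Git commands. The git invocations are standard; how mngr actually wires them up is an assumption.

```python
import subprocess


def worktree_agent(repo: str, branch: str, path: str) -> None:
    # local-only strategy: the agent's branch lives in the original
    # repo, so the integrator can merge it without any fetch step
    subprocess.run(
        ["git", "-C", repo, "worktree", "add", "-b", branch, path],
        check=True,
    )


def mirror_agent(repo: str, mirror_path: str,
                 clone_path: str, branch: str) -> None:
    # remote-capable strategy: the agent clones from a mirror, so its
    # commits must be pulled back explicitly before the integrator
    # can merge them (the parity change Imbue describes)
    subprocess.run(["git", "clone", "--mirror", repo, mirror_path], check=True)
    subprocess.run(["git", "clone", mirror_path, clone_path], check=True)
    subprocess.run(["git", "-C", clone_path, "checkout", "-b", branch], check=True)
```

The worktree path is strictly more convenient and strictly less portable, which is why standardizing on the mirror flow everywhere was the pragmatic call.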