Shopify has open-sourced the /autoresearch plugin after an autoresearch loop produced a 53% faster parse-and-render path and 61% fewer allocations in Liquid. Try it if you want agent-driven optimization backed by tests and measurable performance targets.

Shopify says it has open-sourced the /autoresearch plugin for pi, with the launch post framing it simply as “tell it what you want, it will do the rest.” The public release landed alongside a concrete case study: the Liquid thread says the loop was run against Shopify’s 20-year-old Liquid template engine and produced a 53% faster parse-and-render path plus 61% fewer allocations.
The technical pattern was iterative search, not architectural surgery. As the breakdown describes it, the agent "proposes one small change," benchmarks it, keeps it if it improves the metric, and reverts it if it does not. The accepted changes were small but cumulative: scanning for }} directly instead of invoking regex repeatedly, freezing and reusing string objects in comparisons, detecting single-condition if statements up front, splitting product.title once at parse time instead of on every render, skipping per-iteration loop-limit checks when no limit exists, and parsing simple filter names without the full lexer.
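To make the first accepted change concrete, here is a minimal Python analogue (Liquid itself is Ruby, and these function names are invented for illustration): locating the closing `}}` of an output tag with a plain `str.find` instead of a compiled regex gives the same answer while avoiding regex-engine overhead on a hot path.

```python
import re

OUTPUT_END = re.compile(r"\}\}")

def end_of_output_regex(template: str, start: int) -> int:
    # Regex-based scan: flexible, but pays per-call regex-engine overhead.
    m = OUTPUT_END.search(template, start)
    return m.start() if m else -1

def end_of_output_find(template: str, start: int) -> int:
    # Direct scan for the literal delimiter: same result, cheaper hot path.
    return template.find("}}", start)
```

Both return the index of the closing `}}` for a `{{ ... }}` output tag, or -1 when none exists; the point of the pattern is that the fast version is behaviorally identical, so the test suite can confirm the swap is safe.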
The bigger engineering takeaway is that the workflow was only practical because Liquid already had a strong validation harness. In his notes, Simon Willison highlights "974 unit tests" as the unlock for safely letting an agent try many small performance edits, and he argues that a benchmarking script makes "make it faster" an actionable objective instead of a vague prompt. The notes screenshot attached to his post makes the same point: autoresearch works when the agent can repeatedly test, measure, and discard bad ideas.
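A minimal sketch of that harness pattern, with all names hypothetical (the real loop lives inside the /autoresearch plugin, not in this code): a proposed edit is kept only if the full test suite still passes and the benchmark improves by more than a noise threshold.

```python
import time

def benchmark(render, iterations: int = 1000) -> float:
    # Crude wall-clock timing of the workload; real harnesses would
    # take the median of several runs to reduce noise.
    start = time.perf_counter()
    for _ in range(iterations):
        render()
    return time.perf_counter() - start

def accept_change(tests_pass: bool, old_time: float, new_time: float,
                  min_gain: float = 0.01) -> bool:
    # Keep a proposed edit only if correctness holds AND the metric
    # improves by more than min_gain (1% here, an arbitrary threshold);
    # otherwise the agent reverts and tries the next idea.
    if not tests_pass:
        return False
    return new_time < old_time * (1.0 - min_gain)
```

The gate is what turns "make it faster" into a decidable question: every candidate edit either clears both checks and accumulates, or is reverted with no harm done.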
That also explains why this is more interesting than a one-off CEO coding anecdote. The thread says the same plugin can target other measurable objectives such as test speed, bundle size, build times, and Lighthouse scores, which makes it a reusable optimization loop for mature codebases with tests and benchmarks already in place. Even the reaction thread around the PR stayed focused on that pattern: one summary called out a 20-year-old production engine improving by more than 50%, while Willison’s writeup treats the result as evidence that coding agents are now effective at systematic benchmark-driven cleanup in legacy systems.
And the most important part: we open sourced the /autoresearch plugin for pi. Just tell it what you want, it will do the rest. github.com/davebcn87/pi-a…
The meta-insight here is devastating: Liquid has been battle-hardened by thousands of engineers over 20 years. An AI ran a loop for a couple days and found 6+ architectural inefficiencies none of them caught. Not because it's smarter. Because it never got used to "good …
Published some notes on @tobi's autoresearch PR that improved the performance benchmark scores of the Liquid template language (which Tobi created for Shopify 20 years ago) by a hefty 53% simonwillison.net/2026/Mar/13/li…