AI Primer

Codex runs browser signups inside Crabbox e2e tests

Steipete showed Codex running inside Crabbox, opening accounts and completing live web signups while building and end-to-end testing the same project. The setup extends coding agents into real browser actions, but it can also trigger payment and verification messages during automated runs.

TL;DR

  • steipete's main Crabbox post showed Codex running inside Crabbox, the remote execution layer for maintainers and AI agents, while building and end-to-end testing the same project.
  • According to steipete's PayPal verification post, the setup can cross from code into live operations fast enough to trigger real account signups and verification texts during automated runs.
  • In replies, steipete's tool list said the signup flow used browser use and iMessage, while his prompt example reduced the instruction to "do whatever you need to do to e2e test this."
  • steipete's appshots post added a smaller workflow reveal: Codex Live can take app screenshots directly, which replaced his habit of dragging screenshots in by hand.

Crabbox's README pitches a remote testing control plane for maintainers and AI agents, and the providers section says it can lease cloud capacity, point at SSH hosts, or use agent sandbox providers. steipete's orchestration thread fills in the other half of the picture: he is already using an orchestrator loop that wakes up every five minutes, steers work into threads, and lets some repo work land autonomously.

Crabbox

Crabbox is not just a sandbox name-check in a tweet. The official README describes it as a remote software testing and execution control plane that can sync a dirty checkout, run commands remotely, stream output, and collect evidence.
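
As a rough illustration of the "run commands, stream output, and collect evidence" part of that description, here is a minimal Python sketch. It is not Crabbox's API or CLI: it runs a command locally with the standard library as a stand-in for a remote runner, and the names `Evidence` and `run_and_collect` are hypothetical.

```python
import subprocess
import sys
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evidence:
    """Hypothetical record of one run: the command, its output, its exit code."""
    command: List[str]
    lines: List[str] = field(default_factory=list)
    exit_code: Optional[int] = None


def run_and_collect(command: List[str]) -> Evidence:
    """Run a command, echo each output line as it arrives, and keep it as evidence.

    Assumption: a local subprocess stands in for the remote machine.
    """
    evidence = Evidence(command=command)
    with subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so the log is a single stream
        text=True,
    ) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            line = line.rstrip("\n")
            print(line)                  # stream to the caller while running
            evidence.lines.append(line)  # retain the same line as evidence
    # Leaving the `with` block waits for the process, so returncode is set.
    evidence.exit_code = proc.returncode
    return evidence


if __name__ == "__main__":
    result = run_and_collect([sys.executable, "-c", "print('tests passed')"])
    print(result.exit_code, result.lines)
```

The point of the sketch is the shape of the record: an agent that gets back both the streamed log and the exit code has something it can attach to a PR as proof, rather than a bare pass/fail.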

The providers docs make the agent angle concrete: Crabbox can lease managed cloud machines, attach to existing SSH hosts, or run through agent sandbox providers. That lines up with steipete's cloud reply, where he said the payoff is fewer people "running around with their MacBooks open" because agents can run in the cloud.

Browser signups

The sharpest reveal here is not that Codex can test a UI. It is that, per steipete's main post, Codex was signing up for services automatically via browser and computer use while the project was still building itself.

The follow-ups make that less metaphorical. In steipete's tool list, he named browser use and iMessage. In steipete's payment reply, he said he paid for the signups himself, and in his prompt example the instruction was simply "do whatever you need to do to e2e test this."

Appshots

A separate post exposed a smaller but very usable trick. steipete's appshots post said he had been dragging screenshots into Codex Live manually before discovering appshots, and his shortcut reply added that the trigger is pressing the left and right Command keys.

That fits the same pattern as the signup demo: less prompt choreography, more direct capture of what is on screen.

Orchestrator skills

The browser-signup stunt sits inside a broader maintainer loop. The maintainer-orchestrator skill tells a root Codex session to delegate independent repositories into separate worker threads, while the GitHub triage skill explicitly marks "autonomous candidates" among issues and PRs.
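
The delegation pattern described here can be sketched in a few lines of Python. This is an illustrative sketch only, not the actual maintainer-orchestrator skill: `delegate`, `orchestrate`, and the fake worker are hypothetical names, plain threads stand in for Codex worker threads, and the 300-second default mirrors the five-minute wake-up mentioned earlier.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def delegate(repos: List[str], worker: Callable[[str], str]) -> Dict[str, str]:
    """Run one worker per repository in its own thread and gather the results."""
    if not repos:
        return {}
    with ThreadPoolExecutor(max_workers=len(repos)) as pool:
        futures = {repo: pool.submit(worker, repo) for repo in repos}
        return {repo: future.result() for repo, future in futures.items()}


def orchestrate(
    repos: List[str],
    worker: Callable[[str], str],
    ticks: int,
    interval_seconds: float = 300.0,  # assumption: five-minute wake-up
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, str]]:
    """Wake up `ticks` times and delegate every repository on each wake-up."""
    reports = []
    for tick in range(ticks):
        reports.append(delegate(repos, worker))
        if tick < ticks - 1:
            sleep(interval_seconds)  # pause between wake-ups, not after the last
    return reports


if __name__ == "__main__":
    def fake_worker(repo: str) -> str:
        # Hypothetical stand-in for a worker session handling one repository.
        return f"{repo}: triaged"

    # Inject a no-op sleep so the demo finishes immediately.
    print(orchestrate(["repo-a", "repo-b"], fake_worker, ticks=2, sleep=lambda s: None))
```

Passing `sleep` as a parameter keeps the loop testable without waiting five minutes per tick. A real orchestrator would also need error handling for failed workers, which this sketch omits.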

That makes the four-day looping claim read less like a flex post and more like an operating model: remote runners, live proof, browser actions, and a thread-based control layer sitting on top.
