AI Primer

Codex supports 56-hour tasks as builders report passkey and browser failures

Codex users shared 56-hour task runs, PM-to-PR workflows, and a new black-box session recorder for tracking drift, token use, and incomplete responses. The longer autonomous sessions matter because browser auth gaps, passkey failures, and tool-selection bugs become real blockers once Codex is used beyond quick code generation.


TL;DR

A lot of the interesting detail sits outside the splashy task-length screenshots. You can read Every's knowledge-work guide, inspect the codex-blackbox repo, and see the surrounding skill layer in the autoreview skill doc and crabbox.

56-hour runs

The 56-hour headline number is only part of the story. steipete's workflow note says his own runs grew from roughly 30 to 60 minutes each to jobs of 4 to 10 hours after he layered in /goal, autoreview, and crabbox.

That lines up with Aakash Gupta's interview thread, which describes a 60-hour refactor where Ryan Lopopolo gave only two extra prompts across the run. The operating model is less chat assistant, more background worker that keeps going until the repo or the harness stops it.

Guardrails in the repo

The most useful detail in Gupta's reporting is the sequence of fixes that made long runs viable. The team reportedly spent months making the repository legible enough for the agent, then encoded house style into automated checks instead of relying on engineers to patch mistakes by hand.

The three phases in Gupta's thread are concrete:

  1. Make the repo legible with docs, architecture decisions, and an agents.md file.
  2. Encode team taste into CI lints and AI reviewer personas.
  3. Expand who can ship, including PMs writing PRDs and designers running painted-door experiments.

That explains why the first month was slower. Gupta's follow-up thread says engineers were forced to turn each recurring failure into a permanent guardrail, even when typing the fix manually would have been much faster.
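To make the "turn each recurring failure into a permanent guardrail" idea concrete, here is a minimal sketch of a house-style check that could run in CI. It is not the team's actual tooling: the `Rule` type, the `check_source` function, and both example rules are hypothetical, chosen only to show how a repeated review comment might become an automated lint.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One house-style rule (hypothetical shape, not from the source)."""
    name: str
    pattern: re.Pattern
    message: str


# Hypothetical rules: each stands in for a mistake reviewers kept fixing by hand.
RULES = [
    Rule("no-print", re.compile(r"^\s*print\("),
         "use the project logger instead of print()"),
    Rule("no-bare-except", re.compile(r"^\s*except\s*:"),
         "catch a specific exception type"),
]


def check_source(path: str, text: str, rules=RULES) -> list[str]:
    """Return one finding per (line, rule) match, formatted like a linter."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule in rules:
            if rule.pattern.search(line):
                findings.append(f"{path}:{lineno}: [{rule.name}] {rule.message}")
    return findings


if __name__ == "__main__":
    sample = "try:\n    print('x')\nexcept:\n    pass\n"
    for finding in check_source("example.py", sample):
        print(finding)
```

The point of the design is that a CI job can fail when `check_source` returns anything, so the agent gets the same feedback on every run instead of depending on an engineer to catch the issue again.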

Browser and tool failures

Long autonomous sessions make boring product gaps feel huge. jjpcodes' passkey complaint says the built-in browser falls over on sites that require passkeys, which turns ordinary auth flows into hard blockers.

Tool selection looks shaky too. In thekitze's SVG mishap post, Codex ignored an OpenAI-native skill and improvised with SVG instead. steipete's debugging note adds a second pattern: the same model may happily declare code bug-free until you explicitly tell it a bug exists, at which point it keeps digging and starts surfacing issues.

That cluster of complaints is why thekitze's browser reply bluntly says to avoid the Codex browser altogether. The raw task length is impressive, but the fragile bits are concentrated around navigation, auth, and choosing the right built-in tool.

The sidecar tools around Codex

A small tooling ecosystem is forming around the model's weak spots. The codex-blackbox Reddit post pitches a live session recorder for Codex CLI, Codex UI, and OpenClaw sessions, built specifically to track model changes, incomplete responses, token use, and regressions after updates.
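As a rough illustration of what such a recorder tracks, the sketch below logs one record per model turn and derives the three signals mentioned above. It does not reflect how codex-blackbox is implemented: the `Turn` and `SessionLog` names, the field names, and the convention that any `finish_reason` other than `"stop"` means an incomplete response are all assumptions made for this example.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Turn:
    """One model response in a session (field names are assumptions)."""
    model: str
    prompt_tokens: int
    completion_tokens: int
    finish_reason: str  # assumed convention: "stop" means the reply completed


@dataclass
class SessionLog:
    turns: list = field(default_factory=list)

    def record(self, turn: Turn) -> None:
        self.turns.append(turn)

    def total_tokens(self) -> int:
        """Sum prompt and completion tokens across the session."""
        return sum(t.prompt_tokens + t.completion_tokens for t in self.turns)

    def incomplete(self) -> list[int]:
        """Indexes of turns that did not finish with "stop"."""
        return [i for i, t in enumerate(self.turns) if t.finish_reason != "stop"]

    def model_changes(self) -> list[tuple[int, str, str]]:
        """(turn index, old model, new model) wherever the model name changed."""
        pairs = zip(self.turns, self.turns[1:])
        return [(i, prev.model, cur.model)
                for i, (prev, cur) in enumerate(pairs, start=1)
                if prev.model != cur.model]

    def to_jsonl(self) -> str:
        """Serialize the session as one JSON object per line."""
        return "\n".join(json.dumps(asdict(t)) for t in self.turns)


if __name__ == "__main__":
    log = SessionLog()
    log.record(Turn("model-a", 100, 50, "stop"))
    log.record(Turn("model-a", 200, 0, "length"))
    log.record(Turn("model-b", 10, 5, "stop"))
    print(log.total_tokens(), log.incomplete(), log.model_changes())
```

An append-only, line-per-turn log like this is easy to diff between runs, which is what makes regressions after an update visible.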

Other users are building structure above the agent rather than inside it. Dan Shipper's thread setup describes separate pulse, log, inbox, and router threads for recurring knowledge-work jobs, and Amir Mushich's branded video demo shows Codex driving a product video through custom BrandSkill.md and MotionSkill.md files.

Those add-ons all point the same way: Codex is already being used as a long-running production system, but the workflows getting shared most often are wrappers, recorders, and skills that make its behavior easier to steer and inspect.
