Workflow · May 16, 2026

Codex users report 2-hour mech-interp runs and 150-hour tasks with `/goal`

Days after `/goal` workflows first surfaced, users showed the command also works in the Codex app and shared runs for SSH setup, mech-interp scripts, and recurring work that lasted hours or days. The evidence points to Codex being used as a long-running research and ops agent, though the app still lacks explicit `/goal` UI.

TL;DR

In daniel_mac8's mech-interp write-up, GPT-5.5 plus Codex in /goal mode completed an end-to-end interpretability experiment in about two hours, work he estimated at 25 to 45 focused human hours.
petergostev found that /goal is not CLI-only: starting a message in the Codex app with /goal triggers the mode, even though the app has no explicit UI for it.
Long-running sessions are showing up fast: daniel_mac8's screenshot captured a /goal run at 5 days 10 hours, while a retweeted structural biology report claimed more than 150 hours of continuous autoresearch.
Users are already stretching /goal beyond coding: jxnlco's morning-prep workflow spanned Slack, Gmail, Calendar, Linear, and Obsidian, and dkundel's list extended it to slides, feedback analysis, video editing, and expenses.
The control surface is widening at the same time: OpenAI's mobile announcement and OpenAIDevs' rollout post said Codex can now be steered from the ChatGPT mobile app while runs continue on a laptop, Mac mini, or devbox.

You can read OpenAI's mobile Codex post, see mattlam_'s locked-laptop test where Codex kept self-testing while the user was away, and check altryne's setup screenshot showing a second device-control tab that appears to have shipped before it works or has been explained. The oddest reveal is still petergostev's finding: one slash command turns the app into a long-running agent loop without any dedicated button for it.

`/goal` changed Codex from chat turns into completion loops

The cleanest description came from kilocode, who framed /goal as the difference between generating something and iterating until the objective is actually complete. That framing explains yacineMTB's blunt advice to never send a Codex prompt without /goal, and why TheRealAdamG's repost reduced the trick to model plus harness.
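kilocode's distinction can be made concrete with a toy sketch. This is not how Codex implements /goal, which none of the posts cited here document; `generate` and `is_complete` are hypothetical stand-ins for a model call and an objective check. The only point is the control flow: a chat turn returns after one attempt, while a completion loop feeds prior attempts back in until a check passes or a budget runs out.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures, not a real Codex API:
# generate(task, prior_attempts) -> attempt, is_complete(attempt) -> bool.
Generate = Callable[[str, List[str]], str]
Check = Callable[[str], bool]


def single_turn(task: str, generate: Generate) -> str:
    """Chat-turn style: one attempt, returned whether or not it meets the goal."""
    return generate(task, [])


def completion_loop(
    task: str,
    generate: Generate,
    is_complete: Check,
    max_iterations: int = 50,
) -> Tuple[str, int]:
    """Goal style: retry with prior attempts as context until the check passes.

    Returns the last attempt and the number of iterations used. The
    max_iterations budget stops a loop whose check never passes.
    """
    history: List[str] = []
    attempt = ""
    for iteration in range(1, max_iterations + 1):
        attempt = generate(task, history)
        if is_complete(attempt):
            return attempt, iteration
        history.append(attempt)  # feed the failed attempt back on the next call
    return attempt, max_iterations


if __name__ == "__main__":
    # Toy generator: each call produces the next numbered draft.
    def toy_generate(task: str, history: List[str]) -> str:
        return f"{task}: draft {len(history) + 1}"

    print(single_turn("report", toy_generate))  # prints: report: draft 1
    print(completion_loop("report", toy_generate, lambda a: a.endswith("draft 3")))
    # prints: ('report: draft 3', 3)
```

In a loop like this, everything hinges on the completion check. If it is too lenient, the loop stops early, which is one plausible reading of the partial-scope failures some users report.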

The important product detail is that the behavior surfaced before the UI did. petergostev's post showed the command working inside the Codex app by prefixing a message with /goal, which makes the feature feel half-shipped and already heavily used.

Mech-interp and structural biology runs

The strongest concrete report is daniel_mac8's experiment log. He says Codex read the Natural Language Autoencoders paper and repo, designed a 120-prompt dataset and rubric, configured an Nvidia GB10 box over SSH, debugged inference, wrote extraction and scoring scripts, ran the experiment, reran failures, and produced a report.

His attached breakdown is useful because it separates out the expensive parts. The slow pieces were glue work, environment setup, inference-path debugging, and building a trustworthy scoring pipeline, not just model runtime.

A second research-style datapoint came from the structural biology repost, which claimed GPT-5.5 in /goal had been running for more than 150 hours on autoresearch work. Another post from daniel_mac8 treated that as evidence that general-purpose models are already clearing a chunk of PhD-style execution, at least for tightly scoped experimental workflows.

Multi-day runtime is becoming normal enough to screenshot

The runtime numbers are getting weird quickly. daniel_mac8's screenshot showed a session labeled "Pursuing goal" after 5 days 10 hours, and kevinkern's earlier post showed a separate run completing after 15 hours 27 minutes.

That is enough variation to sketch the emerging shape of the product:

Short unattended loops: mattlam_'s mobile example had Codex self-test while the laptop was locked.
Medium runs: justalexoki's reposted screenshot showed a goal achieved after 8 hours 32 minutes.
Multi-day persistence: the 5-day, 10-hour screenshot suggests the harness is willing to stay alive far past a normal chat session.

Not everyone thinks /goal is the right default. pvncher's orchestration post argued that with a strong orchestration setup, goal mode can be overkill for many tasks.

The workflows are already broader than coding

The practical use cases lean as much toward ops and personal automation as toward software delivery. jxnlco's prompt used Codex to search Slack, Gmail, Calendar, and Linear, review past notes, and then save a morning brief into Obsidian.

dkundel expanded the pattern into a list:

keep up with Slack and daily priorities
brainstorm presentations and draft slides
analyze user feedback and crunch data
edit videos
file repetitive expenses

onusoz's complaint adds a useful counterexample. He found /goal still underperformed his queued implementation prompt on whole-codebase refactors, because the model sometimes declared victory after finishing only part of the scope instead of completing the entire project.

Mobile control and remote boxes turned it into a roaming agent

OpenAI's official rollout made the timing clearer. OpenAI said Codex can now be used from the ChatGPT mobile app on iOS and Android, and OpenAIDevs paired that with Remote SSH becoming generally available for devboxes and other managed environments. The full announcement is OpenAI's work-with-Codex-from-anywhere post.

The setup details in the tweet thread were concrete:

mobile rollout was in preview on iOS and Android in supported regions, per OpenAI's thread
Windows connection support was still coming soon, per the same thread
setup lived in the Codex desktop sidebar and required updated ChatGPT mobile and Codex macOS apps, per OpenAIDevs' reply

That changed how people described the product. btibor91's hands-on note said it already felt useful for continuing and monitoring tasks away from the keyboard, while daniel_mac8's reaction called Remote SSH and devboxes the killer feature because they push Codex closer to a personal agent you can steer from a phone.

Usage resets, unfinished tabs, and other rough edges

The rough edges are visible in the same week as the hype. mikehostetler's bug report said /goal did not pause when weekly API usage ran out, and jeffscottward's question suggests at least some heavy users saw unexpected weekly resets while pushing the feature hard.

There are also hints of product surface arriving ahead of explanation. altryne's hands-on post said the mobile flow exposed another tab for controlling other devices that did not seem to work yet, and petergostev's app screenshot remains the clearest sign that /goal itself is already live in the app even though OpenAI has not given it first-class UI.
