GPT-5.5 users report 3.3M cached tokens and 2.5x /fast credits
Engineers shared fresh measurements on GPT-5.5 cache reuse, /fast pricing, and bug-finding budgets after comparison posts for GPT-5.5 and Opus 4.7 led the coding round-up. The reports suggest Codex cost and quality now swing as much on cache behavior and effort settings as on list prices.

TL;DR
- chetaslua's usage screenshot showed 3.8M total tokens with 3.3M counted as cached, while teortaxesTex's dashboard screenshot showed a much larger day where 162.3M of 164.2M input tokens were cache hits.
- petergostev's /fast credit screenshot says Codex charges GPT-5.5 /fast at 2.5x the Standard credit rate for a claimed 1.5x speedup, and his follow-up post says his timing utility did not consistently see that full gain.
- MParakhin argued GPT-5.5 looks like a stronger base model, but a smaller thinking budget at the same xhigh setting made it worse on high-complexity bug finding.
- Workflow reports split by task: TimSuchanek said GPT-5.5 low now beats Claude for codebase research, while haider1 said frontend tasks that used to need high or xhigh now often work on medium.
You can inspect the credit multiplier in petergostev's screenshot, poke at the measurement script in codex-speed-monitor, and compare that cost talk with bridgemindai's plan-mode workflow, mattlam_'s 1 day, 12 hour run, and sarahwooders' cache-eviction note. The weird part of the GPT-5.5 rollout is how much the user reports focus on harness behavior, cache reuse, and effort knobs rather than the model name alone.
Cache hits
The strongest consistent signal in the posts is cache reuse. chetaslua's screenshot put cached tokens at 3.3M out of 3.8M total, and teortaxesTex's dashboard screenshot showed an even more extreme day with 162,266,368 cache-hit input tokens versus 1,329,413 cache-miss input tokens.
Those screenshots matter because they shift the cost conversation away from raw token volume. teortaxesTex called Codex's cache implementation "near perfect," and sarahwooders' model-swap note added a practical caveat: switching models mid-session may evict that cache.
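A quick back-of-envelope shows why those ratios dominate the bill. The sketch below assumes cached input tokens bill at a deep discount (the 10% figure is a placeholder, not a confirmed Codex rate) and treats chetaslua's 3.8M total-token figure as if it were all input:

```python
# Back-of-envelope cache math for the two screenshots.
# ASSUMPTION: cached input tokens bill at a discount; 0.10 is a placeholder
# multiplier, not a confirmed Codex price.
CACHED_DISCOUNT = 0.10

def effective_input_cost(total_in: float, cached_in: float) -> float:
    """Blended input cost relative to paying full price for every token."""
    missed = total_in - cached_in
    return missed + cached_in * CACHED_DISCOUNT

for label, total, cached in [
    ("chetaslua", 3.8e6, 3.3e6),         # 3.3M cached of 3.8M total tokens
    ("teortaxesTex", 164.2e6, 162.3e6),  # 162.3M hits of 164.2M input tokens
]:
    hit_rate = cached / total
    saving = 1 - effective_input_cost(total, cached) / total
    print(f"{label}: {hit_rate:.1%} hit rate, ~{saving:.0%} off full-price input")
```

Under that assumed discount, chetaslua's day comes out roughly 78% cheaper than full-price input and teortaxesTex's roughly 89% cheaper, which is why eviction caveats like sarahwooders' matter more than small list-price differences.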
/fast credits
The cleanest hard number in the evidence pool is the /fast multiplier. petergostev's screenshot says Fast mode raises GPT-5.5 speed by 1.5x but charges 2.5x the credits, with GPT-5.4's fast mode at a 2x multiplier.
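Dividing the credit multiplier by the claimed speedup gives the number that matters per unit of finished work. A minimal sketch of that arithmetic, using only the figures from the screenshot:

```python
# What 2.5x credits for 1.5x speed means per unit of finished work.
FAST_CREDIT_MULT = 2.5  # GPT-5.5 /fast multiplier from petergostev's screenshot
FAST_SPEEDUP = 1.5      # claimed speedup from the same screenshot

credits_per_work = FAST_CREDIT_MULT / FAST_SPEEDUP
print(f"/fast costs {credits_per_work:.2f}x credits per unit of work")  # 1.67x

# /fast would only be credit-neutral if the real speedup matched the
# multiplier, i.e. a 2.5x speedup -- which is why the measured ratio matters.
```

On that arithmetic, /fast buys wall-clock time, not efficiency: each unit of work costs roughly two-thirds more in credits unless the measured speedup beats the claim.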
That tradeoff immediately triggered user measurement. In his follow-up post, petergostev said his open source tool did not show the promised 1.5x consistently; the repo is linked as codex-speed-monitor. At the same time, ai_for_success's meme post and mattlam_'s screenshot of a 1 day, 12 hour run captured the other side of the pricing story: people were watching rate limits closely while also letting xhigh-plus-fast jobs run for more than a day.
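None of codex-speed-monitor's internals are reproduced here, but the shape of the check is easy to sketch: time the same task in both modes over several trials and compare medians against the claimed 1.5x. The run callables below are hypothetical stand-ins, not petergostev's code:

```python
import statistics
import time

def measure_speedup(run_standard, run_fast, trials=5):
    """Time two workloads and report the median wall-clock speedup.

    run_standard / run_fast stand in for issuing the same Codex task in
    Standard and /fast mode; how you drive that is up to your harness.
    """
    def sample(fn):
        times = []
        for _ in range(trials):
            start = time.perf_counter()
            fn()
            times.append(time.perf_counter() - start)
        return statistics.median(times)

    return sample(run_standard) / sample(run_fast)

if __name__ == "__main__":
    # Dummy workloads so the script runs standalone; replace with real calls.
    speedup = measure_speedup(lambda: time.sleep(0.15), lambda: time.sleep(0.10))
    print(f"measured speedup: {speedup:.2f}x vs. the claimed 1.5x")
```

Medians over several trials matter here because single runs of an agent task are noisy enough to mask a 1.5x difference entirely.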
Thinking budgets
The quality complaints were less about raw model competence than about how much thinking the product lets users buy. MParakhin said GPT-5.5 is a better base model, but a "drastically reduced thinking budget" at the same xhigh setting made it worse for bug finding and other high-complexity tasks.
The comparison thread echoed that split. TheRealAdamG's repost of hiarun02 said GPT-5.5 high was the strongest coding agent they had measured, but lower reasoning tiers were "surprisingly weak." That lines up with haider1's frontend report, which said frontend coding improved enough that medium now handles work that previously needed high or xhigh.
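Reproducing that effort-level split on your own tasks is a short script, assuming the Codex CLI's non-interactive `codex exec` subcommand and its `-c` override of the `model_reasoning_effort` key (check both against your installed version; accepted values, including xhigh, vary by model). The prompt and file path are illustrative:

```python
# Sweep the effort knob on one bug-finding task to compare output quality.
# ASSUMPTION: `codex exec` and `-c model_reasoning_effort=...` behave as in
# recent Codex CLI releases; verify against your version.
import subprocess

PROMPT = "Find the concurrency bug in server/queue.py and explain it."

for effort in ["medium", "high", "xhigh"]:
    print(f"--- effort={effort} ---")
    subprocess.run(
        ["codex", "exec", "-c", f"model_reasoning_effort={effort}", PROMPT],
        check=True,
    )
```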
Workflow knobs
The most useful posts were about tactics, not rankings. bridgemindai's plan-mode post described a plan-first loop: have Codex write a plan, review that plan yourself, then ask it to implement. The screenshot showed the model generating a concrete test plan, explicit assumptions, and then executing against that structure.
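As a minimal sketch of that loop, assuming the Codex CLI's `codex exec` non-interactive mode (the prompts and the PLAN.md filename are illustrative, not from bridgemindai's post):

```python
# Plan-first loop: plan, human review, then implement against the plan.
import subprocess

def codex(prompt: str) -> None:
    subprocess.run(["codex", "exec", prompt], check=True)

# Step 1: ask for a plan only -- no code changes yet.
codex("Write an implementation plan for adding retry logic to the HTTP client. "
      "List assumptions and a test plan. Save it to PLAN.md. Do not change code.")

# Step 2: a human reviews PLAN.md here and edits it if needed.
input("Review PLAN.md, then press Enter to let Codex implement it...")

# Step 3: implement strictly against the reviewed plan.
codex("Implement PLAN.md exactly as written, including its test plan.")
```

The point of the split is that the cheap-to-review artifact (the plan) absorbs the correction cycles before any expensive implementation tokens are spent.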
Other posts filled in the surrounding harness behavior:
- TimSuchanek said GPT-5.5 low now does better codebase research than Claude.
- sarahwooders said model swapping can help break out of "vibe-hell," even if it risks cache eviction.
- sarahwooders' support-agent post said a Letta Code support agent running on a Mac mini was already outperforming human support on some tasks when given docs, source access, and persistent memory.
- grx_xce's Game Dev Arena post claimed GPT-5.5's frontend verbosity helped it top Arcada Labs' game-dev leaderboard at 1362 Elo, just ahead of Claude Opus 4.7 Thinking at 1352; the sketch below shows how little a 10-point gap means in expected-win terms.
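For scale on that last item: under the standard Elo expected-score formula, a 10-point gap is a coin flip with a thumb on the scale. Whether Arcada Labs uses exactly this logistic model is an assumption, but it is the textbook reading of 1362 versus 1352:

```python
# Expected score of the leader under the standard Elo model.
def elo_expected(rating_a: float, rating_b: float) -> float:
    """P(A beats B) under the classic logistic Elo expectation."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

p = elo_expected(1362, 1352)  # GPT-5.5 vs Claude Opus 4.7 Thinking
print(f"expected win rate for the leader: {p:.1%}")  # ~51.4%
```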