Pricing, Limits & Cost — Explore AI Tools & Stories

Fresh stories

Codex app reportedly leaks GPT-5.6 Sol, Terra, and Luna model names

Codex app code now references GPT-5.6 Sol, Terra, and Luna, while posts claim Sol Ultra reaches 91.9% on TerminalBench at lower cost. Treat release timing, limits, and benchmark claims as unofficial until OpenAI publishes details.

🧠Codex·3rd July

Breaking

Condense.chat opens Adeline 1 proxy for 9% agent-loop compaction

Condense.chat opened a compression proxy that strips tokens with Helene 1 and compacts settled agent loops with Adeline 1 to about 9% of their size. The service claims 100M saved tokens and 3× plan extension for Claude or Codex users, so test it on non-sensitive workflows first.

New

Context Engineering·3rd July·4 min read
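
One rough way to sanity-check the claimed 3× plan extension against the 9% compaction figure is sketched below. It assumes a hypothetical model in which only part of a session's tokens sits in compactable agent loops and the rest is sent unchanged. The function name and the 73% share are illustrative assumptions, not Condense.chat figures.

```python
def plan_extension(compactable_share: float, compaction_ratio: float = 0.09) -> float:
    """Return how many times further a fixed token budget stretches.

    compactable_share: fraction of tokens that can be compacted (assumed).
    compaction_ratio: size after compaction relative to the original (about 9% per the claim).
    """
    if not 0.0 <= compactable_share <= 1.0:
        raise ValueError("compactable_share must be between 0 and 1")
    # Tokens still sent = untouched part + compacted part.
    remaining = (1.0 - compactable_share) + compactable_share * compaction_ratio
    return 1.0 / remaining


# Hypothetical workload: 73% of tokens are in settled, compactable loops.
print(round(plan_extension(0.73), 2))
```

Under this model, an extension near 3× needs roughly three quarters of the tokens to be compactable, so actual savings will depend heavily on the workload.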

Claude Sonnet 5 ranks #3 on Vals and hits 183 turns on AA-Briefcase

Vals and Artificial Analysis published independent Sonnet 5 results a day after launch, placing it just behind Opus 4.8 and Fable 5 while using far more turns than Sonnet 4.6. Lower token pricing did not make agentic tasks cheaper, and some finance benchmarks still triggered refusals.

💳Evals·1st July
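
The point that lower token pricing need not make agentic tasks cheaper follows from simple arithmetic: per-task cost scales with the number of turns as well as the price. A minimal sketch, in which only the 183-turn figure comes from the report; the 60-turn baseline, the tokens per turn, and both prices are hypothetical placeholders.

```python
def task_cost(turns: int, tokens_per_turn: int, usd_per_million_tokens: float) -> float:
    """Approximate cost of one agentic task in US dollars."""
    return turns * tokens_per_turn * usd_per_million_tokens / 1_000_000


# Hypothetical older model: fewer turns, higher token price.
older = task_cost(turns=60, tokens_per_turn=20_000, usd_per_million_tokens=3.0)

# 183 turns as reported on AA-Briefcase; the lower price is an assumed value.
newer = task_cost(turns=183, tokens_per_turn=20_000, usd_per_million_tokens=2.0)

print(f"older: ${older:.2f}, newer: ${newer:.2f}")
```

With these placeholder numbers the cheaper-per-token model costs about twice as much per task, because roughly three times the turns outweighs a one-third price cut.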

Fable 5 users report Opus 4.8 fallbacks and $600 Max quota rotations

Fable 5 users reported Opus 4.8 fallbacks, $600 Max-account rotations, slow browser automation, and token-saving subagents. Watch routing opacity, quota burn, and latency before relying on it for long-running agent work.

⚙️Fable·3rd July

Fable 5 users report Opus 4.8 fallbacks, refusals, and $321 sessions

Users posted mixed reports after Anthropic brought Fable 5 back: some sessions stayed on Fable, while others routed most work to Opus 4.8 or stalled mid-run. Watch for routing changes and cost spikes, since reports also mention refusals on ordinary tasks and ad hoc multi-model workarounds.

💳Reliability·1st July·6 min read

New

Ramp introduces PorTAL with half-cost LoRA porting across Qwen and Gemma models

Ramp published PorTAL, a method that learns a reusable task representation once and recalibrates only a thin converter when moving that task to a new base model. In reported Qwen and Gemma experiments, it matched per-task LoRA accuracy while cutting data and cost roughly in half.

💳Cost Optimization·1st July

New

Google releases Nano Banana 2 Lite and Gemini Omni Flash

Google shipped Nano Banana 2 Lite for image generation and Gemini Omni Flash for conversational video generation and editing in the Gemini API and AI Studio. Nano Banana 2 Lite generates an image in about 4 seconds at $0.034 per 1K image, while Omni Flash adds multi-turn video edits at $0.10 per second.

Release·🧠Gemini·30th June

Breaking

Hermes Agent updates web extraction with 60x faster reads and 49x lower cost

Nous updated Hermes Agent web extraction to skip the old summarizer loop, pass cleaner content directly to the model, and page large documents on demand. The change is claimed to cut read latency by up to 60x and cost by 49x, so teams should compare output quality before adopting it.

New

Hermes Agent·30th June·3 min read

New

The Information reports OpenAI cuts inference costs by more than 50% on some models

Multiple summaries of a report by The Information said OpenAI found inference optimizations that more than halved costs on some existing models. If that holds, it changes the margin, pricing, and usage-limit math behind ChatGPT and API serving even before new model releases arrive.

💳Model serving·30th June

US Commerce removes Fable 5 export controls; Anthropic restores access July 1

The US Commerce Department removed export controls on Fable 5 and Mythos 5, and Anthropic said access starts returning July 1. Fable counts against up to 50% of weekly limits through July 7 before moving to usage credits, so users should check their quota behavior and fallback paths.

💳Claude Code·30th June

Briefs for July 3

Top stories this week

Breaking

Cognition launches Devin Fusion with mid-session routing and 35% lower Fable-class cost

Cognition launched Devin Fusion, a hybrid coding harness that reroutes work mid-task and says it cuts Fable-class cost by 35%. Use it when upfront routing misses late complexity; the router can re-evaluate after investigation starts.

New

Model Routing·29th June·5 min read

Codex fixes usage overcounting with one extra banked reset and auto-review rollback

A day after Codex reset limits for weekend drain reports, OpenAI said auto-review, duplicate background suggestions, and retry behavior were compounding usage and issued another full reset. Users also get one extra reset credit within 24 hours while reporting and scheduling fixes roll out.

💳Codex·29th June

Codex resets all usage limits as OpenAI investigates weekend drain reports

Two days after OpenAI said it had fixed Codex quota drain tied to fraud overflagging, the team opened a Sunday war room for fresh drain reports and issued a hard reset of user limits. The incident matters because background usage and reset rules were still opaque during long-running agent work.

💳Codex·28th June
