AI Primer
TOPIC 50 stories

Pricing & limits

Pricing changes, quotas, rate limits, and plan/tier changes for AI dev tools and APIs.

NEWS 3rd July
Fable 5 users report Opus 4.8 fallbacks and $600 Max quota rotations

Fable 5 users reported Opus 4.8 fallbacks, $600 Max-account rotations, slow browser automation, and token-saving subagents. Watch routing opacity, quota burn, and latency before relying on it for long-running agent work.

NEWS 3rd July
Codex app reportedly leaks GPT-5.6 Sol, Terra, and Luna model names

Codex app code now references GPT-5.6 Sol, Terra, and Luna, while posts claim Sol Ultra reaches 91.9% on TerminalBench at lower cost. Treat release timing, limits, and benchmark claims as unofficial until OpenAI publishes details.

RELEASE 3rd July
Condense.chat opens Adeline 1 proxy for 9% agent-loop compaction

Condense.chat opened a compression proxy that strips tokens with Helene 1 and compacts settled agent loops with Adeline 1 to about 9% of their size. The service claims 100M saved tokens and 3× plan extension for Claude or Codex users, so test it on non-sensitive workflows first.

NEWS 1st July
Fable 5 users report Opus 4.8 fallbacks, refusals, and $321 sessions

Users posted mixed reports after Anthropic brought Fable 5 back: some sessions stayed on Fable, while others routed most work to Opus 4.8 or stalled mid-run. Watch for routing changes and cost spikes, since reports also mention refusals on ordinary tasks and ad hoc multi-model workarounds.

NEWS 1st July
Claude Sonnet 5 ranks #3 on Vals and hits 183 turns on AA-Briefcase

Vals and Artificial Analysis published independent Sonnet 5 results a day after launch, placing it just behind Opus 4.8 and Fable 5 while using far more turns than Sonnet 4.6. Lower token pricing did not make agentic tasks cheaper, and some finance benchmarks still triggered refusals.

RELEASE 30th June
Google releases Nano Banana 2 Lite and Gemini Omni Flash

Google shipped Nano Banana 2 Lite for image generation and Gemini Omni Flash for conversational video generation and editing in the Gemini API and AI Studio. The release sets image generation at about 4 seconds and $0.034 per 1K image, while Omni Flash adds multi-turn video edits at $0.10 per second.

RELEASE 29th June
Cognition launches Devin Fusion with mid-session routing and 35% lower Fable-class cost

Cognition launched Devin Fusion, a hybrid coding harness that reroutes work mid-task and says it cuts Fable-class cost by 35%. Use it when upfront routing misses late complexity; the router can re-evaluate after investigation starts.

NEWS 29th June
Codex fixes usage overcounting with one extra banked reset and auto-review rollback

A day after Codex reset limits for weekend drain reports, OpenAI said auto-review, duplicate background suggestions, and retry behavior were compounding usage and issued another full reset. Users also get one extra reset credit within 24 hours while reporting and scheduling fixes roll out.

NEWS 27th June
OpenRouter reports four open-weight models handle agents; Chinese models hit 45% of traffic

OpenRouter said four open-weight models now handle real agentic workloads, and a JPMorgan report put Chinese models at about 45% of platform traffic. The shift matters because teams are optimizing for price, hosting, and task fit instead of defaulting to frontier APIs.

RELEASE 27th June
Datalab scores 95.9% on a 225-document extraction benchmark at under half Reducto's price

Datalab’s balanced extraction mode scored 95.9% on a 225-document benchmark and beat Reducto Deep Extract’s 95.1%, according to Vik Paruchuri. The update also adds citations and reasoning, but the benchmark and price comparison are vendor-reported.

NEWS 1w ago
Codex fixes quota drain tied to fraud overflagging with an account-wide usage reset

OpenAI said Codex accounts were draining usage faster than intended because abuse and fraud checks were overflagging some sessions, then issued a usage reset for all users. It matters because paid Codex workflows were losing quota unexpectedly mid-run, directly affecting reliability and cost.

RELEASE 1w ago
Seedance 2.0 Mini launches on Venice, ComfyUI, and Pika MCP with 15s 720p video

A day after Seedance 2.0's 4K rollout story, partners began shipping the cheaper Seedance 2.0 Mini across Venice, ComfyUI, and Pika MCP. The 15-second 720p variant with native audio gives video workflows a lower-cost path than the flagship model.

NEWS 1w ago
Claude Tag users report token billing and shared-memory concerns

A day after Claude Tag launched, engineers raised token billing, lock-in, and shared-memory concerns while Anthropic described its agent-identity model. Watch how Claude behaves in shared Slack channels, where it uses its own credentials and scoped access.

RELEASE 1w ago
Vercel AI Gateway adds GLM-5.2 Fast at 150-250 tok/s

Vercel and Wafer launched a serverless GLM-5.2 endpoint on AI Gateway with 1M context and published pricing. Teams get a high-throughput open-model option inside an existing gateway instead of managing GLM inference directly.

RELEASE 1w ago
Kilo Code launches Auto Efficient routing with KiloBench model selection

Kilo Code added an Auto Efficient mode that routes each request to the cheapest model that clears its benchmark bar using public KiloBench results. The router stays session-aware and falls back to stronger paid models when confidence is low.

NEWS 2w ago
GLM-5.2 ranks #1 on DeepSWE with 44% pass@1

Independent results put GLM-5.2 at the top of the open-model DeepSWE board and near the top on debate and post-train evals. Watch token use and long reasoning traces, which can offset its headline price advantage.

NEWS 2w ago
Engineers compare GLM-5.2 local builds: $10k Mac Studio, 17 tok/s, and 2-bit quant tradeoffs

Practitioners published concrete GLM-5.2 self-host numbers, from Mac Studio and 4090-class setups to annualized power and hardware costs. That matters because open weights now offer privacy and rate-limit control, but quant quality, electricity, and latency still keep hosted APIs cheaper for many teams.

NEWS 2w ago
Engineers report GLM-5.2 nears Opus-level planning at about 1/10 the price

Independent tests put GLM-5.2 near Opus 4.8 and GPT-5.5 on planning and coding, and users shared Claude Code, BrowserCode, dcode, and local-serving recipes. It matters because many engineers are treating it as a daily-driver option for text-heavy coding, though teams still report weaker vision and provider limits.

RELEASE 2w ago
Kilo Code adds Terminal Bench scores and average attempt cost to model picker

Kilo Code now shows Terminal Bench completion rate and average attempt cost directly in model details inside its CLI and VS Code extension. It matters because the numbers come from Kilo's own harness and retry logic rather than public leaderboard scaffolds.

RELEASE 2w ago
Moonshot releases Kimi K2.7 Code HighSpeed at 180 tok/s with 2x API pricing

Moonshot rolled out HighSpeed for Kimi K2.7 Code, claiming about 180 tok/s on coding tasks, up to 260 tok/s on shorter contexts, and roughly 6x speedups. Watch the tight capacity limits and mixed benchmark results, and budget for the 2x pricing if you want the faster mode.

NEWS 2w ago
Anthropic delays Claude Agent SDK credit shift for claude -p and third-party apps

Anthropic paused a same-day policy change that would have moved Claude Agent SDK, claude -p, and third-party SDK apps onto separate monthly credits. Existing subscription-backed workflows continue unchanged for now, but teams should watch for the redesigned billing plan.

RELEASE 2w ago
GLM-5.2 ranks #1 on BridgeBench Reasoning at 42.8

GLM-5.2 opened to GLM Coding Plan users and posters claimed #1 BridgeBench scores in BS and Reasoning, with one post citing 1/10th the cost and 300 tokens per second. Early frontend tests still found a gap to Fable 5 and Opus on finer visual details.

RELEASE 3w ago
OpenRouter launches Fusion API with DRACO panel tests within 1% of Fable

OpenRouter launched Fusion, a server-side panel API that fans prompts to multiple models, judges the outputs, and returns one synthesized answer. The company said DRACO landed within 1% of Fable at roughly half the price, but the published evals do not cover long-horizon tasks.

RELEASE 3w ago
Z.ai releases GLM-5.2 for Coding Plan users with 1M context and Max mode

Z.ai made GLM-5.2 available to GLM Coding Plan users with High and Max thinking modes and 1M context, and promised API access plus an MIT-licensed open-source release the following week. Early testers reported higher plan pricing, heavy rate limits, and mixed build quality versus Opus and Fable.

RELEASE 3w ago
Moonshot releases Kimi K2.7 Code: +21.8% on Kimi Code Bench v2, 30% fewer reasoning tokens

Moonshot open-sourced Kimi K2.7 Code and says it outperforms K2.6 by 21.8% on Kimi Code Bench v2 while using 30% fewer reasoning tokens. The release includes open weights and API access, so teams can test the 180 tok/s HighSpeed rollout and early Cline/OpenCode support.

NEWS 3w ago
Fable 5 users report Opus 4.8 fallbacks during research prompts

Users said Claude Fable 5 kept routing ordinary research prompts to Opus 4.8 after Anthropic’s labeled fallback path appeared. Watch for mid-session model swaps if you rely on Fable for research work.

WORKFLOW 3w ago
Practitioners report Fable 5 planner workflows with Opus, Codex, and HTML logs

Practitioners are running Fable 5 as a planner and long-run orchestrator while pushing implementation and heavy reasoning to Opus and Codex. The setup keeps Fable on supervision and planning, so teams can track execution through live status pages on larger tasks.

RELEASE 3w ago
Codex adds banked rate-limit resets for Go, Plus, Pro, and Business

OpenAI started rolling out bankable Codex resets to Go, Plus, Pro, and Business users, plus a two-week referral program that can add more resets. That lets users save capacity for heavier Browser use and longer Codex sessions instead of losing resets on a fixed clock.

NEWS 3w ago
Claude users report silent fallback and 30-day retention after Fable 5 launch

Anthropic said flagged frontier-LLM requests will visibly fall back to Opus 4.8 after complaints about hidden downgrades and 30-day retention. If you run Claude in production, watch for fallback behavior and verify retention settings before deployment.

NEWS 3w ago
Fable 5 users report 90-minute Max caps and June 23 plan cutoff

One day after Fable 5 launched, users reported burning through Max quotas in about 90 minutes while Anthropic told subscribers the model will leave Claude plans on June 23 until capacity improves. If you depend on Fable, plan for quota pressure and route critical jobs elsewhere.

RELEASE 3w ago
Cohere releases North Mini Code: 30B MoE, 3B active, 256K context

Cohere open-sourced North Mini Code, a 30B-parameter coding MoE with 3B active parameters, 256K context, and Apache 2.0 licensing. OpenCode added it the same day, making the release immediately usable in a coding-agent client.

NEWS 3w ago
Anthropic updates Claude Fable 5 limits with 5-hour and weekly resets

Anthropic reset Fable's 5-hour and weekly quotas after launch-day reports of Max users exhausting access in minutes. Access also depended on the latest Claude Code build, and plan messaging said included use ends June 22 before usage credits take over.

WORKFLOW 3w ago
Claude Code users report auto mode, dynamic workflows, and critique loops finding 144 bugs

Practitioners shared repeatable setups for multi-hour Claude runs using auto approvals, dynamic workflows, cloud sessions, and critique loops. One large-codebase sweep reported 144 bugs fixed in about four hours with fewer false positives under model critique.

NEWS 4w ago
Kilo Code benchmarks MiniMax M3 vs Claude Opus 4.8: 13/17 bugs at $0.07 vs $1.30

A seeded code-audit benchmark found MiniMax M3 and the cheapest Claude Opus 4.8 run each caught 13 of 17 planted bugs, but at sharply different cost. The results also showed models found different bugs, and higher reasoning settings did not reliably improve cost efficiency.

NEWS 4w ago
OpenRouter adds cache-hit pricing telemetry as Devin exposes adaptive routing

Vendors pushed routing and spend controls closer to the default app layer, including OpenRouter's cache-hit pricing telemetry and Devin's adaptive routing. The discussion frames model choice more as a budget-control problem than a pure quality setting.

NEWS 4w ago
Claude Mythos 5 leaks in Dev Mode and API with tier-above-Opus pricing hints

Multiple leak accounts reported a Claude Mythos 5 slug in Dev Mode and the API, pointing to a separate model class above Opus. If confirmed, Anthropic is preparing a new top-tier Claude line with much higher price assumptions, though timing and pricing remain unconfirmed.

NEWS 4w ago
Cognition launches Devin Productivity Guarantee with $10M cap

Cognition said it will fund Devin usage up to $10 million when measured engineering value falls below cost, and published a technical writeup estimating productive engineering hours per session. It matters because the company is shifting agent pricing from tokens to claimed output and extending coding evaluation toward much longer task horizons.

NEWS 4w ago
Codex users report outages, 5-hour caps, and token shortages after Sites launch

Users reported outages, tighter 5-hour caps, and token availability problems a day after OpenAI launched Codex Sites and plugins. OpenAI reset Codex usage limits after three incidents, so teams should watch quotas and backend reliability as agent workflows ramp up.

NEWS 4w ago
Uber caps AI coding-tool spend at $1,500 per employee per tool each month

Uber set a $1,500 monthly limit for each AI coding tool an employee uses, covering products such as Cursor and Claude Code. The cap gives enterprises an early benchmark for coding-agent spend as token costs outgrow typical software-seat budgets.

NEWS 4w ago
LangSmith launches Sandbox, LLM Gateway, and Engine for agent execution, spend tracking, and eval triage

LangSmith added sandboxed execution, spend-aware gateway routing, and Engine to surface recurring agent failures from traces. The bundle gives teams one place to run agents, control token spend, and turn production issues into debugging and eval loops.

RELEASE 4w ago
OpenRouter launches Pareto Code with min_coding_score and 1B routed tokens per day

OpenRouter launched Pareto Code, a free experimental coding router that filters by min_coding_score and says it is already handling about 1 billion tokens a day. The release adds a tunable routing path for coding workloads where cost and model quality need to be balanced.

NEWS 4w ago
Claude Code updates Dynamic Workflows trigger to `ultracode` after accidental 103-agent runs

Anthropic changed the Dynamic Workflows trigger word from “workflow” to `ultracode` after users reported accidental fan-outs, including a 103-agent run that burned 2M tokens. The tweak should reduce surprise parallel launches, though subagent-heavy sessions can still hit rate and usage limits quickly.

RELEASE 4w ago
Factory introduces Router with up to 25% lower AI spend and 99% of Opus 4.7 Terminal-Bench 2

Factory put Router into private preview in its CLI and desktop app to route coding tasks across models, claiming 20-25% lower spend. The launch targets rising agent costs, though session continuity and routing behavior remain active points of debate.

NEWS 4w ago
Cursor raises Teams usage limits and adds Premium seats with 5x usage

Cursor raised usage limits for all Teams users and introduced a Premium seat tier with 5x usage for 3x the price. Teams can now budget coding-agent access around seat quotas instead of raw token meters.

RELEASE 4w ago
Browser Use launches browser infrastructure at $0.02/hour with subsecond cold starts

Browser Use rebuilt its runtime around a custom Chromium fork, Firecracker fork, and custom Linux kernel, claiming $0.02 per hour pricing with subsecond cold starts. The shift targets the infrastructure bottlenecks behind browser agents rather than model quality alone.

NEWS 4w ago
Codex restores weekly and hourly limits to 100% after 5 million users

OpenAI restored Codex weekly and hourly quotas across paid ChatGPT plans after Tibo Sottiaux said the product hit 5 million users. Watch for long-running QA loops, migration PRs, and remote desktop sessions that can still burn through quotas fast.

NEWS 4w ago
Opus 4.8 users report token burn, failed tool calls, and DeepSWE gaps

Three days after Opus 4.8 launched, new tests and field reports added failed tool calls, Bash-specific breakdowns, and higher token burn to the complaint list. Users report materially worse cost and stability in long coding sessions, while DeepSWE and GBA Eval point in different directions.

NEWS 4w ago
Developers report Codex beats Claude Code on DeepSWE, token burn, and multi-hour /goal sessions

Independent users compared GPT-5.5/Codex with Opus 4.8/Claude Code using DeepSWE cost charts, GBA Eval runs, and long coding sessions. The split matters because engineers choosing a daily coding stack now have external quality-versus-cost evidence instead of only vendor launch claims.

NEWS 4w ago
Claude Code users report accidental workflow triggers, 199-agent research runs, and 50M-token burn

Three days after Dynamic Workflows launched, Claude Code users reported accidental mode triggers, a 199-agent deep-research run that burned about 50 million tokens, and steep quota hits from design workflows. The complaints matter because orchestration can now dominate cost and behavior even when the underlying model is working as expected.

NEWS 1mo ago
Opus 4.8 users report write failures, sycophancy, and 58% DeepSWE

Two days after launch, users and benchmarks pointed to write failures, sycophancy, lower security recall, and a 58% DeepSWE result. GPT-5.5 still leads on cost, output tokens, and pass@1 in shared coding-agent tests, so compare both before switching.

AI Primer

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.