AI Primer
Topic · 23 stories

Capacity Planning

Forecasting demand, concurrency, and system headroom.

NEWS · 1w ago
Anthropic doubles Claude Code 5-hour limits after SpaceX Colossus 1 compute deal

Anthropic said a SpaceX compute deal will add 300+ MW and 220,000+ NVIDIA GPUs, and it doubled Claude Code 5-hour limits across paid plans. It also raised Opus API ceilings; users should still watch the unchanged weekly caps.

RELEASE · 1w ago
OpenAI opens Multipath Reliable Connection for 100,000-plus GPU training clusters

OpenAI and partners released Multipath Reliable Connection, an RDMA transport that spreads training traffic across multiple network paths and is already deployed on the company's largest clusters. The protocol targets congestion and failure recovery in giant GPU training runs, and teams building similar clusters should track the Open Compute Project release.

NEWS · 2w ago
White House blocks Mythos expansion from ~50 to ~120 organizations

Posts summarizing WSJ reporting say Anthropic’s push to widen Mythos preview access by about 70 organizations was opposed over national-security and compute-capacity concerns. The change matters because access to Anthropic’s top cyber model may stay tightly rationed for defenders, vendors, and evaluators.

NEWS · 2w ago
Bedrock adds OpenAI models and stateful runtime in coming weeks

AWS says OpenAI models will land on Bedrock in coming weeks alongside a new stateful runtime. OpenAI also said its Microsoft partnership is now non-exclusive, which opens a multi-cloud path for deployment and procurement.

RELEASE · 3w ago
Google DeepMind releases Decoupled DiLoCo with 12B Gemma training across 4 US regions

Google DeepMind introduced Decoupled DiLoCo, a distributed-training method that trained a 12B Gemma model across four US regions and mixed TPU6e/v5p hardware while tolerating failures. It matters because it targets the networking and uptime bottlenecks that make frontier training geographically rigid and operationally fragile.

NEWS · 3w ago
Google launches TPU 8t and TPU 8i with 3x pod compute and 1,152-chip inference pods

Google unveiled eighth-generation TPUs split into TPU 8t for training and TPU 8i for inference, saying 8t delivers nearly 3x per-pod compute over Ironwood while 8i links 1,152 chips in a pod. Google is tuning its hardware stack for larger training runs and lower-latency agent inference at cloud scale.

NEWS · 1mo ago
OpenAI resets Codex usage limits across all plans after a rate-limit spike

OpenAI reset Codex usage limits across all plans after dashboards showed more users hitting caps and the team said it still did not fully understand the trigger. Use the reset to recheck capacity assumptions, since OpenAI also said it banned abuse accounts and March’s repeated resets point to a broader capacity issue.

NEWS · 1mo ago
Claude Code limits concurrent work as users report weeklong waits and missing desktop threads

Users report stricter Claude Code request caps, weeklong cooldowns, and desktop threads disappearing after restarts. Watch quotas closely and shift to lighter models or token-saving workflows built around /context and /clear if the limits hit your workflow.

NEWS · 1mo ago
Sora removes web access on Apr. 26 and API access on Sep. 24

Sora says web and mobile access end on Apr. 26, with API access ending on Sep. 24. Teams now have a fixed migration window, but bulk export still appears unavailable.

NEWS · 1mo ago
Claude Code limits concurrent agents as users report RPM caps

Users report new request-per-minute caps that trigger after three to four concurrent agents, and Boris Cherny says efficiency work is underway. The issue hits the multi-agent workflows Anthropic has been promoting, separate from five-hour usage buckets.

RELEASE · 1mo ago
Arm launches AGI CPU with 136 Neoverse V3 cores and 272-core blade

Arm introduced its first production server chip under its own banner, with up to 136 Neoverse V3 cores and a 272-core dual-node reference blade. The launch pushes Arm deeper into direct datacenter silicon for agentic AI workloads, not just IP licensing.

NEWS · 1mo ago
Anthropic limits Claude 5-hour sessions as users report 529 overloads

Anthropic confirmed new peak-time metering that burns through 5-hour Claude sessions faster, and multiple power users posted 529 overloaded errors and early exhaustion. If you rely on Max plans for coding, watch for session limits and consider moving daily work to Codex.

NEWS · 1mo ago
Artificial Analysis launches AA-AgentPerf for 200-turn, 100K-token coding traces

Artificial Analysis introduced AA-AgentPerf to benchmark hardware on real coding-agent traces instead of synthetic chat prompts. The benchmark reports users per accelerator, kW, dollar, and rack, so teams can compare production cost and throughput more realistically.
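The per-unit metrics AA-AgentPerf reports can be reproduced for your own capacity comparisons. Below is a minimal sketch of that normalization; the `ClusterRun` type, field names, and all numbers are hypothetical illustrations, not AA-AgentPerf's actual schema or data.

```python
from dataclasses import dataclass

@dataclass
class ClusterRun:
    """Hypothetical capacity-test result; field names are illustrative only."""
    concurrent_users: int   # users served at the target latency
    accelerators: int       # GPUs/TPUs in the deployment
    power_kw: float         # steady-state power draw
    hourly_cost_usd: float  # all-in hourly cost
    racks: int              # physical racks occupied

def normalize(run: ClusterRun) -> dict:
    """Express capacity as users per accelerator, per kW, per dollar-hour, per rack."""
    return {
        "users_per_accelerator": run.concurrent_users / run.accelerators,
        "users_per_kw": run.concurrent_users / run.power_kw,
        "users_per_usd_hour": run.concurrent_users / run.hourly_cost_usd,
        "users_per_rack": run.concurrent_users / run.racks,
    }

# Compare two hypothetical candidate deployments on the same basis.
a = ClusterRun(concurrent_users=480, accelerators=64, power_kw=72.0,
               hourly_cost_usd=190.0, racks=2)
b = ClusterRun(concurrent_users=700, accelerators=128, power_kw=130.0,
               hourly_cost_usd=310.0, racks=4)
print(normalize(a))
print(normalize(b))
```

Normalizing this way lets a smaller deployment win on efficiency even when a larger one wins on raw concurrent users.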

NEWS · 1mo ago
Anthropic limits Claude 5-hour sessions during 5am-11am PT peak window

Anthropic said free, Pro, and Max users will hit 5-hour Claude session limits faster on weekdays from 5am to 11am PT, while weekly caps stay the same. Shift long Claude Code jobs off-peak and watch prompt-cache misses.

NEWS · 1mo ago
Meta raises AI capacity with up to $27B Nebius infrastructure deal

Meta agreed to buy up to $27 billion of AI infrastructure from Nebius over five years, including $12 billion of dedicated capacity and optional overflow tied to Vera Rubin deployments. Plan for tighter next-generation GPU supply as hyperscalers lock in capacity years ahead of spot demand.

NEWS · 2mo ago
Researchers report US data centers may need 697–1,451 MGD of new water capacity by 2030

Researchers report US data centers may need 697–1,451 million gallons per day of new peak water capacity by 2030 in a baseline scenario, even if national totals stay small. Model local peak-day water constraints, not just annual averages, when planning new clusters.

NEWS · 2mo ago
Anthropic raises Claude off-peak usage 2x across Free, Pro, Max, and Team through Mar. 27

Anthropic is doubling Claude usage outside peak hours from Mar. 13 to Mar. 27, with the bonus applied automatically across Free, Pro, Max, Team, and Claude Code. Shift long runs and bulk jobs to off-peak windows to stretch limits without changing plans.

NEWS · 2mo ago
Epoch AI reports top chip designers used about 90% of HBM and CoWoS supply in 2025

Epoch AI estimates that NVIDIA, Google, AMD, and Amazon consumed roughly 90% of the high-bandwidth memory and advanced packaging supply tied to frontier AI chips in 2025. Track this if you are planning compute, custom silicon, or open-weight infrastructure strategy.

RELEASE · 2mo ago
Hugging Face launches Storage Buckets for mutable checkpoints, logs, and agent traces

Hugging Face introduced Storage Buckets, a mutable S3-like repo type for checkpoints, processed data, logs, and traces that do not fit Git workflows. Use it to move overwrite-heavy or high-volume artifacts out of versioned repos without leaving the Hub.

NEWS · 2mo ago
Codex reports choppy service as demand outpaces added compute

OpenAI says Codex capacity is lagging a demand spike, leaving some sessions choppy while the team adds more compute. If you depend on Codex in production workflows, plan for transient instability and keep fallback review or execution paths ready.

NEWS · 2mo ago
Thinking Machines Lab launches 1GW Vera Rubin partnership with NVIDIA

Thinking Machines and NVIDIA announced a multi-year plan to deploy at least 1 gigawatt of Vera Rubin systems for training and customizable AI platforms. Watch it as a marker of how frontier training capacity is concentrating into a few very large infrastructure bets.

RELEASE · 2mo ago
Together GPU Clusters adds autoscaling, RBAC, observability, and self-healing

Together GPU Clusters added autoscaling, RBAC, observability, and self-healing controls to its managed cluster product. Use it if your team is moving from ad hoc GPU pools to production training or inference and needs more platform controls out of the box.

NEWS · 2mo ago
Oracle says Abilene AI data center stays on schedule with 200MW operational

Oracle disputed reports of delays at the Abilene site, said 200MW is already operational, and reiterated that the campus supports liquid cooling and multiple hardware generations. Infra teams tracking capacity and supplier signals should treat the recent delay narrative as disputed.

AI Primer

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.