Forecasting demand, concurrency, and system headroom.
Users report new request-per-minute caps that trigger after three to four concurrent agents, and Boris Cherny says efficiency work is underway. The issue hits the multi-agent workflows Anthropic has been promoting, and is separate from the five-hour usage buckets.
Sora says web and mobile access end on Apr. 26, with API access ending on Sep. 24. Teams now have a fixed migration window, but bulk export still appears unavailable.
Arm introduced its first production server chip under its own banner, with up to 136 Neoverse V3 cores and a 272-core dual-node reference blade. The launch pushes Arm deeper into direct datacenter silicon for agentic AI workloads, not just IP licensing.
Anthropic confirmed new peak-time metering that burns through 5-hour Claude sessions faster, and multiple power users posted 529 overloaded errors and early exhaustion. If you rely on Max plans for coding, watch for session limits and consider moving daily work to Codex.
Artificial Analysis introduced AA-AgentPerf to benchmark hardware on real coding-agent traces instead of synthetic chat prompts. The benchmark reports users served per accelerator, per kW, per dollar, and per rack, so teams can compare production cost and throughput more realistically.
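The value of per-kW and per-dollar normalization is easy to see with a toy comparison; every number below is an invented placeholder, not AA-AgentPerf data:

```python
# Toy comparison of two accelerators on agent-serving throughput.
# All figures are invented placeholders for illustration only.
systems = {
    "accel_a": {"users_per_accel": 40, "watts": 700, "dollars": 30000},
    "accel_b": {"users_per_accel": 55, "watts": 1200, "dollars": 50000},
}

for name, s in systems.items():
    users_per_kw = s["users_per_accel"] / (s["watts"] / 1000)
    users_per_kdollar = s["users_per_accel"] / (s["dollars"] / 1000)
    print(f"{name}: {users_per_kw:.1f} users/kW, {users_per_kdollar:.2f} users/$1k")

# Here accel_b wins on raw users per accelerator but loses per kW and
# per dollar, which is the trade-off this kind of benchmark surfaces.
```

With these placeholder numbers the higher-throughput part is the worse buy once power and price are factored in, which is exactly the distortion per-accelerator comparisons hide.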
Anthropic said free, Pro, and Max users will hit 5-hour Claude session limits faster on weekdays from 5am to 11am PT, while weekly caps stay the same. Shift long Claude Code jobs off-peak and watch prompt-cache misses.
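A minimal sketch of gating long runs on the stated window (weekdays, 5am to 11am PT); the deferral logic around it is a hypothetical example, not an Anthropic API:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

PEAK_START_HOUR = 5   # 5am PT, per the announcement
PEAK_END_HOUR = 11    # 11am PT

def in_peak_window(now: datetime) -> bool:
    """True on weekdays between 5am and 11am Pacific time."""
    pt = now.astimezone(ZoneInfo("America/Los_Angeles"))
    is_weekday = pt.weekday() < 5  # Mon=0 .. Fri=4
    return is_weekday and PEAK_START_HOUR <= pt.hour < PEAK_END_HOUR

# Hypothetical scheduler hook: defer long batch runs during the window.
if in_peak_window(datetime.now(ZoneInfo("UTC"))):
    print("peak window: defer long Claude Code runs")
else:
    print("off-peak: safe to launch long runs")
```

A cron-style scheduler can call `in_peak_window` before dispatching bulk jobs, pushing anything long-running to evenings or weekends.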
Meta agreed to buy up to $27 billion of AI infrastructure from Nebius over five years, including $12 billion of dedicated capacity and optional overflow tied to Vera Rubin deployments. Plan for tighter next-generation GPU supply as hyperscalers lock in capacity years ahead of spot demand.
Researchers report US data centers may need 697–1,451 million gallons per day of new peak water capacity by 2030 in a baseline scenario, even if national totals stay small. Model local peak-day water constraints, not just annual averages, when planning new clusters.
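The peak-versus-average distinction is worth making concrete; the per-site figures below are assumptions for illustration, not numbers from the report:

```python
# Illustrative peak-day water check; all site numbers are assumptions.
SITE_PEAK_DEMAND_MGD = 2.5       # assumed cluster peak-day demand, million gal/day
LOCAL_PEAK_CAPACITY_MGD = 3.0    # assumed available local peak capacity
LOAD_FACTOR = 0.6                # assumed ratio of average draw to peak draw

annual_avg_mgd = SITE_PEAK_DEMAND_MGD * LOAD_FACTOR
headroom = LOCAL_PEAK_CAPACITY_MGD - SITE_PEAK_DEMAND_MGD

print(f"annual-average draw: {annual_avg_mgd:.2f} MGD")
print(f"peak-day headroom:   {headroom:.2f} MGD")
# The annual average can look comfortable while peak-day headroom,
# the binding constraint for local utilities, is nearly exhausted.
```

The point of the sketch: a site drawing 1.5 MGD on average still needs 2.5 MGD of firm peak capacity, so siting decisions should be checked against peak-day supply.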
Anthropic is doubling Claude usage outside peak hours from Mar. 13 to Mar. 27, with the bonus applied automatically across Free, Pro, Max, Team, and Claude Code. Shift long runs and bulk jobs to off-peak windows to stretch limits without changing plans.
Epoch AI estimates that NVIDIA, Google, AMD, and Amazon consumed nearly all high-bandwidth memory and advanced packaging tied to frontier AI chips in 2025. Track this if you are planning compute, custom silicon, or open-weight infrastructure strategy.
OpenAI says Codex capacity is lagging a demand spike, leaving some sessions choppy while the team adds more compute. If you depend on Codex in production workflows, plan for transient instability and keep fallback review or execution paths ready.
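One shape a fallback path can take is retry-with-backoff, then reroute; this is a generic sketch, and the primary/fallback callables and the `Overloaded` error are hypothetical stand-ins, not Codex APIs:

```python
import random
import time

class Overloaded(Exception):
    """Stand-in for a transient capacity error from the primary backend."""

def call_with_fallback(primary, fallback, retries=3, base_delay=1.0):
    """Try the primary path with jittered exponential backoff, then fall back."""
    for attempt in range(retries):
        try:
            return primary()
        except Overloaded:
            # Jittered exponential backoff between attempts.
            time.sleep(base_delay * (2 ** attempt) * (0.5 + random.random() / 2))
    return fallback()

# Hypothetical usage: route to a secondary review/execution path when
# the primary agent backend stays overloaded.
def primary():
    raise Overloaded("capacity exhausted")

def fallback():
    return "handled by fallback path"

print(call_with_fallback(primary, fallback, retries=2, base_delay=0.01))
```

The fallback need not be another model; a queue that holds work for human review serves the same purpose of keeping the pipeline from stalling.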
Hugging Face introduced Storage Buckets, a mutable S3-like repo type for checkpoints, processed data, logs, and traces that do not fit Git workflows. Use it to move overwrite-heavy or high-volume artifacts out of versioned repos without leaving the Hub.
Thinking Machines and NVIDIA announced a multi-year plan to deploy at least 1 gigawatt of Vera Rubin systems for training and customizable AI platforms. Watch it as a marker of how frontier training capacity is concentrating into a few very large infrastructure bets.
Together GPU Clusters added autoscaling, RBAC, observability, and self-healing controls to its managed cluster product. Use it if your team is moving from ad hoc GPU pools to production training or inference and needs more platform controls out of the box.
Oracle disputed reports of delays at the Abilene site, said 200MW is already operational, and reiterated that the campus supports liquid cooling and multiple hardware generations. Infra teams tracking capacity and supplier signals should treat the recent delay narrative as disputed.