AI Primer

Google launches TPU 8t and TPU 8i with 3x pod compute and 1,152-chip inference pods

Google unveiled eighth-generation TPUs split into TPU 8t for training and TPU 8i for inference, saying 8t delivers nearly 3x per-pod compute over Ironwood while 8i links 1,152 chips in a pod. Google is tuning its hardware stack for larger training runs and lower-latency agent inference at cloud scale.


TL;DR

  • Google split its eighth-generation TPU line in two, per Google's launch post, with TPU 8t aimed at training and TPU 8i aimed at inference, instead of shipping one general-purpose part.
  • According to Google's launch post, TPU 8t delivers nearly 3x compute per pod versus Ironwood, and OfficialLoganK's follow-up echoed the 2x to 3x framing from Cloud Next.
  • Google's launch post says TPU 8i links 1,152 chips inside one inference pod, a layout pitched around low latency and enough throughput to run millions of agents concurrently.
  • scaling01's screenshot surfaced Google's claim that TPU 8t can scale to 1 million chips in a single training cluster, a bigger systems reveal than the chip naming itself.
  • Demand is already there: rohanpaul_ai's post said Google Cloud told attendees it is now processing more than 16 billion tokens per minute through direct customer API traffic, up from 10 billion last quarter.

You can jump from Google's launch page to the systems architecture write-up, and the more interesting screenshots are not the beauty shots but the pod-scale claims in scaling01's cluster image and scaling01's spec table. The hardware split is simple. The cluster math is where Google is really showing its hand.

TPU 8t and TPU 8i

Google's main change is architectural, not cosmetic. TPU 8t is the training chip, TPU 8i is the inference chip, and Google is explicitly tuning each part for a different job instead of asking one accelerator family to cover both.

That split shows up even in the product framing. demishassabis's repost and algo_diver's repost both center the two-part lineup itself, while Google's own post ties the chips to a broader stack story spanning models, tools, agents, and apps.

Pod-scale numbers

The headline numbers break in two directions. On the training side, TPU 8t claims nearly 3x per-pod compute over Ironwood and scaling to 1 million chips in a single cluster. On the inference side, TPU 8i links 1,152 chips inside one pod.

The spec table in scaling01's screenshot adds more color, although it appears to be a conference slide rather than a standalone doc:

  • TPU 8t: 216 GB HBM, 12.6 PFLOPs FP4 compute
  • TPU 8i: 288 GB HBM, 8.60 TB/s memory bandwidth, 10.1 PFLOPs FP4 compute
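Multiplying the per-chip slide figures by the 1,152-chip pod size gives a rough ceiling for what one TPU 8i pod would offer. This is a back-of-envelope sketch using naive peak-number multiplication, not anything Google has stated about pod-level totals:

```python
# Back-of-envelope TPU 8i pod math from the reported slide figures.
# Naive linear scaling of peak per-chip numbers, not measured throughput.

chips_per_8i_pod = 1_152     # pod size per Google's launch post
fp4_pflops_per_chip = 10.1   # per-chip FP4 PFLOPs from the spec slide
hbm_gb_per_chip = 288        # per-chip HBM from the spec slide

pod_pflops = chips_per_8i_pod * fp4_pflops_per_chip
pod_hbm_tb = chips_per_8i_pod * hbm_gb_per_chip / 1_000

print(f"TPU 8i pod peak FP4: {pod_pflops / 1_000:.1f} EFLOPs")
print(f"TPU 8i pod aggregate HBM: {pod_hbm_tb:.0f} TB")
```

On these assumptions, a single inference pod lands in the double-digit-EFLOPs, hundreds-of-terabytes-of-HBM range, which is the scale the "millions of agents" pitch leans on.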

Training silicon versus inference silicon

A dual-chip TPU roadmap makes one tradeoff explicit. Training wants giant synchronized clusters and raw compute growth. Inference wants pod designs that can keep latency down while serving lots of parallel requests.

Google's wording tracks that split closely. koraykv's architecture post describes TPU 8t as built for massive-scale training and TPU 8i as built for low-latency inference, while Google's launch post frames 8i around running millions of agents cost-effectively.

That is a notable shift in emphasis from the older TPU story, which usually centered model training first and serving second. Here, the inference part gets its own chip and its own pod-scale bragging rights.

The agent throughput pitch

Google did not just say TPU 8i is for inference. It said the pod is sized for the throughput and low latency needed to run millions of agents concurrently.

That language matters because it connects the chip launch to the rest of Google's Cloud Next stack story. In the same post, Google described the new TPUs as part of an integrated system that runs from chips up through developer tools, agents, and applications.

The result is a cleaner read on where Google thinks demand is heading: not just bigger frontier training runs, but inference fleets with enough concurrency to keep agent workloads cheap enough to operate at cloud scale.

API demand

The best demand datapoint in the evidence pool is not a benchmark. It is the token counter. According to rohanpaul_ai's post, Google Cloud said direct customer API traffic is now above 16 billion tokens per minute, up from 10 billion last quarter.

That quarter-over-quarter jump gives the TPU 8i launch more context than the staged photos do. If the number is accurate, Google is pitching new inference hardware into a business already seeing a 60 percent jump in token flow within one quarter.
