Breaking | June 20, 2026

Wafer claims GLM-5.2 hits 222 tok/s and 12.6s end-to-end

Wafer said its GLM-5.2 deployment leads Artificial Analysis on throughput and latency, and priced usage at $1.20 input and $4.10 output per million tokens. Compare serverless and dedicated endpoints if you need speed at scale.

TL;DR

Wafer's launch post says its GLM-5.2 endpoint tops Artificial Analysis on both throughput and latency, with 222 output tokens per second and 12.6 seconds end-to-end.
Wafer priced the model at $1.20 per million input tokens and $4.10 per million output tokens, with a $0.20 cache price cited in Wafer's pricing reply and Wafer's follow-up post.
The company is steering users to serverless first, because Wafer's sunset notice says Wafer Pass was discontinued and Wafer's endpoint reply says GLM-5.2 is serverless for now.
Demand spiked quickly enough that Wafer's capacity reply, another capacity reply, and a third capacity reply all say the team is adding compute as fast as it can.

You can hit the endpoint on Wafer's GLM-5.2 page, pull up the Hermes Agent setup docs, and compare the Artificial Analysis-style benchmark chart in Wafer's launch post against the later claim in Wafer's DeepSWE post. Jeremy Howard's early reaction in his GLM-5.2 post is also unusually strong for an open-weights release.

Artificial Analysis

Wafer's main claim is simple: fastest GLM-5.2 inference among listed providers on Artificial Analysis.

The attached chart in Wafer's launch post shows Wafer at 222 output tok/s, ahead of GMI at 173 and Together AI at 155. The same chart puts Wafer at 12.6 seconds end-to-end, ahead of Together AI at 16.9 and GMI at 17.2.
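To put those chart figures in relative terms, here is a minimal Python sketch that computes Wafer's lead over each rival. The numbers are the ones reported above; the function and variable names are illustrative only and do not come from Wafer or Artificial Analysis.

```python
# Figures as reported in the chart described above.
OUTPUT_TOK_PER_S = {"Wafer": 222, "GMI": 173, "Together AI": 155}
END_TO_END_S = {"Wafer": 12.6, "Together AI": 16.9, "GMI": 17.2}


def throughput_lead_pct(leader: str, other: str) -> float:
    """Percent by which the leader's output speed exceeds the other's."""
    return (OUTPUT_TOK_PER_S[leader] / OUTPUT_TOK_PER_S[other] - 1) * 100


def latency_cut_pct(leader: str, other: str) -> float:
    """Percent by which the leader's end-to-end time is below the other's."""
    return (1 - END_TO_END_S[leader] / END_TO_END_S[other]) * 100


if __name__ == "__main__":
    for rival in ("GMI", "Together AI"):
        print(
            f"vs {rival}: +{throughput_lead_pct('Wafer', rival):.1f}% tok/s, "
            f"-{latency_cut_pct('Wafer', rival):.1f}% end-to-end time"
        )
```

On these figures, Wafer's throughput lead works out to roughly 28% over GMI and 43% over Together AI, and its end-to-end time is about a quarter lower than either.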

A later post, Wafer's DeepSWE post, adds a second bragging right: GLM-5.2 as the top open-source model on DeepSWE at 44%.

Pricing

The pricing story is almost as aggressive as the speed story.

Across one reply, another reply, a third reply, and Wafer's follow-up post, the company repeated the same numbers: $1.20 input, $4.10 output, and in two posts a $0.20 cache price.
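For a sense of what those rates mean for a workload, the Python sketch below estimates a bill from token counts. The input and output prices are the quoted ones. How the $0.20 cache price is billed is not spelled out in the posts, so the sketch assumes it applies per million cached input tokens; treat that, and the function name, as assumptions rather than Wafer's actual billing logic.

```python
# Prices as quoted in the posts, in USD per million tokens.
INPUT_USD_PER_M = 1.20
OUTPUT_USD_PER_M = 4.10
# Assumption: the $0.20 cache price applies per million cached input tokens.
CACHED_INPUT_USD_PER_M = 0.20


def estimate_cost_usd(
    input_tokens: int, output_tokens: int, cached_input_tokens: int = 0
) -> float:
    """Rough cost estimate; cached tokens are counted as part of input_tokens."""
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    fresh_input_tokens = input_tokens - cached_input_tokens
    total = (
        fresh_input_tokens * INPUT_USD_PER_M
        + cached_input_tokens * CACHED_INPUT_USD_PER_M
        + output_tokens * OUTPUT_USD_PER_M
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500k output tokens.
    print(f"no cache:   ${estimate_cost_usd(2_000_000, 500_000):.2f}")
    print(f"with cache: ${estimate_cost_usd(2_000_000, 500_000, 1_500_000):.2f}")
```

Under these assumptions, the hypothetical workload costs about $4.45 with no cache hits, and about $2.95 if 1.5M of the input tokens are served from cache.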

The screenshot attached to Wafer's pricing reply places Wafer in the chart's "most attractive quadrant," pairing the lowest price point in the field with the highest output speed.

Serverless

The access model shifted fast enough to show up in replies before it showed up anywhere else.

According to Wafer's sunset notice, Wafer Pass was sunset because demand for serverless and dedicated endpoints was too high. But Wafer's endpoint reply says GLM-5.2 is serverless for now, which makes the near-term offering narrower than the dedicated-endpoint language in Wafer's sales reply suggests.

Wafer is also pointing developers at integration docs. In Wafer's Hermes Agent reply, the company linked setup instructions for Hermes Agent through Nous Research.

Demand

The most concrete signal in the reply thread is not the benchmark chart but the repeated capacity warning.

Wafer told multiple users in one reply, another, a third, and a fourth that demand was surging and more compute was being added. That lines up with Wafer's volume comment, which says the company had already seen enormous GLM-5.2 inference volume.

Outside Wafer's own account, Simon Willison's post framed the release as a race for ultra-fast inference providers, while Jeremy Howard's GLM-5.2 post and Vipul Ved's hands-on note both described the model itself as unusually strong.
