Release, June 14, 2026

GLM-5.2 ranks #1 on BridgeBench Reasoning at 42.8

GLM-5.2 opened to GLM Coding Plan users, and early posts claimed #1 BridgeBench scores in BS and Reasoning, with one post citing one tenth the cost of Fable 5 and 300 tokens per second. Early frontend tests still found a gap to Fable 5 and Opus on finer visual details.

TL;DR

GLM-5.2 is now live for paid GLM Coding Plan users, and WesRoth's launch summary says Z AI is pitching it as a flagship coding model with a usable 1 million token context window.
On BridgeBench, bridgemindai's benchmark post claimed GLM-5.2 hit #1 on BS at 100.0 and #1 on Reasoning at 42.8. The post's numbers are consistent with the current Reasoning and SpeedBench pages, though the public BS benchmark page is a general leaderboard rather than a GLM-5.2 detail card; it describes the test as a pushback benchmark over 100 nonsense-seeded tasks.
Speed is part of the pitch: bridgemindai's benchmark post cited 300 tokens per second, and BridgeBench's GLM 5.2 speed page currently shows 296.7 median tok/s with 2450ms TTFT.
Hands-on coding reports are more mixed than the headline rank: in aibuilderclub_'s frontend rerun, GLM-5.2 produced a usable 3D dashboard from the same prompt and reference image, but aibuilderclub_'s follow-up and aibuilderclub_'s later reply both said it still trails Fable 5 and Opus on finer details and hard edge cases.

You can check Z AI's own model switching doc, the live Reasoning scorecard, and the separate speed breakdown. The oddest detail is that the official doc frames GLM-5.2 as a drop-in model swap configured through Claude Code-style environment variables, while the early public chatter splits cleanly between benchmark wins and "still not Opus" frontend tests.

BridgeBench

The benchmark claim that drove most of the conversation is real in one dimension and qualified in another. BridgeBench's live Reasoning page lists GLM 5.2 at 42.8 overall, with 13.3% accuracy, 90.1% evidence, 30 tasks passed, and especially strong cluster scores in Stateful Execution at 71.1 and Constraint Reconciliation at 49.6.

The BS side needs more careful wording. bridgemindai's benchmark post said GLM-5.2 ranked #1 at 100.0, but the public BS benchmark page is a general leaderboard page rather than a GLM-5.2 detail card, and bridgemindai's category caveat added that a model being "better" does not make it better across every category.

BridgeBench also defines what that BS number means. The benchmark page describes 100 tasks across finance, legal, medical, physics, and software, each seeded with made-up jargon or reversed relationships, then scored on whether the model pushes back instead of confidently accepting nonsense.
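
To make the 100.0 figure easier to read, here is a minimal sketch of a pushback score under one assumed rubric: the share of tasks on which the model flags the seeded nonsense. The `TaskResult` type and the `pushback_score` function are hypothetical names for illustration, and BridgeBench's actual scoring may weight tasks or grade partial pushback differently.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    domain: str        # e.g. "finance", "legal", "medical", "physics", "software"
    pushed_back: bool  # True if the model flagged the made-up jargon or reversed relationship


def pushback_score(results: list[TaskResult]) -> float:
    """Percentage of tasks where the model pushed back.

    Assumption: a plain pass rate, not BridgeBench's published rubric.
    """
    if not results:
        raise ValueError("no task results to score")
    return 100.0 * sum(r.pushed_back for r in results) / len(results)


if __name__ == "__main__":
    domains = ["finance", "legal", "medical", "physics", "software"]
    # Hypothetical run: 20 tasks per domain, all of them challenged by the model.
    perfect_run = [TaskResult(d, True) for d in domains for _ in range(20)]
    print(pushback_score(perfect_run))
```

Under this assumed rubric, a score of 100.0 would mean the model pushed back on every one of the 100 tasks.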

Frontend gap

The cleanest hands-on test in the evidence pool is a one-shot frontend rerun against the same prompt and reference image. In aibuilderclub_'s frontend rerun, the output already included the hard bits that usually break weaker coding models: a 3D globe, route arcs, glass panels, stats cards, and a usable page.

The same tester still drew a clear ceiling. aibuilderclub_'s follow-up said GLM-5.2 "falls short of Opus on the finer details," and aibuilderclub_'s everyday-versus-hard-tasks reply said it feels close to Opus on everyday work but shows a noticeable gap on harder tasks.

That makes the early read pretty concrete:

One-shot frontend generation is already good enough to look competitive, per the original rerun post.
Visual precision and polish still lag Fable 5, according to the same rerun post.
Harder edge cases need extra iterations, per aibuilderclub_'s finer-details reply.
The gap is narrowing faster than expected, per aibuilderclub_'s later reaction.

Throughput

The speed claim is not just social hype. BridgeBench's speed page currently shows 296.7 median tokens per second, 294.7 average throughput, 2450ms TTFT, and 100.0% success across five runs, which is close enough to bridgemindai's 300 tok/s claim that the shorthand survives contact with the public scorecard.
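
The arithmetic behind "close enough" can be sketched in a few lines. The constants are the figures quoted above; the helper names and the constant-rate streaming model (time to first token plus tokens divided by throughput) are simplifying assumptions, not something the speed page states.

```python
# Figures quoted above from the speed page and the benchmark post.
MEDIAN_TOK_PER_S = 296.7
CLAIMED_TOK_PER_S = 300.0
TTFT_S = 2.450  # 2450 ms time to first token


def relative_gap(measured: float, claimed: float) -> float:
    """Fraction by which the measured figure falls short of the claim."""
    return (claimed - measured) / claimed


def estimated_latency_s(
    output_tokens: int,
    tok_per_s: float = MEDIAN_TOK_PER_S,
    ttft_s: float = TTFT_S,
) -> float:
    """Rough wall-clock estimate: wait for the first token, then stream at a constant rate."""
    return ttft_s + output_tokens / tok_per_s


if __name__ == "__main__":
    print(f"shortfall vs claim: {relative_gap(MEDIAN_TOK_PER_S, CLAIMED_TOK_PER_S):.1%}")
    print(f"~{estimated_latency_s(1000):.1f}s for a 1,000-token reply")
```

On these numbers the measured median sits about 1.1% below the 300 tok/s shorthand, and a 1,000-token reply would take roughly 5.8 seconds under the constant-rate assumption.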

That speed is part of why the model is getting attention despite the still-visible quality gap in harder coding work. bridgemindai's speed reaction called it "one of the best Chinese models" they had used and emphasized that it runs blazing fast.

The cost claim is thinner than the speed claim. bridgemindai's benchmark post said GLM-5.2 comes in at one tenth the cost of Fable 5, but the evidence pool here does not include an official Z AI pricing page for a like-for-like comparison.

Model switching

The most useful official detail is buried in Z AI's own model switching doc. It says GLM Coding Plan users can swap GLM-5.2 into a coding agent by editing settings.json, setting CLAUDE_CODE_AUTO_COMPACT_WINDOW to 1000000, and mapping both ANTHROPIC_DEFAULT_SONNET_MODEL and ANTHROPIC_DEFAULT_OPUS_MODEL to glm-5.2[1m].
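
As a rough illustration of that swap, the sketch below builds the three overrides and merges them into a settings dictionary. The variable names and values are the ones quoted above; placing them under a top-level `env` object, and the helper names `build_env_overrides` and `merge_into_settings`, are assumptions for illustration, so check Z AI's doc for the exact file layout before editing a real settings.json.

```python
import json

# Values quoted in the article from Z AI's model switching doc.
MODEL_SLUG = "glm-5.2[1m]"
CONTEXT_WINDOW = 1_000_000


def build_env_overrides(model_slug: str = MODEL_SLUG, window: int = CONTEXT_WINDOW) -> dict:
    """Return the three overrides as string-valued environment entries."""
    return {
        "CLAUDE_CODE_AUTO_COMPACT_WINDOW": str(window),
        "ANTHROPIC_DEFAULT_SONNET_MODEL": model_slug,
        "ANTHROPIC_DEFAULT_OPUS_MODEL": model_slug,
    }


def merge_into_settings(settings: dict) -> dict:
    """Merge the overrides into a settings dict without dropping existing keys.

    Assumption: the overrides live under a top-level "env" object.
    """
    merged = dict(settings)
    env = dict(merged.get("env", {}))
    env.update(build_env_overrides())
    merged["env"] = env
    return merged


if __name__ == "__main__":
    # Hypothetical prior contents, to show that unrelated entries survive the merge.
    existing = {"env": {"EXAMPLE_EXISTING_VAR": "keep-me"}}
    print(json.dumps(merge_into_settings(existing), indent=2))
```

Merging rather than overwriting keeps any other entries a user already has in the file, which matters if the same settings file carries unrelated configuration.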

That doc also shows how Z AI is positioning the release: not as a separate interface to learn, but as a model slug that drops into existing Claude Code-style wiring. WesRoth's launch summary added that open-source weights and API access are planned for next week, so the current rollout is still plan-first rather than fully open on day one.
