AI Primer

GLM-5.2 ranks #1 on BridgeBench Reasoning at 42.8

GLM-5.2 opened to GLM Coding Plan users and posters claimed #1 BridgeBench scores in BS and Reasoning, with one post citing 1/10th the cost and 300 tokens per second. Early frontend tests still found a gap to Fable 5 and Opus on finer visual details.

TL;DR

You can check Z AI's own model switching doc, the live Reasoning scorecard, and the separate speed breakdown. The oddest detail is that the official doc frames GLM-5.2 as a drop-in model swap configured through Claude Code-style environment variables, while the early public chatter splits cleanly between benchmark wins and "still not Opus" frontend tests.

BridgeBench

The benchmark claim that drove most of the conversation is real in one dimension and qualified in another. BridgeBench's live Reasoning page lists GLM 5.2 at 42.8 overall, with 13.3% accuracy, 90.1% evidence, 30 tasks passed, and especially strong cluster scores in Stateful Execution at 71.1 and Constraint Reconciliation at 49.6.

The BS side needs more careful wording. bridgemindai's benchmark post said GLM-5.2 ranked #1 at 100.0, but the public BS benchmark page is a general leaderboard page rather than a GLM-5.2 detail card, and bridgemindai's category caveat added that a model being "better" does not make it better across every category.

BridgeBench also defines what that BS number means. The benchmark page describes 100 tasks across finance, legal, medical, physics, and software, each seeded with made-up jargon or reversed relationships, then scored on whether the model pushes back instead of confidently accepting nonsense.
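To make the 100.0 figure easier to read, here is a minimal sketch of how a pushback score of this kind could be computed. It assumes the score is simply the percentage of tasks on which the model pushed back; BridgeBench's actual rubric is not spelled out here and may weight tasks differently. The `TaskResult` type, the `bs_score` function, and the even 20-tasks-per-domain split are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    domain: str        # e.g. "finance", "legal"
    pushed_back: bool  # True if the model challenged the seeded nonsense


def bs_score(results: list[TaskResult]) -> float:
    """Assumed scoring: percentage of tasks where the model pushed back."""
    if not results:
        raise ValueError("no task results to score")
    passed = sum(1 for r in results if r.pushed_back)
    return 100.0 * passed / len(results)


# The five domains named on the benchmark page.
DOMAINS = ["finance", "legal", "medical", "physics", "software"]

# Hypothetical run: 20 tasks per domain (an assumed split of the 100 tasks),
# with the model pushing back on every one.
results = [TaskResult(domain, True) for domain in DOMAINS for _ in range(20)]
print(bs_score(results))  # 100.0
```

Under this assumed rule, a score of 100.0 would mean the model never accepted a seeded falsehood, which says nothing about its performance in other categories.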

Frontend gap

The cleanest hands-on test in the evidence pool is a one-shot frontend rerun against the same prompt and reference image. In aibuilderclub_'s frontend rerun, the output already included the hard bits that usually break weaker coding models: a 3D globe, route arcs, glass panels, stats cards, and a usable page.

The same tester still drew a clear ceiling. aibuilderclub_'s follow-up said GLM-5.2 "falls short of Opus on the finer details," and aibuilderclub_'s everyday-versus-hard-tasks reply said it feels close to Opus on everyday work but shows a noticeable gap on harder tasks.

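That makes the early read fairly concrete: in one shot, GLM-5.2 can produce a usable page with the hard elements in place, but it still trails Opus on finer visual details and on harder tasks.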

Throughput

The speed claim is not just social hype. BridgeBench's speed page currently shows 296.7 median tokens per second, 294.7 average throughput, a 2,450 ms time to first token (TTFT), and 100.0% success across five runs, which is close enough to bridgemindai's 300 tok/s claim that the shorthand survives contact with the public scorecard.
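As a rough illustration of what those numbers imply, the sketch below compares the measured median with the 300 tok/s shorthand and estimates wall-clock time for a response. It assumes a simple linear model (TTFT plus output tokens divided by median throughput), which ignores network variance and prompt length; the function name and the 1,000-token example are illustrative only.

```python
# Figures quoted from BridgeBench's speed page in this article.
MEDIAN_TOKENS_PER_SECOND = 296.7
TTFT_SECONDS = 2.450

# The rounded figure from bridgemindai's post.
CLAIMED_TOKENS_PER_SECOND = 300.0


def estimated_seconds(output_tokens: int) -> float:
    """Assumed linear model: time to first token plus steady-state generation."""
    return TTFT_SECONDS + output_tokens / MEDIAN_TOKENS_PER_SECOND


# How far the measured median falls below the claimed 300 tok/s.
shortfall = (CLAIMED_TOKENS_PER_SECOND - MEDIAN_TOKENS_PER_SECOND) / CLAIMED_TOKENS_PER_SECOND
print(f"Shortfall vs. claim: {shortfall:.1%}")                        # 1.1%
print(f"Estimated time for 1,000 tokens: {estimated_seconds(1000):.2f} s")  # 5.82 s
```

On these assumptions, the measured median sits about 1% under the claim, and for short responses the TTFT accounts for a large share of the total wait.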

That speed is part of why the model is getting attention despite the still-visible quality gap in harder coding work. bridgemindai's speed reaction called it "one of the best Chinese models" they had used and emphasized that it runs blazing fast.

The cost claim is thinner than the speed claim. bridgemindai's benchmark post said GLM-5.2 comes in at one tenth the cost of Fable 5, but the evidence pool here does not include an official Z AI pricing page for a like-for-like comparison.

Model switching

The most useful official detail is buried in Z AI's own model switching doc. It says GLM Coding Plan users can swap GLM-5.2 into a coding agent by editing settings.json, setting CLAUDE_CODE_AUTO_COMPACT_WINDOW to 1000000, and mapping both ANTHROPIC_DEFAULT_SONNET_MODEL and ANTHROPIC_DEFAULT_OPUS_MODEL to glm-5.2[1m].
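A minimal sketch of that change, written as a Python helper that merges the three variables into an existing settings dictionary and prints the resulting JSON. The variable names and values are the ones quoted above; placing them under a top-level "env" object and storing the window size as a string are assumptions, so check Z AI's doc for the exact shape before editing a real settings.json. The `with_glm_env` helper and the `EXAMPLE_EXISTING_VAR` entry are hypothetical.

```python
import json

# Variables and values as quoted from Z AI's model switching doc.
GLM_ENV = {
    "CLAUDE_CODE_AUTO_COMPACT_WINDOW": "1000000",  # assumed to be a string value
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "glm-5.2[1m]",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "glm-5.2[1m]",
}


def with_glm_env(settings: dict) -> dict:
    """Return a copy of settings with the GLM-5.2 variables merged in.

    Assumption: environment variables live under a top-level "env" object.
    Existing entries are kept unless they share a name with a GLM variable.
    """
    merged = dict(settings)
    merged["env"] = {**settings.get("env", {}), **GLM_ENV}
    return merged


# Hypothetical pre-existing settings, to show that other entries survive.
existing = {"env": {"EXAMPLE_EXISTING_VAR": "keep-me"}}
print(json.dumps(with_glm_env(existing), indent=2))
```

Both the Sonnet and Opus defaults point at the same slug, so whichever tier the agent requests ends up on GLM-5.2.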

That doc also shows how Z AI is positioning the release: not as a separate interface to learn, but as a model slug that drops into existing Claude Code-style wiring. WesRoth's launch summary added that open-source weights and API access are planned for next week, so the current rollout is still plan-first rather than fully open on day one.
