AI Primer

Z.ai's GLM family of language models.

Pricing

Artificial Analysis · Jun 19, 2026, 1:00 PM
Input / 1M tokens: $0.60
Output / 1M tokens: $2.20
Blended / 1M tokens: $1.00
Output TPS (tokens/s): 117
TTFT (s): 0.64
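
The blended figure is consistent with a weighted average of the input and output prices. The sketch below assumes a 3:1 input-to-output token weighting; this page does not state the ratio, so treat it as an assumption that happens to reproduce the listed $1.00. The function name `blended_price` and its parameters are hypothetical, chosen for illustration only.

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted-average price per 1M tokens.

    The default 3:1 input:output weighting is an assumption, not something
    this page states; change the weights to match your own traffic mix.
    """
    total_weight = input_weight + output_weight
    return (input_weight * input_per_m + output_weight * output_per_m) / total_weight


# Prices listed in the Pricing card above (USD per 1M tokens).
listed = blended_price(0.60, 2.20)
print(f"blended: ${listed:.2f} per 1M tokens")

# An output-heavy agent workload (1:1 mix) costs noticeably more per token.
output_heavy = blended_price(0.60, 2.20, input_weight=1.0, output_weight=1.0)
print(f"1:1 mix: ${output_heavy:.2f} per 1M tokens")
```

Because output tokens cost several times more than input tokens here, workloads with long reasoning traces will land above the blended headline number.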

Model Intelligence

Arena ranking: 38
Benchmarkable: No
Model level: family
Intelligence Index: 33.8
Math Index: 95
MMLU Pro: 0.86
GPQA: 0.86
HLE: 0.25
LiveCodeBench: 0.89
SciCode: 0.45
AIME 2025: 0.95
IFBench: 0.68
LCR: 0.64
TerminalBench Hard: 0.32
TAU2: 0.96

Recent stories

26 linked stories
news · PRIMARY · 2026-06-27
GLM-5.2 ranks 30/99 on PrinzBench as testers report legal hallucinations

PrinzBench added GLM-5.2 and scored it 30/99 for legal research, while a separate LisanBench run placed GLM-5.2-high at #29 and noted high token use. The result matters because it cuts against code-centric GLM hype and points to weak search, statute fidelity, and reasoning on professional legal tasks.

news · SECONDARY · 2026-06-27
OpenRouter reports four open-weight models handle agents; Chinese models hit 45% of traffic

OpenRouter said four open-weight models now handle real agentic workloads, and a JPMorgan report put Chinese models at about 45% of platform traffic. The shift matters because teams are optimizing for price, hosting, and task fit instead of defaulting to frontier APIs.

release · PRIMARY · 2026-06-24
Vercel AI Gateway adds GLM-5.2 Fast at 150-250 tok/s

Vercel and Wafer launched a serverless GLM-5.2 endpoint on AI Gateway with 1M context and published pricing. Teams get a high-throughput open-model option inside an existing gateway instead of managing GLM inference directly.

news · PRIMARY · 2026-06-22
GLM-5.2 gains Perplexity Agent API and Droid support; Baseten reports >280 TPS

GLM-5.2 gained support in Perplexity Agent API and Droid, plus more hosting options, while Baseten reported over 280 TPS and sub-0.8s TTFT. Builders should watch the cost and benchmark data as the model moves into production agent stacks.

news · SECONDARY · 2026-06-22
Fugu Ultra testers report 30-minute runs and 17x GLM cost after launch

Sakana launched Fugu Ultra on AI Gateway and published a technical report, with early testers sharing mixed results. Reports mention polished outputs on some tasks, but also 30-minute runs, uneven coding quality, and much higher cost than GLM-5.2.

news · PRIMARY · 2026-06-20
GLM-5.2 ships to BrowserCode, Hyper, OpenCode, and Together in 3 days

BrowserCode, Hyper, OpenCode, Together, and other vendors added GLM-5.2 soon after release. That turns the open model into a deployable option across coding, browser automation, and hosted chat.

news · PRIMARY · 2026-06-20
GLM-5.2 ranks #1 on DeepSWE with 44% pass@1

Independent results put GLM-5.2 at the top of the open-model DeepSWE board and near the top on debate and post-train evals. Watch token use and long reasoning traces, which can offset its headline price advantage.

news · SECONDARY · 2026-06-20
Wafer claims GLM-5.2 hits 222 tok/s and 12.6s end-to-end

Wafer said its GLM-5.2 deployment leads Artificial Analysis on throughput and latency, and priced usage at $1.20 input and $4.10 output per million tokens. Compare serverless and dedicated endpoints if you need speed at scale.

news · SECONDARY · 2026-06-20
Ollama raises GLM-5.2 cloud capacity on NVIDIA B300s

Ollama said it doubled GPU capacity for GLM-5.2 cloud usage and that the model is currently hosted only in the US. The rollout adds capacity as open-model demand climbs, so users should check hosting and privacy details before deploying.

news · PRIMARY · 2026-06-20
Engineers compare GLM-5.2 local builds: $10k Mac Studio, 17 tok/s, and 2-bit quant tradeoffs

Practitioners published concrete GLM-5.2 self-host numbers, from Mac Studio and 4090-class setups to annualized power and hardware costs. That matters because open weights now offer privacy and rate-limit control, but quant quality, electricity, and latency still keep hosted APIs cheaper for many teams.

news · PRIMARY · 2026-06-19
Engineers report near-Opus planning from GLM-5.2 at about 1/10 the price

Independent tests put GLM-5.2 near Opus 4.8 and GPT-5.5 on planning and coding, and users shared Claude Code, BrowserCode, dcode, and local-serving recipes. It matters because many engineers are treating it as a daily-driver option for text-heavy coding, though teams still report weaker vision and provider limits.

workflow · PRIMARY · 2026-06-18
GLM-5.2 ships in Claude Code, Droid, and 2-bit GGUF workflows

Builders published Claude Code and Droid setups for GLM-5.2 while Unsloth quantized it for local 256GB machines and Hugging Face opened temporary free inference. Teams can now run the open-weight model across hosted, local, and agent workflows.

news · SECONDARY · 2026-06-18
Artificial Analysis launches AA-Briefcase with Claude Fable 5 at 1587 Elo

Artificial Analysis launched AA-Briefcase, a benchmark for multi-week knowledge-work projects with thousands of source files, and Claude Fable 5 leads at 1587 Elo. The first results show a wide cost spread, so teams should compare both quality and task cost before choosing a model.

news · PRIMARY · 2026-06-17
GLM-5.2 ranks #1 on Vals and Design Arena, AA Coding Index hits 50.7

Fresh third-party results put GLM-5.2 atop multiple open-model leaderboards, including the AA Coding Index, Vals Index, Terminal Bench 2.1, and Design Arena. The scores add independent confirmation, though demand spiked enough to strain some providers.

release · PRIMARY · 2026-06-16
Z.ai releases GLM-5.2 open weights with 1M context and 46.2% DeepSWE

Z.ai released GLM-5.2 MIT-licensed open weights with 1M context and broad runtime support. Vendor and arena results put it near frontier closed models on long-horizon coding.

release · SECONDARY · 2026-06-15
Moonshot releases Kimi K2.7 Code HighSpeed at 180 tok/s with 2x API pricing

Moonshot rolled out HighSpeed for Kimi K2.7 Code, claiming about 180 tok/s on coding tasks, up to 260 tok/s on shorter contexts, and roughly 6x speedups. Watch the tight capacity limits and mixed benchmark results, and budget for the 2x pricing if you want the faster mode.

news · SECONDARY · 2026-06-14
Fable users compare GLM-5.2, GPT-5.5, and model panels on one-shot UI work

Two days after Fable 5 went offline, developers started testing GLM-5.2, GPT-5.5, and multi-model panels against the kinds of one-shot frontend and greenfield builds Fable handled well. The early pattern is that replacements cover much of the work, but Fable still leads on UI taste and first-pass product completion.

release · PRIMARY · 2026-06-14
GLM-5.2 ranks #1 on BridgeBench Reasoning at 42.8

GLM-5.2 opened to GLM Coding Plan users and posters claimed #1 BridgeBench scores in BS and Reasoning, with one post citing 1/10th the cost and 300 tokens per second. Early frontend tests still found a gap to Fable 5 and Opus on finer visual details.

release · PRIMARY · 2026-06-13
Z.ai releases GLM-5.2 for Coding Plan users with 1M context and Max mode

Z.ai made GLM-5.2 available to GLM Coding Plan users with High and Max thinking modes and 1M context, and promised API access plus an MIT-licensed open-source release the following week. Early testers reported higher plan pricing, heavy rate limits, and mixed build quality versus Opus and Fable.

workflow · SECONDARY · 2026-05-29
Conductor, CC Mirror, and Codex add Claude-style Dynamic Workflows

A day after Claude Code introduced Dynamic Workflows, builders shipped ports and clones for Codex, Conductor, and GLM-backed CC Mirror. The rapid ports turn the feature into a reusable orchestration pattern rather than an Anthropic-only runtime.

news · PRIMARY · 2026-04-10
GLM-5.1 ranks #3 on Code Arena

Arena ranked GLM-5.1 third on Code Arena and first among open models, putting it on par with Claude Sonnet 4.6 and within about 20 points of the overall lead. The update gives the open model a new frontier coding benchmark after its initial release and hosting wave.

news · PRIMARY · 2026-04-08
GLM-5.1 lands on Modal, Together AI, Letta Code, and Tembo

Providers and agent platforms added GLM-5.1 endpoints across Modal, Together AI, Letta Code, Tembo, and Tabbit, with free trials, no-key access, and 99.9% SLA options. Use the new hosting options to test the model for coding and long-horizon agent workloads without waiting on self-hosting.

release · PRIMARY · 2026-04-07
Z.ai releases GLM-5.1, a 744B open model with 58.4 SWE-Bench Pro and 8-hour agent runs

Z.ai released GLM-5.1, a 744B open model built for long-horizon agentic coding and ranked first among open systems on SWE-Bench Pro. Day-0 support in OpenRouter, Ollama, SGLang, vLLM, OpenCode, and local quantization paths makes it ready to test in existing stacks.

release · PRIMARY · 2026-04-01
Z.ai launches GLM-5V-Turbo for screenshot coding and GUI-agent tasks

Z.ai released GLM-5V-Turbo, a multimodal coding model for screenshots, video, design drafts, and GUI-agent tasks. It keeps text-coding performance steady while adding native vision support, so teams can test visual workflows without swapping models.

release · PRIMARY · 2026-03-28
Z.ai releases GLM-5.1 to all Coding Plan users with 5am–11am PT switch window

Z.ai said GLM-5.1 is now available to all GLM Coding Plan users and highlighted a 5am to 11am PT switch window. The update broadens access beyond the initial rollout, though early practitioner tests reported weaker Repo bench and tool-calling behavior than 5.0.

release · PRIMARY · 2026-03-27
Z.ai releases GLM-5.1 to Coding Plan users with `glm-5.1` model switch

Z.ai made GLM-5.1 available to all Coding Plan users and documented how to route coding agents to it by changing the model name in config. Early harness benchmarks place it near Opus 4.6 on coding evals, but BridgeBench users report much slower tokens per second.
