Arena ranked GLM-5.1 third on Code Arena and first among open models, putting it on par with Claude Sonnet 4.6 and within about 20 points of the overall lead. The update gives the open model a new frontier coding benchmark after its initial release and hosting wave.

You can read the release post, inspect the model overview, and check the live Code Arena leaderboard. The GitHub repo already points developers to API access and agent integrations, and the Hugging Face model page confirms open-weight distribution under the same name.
The headline number is simple: Code Arena lists GLM-5.1 at 1530 Elo, behind only two Claude Opus 4.6 variants in the leaderboard screenshot, while Arena's post called it the first frontier-level open model to crack the top three.
The broader leaderboard context matters too. The Arena leaderboard says the ranking covers agentic coding tasks with multi-step reasoning and tool use, spans 60 models, and is built from 231,158 votes. That makes this a coding-agent benchmark story, not a single-pass code generation story.
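To put the roughly 20-point gap to the overall lead in perspective, the standard logistic Elo formula translates a rating difference into an expected head-to-head win rate. A sketch, assuming Code Arena uses the conventional 400-point logistic scale (that scale is an assumption; Arena-style leaderboards typically fit something Elo-like, but the exact method isn't stated here):

```python
# Expected head-to-head win rate implied by an Elo gap, using the
# standard logistic Elo formula. The 400-point scale is an assumption
# about how Code Arena computes its ratings.
def expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# A ~20-point lead implies only a slight head-to-head edge:
edge = expected_score(1550, 1530)
print(round(edge, 3))  # ≈ 0.529
```

In other words, 20 Elo points corresponds to winning only about 53% of pairwise matchups, which is why "within about 20 points" reads as effectively frontier-level.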
In the official launch post, Z.ai framed GLM-5.1 as its flagship model for agentic engineering and long-horizon work. The company claimed state of the art on SWE-Bench Pro at 58.4, ahead of GPT-5.4 at 57.7, Claude Opus 4.6 at 57.3, and Gemini 3.1 Pro at 54.2.
The same post says GLM-5.1 also widened its gap over GLM-5 on NL2Repo and Terminal-Bench 2.0. That lines up neatly with Arena's claim that the model is already clearing several proprietary coding models on live arena results.
Z.ai's more interesting claim is not the raw benchmark table. In both the model docs and release post, the company keeps hammering on sustained execution: GLM-5.1 is supposed to keep improving over hundreds of rounds and thousands of tool calls instead of peaking early.
The blog backs the claim with three concrete examples of these long-horizon runs. That is Christmas-come-early material for coding-agent nerds, because it treats runtime as a useful variable instead of a fixed budget.
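What "runtime as a variable" looks like in practice: a hypothetical agent-loop skeleton where the round and tool-call budgets are tunable knobs rather than hardcoded constants. This is purely illustrative and does not reflect Z.ai's actual implementation; `step` stands in for one model round that either emits a tool call or signals completion.

```python
# Hypothetical agent loop: round and tool-call budgets as runtime
# parameters. Not Z.ai's implementation; `step` is a stand-in for one
# model round that returns a tool call, or None when the task is done.
def run_agent(step, max_rounds=500, max_tool_calls=5000):
    history = []
    tool_calls = 0
    for _ in range(max_rounds):
        call = step(history)
        if call is None:                 # model declares the task finished
            return {"tool_calls": tool_calls, "done": True}
        tool_calls += 1
        history.append(call)
        if tool_calls >= max_tool_calls:  # budget exhausted mid-task
            break
    return {"tool_calls": tool_calls, "done": False}
```

Under this framing, Z.ai's claim is that quality keeps climbing as `max_rounds` and `max_tool_calls` grow, instead of plateauing early.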
For engineers, the shipping details are in the docs rather than the tweets. The GLM-5.1 overview lists a 200K context window, 128K max output, multiple thinking modes, real-time streaming responses, strong tool invocation, and continuous autonomous work on one task for up to 8 hours.
The migration guide adds the API-level changes: a new model identifier (glm-5.1), streaming tool calls (tool_stream=true), and an explicit thinking-mode parameter (thinking={ type: "enabled" }).

The GitHub repo says GLM-5.1 is available on Z.ai's API platform and compatible with Claude Code and OpenClaw, while the release post says the model is released under the MIT License and rolled into Z.ai's coding plans with temporary off-peak quota discounts through the end of April.
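Assembled into a request body, the migration-guide parameters look roughly like this. Only the model name, tool_stream, and thinking fields come from the guide; the chat-completions request shape and the other fields are assumptions for illustration:

```python
# Sketch of a GLM-5.1 request body using the migration-guide parameters.
# The chat-completions shape is an assumption; only model, tool_stream,
# and thinking are documented in the guide itself.
import json

payload = {
    "model": "glm-5.1",                    # new model identifier
    "messages": [
        {"role": "user", "content": "Fix the failing test in this repo."}
    ],
    "tool_stream": True,                   # stream tool calls as they happen
    "thinking": {"type": "enabled"},       # opt in to thinking mode
    "stream": True,                        # assumed: real-time token streaming
}

body = json.dumps(payload)
```

The payload would then be POSTed to Z.ai's API platform with the usual auth headers; the endpoint path itself isn't stated in the excerpted docs, so it is omitted here.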
One more datapoint landed after the Code Arena buzz. grx_xce said GLM-5.1 now leads every open-weight category on Design Arena as well, which suggests the same release is hitting front-end and visual-generation evals, not just agentic coding.
That post is commentary, not an official benchmark writeup, but it adds a useful closing detail: the GLM-5.1 story on April 10 was no longer just "good open coding model." By the end of the day it was being treated as a broad open-weight frontier contender across multiple arena-style leaderboards.