Skip to content
AI Primer
update

GLM-5.1 ranks #3 on Code Arena

Arena ranked GLM-5.1 third on Code Arena and first among open models, putting it on par with Claude Sonnet 4.6 and within about 20 points of the overall lead. The update gives the open model a new frontier coding benchmark after its initial release and hosting wave.

4 min read
GLM-5.1 ranks #3 on Code Arena
GLM-5.1 ranks #3 on Code Arena

TL;DR

  • According to Arena's ranking update, GLM-5.1 reached #3 on Code Arena, tied roughly with Claude Sonnet 4.6 and marked as the first frontier-level open model to break into the top three.
  • Arena's follow-up said GLM-5.1 remained the #1 open model and sat within about 20 points of the overall lead while scoring ahead of Claude Sonnet 4.6, Opus 4.5, GPT-5.4 High, and Gemini 3.1 Pro.
  • In the official GLM-5.1 announcement, Z.ai positioned the model around Claude Opus 4.6 on coding and long-horizon agent work, then backed that with benchmark wins on SWE-Bench Pro and gains on Terminal-Bench 2.0 and NL2Repo.
  • The GLM-5.1 docs and migration guide add the product details engineers actually care about: 200K context, 128K max output, streaming tool calls, deep-thinking mode, and up to 8-hour autonomous runs on a single task.
  • grx_xce's Design Arena post added a second signal later that day, saying GLM-5.1 now leads every open-weight category on Design Arena too.

You can read the release post, inspect the model overview, and check the live Code Arena leaderboard. The GitHub repo already points developers to API access and agent integrations, and the Hugging Face model page confirms open-weight distribution under the same name.

Code Arena

The headline number is simple: Code Arena lists GLM-5.1 at 1530 Elo, behind only two Claude Opus 4.6 variants in the leaderboard screenshot, while Arena's post called it the first frontier-level open model to crack the top three.

The broader leaderboard context matters too. The Arena leaderboard says the ranking covers agentic coding tasks with multi-step reasoning and tool use, spans 60 models, and is built from 231,158 votes. That makes this a coding-agent benchmark story, not a single-pass code generation story.

Z.ai's own benchmark case

In the official launch post, Z.ai framed GLM-5.1 as its flagship model for agentic engineering and long-horizon work. The company claimed state of the art on SWE-Bench Pro at 58.4, ahead of GPT-5.4 at 57.7, Claude Opus 4.6 at 57.3, and Gemini 3.1 Pro at 54.2.

The same post says GLM-5.1 also widened its gap over GLM-5 on NL2Repo and Terminal-Bench 2.0. That lines up neatly with Arena's claim that the model is already clearing several proprietary coding models on live arena results.

Long-horizon runs

Z.ai's more interesting claim is not the raw benchmark table. In both the model docs and release post, the company keeps hammering on sustained execution: GLM-5.1 is supposed to keep improving over hundreds of rounds and thousands of tool calls instead of peaking early.

The blog gives three concrete examples:

  • A vector database optimization run that went past 600 iterations and 6,000 tool calls, ending at 21.5k QPS.
  • A KernelBench Level 3 run where GLM-5.1 reached 3.6x speedup, still short of Claude Opus 4.6 at 4.2x.
  • An 8-hour browser-based Linux desktop build that kept adding apps and polish over time.

That is Christmas-come-early material for coding agent nerds because it treats runtime as a useful variable instead of a fixed budget.

API and agent hooks

For engineers, the shipping details are in the docs rather than the tweets. The GLM-5.1 overview lists a 200K context window, 128K max output, multiple thinking modes, real-time streaming responses, strong tool invocation, and continuous autonomous work on one task for up to 8 hours.

The migration guide adds the API-level changes:

  • model ID: glm-5.1
  • streaming tool calls via tool_stream=true
  • deep thinking via thinking={ type: "enabled" }
  • larger context and output limits than earlier GLM models

The GitHub repo says GLM-5.1 is available on Z.ai's API platform and compatible with Claude Code and OpenClaw, while the release post says the model is released under the MIT License and rolled into Z.ai's coding plans with temporary off-peak quota discounts through the end of April.

Design Arena

One more datapoint landed after the Code Arena buzz. grx_xce said GLM-5.1 now leads every open-weight category on Design Arena as well, which suggests the same release is hitting front-end and visual-generation evals, not just agentic coding.

That post is commentary, not an official benchmark writeup, but it adds a useful closing detail: the GLM-5.1 story on April 10 was no longer just "good open coding model." By the end of the day it was being treated as a broad open-weight frontier contender across multiple arena-style leaderboards.

Share on X