Workflow · June 18, 2026

GLM-5.2 ships in Claude Code, Droid, and 2-bit GGUF workflows

Builders published Claude Code and Droid setups for GLM-5.2 while Unsloth quantized it for local 256 GB machines and Hugging Face opened a temporary free inference window. Teams can now run the open-weight model across hosted, local, and agent workflows.

TL;DR

Z.ai shipped GLM-5.2 as an MIT-licensed open-weight model with a 1M-token context window, High and Max reasoning modes, and the same API pricing as 5.1, according to Zai_org's launch thread.
Builders immediately wired it into Anthropic-style agent shells: aibuilderclub_'s Claude Code setup shows the env-var swap, while FactoryAI's Droid reply confirms day-one availability there too.
Local use got a second path when UnslothAI's quantization thread compressed GLM-5.2 from 1.51 TB to a 238 GB 2-bit build that they said preserves about 82 percent accuracy.
Hosted access spread fast across providers and agent surfaces. _akhaliq's Hugging Face note flagged a temporary free window, while Together AI, Baseten, and vercel_dev all announced support.
The most telling evidence is hands-on use rather than any single benchmark chart. MaximeRivest's long run logged 178 tool calls in a single document-conversion task, and peakcooper's Pi harness post surfaced a stranger integration artifact: GLM-5.2 sometimes thought it was Claude until it checked the local config.

You can jump straight to the tech blog, skim a vLLM self-hosting guide, and compare that to Unsloth's local run guide. The rollout also scattered into product surfaces quickly: Vercel AI Gateway, Modular Cloud, and OpenRouter all showed up in the first wave.

Claude Code

The Claude Code hack is simple: point Anthropic-compatible environment variables at a GLM endpoint, then set the default Sonnet, Opus, and subagent model names to glm-5.2[1m].

The posted config included these knobs:

ANTHROPIC_BASE_URL for the compatible endpoint
ANTHROPIC_AUTH_TOKEN for the provider key
ANTHROPIC_DEFAULT_SONNET_MODEL=glm-5.2[1m]
ANTHROPIC_DEFAULT_OPUS_MODEL=glm-5.2[1m]
CLAUDE_CODE_SUBAGENT_MODEL=glm-5.2[1m]
CLAUDE_CODE_EFFORT_LEVEL=max
CLAUDE_CODE_AUTO_COMPACT_WINDOW=1000000
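
The variables above can be assembled programmatically before launching the CLI. The following is a minimal sketch, not the posted setup itself: the base URL is a placeholder, GLM_API_KEY is a hypothetical variable name, and it assumes the claude binary is on PATH. The variable names and values are copied from the posted config; whether a given Claude Code version honors each one is not verified here.

```python
import os
import subprocess

# Placeholder only: substitute the Anthropic-compatible endpoint your provider documents.
GLM_BASE_URL = "https://example.invalid/anthropic"
GLM_MODEL = "glm-5.2[1m]"


def glm_claude_code_env(api_key: str, base_url: str = GLM_BASE_URL) -> dict[str, str]:
    """Return a copy of the current environment with the posted overrides applied."""
    env = dict(os.environ)
    env.update(
        {
            "ANTHROPIC_BASE_URL": base_url,
            "ANTHROPIC_AUTH_TOKEN": api_key,
            "ANTHROPIC_DEFAULT_SONNET_MODEL": GLM_MODEL,
            "ANTHROPIC_DEFAULT_OPUS_MODEL": GLM_MODEL,
            "CLAUDE_CODE_SUBAGENT_MODEL": GLM_MODEL,
            "CLAUDE_CODE_EFFORT_LEVEL": "max",
            "CLAUDE_CODE_AUTO_COMPACT_WINDOW": "1000000",
        }
    )
    return env


def launch_claude_code() -> int:
    """Start the CLI with the overrides; GLM_API_KEY is a hypothetical variable name."""
    env = glm_claude_code_env(os.environ["GLM_API_KEY"])
    return subprocess.run(["claude"], env=env, check=False).returncode
```

Building a copy of the environment keeps the overrides scoped to the child process, so the parent shell's normal Claude Code settings stay untouched.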

That is a useful tell about where GLM-5.2 landed first. It was not just another chat model slug. People were treating it as a drop-in backend for existing coding-agent harnesses.

Droid

FactoryAI confirmed GLM-5.2 availability in Droid through direct replies, and users were already posting celebratory clips from the app the same day.

The evidence here is thin on implementation details, but it is enough to show the pattern: GLM-5.2 moved into agent products almost immediately, not weeks later after a separate integration cycle. That lines up with the broader provider wave in ollama's launch thread, which listed Claude Code, Codex App, Hermes Agent, and chat access on day one.

Local GGUF

Unsloth's 2-bit release is the most concrete local-use update in the evidence pool. They said the model shrank from 1.51 TB to 238 GB, an 84 percent reduction, while keeping roughly 82 percent accuracy, and claimed it could run on a 256 GB Mac or mixed RAM and VRAM setups.
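
The quoted sizes can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 TB = 1000 GB) and treats the 256 GB machine size as total memory available to the model, which ignores the operating system and the KV cache.

```python
# Figures quoted from Unsloth's thread; decimal units are an assumption.
ORIGINAL_GB = 1.51 * 1000
QUANTIZED_GB = 238
MACHINE_GB = 256

reduction = 1 - QUANTIZED_GB / ORIGINAL_GB
ratio = ORIGINAL_GB / QUANTIZED_GB
headroom_gb = MACHINE_GB - QUANTIZED_GB

print(f"size reduction: {reduction:.1%}")     # size reduction: 84.2%
print(f"compression ratio: {ratio:.2f}x")     # compression ratio: 6.34x
print(f"headroom on machine: {headroom_gb} GB")  # headroom on machine: 18 GB
```

The 84 percent figure holds up, and the roughly 18 GB of nominal headroom shows how tight the fit is on a 256 GB machine.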

Cedric Chee's follow-up on the GGUFs adds a more practical nuance: dynamic 4-bit and 5-bit quants looked close to lossless, and 4-bit may be the sweet spot for bigger out-of-distribution tasks. That gives the local story two lanes instead of one:

2-bit for getting the full model onto unusually large but still prosumer hardware
4-bit or 5-bit for better quality when the machine can take the extra footprint

The open-weight story gets real when the model can leave the benchmark chart and fit into a machine a single team can actually buy.

Free and hosted endpoints

The fastest adoption signal was provider sprawl. Hugging Face Inference Providers briefly made GLM-5.2 free through Z.ai-backed endpoints, and the launch then fanned out across Together AI, Baseten, Vercel AI Gateway, OpenRouter, Ollama Cloud, Venice, and Modular Cloud.

A few concrete details stood out:

Zai_org's free-window post said the Hugging Face free access lasted only a few hours
vercel_dev's AI Gateway post called GLM-5.2 Z.ai's first 1M-context model and said it hit 88 percent on Next.js evals, or 96 percent with AGENTS.md
AskVenice's launch post exposed High and Max reasoning modes inside a privacy-focused product surface
AskVenice's privacy-mode post added TEE and E2EE support for GLM-5.2 specifically
opencode's demand update said usage spiked to 3x normal levels
ollama's capacity post said Ollama doubled GPU capacity to keep up

There was also a self-hosting route. vllm_project's guide thread framed GLM-5.2 as a drop-in tool-calling backend for OpenAI Responses API compatible agents, which is the piece that matters if a team wants the model behind its own GPUs rather than another vendor endpoint.
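
To make the tool-calling claim concrete, here is a rough sketch of the request body an agent might send to a self-hosted, Responses-API-compatible server. The base URL, the served model name, and the read_file tool are all hypothetical; the actual values depend on how the server is launched and which tools the agent exposes. The sketch only builds the JSON and does not contact a server.

```python
import json

# Hypothetical values: both depend on how the self-hosted server is started.
BASE_URL = "http://localhost:8000/v1"
SERVED_MODEL = "glm-5.2"


def build_responses_request(prompt: str) -> dict:
    """Build a Responses-API-style request body that offers one function tool."""
    read_file_tool = {
        "type": "function",
        "name": "read_file",  # hypothetical tool, for illustration only
        "description": "Return the contents of a text file in the workspace.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    }
    return {"model": SERVED_MODEL, "input": prompt, "tools": [read_file_tool]}


body = build_responses_request("Summarize README.md")
print(json.dumps(body, indent=2))
# An agent would POST this JSON to f"{BASE_URL}/responses", run any function
# calls the model returns, and send the results back in a follow-up request.
```

Because the request shape is the same one existing agents already emit, swapping the backend is mostly a matter of changing the base URL and model name.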

Agent runs

The early hands-on reports were less about one-shot codegen and more about sustained runs. MaximeRivest's PDF conversion thread described a single prompt that turned a blog into a Remarkable-friendly PDF with 178 tool calls, 64,000 tokens, and a $1.75 bill.

Other examples pushed the same direction:

cedric_chee's Discord clone test said GLM-5.2 spent about an hour building a minimal Discord clone from a vague prompt
haider1's long refactor note said it held context across a 12-step refactor and felt more reliable on tool calling than earlier open models
AskVenice's retro game walkthrough published a full browser-game workflow around one self-contained HTML file
yacineMTB's reverse engineering post pointed to people using it for decompilation and assembly work

Those are messy, real agent tasks, and they match the launch's agent-workflow framing better than isolated snippet tests would.

Model identity glitches

One of the weirder day-one findings came from Pi harness users. peakcooper's Pi harness post said GLM-5.2 insisted it was Claude from Anthropic until it inspected the local agent config, and peakcooper's follow-up added that the harness used only a minimal system prompt that did not name the model.

That does not prove anything broader than a bad identity prior, but it is a useful artifact of the current compatibility stack. When open models are dropped behind Claude-style or OpenAI-style tool shells, the surrounding harness can shape the model's self-description in odd ways.
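
One way a harness could remove the ambiguity is to state the model name in the system prompt instead of leaving the model to guess. The sketch below is purely illustrative and is not how the Pi harness works: the config keys model and provider are hypothetical, and nothing in the reports confirms that this would eliminate the behavior.

```python
BASE_PROMPT = "You are a coding agent operating inside a tool-calling harness."


def build_system_prompt(config: dict) -> str:
    """Append an explicit identity line when the (hypothetical) config names a model."""
    model = config.get("model")
    if not model:
        # Mirrors the reported case: a minimal prompt that does not name the model.
        return BASE_PROMPT
    provider = config.get("provider")
    identity = f"The underlying model is {model}"
    if provider:
        identity += f", served by {provider}"
    return f"{BASE_PROMPT} {identity}."


print(build_system_prompt({"model": "glm-5.2", "provider": "Z.ai"}))
```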

Missing vision

The cleanest caveat in the evidence pool is also the simplest one: GLM-5.2 is text-only. jeremyphoward's follow-up called image handling the big gap, even though jeremyphoward's first impression had praised the model's speed, long-context behavior, and judgment.

That matters because the rest of the rollout makes GLM-5.2 look unusually ready for agent workflows. Claude Code shims, Droid support, private hosted surfaces, and self-hosted vLLM recipes all arrived quickly. Multimodal agent work still needs something else.