Skip to content
AI Primer
workflow

GLM-5.2 ships in Claude Code, Droid, and 2-bit GGUF workflows

Builders published Claude Code and Droid setups for GLM-5.2 while Unsloth quantized it for local 256GB machines and Hugging Face opened temporary free inference. Teams can now run the open-weight model across hosted, local, and agent workflows.

6 min read
GLM-5.2 ships in Claude Code, Droid, and 2-bit GGUF workflows
GLM-5.2 ships in Claude Code, Droid, and 2-bit GGUF workflows

TL;DR

You can jump straight to the tech blog, skim a vLLM self-hosting guide, and compare that to Unsloth's local run guide. The rollout also scattered into product surfaces quickly: Vercel AI Gateway, Modular Cloud, and OpenRouter all showed up in the first wave.

Claude Code

The Claude Code hack is simple: point Anthropic-compatible environment variables at a GLM endpoint, then set the default Sonnet, Opus, and subagent model names to glm-5.2[1m].

The posted config included these knobs:

  • ANTHROPIC_BASE_URL for the compatible endpoint
  • ANTHROPIC_AUTH_TOKEN for the provider key
  • ANTHROPIC_DEFAULT_SONNET_MODEL=glm-5.2[1m]
  • ANTHROPIC_DEFAULT_OPUS_MODEL=glm-5.2[1m]
  • CLAUDE_CODE_SUBAGENT_MODEL=glm-5.2[1m]
  • CLAUDE_CODE_EFFORT_LEVEL=max
  • CLAUDE_CODE_AUTO_COMPACT_WINDOW=1000000

That is a useful tell about where GLM-5.2 landed first. It was not just another chat model slug. People were treating it as a drop-in backend for existing coding-agent harnesses.

Droid

FactoryAI confirmed GLM-5.2 availability in Droid through direct replies, and users were already posting celebratory clips from the app the same day.

The evidence here is thin on implementation details, but it is enough to show the pattern: GLM-5.2 moved into agent products almost immediately, not weeks later after a separate integration cycle. That lines up with the broader provider wave in ollama's launch thread, which listed Claude Code, Codex App, Hermes Agent, and chat access on day one.

Local GGUF

Unsloth's 2-bit release is the most concrete local-use update in the evidence pool. They said the model shrank from 1.51 TB to 238 GB, an 84 percent reduction, while keeping roughly 82 percent accuracy, and claimed it could run on a 256 GB Mac or mixed RAM and VRAM setups.

Cedric Chee's follow-up on the GGUFs adds a more practical nuance: dynamic 4-bit and 5-bit quants looked close to lossless, and 4-bit may be the sweet spot for bigger out-of-distribution tasks. That gives the local story two lanes instead of one:

  • 2-bit for getting the full model onto unusually large but still prosumer hardware
  • 4-bit or 5-bit for better quality when the machine can take the extra footprint

The open-weight story gets real when the model can leave the benchmark chart and fit into a machine a single team can actually buy.

Free and hosted endpoints

The fastest adoption signal was provider sprawl. Hugging Face Inference Providers briefly made GLM-5.2 free through Z.ai-backed endpoints, and the launch then fanned out across Together AI, Baseten, Vercel AI Gateway, OpenRouter, Ollama Cloud, Venice, and Modular Cloud.

A few concrete details stood out:

There was also a self-hosting route. vllm_project's guide thread framed GLM-5.2 as a drop-in tool-calling backend for OpenAI Responses API compatible agents, which is the piece that matters if a team wants the model behind its own GPUs rather than another vendor endpoint.

Agent runs

The early hands-on reports were less about one-shot codegen and more about sustained runs. MaximeRivest's PDF conversion thread described a single prompt that turned a blog into a Remarkable-friendly PDF with 178 tool calls, 64,000 tokens, and a $1.75 bill.

Other examples pushed the same direction:

Those are messy, real agent tasks. That is a better fit for the launch framing than isolated snippet tests.

Model identity glitches

One of the weirder day-one findings came from Pi harness users. peakcooper's Pi harness post said GLM-5.2 insisted it was Claude from Anthropic until it inspected the local agent config, and peakcooper's follow-up added that the harness used only a minimal system prompt that did not name the model.

That does not prove anything broader than a bad identity prior, but it is a useful artifact of the current compatibility stack. When open models are dropped behind Claude-style or OpenAI-style tool shells, the surrounding harness can shape the model's self-description in odd ways.

Missing vision

The cleanest caveat in the evidence pool is also the simplest one: GLM-5.2 is text-only. jeremyphoward's follow-up called image handling the big gap after otherwise praising the model's speed, long-context behavior, and judgment in jeremyphoward's first impression.

That matters because the rest of the rollout makes GLM-5.2 look unusually ready for agent workflows. Claude Code shims, Droid support, private hosted surfaces, and self-hosted vLLM recipes all arrived quickly. Multimodal agent work still needs something else.

Further reading

Discussion across the web

Where this story is being discussed, in original context.

On X· 4 threads
TL;DR3 posts
Droid1 post
Free and hosted endpoints8 posts
Agent runs3 posts
Share on X