Skip to content
AI Primer
release

Morph supports Qwen, GLM-5.2, MiniMax M3, DeepSeek v4 with 20-35% higher code acceptance

Morph said its code-serving stack now exposes Qwen, GLM-5.2, MiniMax M3, and DeepSeek v4 with code-tuned speculative decoding. It claims 20-35% higher acceptance than Eagle 3.1 or DFlash, plus kernels for cheaper hardware.

3 min read
Morph supports Qwen, GLM-5.2, MiniMax M3, DeepSeek v4 with 20-35% higher code acceptance
Morph supports Qwen, GLM-5.2, MiniMax M3, DeepSeek v4 with 20-35% higher code acceptance

TL;DR

  • morphllm's launch post says Morph now serves Qwen, GLM 5.2, MiniMax M3, and DeepSeek v4 through a code-focused inference stack.
  • In morphllm's speculative decoding post, the company claims code-tuned speculative models deliver 20 to 35 percent higher acceptance rates than Eagle 3.1 or DFlash.
  • morphllm's kernel post says part of the speed and cost story comes from writing kernels and systems for discounted hardware with weaker interconnects or worse uptime.
  • According to morphllm's access note, these models are available either through OpenRouter or directly from Morph.

You can jump straight to the supported-model list, inspect Morph's acceptance-rate claim, and see that its kernel work is aimed at oddball H200 and B200 deployments rather than only ideal NVLink boxes. The access post is also unusually direct: OpenRouter or direct, nothing more dressed up than that.

Supported models

The concrete product update is simple: Morph is positioning itself as a serving layer for open models used in coding, not just a wrapper around one flagship model. In morphllm's announcement, the launch set is Qwen, GLM 5.2, MiniMax M3, and DeepSeek v4.

That list matters because it spans several open-model families rather than one vendor line:

  • Qwen
  • GLM 5.2
  • MiniMax M3
  • DeepSeek v4

Speculative decoding

The sharper technical claim lives in morphllm's follow-up: Morph says it trains speculative models specifically for coding. The stated result is 20 to 35 percent higher acceptance rates than off-the-shelf Eagle 3.1 or DFlash.

Morph does not give benchmark methodology, prompt mix, or latency breakdown in the tweet thread. What it does make clear is the optimization target: higher speculative-token acceptance on code workloads, not generic decoding throughput.

Kernel work on cheaper hardware

Morph's kernel post ties the serving story to infrastructure, not just model routing. The company says H200 and B200 systems usually arrive with NVLink, but cheaper supply can mean slow interconnects and weaker uptime, so it writes kernels and systems meant to stay reliable in those setups.

That is a more specific claim than generic cost optimization. The target is discounted hardware that behaves worse than the clean, tightly connected clusters most inference marketing assumes.

Access paths

Access is the least ambiguous part of the rollout. In morphllm's access note, the company says the models can be used on OpenRouter or directly from Morph.

That makes this look less like a closed hosted feature and more like a distribution play: Morph is selling its code-serving stack both through a popular routing surface and through its own endpoint.

Share on X