AI Primer

Unsloth updates Qwen3.5 MTP GGUFs with draft-mtp flags for 1.8x speed

Unsloth said its updated Qwen3.5 MTP GGUFs now run about 1.8x faster after llama.cpp added a --spec-draft-p-min 0.75 acceptance threshold and renamed the mode to draft-mtp. The update also raises draft-token settings and expands the small-model MTP set for local runners.


TL;DR

Jump straight to Unsloth's MTP guide, or inspect the exact llama.cpp change in PR #22673. The weirdest detail is that the rollout landed with a naming correction: the small MTP quants initially described as Qwen3.6 were later corrected to Qwen3.5 in danielhanchen's follow-up.

Speedup jump

The before-and-after here is unusually concrete. The May 13 post put average MTP speedup around 1.4x for dense models and 1.15 to 1.2x for the MoE model, with 140 tokens/s on Qwen3.6 27B MTP and 220 tokens/s on 35B-A3B MTP.

Two days later, danielhanchen's update said the same setup now reaches about 1.8x. The stated reason is a new acceptance threshold, --spec-draft-p-min 0.75, added in llama.cpp PR #22673.

The original release also came with a limit: in the May 13 benchmarks, Unsloth recommended no more than 2 draft tokens because acceptance dropped from 83% to 50% at 4 draft tokens.
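That 2-draft recommendation is consistent with the standard speculative-decoding arithmetic. As a rough sketch, treating the quoted percentages as independent per-token acceptance probabilities (an assumption; the article does not say how acceptance is measured), the expected tokens emitted per target-model pass is a geometric sum, and 2 drafts at 83% beats 4 drafts at 50%:

```python
def expected_tokens_per_pass(k: int, p: float) -> float:
    """Expected tokens emitted per target-model verification pass in
    speculative decoding: each of the k draft tokens is accepted with
    independent per-token probability p, and the target model always
    contributes one token itself. Equals (1 - p**(k + 1)) / (1 - p)."""
    return sum(p ** i for i in range(k + 1))

# 2 drafts at 83% acceptance: ~2.52 tokens per pass
print(expected_tokens_per_pass(2, 0.83))
# 4 drafts at 50% acceptance: ~1.94 tokens per pass
print(expected_tokens_per_pass(4, 0.50))
```

Under this simplified model, pushing to 4 draft tokens at the lower acceptance rate actually yields fewer tokens per verification pass, which matches Unsloth's advice.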

Flags and fallback behavior

The updated branch changed the command surface in three places:

- a new acceptance threshold, --spec-draft-p-min 0.75, from llama.cpp PR #22673
- the speculative mode renamed to draft-mtp
- raised draft-token settings

There is also a rollback knob. For users seeing regressions on the updated branch, danielhanchen's update says setting --spec-draft-p-min back to 0.0 restores the old behavior.
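Putting those knobs together, a launch command might look like the sketch below. Only --spec-draft-p-min and its 0.75 and 0.0 values come from the article; the binary name, model filename, and the draft-token flag spelling are illustrative assumptions, not confirmed syntax.

```shell
# Hypothetical llama.cpp server launch with the updated MTP settings.
# Only --spec-draft-p-min is named in the article (llama.cpp PR #22673);
# the model filename and the --draft-max spelling are assumptions.
./llama-server \
  -m qwen-mtp.gguf \
  --spec-draft-p-min 0.75 \
  --draft-max 2

# Rollback knob: restore the pre-PR behavior by zeroing the threshold.
# ./llama-server -m qwen-mtp.gguf --spec-draft-p-min 0.0 --draft-max 2
```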

Model coverage and the naming correction

The first release, in the May 13 announcement, centered on two bigger checkpoints: Qwen3.6 27B and Qwen3.6 35B-A3B. The May 15 update added small-model MTP GGUFs, but danielhanchen's correction says those are actually Qwen3.5 0.8B, 2B, 4B, and 9B.

That same correction also sketches the next batch: the follow-up says Unsloth is working on Qwen3.5-122B and Qwen3.5-397B MTP variants. For local runners, that makes this story less about one benchmark jump and more about MTP support spreading across the Qwen GGUF lineup.
