Unsloth updates Qwen3.5 MTP GGUFs with draft-mtp flags for 1.8x speed
Unsloth said its updated Qwen3.5 MTP GGUFs now run about 1.8x faster after llama.cpp added --spec-draft-p-min 0.75 and renamed the mode to draft-mtp. The update also raises the recommended draft-token setting and expands the small-model MTP set for local runners.

TL;DR
- Unsloth said its updated MTP GGUFs now run about 1.8x faster, up from the roughly 1.4x figures that danielhanchen's update contrasted with his May 13 launch post.
- The speed gain is tied to a llama.cpp change that danielhanchen's update attributes to --spec-draft-p-min 0.75, with the implementation living in llama.cpp PR #22673.
- The CLI changed too: danielhanchen's update says --spec-type mtp became --spec-type draft-mtp, and the recommended draft-token ceiling moved from 2 in the original launch post to 6 in the updated branch.
- Unsloth expanded the lineup beyond the original 27B and 35B-A3B release that the first announcement benchmarked, and a follow-up correction says the newly posted small-model MTP GGUFs are Qwen3.5 0.8B, 2B, 4B, and 9B, not Qwen3.6.
Unsloth's MTP guide covers the setup, and llama.cpp PR #22673 shows the exact change. The oddest detail is that the rollout landed with a naming correction: the small MTP quants initially described as Qwen3.6 were later corrected to Qwen3.5 in danielhanchen's follow-up.
Speedup jump
The before-and-after here is unusually concrete. The May 13 post put average MTP speedup around 1.4x for dense models and 1.15 to 1.2x for the MoE model, with 140 tokens/s on Qwen3.6 27B MTP and 220 tokens/s on 35B-A3B MTP.
Two days later, danielhanchen's update said the same setup now reaches about 1.8x. The stated reason is a new acceptance threshold, --spec-draft-p-min 0.75, added in llama.cpp PR #22673.
The original release also came with a limit: the May 13 benchmarks said Unsloth did not recommend more than 2 draft tokens because acceptance dropped from 83% to 50% at 4 draft tokens.
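One way to picture why a probability floor could make longer drafts worthwhile is the sketch below. It assumes the threshold stops drafting as soon as the draft head's confidence in its next token falls below the floor, which is an interpretation of the flag rather than something the posts spell out. The function name, tokens, and probabilities are hypothetical and are not taken from llama.cpp.

```python
def trim_draft(draft, p_min, n_max):
    """Keep draft tokens until one falls below p_min or n_max tokens are kept.

    draft: list of (token, probability) pairs proposed by the draft head.
    This is an illustrative model of a confidence cutoff, not llama.cpp code.
    """
    kept = []
    for token, prob in draft[:n_max]:
        if prob < p_min:
            break  # low-confidence token: stop drafting here
        kept.append(token)
    return kept


# Hypothetical draft with made-up probabilities.
draft = [("the", 0.97), ("quick", 0.91), ("brown", 0.62), ("fox", 0.88)]

with_floor = trim_draft(draft, p_min=0.75, n_max=6)  # stops before "brown"
no_floor = trim_draft(draft, p_min=0.0, n_max=6)     # keeps every proposal
print(with_floor)
print(no_floor)
```

Under this reading, a higher ceiling costs little when the draft head is unsure, because the draft is cut short before the weak tokens are sent for verification.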
Flags and fallback behavior
The updated branch changed the command surface in three places:
- --spec-type mtp became --spec-type draft-mtp, according to danielhanchen's update.
- --spec-draft-n-max moved from the earlier 2 in the follow-up details to 6 in the newer update.
- A hybrid path now exists: danielhanchen's update says ngram-mod can be combined with draft MTP as --spec-type ngram-mod,draft-mtp.
There is also a rollback knob. For users seeing regressions on the updated branch, danielhanchen's update says setting --spec-draft-p-min back to 0.0 restores the old behavior.
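The flag changes above can be collected into a small helper that assembles an argument list. This is a minimal sketch: the three --spec-* flags and their values come from the update as described here, while the llama-server binary name, the -m model flag, the model filename, and the helper itself are assumptions for illustration. Check the flags against your own llama.cpp build before relying on them.

```python
import shlex


def build_args(model_path, p_min=0.75, n_max=6, hybrid=False):
    """Assemble a llama.cpp-style argument list for draft MTP.

    The binary name and "-m" are assumptions; the --spec-* flags are the
    ones named in the update. Pass p_min=0.0 for the described rollback.
    """
    spec_type = "ngram-mod,draft-mtp" if hybrid else "draft-mtp"
    return [
        "llama-server",  # assumed binary name
        "-m", model_path,
        "--spec-type", spec_type,
        "--spec-draft-n-max", str(n_max),
        "--spec-draft-p-min", str(p_min),
    ]


# Hypothetical model filename, used only to show the resulting command line.
print(shlex.join(build_args("model-mtp.gguf")))
print(shlex.join(build_args("model-mtp.gguf", p_min=0.0)))  # rollback variant
```

Building the list in one place keeps the rollback (p_min=0.0) and the hybrid mode as one-argument changes instead of hand-edited command lines.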
Model coverage and the naming correction
The first release, covered in the May 13 announcement, centered on two larger checkpoints: Qwen3.6 27B and Qwen3.6 35B-A3B. The May 15 update added small-model MTP GGUFs, but danielhanchen's correction says those are actually Qwen3.5 0.8B, 2B, 4B, and 9B.
That same correction also sketches the next batch: the follow-up says Unsloth is working on Qwen3.5-122B and Qwen3.5-397B MTP variants. For local runners, that makes this story less about one benchmark jump and more about MTP support spreading across the Qwen GGUF lineup.