Fast and memory-efficient fine-tuning for LLMs
Open-source Python library for fast, memory-efficient fine-tuning and inference of large language models.
Unsloth says its updated Qwen3.5 MTP (multi-token prediction) GGUFs now run about 1.8x faster, following llama.cpp changes that added spec-draft-p-min 0.75 and renamed the mode to draft-mtp. The update also raises the draft-token settings and expands the set of small models with MTP support for local runners.
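
As a rough illustration of what a minimum draft probability such as 0.75 might control, the toy sketch below stops drafting as soon as the drafter's confidence in its next token falls below the threshold. It assumes the setting acts as a confidence cutoff on drafted tokens; it is not llama.cpp or Unsloth code, and the names draft_tokens, toy_propose, and max_draft are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical drafter signature: given the tokens so far, return
# (next_token, probability the drafter assigns to that token).
Propose = Callable[[List[int]], Tuple[int, float]]


def draft_tokens(
    propose: Propose,
    context: List[int],
    max_draft: int,
    p_min: float = 0.75,
) -> List[int]:
    """Collect up to max_draft tokens, stopping early on low confidence."""
    drafted: List[int] = []
    for _ in range(max_draft):
        token, prob = propose(context + drafted)
        if prob < p_min:
            # Assumed behaviour: a low-confidence draft is unlikely to be
            # accepted by the main model, so stop drafting here.
            break
        drafted.append(token)
    return drafted


def toy_propose(seq: List[int]) -> Tuple[int, float]:
    """Toy drafter whose confidence follows a fixed, made-up schedule."""
    probs = [0.95, 0.9, 0.8, 0.6, 0.99]
    i = len(seq)
    prob = probs[i] if i < len(probs) else 0.0
    return i, prob


if __name__ == "__main__":
    print(draft_tokens(toy_propose, [], max_draft=8))
```

In this toy model, a higher threshold yields shorter but more reliable drafts, while a larger max_draft only helps when the drafter stays confident for long stretches.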