Quantization-Aware Training

Apply fake quantization during training or fine-tuning to improve final quantized model accuracy.

Quantization-Aware Training (QAT) applies fake quantization during model training or fine-tuning so that the converted quantized model can retain better accuracy or perplexity than it would with post-training quantization. In torchao, QAT is a two-step workflow: a prepare step inserts fake-quantized layers before training, and a convert step replaces them with quantized operations afterward. Both steps are driven through torchao APIs, including QATConfig and quantize_.
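
To make "fake quantization" concrete, here is a minimal sketch in plain Python of the quantize-then-dequantize round trip that a fake-quantized layer simulates. It is not torchao code: the function name fake_quantize, the symmetric per-tensor scheme, and the 8-bit default are assumptions chosen for illustration. Real QAT also needs a gradient rule for the rounding step (commonly a straight-through estimator), which this sketch leaves out.

```python
def fake_quantize(values, num_bits=8):
    """Hypothetical helper: symmetric per-tensor fake quantization.

    Each value is snapped to a signed integer grid and immediately mapped
    back to float, so the result stays in floating point but carries the
    rounding error that real quantization would introduce.
    """
    qmax = 2 ** (num_bits - 1) - 1   # 127 for 8 bits
    qmin = -qmax - 1                 # -128 for 8 bits

    # One scale for the whole list, chosen so the largest magnitude maps to qmax.
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0

    dequantized = []
    for v in values:
        q = round(v / scale)             # quantize (Python rounds half to even)
        q = max(qmin, min(qmax, q))      # clamp to the integer range
        dequantized.append(q * scale)    # dequantize back to float
    return dequantized, scale


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.003, 1.0]
    simulated, scale = fake_quantize(weights)
    for original, approx in zip(weights, simulated):
        print(f"{original:+.4f} -> {approx:+.4f} (error {abs(original - approx):.4f})")
```

Training against the simulated values exposes the model to quantization error before conversion, which is the intuition behind the accuracy benefit described above. Small weights such as 0.003 collapse to zero on this grid, which shows the kind of error the model has to learn to tolerate.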