Quantization-Aware Training
Apply fake quantization during training or fine-tuning to improve final quantized model accuracy.
Quantization-Aware Training (QAT) is a torchao workflow that applies fake quantization during model training or fine-tuning, so that the converted quantized model can retain better accuracy or perplexity than one produced by post-training quantization. In torchao, QAT has two steps, prepare and convert, driven by the QATConfig and quantize_ APIs: prepare inserts fake-quantized layers before training, and convert replaces them with quantized operations afterward.
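To make "fake quantization" concrete, here is a minimal sketch in plain Python. It is not torchao's implementation: the function name fake_quantize, the symmetric per-tensor int8 scheme, and the sample weights are all assumptions chosen for illustration. It rounds each value onto an integer grid and immediately maps it back to floating point, so the values stay floats but carry the rounding error a quantized model would see.

```python
def fake_quantize(values, num_bits=8):
    """Hypothetical helper: quantize to a symmetric signed integer grid,
    then dequantize back to float (assumed per-tensor scheme)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round onto the integer grid and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    # Map back to float: this is what the forward pass sees during training.
    dequantized = [q * scale for q in quantized]
    return dequantized, quantized, scale


weights = [0.5, -1.27, 0.003, 1.0]  # illustrative values, not real model weights
fake, ints, scale = fake_quantize(weights)

for w, f, q in zip(weights, fake, ints):
    print(f"original={w:+.4f}  fake-quantized={f:+.4f}  integer={q}")
```

In this sketch, the dequantized floats play the role of what a fake-quantized layer computes with during training, while the integers and the scale correspond to what a convert step would keep in the final quantized model. A real training setup also needs a way to pass gradients through the rounding step, commonly a straight-through estimator, which this sketch omits.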