Nnieqat is a quantize aware training package for Neural Network Inference Engine(NNIE) on pytorch, it uses hisilicon quantization library to quantize module's weight and activation as fake fp32 format.
-
Supported Platforms: Linux
-
Accelerators and GPUs: NVIDIA GPUs via CUDA driver 10.1 or 10.2.
-
Dependencies:
- python >= 3.5, < 4
- llvmlite >= 0.31.0
- pytorch >= 1.5
- numba >= 0.42.0
- numpy >= 1.18.1
-
Install nnieqat via pypi:
$ pip install nnieqat
-
Install nnieqat in docker(easy way to solve environment problems):
$ cd docker $ docker build -t nnieqat-image .
-
Install nnieqat via repo:
$ git clone https://github.com/aovoc/nnieqat-pytorch $ cd nnieqat-pytorch $ make install
-
add quantization hook.
quantize and dequantize weight and data with HiSVP GFPQ library in forward() process.
from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook ... ... register_quantization_hook(model) ...
-
merge bn weight into conv and freeze bn
suggest finetuning from a well-trained model, merge_freeze_bn at beginning. do it after a few epochs of training otherwise.
from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook ... ... model.train() model = merge_freeze_bn(model) #it will change bn to eval() mode during training ...
-
Unquantize weight before update it
from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook ... ... model.apply(unquant_weight) # using original weight while updating optimizer.step() ...
-
Dump weight optimized model
from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook ... ... model.apply(quant_dequant_weight) save_checkpoint(...) model.apply(unquant_weight) ...
-
Using EMA with caution(Not recommended).
-
Cifar10 quantization aware training example (add nnieqat into pytorch_cifar10_tutorial)
python test/test_cifar10.py
-
ImageNet quantization finetuning example (add nnieqat into pytorh_imagenet_main.py)
python test/test_imagenet.py --pretrained path_to_imagenet_dataset
-
ImageNet
python test/test_imagenet.py /data/imgnet/ --arch squeezenet1_1 --lr 0.001 --pretrained --epoch 10 # nnie_lr_e-3_ft python pytorh_imagenet_main.py /data/imgnet/ --arch squeezenet1_1 --lr 0.0001 --pretrained --epoch 10 # lr_e-4_ft python test/test_imagenet.py /data/imgnet/ --arch squeezenet1_1 --lr 0.0001 --pretrained --epoch 10 # nnie_lr_e-4_ft
finetune result:
trt_fp32 trt_int8 nnie torchvision 0.56992 0.56424 0.56026 nnie_lr_e-3_ft 0.56600 0.56328 0.56612 lr_e-4_ft 0.57884 0.57502 0.57542 nnie_lr_e-4_ft 0.57834 0.57524 0.57730 -
coco
net: simplified yolov5s
train 300 epoches, hi3559 test result:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.338
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.540
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.357
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.187
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.377
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.445
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.284
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.484
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.542
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.357
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.595
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.679
finetune 20 epoches, hi3559 test result:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.339
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.539
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.360
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.191
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.378
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.446
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.285
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.485
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.544
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.361
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.596
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.683
- Generate quantized model directly.
HiSVP 量化库使用指南
Quantizing deep convolutional networks for efficient inference: A whitepaper