Develop (PaddlePaddle#1379)

* cp dev -> 2.3
haohongxiang · Nov 2, 2021 · dfb26b6
1 parent 796aa90
commit dfb26b6
Showing 40 changed files with 812 additions and 302 deletions.
diff --git a/README_ch.md b/README_ch.md
@@ -7,32 +7,26 @@
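 PaddleClas, the PaddlePaddle image recognition suite, is a toolset for image recognition tasks built for industry and academia, helping users train better vision models and deploy them in real applications.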
 
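 **Recent updates**
-- 2021.10.31 Released the [PP-ShiTu technical report](./docs/PP_ShiTu.pdf), improved the documentation, and added a beverage recognition demo
-- 2021.10.23 Released the PP-ShiTu image recognition system, which recognizes an image against a gallery of 100,000+ images in 200 ms on CPU.
+
+- 2021.11.1 Released the [PP-ShiTu technical report](https://arxiv.org/pdf/2111.00775.pdf) and added a beverage recognition demo
+- 2021.10.23 Released PP-ShiTu, a lightweight image recognition system that recognizes an image against a gallery of 100,000+ images in 0.2 s on CPU.
 [Click here](./docs/zh_CN/quick_start/quick_start_recognition.md) to try it now
-- 2021.09.17 Added the PP-LCNet series of models developed by PaddleClas, which are highly competitive on Intel CPUs. For an introduction to PP-LCNet, see the [paper](https://arxiv.org/pdf/2109.15099.pdf) or the [PP-LCNet model introduction](docs/zh_CN/models/PP-LCNet.md); metrics and pretrained weights can be downloaded [here](docs/zh_CN/ImageNet_models_cn.md).
+- 2021.09.17 Released the PP-LCNet series of ultra-lightweight backbone models. On Intel CPUs, single-image inference takes about 5 ms, and Top-1 accuracy on the ImageNet-1K dataset reaches 80.82%, surpassing ResNet152. For an introduction to PP-LCNet, see the [paper](https://arxiv.org/pdf/2109.15099.pdf) or the [PP-LCNet model introduction](docs/zh_CN/models/PP-LCNet.md); metrics and pretrained weights can be downloaded [here](docs/zh_CN/algorithm_introduction/ImageNet_models.md).
 - [more](./docs/zh_CN/others/update_history.md)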
 
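 ## Features
 
-- PP-ShiTu lightweight image recognition system: integrates object detection, feature learning, and image retrieval modules, and applies to a wide range of image recognition tasks.
-It recognizes an image against a gallery of 100,000+ images in 200 ms on CPU.
-For details, see [PP-ShiTu: A Practical Lightweight Image Recognition System](./docs/PP_ShiTu.pdf)
+- PP-ShiTu lightweight image recognition system: integrates object detection, feature learning, and image retrieval modules, and applies to a wide range of image recognition tasks. It recognizes an image against a gallery of 100,000+ images in 0.2 s on CPU.
 
-- PP-LCNet lightweight CPU backbone: a lightweight backbone network built specifically for CPU devices, surpassing competing models in both speed and accuracy.
-For details, see [PP-LCNet: A Lightweight CPU Convolutional Neural Network](https://arxiv.org/pdf/2109.15099.pdf),
-or the [PP-LCNet model introduction](docs/zh_CN/models/PP-LCNet.md).
+- PP-LCNet lightweight CPU backbone: a lightweight backbone network built specifically for CPU devices, far surpassing competing models in both speed and accuracy.
 
-- Rich pretrained model zoo: provides 164 ImageNet pretrained models across 35 series, with 6 selected series supporting quick structural modification.
+- Rich pretrained model zoo: provides 175 ImageNet pretrained models across 36 series, with 7 selected series supporting quick structural modification.
 
 - Comprehensive, easy-to-use feature learning components: integrates 12 metric learning methods such as arcmargin and triplet loss, which can be freely combined and switched through configuration files.
 
 - SSLD knowledge distillation: 14 classification pretrained models, with accuracy generally improved by more than 3%; the ResNet50_vd model reaches 84.0% Top-1 accuracy on the ImageNet-1k dataset,
 and the Res2Net200_vd pretrained model reaches 85.1% Top-1 accuracy.
 
-- Data augmentation: supports 8 data augmentation algorithms such as AutoAugment, Cutout, and Cutmix, with detailed introductions, code reproductions, and evaluations under a unified experimental setup.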
-
-
 <div align="center">
 <img src="./docs/images/recognition.gif"  width = "400" />
 </div>
@@ -47,6 +41,7 @@ and the Res2Net200_vd pretrained model reaches 85.1% Top-1 accuracy.
 </div>
 
 ## Quick start
+
 PP-ShiTu image recognition quick start: [click here](./docs/zh_CN/quick_start/quick_start_recognition.md)
 
 ## Tutorials
@@ -59,9 +54,7 @@ PP-ShiTu image recognition quick start: [click here](./docs/zh_CN/quick_start/quick
     - [Beginner version](./docs/zh_CN/quick_start/quick_start_classification_new_user.md)
     - [Advanced version](./docs/zh_CN/quick_start/quick_start_classification_professional.md)
 - [Introduction to the PP-ShiTu image recognition system](#图像识别系统介绍)
-  - [Mainbody detection](./docs/zh_CN/algorithm_introduction/mainbody_detection.md)
-  - [Feature learning](./docs/zh_CN/algorithm_introduction/metric_learning.md)
-  - [Vector search](./deploy/vector_search/README.md)
+- [Backbone networks and pretrained model zoo](./docs/zh_CN/algorithm_introduction/ImageNet_models.md)
 - Data preparation
   - [Image classification datasets](./docs/zh_CN/data_preparation/classification_dataset.md)
   - [Image recognition datasets](./docs/zh_CN/data_preparation/recognition_dataset.md)
@@ -83,7 +76,6 @@ PP-ShiTu image recognition quick start: [click here](./docs/zh_CN/quick_start/quick
 - Algorithm introduction
     - [Introduction to image classification](./docs/zh_CN/algorithm_introduction/image_classification.md)
     - [Introduction to metric learning](./docs/zh_CN/algorithm_introduction/metric_learning.md)
-    - [Backbone networks and pretrained model zoo](./docs/zh_CN/algorithm_introduction/ImageNet_models.md)
 - Advanced usage
     - [Data augmentation](./docs/zh_CN/advanced_tutorials/DataAugmentation.md)
     - [Model quantization](./docs/zh_CN/advanced_tutorials/model_prune_quantization.md)
@@ -92,7 +84,7 @@ PP-ShiTu image recognition quick start: [click here](./docs/zh_CN/quick_start/quick
     - [Community contribution guide](./docs/zh_CN/advanced_tutorials/how_to_contribute.md)
 - FAQ
     - [Selected image recognition questions](docs/zh_CN/faq_series/faq_2021_s2.md)
-    - [Selected image classification questions](docs/zh_CN/faq_series/faq.md)
+    - [Selected image classification questions](docs/zh_CN/faq_series/faq_selected_30.md)
     - [Image classification FAQ, season 1](docs/zh_CN/faq_series/faq_2020_s1.md)
     - [Image classification FAQ, season 2](docs/zh_CN/faq_series/faq_2021_s1.md)
 - [License](#许可证书)
@@ -105,9 +97,8 @@ PP-ShiTu image recognition quick start: [click here](./docs/zh_CN/quick_start/quick
 <img src="./docs/images/structure.jpg"  width = "800" />
 </div>
 
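-The PP-ShiTu image recognition system works in three steps: (1) an object detection model detects candidate object regions in the image; (2) features are extracted from each candidate region; (3) the features are matched against the images in the retrieval gallery to produce the recognition result.
+PP-ShiTu is a practical, lightweight, general-purpose image recognition system consisting of three modules: mainbody detection, feature learning, and vector search. The system optimizes the model of each module with a variety of strategies across 8 aspects: backbone selection and adjustment, loss function selection, data augmentation, learning rate schedule, regularization parameter selection, use of pretrained models, model pruning, and quantization. The result is a system that recognizes an image against a gallery of 100,000+ images in only 0.2 s on CPU. For more details, see the [PP-ShiTu technical report](https://arxiv.org/pdf/2111.00775.pdf).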
 
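-For a new, previously unseen category, there is no need to retrain the model: add images of that category to the retrieval gallery and rebuild the index, and the category can be recognized.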
 
 <a name="识别效果展示"></a>
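 ## PP-ShiTu image recognition demos
@@ -152,4 +143,3 @@ The PP-ShiTu image recognition system works in three steps: (1) an object detection model
 - Many thanks to [nblib](https://github.com/nblib) for fixing the RandErasing data augmentation configuration file in PaddleClas.
 - Many thanks to [chenpy228](https://github.com/chenpy228) for fixing typos in the PaddleClas documentation.
 - Many thanks to [jm12138](https://github.com/jm12138) for adding the ViT, DeiT, and RepVGG series models to PaddleClas.
-- Many thanks to [FutureSI](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/76563) for the analysis and summary of the PaddleClas code.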
diff --git a/README_en.md b/README_en.md
@@ -8,7 +8,8 @@ PaddleClas is an image recognition toolset for industry and academia, helping us
 
 **Recent updates**
 
-- 2021.09.17 Add PP-LCNet series model developed by PaddleClas, these models show strong competitiveness on Intel CPUs. The metrics and pretrained model are available [here](docs/en/ImageNet_models_en.md).
+- 2021.09.17 Add PP-LCNet series model developed by PaddleClas, these models show strong competitiveness on Intel CPUs. 
+For the introduction of PP-LCNet, please refer to [paper](https://arxiv.org/pdf/2109.15099.pdf) or [PP-LCNet model introduction](docs/en/models/PP-LCNet_en.md). The metrics and pretrained model are available [here](docs/en/ImageNet_models_en.md).
 
 - 2021.06.29 Add Swin-transformer series models. The highest top-1 accuracy on the ImageNet1k dataset reaches 87.2%; training, evaluation and inference are all supported. Pretrained models can be downloaded [here](docs/en/models/models_intro_en.md).
 - 2021.06.16 PaddleClas release/2.2. Add metric learning and vector search modules. Add product recognition, animation character recognition, vehicle recognition and logo recognition. Added 30 pretrained models of LeViT, Twins, TNT, DLA, HarDNet, and RedNet, and the accuracy is roughly the same as that of the paper.

diff --git a/benchmark/README.md b/benchmark/README.md
@@ -0,0 +1,27 @@
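+# Benchmark usage guide
+
+The shell scripts in this directory measure speed metrics for the different models in PaddleClas, such as single-GPU and multi-GPU training speed.
+
+## Scripts
+
+There are 3 scripts:
+
+- `prepare_data.sh`: downloads the test data and sets up the data paths
+- `run_benchmark.sh`: runs a single training benchmark; see the comments in the script for how to call it
+- `run_all.sh`: entry script that runs all training benchmarks
+
+## Usage
+
+**Note**: to stay consistent with the working directory of the other PaddleClas modules, run this module from the root directory of `PaddleClas`.
+
+### 1. Prepare the data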
+
+```shell
+bash benchmark/prepare_data.sh
+```
+
+### 2. Run the benchmarks for all models
+
+```shell
+bash benchmark/run_all.sh
+```
diff --git a/benchmark/prepare_data.sh b/benchmark/prepare_data.sh
@@ -0,0 +1,11 @@
+#!/bin/bash
+dataset_url=$1
+
+cd dataset
+rm -rf ILSVRC2012
+wget -nc ${dataset_url}
+tar xf ILSVRC2012_val.tar
+ln -s ILSVRC2012_val ILSVRC2012
+cd ILSVRC2012
+ln -s val_list.txt train_list.txt
+cd ../../
diff --git a/benchmark/run_all.sh b/benchmark/run_all.sh
@@ -0,0 +1,25 @@
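+# Script for stably reproducible performance runs; by default it runs in the standard docker environment with py37: paddlepaddle/paddle:latest-gpu-cuda10.1-cudnn7  paddle=2.1.2  py=37
+# Working directory: to be specified
+# cd **
+# 1 Install the dependencies the model needs (state any optimization strategies that must be enabled)
+# pip install ...
+# 2 Copy the data and pretrained models the model needs
+# 3 Run in batch (if batching is inconvenient, put steps 1 and 2 into each individual model)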
+
+model_mode_list=(MobileNetV1 MobileNetV2 MobileNetV3_large_x1_0 EfficientNetB0 ShuffleNetV2_x1_0 DenseNet121 HRNet_W48_C SwinTransformer_tiny_patch4_window7_224 alt_gvt_base)
+fp_item_list=(fp32)
+bs_list=(32 64 96 128)
+for model_mode in "${model_mode_list[@]}"; do
+    for fp_item in "${fp_item_list[@]}"; do
+        for bs_item in "${bs_list[@]}"; do
+            echo "index is speed, 1gpus, begin, ${model_mode}"
+            run_mode=sp
+            CUDA_VISIBLE_DEVICES=0 bash benchmark/run_benchmark.sh ${run_mode} ${bs_item} ${fp_item} 10 ${model_mode}     #  (5min)
+            sleep 10
+            echo "index is speed, 8gpus, run_mode is multi_process, begin, ${model_mode}"
+            run_mode=mp
+            CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash benchmark/run_benchmark.sh ${run_mode} ${bs_item} ${fp_item} 10 ${model_mode}
+            sleep 10
+        done
+    done
+done
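Review note: the nested loops in `run_all.sh` define a fixed benchmark matrix. The sketch below is a dry run that only prints the argument list each `run_benchmark.sh` call would receive, so the size of the matrix can be checked before occupying any GPUs. `list_benchmark_runs` is a hypothetical helper written for this note and is not part of the commit; the model names, batch sizes, and the epoch argument `10` are copied from the script above.

```shell
# Dry-run sketch: print the arguments of every run_benchmark.sh call that
# the nested loops in run_all.sh would make, without training anything.
# list_benchmark_runs is a hypothetical helper, not part of this commit.
list_benchmark_runs() {
    models="MobileNetV1 MobileNetV2 MobileNetV3_large_x1_0 EfficientNetB0 ShuffleNetV2_x1_0 DenseNet121 HRNet_W48_C SwinTransformer_tiny_patch4_window7_224 alt_gvt_base"
    for model in $models; do
        for fp in fp32; do
            for bs in 32 64 96 128; do
                for mode in sp mp; do
                    # Same argument order as in run_all.sh:
                    # run_mode, batch size, precision, epochs, model
                    echo "${mode} ${bs} ${fp} 10 ${model}"
                done
            done
        done
    done
}

# 9 models x 1 precision x 4 batch sizes x 2 run modes = 72 planned runs
list_benchmark_runs | wc -l
```

Since `run_benchmark.sh` wraps each training command in `timeout 15m`, the count gives a rough upper bound on total wall-clock time: at most 72 runs of 15 minutes each, plus the `sleep 10` pauses.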
diff --git a/benchmark/run_benchmark.sh b/benchmark/run_benchmark.sh
@@ -0,0 +1,56 @@
+#!/usr/bin/env bash
+set -xe
+# Example: CUDA_VISIBLE_DEVICES=0 bash run_benchmark.sh ${run_mode} ${bs_item} ${fp_item} 500 ${model_mode}
+# Parameters
+function _set_params(){
+    run_mode=${1:-"sp"}          # single GPU: sp | multi-GPU: mp
+    batch_size=${2:-"64"}
+    fp_item=${3:-"fp32"}        # fp32|fp16
+    epochs=${4:-"10"}       # optional; to stop training early, modify the code
+    model_name=${5:-"model_name"}
+    run_log_path="${TRAIN_LOG_DIR:-$(pwd)}/benchmark"  # TRAIN_LOG_DIR will be set by QA later
+
+#   Do not modify below
+    device=${CUDA_VISIBLE_DEVICES//,/ }
+    arr=(${device})
+    num_gpu_devices=${#arr[*]}
+    log_file=${run_log_path}/clas_${model_name}_${run_mode}_bs${batch_size}_${fp_item}_${num_gpu_devices}
+}
+function _train(){
+    echo "Train on ${num_gpu_devices} GPUs"
+    echo "current CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES, gpus=$num_gpu_devices, batch_size=$batch_size"
+
+    if [ ${fp_item} = "fp32" ];then
+        model_config=`find ppcls/configs/ImageNet -name ${model_name}.yaml` 
+    else
+        model_config=`find ppcls/configs/ImageNet -name ${model_name}_fp16.yaml` 
+    fi
+
+    train_cmd="-c ${model_config} -o DataLoader.Train.sampler.batch_size=${batch_size} -o Global.epochs=${epochs}"   
+    case ${run_mode} in
+    sp) train_cmd="python -u tools/train.py ${train_cmd}" ;;
+    mp)
+        train_cmd="python -m paddle.distributed.launch --log_dir=./mylog --gpus=$CUDA_VISIBLE_DEVICES tools/train.py ${train_cmd}"
+        log_parse_file="mylog/workerlog.0" ;;
+    *) echo "choose run_mode(sp or mp)"; exit 1;
+    esac
+    rm -rf mylog
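+# Do not modify below
+    # Test the command directly: under `set -e`, a separate `$?` check would never run on failure.
+    if timeout 15m ${train_cmd} > ${log_file} 2>&1; then
+        echo -e "${model_name}, SUCCESS"
+        export job_fail_flag=0
+    else
+        echo -e "${model_name}, FAIL"
+        export job_fail_flag=1
+    fi
+    kill -9 `ps -ef|grep 'python'|awk '{print $2}'` || true   # best-effort cleanup; note this targets every python process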
+
+    if [ $run_mode = "mp" -a -d mylog ]; then
+        rm ${log_file}
+        cp mylog/workerlog.0 ${log_file}
+    fi
+}
+
+_set_params $@
+_train
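Review note: `_set_params` derives `num_gpu_devices` by replacing the commas in `CUDA_VISIBLE_DEVICES` with spaces and counting the resulting array elements, and that count becomes the last component of the log file name. The sketch below reproduces the same counting in portable shell so the naming can be checked in isolation. `count_gpus` is a hypothetical helper for illustration, not a function in the commit.

```shell
# count_gpus is a hypothetical helper that mirrors how _set_params derives
# num_gpu_devices: split the comma-separated device list and count the items.
count_gpus() {
    devices=$1
    if [ -z "$devices" ]; then
        echo 0            # an empty device list yields an empty array in the script
        return
    fi
    old_ifs=$IFS
    IFS=,
    set -- $devices       # word-split on commas into positional parameters
    IFS=$old_ifs
    echo $#
}

count_gpus "0"                  # prints 1
count_gpus "0,1,2,3,4,5,6,7"    # prints 8

# Log file name pattern from _set_params:
# clas_<model_name>_<run_mode>_bs<batch_size>_<fp_item>_<num_gpu_devices>
echo "clas_MobileNetV1_mp_bs64_fp32_$(count_gpus "0,1,2,3,4,5,6,7")"
```

The last command prints `clas_MobileNetV1_mp_bs64_fp32_8`, which is the file that the `mp` branch later overwrites with `mylog/workerlog.0`.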
diff --git a/deploy/configs/build_general.yaml b/deploy/configs/build_general.yaml
@@ -0,0 +1,36 @@
+Global:
+  rec_inference_model_dir: "./models/general_PPLCNet_x2_5_lite_v1.0_infer"
+  batch_size: 32
+  use_gpu: True
+  enable_mkldnn: True
+  cpu_num_threads: 10
+  enable_benchmark: True
+  use_fp16: False
+  ir_optim: True
+  use_tensorrt: False
+  gpu_mem: 8000
+  enable_profile: False
+
+RecPreProcess:
+  transform_ops:
+    - ResizeImage:
+        size: 224
+    - NormalizeImage:
+        scale: 0.00392157
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+
+RecPostProcess: null
+
+# indexing engine config
+IndexProcess:
+  index_method: "HNSW32" # supported: HNSW32, IVF, Flat
+  image_root: "./drink_dataset_v1.0/gallery/"
+  index_dir: "./drink_dataset_v1.0/index"
+  data_file:  "./drink_dataset_v1.0/gallery/drink_label.txt"
+  index_operation: "new" # supported: "append", "remove", "new"
+  delimiter: "\t"
+  dist_type: "IP"
+  embedding_size: 512
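Review note: the `NormalizeImage` entry in `RecPreProcess` combines a `scale` of 0.00392157 (approximately 1/255) with the per-channel `mean` and `std` values. The sketch below works through the arithmetic for a single pixel value, assuming the conventional formula `(pixel * scale - mean) / std`; the config itself does not state the formula, so treat that as an assumption. `normalize_pixel` is a hypothetical helper for illustration only.

```shell
# normalize_pixel is a hypothetical helper, not PaddleClas code.
# Assumed formula: (pixel * scale - mean) / std, with scale = 0.00392157 (~1/255).
normalize_pixel() {
    # $1 = raw pixel value (0-255), $2 = channel mean, $3 = channel std
    awk -v x="$1" -v m="$2" -v s="$3" \
        'BEGIN { printf "%.3f\n", (x * 0.00392157 - m) / s }'
}

# First channel of the config: mean 0.485, std 0.229
normalize_pixel 255 0.485 0.229   # brightest input, prints 2.249
normalize_pixel 0 0.485 0.229     # darkest input, prints -2.118
```

Under this assumption the first channel is mapped from the 0 to 255 range into roughly -2.1 to 2.2, which is the usual input range for ImageNet-pretrained backbones.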
diff --git a/deploy/configs/inference_general.yaml b/deploy/configs/inference_general.yaml
@@ -0,0 +1,55 @@
+Global:
+  infer_imgs: "./drink_dataset_v1.0/test_images/nongfu_spring.jpeg"
+  det_inference_model_dir: "./models/picodet_PPLCNet_x2_5_mainbody_lite_v1.0_infer"
+  rec_inference_model_dir: "./models/general_PPLCNet_x2_5_lite_v1.0_infer"
+  rec_nms_thresold: 0.05
+
+  batch_size: 1
+  image_shape: [3, 640, 640]
+  threshold: 0.2
+  max_det_results: 5
+  labe_list:
+  - foreground
+
+  # inference engine config
+  use_gpu: True
+  enable_mkldnn: True
+  cpu_num_threads: 10
+  enable_benchmark: True
+  use_fp16: False
+  ir_optim: True
+  use_tensorrt: False
+  gpu_mem: 8000
+  enable_profile: False
+
+DetPreProcess:
+  transform_ops:
+    - DetResize:
+        interp: 2
+        keep_ratio: false
+        target_size: [640, 640]
+    - DetNormalizeImage:
+        is_scale: true
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+    - DetPermute: {}
+DetPostProcess: {}
+
+RecPreProcess:
+  transform_ops:
+    - ResizeImage:
+        size: 224
+    - NormalizeImage:
+        scale: 0.00392157
+        mean: [0.485, 0.456, 0.406]
+        std: [0.229, 0.224, 0.225]
+        order: ''
+    - ToCHWImage:
+
+RecPostProcess: null
+
+# indexing engine config
+IndexProcess:
+  index_dir: "./drink_dataset_v1.0/index/"
+  return_k: 5
+  score_thres: 0.5
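Review note: `IndexProcess` in the inference config sets `return_k: 5` and `score_thres: 0.5`. The sketch below illustrates one plausible reading of the two settings: keep at most `return_k` gallery matches and drop any whose similarity score is below `score_thres`. This reading, the tab-separated `label<TAB>score` input format, and the example labels are all assumptions made for illustration; the actual retrieval code is not shown in this diff. It also assumes higher scores mean closer matches, which fits the `dist_type: "IP"` (inner product) setting in `build_general.yaml`.

```shell
# filter_results is a hypothetical helper, not PaddleClas code.
# Reads "label<TAB>score" lines on stdin and prints at most return_k of them,
# highest score first, keeping only scores >= score_thres.
filter_results() {
    # $1 = return_k, $2 = score_thres
    tab=$(printf '\t')
    awk -F '\t' -v t="$2" '$2 + 0 >= t + 0' \
        | sort -t "$tab" -k2,2nr \
        | head -n "$1"
}

# Three made-up matches; with return_k=5 and score_thres=0.5 the
# "water" entry (0.41) is dropped and the rest are sorted by score.
printf 'cola\t0.83\nwater\t0.41\ntea\t0.67\n' | filter_results 5 0.5
```

Raising `score_thres` trades recall for precision: fewer gallery matches survive, so unknown objects are less likely to be assigned a wrong label.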