Skip to content

[Docs] Deployment translation #289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Nov 26, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
286 changes: 286 additions & 0 deletions docs/en/deploy/basic_deployment_guide.md
Original file line number Diff line number Diff line change
@@ -1 +1,287 @@
# Basic Deployment Guide

## Introduction of MMDeploy

MMDeploy is an open-source deep learning model deployment toolset. It is a part of the [OpenMMLab](https://openmmlab.com/) project, and provides **a unified experience of exporting different models** to various platforms and devices of the OpenMMLab series libraries. Using MMDeploy, developers can easily export the specific compiled SDK they need from the training result, which saves a lot of effort.

More detailed introduction and guides can be found [here](https://github.com/open-mmlab/mmdeploy/blob/dev-1.x/docs/en/get_started.md)

## Supported Algorithms

Currently our deployment kit supports on the following models and backends:

| Model | Task | OnnxRuntime | TensorRT | Model config |
| :----- | :-------------- | :---------: | :------: | :---------------------------------------------------------------------: |
| YOLOv5 | ObjectDetection | Y | Y | [config](https://github.com/open-mmlab/mmyolo/tree/main/configs/yolov5) |
| YOLOv6 | ObjectDetection | Y | Y | [config](https://github.com/open-mmlab/mmyolo/tree/main/configs/yolov6) |
| YOLOX | ObjectDetection | Y | Y | [config](https://github.com/open-mmlab/mmyolo/tree/main/configs/yolox) |
| RTMDet | ObjectDetection | Y | Y | [config](https://github.com/open-mmlab/mmyolo/tree/main/configs/rtmdet) |

Note: ncnn and other inference backends support are coming soon.

## How to Write Config for MMYOLO

All config files related to the deployment are located at [`configs/deploy`](../../../configs/deploy/).

You only need to change the relative data processing part in the model config file to support either static or dynamic input for your model. Besides, MMDeploy integrates the post-processing parts as customized ops, you can modify the strategy in `post_processing` parameter in `codebase_config`.

Here is the detail description:

```python
codebase_config = dict(
type='mmyolo',
task='ObjectDetection',
model_type='end2end',
post_processing=dict(
score_threshold=0.05,
confidence_threshold=0.005,
iou_threshold=0.5,
max_output_boxes_per_class=200,
pre_top_k=5000,
keep_top_k=100,
background_label_id=-1),
module=['mmyolo.deploy'])
```

- `score_threshold`: set the score threshold to filter candidate bboxes before `nms`
- `confidence_threshold`: set the confidence threshold to filter candidate bboxes before `nms`
- `iou_threshold`: set the `iou` threshold for removing duplicates in `nums`
- `max_output_boxes_per_class`: set the maximum number of bboxes for each class
- `pre_top_k`: set the number of fixedcandidate bboxes before `nms`, sorted by scores
- `keep_top_k`: set the number of output candidate bboxs after `nms`
- `background_label_id`: set to `-1` as MMYOLO has no background class information

### Configuration for Static Inputs

#### 1. Model Config

Taking `YOLOv5` of MMYOLO as an example, here are the details:

```python
_base_ = '../../yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py'

test_pipeline = [
dict(type='LoadImageFromFile', file_client_args=_base_.file_client_args),
dict(
type='LetterResize',
scale=_base_.img_scale,
allow_scale_up=False,
use_mini_pad=False,
),
dict(type='LoadAnnotations', with_bbox=True, _scope_='mmdet'),
dict(
type='mmdet.PackDetInputs',
meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
'scale_factor', 'pad_param'))
]

test_dataloader = dict(
dataset=dict(pipeline=test_pipeline, batch_shapes_cfg=None))
```

`_base_ = '../../yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py'` inherits the model config in the training stage.

`test_pipeline` adds the data processing piple for the deployment, `LetterResize` controls the size of the input images and the input for the converted model

`test_dataloader` adds the dataloader config for the deployment, `batch_shapes_cfg` decides whether to use the `batch_shapes` strategy. More details can be found at [yolov5 configs](../user_guides/config.md)

#### 2. Deployment Config

Here we still use the `YOLOv5` in MMYOLO as the example. We can use [`detection_onnxruntime_static.py`](https://github.com/open-mmlab/mmyolo/blob/main/configs/deploy/detection_onnxruntime_static.py) as the config to deploy \`YOLOv5\` to \`ONNXRuntim\` with static inputs.

```python
_base_ = ['./base_static.py']
codebase_config = dict(
type='mmyolo',
task='ObjectDetection',
model_type='end2end',
post_processing=dict(
score_threshold=0.05,
confidence_threshold=0.005,
iou_threshold=0.5,
max_output_boxes_per_class=200,
pre_top_k=5000,
keep_top_k=100,
background_label_id=-1),
module=['mmyolo.deploy'])
backend_config = dict(type='onnxruntime')
```

`backend_config` indicates the deployment backend with `type='onnxruntime'`, other information can be referred from the third section.

To deploy the `YOLOv5` to `TensorRT`, please refer to the [`detection_tensorrt_static-640x640.py`](https://github.com/open-mmlab/mmyolo/blob/main/configs/deploy/detection_tensorrt_static-640x640.py) as follows.

```python
_base_ = ['./base_static.py']
onnx_config = dict(input_shape=(640, 640))
backend_config = dict(
type='tensorrt',
common_config=dict(fp16_mode=False, max_workspace_size=1 << 30),
model_inputs=[
dict(
input_shapes=dict(
input=dict(
min_shape=[1, 3, 640, 640],
opt_shape=[1, 3, 640, 640],
max_shape=[1, 3, 640, 640])))
])
use_efficientnms = False
```

`backend_config` indices the backend with `type=‘tensorrt’`.

Different from `ONNXRuntime` deployment configuration, `TensorRT` needs to specify the input image size and the parameters required to build the engine file, including:

- `onnx_config` specifies the input shape as `input_shape=(640, 640)`
- `fp16_mode=False` and `max_workspace_size=1 << 30` in `backend_config['common_config']` indicates whether to build the engine in the parameter format of `fp16`, and the maximum video memory for the current `gpu` device, respectively. The unit is in `GB`. For detailed configuration of `fp16`, please refer to the [`detection_tensorrt-fp16_static-640x640.py`](https://github.com/open-mmlab/mmyolo/blob/main/configs/deploy/detection_tensorrt-fp16_static-640x640.py)
- The `min_shape`/`opt_shape`/`max_shape` in `backend_config['model_inputs']['input_shapes']['input']` should remain the same under static input, the default is `[1, 3, 640, 640]`.

`use_efficientnms` is a new configuration introduced by the `MMYOLO` series, indicating whether to enable `Efficient NMS Plugin` to replace `TRTBatchedNMS plugin` in `MMDeploy` when exporting `onnx`.

You can refer to the official [efficient NMS plugins](https://github.com/NVIDIA/TensorRT/blob/main/plugin/efficientNMSPlugin/README.md) by `TensorRT` for more details.

Note: this out-of-box feature is **only available in TensorRT>=8.0**, no need to compile it by yourself.

### Configuration for Dynamic Inputs

#### 1. Model Config

When you deploy a dynamic input model, you don't need to modify any model configuration files but the deployment configuration files.

#### 2. Deployment Config

To deploy the `YOLOv5` in MMYOLO to `ONNXRuntime`, please refer to the [`detection_onnxruntime_dynamic.py`](https://github.com/open-mmlab/mmyolo/blob/main/configs/deploy/detection_onnxruntime_dynamic.py).

```python
_base_ = ['./base_dynamic.py']
codebase_config = dict(
type='mmyolo',
task='ObjectDetection',
model_type='end2end',
post_processing=dict(
score_threshold=0.05,
confidence_threshold=0.005,
iou_threshold=0.5,
max_output_boxes_per_class=200,
pre_top_k=5000,
keep_top_k=100,
background_label_id=-1),
module=['mmyolo.deploy'])
backend_config = dict(type='onnxruntime')
```

`backend_config` indicates the backend with `type='onnxruntime'`. Other parameters stay the same as the static input section.

To deploy the `YOLOv5` to `TensorRT`, please refer to the [`detection_tensorrt_dynamic-192x192-960x960.py`](https://github.com/open-mmlab/mmyolo/blob/main/configs/deploy/detection_tensorrt_dynamic-192x192-960x960.py).

```python
_base_ = ['./base_dynamic.py']
backend_config = dict(
type='tensorrt',
common_config=dict(fp16_mode=False, max_workspace_size=1 << 30),
model_inputs=[
dict(
input_shapes=dict(
input=dict(
min_shape=[1, 3, 192, 192],
opt_shape=[1, 3, 640, 640],
max_shape=[1, 3, 960, 960])))
])
use_efficientnms = False
```

`backend_config` indicates the backend with `type='tensorrt'`. Since the dynamic and static inputs are different in `TensorRT`, please check the details at [TensorRT dynamic input official introduction](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-843/developer-guide/index.html#work_dynamic_shapes).

`TensorRT` deployment requires you to specify `min_shape`, `opt_shape` , and `max_shape`. `TensorRT` limits the size of the input image between `min_shape` and `max_shape`.

`min_shape` is the minimum size of the input image. `opt_shape` is the common size of the input image, inference performance is best under this size. `max_shape` is the maximum size of the input image.

`use_efficientnms` configuration is the same as the `TensorRT` static input configuration in the previous section.

### INT8 Quantization Support

Note: Int8 quantization support will soon be released.

## How to Convert Model

### Usage

Set the root directory of `MMDeploy` as an env parameter `MMDEPLOY_DIR` using `export MMDEPLOY_DIR=/the/root/path/of/MMDeploy` command.

```shell
python3 ${MMDEPLOY_DIR}/tools/deploy.py \
${DEPLOY_CFG_PATH} \
${MODEL_CFG_PATH} \
${MODEL_CHECKPOINT_PATH} \
${INPUT_IMG} \
--test-img ${TEST_IMG} \
--work-dir ${WORK_DIR} \
--calib-dataset-cfg ${CALIB_DATA_CFG} \
--device ${DEVICE} \
--log-level INFO \
--show \
--dump-info
```

### Parameter Description

- `deploy_cfg`: set the deployment config path of MMDeploy for the model, including the type of inference framework, whether quantize, whether the input shape is dynamic, etc. There may be a reference relationship between configuration files, e.g. `configs/deploy/detection_onnxruntime_static.py`
- `model_cfg`: set the MMYOLO model config path, e.g. `configs/deploy/model/yolov5_s-deploy.py`, regardless of the path to MMDeploy
- `checkpoint`: set the torch model path. It can start with `http/https`, more details are available in `mmengine.fileio` apis
- `img`: set the path to the image or point cloud file used for testing during model conversion
- `--test-img`: set the image file that used to test model. If not specified, it will be set to `None`
- `--work-dir`: set the work directory that used to save logs and models
- `--calib-dataset-cfg`: use for calibration only for INT8 mode. If not specified, it will be set to None and use “val” dataset in model config for calibration
- `--device`: set the device used for model conversion. The default is `cpu`, for TensorRT used `cuda:0`
- `--log-level`: set log level which in `'CRITICAL', 'FATAL', 'ERROR', 'WARN', 'WARNING', 'INFO', 'DEBUG', 'NOTSET'`. If not specified, it will be set to `INFO`
- `--show`: show the result on screen or not
- `--dump-info`: output SDK information or not

## How to Evaluate Model

### Usage

After the model is converted to your backend, you can use `${MMDEPLOY_DIR}/tools/test.py` to evaluate the performance.

```shell
python3 ${MMDEPLOY_DIR}/tools/test.py \
${DEPLOY_CFG} \
${MODEL_CFG} \
--model ${BACKEND_MODEL_FILES} \
[--out ${OUTPUT_PKL_FILE}] \
[--format-only] \
[--metrics ${METRICS}] \
[--show] \
[--show-dir ${OUTPUT_IMAGE_DIR}] \
[--show-score-thr ${SHOW_SCORE_THR}] \
--device ${DEVICE} \
[--cfg-options ${CFG_OPTIONS}] \
[--metric-options ${METRIC_OPTIONS}]
[--log2file work_dirs/output.txt]
[--batch-size ${BATCH_SIZE}]
[--speed-test] \
[--warmup ${WARM_UP}] \
[--log-interval ${LOG_INTERVERL}]
```

### Parameter Description

- `deploy_cfg`: set the deployment config file path
- `model_cfg`: set the MMYOLO model config file path
- `--model`: set the converted model. For example, if we exported a TensorRT model, we need to pass in the file path with the suffix ".engine"
- `--out`: save the output result in pickle format, use only when you need it
- `--format-only`: format the output without evaluating it. It is useful when you want to format the result into a specific format and submit it to a test server
- `--metrics`: use the specific metric supported in MMYOLO to evaluate, such as "proposal" in COCO format data.
- `--show`: show the evaluation result on screen or not
- `--show-dir`: save the evaluation result to this directory, valid only when specified
- `--show-score-thr`: show the threshold for the detected bboxes or not
- `--device`: indicate the device to run the model. Note that some backends limit the running devices. For example, TensorRT must run on CUDA
- `--cfg-options`: pass in additional configs, which will override the current deployment configs
- `--metric-options`: add custom options for metrics. The key-value pair format in xxx=yyy will be the kwargs of the dataset.evaluate() method
- `--log2file`: save the evaluation results (with the speed) to a file
- `--batch-size`: set the batch size for inference, which will override the `samples_per_gpu` in data config. The default value is `1`, however, not every model supports `batch_size > 1`
- `--speed-test`: test the inference speed or not
- `--warmup`: warm up before speed test or not, works only when `speed-test` is specified
- `--log-interval`: set the interval between each log, works only when `speed-test` is specified

Note: other parameters in `${MMDEPLOY_DIR}/tools/test.py` are used for speed test, they will not affect the evaluation results.
Loading