Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Docs] Add graph of P6 model #273

Merged
merged 14 commits into from
Nov 10, 2022
6 changes: 4 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -62,8 +62,10 @@ The master branch works with **PyTorch 1.6+**.

MMYOLO decomposes the framework into different components where users can easily customize a model by combining different modules with various training and testing strategies.

<img src="https://user-images.githubusercontent.com/27466624/199999337-0544a4cb-3cbd-4f3e-be26-bcd9e74db7ff.jpg" alt="BaseModule"/>
The figure is contributed by RangeKing@GitHub, thank you very much!
<img src="https://user-images.githubusercontent.com/27466624/199999337-0544a4cb-3cbd-4f3e-be26-bcd9e74db7ff.jpg" alt="BaseModule-P5"/>
The figure above is contributed by RangeKing@GitHub, thank you very much!

And the figure of P6 model is in [model_design.md](docs\en\algorithm_descriptions\model_design.md).

</details>

Expand Down
4 changes: 3 additions & 1 deletion README_zh-CN.md
Original file line number Diff line number Diff line change
Expand Up @@ -62,9 +62,11 @@ MMYOLO 是一个基于 PyTorch 和 MMDetection 的 YOLO 系列算法开源工具

MMYOLO 将框架解耦成不同的模块组件,通过组合不同的模块和训练测试策略,用户可以便捷地构建自定义模型。

<img src="https://user-images.githubusercontent.com/27466624/199999337-0544a4cb-3cbd-4f3e-be26-bcd9e74db7ff.jpg" alt="基类"/>
<img src="https://user-images.githubusercontent.com/27466624/199999337-0544a4cb-3cbd-4f3e-be26-bcd9e74db7ff.jpg" alt="基类-P5"/>
图为 RangeKing@GitHub 提供,非常感谢!

P6 模型图详见 [model_design.md](docs\en\algorithm_descriptions\model_design.md)。
RangeKing marked this conversation as resolved.
Show resolved Hide resolved

</details>

## 最新进展
Expand Down
16 changes: 12 additions & 4 deletions docs/en/algorithm_descriptions/model_design.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,19 @@

## YOLO series model basic class

The structural graph is provided by RangeKing@GitHub. Thank you RangeKing!
The structural figure is provided by RangeKing@GitHub. Thank you RangeKing!

<div align=center>
<img src="https://user-images.githubusercontent.com/27466624/199999337-0544a4cb-3cbd-4f3e-be26-bcd9e74db7ff.jpg" width=800 alt="BaseModule">
<img src="https://user-images.githubusercontent.com/27466624/199999337-0544a4cb-3cbd-4f3e-be26-bcd9e74db7ff.jpg" alt="BaseModule-P5">
Figure 1: P5 model structure
</div>

Most YOLO series algorithms adopt a unified algorithm-building structure, typically as Darknet + PAFPN. In order to let users quickly understand the YOLO series algorithm architecture, we deliberately designed the `BaseBackbone` + `BaseYOLONeck` structure, as shown in the above graph.
<div align=center>
<img src="https://user-images.githubusercontent.com/27466624/200850066-0c434173-2d40-4c12-8de3-eda473ff172f.jpg" alt="BaseModule-P6">
Figure 2: P6 model structure
</div>

Most YOLO series algorithms adopt a unified algorithm-building structure, typically as Darknet + PAFPN. In order to let users quickly understand the YOLO series algorithm architecture, we deliberately designed the `BaseBackbone` + `BaseYOLONeck` structure, as shown in the above figure.

The benefits of the abstract `BaseBackbone` include:

Expand All @@ -20,7 +26,9 @@ The benefits of the abstract `BaseBackbone` include:

### BaseBackbone

We can see in the above graph, as for P5, `BaseBackbone` includes 1 stem layer and 4 stage layers which are similar to the basic structure of ResNet. Different backbone network algorithms inherit the `BaseBackbone`. Users can build each layer of the whole network by implementing customized basic modules through the internal `build_xx` method.
- As shown in Figure 1, for P5, `BaseBackbone` includes 1 stem layer and 4 stage layers which are similar to the basic structure of ResNet.
- As shown in Figure 2, for P6, `BaseBackbone` includes 1 stem layer and 5 stage layers.
Different backbone network algorithms inherit the `BaseBackbone`. Users can build each layer of the whole network by implementing customized basic modules through the internal `build_xx` method.

### BaseYOLONeck

Expand Down
39 changes: 30 additions & 9 deletions docs/en/algorithm_descriptions/yolov5_description.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,13 @@
## 0 Introduction

<div align=center >
<img alt="YOLOv5_structure_v3.4" src="https://user-images.githubusercontent.com/27466624/200000324-70ae078f-cea7-4189-8baa-440656797dad.jpg"/>
<img alt="YOLOv5-P5_structure_v3.4" src="https://user-images.githubusercontent.com/27466624/200000324-70ae078f-cea7-4189-8baa-440656797dad.jpg"/>
Figure 1: YOLOv5-P5 model structure
</div>

<div align=center >
<img alt="YOLOv5-P6_structure_v1.0" src="https://user-images.githubusercontent.com/27466624/200845705-c9f3300f-9847-4933-b79d-0efcf0286e16.jpg"/>
Figure 2: YOLOv5-P6 model structure
</div>

RangeKing@github provides the graph above. Thanks, RangeKing!
Expand All @@ -15,7 +21,11 @@ In short, the main features of YOLOv5 are:
2. **Fast training speed**: the training time in the case of 300 epochs is similar to most of the one-stage and two-stage algorithms under 12 epochs, such as RetinaNet, ATSS, and Faster R-CNN.
3. **Abundant optimization for corner cases**: YOLOv5 has implemented many optimizations. The functions and documentation are richer as well.

This article will start with the principle of the YOLOv5 algorithm and then focus on analyzing the implementation in MMYOLO. The follow-up part includes the guide and speed benchmark of YOLOv5.
Figures 1 and 2 show that the main differences between the P5 and P6 versions of YOLOv5 are the network structure and the image input resolution. Other differences, such as the number of anchors and loss weights, can be found in [configuration files](https://github.com/open-mmlab/mmyolo/blob/main/configs/yolov5/). This article will start with the principle of the YOLOv5 algorithm and then focus on analyzing the implementation in MMYOLO. The follow-up part includes the guide and speed benchmark of YOLOv5.
RangeKing marked this conversation as resolved.
Show resolved Hide resolved

```{hint}
Unless specified, the P5 model is described by default in this documentation.
```

We hope this article becomes your core document to start and master YOLOv5. Since YOLOv5 is still constantly updated, we will also keep updating this document. So please always catch up with the latest version.

Expand Down Expand Up @@ -245,36 +255,47 @@ This section was written by RangeKing@github. Thanks a lot!

The YOLOv5 network structure is the standard `CSPDarknet` + `PAFPN` + `non-decoupled Head`.

The size of the YOLOv5 network structure is determined by the `deepen_factor` and `widen_factor` parameters. `deepen_factor` controls the depth of the network structure, that is, the number of stacks of `DarknetBottleneck` modules in `CSPLayer`. `widen_factor` controls the width of the network structure, that is, the number of channels of the module output feature map. Take YOLOv5-l as an example. Its `deepen_factor = widen_factor = 1.0` , the overall structure is shown in the graph above.
The size of the YOLOv5 network structure is determined by the `deepen_factor` and `widen_factor` parameters. `deepen_factor` controls the depth of the network structure, that is, the number of stacks of `DarknetBottleneck` modules in `CSPLayer`. `widen_factor` controls the width of the network structure, that is, the number of channels of the module output feature map. Take YOLOv5-l as an example. Its `deepen_factor = widen_factor = 1.0`. the overall structure is shown in the graph above.

The upper part of the figure is an overview of the model; the lower part is the specific network structure, in which the modules are marked with numbers in serial, which is convenient for users to correspond to the configuration files of the YOLOv5 official repository. The middle part is the detailed composition of each sub-module.

If you want to use **netron** to visualize the details of the network structure, just open the ONNX file format exported by MMDeploy in netron.
If you want to use **netron** to visualize the details of the network structure, open the ONNX file format exported by MMDeploy in netron.

```{hint}
The shapes of the feature map in Section 1.2 are (B, C, H, W) by default.
```

#### 1.2.1 Backbone

`CSPDarknet` in MMYOLO inherits from `BaseBackbone`. The overall structure is similar to `ResNet` with a total of 5 layers of design, including one `Stem Layer` and four `Stage Layer`:

- `Stem Layer` is a `ConvModule` whose kernel size is 6x6. It is more efficient than the `Focus` module used before v6.1.
- Each of the first three `Stage Layer` consists of one `ConvModule` and one `CSPLayer`, as shown in the Details part in the graph above. `ConvModule` is a 3x3 `Conv2d` + `BatchNorm` + `SiLU activation function` module. `CSPLayer` is the C3 module in the official YOLOv5 repository, consisting of three `ConvModule` + n `DarknetBottleneck` with residual connections.
- The 4th `Stage Layer` adds an `SPPF` module at the end. The `SPPF` module is to serialize the input through multiple 5x5 `MaxPool2d` layers, which has the same effect as the `SPP` module but is faster.
- The P5 model passes the corresponding results from the second to the fourth `Stage Layer` to the `Neck` structure and extracts three output feature maps. Take a 640x640 input image as an example. The output features are (B, 256, 80, 80), (B,512,40,40), and (B,1024,20,20), the corresponding stride is 8/16/32.
- Except for the last `Stage Layer`, each `Stage Layer` consists of one `ConvModule` and one `CSPLayer`, as shown in the Details part in the graph above. `ConvModule` is a 3x3 `Conv2d` + `BatchNorm` + `SiLU activation function` module. `CSPLayer` is the C3 module in the official YOLOv5 repository, consisting of three `ConvModule` + n `DarknetBottleneck` with residual connections.
- The last `Stage Layer` adds an `SPPF` module at the end. The `SPPF` module is to serialize the input through multiple 5x5 `MaxPool2d` layers, which has the same effect as the `SPP` module but is faster.
- The P5 model passes the corresponding results from the second to the fourth `Stage Layer` to the `Neck` structure and extracts three output feature maps. Take a 640x640 input image as an example. The output features are (B, 256, 80, 80), (B,512,40,40), and (B,1024,20,20). The corresponding stride is 8/16/32.
- The P6 model passes the corresponding results from the second to the fifth `Stage Layer` to the `Neck` structure and extracts three output feature maps. Take a 1280x1280 input image as an example. The output features are (B, 256, 160, 160), (B,512,80,80), (B,768,40,40), and (B,1024,20,20). The corresponding stride is 8/16/32/64.

#### 1.2.2 Neck

There is no **Neck** part in the official YOLOv5. However, to facilitate users to correspond to other object detection networks easier, we split the `Head` of the official repository into `PAFPN` and `Head`.

Based on the `BaseYOLONeck` structure, YOLOv5's `Neck` also follows the same build process. However, for non-existed modules, we use `nn.Identity` instead.

The feature maps output by the Neck module are the same as the Backbone, which is (B,256,80,80), (B,512,40,40), and (B,1024,20,20).
The feature maps output by the Neck module is the same as the Backbone. The P5 model is (B,256,80,80), (B,512,40,40) and (B,1024,20,20); the P6 model is (B,256,160,160), (B,512,80,80), (B,768,40,40) and (B,1024,20,20).

#### 1.2.3 Head

The `Head` structure of YOLOv5 is the same as YOLOv3, which is a `non-decoupled Head`. The Head module includes three convolution modules that do not share weights. They are used only for input feature map transformation.

The `PAFPN` outputs three feature maps of different scales, whose shapes are (B,256,80,80), (B,512,40,40), and (B,1024,20,20) accordingly.

Since YOLOv5 has a non-decoupled output, that is, classification and bbox detection results are all in different channels of the same convolution module. Taking the COCO dataset as an example, when the input is 640x640 resolution, the output shapes of the Head module are `(B, 3x(4+1+80),80,80)`, `(B, 3x(4+1+80),40,40)` and `(B, 3x(4+1+80),20,20)`. `3` represents three anchors, `4` represents the bbox prediction branch, `1` represents the obj prediction branch, and `80` represents the class prediction branch of the COCO dataset.
Since YOLOv5 has a non-decoupled output, that is, classification and bbox detection results are all in different channels of the same convolution module. Taking the COCO dataset as an example:

- When the input of P5 model is 640x640 resolution, the output shapes of the Head module are `(B, 3x(4+1+80),80,80)`, `(B, 3x(4+1+80),40,40)` and `(B, 3x(4+1+80),20,20)`.

- When the input of P6 model is 1280x1280 resolution, the output shapes of the Head module are `(B, 3x(4+1+80),160,160)`, `(B, 3x(4+1+80),80,80)`, `(B, 3x(4+1+80),40,40)` and `(B, 3x(4+1+80),20,20)`.

`3` represents three anchors, `4` represents the bbox prediction branch, `1` represents the obj prediction branch, and `80` represents the class prediction branch of the COCO dataset.

### 1.3 Positive and negative sample assignment strategy

Expand Down
13 changes: 11 additions & 2 deletions docs/zh_cn/algorithm_descriptions/model_design.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,13 @@
下图为 RangeKing@GitHub 提供,非常感谢!

<div align=center>
<img src="https://user-images.githubusercontent.com/27466624/199999337-0544a4cb-3cbd-4f3e-be26-bcd9e74db7ff.jpg" alt="基类">
<img src="https://user-images.githubusercontent.com/27466624/199999337-0544a4cb-3cbd-4f3e-be26-bcd9e74db7ff.jpg" alt="基类 P5">
图 1:P5 模型结构
</div>

<div align=center>
<img src="https://user-images.githubusercontent.com/27466624/200850066-0c434173-2d40-4c12-8de3-eda473ff172f.jpg" alt="基类 P6">
图 2:P6 模型结构
</div>

YOLO 系列算法大部分采用了统一的算法搭建结构,典型的如 Darknet + PAFPN。为了让用户快速理解 YOLO 系列算法架构,我们特意设计了如上图中的 BaseBackbone + BaseYOLONeck 结构。
Expand All @@ -20,7 +26,10 @@ YOLO 系列算法大部分采用了统一的算法搭建结构,典型的如 Da

### BaseBackbone

如上图所示,对于 P5 而言,BaseBackbone 包括 1 个 stem 层 + 4 个 stage 层的类似 ResNet 的基础结构,不同算法的主干网络继承 BaseBackbone,用户可以通过实现内部的 `build_xx` 方法,使用自定义的基础模块来构建每一层的内部结构。
- 如图 1 所示,对于 P5 而言,BaseBackbone 为包含 1 个 stem 层 + 4 个 stage 层的类似 ResNet 的基础结构。
- 如图 2 所示,对于 P6 而言,BaseBackbone 为包含 1 个 stem 层 + 5 个 stage 层的结构。

不同算法的主干网络继承 BaseBackbone,用户可以通过实现内部的 `build_xx` 方法,使用自定义的基础模块来构建每一层的内部结构。

### BaseYOLONeck

Expand Down
Loading