[Feature] Add dataset analysis script #172

Merged: 32 commits merged on Nov 3, 2022
Commits:
- `d23dea6` messages (Zheng-LinXiao, Oct 17, 2022)
- `27a815d` again (Zheng-LinXiao, Oct 19, 2022)
- `8057135` again1 (Zheng-LinXiao, Oct 19, 2022)
- `9bd865f` again_2 (Zheng-LinXiao, Oct 19, 2022)
- `69aaae3` again_3 (Zheng-LinXiao, Oct 20, 2022)
- `d0525c1` again_4 (Zheng-LinXiao, Oct 20, 2022)
- `5b52274` again_5 (Zheng-LinXiao, Oct 20, 2022)
- `cd959b7` Update docs/zh_cn/user_guides/useful_tools.md (Zheng-LinXiao, Oct 21, 2022)
- `ffdbdf6` Update docs/zh_cn/user_guides/useful_tools.md (Zheng-LinXiao, Oct 21, 2022)
- `8513e81` Update docs/en/user_guides/useful_tools.md (Zheng-LinXiao, Oct 21, 2022)
- `cdb15cb` Update docs/en/user_guides/useful_tools.md (Zheng-LinXiao, Oct 21, 2022)
- `50f48ba` Update docs/en/user_guides/useful_tools.md (Zheng-LinXiao, Oct 21, 2022)
- `0f85995` Update docs/zh_cn/user_guides/useful_tools.md (Zheng-LinXiao, Oct 21, 2022)
- `32e4452` Update tools/analysis_tools/dataset_analysis.py (Zheng-LinXiao, Oct 21, 2022)
- `98e55b0` Update tools/analysis_tools/dataset_analysis.py (Zheng-LinXiao, Oct 21, 2022)
- `816ca18` Update tools/analysis_tools/dataset_analysis.py (Zheng-LinXiao, Oct 21, 2022)
- `654b454` Update tools/analysis_tools/dataset_analysis.py (Zheng-LinXiao, Oct 21, 2022)
- `2b0264a` Merge branch 'dev' into branchname (PeterH0323, Oct 21, 2022)
- `70069c4` modify code (Zheng-LinXiao, Oct 22, 2022)
- `a7d676d` Update docs/en/user_guides/useful_tools.md (Zheng-LinXiao, Oct 23, 2022)
- `8899ac2` Modify document (Zheng-LinXiao, Oct 24, 2022)
- `3dc4622` Modify document (Zheng-LinXiao, Oct 24, 2022)
- `4d5765a` Merge branch 'branchname' of github.com:Zheng-LinXiao/mmyolo into bra… (Zheng-LinXiao, Oct 24, 2022)
- `87e7a2c` new code (Zheng-LinXiao, Oct 31, 2022)
- `23910d4` revise decuments and codes (Zheng-LinXiao, Oct 31, 2022)
- `d9faedb` Revise datails (Zheng-LinXiao, Nov 1, 2022)
- `b7c4271` Update tools/analysis_tools/dataset_analysis.py (Zheng-LinXiao, Nov 1, 2022)
- `5920ff0` modify func name (Zheng-LinXiao, Nov 1, 2022)
- `78da584` code (Zheng-LinXiao, Nov 2, 2022)
- `fafb44d` Documentation and code (Zheng-LinXiao, Nov 2, 2022)
- `4dfd80c` modify error meaasge (Zheng-LinXiao, Nov 3, 2022)
- `7780e4d` deleted height, (Zheng-LinXiao, Nov 3, 2022)
docs/en/user_guides/useful_tools.md (80 changes: 79 additions, 1 deletion)
@@ -108,7 +108,85 @@ python tools/analysis_tools/browse_dataset.py 'configs/yolov5/yolov5_s-v61_syncb
--not-show
```

### Visualize dataset analysis

`tools/analysis_tools/dataset_analysis.py` helps users obtain plots for four analysis functions, prints a list of the dataset's category names with their instance counts, and saves the images to the `dataset_analysis` folder under the current working directory.

Description of the script's functions: `main()` prepares the data required by each sub-function.

Function 1: generated by the sub-function `show_bbox_num`; displays the number of bbox instances in each category.

<img src="https://user-images.githubusercontent.com/90811472/196891728-4c2f1ab3-01cb-445f-a6b8-39752387c40f.jpg"/>
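
To illustrate what Function 1 measures, the sketch below counts bbox instances per category with `collections.Counter`. It is not the script's `show_bbox_num` implementation: the annotation layout (a list of dicts holding a `category` name and a COCO-style `[x, y, width, height]` `bbox`) and the sample values are hypothetical, and the plotting step is left out.

```python
from collections import Counter

# Hypothetical sample annotations (not taken from the script): one dict per
# bbox instance, with a category name and a COCO-style [x, y, w, h] box.
annotations = [
    {"category": "person", "bbox": [10, 20, 50, 120]},
    {"category": "person", "bbox": [200, 40, 30, 90]},
    {"category": "car", "bbox": [5, 5, 160, 80]},
]


def count_instances(annotations):
    """Count bbox instances per category name."""
    return Counter(ann["category"] for ann in annotations)


counts = count_instances(annotations)
print(counts.most_common())  # [('person', 2), ('car', 1)]
```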

Function 2: generated by the sub-function `show_bbox_wh`; displays the distribution of bbox instance widths and heights for each category.

<img src="https://user-images.githubusercontent.com/90811472/199019573-650b9652-eb14-4bc0-a5e8-650dfc578fc8.jpg"/>
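
The data behind a width/height distribution plot is simply each box's width and height grouped by category. A minimal sketch, assuming hypothetical COCO-style `[x, y, width, height]` boxes; `collect_wh` is an illustrative name, not a function from the script:

```python
from collections import defaultdict

# Hypothetical sample annotations with COCO-style [x, y, w, h] boxes.
annotations = [
    {"category": "person", "bbox": [10, 20, 50, 120]},
    {"category": "person", "bbox": [200, 40, 30, 90]},
    {"category": "car", "bbox": [5, 5, 160, 80]},
]


def collect_wh(annotations):
    """Group bbox widths and heights by category name."""
    wh = defaultdict(lambda: {"width": [], "height": []})
    for ann in annotations:
        _, _, w, h = ann["bbox"]  # position is irrelevant for size statistics
        wh[ann["category"]]["width"].append(w)
        wh[ann["category"]]["height"].append(h)
    return dict(wh)


wh = collect_wh(annotations)
print(wh["person"])  # {'width': [50, 30], 'height': [120, 90]}
```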

Function 3: generated by the sub-function `show_bbox_wh_ratio`; displays the distribution of bbox instance width-to-height ratios for each category.

<img src="https://user-images.githubusercontent.com/90811472/199019593-0f810a21-18d2-41ac-b4fa-baa8288bcb23.jpg"/>
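
The ratio in Function 3 is width divided by height, so a value above 1 means a wide box and below 1 a tall one. The sketch below computes it per category on hypothetical COCO-style boxes; skipping zero-height boxes is an assumption made here to avoid division by zero, not documented behavior of the script.

```python
from collections import defaultdict

# Hypothetical sample annotations with COCO-style [x, y, w, h] boxes.
annotations = [
    {"category": "person", "bbox": [10, 20, 50, 120]},
    {"category": "person", "bbox": [200, 40, 30, 90]},
    {"category": "car", "bbox": [5, 5, 160, 80]},
]


def collect_wh_ratio(annotations):
    """Group bbox width/height ratios by category name."""
    ratios = defaultdict(list)
    for ann in annotations:
        _, _, w, h = ann["bbox"]
        if h > 0:  # skip degenerate boxes instead of dividing by zero
            ratios[ann["category"]].append(w / h)
    return dict(ratios)


ratios = collect_wh_ratio(annotations)
print(ratios["car"])  # [2.0]
```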

Function 4: generated by the sub-function `show_bbox_area`; displays the distribution of bbox instance areas for each category, grouped according to the area rule.

<img src="https://user-images.githubusercontent.com/90811472/199022991-5388db47-d0f3-4201-9eee-13c5fab6bca9.jpg"/>
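
The area rule is a list of interval edges; the default `[0, 32**2, 96**2, 1e5**2]` from the examples below matches COCO's small/medium/large thresholds. A minimal sketch of bucketing boxes by area with `bisect`, assuming half-open intervals `rule[i] <= area < rule[i + 1]` and `width * height` as the area; whether the script treats the boundaries exactly this way is an assumption, and `area_bucket` / `count_by_area` are hypothetical names.

```python
import bisect

# Default area-rule edges, as quoted in this guide's usage examples.
AREA_RULE = [0, 32**2, 96**2, 1e5**2]

# Hypothetical sample annotations with COCO-style [x, y, w, h] boxes.
annotations = [
    {"category": "person", "bbox": [0, 0, 20, 20]},     # area 400
    {"category": "person", "bbox": [10, 20, 50, 120]},  # area 6000
    {"category": "person", "bbox": [200, 40, 30, 90]},  # area 2700
    {"category": "car", "bbox": [5, 5, 160, 80]},       # area 12800
]


def area_bucket(area, rule=AREA_RULE):
    """Return i such that rule[i] <= area < rule[i + 1]."""
    i = bisect.bisect_right(rule, area) - 1
    if not 0 <= i < len(rule) - 1:
        raise ValueError(f"area {area} falls outside {rule}")
    return i


def count_by_area(annotations, rule=AREA_RULE):
    """Count bbox instances per category and area interval."""
    counts = {}
    for ann in annotations:
        _, _, w, h = ann["bbox"]
        bins = counts.setdefault(ann["category"], [0] * (len(rule) - 1))
        bins[area_bucket(w * h, rule)] += 1
    return counts


print(count_by_area(annotations))  # {'person': [1, 2, 0], 'car': [0, 0, 1]}
```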

Printed list: generated by the sub-function `show_class_list`.

<img src="https://user-images.githubusercontent.com/90811472/199090989-15109bbf-f035-477d-8566-e2a28de0935d.jpg"/>
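
A printed class list like this can be produced with plain string formatting. The sketch below is a stdlib approximation, not the script's `show_class_list`: the column titles, the most-frequent-first ordering, and `format_class_list` itself are choices made for illustration.

```python
def format_class_list(counts):
    """Render category names and bbox counts as an aligned two-column table."""
    # Widen the name column to fit the longest class name (or the header).
    name_width = max([len("Class name")] + [len(name) for name in counts])
    rows = [f"{'Class name':<{name_width}}  Count"]
    # Most frequent classes first (an ordering chosen for this sketch).
    for name, num in sorted(counts.items(), key=lambda item: -item[1]):
        rows.append(f"{name:<{name_width}}  {num}")
    return "\n".join(rows)


print(format_class_list({"person": 2, "car": 1}))
```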

```shell
python tools/analysis_tools/dataset_analysis.py ${CONFIG} \
[-h] \
[--type ${TYPE}] \
[--class-name ${CLASS_NAME}] \
[--area-rule ${AREA_RULE}] \
[--func ${FUNC}] \
[--output-dir ${OUTPUT_DIR}]
```

Examples:

1. Use the config file `configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py` to analyze the dataset with the default settings: the data loading type is `train`, the area rule interval is `[0, 32**2, 96**2, 1e5**2]`, plots for all classes and all four functions are generated, and the images are saved to the `./dataset_analysis` folder under the current working directory:

```shell
python tools/analysis_tools/dataset_analysis.py configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py
```

2. Use the config file `configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py` to analyze the dataset, changing the data loading type from the default `train` to `val`:

```shell
python tools/analysis_tools/dataset_analysis.py configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py \
--type val
```

3. Use the config file `configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py` to analyze the dataset, displaying a specific class instead of all classes. For example, to display only the `person` class:

```shell
python tools/analysis_tools/dataset_analysis.py configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py \
--class-name person
```
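
Conceptually, `--class-name` narrows the analysis to one category before any plot is drawn. The sketch below shows that filtering step on hypothetical COCO-style annotation dicts; `filter_by_class` is an illustrative name rather than a function from the script, and raising an error for an unknown class is an assumption about how such input might be handled.

```python
# Hypothetical sample annotations with COCO-style [x, y, w, h] boxes.
annotations = [
    {"category": "person", "bbox": [10, 20, 50, 120]},
    {"category": "person", "bbox": [200, 40, 30, 90]},
    {"category": "car", "bbox": [5, 5, 160, 80]},
]


def filter_by_class(annotations, class_name):
    """Keep only the annotations whose category equals class_name."""
    known = {ann["category"] for ann in annotations}
    if class_name not in known:
        raise ValueError(
            f"unknown class {class_name!r}; available: {sorted(known)}")
    return [ann for ann in annotations if ann["category"] == class_name]


person_only = filter_by_class(annotations, "person")
print(len(person_only))  # 2
```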

4. Use the config file `configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py` to analyze the dataset with a redefined area rule. For example, to add the value `120`, enter `32 96 120`; the area rule interval then becomes `[0, 32**2, 96**2, 120**2, 1e5**2]`. Only one number can be added, and a custom area rule must include `32 96`:

```shell
python tools/analysis_tools/dataset_analysis.py configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py \
--area-rule 32 96 120
```
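
The area-rule arithmetic in this example can be reproduced in a few lines. This is a minimal sketch assuming the entered values are bbox side lengths that get squared into area edges, with `0` and `1e5**2` as fixed outer bounds, as the intervals quoted above suggest; `build_area_rule` is a hypothetical helper, and the `argparse` definition only approximates how the real flag might be declared.

```python
import argparse


def build_area_rule(sides=None):
    """Turn side lengths such as 32 96 120 into squared-area interval edges.

    Enforces the constraints described in this guide: the values must start
    with 32 96, and at most one extra value may follow.
    """
    if sides is None:
        sides = [32, 96]
    if sides[:2] != [32, 96] or len(sides) > 3:
        raise ValueError("expected '32 96', optionally followed by one value")
    if any(a >= b for a, b in zip(sides, sides[1:])) or sides[-1] >= 1e5:
        raise ValueError("values must be strictly increasing and below 1e5")
    return [0] + [s**2 for s in sides] + [1e5**2]


parser = argparse.ArgumentParser()
# Hypothetical flag definition; the real script's argparse setup may differ.
parser.add_argument("--area-rule", type=int, nargs="+", default=None)
args = parser.parse_args(["--area-rule", "32", "96", "120"])
print(build_area_rule(args.area_rule))  # [0, 1024, 9216, 14400, 10000000000.0]
```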

5. Use the config file `configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py` to analyze the dataset, displaying only the `Function 1` plot instead of all four:

```shell
python tools/analysis_tools/dataset_analysis.py configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py \
--func show_bbox_num
```
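
`--func` restricts the output to one of the four sub-functions. A minimal sketch of such a selection, assuming the flag accepts exactly the four function names documented in this guide; the `argparse` setup and the `selected_functions` helper are hypothetical, not the script's code.

```python
import argparse

# The four sub-function names documented in this guide.
FUNC_NAMES = [
    "show_bbox_num", "show_bbox_wh", "show_bbox_wh_ratio", "show_bbox_area"
]

parser = argparse.ArgumentParser()
# Hypothetical flag definition: only the documented names are accepted.
parser.add_argument("--func", default=None, choices=FUNC_NAMES)


def selected_functions(func):
    """Return every plot when --func is omitted, otherwise just the chosen one."""
    return list(FUNC_NAMES) if func is None else [func]


args = parser.parse_args(["--func", "show_bbox_num"])
print(selected_functions(args.func))  # ['show_bbox_num']
```

With `choices` set, `argparse` exits with a usage error when an unknown name is passed, which catches typos before any data is loaded.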

6. Use the config file `configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py` to analyze the dataset, changing the image save directory to `work_dir/dataset_analysis`:

```shell
python tools/analysis_tools/dataset_analysis.py configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py \
--output-dir work_dir/dataset_analysis
```

## Dataset Conversion

The folder `tools/data_converters` currently contains two dataset conversion tools: `ballon2coco.py` and `yolo2coco.py`.

docs/zh_cn/user_guides/useful_tools.md (80 changes: 79 additions, 1 deletion)
@@ -14,7 +14,7 @@ mim run mmdet print_config [CONFIG]

### Visualize COCO labels

The script `tools/analysis_tools/browse_coco_json.py` visualizes COCO labels on their images.

```shell
python tools/analysis_tools/browse_coco_json.py ${DATA_ROOT} \
@@ -108,6 +108,84 @@ python tools/analysis_tools/browse_dataset.py 'configs/yolov5/yolov5_s-v61_syncb
--not-show
```

### Visualize dataset analysis

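The script `tools/analysis_tools/dataset_analysis.py` helps users obtain plots for four analysis functions, prints a list showing the dataset's category names and their corresponding counts, and saves the images to the `dataset_analysis` folder under the current working directory.

Description of the script's functions: the data required by each sub-function is prepared in `main()`.

Function 1: displays the number of bbox instances in each category; generated by the sub-function `show_bbox_num`.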

<img src="https://user-images.githubusercontent.com/90811472/196891728-4c2f1ab3-01cb-445f-a6b8-39752387c40f.jpg"/>

Function 2: displays the distribution of bbox instance widths and heights for each category; generated by the sub-function `show_bbox_wh`.

<img src="https://user-images.githubusercontent.com/90811472/199019573-650b9652-eb14-4bc0-a5e8-650dfc578fc8.jpg"/>

Function 3: displays the distribution of bbox instance width/height ratios for each category; generated by the sub-function `show_bbox_wh_ratio`.

<img src="https://user-images.githubusercontent.com/90811472/199019593-0f810a21-18d2-41ac-b4fa-baa8288bcb23.jpg"/>

Function 4: displays the distribution of bbox instance areas for each category under the area rule; generated by the sub-function `show_bbox_area`.

<img src="https://user-images.githubusercontent.com/90811472/199022991-5388db47-d0f3-4201-9eee-13c5fab6bca9.jpg"/>

Printed list: generated by the sub-function `show_class_list` in the script.

<img src="https://user-images.githubusercontent.com/90811472/199090989-15109bbf-f035-477d-8566-e2a28de0935d.jpg"/>

```shell
python tools/analysis_tools/dataset_analysis.py ${CONFIG} \
[-h] \
[--type ${TYPE}] \
[--class-name ${CLASS_NAME}] \
[--area-rule ${AREA_RULE}] \
[--func ${FUNC}] \
[--output-dir ${OUTPUT_DIR}]
```

Examples:

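1. Use the config file `configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py` to analyze the dataset with the default settings: the data loading type is `train`, the area rule interval is `[0, 32**2, 96**2, 1e5**2]`, plots for all classes and all four functions are generated, and the images are saved to the `./dataset_analysis` folder under the current working directory: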

```shell
python tools/analysis_tools/dataset_analysis.py configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py
```

2. Use the config file `configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py` to analyze the dataset, using `--type` to change the data loading type from the default `train` to `val`:

```shell
python tools/analysis_tools/dataset_analysis.py configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py \
--type val
```

3. Use the config file `configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py` to analyze the dataset, using `--class-name` to display a specific class instead of all classes, taking `person` as an example:

```shell
python tools/analysis_tools/dataset_analysis.py configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py \
--class-name person
```

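4. Use the config file `configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py` to analyze the dataset, using `--area-rule` to redefine the area rule. Taking the newly added value `120` as an example, the input is `32 96 120`, and the area rule interval becomes `[0, 32**2, 96**2, 120**2, 1e5**2]`. Only one number can be added; to customize the area rule, the input must include `32 96`: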

```shell
python tools/analysis_tools/dataset_analysis.py configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py \
--area-rule 32 96 120
```

5. Use the config file `configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py` to analyze the dataset, using `--func` to display only the `Function 1` plot instead of all four:

```shell
python tools/analysis_tools/dataset_analysis.py configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py \
--func show_bbox_num
```

6. Use the config file `configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py` to analyze the dataset, using `--output-dir` to change the image save directory, taking `work_dir/dataset_analysis` as an example:

```shell
python tools/analysis_tools/dataset_analysis.py configs/yolov5/yolov5_s-v61_syncbn_8xb16-300e_coco.py \
--output-dir work_dir/dataset_analysis
```

## Dataset Conversion

The folder `tools/data_converters/` currently contains two dataset conversion tools: `ballon2coco.py` and `yolo2coco.py`.
Empty file added: `height,`