This code is for general object detection, not for (oriented) text detection. I will probably update it to handle oriented bounding boxes as soon as possible :)
[How to use]
- You need a dataset.
- The dataset structure is:
/train/0.jpg, /train/0.txt, /valid/0.jpg, /valid/0.txt, ....
- 0.txt contains the position and label of each object, one object per line, like below
(xmin, ymin, xmax, ymax, label)
1273.0 935.0 1407.0 1017.0 v1
911.0 893.0 979.0 953.0 v1
984.0 889.0 1053.0 948.0 v1
- To encode label names as integers, define the labels in 'class_label_map.xlsx'
v1 1
v2 2
....
* Start from 1, not from 0. 0 is reserved for the background (in loss.py).
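
The annotation format and label encoding described above can be sketched as follows. This is a minimal illustration, not the repository's reader: `parse_annotations` is a hypothetical helper, and a plain dict stands in for the contents of the label map spreadsheet.

```python
from typing import Dict, List, Tuple

# (xmin, ymin, xmax, ymax, class_id)
Box = Tuple[float, float, float, float, int]


def parse_annotations(text: str, label_map: Dict[str, int]) -> List[Box]:
    """Parse lines of 'xmin ymin xmax ymax label' into boxes with integer class ids.

    Hypothetical helper for illustration; the real reader lives in the repository.
    """
    boxes: List[Box] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(parts)}")
        xmin, ymin, xmax, ymax = (float(p) for p in parts[:4])
        label = parts[4]
        if label not in label_map:
            raise KeyError(f"line {line_no}: unknown label {label!r}")
        boxes.append((xmin, ymin, xmax, ymax, label_map[label]))
    return boxes


# Assumed stand-in for the label map file: ids start at 1, 0 is background.
label_map = {"v1": 1, "v2": 2}

sample = """1273.0 935.0 1407.0 1017.0 v1
911.0 893.0 979.0 953.0 v1
984.0 889.0 1053.0 948.0 v1"""

boxes = parse_annotations(sample, label_map)
print(boxes[0])  # (1273.0, 935.0, 1407.0, 1017.0, 1)
```

Raising on an unknown label (instead of silently mapping it to 0) keeps a typo in a .txt file from being trained as background.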
- You need some settings for the dataset reader.
- See train.py. It contains the code for reading the dataset:
'trainset = ListDataset(root="../train", gt_extension=".txt", labelmap_path="class_label_map.xlsx", is_train=True, transform=transform, input_image_size=512, num_crops=n_crops, original_img_size=2048)'
- You should set 'input_image_size' and 'original_img_size'. 'input_image_size' is the size of the (cropped) image used for training, and 'original_img_size' is the size of the (original) image. I added this parameter to handle high-resolution images. If you don't need the crop function, set num_crops to -1.
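
To make the relation between the two sizes concrete, here is a rough sketch of how crops of 'input_image_size' could be taken from an image of 'original_img_size', and how a box would be moved into crop coordinates. This is not the repository's cropping code: `crop_origins` and `box_to_crop` are hypothetical helpers, and random crop placement with clipping is an assumption.

```python
import random
from typing import List, Optional, Tuple


def crop_origins(original_img_size: int, input_image_size: int,
                 num_crops: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Pick top-left corners for square crops (hypothetical helper)."""
    if num_crops == -1:
        return []  # cropping disabled, as with num_crops=-1 in the README
    if input_image_size > original_img_size:
        raise ValueError("crop size cannot exceed the original image size")
    rng = random.Random(seed)
    limit = original_img_size - input_image_size  # largest valid corner
    return [(rng.randint(0, limit), rng.randint(0, limit))
            for _ in range(num_crops)]


def box_to_crop(box: Tuple[float, float, float, float],
                origin: Tuple[int, int],
                input_image_size: int) -> Optional[Tuple[float, float, float, float]]:
    """Shift a box into crop coordinates and clip it; None if it falls outside."""
    xmin, ymin, xmax, ymax = box
    x0, y0 = origin
    size = float(input_image_size)
    cx0, cy0 = max(xmin - x0, 0.0), max(ymin - y0, 0.0)
    cx1, cy1 = min(xmax - x0, size), min(ymax - y0, size)
    if cx1 <= cx0 or cy1 <= cy0:
        return None  # no overlap between the box and the crop
    return (cx0, cy0, cx1, cy1)


origins = crop_origins(original_img_size=2048, input_image_size=512, num_crops=4)
shifted = box_to_crop((911.0, 893.0, 979.0, 953.0), (800, 800), 512)
print(shifted)  # (111.0, 93.0, 179.0, 153.0)
```

With a 2048 image and 512 crops, valid corners range from 0 to 1536 on each axis, so every crop stays inside the original image.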
- Train with your dataset!
You should define some parameters, such as the learning rate, which optimizer to use, the batch size, etc.
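
One way to keep these training parameters in one place is a small config object. The sketch below is only an illustration: `TrainConfig`, its field names, its defaults, and the list of optimizer names are all hypothetical and do not come from train.py.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Hypothetical container for training settings; values are placeholders."""
    lr: float = 1e-3          # learning rate
    optimizer: str = "sgd"    # which optimizer to use
    batch_size: int = 8       # size of each batch

    def __post_init__(self) -> None:
        # Basic sanity checks so a bad setting fails before training starts.
        if self.lr <= 0:
            raise ValueError("lr must be positive")
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        if self.optimizer not in {"sgd", "adam"}:
            raise ValueError(f"unsupported optimizer: {self.optimizer!r}")


cfg = TrainConfig(lr=1e-4, optimizer="adam", batch_size=4)
print(cfg)
```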