This project provides a pipeline for fine-tuning pre-trained object detection models. Currently it is tailored to detecting tables in document images.
This project is based on the MMDetection v1.2 open-source framework for convenient construction of neural detection models. It requires CUDA and a high-performance GPU, so the Google Colab service is used to create a cloud environment in case no powerful GPU is available on the local machine.
The CascadeTabNet project config file and pre-trained models are used as the base to perform fine-tuning on.
You must have a tab_net/ directory in Google Drive.
tab_net/ folder structure:
- pretrained_models/
- images/
- training/{pretrained_model}/{dataset_type}/
- libs/
- results/
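If you prefer to create this structure from a script (for example, from a Colab cell after mounting Drive), a minimal sketch is shown below; the Drive root path is an assumption and depends on how your Drive is mounted:

```python
import os

# Assumed mount point of Google Drive in Colab; adjust to your own setup.
drive_root = "/content/drive/MyDrive/tab_net"

# Top-level folders of the project structure listed above; the
# training/{pretrained_model}/{dataset_type}/ subfolders are created per run.
for sub in ["pretrained_models", "images", "training", "libs", "results"]:
    os.makedirs(os.path.join(drive_root, sub), exist_ok=True)
```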
The dataset consists of 3 types of documents: payment orders (opl), invoices (fact), and other documents (misc).
Place the document images into the images/ project folder.
Document image filenames must follow the pattern inv-XXXX.jpg (inv-0000.jpg, etc.).
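If your scans are not named this way yet, a minimal renaming sketch could look like the following; raw_images/ is a hypothetical source folder and the script assumes the files are already JPEGs:

```python
import os
import shutil

src_dir, dst_dir = "raw_images", "images"   # hypothetical source folder, project images/ folder

jpgs = sorted(n for n in os.listdir(src_dir) if n.lower().endswith(".jpg"))
for i, name in enumerate(jpgs):
    # inv-0000.jpg, inv-0001.jpg, ...
    shutil.copy(os.path.join(src_dir, name),
                os.path.join(dst_dir, f"inv-{i:04d}.jpg"))
```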
- Annotate tables on document images:
- Open Label_IMG Annotator from annotations/annotators/ project folder;
- Open Dir -> images/ project folder;
- Change Save Dir -> annotations/PASCAL_VOC_annotations/xml_all/ project folder;
- Use the default label bordered;
- Draw a rectangle around each table in your dataset images (a quick sanity check of the resulting annotation files is sketched after this list).
- Annotate type of document images:
- Open VGG_Image Annotator from annotations/annotators/ project folder;
- File attributes: "img_type": ["opl", "fact", "misc"] (specify your own if necessary);
- Annotate only file_attributes according to the document type;
- Annotations -> Export Annotations (as json) to the annotations/ project folder as vgg_json.json.
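As a sanity check of both annotation steps, the sketch below reads one LabelImg XML file and the exported VIA JSON; the exact file names are examples, and the VIA key layout may differ slightly between VIA versions:

```python
import json
import xml.etree.ElementTree as ET
from collections import Counter

# 1) Table boxes from one LabelImg (PASCAL VOC) file; the file name is an example.
tree = ET.parse("annotations/PASCAL_VOC_annotations/xml_all/inv-0000.xml")
for obj in tree.getroot().iter("object"):
    label = obj.findtext("name")  # should be "bordered"
    box = obj.find("bndbox")
    print(label, [int(float(box.findtext(t))) for t in ("xmin", "ymin", "xmax", "ymax")])

# 2) Document types from the VIA export; each entry carries "file_attributes".
with open("annotations/vgg_json.json") as f:
    via = json.load(f)
print(Counter(e["file_attributes"].get("img_type", "unknown") for e in via.values()))
```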
MMDetection config models work with annotations in COCO format, so you have to convert the annotations from clause 2.1.
The VOC_to_COCO script is based on D. Prasad's script for converting Pascal VOC XML annotation files into a single COCO JSON file; refer to section 8 (Training) of the CascadeTabNet project.
The VOC_to_COCO script was modified to also perform a train/test/unseen split of your dataset in addition to the conversion. The model validates only on the test split, so the unseen images can be used to evaluate it on unknown data.
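The actual split logic lives inside VOC_to_COCO.py; purely as an illustration of the idea, a random train/test/unseen split could look like this (the fractions are assumptions, not the values used by the script):

```python
import random

def split_filenames(names, test_frac=0.2, unseen_frac=0.1, seed=42):
    """Illustrative train/test/unseen split of image names."""
    names = sorted(names)
    random.Random(seed).shuffle(names)
    n_unseen = int(len(names) * unseen_frac)
    n_test = int(len(names) * test_frac)
    unseen = names[:n_unseen]
    test = names[n_unseen:n_unseen + n_test]
    train = names[n_unseen + n_test:]
    return train, test, unseen
```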
You can choose a dataset_type to use only part of your dataset: all document types or only specific ones. Given the provided dataset_type, the model will train only on those document types. Available dataset_type values:
- type_all - all document types (the whole dataset);
- type_opl_fact - only payment orders (opl) and invoices (fact);
- type_opl - only payment orders (opl).
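In other words, each dataset_type keeps the following document types (the mapping below is just a restatement of the list above, not code taken from the scripts):

```python
DATASET_TYPES = {
    "type_all": ["opl", "fact", "misc"],
    "type_opl_fact": ["opl", "fact"],
    "type_opl": ["opl"],
}
```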
- Choose dataset_type;
- Launch the VOC_to_COCO.py script from the Terminal:
python VOC_to_COCO.py --dtype dataset_type
- Output files locations:
- train/test/unseen splits of XML annotations (PASCAL VOC) for each dataset_type will be located in annotations/PASCAL_VOC_Annotations/{dataset_type}/{test/train/unseen} project folder;
- train/test COCO annotations for each dataset_type will be located in annotations/COCO_Annotations/{dataset_type} project folder;
- train/test/unseen JSON lists of image_names will be located in the same folder as COCO annotations.
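For orientation, the generated COCO files follow the standard COCO detection layout; the sketch below only shows the shape of such a file, with made-up values:

```python
coco_example = {
    "images": [
        {"id": 1, "file_name": "inv-0000.jpg", "width": 2480, "height": 3508},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [120, 340, 900, 410],   # [x, y, width, height]
         "area": 900 * 410, "iscrowd": 0},
    ],
    "categories": [
        {"id": 1, "name": "bordered"},
    ],
}
```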
The whole training process is done with the Colab service. Colab is tightly integrated with Google Drive, so operations on files stored in Google Drive are handled quite efficiently.
- Place large-sized libs (pytorch, etc.) to the libs/ Google Drive folder;
- Place pretrained models to the pretrained_models/ Google Drive folder;
- Place document images dataset into images/ Google Drive folder;
- Place the contents of the annotations/COCO_annotations/ project folder (type_all, type_opl_fact, type_opl) into the training/{pretrained_model}/ Google Drive folder.
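The notebook handles Drive access itself; for reference, mounting Google Drive in Colab typically looks like the snippet below (the project path is an assumption and depends on your Drive layout):

```python
from google.colab import drive

drive.mount("/content/drive")

# Assumed location of the project folder once Drive is mounted.
TAB_NET_DIR = "/content/drive/MyDrive/tab_net"
```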
Training is done through the Colab notebook with Google GPUs.
- Launch tab_net_finetuning.ipynb as Colab notebook;
- The first cell installs the required environment; restart the runtime after this step;
- Then launch the Define desired parameters cell. Change pretrained_model, dataset_type or total_epoches if necessary. You also have to find out for how many epochs the provided pretrained_model was already trained/fine-tuned, and specify that number in the pretr_model_epochs dictionary (see the sketch after this list);
- Launch Train a model cell. Checkpoints for each epoch will be stored in training/{pretrained_model}/{dataset_type}/workdir/ Google Drive folder.
- Launch cells from Save predictions to JSON section;
- Model results and the training log will be stored in the results/ Google Drive folder as a JSON file;
- Save model results and training log to results/ project folder.
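As a rough idea of the parameters mentioned in the steps above, the Define desired parameters cell can be pictured like this; the variable names come from the notebook description, while all values are placeholders:

```python
pretrained_model = "cascade_tab_net"   # placeholder name of a model in pretrained_models/
dataset_type = "type_all"              # type_all / type_opl_fact / type_opl
total_epoches = 12                     # how many additional epochs to fine-tune

# Number of epochs each pretrained model was already trained/fine-tuned for;
# fill in the real values for your checkpoints.
pretr_model_epochs = {
    "cascade_tab_net": 36,             # placeholder value
}
```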
This section is based on R. Padilla's open-source project Object-Detection-Metrics.
- The object detection metrics script processes only data prepared in a special structure. Launch object_detection_prepare_data.py from the Terminal:
python object_detection_prepare_data.py --pretr pretrained_model --dtype dataset_type
to prepare ground truths and detection results (from the model results). For more detailed information about the required structure, refer to the aforementioned work; the expected file format is also sketched after this list.
- Prepared ground-truth files will be stored in the object_detection_metrics/{pretrained_model}/groundtruths_test(train/unseen)/ project folder.
- Prepared detection files will be stored in the object_detection_metrics/{pretrained_model}/detections/ project folder, divided into per-epoch folders and train/test/unseen splits.
- Calculate object detection metrics of the model results for each epoch and split (train/test/unseen) and store them in the results/metrics/{pretrained_model}/{dataset_type}/ project folder. Launch object_detection_get_results.py from the Terminal:
python object_detection_get_results.py --pretr pretrained_model --dtype dataset_type
- The Precision x Recall curve is stored as bordered(class_name)_{epoch}_{split}(train/test/unseen).png;
- Metric values are stored as results_{epoch}_{split}(train/test/unseen).txt.
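For reference, the prepared ground-truth and detection files follow the plain-text, one-file-per-image format of the Object-Detection-Metrics project; the lines below only illustrate that format with made-up coordinates:

```python
# Ground truth:  <class_name> <left> <top> <right> <bottom>
groundtruth_line = "bordered 120 340 1020 750"

# Detection:     <class_name> <confidence> <left> <top> <right> <bottom>
detection_line = "bordered 0.97 118 338 1019 748"
```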
Collect all metrics, logs and results into a single dataframe. It will be used to visualize model metrics and logs for every pretrained_model, dataset_type, evaluation_split and epoch.
Launch dataframe_results.py. The resulting dataframe is stored in the results/ project folder as df.pkl. The script also builds the df_doctypes.pkl dataframe, which stores the distribution of document types across the dataset.
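To inspect the collected results, a minimal sketch (the exact column names depend on dataframe_results.py):

```python
import pandas as pd

df = pd.read_pickle("results/df.pkl")
print(df.columns.tolist())   # expected to cover pretrained_model, dataset_type, split, epoch and metrics
print(df.head())

df_doctypes = pd.read_pickle("results/df_doctypes.pkl")
print(df_doctypes)
```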
An interactive web app in dashboard form is built to visualize the model results. It provides plots of AP, Precision x Recall curves, loss and accuracy, and the document_types distribution. To visualize results for each pretrained_model, dataset_type, evaluation_split and epoch, simply select the desired parameters on the dashboard.
Launch dash_app.py to interact with the dashboard.
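dash_app.py is a standard Dash application; a heavily simplified skeleton of such a dashboard (not the project's actual code) is sketched below. By default Dash serves the app at http://127.0.0.1:8050.

```python
import dash
from dash import dcc, html   # Dash >= 2.0 import style

app = dash.Dash(__name__)
app.layout = html.Div([
    dcc.Dropdown(
        id="dataset_type",
        options=[{"label": t, "value": t}
                 for t in ["type_all", "type_opl_fact", "type_opl"]],
        value="type_all",
    ),
    dcc.Graph(id="metrics_plot"),  # redrawn via callbacks when parameters change
])

if __name__ == "__main__":
    app.run_server(debug=True)
```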