ClickSEG is a codebase for click-based interactive segmentation, developed on top of the RITM codebase. Compared with the original RITM repository, ClickSEG has the following new features:
- Conditional Diffusion for Interactive Segmentation (ICCV 2021) [Link]
- FocalClick: Towards Practical Interactive Image Segmentation (CVPR 2022) [Link]
The RITM codebase uses albumentations to crop and resize image-mask pairs for training. In that pipeline the crop size is fixed, which is unsuitable for training on a combined dataset with varying image sizes. Besides, the nearest-neighbor interpolation adopted in albumentations biases the mask by one pixel towards the bottom-right, which harms boundary details, especially for the Refiner of FocalClick. Therefore, we rewrote the augmentation pipeline, which is crucial for the final performance; a sketch of the safer mask-resizing strategy follows.
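As a minimal illustration of the safer strategy (a sketch using OpenCV directly, not the codebase's actual augmentation code): resample the mask as a float image and re-binarize it, instead of letting nearest-neighbor interpolation pick a single, possibly offset, source pixel per target pixel.

```python
# A minimal sketch (OpenCV; not the codebase's actual augmentation code) of a
# bias-free way to resize a binary mask: resample it as a float image with
# linear interpolation, then re-binarize with a 0.5 threshold, instead of
# relying on nearest-neighbor interpolation for the mask itself.
import cv2
import numpy as np

def resize_mask(mask: np.ndarray, size: tuple) -> np.ndarray:
    """Resize a {0,1} mask to `size` (width, height) without the
    single-source-pixel rounding that nearest-neighbor performs."""
    resized = cv2.resize(mask.astype(np.float32), size,
                         interpolation=cv2.INTER_LINEAR)
    return (resized > 0.5).astype(np.uint8)

mask = np.zeros((100, 100), dtype=np.uint8)
mask[40:60, 40:60] = 1                   # a 20x20 square object
small = resize_mask(mask, (48, 48))      # arbitrary target size
```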
We add efficient backbones such as MobileNets and PPLCNet. We train all our models on the COCO+LVIS dataset for the standard configuration. In addition, we train them on a combined large dataset and provide the trained weights to facilitate academic research and industrial applications. The combined dataset includes 8 datasets with high-quality annotations and diversified scenes: COCO [1], LVIS [2], ADE20K [3], MSRA10K [4], DUT [5], YoutubeVOS [6], ThinObject [7], HFlicker [8].
1. Microsoft COCO: Common Objects in Context
2. LVIS: A Dataset for Large Vocabulary Instance Segmentation
3. Scene Parsing through ADE20K Dataset
4. Salient Object Detection: A Benchmark
5. Learning to Detect Salient Objects with Image-Level Supervision
6. YouTube-VOS: A Large-Scale Video Object Segmentation Benchmark
7. Deep Interactive Thin Object Selection
8. DoveNet: Deep Image Harmonization via Domain Verification
In the FocalClick paper, we propose DAVIS-585, a new benchmark that provides initial masks for evaluation: the "from init" results below start the interaction from these provided masks, while the "from zero" results start from scratch. The dataset can be downloaded from the ClickSEG Google Drive. We also provide evaluation code in this codebase.
To use this codebase to train/validate your own models, please follow these steps:

- Install the requirements by executing

      pip install -r requirements.txt

- Prepare the datasets and pretrained backbone weights following Data_Weight_Preparation.md
- Train or validate the model following Train_Val_Guidance.md

The trained model weights can be downloaded from the ClickSEG Google Drive.
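All tables below report NoC@85/90%: the mean Number of Clicks required to reach 85%/90% IoU on each benchmark, with a fixed per-sample click budget (20 clicks in the standard protocol). A minimal sketch of the metric, assuming per-click IoU curves have already been collected (this is a hypothetical helper, not this codebase's evaluation code):

```python
# A minimal sketch (hypothetical helper, not this codebase's evaluation code)
# of the NoC metric: the mean number of clicks needed to reach a target IoU,
# where samples that never reach it are charged the full click budget.
from typing import List

def noc(ious_per_click: List[List[float]], target_iou: float,
        max_clicks: int = 20) -> float:
    """ious_per_click[i][k] is the IoU for sample i after k+1 clicks."""
    clicks = []
    for ious in ious_per_click:
        n = max_clicks
        for k, iou in enumerate(ious[:max_clicks]):
            if iou >= target_iou:
                n = k + 1
                break
        clicks.append(n)
    return sum(clicks) / len(clicks)

# Example: NoC@90 over two samples -> (3 + 1) / 2 = 2.0 clicks.
print(noc([[0.70, 0.85, 0.92], [0.91]], target_iou=0.90))
```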
CONFIG

- Input Size: 384 x 384
- Previous Mask: No (the sketch below shows what this toggles in the network input)
- Iterative Training: No
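Here, "Previous Mask" means feeding the prediction from the last interaction round back into the network as an extra input channel, on top of the RITM-style click maps. A minimal sketch of the input layout (channel ordering assumed for illustration, not the codebase's exact encoding):

```python
# A minimal sketch (channel layout assumed, not the codebase's exact encoding)
# of what "Previous Mask: No/Yes" means for the network input: clicks are
# rasterized into positive/negative maps, and the previous prediction adds
# one more channel when enabled.
import torch

h = w = 384
image = torch.rand(1, 3, h, w)
pos_clicks = torch.zeros(1, 1, h, w)   # rasterized positive clicks (e.g., disks)
neg_clicks = torch.zeros(1, 1, h, w)   # rasterized negative clicks
prev_mask = torch.zeros(1, 1, h, w)    # prediction from the previous round

x_no_prev   = torch.cat([image, pos_clicks, neg_clicks], dim=1)             # 5 channels (this config)
x_with_prev = torch.cat([image, pos_clicks, neg_clicks, prev_mask], dim=1)  # 6 channels
```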
All entries are NoC@85/90% (lower is better).

| Train Dataset | Model | GrabCut | Berkeley | Pascal VOC | COCO MVal | SBD | DAVIS | DAVIS585 from zero | DAVIS585 from init |
|---|---|---|---|---|---|---|---|---|---|
| SBD | ResNet34 (89.72 MB) | 1.86/2.18 | 1.95/3.27 | 3.61/4.51 | 4.13/5.88 | 5.18/7.89 | 5.00/6.89 | 6.68/9.59 | 5.04/7.06 |
| COCO+LVIS | ResNet34 (89.72 MB) | 1.40/1.52 | 1.47/2.06 | 2.74/3.30 | 2.51/3.88 | 4.30/7.04 | 4.27/5.56 | 4.86/7.37 | 4.21/5.92 |
CONFIG

- S1 version: coarse segmentator input size 128 x 128; refiner input size 256 x 256 (a sketch of this two-stage layout follows this list).
- S2 version: coarse segmentator input size 256 x 256; refiner input size 256 x 256.
- Previous Mask: Yes
- Iterative Training: Yes
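A minimal, self-contained sketch of this coarse-then-refine layout (dummy convolutions stand in for the real networks, and the crop selection is simplified for illustration):

```python
# A minimal sketch (dummy convolutions stand in for the real networks, and the
# crop logic is simplified) of the S1 layout: segment coarsely at 128x128,
# then refine a 256x256 crop around the predicted region.
import torch
import torch.nn as nn
import torch.nn.functional as F

coarse_size, refine_size = 128, 256          # S1; S2 uses 256 for both stages

coarse_net = nn.Conv2d(3, 1, 3, padding=1)   # stand-in for the coarse segmentator
refiner = nn.Conv2d(3, 1, 3, padding=1)      # stand-in for the refiner

image = torch.rand(1, 3, 512, 512)

# Stage 1: coarse prediction on a downsampled input, upsampled back to full size.
low = F.interpolate(image, (coarse_size, coarse_size), mode="bilinear", align_corners=False)
coarse = F.interpolate(coarse_net(low), image.shape[-2:], mode="bilinear", align_corners=False)

# Stage 2: take a box around the predicted region and refine it at refine_size.
ys, xs = torch.where(coarse[0, 0] > 0)
if len(ys) > 0:
    y0, y1 = int(ys.min()), int(ys.max()) + 1
    x0, x1 = int(xs.min()), int(xs.max()) + 1
    crop = F.interpolate(image[:, :, y0:y1, x0:x1], (refine_size, refine_size),
                         mode="bilinear", align_corners=False)
    refined = refiner(crop)                  # boundary details recovered locally
```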
All entries are NoC@85/90% (lower is better).

| Train Dataset | Model | GrabCut | Berkeley | Pascal VOC | COCO MVal | SBD | DAVIS | DAVIS585 from zero | DAVIS585 from init |
|---|---|---|---|---|---|---|---|---|---|
| COCO+LVIS | HRNet18s-S1 (16.58 MB) | 1.64/1.88 | 1.84/2.89 | 3.24/3.91 | 2.89/4.00 | 4.74/7.29 | 4.77/6.56 | 5.62/8.08 | 2.72/3.82 |
| COCO+LVIS | HRNet18s-S2 (16.58 MB) | 1.48/1.62 | 1.60/2.23 | 2.93/3.46 | 2.61/3.59 | 4.43/6.79 | 3.90/5.23 | 4.87/6.87 | 2.47/3.30 |
| COCO+LVIS | HRNet32-S2 (119.11 MB) | 1.64/1.80 | 1.70/2.36 | 2.80/3.35 | 2.62/3.65 | 4.24/6.61 | 4.01/5.39 | 4.77/6.84 | 2.32/3.09 |
| Combined Datasets | HRNet32-S2 (119.11 MB) | 1.30/1.34 | 1.49/1.85 | 2.84/3.38 | 2.80/3.85 | 4.35/6.61 | 3.19/4.81 | 4.80/6.63 | 2.37/3.26 |
| COCO+LVIS | SegFormerB0-S1 (14.38 MB) | 1.60/1.86 | 2.05/3.29 | 3.54/4.22 | 3.08/4.21 | 4.98/7.60 | 5.13/7.42 | 6.21/9.06 | 2.63/3.69 |
| COCO+LVIS | SegFormerB0-S2 (14.38 MB) | 1.40/1.66 | 1.59/2.27 | 2.97/3.52 | 2.65/3.59 | 4.56/6.86 | 4.04/5.49 | 5.01/7.22 | 2.21/3.08 |
| COCO+LVIS | SegFormerB3-S2 (174.56 MB) | 1.44/1.50 | 1.55/1.92 | 2.46/2.88 | 2.32/3.12 | 3.53/5.59 | 3.61/4.90 | 4.06/5.89 | 2.00/2.76 |
| Combined Datasets | SegFormerB3-S2 (174.56 MB) | 1.22/1.26 | 1.35/1.48 | 2.54/2.96 | 2.51/3.33 | 3.70/5.84 | 2.92/4.52 | 3.98/5.75 | 1.98/2.72 |
Efficient Baselines using MobileNets and PPLCNets
CONFIG

- Input Size: 384 x 384
- Previous Mask: Yes
- Iterative Training: Yes
All entries are NoC@85/90% (lower is better).

| Train Dataset | Model | GrabCut | Berkeley | Pascal VOC | COCO MVal | SBD | DAVIS | DAVIS585 from zero | DAVIS585 from init |
|---|---|---|---|---|---|---|---|---|---|
| COCO+LVIS | MobileNetV2 (7.5 MB) | 1.82/2.02 | 1.95/2.69 | 2.97/3.61 | 2.74/3.73 | 4.44/6.75 | 3.65/5.81 | 5.25/7.28 | 2.15/3.04 |
| COCO+LVIS | PPLCNet (11.92 MB) | 1.74/1.92 | 1.96/2.66 | 2.95/3.51 | 2.72/3.75 | 4.41/6.66 | 4.40/5.78 | 5.11/7.28 | 2.03/2.90 |
| Combined Datasets | MobileNetV2 (7.5 MB) | 1.50/1.62 | 1.62/2.25 | 3.00/3.61 | 2.80/3.96 | 4.66/7.05 | 3.59/5.24 | 5.05/7.12 | 2.06/2.97 |
| Combined Datasets | PPLCNet (11.92 MB) | 1.46/1.66 | 1.63/1.99 | 2.88/3.44 | 2.75/3.89 | 4.44/6.74 | 3.65/5.34 | 5.02/6.98 | 1.96/2.81 |
The code is released under the MIT License, a short, permissive software license: you can do whatever you want with the code as long as you include the original copyright and license notice in any copy of the software/source.
The core framework of this codebase follows: https://github.com/saic-vul/ritm_interactive_segmentation
Some code and pretrained weights are borrowed from:
https://github.com/Tramac/Lightweight-Segmentation
https://github.com/facebookresearch/video-nonlocal-net
https://github.com/visinf/1-stage-wseg
https://github.com/frotms/PP-LCNet-Pytorch
We thank these authors for their great work.
If you find this work useful for your research, please cite our papers:
@inproceedings{cdnet,
title={Conditional Diffusion for Interactive Segmentation},
author={Chen, Xi and Zhao, Zhiyan and Yu, Feiwu and Zhang, Yilei and Duan, Manni},
booktitle={ICCV},
year={2021}
}
@inproceedings{focalclick,
title={FocalClick: Towards Practical Interactive Image Segmentation},
author={Chen, Xi and Zhao, Zhiyan and Zhang, Yilei and Duan, Manni and Qi, Donglian and Zhao, Hengshuang},
booktitle={CVPR},
year={2022}
}