[Enhance] Mask2Former Instance Segm Only #7571
Codecov Report
@@ Coverage Diff @@
## dev #7571 +/- ##
==========================================
+ Coverage 64.51% 64.85% +0.34%
==========================================
Files 360 351 -9
Lines 29233 28491 -742
Branches 4954 4817 -137
==========================================
- Hits 18859 18478 -381
+ Misses 9370 9038 -332
+ Partials 1004 975 -29
configs/mask2former/mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco_ins.py
Do you have the computing resources to train Mask2Former for instance segmentation?
Only on a 4x GPU machine with limited memory (batch size 1). A Mask2Former Swin-T model was trained for 50e, which achieved: Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.432. The data pipeline included extra data augmentation and was missing: The original Facebook implementation achieved AP 45.0.
Okay, I will help train these two models in the next one or two weeks.
Please fix the lint problems.
I ran mask2former_r50 twice; the results are 43.1 and 43.2 (target 43.7). I ran mask2former_swin_tiny once; the result is 44.7 (target 45.0). I suspect there may be differences between our code and the original code, such as in data loading and processing for training. We should check them.
Found a small difference in our filter empty annotations code:
The original implementation uses 1e-5 for min height and width. 1e-5 is the default argument of the function from Detectron2. filter_empty_instances is called by the Mask2Former code here without changing the default arguments. Sorry I missed that. I'll build the configs with the training commands for both Facebook Mask2Former and this implementation to look for other differences.
They left a comment suggesting the filter_empty_instances call happen after augmentation. I think the augmentations can create empty instances, especially considering how extreme the ratio_range=(0.1, 2.0) resize is. Our FilterAnnotations is placed after the resize/random crop like theirs. Detectron2's filter_empty_instances function also filters empty masks. MMDet's FilterAnnotations does not consider empty masks when filtering.
I think 1e-2 and 1e-5 are almost the same for filtering bboxes in an image of size 300~600.
In mmdet, masks are also filtered out, see
In det2, they filter out empty masks, i.e. masks with no true values. A mask has no true values if its bbox has a width or height less than 1e-2. So I think these two filter ops are almost the same.
Added a mask area test. Det2 includes instances if they meet either the bounding box size test or mask area test.
That makes sense. With image_size=(1024, 1024) and ratio_range=(.1, 2) the images can have a side as small as 103 before padding. The Det2 inclusion of masks with positive area is more permissive, but only for instances that have a mask with a positive area of 1 and a bbox below the 1e-5 threshold. I am wondering if Det2 included the extra mask test because something unexpected happens with the mask interpolation at small image scales, so that there are instances which meet the mask threshold but not the bbox threshold.
mmdet/datasets/pipelines/loading.py
    keep += (w > self.min_gt_bbox_wh[0]) & (h > self.min_gt_bbox_wh[1])
if self.by_mask:
    gt_masks = results['gt_masks']
    keep += gt_masks.areas >= self.min_gt_mask_area
After this line https://github.com/facebookresearch/Mask2Former/blob/c233619f7ea011cd565174a1b211bde1f43e38db/mask2former/data/dataset_mappers/coco_instance_new_baseline_dataset_mapper.py#L177 , an empty mask gets the bbox (0,0,0,0). If there is one true value in the mask, the width and height will be one rather than a value less than one. So 1e-2 and 1e-5 have the same effect when filtering empty instances, and filtering by bbox, by mask, or by both are equivalent. After that line, the bbox widths and heights are all integers, not floats like 10.2.
That makes sense. There is no possibility of a mask having one true value while its box is below the threshold, since both implementations recompute boxes from the instance masks.
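A minimal sketch of the argument above, in plain Python rather than either library's API: if boxes are recomputed from binary masks, their widths and heights are whole numbers, so any threshold strictly between 0 and 1 (1e-2 or 1e-5) keeps the same instances. The helper names `mask_to_bbox_wh` and `keep_by_bbox` are hypothetical, and the +1 box convention is an assumption based on the description in this thread (one true pixel gives width and height 1).

```python
def mask_to_bbox_wh(mask):
    """Width and height of the tight box around the True pixels of a 2D mask.

    Returns (0, 0) for an empty mask, mirroring the (0, 0, 0, 0) box
    described in the thread.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    if not rows:
        return 0, 0
    return cols[-1] - cols[0] + 1, rows[-1] - rows[0] + 1


def keep_by_bbox(masks, min_wh):
    """Keep an instance when both box sides exceed min_wh."""
    return [all(side > min_wh for side in mask_to_bbox_wh(m)) for m in masks]


empty = [[False, False], [False, False]]
one_pixel = [[False, True], [False, False]]
block = [[True, True], [True, True]]
masks = [empty, one_pixel, block]

# Integer box sizes mean both thresholds select the same instances.
keep_loose = keep_by_bbox(masks, 1e-5)
keep_strict = keep_by_bbox(masks, 1e-2)
```

Under these assumptions only the empty mask is dropped, whichever of the two thresholds is used.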
I ran the latest version and got 43.2 for mask2former_r50.
Please add a unit test.
Would it be better to use keep = keep & (gt_masks.areas >= self.min_gt_mask_area)
to align with det2?
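To make the difference between the two ways of combining the tests concrete, here is a small sketch in plain Python. It assumes `keep` holds one boolean per instance; if `keep` is a NumPy boolean array, `+=` accumulates as a logical OR, whereas the suggested `keep & (...)` is a logical AND. The `combine` helper and the sample flags are hypothetical and are not part of FilterAnnotations.

```python
def combine(tests, mode):
    """Combine per-instance boolean tests element-wise with OR or AND."""
    reducer = any if mode == "or" else all
    return [reducer(flags) for flags in zip(*tests)]


# One flag per instance: does it pass the bbox size test / the mask area test?
bbox_ok = [True, True, False, False]
mask_ok = [True, False, True, False]

keep_or = combine([bbox_ok, mask_ok], "or")    # instance kept if either test passes
keep_and = combine([bbox_ok, mask_ok], "and")  # instance kept only if both tests pass
```

The two modes differ only for instances that pass exactly one of the tests, which is the case the earlier comments argue cannot occur once boxes are recomputed from masks.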
mask_former_instance_dataset_mapper.py uses 128 as the image pad value.
The Coco instance seg configs point at coco_instance_new_baseline_dataset_mapper.py instead. They create a padding mask, but I need to verify the actual value used to pad the image.
Checked the padding values in the images at the resnet backbone's forward method for each implementation. Just looked for repeated values.
print(img[0, :, 1023, 1023])
Det2: [0.0741, 0.2052, 0.4265]
Mmdet: [0., 0., 0.]
Added a config value to set the pad values to (128, 128, 128) like Det2.
Reversed the Norm > Pad order to Pad > Norm. Getting the same padding values: [0.0741, 0.2052, 0.4265]
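As a quick sanity check of the values above, the following sketch normalizes a pad pixel of (128, 128, 128). The mean and std are the commonly used ImageNet RGB statistics and are an assumption here; they are not stated in this thread. The `normalize` helper is hypothetical.

```python
# Assumed ImageNet RGB statistics (not quoted in the thread).
MEAN = (123.675, 116.28, 103.53)
STD = (58.395, 57.12, 57.375)


def normalize(pixel):
    """Per-channel (value - mean) / std."""
    return tuple((v - m) / s for v, m, s in zip(pixel, MEAN, STD))


# Pad > Norm: the pad value 128 goes through normalization.
pad_then_norm = [round(v, 4) for v in normalize((128, 128, 128))]

# Norm > Pad: zeros are appended after normalization and stay zero.
norm_then_pad = [0.0, 0.0, 0.0]

print(pad_then_norm)  # [0.0741, 0.2052, 0.4265]
```

With these assumed statistics the result matches the Det2 values printed earlier, which is consistent with padding by 128 before normalization.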
mask_former_instance_dataset_mapper.py uses 128 as the image pad value.
The Coco instance seg configs point at coco_instance_new_baseline_dataset_mapper.py instead. They create a padding mask, but I need to verify the actual value used to pad the image.
Mask2Former actually used COCOInstanceNewBaselineDatasetMapper, not MaskFormerInstanceDatasetMapper
Yes, I first noticed the difference looking at MaskFormerInstanceDatasetMapper, but compared the implementation to Mmdet using COCOInstanceNewBaselineDatasetMapper.
There was a difference in padding between our implementation and theirs.
mmdet/datasets/pipelines/loading.py
for key in keys:
    if key in results:
        results[key] = results[key][keep]
if not tests[0].any():
tests[0].any()
-> keep.any()
Please resolve the conflict. I will run the latest version in the next one or two days.
One piece of good news and one piece of bad news: mask2former_swin_tiny reached the target (45.0 mask AP), but mask2former_r50 got 43.0 mask AP and failed to reach the target (43.7 mask AP). I think decaying the LR a little earlier may be helpful. Besides, could you please add other config files (e.g. r101, swin-s) so we can verify the other models' performance?
Interesting that swin-t achieved the target, but resnet50 did a bit worse than in previous tests. The different scores may be due to noise or to the padding value alignment. Since resnet50 is not hitting the target, I looked at the weights used, the architecture and the frozen batch normalization settings. They use Torchvision resnet50 weights, converted to the Det2 naming scheme. They link to resnet50-19c8e357.pth, and the current Torchvision version installed with Torch 1.10 gives a file called resnet50-0676ba61.pth. Confirmed the keys and weights are identical in both versions. I couldn't find any architecture differences. I'm still looking at how they freeze batch norm, but it appears identical to Mmdet. I see additional configs for panoptic in the latest dev branch. I'll replicate those for instance seg tomorrow. Training another swin and a resnet might isolate the issue to something resnet related, if that is the case.
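The key and weight check mentioned above could be sketched roughly as follows. This is not the script that was actually used: `compare_state_dicts` is a hypothetical helper, and plain lists stand in for tensors to keep the example dependency free. With real checkpoints the two dictionaries would typically come from `torch.load` and the values would be compared with `torch.equal`.

```python
def compare_state_dicts(a, b, tol=0.0):
    """Compare two name -> values mappings.

    Returns (only_in_a, only_in_b, mismatched), each a sorted list of keys.
    """
    only_in_a = sorted(set(a) - set(b))
    only_in_b = sorted(set(b) - set(a))
    mismatched = []
    for key in sorted(set(a) & set(b)):
        va, vb = a[key], b[key]
        same_len = len(va) == len(vb)
        if not same_len or any(abs(x - y) > tol for x, y in zip(va, vb)):
            mismatched.append(key)
    return only_in_a, only_in_b, mismatched


# Toy stand-ins for two downloads of the same checkpoint (made-up values).
old_file = {"conv1.weight": [0.1, -0.2], "bn1.bias": [0.0, 0.5]}
new_file = {"conv1.weight": [0.1, -0.2], "bn1.bias": [0.0, 0.5]}

result = compare_state_dicts(old_file, new_file)
identical = result == ([], [], [])
```

An empty result for all three lists corresponds to the "keys and weights are identical" outcome reported in the comment.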
@PeterVennerstrom the unit test failed for Mask2Former instance segmentation because we removed the config file for instance segmentation. Please update it, and sorry about that.
…_50e_coco-panoptic.py
…_r101_lsj_8x2_50e_coco-panoptic.py
…o.py to mask2former_swin-b-p4-w12-384-in21k_lsj_8x2_50e_coco-panoptic.py
…o mask2former_swin-b-p4-w12-384_lsj_8x2_50e_coco-panoptic.py
…oco.py to mask2former_swin-l-p4-w12-384-in21k_lsj_16x1_100e_coco-panoptic.py
… mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco-panoptic.py
… mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco-panoptic.py
* Mask2Former/MaskFormer instance only training/eval
* obsolete config names
* if cond is None fix
* white space
* fix tests
* yapf formatting fix
* semantic_seg None docstring
* original config names
* pan/ins unit test
* show_result comment
* pan/ins head unit test
* redundant test
* inherit configs
* correct gpu #
* revert version
* BaseDetector.show_result comment
* revert more versions
* clarify comment
* clarify comment
* add FilterAnnotations to data pipeline
* more complete Returns docstring
* use pytest.mark.parametrize decorator
* fix docstring formatting
* lint
* Include instances passing mask area test
* Make FilterAnnotations generic for masks or bboxes
* Duplicate assertion
* Add pad config
* Less hard coded padding setting
* Clarify test arguments
* Additional inst_seg configs
* delete configs
* Include original dev branch configs
* Fix indent
* fix lint error from merge conflict
* Update .pre-commit-config.yaml
* Rename mask2former_r50_lsj_8x2_50e_coco.py to mask2former_r50_lsj_8x2_50e_coco-panoptic.py
* Update and rename mask2former_r101_lsj_8x2_50e_coco.py to mask2former_r101_lsj_8x2_50e_coco-panoptic.py
* Update and rename mask2former_swin-b-p4-w12-384-in21k_lsj_8x2_50e_coco.py to mask2former_swin-b-p4-w12-384-in21k_lsj_8x2_50e_coco-panoptic.py
* Update and rename mask2former_swin-b-p4-w12-384_lsj_8x2_50e_coco.py to mask2former_swin-b-p4-w12-384_lsj_8x2_50e_coco-panoptic.py
* Update and rename mask2former_swin-l-p4-w12-384-in21k_lsj_16x1_100e_coco.py to mask2former_swin-l-p4-w12-384-in21k_lsj_16x1_100e_coco-panoptic.py
* Update and rename mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco.py to mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco-panoptic.py
* Update and rename mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco.py to mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco-panoptic.py
* Create mask2former_r50_lsj_8x2_50e_coco.py
* Create mask2former_r101_lsj_8x2_50e_coco.py
* Create mask2former_swin-s-p4-w7-224_lsj_8x2_50e_coco.py
* Create mask2former_swin-t-p4-w7-224_lsj_8x2_50e_coco.py
* Update test_forward.py
* remove gt_sem_seg
Co-authored-by: Cedric Luo <[email protected]>
Motivation
MaskFormer and Mask2Former currently support panoptic segmentation, but not instance-only segmentation. Minor changes enable training and evaluation using _base_/datasets/coco_instance.py.
Modification
detectors/maskformer.py:
dense_heads/maskformer_head.py:
models/utils/panoptic_gt_processing.py:
configs/mask2former/: