Junyi Li1* · Junfeng Wu1* · Weizhi Zhao1 · Song Bai2 · Xiang Bai1†
1Huazhong University of Science and Technology 2Bytedance Inc.
*Equal Contribution †Corresponding Author
- PartGLEE is accepted to ECCV 2024!
- PartGLEE is a part-level foundation model for locating and identifying both objects and parts in images.
- PartGLEE accomplishes detection, segmentation, and grounding of instances at any granularity in open-world scenarios.
- PartGLEE achieves SOTA performance across various part-level tasks and obtains competitive results on traditional object-level tasks.
We will release the following for PartGLEE:

- [ ] Demo Code
- [√] Model Zoo
- [√] Comprehensive User Guide
- [√] Training Code and Scripts
- [√] Evaluation Code and Scripts
- Installation: Please refer to INSTALL.md for more details.
- Data preparation: Please refer to DATA.md for more details.
- Training: Please refer to TRAIN.md for more details.
- Testing: Please refer to TEST.md for more details.
- Model zoo: Please refer to MODEL_ZOO.md for more details.
We present PartGLEE, a part-level foundation model for locating and identifying both objects and parts in images. Through a unified framework, PartGLEE accomplishes detection, segmentation, and grounding of instances at any granularity in open-world scenarios. Specifically, we propose a Q-Former to construct the hierarchical relationship between objects and parts, parsing every object into its corresponding semantic parts.
PartGLEE comprises an image encoder, a Q-Former, two independent decoders, and a text encoder. The Q-Former establishes the hierarchical relationship between objects and parts: a set of parsing queries is initialized in the Q-Former to interact with each object query, parsing objects into their corresponding parts. The Q-Former thus functions as a decomposer, extracting and representing parts from object queries. By training jointly on extensive object-level datasets and a limited number of hierarchical datasets that contain object-part correspondences, the Q-Former acquires a strong generalization ability to parse any novel object into its constituent parts.
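To make the decomposer idea concrete, here is a minimal PyTorch sketch of a Q-Former-style module in which a shared set of learnable parsing queries cross-attends to each object query and emits part-level queries. The class name, layer counts, query count, and the use of standard `nn.TransformerDecoder` layers are illustrative assumptions, not the repository's actual implementation or configuration.

```python
import torch
import torch.nn as nn

class PartQFormer(nn.Module):
    """Sketch of a Q-Former decomposer: shared parsing queries attend to
    each object query independently and produce part-level queries.
    All hyperparameters below are illustrative assumptions."""

    def __init__(self, dim=256, num_parsing_queries=10, num_layers=3, num_heads=8):
        super().__init__()
        # One shared set of parsing queries, reused for every object query.
        self.parsing_queries = nn.Embedding(num_parsing_queries, dim)
        layer = nn.TransformerDecoderLayer(
            d_model=dim, nhead=num_heads,
            dim_feedforward=4 * dim, batch_first=True,
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)

    def forward(self, object_queries):
        # object_queries: (batch, num_objects, dim)
        b, n, d = object_queries.shape
        # Treat each object query as a length-1 memory so the parsing
        # queries decompose every object independently.
        memory = object_queries.reshape(b * n, 1, d)
        tgt = self.parsing_queries.weight.unsqueeze(0).repeat(b * n, 1, 1)
        part_queries = self.decoder(tgt, memory)       # (b*n, P, d)
        return part_queries.reshape(b, n, -1, d)       # (b, n, P, d)

# Usage: parse 100 object queries, each into 10 part queries.
qformer = PartQFormer()
obj_q = torch.randn(2, 100, 256)
part_q = qformer(obj_q)
print(part_q.shape)  # torch.Size([2, 100, 10, 256])
```

The part queries can then be fed to a separate part-level decoder, mirroring how the object queries feed the object-level decoder.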
To facilitate training the Q-Former, we augment the original part-level datasets with object-level annotations to establish hierarchical correspondences between objects and parts. Specifically, we add object-level annotations to Pascal Part, PartImageNet, Pascal-Part-116, and ADE-Part-234. We further take a subset of the open-world instance segmentation dataset SA-1B and augment it into a hierarchical dataset, further improving the generalization capability of our model.
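One simple way to synthesize such object-level annotations is to group part annotations by their parent instance and take the union of the part boxes as the object box. The sketch below illustrates this; the field names (`object_id`, `bbox` as `[x0, y0, x1, y1]`) are hypothetical and do not reflect the repository's actual annotation schema.

```python
from collections import defaultdict

def add_object_annotations(part_annos):
    """Synthesize an object-level box for each instance as the union of
    its part boxes. Field names are illustrative assumptions."""
    groups = defaultdict(list)
    for anno in part_annos:
        groups[anno["object_id"]].append(anno["bbox"])
    objects = []
    for obj_id, boxes in groups.items():
        x0 = min(b[0] for b in boxes)
        y0 = min(b[1] for b in boxes)
        x1 = max(b[2] for b in boxes)
        y1 = max(b[3] for b in boxes)
        objects.append({"object_id": obj_id, "bbox": [x0, y0, x1, y1]})
    return objects

parts = [
    {"object_id": 0, "bbox": [10, 10, 50, 40], "category": "dog head"},
    {"object_id": 0, "bbox": [20, 35, 90, 120], "category": "dog body"},
]
print(add_object_annotations(parts))
# [{'object_id': 0, 'bbox': [10, 10, 90, 120]}]
```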
```bibtex
@article{li2024partglee,
  title={PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects},
  author={Li, Junyi and Wu, Junfeng and Zhao, Weizhi and Bai, Song and Bai, Xiang},
  journal={arXiv preprint arXiv:2407.16696},
  year={2024}
}
```