Official code repository for the paper:
Pose for Everything: Towards Category-Agnostic Pose Estimation
Lumin Xu*, Sheng Jin*, Wang Zeng, Wentao Liu, Chen Qian, Wanli Ouyang, Ping Luo, and Xiaogang Wang
Existing works on 2D pose estimation mainly focus on a certain category, e.g., human, animal, and vehicle. However, many application scenarios require detecting the poses/keypoints of unseen classes of objects. In this paper, we introduce the task of Category-Agnostic Pose Estimation (CAPE), which aims to create a pose estimation model capable of detecting the pose of any class of object given only a few samples with keypoint definition. To achieve this goal, we formulate the pose estimation problem as a keypoint matching problem and design a novel CAPE framework, termed POse Matching Network (POMNet). A transformer-based Keypoint Interaction Module (KIM) is proposed to capture both the interactions among different keypoints and the relationship between the support and query images. We also introduce the Multi-category Pose (MP-100) dataset, a 2D pose dataset covering 100 object categories with over 20K instances, which is well designed for developing CAPE algorithms. Experiments show that our method outperforms other baseline approaches by a large margin. Code and data are available at https://github.com/luminxu/Pose-for-Everything.
- Install mmpose.
- Run `python setup.py develop`.
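Putting the steps together, here is a minimal setup sketch. The clone URL follows the repository address above, while the pip lines are assumptions based on the standard mmpose installation and may need versions pinned to your PyTorch/CUDA build:

# Assumed prerequisites: a Python environment with PyTorch already installed.
pip install mmcv-full        # assumption: choose the build matching your PyTorch/CUDA versions
pip install mmpose           # or install mmpose from source, following its official guideline
git clone https://github.com/luminxu/Pose-for-Everything.git
cd Pose-for-Everything
python setup.py develop      # registers this repository's code in the environment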
To train a model, you can follow the guideline of mmpose and launch distributed training with:
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
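For example, to train the 1-shot POMNet model on MP-100 split1 with 8 GPUs (the config path is the same one used in the slurm example below):

./tools/dist_train.sh configs/mp100/pomnet/pomnet_mp100_split1_256x256_1shot.py 8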
If you run this code on a cluster managed with slurm, you can use the script `slurm_train.sh`. (This script also supports single-machine training.)
./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}
Here is an example of using 16 GPUs to train POMNet on the `Test` partition of a slurm cluster.
(Use `GPUS_PER_NODE=8` to specify a single slurm node with 8 GPUs and `CPUS_PER_TASK=2` to use 2 CPUs per task. `Test` is assumed to be a valid `${PARTITION}` name.)
GPUS=16 GPUS_PER_NODE=8 CPUS_PER_TASK=2 ./tools/slurm_train.sh Test pomnet \
configs/mp100/pomnet/pomnet_mp100_split1_256x256_1shot.py \
work_dirs/pomnet_mp100_split1_256x256_1shot
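Evaluation follows the same pattern. This is only a sketch, assuming this repository keeps mmpose's standard `tools/dist_test.sh` script; substitute your trained checkpoint for `${CHECKPOINT_FILE}`:

./tools/dist_test.sh configs/mp100/pomnet/pomnet_mp100_split1_256x256_1shot.py ${CHECKPOINT_FILE} 8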
- The dataset is only for non-commercial research purposes.
- All images of the MP-100 dataset are from existing datasets (COCO, 300W, AFLW, OneHand10K, DeepFashion2, AP-10K, MacaquePose, Vinegar Fly, Desert Locust, CUB-200, CarFusion, AnimalWeb, Keypoint-5), which are not our property. We are not responsible for the content nor the meaning of these images.
- We provide the annotations for training and testing. However, for legal reasons, we do not host the images. Please follow the guidance to prepare the MP-100 dataset.
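After preparing the data, a quick sanity check on an annotation file can catch path mistakes early. This is only a sketch: the JSON path below is a hypothetical placeholder, and it assumes the annotations use the COCO-style layout (images/annotations/categories) that mmpose reads:

# NOTE: hypothetical path; point it at wherever you placed the downloaded annotation file
python -c "import json; d = json.load(open('data/mp100/annotations/mp100_split1_train.json')); print(len(d['categories']), 'categories,', len(d['annotations']), 'annotated instances')"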
If you find this work or the MP-100 dataset useful in your research, please cite:

@inproceedings{xu2022pose,
  title={Pose for Everything: Towards Category-Agnostic Pose Estimation},
  author={Xu, Lumin and Jin, Sheng and Zeng, Wang and Liu, Wentao and Qian, Chen and Ouyang, Wanli and Luo, Ping and Wang, Xiaogang},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2022},
  month={October}
}
Thanks to:
- mmpose (https://github.com/open-mmlab/mmpose)
This project is released under the Apache 2.0 license.