Official implementation of the IEEE/CVF CVPR 2022 paper
Watch It Move: Unsupervised Discovery of 3D Joints for Re-Posing of Articulated Objects
Atsuhiro Noguchi, Umar Iqbal, Jonathan Tremblay, Tatsuya Harada, Orazio Gallo
Project page / Paper
/ Video
Abstract: Rendering articulated objects while controlling their poses is critical to applications such as virtual reality or animation for movies. Manipulating the pose of an object, however, requires the understanding of its underlying structure, that is, its joints and how they interact with each other. Unfortunately, assuming the structure to be known, as existing methods do, precludes the ability to work on new object categories. We propose to learn both the appearance and the structure of previously unseen articulated objects by observing them move from multiple views, with no joints annotation supervision, or information about the structure. We observe that 3D points that are static relative to one another should belong to the same part, and that adjacent parts that move relative to each other must be connected by a joint. To leverage this insight, we model the object parts in 3D as ellipsoids, which allows us to identify joints. We combine this explicit representation with an implicit one that compensates for the approximation introduced. We show that our method works for different structures, from quadrupeds, to single-arm robots, to humans.
- Setup
- Steps to replicate the teaser video for spot
- Steps to train for spot
- The WIM dataset
- Dataset Preprocessing
- Training
- Pretrained Models
- Demo
- Evaluation (ZJU only)
- Visualization
- Citation
Clone this repository and create the environment.
git clone --recursive [email protected]:NVlabs/watch-it-move.git
cd watch-it-move
bash install.sh
# To run the training and rendering examples below, download the data for Spot
mkdir -p data/robots/spot
gdown https://drive.google.com/u/1/uc\?id\=1HNzCa8olJgedpKe6jBCIi-_LffLX9f8R\&export\=download -O data/robots/spot/cache.pickle
We have only tested the following code on NVIDIA A100 GPUs.
# download pretrained model for spot
mkdir -p data/output/result/spot_merge
gdown https://drive.google.com/u/1/uc\?id\=12_K-x-daAGqvIoDd3tRvRIe0LKLOALyC\&export\=download -O data/output/result/spot_merge/snapshot_latest.pth
cd <project_root>/src
# save the reconstruction video
python visualize/create_reconstruction_video.py --exp_name spot_merge
# save re-pose video
python visualize/create_repose_video.py --exp_name spot_merge --repose_config visualize/repose_configs/spot.yml --rotate
Videos in mp4 format and the png image for each frame will be saved to <project_root>/data/output/result/spot_merge/
cd <project_root>/src
CUDA_VISIBLE_DEVICES=[gpu_id] python train_single_video.py --config confs/spot.yml --default_config confs/default.yml
CUDA_VISIBLE_DEVICES=[gpu_id] python train_single_video.py --config confs/spot_merge.yml --default_config confs/default.yml
We provide multiview videos for seven different moving robots here (see LICENSE.md for terms of use). We provide both raw video data and preprocessed data. Please follow the instructions bellow to download and preprocess the data. It includes: 1000 frame videos of moving robots from 20 different viewpoints and preprocessed data of 300 frames of 5 chosen viewpoints.
- The WIM dataset is available here.
- We provide preprocessed data in the directory named preprocessed. Download, uncompress, and place it in
<project_root>data/robots/<name_of_robot>/cache.pickle
- If you run with pre-processing on your own, download tar.gz files from here, uncompress them, place them as
and run
<project_root>data/robots/<name_of_robot>/cam_<camera_id>.json <project_root>data/robots/<name_of_robot>/frame_<frame_id>_cam_<camera_id>.png
cd <project_root>/data_preprocess/robot python preprocess.py --data_root ../../data/robots --robot_name atlas --robot_name baxter --robot_name spot --robot_name cassie --robot_name iiwa --robot_name nao --robot_name pandas
-
Download the COCO instance segmentation model named R_101_dcni3_5x from Adet and copy it to
data_preprocess/zju/R_101_dcni3_5x.pth
. -
Download the ZJU MOCAP LightStage dataset and copy it in
<project_root>/data/ └── zju_mocap ├── 366 ├── 377 ├── 381 ├── 384 └── 387
-
Download the SMPL models following EasyMocap installation. You only need to download smplx models.
<project_root>/data └── smplx ├── J_regressor_body25.npy ├── J_regressor_body25_smplh.txt ├── J_regressor_body25_smplx.txt ├── J_regressor_mano_LEFT.txt ├── J_regressor_mano_RIGHT.txt └── smplx ├── SMPLX_FEMALE.pkl ├── SMPLX_MALE.pkl └── SMPLX_NEUTRAL.pkl
-
Run
cd <project_root>/data_preprocess/zju python preprocess.py --smpl_model_path ../../data/smplx --zju_path ../../data/zju_mocap --person_id 366 --person_id 377 --person_id 381 --person_id 384 --person_id 387
- Requirement (installed by
install.sh
): mask2former - Download
Mask2Former (200 queries)
model from https://github.com/facebookresearch/Mask2Former/blob/main/MODEL_ZOO.md#instance-segmentation and copy it to<project_root>/data_preprocess/rgbd_dog/model_final_e5f453.pth
. - Download RGBD-Dog dataset as
We used
<project_root>/data/rgbd_dog └── dog1 └── motion_testSeq ├── kinect_depth ├── kinect_rgb ├── motion_capture └── sony
motion_testSeq
for training. - Run
cd <project_root>/data_preprocess/rgbd_dog python preprocess.py --data_root ../../data/rgbd_dog/dog1/motion_testSeq
Run the following commands to train the model. Please specify the experiment name in [exp_name]
cd <project_root>/src
CUDA_VISIBLE_DEVICES=[gpu_id] python train_single_video.py --config confs/[exp_name].yml --default_config confs/default.yml
Pretrained models for ZJU mocap dataset, robot dataset, and dog dataset are available here.
The name of each directory corresponds to the name of a config fine under src/confs
.
Please download and place these directories in data/output/result
.
<project_root>/data/output/result
├── atlas
│ └── snapshot_latest.pth
├── baxter
...
Visualization code is available in <project_root>/src/visualize/demo_notebook.ipynb
Calculate lpips and ssim between generated and ground truth images.
cd <project_root>/src
python validation/reconsuruction.py --exp_name zju366 --exp_name zju377
python validation/lpips_ssim.py --exp_name zju366 --exp_name zju377
Results will be saved to [output_dir]/result/[exp_name]/validation
Calculate MPJPE (mm) between ground truth and regressed joint locations.
cd <project_root>/src
python validation/SMPL_regression.py --exp_name zju366 --exp_name zju377
Results will be saved to <project_root>/data/output/result/[exp_name]/reconstruction_...
.
cd <project_root>/src
python visualize/create_reconstruction_video.py --exp_name zju366 --exp_name zju377
Results will be saved to <project_root>/data/output/result/[exp_name]/repose_...
.
cd <project_root>/src
python visualize/create_repose_video.py --exp_name spot --repose_config visualize/repose_configs/spot.yml --rotate
repose_config (e.g., <project_root>/src/visualize/repose_configs/spot.yml
) includes the following parameters:
camera_id: camera id of the reference frame
frame_id: frame id of the reference frame
root: part id of the root.
first: part id and its rotation in rodrigues form for the first quarter of the video.
second: part id and its rotation in rodrigues form for the next quarter of the video.
root
, first
, and second
vary depending on the training results, even when trained on the same data.
For your pretrained models, please follow <project_root>/src/visualize/demo_notebook.ipynb
to adapt them.
Images of merged structure will be saved to <project_root>/data/output/result/[exp_name]/merge
.
cd <project_root>/src
python visualize/part_merging.py --exp_name spot --camera_id 0
Re-posing by test frames.
cd <project_root>/src
python visualize/repose_person_by_driving_pose.py --exp_name zju366 --camera_id 0 --num_video_frames 50
Results will be saved to <project_root>/data/output/result/[exp_name]/drive_..
.
@inproceedings{noguchi2022watch,
title = {Watch It Move: {U}nsupervised Discovery of {3D} Joints for Re-Posing of Articulated Objects},
author = {Atsuhiro Noguchi and Umar Iqbal and Jonathan Tremblay and Tatsuya Harada and Orazio Gallo},
journal = {CVPR},
year = {2022},
}