This repository contains code, data, and a benchmark leaderboard from the paper *Benchmarking Unsupervised Object Representations for Video Sequences* by M.A. Weis, K. Chitta, Y. Sharma, W. Brendel, M. Bethge, A. Geiger and A.S. Ecker (2021).
Code for training OP3, TBA and SCALOR was adapted from the original OP3, TBA and SCALOR codebases.
To install the package, run:
python3 setup.py install
Download the data from OSF to ocrb/data/datasets.
Available datasets:
- Video Multi-dSprites (VMDS)
- Sprites-MOT (SpMOT)
- Video Object Room (VOR)
- Textured Video Multi-dSprites (texVMDS)
Extract data from hdf5 files:
python3 ocrb/data/extract_data.py --path='ocrb/data/datasets/' --dataset='vmds'
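If you want to check the download before extracting, a quick option is to list the groups and arrays inside one of the hdf5 files. The following is a minimal sketch (not part of the repository); it assumes nothing about the internal layout, and the file name is only an example.

```python
# Minimal sketch: list every group/dataset inside a downloaded hdf5 file.
# The file name below is only an example; adjust it to the file you downloaded.
import h5py

def print_h5_structure(path):
    """Recursively print each dataset in the file with its shape and dtype."""
    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
        else:
            print(f"{name}/")
    with h5py.File(path, "r") as f:
        f.visititems(visit)

print_h5_structure("ocrb/data/datasets/vmds_test.h5")  # example path
```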
To run ViMON training:
python3 ocrb/vimon/main.py --config='ocrb/vimon/config.json'
where hyperparameters are specified in the config file.
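Since the config is a plain JSON file, one convenient way to try variations (e.g. a different dataset) is to load it, change a few entries in Python and write a new config that is then passed via --config. This is just a sketch; the keys "dataset" and "batch_size" below are hypothetical placeholders, so use the names that actually appear in ocrb/vimon/config.json.

```python
# Minimal sketch: derive a modified ViMON config from the provided one.
# NOTE: "dataset" and "batch_size" are hypothetical placeholder keys;
# check ocrb/vimon/config.json for the actual hyperparameter names.
import json

with open("ocrb/vimon/config.json") as f:
    config = json.load(f)

config["dataset"] = "spmot"   # hypothetical key
config["batch_size"] = 32     # hypothetical key

with open("ocrb/vimon/config_spmot.json", "w") as f:
    json.dump(config, f, indent=2)
```

Training is then started with python3 ocrb/vimon/main.py --config='ocrb/vimon/config_spmot.json'.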
To run OP3 training:
python3 ocrb/op3/main.py --va vmds
where the --va flag selects the dataset and can be set to vmds, vor or spmot. Hyperparameters for each dataset can be found in the corresponding file. For details, see the original OP3 repository.
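To train OP3 on all three datasets back to back, one simple option is to launch the runs sequentially, for example from Python. This is only a convenience sketch wrapping the command above, not part of the repository.

```python
# Minimal sketch: run OP3 training for each dataset variant in sequence.
import subprocess

for variant in ["vmds", "vor", "spmot"]:
    print(f"Training OP3 on {variant} ...")
    subprocess.run(
        ["python3", "ocrb/op3/main.py", "--va", variant],
        check=True,  # abort if a run exits with a non-zero status
    )
```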
For TBA training, the input datasets need to be pre-processed into batches, for which we provide a script:
python3 ocrb/tba/data/create_batches.py --batch_size=64 --dataset='vmds' --mode='train'
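If you need batches for several datasets or splits, the script can be called in a loop, as in the sketch below. Only --mode='train' is shown above; whether other modes (e.g. 'test') are supported is an assumption you should verify in ocrb/tba/data/create_batches.py.

```python
# Minimal sketch: create TBA input batches for each dataset and split.
# ASSUMPTION: modes other than 'train' (e.g. 'test') may or may not be
# supported; check ocrb/tba/data/create_batches.py before relying on them.
import subprocess

for dataset in ["vmds", "spmot", "vor"]:
    for mode in ["train", "test"]:
        subprocess.run(
            [
                "python3", "ocrb/tba/data/create_batches.py",
                "--batch_size=64",
                f"--dataset={dataset}",
                f"--mode={mode}",
            ],
            check=True,
        )
```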
To run TBA training:
python3 ocrb/tba/run.py --task vmds
where the --task flag can be set to vmds, spmot or vor. For details regarding other training flags, see the original TBA repository.
To generate the annotation file with ViMON's mask and object ID predictions per frame for each video in the test set, run:
python3 ocrb/vimon/generate_pred_json.py --config='ocrb/vimon/config.json' --ckpt_file='ocrb/vimon/ckpts/pretrained/ckpt_vimon_vmds.pt' --out_path='ocrb/vimon/ckpts/pretrained/vmds_pred_list.json'
where hyperparameters, including the dataset, are specified in the ocrb/vimon/config.json file and --ckpt_file gives the path to the trained model weights.
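Before running the MOT evaluation it can be useful to sanity-check the generated file. The sketch below only assumes that the output is valid JSON and prints its top-level structure; the exact schema is defined by generate_pred_json.py.

```python
# Minimal sketch: sanity-check a generated prediction file.
# Only assumes valid JSON; the exact schema is defined by
# ocrb/vimon/generate_pred_json.py.
import json

with open("ocrb/vimon/ckpts/pretrained/vmds_pred_list.json") as f:
    preds = json.load(f)

if isinstance(preds, list) and preds:
    print(f"{len(preds)} entries")
    first = preds[0]
    print("first entry:", list(first.keys()) if isinstance(first, dict) else type(first))
elif isinstance(preds, dict):
    print(f"{len(preds)} top-level keys:", list(preds)[:5])
```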
To generate the corresponding annotation file with OP3's predictions, run:
python3 ocrb/op3/generate_pred_json.py --va vmds --ckpt_file='ocrb/op3/ckpts/vmds_params.pkl' --out_path='ocrb/op3/ckpts/vmds_pred_list.json'
where hyperparameters can be found in the corresponding file and --ckpt_file gives the path to the trained model weights. For details, see the original OP3 repository.
To generate the annotation file for TBA, run:
python3 ocrb/tba/run.py --task vmds --metric 1 --v 2 --init_model sp_latest.pt
The annotation file is generated in the folder ocrb/tba/pic. For details regarding other evaluation flags, see the original TBA repository.
To compute MOT metrics, run:
python3 ocrb/eval/eval_mot.py --gt_file='ocrb/data/gt_jsons/vmds_test.json' --pred_file='ocrb/vimon/ckpts/pretrained/vmds_pred_list.json' --results_path='ocrb/vimon/ckpts/pretrained/vmds_results.json' --exclude_bg
where --gt_file specifies the path to the ground truth annotation file, --pred_file specifies the path to the annotation file containing the model predictions, and --results_path gives the path where the result dictionary is saved. Set --exclude_bg to exclude background segmentation masks from the evaluation.
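For reference, the headline MOTA score reported by the evaluation follows the standard multi-object tracking accuracy definition of Bernardin & Stiefelhagen (2008): one minus the ratio of the total number of misses, false positives and identity switches to the total number of ground-truth objects. The sketch below only illustrates that formula; it is not the repository's implementation (see ocrb/eval/eval_mot.py for that).

```python
# Minimal sketch of the standard MOTA definition (Bernardin & Stiefelhagen, 2008).
# Illustrative only; the benchmark's implementation lives in ocrb/eval/eval_mot.py.
def mota(num_misses, num_false_positives, num_id_switches, num_gt_objects):
    """MOTA = 1 - (misses + false positives + ID switches) / total ground-truth objects."""
    return 1.0 - (num_misses + num_false_positives + num_id_switches) / num_gt_objects

# Example with made-up counts: 50 misses, 30 false positives and 5 ID switches
# over 1000 ground-truth object instances.
print(mota(50, 30, 5, 1000))  # -> 0.915, i.e. a MOTA of 91.5%
```

Because the error terms are not bounded by the number of ground-truth objects, MOTA can become negative when a model produces more errors than there are objects to track, as happens on texVMDS in the leaderboard below.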
Analysis of state-of-the-art object-centric representation learning models for multi-object tracking (MOT). Results are shown as mean ± standard deviation over three runs with different random training seeds. Models are ranked according to MOTA on each dataset. If you want to add your own method and results on any of the datasets, please open a pull request that adds your results to the corresponding table below.
SpMOT:

Rank | Model | Reference | MOTA ↑ | MOTP ↑ | MD ↑ | MT ↑ | Match ↑ | Miss ↓ | ID S. ↓ | FPs ↓ | MSE ↓ |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SCALOR | Jiang et al. 2020 | 94.9 ± 0.5 | 80.2 ± 0.1 | 96.4 ± 0.1 | 93.2 ± 0.7 | 95.9 ± 0.4 | 2.4 ± 0.0 | 1.7 ± 0.4 | 1.0 ± 0.1 | 3.4 ± 0.1 |
2 | ViMON | Weis et al. 2020 | 92.9 ± 0.2 | 91.8 ± 0.2 | 87.7 ± 0.8 | 87.2 ± 0.8 | 95.0 ± 0.2 | 4.8 ± 0.2 | 0.2 ± 0.0 | 2.1 ± 0.1 | 11.1 ± 0.6 |
3 | OP3 | Veerapaneni et al. 2019 | 89.1 ± 5.1 | 78.4 ± 2.4 | 92.4 ± 4.0 | 91.8 ± 3.8 | 95.9 ± 2.2 | 3.7 ± 2.2 | 0.4 ± 0.0 | 6.8 ± 2.9 | 13.3 ± 11.9 |
4 | TBA | He et al. 2019 | 79.7 ± 15.0 | 71.2 ± 0.3 | 83.4 ± 9.7 | 80.0 ± 13.6 | 87.8 ± 9.0 | 9.6 ± 6.0 | 2.6 ± 3.0 | 8.1 ± 6.0 | 11.9 ± 1.9 |
5 | MONet | Burgess et al. 2019 | 70.2 ± 0.8 | 89.6 ± 1.0 | 92.4 ± 0.6 | 50.4 ± 2.4 | 75.3 ± 1.3 | 4.4 ± 0.4 | 20.3 ± 1.6 | 5.1 ± 0.5 | 13.0 ± 2.0 |

VMDS:

Rank | Model | Reference | MOTA ↑ | MOTP ↑ | MD ↑ | MT ↑ | Match ↑ | Miss ↓ | ID S. ↓ | FPs ↓ | MSE ↓ |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | OP3 | Veerapaneni et al. 2019 | 91.7 ± 1.7 | 93.6 ± 0.4 | 96.8 ± 0.5 | 96.3 ± 0.4 | 97.8 ± 0.1 | 2.0 ± 0.1 | 0.2 ± 0.0 | 6.1 ± 1.5 | 4.3 ± 0.2 |
2 | ViMON | Weis et al. 2020 | 86.8 ± 0.3 | 86.8 ± 0.0 | 86.2 ± 0.3 | 85.0 ± 0.3 | 92.3 ± 0.2 | 7.0 ± 0.2 | 0.7 ± 0.0 | 5.5 ± 0.1 | 10.7 ± 0.1 |
3 | SCALOR | Jiang et al. 2020 | 74.1 ± 1.2 | 87.6 ± 0.4 | 67.9 ± 1.1 | 66.7 ± 1.1 | 78.4 ± 1.0 | 20.7 ± 1.0 | 0.8 ± 0.0 | 4.4 ± 0.4 | 14.0 ± 0.1 |
4 | TBA | He et al. 2019 | 54.5 ± 12.1 | 75.0 ± 0.9 | 62.9 ± 5.9 | 58.3 ± 6.1 | 75.9 ± 4.3 | 21.0 ± 4.2 | 3.2 ± 0.3 | 21.4 ± 7.8 | 28.1 ± 2.0 |
5 | MONet | Burgess et al. 2019 | 49.4 ± 3.6 | 78.6 ± 1.8 | 74.2 ± 1.7 | 35.7 ± 0.8 | 66.7 ± 0.7 | 13.6 ± 1.0 | 19.7 ± 0.6 | 17.2 ± 3.1 | 22.2 ± 2.2 |

VOR:

Rank | Model | Reference | MOTA ↑ | MOTP ↑ | MD ↑ | MT ↑ | Match ↑ | Miss ↓ | ID S. ↓ | FPs ↓ | MSE ↓ |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | ViMON | Weis et al. 2020 | 89.0 ± 0.0 | 89.5 ± 0.5 | 90.4 ± 0.5 | 90.0 ± 0.4 | 93.2 ± 0.4 | 6.5 ± 0.4 | 0.3 ± 0.0 | 4.2 ± 0.4 | 6.4 ± 0.6 |
2 | SCALOR | Jiang et al. 2020 | 74.6 ± 0.4 | 86.0 ± 0.2 | 76.0 ± 0.4 | 75.9 ± 0.4 | 77.9 ± 0.4 | 22.1 ± 0.4 | 0.0 ± 0.0 | 3.3 ± 0.2 | 6.4 ± 0.1 |
3 | OP3 | Veerapaneni et al. 2019 | 65.4 ± 0.6 | 89.0 ± 0.6 | 88.0 ± 0.6 | 85.4 ± 0.5 | 90.7 ± 0.3 | 8.2 ± 0.4 | 1.1 ± 0.2 | 25.3 ± 0.6 | 3.0 ± 0.1 |
4 | MONet | Burgess et al. 2019 | 37.0 ± 6.8 | 81.7 ± 0.5 | 76.9 ± 2.2 | 37.3 ± 7.8 | 64.4 ± 5.0 | 15.8 ± 1.6 | 19.8 ± 3.5 | 27.4 ± 2.3 | 12.2 ± 1.4 |

texVMDS:

Rank | Model | Reference | MOTA ↑ | MOTP ↑ | MD ↑ | MT ↑ | Match ↑ | Miss ↓ | ID S. ↓ | FPs ↓ | MSE ↓ |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | MONet | Burgess et al. 2019 | -73.3 ± 5.5 | 67.7 ± 1.1 | 16.0 ± 3.4 | 12.3 ± 3.1 | 24.7 ± 4.7 | 73.1 ± 5.1 | 2.2 ± 0.8 | 98.0 ± 1.7 | 200.5 ± 5.7 |
2 | ViMON | Weis et al. 2020 | -85.5 ± 2.8 | 69.0 ± 0.6 | 24.2 ± 1.3 | 23.8 ± 1.4 | 34.7 ± 1.7 | 65.0 ± 1.7 | 0.3 ± 0.0 | 120.2 ± 2.5 | 171.4 ± 3.3 |
3 | SCALOR | Jiang et al. 2020 | -99.2 ± 11.7 | 74.0 ± 0.5 | 6.5 ± 0.6 | 6.3 ± 0.6 | 12.3 ± 0.4 | 87.5 ± 0.4 | 0.2 ± 0.0 | 111.5 ± 11.4 | 133.7 ± 11.1 |
4 | OP3 | Veerapaneni et al. 2019 | -110.4 ± 4.3 | 70.6 ± 0.6 | 16.5 ± 5.1 | 16.2 ± 5.0 | 22.9 ± 6.6 | 76.9 ± 6.7 | 0.2 ± 0.1 | 133.4 ± 2.9 | 132.8 ± 16.2 |
If you use this repository in your research, please cite:
@article{Weis2021,
  author  = {Marissa A. Weis and Kashyap Chitta and Yash Sharma and Wieland Brendel and Matthias Bethge and Andreas Geiger and Alexander S. Ecker},
  title   = {Benchmarking Unsupervised Object Representations for Video Sequences},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {183},
  pages   = {1-61},
  url     = {http://jmlr.org/papers/v22/21-0199.html}
}