- Linux or macOS with Python ≥ 3.7
- PyTorch ≥ 1.8 and a torchvision version that matches the PyTorch installation. Install them together from pytorch.org to ensure compatibility
- OpenCV is optional but needed for the demo and visualization (a pip-based install sketch is given below)
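  If OpenCV is not already available, a minimal sketch of installing it with pip, assuming the `opencv-python` wheel is sufficient for the demo and visualization scripts:

  ```
  pip install opencv-python
  ```
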
This section covers training and validation of the visual query detector.
- Clone our repository and set `VQ2D_ROOT`.

  ```
  git clone https://github.com/facebookresearch/vq2d_cvpr.git
  cd vq2d_cvpr
  export VQ2D_ROOT=$PWD
  ```
- Create and activate a conda environment.

  ```
  conda create -n vq2d python=3.8
  conda activate vq2d
  ```
- Install PyTorch using conda. We rely on cuda-10.2 and cudnn-7.6.5.32 for our experiments.

  ```
  conda install pytorch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 cudatoolkit=10.2 -c pytorch
  ```
- Install additional requirements using pip.

  ```
  pip install -r requirements.txt
  ```
- Install detectron2.

  ```
  python -m pip install detectron2 -f \
    https://dl.fbaipublicfiles.com/detectron2/wheels/cu111/torch1.8/index.html
  ```
- Install submitit for multi-node training.

  ```
  pip install submitit
  ```
You will need the following steps to run the VQ2D evaluation. These are not required for the visual query detection task.
- Install pytracking according to these instructions. Download the pre-trained KYS tracker weights to `$VQ2D_ROOT/pretrained_models/kys.pth` (a placement sketch follows the commands below).

  ```
  cd $VQ2D_ROOT/dependencies
  git clone [email protected]:visionml/pytracking.git
  cd pytracking
  git checkout de9cb9bb4f8cad98604fe4b51383a1e66f1c45c0
  ```
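  A minimal sketch of placing the tracker weights, assuming `kys.pth` has already been downloaded from the pytracking model zoo (the source path below is a placeholder):

  ```
  # create the target directory and copy the downloaded KYS weights into place
  mkdir -p $VQ2D_ROOT/pretrained_models
  cp /path/to/downloaded/kys.pth $VQ2D_ROOT/pretrained_models/kys.pth
  ```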
- For installing the spatial-correlation-sampler dependency for pytracking, follow these steps if the pip install fails.

  ```
  cd $VQ2D_ROOT/dependencies
  git clone [email protected]:ClementPinard/Pytorch-Correlation-extension.git
  cd Pytorch-Correlation-extension
  python setup.py install
  ```
- Download the annotations and videos as instructed here to `$VQ2D_ROOT/data`.

  ```
  ego4d --output_directory="$VQ2D_ROOT/data" --datasets full_scale annotations

  # Define ego4d videos directory
  export EGO4D_VIDEOS_DIR=$VQ2D_ROOT/data/v1/full_scale

  # Move out vq annotations to $VQ2D_ROOT/data
  mv $VQ2D_ROOT/data/v1/annotations/vq_*.json $VQ2D_ROOT/data
  ```
- Use the updated v1.0.5 data:

  ```
  # Download the data using the Ego4D CLI.
  ego4d --output_directory="$VQ2D_ROOT/data" --datasets annotations -y --version v1_0_5

  # Move out vq annotations to $VQ2D_ROOT/data
  mv $VQ2D_ROOT/data/v1_0_5/annotations/vq_*.json $VQ2D_ROOT/data
  ```
- Process the VQ dataset.

  ```
  python process_vq_dataset.py --annot-root data --save-root data
  ```
- Extract clips for the validation and test data from videos.

  ```
  python convert_videos_to_clips.py \
    --annot-paths data/vq_val.json data/vq_test_unannotated.json \
    --save-root data/clips \
    --ego4d-videos-root $EGO4D_VIDEOS_DIR \
    --num-workers 10  # Increase this for speed
  ```
- Extract images for the train and validation data from videos. Note that frame extraction takes longer because we also sample some negative frames from each video.

  ```
  python convert_videos_to_images.py \
    --annot-paths data/vq_train.json data/vq_val.json \
    --save-root data/images \
    --ego4d-videos-root $EGO4D_VIDEOS_DIR \
    --num-workers 10  # Increase this for speed
  ```
- Training a model on multiple nodes using the following script. It loads images from `INPUT.VQ_IMAGES_ROOT`, and the training log and checkpoints are saved in `--job-dir`. You could also use the original command for single-node training (see the sketch after this block).

  ```
  python slurm_8node_launch.py \
    --num-gpus 8 --use-volta32 \
    --config-file configs/siam_rcnn_8_gpus_e2e_125k.yaml \
    --resume --num-machines 4 --name ivq2d \
    --job-dir <PATH to training log dir> \
    INPUT.VQ_IMAGES_ROOT <PATH to extracted frames dir> \
    INPUT.VQ_DATA_SPLITS_ROOT data
  ```
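  For reference, a minimal single-node sketch; the launcher name `train_net.py`, the `OUTPUT_DIR` override, and the detectron2-style flags are assumptions here, so check the repository for the exact original command:

  ```
  # single-node training sketch (train_net.py and OUTPUT_DIR are assumed names)
  python train_net.py \
    --num-gpus 8 \
    --config-file configs/siam_rcnn_8_gpus_e2e_125k.yaml \
    --resume \
    OUTPUT_DIR <PATH to training log dir> \
    INPUT.VQ_IMAGES_ROOT <PATH to extracted frames dir> \
    INPUT.VQ_DATA_SPLITS_ROOT data
  ```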
- Evaluating our model for visual queries 2D localization on the val set.

  1. Download the model checkpoint and configuration from Google Drive.
  2. Query all the validation videos in parallel. To do so, please edit `slurm_eval_500_array.sh` to specify the paths, then submit the job array to slurm. NB 1: `TRAIN_ROOT` is the folder with the downloaded checkpoint and configuration file, and `EVAL_ROOT` stores the evaluation output of each run. NB 2: `N_PART` is the number of splits; the script will produce `N_PART` results. A sketch of the variables to edit follows the command below.

     ```
     sbatch scripts/faster_evaluation/slurm_eval_array.sh
     ```
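     A hypothetical sketch of the variables to set at the top of `slurm_eval_500_array.sh`; the exact variable layout in the script may differ, and the values shown are placeholders:

     ```
     # paths and split count to edit before submitting the job array
     TRAIN_ROOT=<PATH to downloaded checkpoint and config dir>
     EVAL_ROOT=<PATH to save per-split evaluation results>
     N_PART=500  # number of splits; the job array produces N_PART partial results
     ```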
  3. Merge all the predictions and evaluate the result. Note that `--stats-dir` is the output of the evaluation script (i.e. `EVAL_ROOT`); it is different from the training log dir.

     ```
     PYTHONPATH=. python scripts/faster_evaluation/merge_results.py \
       --stats-dir <PATH to evaluation experiment dir>
     ```
- Making predictions for the Ego4D challenge. This is similar to the evaluation step above, but we will use different scripts.

  1. Ensure that `vq_test_unannotated.json` is copied to `$VQ2D_ROOT`.

  2. Query all the test videos in parallel. To do so, please edit `slurm_test_500_array.sh` to specify the paths, then submit the job array to slurm.

     ```
     sbatch scripts/faster_evaluation/slurm_test_array.sh
     ```
  3. Merge all the predictions.

     ```
     PYTHONPATH=. python scripts/faster_evaluation/merge_results_test.py \
       --stats-dir <the evaluation experiment dir>
     ```
  4. The file `$EXPT_ROOT/visual_queries_log/test_challenge_predictions.json` should be submitted on the EvalAI server.

  5. Before submission, you can validate the format of the predictions using the following:

     ```
     cd $VQ2D_ROOT
     python validate_challenge_predictions.py \
       --test-unannotated-path <PATH TO vq_test_unannotated.json> \
     ```