Learning monocular depth estimation infusing traditional stereo knowledge
Fabio Tosi, Filippo Aleotti, Matteo Poggi and Stefano Mattoccia
CVPR 2019
Paper
Supplementary material
Poster
Youtube Video
TensorFlow implementation of the monocular Residual Matching (monoResMatch) network.
This code was tested with TensorFlow 1.8, CUDA 9.0 and Ubuntu 16.04.
Cityscapes
The Cityscapes dataset contains stereo pairs from about 50 cities in Germany, captured from a moving vehicle in various weather conditions. It consists of 22,973 stereo pairs split into train, validation and test sets. The list of training files is in ./utils/filenames/cityscapes_train_files.txt
You will need to register in order to download the data.
python main.py --is_training \
--data_path_image [path_cityscapes] \
--data_path_proxy [path_cityscapes_proxy] \
--filenames_file ./utils/filenames/cityscapes_train_files.txt \
--batch_size 6 \
--iterations 150000 \
--learning_rate_schedule 100000,120000 \
--patch_width 512 \
--patch_height 256 \
--height 512 \
--width 1024 \
--initial_learning_rate 0.0001 \
--log_directory ./log/CS \
--dataset cityscapes
KITTI
We used the Eigen split of the data, amounting to 22,600 training samples. The list of training files is in ./utils/filenames/eigen_train_files.txt. You can download the full KITTI dataset by running:
wget -i utils/kitti_archives_to_download.txt -P [kitti_path]
python main.py --is_training \
--data_path_image [path_kitti] \
--data_path_proxy [path_kitti_proxy] \
--filenames_file [kitti_train_file] \
--batch_size 6 \
--iterations 300000 \
--learning_rate_schedule 180000,240000 \
--patch_height 192 \
--patch_width 640 \
--height 384 \
--width 1280 \
--initial_learning_rate 0.0001 \
--log_directory ./log/K
You can also load an existing model using --checkpoint_path and/or fine-tune the network using the --retrain flag.
python main.py --is_training \
--data_path_image [path_kitti] \
--data_path_proxy [path_kitti_proxy] \
--filenames_file [kitti_train_file] \
--batch_size 6 \
--iterations 300000 \
--learning_rate_schedule 180000,240000 \
--patch_height 192 \
--patch_width 640 \
--height 384 \
--width 1280 \
--initial_learning_rate 0.0001 \
--log_directory ./log/CS_K \
--checkpoint_path ./log/CS/model-150000 \
--retrain
Warning: If you want to fine-tune on KITTI raw LiDAR measurements, you need to convert depth values to disparities using the baseline distance between the cameras and the camera focal length (disparity = focal length × baseline / depth).
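The conversion mentioned in the warning can be sketched as follows. This is a minimal illustration, not code from this repository: the function name `depth_to_disparity` is hypothetical, and the focal length and baseline in the usage lines are approximate KITTI values used only as an example. Read the actual values from the calibration files of each sequence.

```python
import numpy as np

def depth_to_disparity(depth_m, focal_px, baseline_m):
    """Convert a sparse metric depth map to disparities in pixels.

    Pixels without a LiDAR measurement (depth <= 0) are left at 0.
    """
    depth_m = np.asarray(depth_m, dtype=np.float32)
    disparity = np.zeros_like(depth_m)
    valid = depth_m > 0
    # disparity = focal length * baseline / depth
    disparity[valid] = focal_px * baseline_m / depth_m[valid]
    return disparity

# Illustrative values only (assumed, not taken from this repository).
depth = np.array([[0.0, 10.0], [20.0, 40.0]])
disp = depth_to_disparity(depth, focal_px=721.0, baseline_m=0.54)
```

Masking invalid pixels before dividing avoids division by zero on the sparse LiDAR maps.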
KITTI Eigen test split
python main.py --output_path [output_path] \
--data_path_image [path_kitti] \
--filenames_file ./utils/filenames/eigen_test_files.txt \
--checkpoint_path ./log/CS_K/model-300000
You can also save output images in PNG format by enabling the --save_images flag.
If you want to try on a single image:
python main.py --test_single \
--image_path [image_path] \
--output_path [output_path] \
--checkpoint_path ./log/CS_K/model-300000
To evaluate, run:
python utils/evaluate_kitti.py --split eigen \
--disp_folder [path_test_npy] \
--gt_path [path_kitti] \
--garg_crop
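For reference, the error metrics commonly reported on the Eigen split (Abs Rel, Sq Rel, RMSE, RMSE log and the accuracy under thresholds 1.25, 1.25², 1.25³) can be sketched as below. This is an illustrative sketch and not the evaluation script itself: the function name `compute_depth_errors` is hypothetical, and it assumes both inputs contain only valid, strictly positive depths that have already been cropped and masked.

```python
import numpy as np

def compute_depth_errors(gt, pred):
    """Compute standard depth metrics over valid pixels.

    Assumes gt and pred are arrays of the same shape with positive depths.
    """
    gt = np.asarray(gt, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    # Per-pixel ratio used by the threshold accuracies.
    ratio = np.maximum(gt / pred, pred / gt)
    return {
        "abs_rel": np.mean(np.abs(gt - pred) / gt),
        "sq_rel": np.mean((gt - pred) ** 2 / gt),
        "rmse": np.sqrt(np.mean((gt - pred) ** 2)),
        "rmse_log": np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2)),
        "a1": np.mean(ratio < 1.25),
        "a2": np.mean(ratio < 1.25 ** 2),
        "a3": np.mean(ratio < 1.25 ** 3),
    }

errors = compute_depth_errors([10.0, 20.0], [12.0, 20.0])
```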
You can use the code available at https://github.com/ivankreso/stereo-vision/tree/master/reconstruction/base/rSGM to generate SGM proxy labels and train the monoResMatch network.
Warning: Proxy labels and stereo image pairs should be saved in folders with the same structure as specified in the training file. Note that the stereo images are .jpg files, whereas the proxy labels are 16-bit .png files.
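Before training, it can help to check that the folders match the training file and that the proxy labels really are 16-bit PNGs. The sketch below uses only the standard library. Both helper names are hypothetical, and it assumes that each line of the training file holds whitespace-separated image paths relative to the image folder, and that each proxy label sits at the same relative path under the proxy folder with a .png extension. Adjust it if your layout differs.

```python
import os
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_bit_depth(path):
    """Return the bit depth stored in the IHDR chunk of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(26)
    if len(header) < 25 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a valid PNG file: %s" % path)
    # IHDR data starts at byte 16: width (4), height (4), bit depth (1).
    _width, _height, bit_depth = struct.unpack(">IIB", header[16:25])
    return bit_depth

def find_missing_files(filenames_file, image_root, proxy_root):
    """List images and proxy labels named in the training file but absent on disk.

    Assumed layout (hypothetical): relative image paths separated by
    whitespace, proxies at the same relative path with a .png extension.
    """
    missing = []
    with open(filenames_file) as f:
        for line in f:
            for rel_path in line.split():
                image = os.path.join(image_root, rel_path)
                proxy = os.path.join(proxy_root, os.path.splitext(rel_path)[0] + ".png")
                for path in (image, proxy):
                    if not os.path.isfile(path):
                        missing.append(path)
    return missing
```

A proxy label is stored as expected when `png_bit_depth(path)` returns 16.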
You can download the following pre-trained models:
Qualitative results of the proposed depth-from-mono architecture. From left to right: (a) the input image from the KITTI 2015 test set, and the depth predicted by monoResMatch fine-tuned using (b) 200-acrt ground-truth labels, (c) 200-acrt + 700 raw LiDAR samples and (d) SGM only.
If you find this code useful in your research, please cite:
@InProceedings{Tosi_2019_CVPR,
author = {Tosi, Fabio and Aleotti, Filippo and Poggi, Matteo and Mattoccia, Stefano},
title = {Learning monocular depth estimation infusing traditional stereo knowledge},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}
The evaluation code is from "Unsupervised Monocular Depth Estimation with Left-Right Consistency" by C. Godard, O. Mac Aodha and G. Brostow, CVPR 2017.