
Commit f6af2c9

VitorGuizilini authored and RaresAmbrus committed
packnet-sfm
1 parent 32c2a48 commit f6af2c9


43 files changed, +2267 -2 lines changed

Diff for: LICENSE.md

+21
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2019 Toyota Research Institute (TRI)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Diff for: Makefile

+43
@@ -0,0 +1,43 @@
# Copyright 2020 Toyota Research Institute. All rights reserved.

DEPTH_TYPE ?= None
CROP ?= None
SAVE_OUTPUT ?= None

PYTHON ?= python
DOCKER_IMAGE ?= packnet-sfm:master-latest
DOCKER_OPTS := --name packnet-sfm --rm -it \
	-e DISPLAY=${DISPLAY} \
	-e XAUTHORITY \
	-e NVIDIA_DRIVER_CAPABILITIES=all \
	-v ~/.cache:/root/.cache \
	-v /data:/data \
	-v ${PWD}:/workspace/self-supervised-learning \
	-v /tmp/.X11-unix/X0:/tmp/.X11-unix/X0 \
	-v /dev/null:/dev/raw1394 \
	-w /workspace/self-supervised-learning \
	--shm-size=444G \
	--privileged \
	--network=host

.PHONY: all clean docker-build

all: clean

clean:
	find . -name "*.pyc" | xargs rm -f && \
	find . -name "__pycache__" | xargs rm -rf

docker-build:
	docker build \
	-t ${DOCKER_IMAGE} . -f docker/Dockerfile

docker-start-interactive: docker-build
	nvidia-docker run ${DOCKER_OPTS} ${DOCKER_IMAGE} \
	bash

docker-evaluate-depth: docker-build
	nvidia-docker run ${DOCKER_OPTS} ${DOCKER_IMAGE} \
	bash -c "bash scripts/evaluate_depth.sh ${MODEL} ${INPUT_PATH} ${DEPTH_TYPE} ${CROP} ${SAVE_OUTPUT}"

Diff for: README.md

+169 -2
@@ -1,2 +1,169 @@
-# packnet-sfm
-Code for "PackNet-SfM: 3D Packing for Self-Supervised Monocular Depth Estimation"

[<img src="/media/figs/tri-logo.png" width="30%">](https://www.tri.global/)

This repository contains code for the following papers:

## 3D Packing for Self-Supervised Monocular Depth Estimation
*Vitor Guizilini, Rares Ambrus, Sudeep Pillai, Allan Raventos and Adrien Gaidon*

[**[Full paper]**](https://arxiv.org/abs/1905.02693)
[**[YouTube]**](https://www.youtube.com/watch?v=b62iDkLgGSI)

## Robust Semi-Supervised Monocular Depth Estimation with Reprojected Distances
*Vitor Guizilini, Jie Li, Rares Ambrus, Sudeep Pillai and Adrien Gaidon*

[**[Full paper]**](https://arxiv.org/abs/1910.01765)
[**[YouTube]**](https://www.youtube.com/watch?v=cSwuF-XA4sg)

## Two Stream Networks for Self-Supervised Ego-Motion Estimation
*Rares Ambrus, Vitor Guizilini, Jie Li, Sudeep Pillai and Adrien Gaidon*

[**[Full paper]**](https://arxiv.org/abs/1910.01764)

## Semantically-Guided Representation Learning for Self-Supervised Monocular Depth
*Vitor Guizilini, Rui Hou, Jie Li, Rares Ambrus and Adrien Gaidon*

[**[Full paper]**](https://arxiv.org/abs/2002.12319)

## SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation
*Sudeep Pillai, Rares Ambrus and Adrien Gaidon*

[**[Full paper]**](https://arxiv.org/abs/1810.01849)
[**[YouTube]**](https://www.youtube.com/watch?v=jKNgBeBMx0I&t=33s)

## Contributions

- **PackNet**: A new convolutional network architecture for high-resolution self-supervised monocular depth estimation. We propose new packing and unpacking blocks that jointly leverage 3D convolutions to learn representations that maximally propagate dense appearance and geometric information while still being able to run in real time. A minimal sketch of the packing idea is shown after this list.

- **Weak Velocity Supervision**: A novel optional loss that can leverage the camera’s velocity when available (e.g. from cars, robots, mobile phones) to solve the inherent scale ambiguity in monocular vision. A sketch of this loss is also shown after this list.

- **Dense Depth for Automated Driving (DDAD)**: A new dataset that leverages diverse logs from a fleet of well-calibrated self-driving cars equipped with cameras and high-accuracy long-range LiDARs. Compared to existing benchmarks, DDAD enables much more accurate depth evaluation at range, which is key for high-resolution monocular depth estimation methods.
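
The packing idea can be illustrated with a minimal PyTorch sketch. This is not the repository's implementation: the Space2Depth folding, the single 3D convolution, and all layer sizes below are assumptions chosen only to show how spatial detail is folded into channels and mixed with a 3D convolution before projecting to the desired feature width.

```python
import torch
import torch.nn as nn


def space_to_depth(x, r=2):
    """Fold each r x r spatial neighborhood into the channel dimension."""
    b, c, h, w = x.shape
    x = x.reshape(b, c, h // r, r, w // r, r)
    x = x.permute(0, 1, 3, 5, 2, 4).contiguous()
    return x.reshape(b, c * r * r, h // r, w // r)


class PackingBlock(nn.Module):
    """Sketch of a packing block: Space2Depth, a 3D convolution across the
    packed channels, then a 2D convolution to the desired output width."""

    def __init__(self, in_channels, out_channels, r=2, d=8):
        super().__init__()
        self.r = r
        self.conv3d = nn.Conv3d(1, d, kernel_size=3, padding=1)
        self.conv2d = nn.Conv2d(in_channels * r * r * d, out_channels,
                                kernel_size=3, padding=1)

    def forward(self, x):
        x = space_to_depth(x, self.r)        # (B, C*r*r, H/r, W/r)
        b, c, h, w = x.shape
        x = self.conv3d(x.unsqueeze(1))      # (B, d, C*r*r, H/r, W/r)
        x = x.reshape(b, c * self.conv3d.out_channels, h, w)
        return self.conv2d(x)                # (B, out_channels, H/r, W/r)


features = PackingBlock(64, 128)(torch.randn(1, 64, 192, 640))
print(features.shape)  # torch.Size([1, 128, 96, 320])
```

Similarly, the weak velocity supervision can be sketched as a penalty on the gap between the norm of the predicted inter-frame translation and the distance implied by the measured speed. Again, this is a hedged sketch rather than the loss as implemented in this repository, and the argument shapes are assumptions:

```python
import torch


def velocity_supervision_loss(pred_translation, speed, dt):
    """pred_translation: (B, 3) translation from the pose network between two
    frames; speed: (B,) measured speed in m/s; dt: (B,) time between frames."""
    pred_distance = pred_translation.norm(dim=-1)     # predicted metres travelled
    measured_distance = speed.abs() * dt              # metres implied by the speed
    return (pred_distance - measured_distance).abs().mean()


loss = velocity_supervision_loss(torch.randn(4, 3) * 0.1,
                                 torch.full((4,), 10.0),  # 10 m/s
                                 torch.full((4,), 0.1))   # 10 Hz camera
```
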
## Qualitative Results

### Self-Supervised - KITTI

<img src="/media/figs/teaser27.png" width="49%"> <img src="/media/figs/teaser51.png" width="49%">
<img src="/media/figs/teaser305.png" width="49%"> <img src="/media/figs/teaser291.png" width="49%">

### Self-Supervised - DDAD

<img src="/media/figs/ddad1.png" width="49%"> <img src="/media/figs/ddad2.png" width="49%">
<img src="/media/figs/ddad3.png" width="49%"> <img src="/media/figs/ddad4.png" width="49%">

### Semi-Supervised - KITTI

<img src="/media/figs/beams_full.jpg" width="32%" height="170cm"> <img src="/media/figs/beams_64.jpg" width="32%" height="170cm"> <img src="/media/figs/beams_32.jpg" width="32%" height="170cm">
<img src="/media/figs/beams_16.jpg" width="32%" height="170cm"> <img src="/media/figs/beams_8.jpg" width="32%" height="170cm"> <img src="/media/figs/beams_4.jpg" width="32%" height="170cm">

### Semantically-Guided Self-Supervised Depth - KITTI

<img src="/media/figs/semguided.png" width="98%">

### Solving the Infinite Depth Problem

<img src="/media/figs/infinite_depth.png" width="98%">

## How to Use

### Step 1: Clone this repository

```
git clone https://github.com/vguizilini/packnet-sfm.git
```

### Step 2: Create symbolic link to data folder

```
sudo ln -s path/to/data /data
```

### Step 3: Download datasets into /data/datasets

#### [KITTI_raw](http://www.cvlibs.net/datasets/kitti/raw_data.php)
- For convenience, we also provide the pre-computed depth maps used in our papers (unzip into the same root folder)
```
wget https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/depth_maps/KITTI_raw_velodyne.tar.gz
wget https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/depth_maps/KITTI_raw_groundtruth.tar.gz
```

### Step 4: Download pre-trained models into /data/models

#### KITTI
- Self-Supervised (192x640, K)
```
wget https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/models/PackNet_MR_selfsup_K.pth.tar
```
- Self-Supervised (192x640, CS)
```
wget https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/models/PackNet_MR_selfsup_CS.pth.tar
```
- Self-Supervised Scale-Aware (192x640, CS &rightarrow; K)
```
wget https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/models/PackNet_MR_velsup_CStoK.pth.tar
```
- Semi-Supervised (Annotated depth maps) (192x640, CS &rightarrow; K)
```
wget https://tri-ml-public.s3.amazonaws.com/github/packnet-sfm/models/PackNet_MR_semisup_CStoK.pth.tar
```

### Step 5: Inference
```
bash evaluate_kitti.sh
```
### License

The source code is released under the [MIT license](LICENSE.md).

### Citations
Depending on the application, please use the following citations when referencing our work:

```
@misc{packnet-sfm-selfsup,
  author = {Vitor Guizilini and Rares Ambrus and Sudeep Pillai and Allan Raventos and Adrien Gaidon},
  title = {3D Packing for Self-Supervised Monocular Depth Estimation},
  archivePrefix = {arXiv},
  eprint = {1905.02693},
  primaryClass = {cs.CV},
  year = {2019},
}
```

```
@inproceedings{packnet-sfm-semisup,
  author = {Vitor Guizilini and Jie Li and Rares Ambrus and Sudeep Pillai and Adrien Gaidon},
  title = {Robust Semi-Supervised Monocular Depth Estimation with Reprojected Distances},
  booktitle = {Proceedings of the 3rd Annual Conference on Robot Learning (CoRL)},
  month = {October},
  year = {2019},
}
```

```
@inproceedings{packnet-sfm-twostream,
  author = {Rares Ambrus and Vitor Guizilini and Jie Li and Sudeep Pillai and Adrien Gaidon},
  title = {{Two Stream Networks for Self-Supervised Ego-Motion Estimation}},
  booktitle = {Proceedings of the 3rd Annual Conference on Robot Learning (CoRL)},
  month = {October},
  year = {2019},
}
```

```
@inproceedings{packnet-sfm-semguided,
  author = {Vitor Guizilini and Rui Hou and Jie Li and Rares Ambrus and Adrien Gaidon},
  title = {Semantically-Guided Representation Learning for Self-Supervised Monocular Depth},
  booktitle = {Proceedings of the 8th International Conference on Learning Representations (ICLR)},
  month = {April},
  year = {2020},
}
```

```
@inproceedings{superdepth,
  author = {Sudeep Pillai and Rares Ambrus and Adrien Gaidon},
  title = {SuperDepth: Self-Supervised, Super-Resolved Monocular Depth Estimation},
  booktitle = {Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)},
  month = {May},
  year = {2019},
}
```

Diff for: docker/Dockerfile

+77
@@ -0,0 +1,77 @@
# Copyright 2020 Toyota Research Institute. All rights reserved.

FROM nvidia/cuda:10.0-devel-ubuntu18.04

ENV PYTORCH_VERSION=1.1.0
ENV TORCHVISION_VERSION=0.3.0
ENV CUDNN_VERSION=7.6.0.64-1+cuda10.0
ENV NCCL_VERSION=2.4.7-1+cuda10.0

# Python 2.7 or 3.6 is supported by Ubuntu Bionic out of the box
ARG python=3.6
ENV PYTHON_VERSION=${python}
ENV DEBIAN_FRONTEND=noninteractive

# Set default shell to /bin/bash
SHELL ["/bin/bash", "-cu"]

RUN apt-get update && apt-get install -y --allow-downgrades --allow-change-held-packages --no-install-recommends \
    build-essential \
    cmake \
    g++-4.8 \
    git \
    curl \
    docker.io \
    vim \
    wget \
    ca-certificates \
    libcudnn7=${CUDNN_VERSION} \
    libnccl2=${NCCL_VERSION} \
    libnccl-dev=${NCCL_VERSION} \
    libjpeg-dev \
    libpng-dev \
    python${PYTHON_VERSION} \
    python${PYTHON_VERSION}-dev \
    python3-tk \
    librdmacm1 \
    libibverbs1 \
    ibverbs-providers \
    libgtk2.0-dev \
    unzip \
    bzip2 \
    htop \
    gnuplot \
    ffmpeg

# Install Python and pip
RUN if [[ "${PYTHON_VERSION}" == "3.6" ]]; then \
    apt-get install -y python${PYTHON_VERSION}-distutils; \
    fi

RUN ln -sf /usr/bin/python${PYTHON_VERSION} /usr/bin/python

RUN curl -O https://bootstrap.pypa.io/get-pip.py && \
    python get-pip.py && \
    rm get-pip.py

# Install PyTorch
RUN pip install future typing numpy awscli
RUN pip install https://download.pytorch.org/whl/cu100/torch-${PYTORCH_VERSION}-cp36-cp36m-linux_x86_64.whl
RUN pip install https://download.pytorch.org/whl/cu100/torchvision-${TORCHVISION_VERSION}-cp36-cp36m-linux_x86_64.whl
RUN pip install numpy h5py

# Configure environment variables - default working directory is "/workspace"
WORKDIR /workspace
ENV PYTHONPATH="/workspace"

RUN pip install awscli tqdm numpy-quaternion termcolor path.py pillow==6.1 opencv-python-headless matplotlib

# self-supervised-learning copy
RUN mkdir -p /workspace/experiments
RUN mkdir -p /workspace/self-supervised-learning
WORKDIR /workspace/self-supervised-learning

# Copy self-supervised learning source
COPY . /workspace/self-supervised-learning
ENV PYTHONPATH="/workspace/self-supervised-learning:$PYTHONPATH"
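
Once the image is built (see the Makefile targets above), a quick sanity check one might run inside the container is a short Python snippet confirming the pinned PyTorch build and GPU visibility. This is illustrative only and not part of the repository:

```python
# Hypothetical sanity check for the container environment (not in the repo).
import torch
import torchvision

print(torch.__version__)          # expected: 1.1.0 (pinned in the Dockerfile)
print(torchvision.__version__)    # expected: 0.3.0
print(torch.cuda.is_available())  # True when launched via nvidia-docker with a GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```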

Diff for: evaluate_kitti.sh

+12
@@ -0,0 +1,12 @@
# Copyright 2020 Toyota Research Institute. All rights reserved.

# Example of evaluation script for KITTI

make docker-evaluate-depth \
    MODEL=/data/models/packnet/PackNet_MR_selfsup_K.pth.tar \
    INPUT_PATH=/data/datasets/KITTI_raw/data_splits/eigen_test_files.txt \
    DEPTH_TYPE=velodyne \
    CROP=garg \
    SAVE_OUTPUT=output

Diff for: media/figs/beams_16.jpg (396 KB, binary image added)

Diff for: media/figs/beams_32.jpg (520 KB, binary image added)

Diff for: media/figs/beams_4.jpg (418 KB, binary image added)

Diff for: media/figs/beams_64.jpg (585 KB, binary image added)

Diff for: media/figs/beams_8.jpg (448 KB, binary image added)

Diff for: media/figs/beams_full.jpg (663 KB, binary image added)

Diff for: media/figs/ddad1.png (1.24 MB, binary image added)

Diff for: media/figs/ddad2.png (1.27 MB, binary image added)

Diff for: media/figs/ddad3.png (1.15 MB, binary image added)

Diff for: media/figs/ddad4.png (1.09 MB, binary image added)

Diff for: media/figs/infinite_depth.png (402 KB, binary image added)

Diff for: media/figs/semguided.png (1.72 MB, binary image added)

Diff for: media/figs/sparse_beams.png (1.5 MB, binary image added)

Diff for: media/figs/teaser27.png (3.56 MB, binary image added)

Diff for: media/figs/teaser291.png (391 KB, binary image added)

Diff for: media/figs/teaser305.png (3.56 MB, binary image added)

Diff for: media/figs/teaser51.png (3.56 MB, binary image added)

Diff for: media/figs/tri-logo.png (9.04 KB, binary image added)

Diff for: monodepth/__init__.py

+1
@@ -0,0 +1 @@
# Copyright 2020 Toyota Research Institute. All rights reserved.

Diff for: monodepth/datasets/__init__.py

+1
@@ -0,0 +1 @@
# Copyright 2020 Toyota Research Institute. All rights reserved.

Diff for: monodepth/datasets/data_augmentation.py

+62
@@ -0,0 +1,62 @@
# Copyright 2020 Toyota Research Institute. All rights reserved.

"""
Data augmentation functions
"""

import numpy as np
import torchvision.transforms as transforms
from PIL import Image


def filter_dict(dict, keywords):
    """
    Returns only keywords that are present in a dictionary
    """
    return [key for key in keywords if key in dict]


def resize_sample_image_and_intrinsics(sample, image_shape, image_interpolation=Image.ANTIALIAS):
    """
    Takes a sample and resizes the input images ['left_image'] and ['right_image'].
    It also rescales the corresponding camera intrinsics ['left_intrinsics'] and ['right_intrinsics'].
    """
    # Resize image and corresponding intrinsics
    image_transform = transforms.Resize(image_shape, interpolation=image_interpolation)
    original_shape = sample['left_image'].size
    (orig_w, orig_h) = original_shape
    (out_h, out_w) = image_shape

    for key in filter_dict(sample, [
        'left_intrinsics', 'right_intrinsics'
    ]):
        # Note this is swapped here because PIL.Image.size -> (w,h)
        # but we specify image_shape -> (h,w) for rescaling
        y_scale = out_h / orig_h
        x_scale = out_w / orig_w
        # Scale fx and fy appropriately
        intrinsics = np.copy(sample[key])
        intrinsics[0] *= x_scale
        intrinsics[1] *= y_scale
        sample[key] = intrinsics

    # Scale image (default is antialias)
    for key in filter_dict(sample, [
        'left_image', 'right_image',
    ]):
        sample[key] = image_transform(sample[key])

    return sample


def to_tensor_sample(sample, tensor_type='torch.FloatTensor'):
    """
    Converts all fields from a sample to tensors.
    """
    transform = transforms.ToTensor()
    for key in filter_dict(sample, [
        'left_image', 'right_image',
        'left_depth', 'right_depth',
    ]):
        sample[key] = transform(sample[key]).type(tensor_type)
    return sample
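
As a quick usage illustration of the two helpers above, a sample dictionary can be resized and converted to tensors as follows. The image size and intrinsics values are invented for the example; only the dictionary keys and function names come from the code above.

```python
# Illustrative only: the resolution and intrinsics below are made up.
import numpy as np
from PIL import Image

from monodepth.datasets.data_augmentation import (
    resize_sample_image_and_intrinsics, to_tensor_sample)

sample = {
    'left_image': Image.new('RGB', (1242, 375)),   # PIL size is (w, h)
    'left_intrinsics': np.array([[721.5,   0.0, 609.6],
                                 [  0.0, 721.5, 172.8],
                                 [  0.0,   0.0,   1.0]]),
}

# Resize to (h, w) = (192, 640): row 0 (fx, cx) is scaled by 640/1242,
# row 1 (fy, cy) by 192/375, and the image is resized to match.
sample = resize_sample_image_and_intrinsics(sample, image_shape=(192, 640))
sample = to_tensor_sample(sample)

print(sample['left_image'].shape)       # torch.Size([3, 192, 640])
print(sample['left_intrinsics'][0, 0])  # fx after rescaling, about 371.8
```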
