Official implementation of the paper
Rethinking Inductive Biases for Surface Normal Estimation
CVPR 2024 [oral]
Gwangbin Bae and Andrew J. Davison
[paper.pdf] [arXiv] [youtube] [project page]
Despite the growing demand for accurate surface normal estimation models, existing methods use general-purpose dense prediction models, adopting the same inductive biases as other tasks. In this paper, we discuss the inductive biases needed for surface normal estimation and propose to (1) utilize the per-pixel ray direction and (2) encode the relationship between neighboring surface normals by learning their relative rotation. The proposed method can generate crisp — yet, piecewise smooth — predictions for challenging in-the-wild images of arbitrary resolution and aspect ratio. Compared to a recent ViT-based state-of-the-art model, our method shows a stronger generalization ability, despite being trained on an orders of magnitude smaller dataset.
We provide the instructions in four steps (click "▸" to expand). For example, if you just want to test DSINE on some images, you can stop after Step 1. This would minimize the amount of installation/downloading.
Step 1. Test DSINE on some images (requires minimal dependencies)
Start by installing dependencies.
conda create --name DSINE python=3.10
conda activate DSINE
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
python -m pip install geffnet
Then, download the model weights from this link and save them under `projects/dsine/checkpoints/`. Note that the folder structure should match that of the Google Drive. For example, `checkpoints/exp001_cvpr2024/dsine.pt` (in Google Drive) is our best model; it should be saved as `projects/dsine/checkpoints/exp001_cvpr2024/dsine.pt`. The corresponding config file is `projects/dsine/experiments/exp001_cvpr2024/dsine.txt`.
The models under `checkpoints/exp002_kappa/` (in Google Drive) are the ones that can also estimate the uncertainty.
Then, move to the folder `projects/dsine/` and run:
python test_minimal.py ./experiments/exp001_cvpr2024/dsine.txt
This will generate predictions for the images under `projects/dsine/samples/img/`. The results will be saved under `projects/dsine/samples/output/`.
Our model assumes known camera intrinsics, but providing approximate intrinsics still gives good results. For some images in `projects/dsine/samples/img/`, the corresponding camera intrinsics (fx, fy, cx, cy, assuming a perspective camera with no distortion) are provided as a `.txt` file. If such a file does not exist, the intrinsics will be approximated.
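Concretely, the four values can be assembled into a standard 3×3 pinhole intrinsics matrix. Below is a minimal sketch; the function names and the 60-degree field-of-view fallback are illustrative assumptions, not the repo's actual code:

```python
import numpy as np

def intrins_from_fxfycxcy(fx, fy, cx, cy):
    """Assemble a 3x3 pinhole intrinsics matrix from (fx, fy, cx, cy)."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

def approx_intrins(h, w, fov_deg=60.0):
    """Fallback when no intrinsics file exists: assume the principal point is
    at the image center and derive the focal length from an assumed horizontal
    field of view (the 60-degree default here is illustrative)."""
    fx = fy = 0.5 * w / np.tan(0.5 * np.radians(fov_deg))
    return intrins_from_fxfycxcy(fx, fy, 0.5 * w, 0.5 * h)
```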
Step 2. Test DSINE on benchmark datasets & run a real-time demo
Install additional dependencies.
python -m pip install tensorboard
python -m pip install opencv-python
python -m pip install matplotlib
python -m pip install pyrealsense2 # needed only for demo using a realsense camera
python -m pip install vidgear # needed only for demo on YouTube videos
python -m pip install yt_dlp # needed only for demo on YouTube videos
python -m pip install mss # needed only for demo on screen capture
Download the evaluation datasets (`dsine_eval.zip`) from this link.
NOTE: By downloading the dataset, you are agreeing to the respective LICENSE of each dataset. The link to each dataset can be found in its respective `readme.txt`.
In `projects/__init__.py`, there are two variables, `DATASET_DIR` and `EXPERIMENT_DIR`:
`DATASET_DIR` is where your datasets should be stored. For example, the `dsine_eval` dataset (downloaded from the link above) should be saved under `DATASET_DIR/dsine_eval`. Update this variable.
`EXPERIMENT_DIR` is where the experiments (e.g. model weights, logs, etc.) will be saved. Update this variable.
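For illustration, the relevant lines in `projects/__init__.py` might look like this after editing (the paths below are placeholders, not real defaults):

```python
# projects/__init__.py -- illustrative values only; point these at your own paths
DATASET_DIR = '/path/to/datasets'        # e.g. dsine_eval goes to DATASET_DIR/dsine_eval
EXPERIMENT_DIR = '/path/to/experiments'  # model weights, logs, etc. are written here
```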
Then, move to the folder `projects/dsine/` and run:
# getting benchmark performance on the six evaluation datasets
python test.py ./experiments/exp001_cvpr2024/dsine.txt --mode benchmark
# getting benchmark performance on the six evaluation datasets (with visualization)
# it will be saved under EXPERIMENT_DIR/dsine/exp001_cvpr2024/dsine/test/
python test.py ./experiments/exp001_cvpr2024/dsine.txt --mode benchmark --visualize
# generate predictions for the images in `projects/dsine/samples/img/`
python test.py ./experiments/exp001_cvpr2024/dsine.txt --mode samples
# measure the throughput (inference speed) on your device
python test.py ./experiments/exp001_cvpr2024/dsine.txt --mode throughput
You can also run a real-time demo by running:
# captures your screen and makes prediction
python test.py ./experiments/exp001_cvpr2024/dsine.txt --mode screen
# demo using webcam
python test.py ./experiments/exp001_cvpr2024/dsine.txt --mode webcam
# demo using a realsense camera
python test.py ./experiments/exp001_cvpr2024/dsine.txt --mode rs
# demo on a YouTube video (replace with a different link)
python test.py ./experiments/exp001_cvpr2024/dsine.txt --mode https://www.youtube.com/watch?v=X-iEq8hWd6k
For each input option, there are some additional parameters. See projects/dsine/test.py
for more information.
You can also try building your own real-time demo. Please see this notebook for more information.
Step 3. Train DSINE
In projects/dsine/
, run:
python train.py ./experiments/exp000_test/test.txt
Then run `tensorboard --logdir EXPERIMENT_DIR/dsine/exp000_test/test/log` to open TensorBoard.
This will train the model on the train split of the NYUv2 dataset, which should be under DATASET_DIR/dsine_eval/nyuv2/train/
. There are only 795 images here, and the performance will not be good. To get better results you need to:
(1) Create a custom dataloader
We are checking if we can release the entire training dataset (~400GB). Before the release, you can try building your custom dataloader. You need to define a `get_sample(args, sample_path, info)` function and provide a data split in `data/datasets`. Check how these are defined/provided for the other datasets. You also need to update `projects/baseline_normal/dataloader.py` so that the newly defined `get_sample` function can be used.
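As a rough illustration of the expected shape of such a function, here is a minimal sketch. The split-file format, array layout, and dict keys below are assumptions — check the existing datasets in `data/datasets` for the real conventions:

```python
import numpy as np

def get_sample(args, sample_path, info):
    """Illustrative sketch of a custom dataloader entry. Here we assume a
    hypothetical split-file line of the form '<img.npy> <normal.npy>'."""
    img_path, normal_path = sample_path.split()
    img = np.load(img_path).astype(np.float32) / 255.0   # HxWx3, scaled to [0, 1]
    normal = np.load(normal_path).astype(np.float32)     # HxWx3 unit normals
    return {
        'img': img,
        'normal': normal,
        'intrins': info['intrins'],  # 3x3 camera intrinsics for this image
    }
```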
(2) Generate GT surface normals (optional)
In case your dataset does not come with ground truth surface normal maps, you can try generating them from the ground truth depth maps. Please see this notebook for more information.
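The standard approach is to back-project the depth map into a 3D point cloud using the camera intrinsics, then take cross products of the local tangent vectors. A rough sketch is below; the notebook handles boundaries and invalid depth more carefully than this:

```python
import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    """Estimate surface normals in the (X, Y, Z) = (right, down, front)
    convention from an HxW depth map. Simplified sketch: no handling of
    depth discontinuities or missing values."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # back-project each pixel to a 3D point
    X = (u - cx) / fx * depth
    Y = (v - cy) / fy * depth
    P = np.stack([X, Y, depth], axis=-1)
    # local tangents along image x and y, normal = their cross product
    dx = np.gradient(P, axis=1)
    dy = np.gradient(P, axis=0)
    n = np.cross(dx, dy)
    n /= (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-12)
    return n
```

With this ordering of the cross product, a fronto-parallel plane at constant depth yields (0, 0, 1), matching the convention described in the coordinate-system section below.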
(3) Customize data augmentation
In case you are using synthetic images, you need the right set of data augmentation functions to minimize the synthetic-to-real domain gap. We provide a wide range of augmentation functions, but the hyperparameters are not finetuned and you can potentially get better results by finetuning them. Please see this notebook for more information.
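For illustration, a minimal photometric augmentation along these lines might look as follows. The operations and hyperparameters here are assumptions for the sketch, not the repo's tuned ones:

```python
import numpy as np

def photometric_aug(img, rng=None):
    """Illustrative photometric augmentation for narrowing the
    synthetic-to-real gap. img: HxWx3 float array in [0, 1]."""
    rng = rng or np.random.default_rng()
    img = img * rng.uniform(0.8, 1.2)                               # random brightness
    img = (img - img.mean()) * rng.uniform(0.8, 1.2) + img.mean()   # random contrast
    img = img + rng.normal(0.0, 0.01, img.shape)                    # sensor-like noise
    return np.clip(img, 0.0, 1.0)
```

Note that photometric augmentations leave the ground truth normals untouched, whereas geometric ones (e.g. a horizontal flip) must transform the normals consistently as well.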
Step 4. Start your own surface normal estimation project
If you want to start your own surface normal estimation project, you can do so very easily.
First of all, have a look at projects/baseline_normal
. This is a place where you can try different CNN architectures without worrying about the camera intrinsics and rotation estimation. You can try popular architectures like U-Net, and try different backbones. In this folder, you can run:
python train.py ./experiments/exp000_test/test.txt
The project-specific config is defined in `projects/baseline_normal/config.py`. The default config, which is shared across all projects, is in `projects/__init__.py`.
The dataloaders are in `projects/baseline_normal/dataloader.py`. We use the same dataloaders in the `dsine` project, so there is no `projects/dsine/dataloader.py`.
The losses are defined in `projects/baseline_normal/losses.py`. These are building blocks for the custom loss functions in your own project. For example, in the DSINE project, the model produces a list of predictions, and the loss is the weighted sum of the losses computed for each prediction. You can see how this is done in `projects/dsine/losses.py`.
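The weighted-sum idea can be sketched as follows; the weighting scheme and decay factor here are illustrative, not the exact ones in `projects/dsine/losses.py`:

```python
import numpy as np

def weighted_list_loss(pred_list, gt, loss_fn, gamma=0.8):
    """Sketch of a loss over a list of coarse-to-fine predictions: later
    (finer) predictions get higher weight, and the last one gets weight 1."""
    n = len(pred_list)
    total = 0.0
    for i, pred in enumerate(pred_list):
        total += (gamma ** (n - 1 - i)) * loss_fn(pred, gt)
    return total
```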
You can start a new project by copying the folder `projects/dsine` to create `projects/NEW_PROJECT_NAME`. Then, update `config.py` and `losses.py`.
Lastly, you should update `train.py` and `test.py`. For things that should be different across projects, we left a note like the following:
#↓↓↓↓
#NOTE: forward pass
img = data_dict['img'].to(device)
intrins = data_dict['intrins'].to(device)
...
pred_list = model(img, intrins=intrins, mode='test')
norm_out = pred_list[-1]
#↑↑↑↑
Search for the arrows (↓↓↓↓/↑↑↑↑) to see where things should be modified in different projects.
The test commands above (e.g. for getting the benchmark performance and running the real-time demo) apply in the same way to all projects.
If you want to make contributions to this repo, please make a pull request and add instructions in the following format.
Using torch hub to predict normal (contribution by hugoycj, updated by Pierre M.)
import torch
import cv2
import numpy as np
import sys
normal_predictor = torch.hub.load("pierremerriaux-leddartech/DSINE", "DSINE", trust_repo=True, source='github')
image = cv2.imread('projects/dsine/samples/img/office_01.png', cv2.IMREAD_COLOR)
h, w = image.shape[:2]
# Use the model to infer the normal map from the input image
with torch.inference_mode():
normal = normal_predictor.infer_cv2(image)[0] # Output shape: (3, H, W)
normal = (normal + 1) / 2 # Convert values to the range [0, 1]
# Convert the normal map to a displayable format
normal = (normal * 255).cpu().numpy().astype(np.uint8).transpose(1, 2, 0)
normal = cv2.cvtColor(normal, cv2.COLOR_RGB2BGR)
# Save the output normal map to a file
cv2.imwrite('projects/dsine/samples/img/office_01_result.png', normal)
If the network is unavailable for retrieving the weights, you can load local weights with torch hub as shown below:
normal_predictor = torch.hub.load("pierremerriaux-leddartech/DSINE", "DSINE", local_file_path='./checkpoints/dsine.pt', trust_repo=True)
Generating ground truth surface normals
We provide the code used to generate the ground truth surface normals from ground truth depth maps. See this notebook for more information.
About the coordinate system
We use a right-handed coordinate system with (X, Y, Z) = (right, down, front). An important thing to note is that both the ground truth normals and our predictions are the outward normals. For example, in the case of a fronto-parallel wall facing the camera, the normals would be (0, 0, 1), not (0, 0, -1). If you instead need the inward normals, simply negate them: `normals = -normals`.
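For reference, flipping the convention and mapping normals to an RGB visualization can be done as follows. The (n + 1) / 2 color mapping is a common convention, assumed here rather than taken from the repo's own visualization code:

```python
import numpy as np

def to_inward(normals):
    """Flip outward normals (the repo's convention) to inward ones."""
    return -normals

def normal_to_rgb(normals):
    """Map unit normals in [-1, 1] to uint8 RGB in [0, 255] for display."""
    return ((normals + 1.0) * 0.5 * 255.0).astype(np.uint8)
```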
Sharing your model weights
If you wish to share your model weights, please make a pull request providing the corresponding config file and the link to the weights.
Citation
If you find our work useful in your research, please consider citing our paper:
@inproceedings{bae2024dsine,
title = {Rethinking Inductive Biases for Surface Normal Estimation},
author = {Gwangbin Bae and Andrew J. Davison},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2024}
}
If you use the models that also estimate the uncertainty, please also cite the following paper, where we introduced the loss function:
@inproceedings{bae2021eesnu,
title = {Estimating and Exploiting the Aleatoric Uncertainty in Surface Normal Estimation},
author = {Gwangbin Bae and Ignas Budvytis and Roberto Cipolla},
booktitle = {International Conference on Computer Vision (ICCV)},
year = {2021}
}