This repository is the official implementation of the paper "DepthLab: From Partial to Complete".
Zhiheng Liu* · Ka Leong Cheng* · Qiuyu Wang · Shuzhe Wang · Hao Ouyang · Bin Tan · Kai Zhu · Yujun Shen · Qifeng Chen · Ping Luo
We present DepthLab, a robust depth inpainting foundation model that can be applied to a variety of downstream tasks. Many tasks naturally contain partial depth information, such as (1) 3D Gaussian inpainting, (2) LiDAR depth completion, (3) sparse-view reconstruction with DUSt3R, and (4) text-to-scene generation. Our model leverages this known information to improve depth estimation, which in turn improves performance on these downstream tasks. We hope to motivate more related tasks to adopt DepthLab.
- 2024-12-25: Inference code and paper are released.
- [To-do]: Release the training code to facilitate fine-tuning, allowing adaptation to different mask types in your downstream tasks.
Clone the repository (requires git):
git clone https://github.com/Johanan528/DepthLab.git
cd DepthLab
Install with conda:
conda env create -f environment.yaml
conda activate DepthLab
Download the Marigold checkpoint here, the image encoder checkpoint here, and our checkpoints at Hugging Face. The downloaded checkpoint directory has the following structure:
.
`-- checkpoints
    |-- marigold-depth-v1-0
    |-- CLIP-ViT-H-14-laion2B-s32B-b79K
    `-- DepthLab
        |-- denoising_unet.pth
        |-- reference_unet.pth
        `-- mapping_layer.pth
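Before running inference, it can help to confirm that the downloaded files match the tree above. The following is a minimal sketch, not part of the repository: the helper name `missing_checkpoints` and the default root `checkpoints` are assumptions made for illustration, and the check only looks at the entries listed in the tree, not at the contents of each directory.

```python
from pathlib import Path

# Expected layout, copied from the directory tree above.
EXPECTED_DIRS = [
    "marigold-depth-v1-0",
    "CLIP-ViT-H-14-laion2B-s32B-b79K",
    "DepthLab",
]
EXPECTED_FILES = [
    "DepthLab/denoising_unet.pth",
    "DepthLab/reference_unet.pth",
    "DepthLab/mapping_layer.pth",
]


def missing_checkpoints(root="checkpoints"):
    """Return the expected entries that are absent under `root`.

    Hypothetical helper for illustration; it is not provided by DepthLab.
    """
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    absent = missing_checkpoints()
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("Checkpoint directory looks complete.")
```

Run it from the directory that contains `checkpoints`, or pass a different root to `missing_checkpoints`.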
Masks: PNG/JPG or NumPy arrays, where black (0) marks the known regions and white (1) marks the regions to be predicted.
Known depths: NumPy arrays
Images: PNG/JPG
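As an illustration of the mask convention above, here is a small sketch that derives a mask and a known-depth array from a partially observed depth map. It assumes that missing measurements are stored as 0 or NaN, which is common for LiDAR-style data but is not something this README specifies; the function name `make_mask_and_known_depth` is hypothetical.

```python
import numpy as np


def make_mask_and_known_depth(depth, invalid_value=0.0):
    """Build a mask (0 = known, 1 = to be predicted) from a partial depth map.

    Assumption: pixels without a measurement hold `invalid_value` or are
    non-finite (NaN/inf). Hypothetical helper, not part of DepthLab.
    """
    depth = np.asarray(depth, dtype=np.float32)
    unknown = ~np.isfinite(depth) | (depth == invalid_value)
    mask = unknown.astype(np.uint8)
    # Zero out unknown pixels so the saved array contains no NaN values.
    known_depth = np.where(unknown, 0.0, depth).astype(np.float32)
    return mask, known_depth


# Toy 2x3 depth map: zeros and NaN mark missing measurements.
depth = np.array([[1.5, 0.0, 2.0],
                  [np.nan, 3.0, 0.0]])
mask, known_depth = make_mask_and_known_depth(depth)
print(mask)
print(known_depth)
```

Both arrays can then be written to `.npy` files with `np.save`; the file names and locations the inference script expects are not covered by this sketch.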
cd scripts
bash infer.sh
You can find all results in output/in-the-wild_example. Enjoy!
The default settings are optimized for the best result. However, the behavior of the code can be customized:
- --denoise_steps: Number of denoising steps per inference pass. For the original (DDIM) version, 20-50 steps are recommended.
- --processing_res: The processing resolution. When the mask is sparse, as in depth completion, set processing_res to the mask size so that resizing does not degrade the mask.
- --normalize_scale: When the known depth range does not cover the global depth range of the scene, reduce the normalization scale so that the model can better predict the depth of distant objects.
- --strength: When set to 1, the prediction relies entirely on the model. When set to a value below 1, the model is partially guided by the interpolated masked depth.
- --blend: Whether to use Blend Diffusion, a technique commonly used in image inpainting.
- --refine: Turn this option on to refine a DUSt3R depth map, or whenever a full initial depth map is available.
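The advice on the processing resolution for sparse masks can be illustrated with a toy example. The sketch below downsamples a sparse mask by a factor of 2 using nearest-neighbour sampling and counts how many known pixels survive. It is only an illustration: the interpolation actually used by the inference code may differ, and the mask size and pixel positions are made up.

```python
import numpy as np


def nearest_downsample(mask, factor):
    """Nearest-neighbour downsampling that keeps every `factor`-th pixel."""
    return mask[::factor, ::factor]


# 1 = to be predicted, 0 = known; start with everything unknown.
mask = np.ones((8, 8), dtype=np.uint8)
# Four sparse known measurements, e.g. LiDAR hits (toy positions).
known_pixels = [(0, 0), (1, 5), (4, 3), (7, 7)]
for r, c in known_pixels:
    mask[r, c] = 0

small = nearest_downsample(mask, 2)
print("known pixels before:", int((mask == 0).sum()))  # 4
print("known pixels after: ", int((small == 0).sum()))  # 1
```

Only the known pixel at (0, 0) lies on the sampling grid, so three of the four measurements disappear after resizing. Matching the processing resolution to the mask size avoids this kind of loss.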
This project is developed on the codebase of Marigold and MagicAnimate. We appreciate their great work!
Please cite our paper:
@article{liu2024depthlab,
title={DepthLab: From Partial to Complete},
author={Liu, Zhiheng and Cheng, Ka Leong and Wang, Qiuyu and Wang, Shuzhe and Ouyang, Hao and Tan, Bin and Zhu, Kai and Shen, Yujun and Chen, Qifeng and Luo, Ping},
journal={arXiv preprint arXiv:2412.18153},
year={2024}
}