Boosting Guided Depth Super-Resolution Through Large Depth Estimation Model and Alignment-then-Fusion Strategy
Yuan-lin Zhang#, Xin-Ni Jiang#, Chun-Le Guo*, Xiong-Xin Tang*, Guo-Qing Wang, Wei Li, Xun Liu, Chong-Yi Li
[Paper] [Project Page]
Guided Depth Super-Resolution (GDSR) presents two primary challenges: the resolution gap between Low-Resolution (LR) depth maps and High-Resolution (HR) RGB images, and the modality gap between depth and RGB data. In this study, we leverage the powerful zero-shot capabilities of large pre-trained monocular depth estimation models to address these issues. Specifically, we utilize the output of monocular depth estimation as pseudo-depth to mitigate both gaps. The pseudo-depth map is aligned with the resolution of the RGB image, offering more detailed boundary information than the LR depth map, particularly at larger scales. Furthermore, pseudo-depth provides valuable relative positional information about objects, serving as a critical scene prior that enhances edge alignment and reduces texture over-transfer. However, effectively bridging the cross-modal differences between the guidance inputs (RGB and pseudo-depth) and LR depth remains a significant challenge. To tackle this, we analyze the modality gap from three key perspectives: distribution misalignment, geometrical misalignment, and texture inconsistency. Based on these insights, we propose an alignment-then-fusion strategy, introducing a novel and efficient Dynamic Dual-Aligned and Aggregation Network (D2A2). By leveraging large pre-trained monocular depth estimation models, our approach achieves state-of-the-art performance on multiple benchmark datasets, excelling particularly in the challenging ×16 GDSR task.
The conda environment with all required dependencies can be generated by running:

```shell
conda env create -f environment.yml
conda activate GDSR-D2A2
cd models/Deformable_Convolution_V2
sh make.sh
```
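Building Deformable Convolution V2 compiles CUDA kernels against your local PyTorch installation, so version mismatches show up at this step. A quick check such as the following (a minimal sketch, not part of the repo) can catch them before running `make.sh`:

```python
# Quick environment sanity check (a minimal sketch, not part of this repo):
# the DCNv2 build needs a CUDA-enabled PyTorch that matches your local toolkit.
import torch

print("PyTorch:", torch.__version__)                   # version the extension compiles against
print("CUDA available:", torch.cuda.is_available())    # should be True for the CUDA kernels
print("CUDA toolkit (PyTorch build):", torch.version.cuda)
```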
The NYUv2 dataset can be downloaded here. Your folder structure should look like this:
```
NYUv2
└───Depth
│   │   0.npy
│   │   1.npy
│   │   2.npy
│   │   ...
│   │   1448.npy
└───RGB
│   │   0.jpg
│   │   1.jpg
│   │   2.jpg
│   │   ...
│   │   1448.jpg
└───MDE_relative
│   │   0.png
│   │   1.png
│   │   2.png
│   │   ...
│   │   1448.png
```
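For reference, the sketch below shows how one training triplet in the layout above could be loaded. This is not the repo's dataloader; the sample index, scale, and bicubic LR-depth degradation are assumptions, and the actual preprocessing is defined by the repo's code.

```python
# Minimal loading sketch for one sample in the layout above (not the repo's
# dataloader); the index, scale, and bicubic LR degradation are assumptions.
import numpy as np
from PIL import Image

root, idx, scale = "NYUv2", 0, 16

depth_hr = np.load(f"{root}/Depth/{idx}.npy").astype(np.float32)         # HR ground-truth depth
rgb      = np.asarray(Image.open(f"{root}/RGB/{idx}.jpg"), np.float32)   # HR RGB guidance
pseudo   = np.asarray(Image.open(f"{root}/MDE_relative/{idx}.png"),
                      np.float32)                                        # pseudo-depth from the MDE model

# Simulate the LR depth input by bicubic downsampling the HR depth (assumption).
h, w = depth_hr.shape
depth_lr = np.asarray(
    Image.fromarray(depth_hr, mode="F").resize((w // scale, h // scale), Image.BICUBIC)
)
print(depth_hr.shape, rgb.shape, pseudo.shape, depth_lr.shape)
```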
The Lu, Middlebury, and RGBDD datasets are used only for testing and can be downloaded here.
The pseudo labels are obtained from Depth-Anything-V2 using the Depth-Anything-V2-Large checkpoint, or you can directly download the monocular depth estimation results from [Google Drive] or [Baidu Cloud].
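If you prefer to regenerate the pseudo-depth yourself, a sketch adapted from the inference example in the Depth-Anything-V2 repository is shown below; the checkpoint path, output location, and 16-bit PNG normalization are assumptions and may differ from the released MDE_relative files.

```python
# Sketch of generating a relative pseudo-depth map with Depth-Anything-V2,
# adapted from that repo's inference example; paths and the 16-bit PNG
# normalization below are assumptions, not necessarily the released format.
import cv2
import torch
from depth_anything_v2.dpt import DepthAnythingV2  # from the Depth-Anything-V2 repo

device = "cuda" if torch.cuda.is_available() else "cpu"
model = DepthAnythingV2(encoder="vitl", features=256,
                        out_channels=[256, 512, 1024, 1024])
model.load_state_dict(torch.load("checkpoints/depth_anything_v2_vitl.pth",
                                 map_location="cpu"))
model = model.to(device).eval()

rgb = cv2.imread("NYUv2/RGB/0.jpg")      # HxWx3 BGR image
depth = model.infer_image(rgb)           # HxW relative depth map, float32

# Normalize to [0, 1] and store as a 16-bit PNG in MDE_relative (assumed convention).
depth = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
cv2.imwrite("NYUv2/MDE_relative/0.png", (depth * 65535).astype("uint16"))
```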
Download the pretrained models from [Google Drive] or [Baidu Cloud] and put them in the `pretrained` folder.
Please modify `--scale` and `--dataset_dir` in `option.py`, then run:

```shell
python train_d2a2.py
```
- Please modify `--scale` and `--dataset_dir` in `option.py`.
- To resume from a checkpoint file, simply use the `--net_path` argument in `option.py` to specify the checkpoint.
- Try D2A2 on your images!

```shell
python test_d2a2.py
```

- Check your results in `result/testresult/D2A2-dataset-******/`!
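For a quick numerical sanity check, something like the hedged sketch below compares one saved prediction against its ground truth with RMSE, the metric commonly reported for GDSR on NYUv2; the prediction path and file format are hypothetical, so adapt them to what `test_d2a2.py` actually writes.

```python
# Hedged sanity-check sketch: RMSE between one prediction and its ground truth.
# The prediction path/format below is hypothetical; use your run's output folder.
import numpy as np
from PIL import Image

gt_path   = "NYUv2/Depth/0.npy"                          # ground-truth depth
pred_path = "result/testresult/D2A2-dataset-XXXX/0.png"  # hypothetical path and format

gt   = np.load(gt_path).astype(np.float32)
pred = np.asarray(Image.open(pred_path), np.float32)     # must be in the same units as gt

rmse = np.sqrt(np.mean((pred - gt) ** 2))
print(f"RMSE: {rmse:.4f}")
```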
Results on all test datasets are also available. [Google Drive] [Baidu Cloud]
We thank these repos for sharing their code: Depth-Anything-V2.