Official PyTorch implementation of "The Devil is in the Details: Boosting Guided Depth Super-Resolution via Rethinking Cross-Modal Alignment and Aggregation"

Boosting Guided Depth Super-Resolution Through Large Depth Estimation Model and Alignment-then-Fusion Strategy

Yuan-lin Zhang#, Xin-Ni Jiang#, Chun-Le Guo*, Xiong-Xin Tang*, Guo-Qing Wang, Wei Li, Xun Liu, Chong-Yi Li

[Paper] [Project Page]

[Figure: model overview]

Guided Depth Super-Resolution (GDSR) presents two primary challenges: the resolution gap between Low-Resolution (LR) depth maps and High-Resolution (HR) RGB images, and the modality gap between depth and RGB data. In this study, we leverage the powerful zero-shot capabilities of large pre-trained monocular depth estimation models to address these issues. Specifically, we utilize the output of monocular depth estimation as pseudo-depth to mitigate both gaps. The pseudo-depth map is aligned with the resolution of the RGB image, offering more detailed boundary information than the LR depth map, particularly at larger scales. Furthermore, pseudo-depth provides valuable relative positional information about objects, serving as a critical scene prior that enhances edge alignment and reduces texture over-transfer. However, effectively bridging the cross-modal differences between the guidance inputs (RGB and pseudo-depth) and LR depth remains a significant challenge. To tackle this, we analyze the modality gap from three key perspectives: distribution misalignment, geometrical misalignment, and texture inconsistency. Based on these insights, we propose an alignment-then-fusion strategy, introducing a novel and efficient Dynamic Dual-Aligned and Aggregation Network (D2A2). By leveraging large pre-trained monocular depth estimation models, our approach achieves state-of-the-art performance on multiple benchmark datasets, excelling particularly in the challenging ×16 GDSR task.
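As a purely conceptual illustration of the alignment-then-fusion idea (this sketch is not the D2A2 implementation; the offset-based warping and module names are assumptions made for exposition), guidance features could be aligned to the depth features before aggregation roughly as follows:

# Toy sketch of "align then fuse": predict per-pixel offsets from the concatenated
# features, resample the guidance features accordingly, then aggregate.
# Illustrative only; it does not reproduce the D2A2 modules.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AlignThenFuse(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.offset = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)  # (dx, dy) per pixel
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, depth_feat, guide_feat):
        b, _, h, w = depth_feat.shape
        flow = self.offset(torch.cat([depth_feat, guide_feat], dim=1))  # B x 2 x H x W

        # Build a normalized sampling grid shifted by the predicted offsets.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=depth_feat.device),
            torch.linspace(-1, 1, w, device=depth_feat.device),
            indexing="ij",
        )
        base = torch.stack((xs, ys), dim=-1).expand(b, -1, -1, -1)  # B x H x W x 2
        grid = base + flow.permute(0, 2, 3, 1)
        aligned = F.grid_sample(guide_feat, grid, align_corners=True)

        return self.fuse(torch.cat([depth_feat, aligned], dim=1))

The snippet only conveys the ordering of the two stages (align, then aggregate); the actual network relies on its own modules (note that the repository builds Deformable Convolution V2 during setup).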

Setup

Dependencies

The conda environment with all required dependencies can be created by running:

conda env create -f environment.yml
conda activate GDSR-D2A2
cd models/Deformable_Convolution_V2
sh make.sh
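
A quick, optional sanity check after building (the extension module name below is a guess; check models/Deformable_Convolution_V2 for the actual package name):

# Verify that PyTorch sees a GPU and that the compiled DCNv2 extension imports.
import torch

print("PyTorch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

try:
    # Hypothetical import path for the compiled extension; adjust it to the
    # package actually produced by make.sh in models/Deformable_Convolution_V2.
    from dcn_v2 import DCN  # noqa: F401
    print("Deformable Convolution V2 extension imported successfully.")
except ImportError as err:
    print("DCNv2 extension not found; re-run make.sh:", err)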

Datasets

The NYUv2 dataset can be downloaded here. Your folder structure should look like this:

NYUv2
├── Depth
│   ├── 0.npy
│   ├── 1.npy
│   ├── 2.npy
│   ├── ...
│   └── 1448.npy
├── RGB
│   ├── 0.jpg
│   ├── 1.jpg
│   ├── 2.jpg
│   ├── ...
│   └── 1448.jpg
└── MDE_relative
    ├── 0.png
    ├── 1.png
    ├── 2.png
    ├── ...
    └── 1448.png
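
For reference, a minimal PyTorch Dataset matching this layout might look like the sketch below (the class and the bicubic downsampling used to produce the LR input are illustrative assumptions, not the loader shipped with this repository):

# Sketch: pair LR depth, HR RGB, and pseudo-depth following the folder layout above.
import os
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image
from torch.utils.data import Dataset


class NYUv2Triplets(Dataset):
    def __init__(self, root, scale=16):
        self.root = root
        self.scale = scale
        self.ids = sorted(
            int(f.split(".")[0])
            for f in os.listdir(os.path.join(root, "Depth"))
            if f.endswith(".npy")
        )

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, i):
        idx = self.ids[i]
        depth = np.load(os.path.join(self.root, "Depth", f"{idx}.npy")).astype(np.float32)
        rgb = np.asarray(Image.open(os.path.join(self.root, "RGB", f"{idx}.jpg")), dtype=np.float32) / 255.0
        pseudo = np.asarray(Image.open(os.path.join(self.root, "MDE_relative", f"{idx}.png")), dtype=np.float32)

        depth = torch.from_numpy(depth)[None]         # 1 x H x W
        rgb = torch.from_numpy(rgb).permute(2, 0, 1)  # 3 x H x W
        pseudo = torch.from_numpy(pseudo)[None]       # 1 x H x W

        # LR input obtained by bicubic downsampling of the HR depth (a common GDSR protocol;
        # the actual training pipeline may differ).
        lr = F.interpolate(
            depth[None], scale_factor=1.0 / self.scale, mode="bicubic", align_corners=False
        )[0]
        return lr, rgb, pseudo, depth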

The Lu, Middlebury, and RGBDD datasets are used only for testing and can be downloaded here.

The pseudo labels are obtained from Depth-Anything-V2 using the Depth-Anything-V2-Large checkpoint. Alternatively, you can directly download the monocular depth estimation results from [Google Drive], [Baidu Cloud].
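
If you prefer to generate the pseudo-depth maps yourself, a rough sketch following the Depth-Anything-V2 repository's interface could look like this (the checkpoint path, output folder, and 16-bit PNG normalization are assumptions chosen to match the MDE_relative layout above):

# Sketch: export relative-depth pseudo labels with Depth-Anything-V2 (Large).
import os
import cv2
import numpy as np
import torch
from depth_anything_v2.dpt import DepthAnythingV2

device = "cuda" if torch.cuda.is_available() else "cpu"
# Config follows the 'vitl' setting documented in the Depth-Anything-V2 repo.
model = DepthAnythingV2(encoder="vitl", features=256, out_channels=[256, 512, 1024, 1024])
model.load_state_dict(torch.load("checkpoints/depth_anything_v2_vitl.pth", map_location="cpu"))
model = model.to(device).eval()

rgb_dir, out_dir = "NYUv2/RGB", "NYUv2/MDE_relative"
os.makedirs(out_dir, exist_ok=True)
for name in sorted(os.listdir(rgb_dir)):
    if not name.endswith(".jpg"):
        continue
    img = cv2.imread(os.path.join(rgb_dir, name))
    depth = model.infer_image(img)  # H x W float32 relative depth
    # Normalize to [0, 1] and store as 16-bit PNG (an assumed output format).
    depth = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    out = (depth * 65535).astype(np.uint16)
    cv2.imwrite(os.path.join(out_dir, os.path.splitext(name)[0] + ".png"), out)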

Pretrained Model

Download the pretrained models from [Google Drive] or [Baidu Cloud] and put them in the 'pretrained' folder.

Training

Please modify '--scale' and '--dataset_dir' in 'option.py'.

python train_d2a2.py  

Testing

  1. Modify '--scale' and '--dataset_dir' in 'option.py'.
  2. To resume from a checkpoint file, use the '--net_path' argument in 'option.py' to specify the checkpoint.
  3. Try D2A2 on your images!
    python test_d2a2.py
  4. Check your results in result/testresult/D2A2-dataset-******/! (A quick RMSE sanity check is sketched below.)
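
A minimal RMSE check on a saved result (the file names are placeholders; adapt them to the outputs actually written under result/testresult/):

# Sketch: RMSE between a super-resolved depth map and its ground truth.
import numpy as np

pred = np.load("result/testresult/pred_0.npy").astype(np.float64)  # hypothetical file name
gt = np.load("NYUv2/Depth/0.npy").astype(np.float64)

rmse = np.sqrt(np.mean((pred - gt) ** 2))
print(f"RMSE: {rmse:.4f}")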

Test results

Results on all test datasets are also available: [Google Drive] [Baidu Cloud]

Acknowledgements

We thank the following repository for sharing its code: Depth-Anything-V2.

Citation

