
common interface of monodepth estimations #7

Open
hardikdava opened this issue Apr 11, 2024 · 2 comments

@hardikdava

Great work! I see there are many interfaces that compute monocular depths from trained models. It would be good to have a single interface for monocular depth estimation. Let me know if I can help with anything; I am willing to contribute to the project.

@maturk
Owner

maturk commented Apr 11, 2024

Hi @hardikdava, you are correct, there is some legacy code that can be misleading. Let me try to explain the main code here.

python dn_splatter/scripts/depth_from_pretrain.py is used to obtain only monocular depth estimates (in this case from the ZoeDepth model). It takes a directory of images or a transforms.json file and saves the resulting monocular depth estimates as .npy files in a folder called dataset_root/mono_depth, named after the original RGB images, e.g. mono_depth/rgb_img_name.npy.
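
If it helps, here is a minimal sketch of how those outputs could be read back, assuming the layout described above (the frame name below is hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical file name, following dataset_root/mono_depth/<rgb_img_name>.npy
depth = np.load("dataset_root/mono_depth/frame_00001.npy")  # (H, W) relative depth

plt.imshow(depth, cmap="turbo")
plt.colorbar(label="monocular depth (relative units)")
plt.show()
```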

On the other hand, python dn_splatter/scripts/depth_align.py does something a bit more advanced. It generates monocular depth estimates (as above) and also SfM sparse depth maps from a COLMAP database file. In addition, the script combines the monocular depth estimates and the SfM sparse depth maps by solving for a scale alignment parameter per frame. These are saved in dataset_root/mono_depth with an _aligned.npy suffix to distinguish them from the original monocular depth estimates. This is a similar idea to the work https://arxiv.org/abs/2311.13398, but instead of gradient descent we use the closed-form linear solution to the least-squares alignment problem (although both methods are supported). I am not entirely sure if the method in the above paper and my implementation are exactly the same; however, I suspect they should be similar. This is explained in Sec 4.1 of my arXiv submission and also in the above work. Note that monocular depth supervision does not perform on par with real sensor depth supervision: it can help with sparse input views, but its impact on dense indoor captures is not comparable. This is a limitation of this work.
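
For intuition, here is a minimal sketch of the closed-form per-frame scale alignment (not the exact code from depth_align.py, just the idea: fit a single scale by least squares over the pixels where SfM depth is available):

```python
import numpy as np

def align_mono_to_sfm(mono_depth: np.ndarray, sfm_depth: np.ndarray) -> np.ndarray:
    """Find the scale s minimizing || s * mono - sfm ||^2 over valid sparse pixels.

    mono_depth: (H, W) monocular depth estimate (relative scale)
    sfm_depth:  (H, W) sparse SfM depth map, 0 where no COLMAP point projects
    Returns the aligned depth map s * mono_depth.
    """
    mask = sfm_depth > 0
    x = mono_depth[mask]
    y = sfm_depth[mask]
    # Closed-form least-squares solution for a single scale factor
    s = np.dot(x, y) / np.dot(x, x)
    return s * mono_depth
```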

Let me know if you have any questions. I totally agree, there are quite a few interfaces as well as some legacy code that make readability and understanding difficult. I hope to extend and improve the codebase.

@hardikdava
Author

@maturk Thanks for the clarification. Now things are much clearer.
