Add pose optimization to Splatfacto #2885
Conversation
Do you have more experiments? I am not sure back-propagating gradients to camera poses would work robustly for splatfacto: training splatfacto is not like training a typical NeRF, as it has non-gradient operations (splitting, culling, and resetting gaussians), and computing gradients right before or after those operations can be very unstable. My concerns aside, I think this is a research area worth exploring, and I recommend experimenting with more datasets. One way to validate the work is to start from poses that are known to be less accurate than SfM poses and check whether pose optimization can bring the quality back to the level of SfM poses. For example, the iPhone provides online pose estimates in ARKit ( https://github.com/apple/ARKitScenes ); one could test whether training directly from ARKit poses with the pose optimizer produces equally good results. Besides qualitative evaluation of rendered videos, one can at least monitor the training loss and see whether camera_opt reduces it by a significant margin. Another quantitative evaluation method is to backpropagate gradients to optimize only the validation cameras using the validation images and then evaluate against standard metrics; this is similar to what was done in the original nerfstudio paper.
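A minimal sketch of that last evaluation idea: freeze the scene and optimize only per-camera pose corrections on the validation set. This is illustrative PyTorch under assumed names; `render_fn` and the 6-DoF correction parameterization are hypothetical stand-ins, not splatfacto's actual API.

```python
import torch

def eval_with_pose_opt(render_fn, val_cameras, val_images, steps=200, lr=1e-4):
    """Optimize only per-camera pose corrections on the validation set,
    keeping all scene parameters frozen, then compute metrics as usual.
    `render_fn(camera, delta)` is a hypothetical differentiable renderer
    that applies a 6-DoF correction `delta` (3 translation + 3 rotation)."""
    deltas = torch.zeros(len(val_cameras), 6, requires_grad=True)
    opt = torch.optim.Adam([deltas], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = sum(
            torch.nn.functional.l1_loss(render_fn(cam, deltas[i]), gt)
            for i, (cam, gt) in enumerate(zip(val_cameras, val_images))
        )
        loss.backward()  # gradients reach only the pose corrections
        opt.step()
    return deltas  # evaluate PSNR/SSIM with these corrections applied
```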
Interesting. Let me check the quality on another dataset.
Hey, is there any documentation on how to test it?
As for your concerns: regarding your suggestion to use poses from Apple's ARKit, the dataset I captured already uses poses acquired from it, since Record3D uses Apple's native AR routines to estimate poses online. I agree with your suggestion that quantitative results are needed, and I will try to find a way to produce some with the tools available. I'm thinking of comparing the Record3D poses, the poses from my splatfacto update, and those from COLMAP. I'll have to find a way to extract the optimized poses.
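One way such a comparison could be run, as a sketch: given two sets of 4x4 camera-to-world matrices already aligned to the same world frame and scale (e.g. via a similarity/Umeyama alignment), compute per-camera rotation and translation deltas. This snippet is illustrative, not code from the PR.

```python
import numpy as np

def pose_errors(poses_a: np.ndarray, poses_b: np.ndarray):
    """Per-camera rotation error (degrees) and translation error (scene
    units) between two sets of 4x4 camera-to-world matrices. Assumes both
    sets are already expressed in the same world frame and scale."""
    rot_err, trans_err = [], []
    for a, b in zip(poses_a, poses_b):
        r_rel = a[:3, :3].T @ b[:3, :3]
        # Rotation angle recovered from the trace of the relative rotation.
        cos_theta = np.clip((np.trace(r_rel) - 1.0) / 2.0, -1.0, 1.0)
        rot_err.append(np.degrees(np.arccos(cos_theta)))
        trans_err.append(np.linalg.norm(a[:3, 3] - b[:3, 3]))
    return np.array(rot_err), np.array(trans_err)
```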
Thanks for the fast response @jh-surh. I got an error this way.
This is the error:
Another attempt:
Yes, it works, but a new error happens when I run
Hey, sorry for the confusion; you have to initialize the submodules as well, i.e. `git submodule update --init --recursive`.
I am afraid this is case by case: the culling and splitting operations happen locally, on the specific gaussians being edited, so they have a limited impact on the overall loss function. But the impact of the alpha-resetting operation is very global; I am interested in how the pose gradient changes right after an alpha reset.
Deep learning doesn't care whether the solution converges to a global minimum, but pose estimation has only one globally optimal solution. That's why most popular pose estimation methods optimize over a bundle of frames together, to reduce the noise of the gradient estimate. For our problem, we essentially assume the initial pose is close enough that stochastic gradients won't move the pose away from the global optimum. This is something that has no guarantees: setting the learning rate too small has no impact on the final results, and setting it too large converges to something unwanted. I found this to be very non-trivial.
Looking forward to your results. Putting my concerns aside, I think we can have this camera-opt option in the main branch as long as we find it useful on some datasets and other people are aware of how to use it properly. Nerfstudio is a research project and should welcome innovations, but I am more comfortable setting this option to false by default.
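A simple way to watch for the instability described above, as a hedged sketch: log the camera-opt gradient norm every step and look for spikes right after the opacity/alpha reset. The parameter handle below is a placeholder for however the branch exposes the camera optimizer's parameters.

```python
import torch

def camera_pose_grad_norm(camera_opt_params) -> float:
    """Global L2 norm of the gradients on the camera pose parameters.
    Log this each training step; a spike right after an alpha reset
    would confirm the instability discussed above."""
    grads = [p.grad.detach().flatten() for p in camera_opt_params if p.grad is not None]
    if not grads:
        return 0.0
    return torch.cat(grads).norm().item()
```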
nerfstudio/models/splatfacto.py

```diff
@@ -157,6 +158,8 @@ class SplatfactoModelConfig(ModelConfig):
     """
     output_depth_during_training: bool = False
     """If True, output depth during training. Otherwise, only output depth during evaluation."""
+    camera_optimizer: CameraOptimizerConfig = field(default_factory=lambda: CameraOptimizerConfig(mode="SO3xR3"))
```
If I set this camera optimizer to off, would it still incur computation overhead from computing the gradients w.r.t. view_mat and proj_mat? If so, how much overhead does it bring?
I will check this. Thank you for the suggestion!
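A sketch of how such an overhead benchmark could be set up, assuming the config field lands as shown in the diff above (illustrative, not from the PR):

```python
from nerfstudio.cameras.camera_optimizers import CameraOptimizerConfig
from nerfstudio.models.splatfacto import SplatfactoModelConfig

# Baseline for the overhead benchmark: pose optimization switched off.
config_off = SplatfactoModelConfig(camera_optimizer=CameraOptimizerConfig(mode="off"))
# Treatment: the default proposed in this PR.
config_on = SplatfactoModelConfig(camera_optimizer=CameraOptimizerConfig(mode="SO3xR3"))
# Compare per-iteration wall-clock time and memory between the two runs.
```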
@ichsan2895, sorry for the confusion; I had some changes that were not yet updated in nerfstudio-project/gsplat#123.
Sounds good to me! I will try to get the results ASAP.
Unfortunately, the result is not good. @jh-surh @jb-ye My suggestion:
Thanks for the effort on this! @jh-surh, regarding "I think the next thing to add to gsplat to address this issue would be multi-image splatting, although I question its feasibility": you can already roughly simulate this behavior with gradient accumulation (I believe this is already on for camera optimization; see here). You could try adding it to the gaussian parameter groups too, to accumulate positional gradients from multiple cameras; see the sketch below. I experimented with this behavior early on and found it helped with floaters at the cost of overall quality, but a lot has changed since then, so it's worth experimenting with again.
As for testing, I agree with @jb-ye's suggestions. The two main things to test are 1) performance on non-COLMAP datasets like ARKit or Polycam, where pose optimization should help a great deal, and 2) performance on COLMAP datasets, where pose optimization could actually hurt, but it's important to quantify how much. Learning-rate scheduling and regularizers on each camera's pose deviation should hopefully bring the quality drop in 2) down. If you want to be fancy, you can also try adding coarse-to-fine optimization by blurring images early in training and slowly re-adding high-frequency information (similar in spirit to BARF).
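A minimal sketch of the gradient-accumulation idea in generic PyTorch; `model(camera)` and the batch structure are placeholders, not the actual splatfacto training loop.

```python
import torch

ACCUM_STEPS = 4  # number of cameras averaged into one parameter update

def accumulated_step(model, optimizer, samples):
    """Sum gradients from several cameras before a single optimizer step,
    roughly emulating a multi-image update for the gaussian parameters."""
    optimizer.zero_grad()
    for camera, gt_image in samples[:ACCUM_STEPS]:
        loss = torch.nn.functional.l1_loss(model(camera), gt_image) / ACCUM_STEPS
        loss.backward()  # gradients accumulate in .grad across iterations
    optimizer.step()     # one update from the averaged gradient
```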
Oof, I'll try to tweak some of the hyperparameters. Thank you for your experiments and suggestions! I will try them and see how things work out.
We (Spectacular AI) have also been experimenting with pose optimization in Splatfacto/gsplat and could add our contributions here (I created a new PR, #2891, for them, since that was technically simplest at this point).
To recap what has already been said in the thread: pose optimization is not required, and can hurt, on still-image datasets successfully optimized with COLMAP (or on synthetic data). This is what most people in academia benchmark with, which makes it seem like the primary use case. However, if the image data has been collected with a moving camera, the situation is very different. COLMAP does not give perfect results there, and the alternatives, which include proprietary systems like ARKit/Record3D, PolyCam, and our SDK & tools, do not necessarily produce pixel-perfect SfM results for various reasons, or even aim to do so. For this latter use case, some level of pose optimization is very useful.
I fully agree with #2885 (comment). It's unclear how, in theory, things like gradient accumulation should exactly work with the alpha reset; however, this approach seems to work in practice nevertheless. Main additions:
#2891 is merged, closing this PR. |
Requires nerfstudio-project/gsplat#123
Installation requirements: