Add pose optimization to Splatfacto #2885
Conversation
Do you have more experiments? I am not sure back-propagating gradients to camera poses would work robustly for splatfacto: training splatfacto is not like training a typical NeRF, as it has non-gradient operations (splitting, culling, and resetting gaussians), and computing gradients right before or after those operations can be very unstable. My concerns aside, I think this is a research area worth exploring, and I recommend experimenting with more datasets. One way to validate the work is to start from poses that are known to be less accurate than SfM poses and check whether pose optimization can bring the quality back to the level of SfM poses. For example, the iPhone provides online pose estimates in ARKit ( https://github.com/apple/ARKitScenes ); one could test whether training directly from ARKit poses with the pose optimizer produces equally good results. Besides qualitative evaluation of rendered videos, one can at least monitor the training loss and see whether camera_opt reduces it by a significant margin. Another quantitative evaluation method is to backpropagate gradients to optimize only the validation cameras using the validation images and then evaluate against standard metrics; this is similar to what was done in the original nerfstudio paper.
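A minimal sketch of that last evaluation idea: freeze the scene and optimize only per-camera pose corrections on the validation set. This is illustrative PyTorch under assumed names; `render_fn` and the 6-DoF correction parameterization are hypothetical stand-ins, not splatfacto's actual API.

```python
import torch

def eval_with_pose_opt(render_fn, val_cameras, val_images, steps=200, lr=1e-4):
    """Optimize only per-camera pose corrections on the validation set,
    keeping all scene parameters frozen, then compute metrics as usual.
    `render_fn(camera, delta)` is a hypothetical differentiable renderer
    that applies a 6-DoF correction `delta` (3 translation + 3 rotation)."""
    deltas = torch.zeros(len(val_cameras), 6, requires_grad=True)
    opt = torch.optim.Adam([deltas], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = sum(
            torch.nn.functional.l1_loss(render_fn(cam, deltas[i]), gt)
            for i, (cam, gt) in enumerate(zip(val_cameras, val_images))
        )
        loss.backward()  # gradients reach only the pose corrections
        opt.step()
    return deltas  # evaluate PSNR/SSIM with these corrections applied
```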
Interesting. Let me check the quality on another dataset.
Hey, is there any documentation on how to test it?
As for your concerns: regarding your suggestion to use poses from Apple's ARKit, the dataset I captured already uses poses acquired from it, since Record3D uses Apple's native AR routines to estimate poses online. I agree with your suggestion that quantitative results are needed, and I will try to find a way to produce some with the tools available. I'm thinking of comparing the Record3D poses, the poses from my splatfacto update, and those from COLMAP. I'll have to find a way to extract the optimized poses.
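One way such a comparison could be run, as a sketch: given two sets of 4x4 camera-to-world matrices already aligned to the same world frame and scale (e.g. via a similarity/Umeyama alignment), compute per-camera rotation and translation deltas. This snippet is illustrative, not code from the PR.

```python
import numpy as np

def pose_errors(poses_a: np.ndarray, poses_b: np.ndarray):
    """Per-camera rotation error (degrees) and translation error (scene
    units) between two sets of 4x4 camera-to-world matrices. Assumes both
    sets are already expressed in the same world frame and scale."""
    rot_err, trans_err = [], []
    for a, b in zip(poses_a, poses_b):
        r_rel = a[:3, :3].T @ b[:3, :3]
        # Rotation angle recovered from the trace of the relative rotation.
        cos_theta = np.clip((np.trace(r_rel) - 1.0) / 2.0, -1.0, 1.0)
        rot_err.append(np.degrees(np.arccos(cos_theta)))
        trans_err.append(np.linalg.norm(a[:3, 3] - b[:3, 3]))
    return np.array(rot_err), np.array(trans_err)
```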
Thanks for the fast response @jh-surh. I got an error this way.
This is the error:
Another attempt:
Yes, it works, but a new error happens when I run
Hey, sorry for the confusion; you have to initialize the submodules as well, i.e. `git submodule update --init --recursive`.
I am afraid this is case by case: the culling and splitting operations happen locally, on the specific gaussians being edited, so they have a limited impact on the overall loss function. But the impact of the alpha-resetting operation is very global; I am interested in how the pose gradient changes right after an alpha reset.
Deep learning doesn't care whether the solution converges to a global minimum, but pose estimation has only one globally optimal solution. That's why most popular pose estimation methods optimize over a bundle of frames together, to reduce the noise of the gradient estimate. For our problem, we essentially assume the initial pose is close enough that stochastic gradients won't move the pose away from the global optimum. This is something that has no guarantees: setting the learning rate too small has no impact on the final results, and setting it too large converges to something unwanted. I found this to be very non-trivial.
Looking forward to your results. Putting my concerns aside, I think we can have this camera-opt option in the main branch as long as we find it useful on some datasets and other people are aware of how to use it properly. Nerfstudio is a research project and should welcome innovations, but I am more comfortable setting this option to false by default.
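A simple way to watch for the instability described above, as a hedged sketch: log the camera-opt gradient norm every step and look for spikes right after the opacity/alpha reset. The parameter handle below is a placeholder for however the branch exposes the camera optimizer's parameters.

```python
import torch

def camera_pose_grad_norm(camera_opt_params) -> float:
    """Global L2 norm of the gradients on the camera pose parameters.
    Log this each training step; a spike right after an alpha reset
    would confirm the instability discussed above."""
    grads = [p.grad.detach().flatten() for p in camera_opt_params if p.grad is not None]
    if not grads:
        return 0.0
    return torch.cat(grads).norm().item()
```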
nerfstudio/models/splatfacto.py

```diff
@@ -157,6 +158,8 @@ class SplatfactoModelConfig(ModelConfig):
     """
     output_depth_during_training: bool = False
     """If True, output depth during training. Otherwise, only output depth during evaluation."""
+    camera_optimizer: CameraOptimizerConfig = field(default_factory=lambda: CameraOptimizerConfig(mode="SO3xR3"))
```
If I set this camera optimizer to off, would it still incur computation overhead from computing the gradients w.r.t. view_mat and proj_mat? If so, how much overhead does it bring?
I will check this. Thank you for the suggestion!
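A sketch of how such an overhead benchmark could be set up, assuming the config field lands as shown in the diff above (illustrative, not from the PR):

```python
from nerfstudio.cameras.camera_optimizers import CameraOptimizerConfig
from nerfstudio.models.splatfacto import SplatfactoModelConfig

# Baseline for the overhead benchmark: pose optimization switched off.
config_off = SplatfactoModelConfig(camera_optimizer=CameraOptimizerConfig(mode="off"))
# Treatment: the default proposed in this PR.
config_on = SplatfactoModelConfig(camera_optimizer=CameraOptimizerConfig(mode="SO3xR3"))
# Compare per-iteration wall-clock time and memory between the two runs.
```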
@ichsan2895, sorry for the confusion; I had some changes that were not yet updated in nerfstudio-project/gsplat#123.
Sounds good to me! I will try to get the results ASAP.
Unfortunately, the result is not good. @jh-surh @jb-ye My suggestion:
Thanks for the effort on this! @jh-surh, regarding "I think the next thing to add to gsplat to address this issue would be multi-image splatting, although I question its feasibility": you can already roughly simulate this behavior with gradient accumulation (I believe this is already on for camera optimization; see here). You could try adding it to the gaussian parameter groups too, to accumulate positional gradients from multiple cameras; see the sketch below. I experimented with this behavior early on and found it helped with floaters at the cost of overall quality, but a lot has changed since then, so it's worth experimenting with again.
As for testing, I agree with @jb-ye's suggestions. The two main things to test are 1) performance on non-COLMAP datasets like ARKit or Polycam, where pose optimization should help a great deal, and 2) performance on COLMAP datasets, where pose optimization could actually hurt, but it's important to quantify how much. Learning-rate scheduling and regularizers on each camera's pose deviation should hopefully bring the quality drop in 2) down. If you want to be fancy, you can also try adding coarse-to-fine optimization by blurring images early in training and slowly re-adding high-frequency information (similar in spirit to BARF).
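A minimal sketch of the gradient-accumulation idea in generic PyTorch; `model(camera)` and the batch structure are placeholders, not the actual splatfacto training loop.

```python
import torch

ACCUM_STEPS = 4  # number of cameras averaged into one parameter update

def accumulated_step(model, optimizer, samples):
    """Sum gradients from several cameras before a single optimizer step,
    roughly emulating a multi-image update for the gaussian parameters."""
    optimizer.zero_grad()
    for camera, gt_image in samples[:ACCUM_STEPS]:
        loss = torch.nn.functional.l1_loss(model(camera), gt_image) / ACCUM_STEPS
        loss.backward()  # gradients accumulate in .grad across iterations
    optimizer.step()     # one update from the averaged gradient
```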
Oof, I'll try to tweak some of the hyperparameters. Thank you for your experiments and suggestions! I will try them and see how things work out.
We (Spectacular AI) have also been experimenting with pose optimization in Splatfacto/gsplat and could add our contributions here (I created a new PR, #2891, for them, since that was technically simplest at this point).
To recap what has already been said in the thread: pose optimization is not required, and can hurt, on still-image datasets successfully optimized with COLMAP (or on synthetic data). This is what most people in academia benchmark with, which makes it seem like the primary use case. However, if the image data has been collected with a moving camera, the situation is very different. COLMAP does not give perfect results there, and the alternatives, which include proprietary systems like ARKit/Record3D, PolyCam, and our SDK & tools, do not necessarily produce pixel-perfect SfM results for various reasons, or even aim to do so. For this latter use case, some level of pose optimization is very useful.
I fully agree with #2885 (comment). It's unclear how, in theory, things like gradient accumulation should exactly work with the alpha reset; however, this approach seems to work in practice nevertheless. Main additions:
#2891 is merged, closing this PR. |
Requires nerfstudio-project/gsplat#123
Installation requirements: