
Render nearest training view #2384

Merged · 17 commits · Nov 13, 2023
Conversation

AdamRashid96
Contributor

  • Added logic to render the nearest training view to each camera in the rendered path.
  • The nearest training image is picked through a weighted sum of positional and rotational distance. Cameras that are occluded from the current rendered view are not selected unless no training camera is unoccluded (for cases where the rendered view is inside an object).
  • Not sure what's the best way to inform people about this. Right now you can select this by adding "nearest_camera" to rendered_output_names.
test1.mp4
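The selection criterion described above (a weighted sum of positional and rotational distance) can be sketched roughly as below. The helper name, the weighting, and the exact distance terms are illustrative assumptions, not the PR's actual code; scipy's `Rotation` is used since the diff imports it:

```python
import numpy as np
from scipy.spatial.transform import Rotation


def nearest_training_view(render_c2w, train_c2ws, rot_weight=0.5):
    """Pick the index of the training camera closest to the render camera.

    Distance is positional distance plus a weighted rotation angle
    (the weighting here is a placeholder; the PR's weights may differ).
    Cameras are given as 4x4 camera-to-world matrices.
    """
    best_idx, best_dist = None, None
    render_rot = Rotation.from_matrix(render_c2w[:3, :3])
    for i, c2w in enumerate(train_c2ws):
        pos_dist = np.linalg.norm(render_c2w[:3, 3] - c2w[:3, 3])
        # Rotation angle between the two camera orientations, in radians.
        rel = render_rot.inv() * Rotation.from_matrix(c2w[:3, :3])
        rot_dist = rel.magnitude()
        dist = pos_dist + rot_weight * rot_dist
        if best_dist is None or dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx
```

The occlusion filtering discussed below would sit inside this loop, skipping candidates whose line of sight to the render camera is blocked.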

@ethanweber ethanweber (Collaborator) left a comment
Left a few comments. I think we could Tweet about this feature at some point, as the way to let people know about it.

@@ -47,11 +47,19 @@
from torch import Tensor
from typing_extensions import Annotated

from scipy.spatial.transform import Rotation as R
Collaborator
Personally I'm not a fan of importing it as `R`, in case someone uses that name elsewhere in the file. What do you think?

Contributor Author
Agreed, I got rid of the `as R`.

true_max_dist, true_max_idx = None, None
for i in range(len(train_cameras)):
train_cam_pos = train_cameras[i].camera_to_worlds[:, 3].cpu()
# Make sure the line of sight from rendered cam to training cam is not blocked by any object
Collaborator
Is this necessary? I think it would be nice if the result is deterministic, regardless of how the NeRF is. Have you found it to be needed in some cases?

Contributor Author
I added a `--check-occlusions` flag so you can toggle whether or not you want the check.
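For illustration, the line-of-sight check being discussed might look something like this sketch, which marches sample points between the two camera positions and accumulates density. `density_fn` and the threshold criterion are stand-ins for however the PR actually queries the trained NeRF:

```python
import torch


def line_of_sight_clear(density_fn, p0, p1, num_samples=64, threshold=1.0):
    """Rough occlusion test between two camera positions.

    `density_fn` is a hypothetical callable mapping (N, 3) points to per-point
    densities. The segment is considered blocked if density integrated along
    it exceeds a threshold (a stand-in for the PR's actual criterion).
    """
    t = torch.linspace(0.0, 1.0, num_samples).unsqueeze(-1)  # (N, 1)
    points = p0 * (1.0 - t) + p1 * t                         # (N, 3) lerp
    densities = density_fn(points)                           # (N,)
    seg_len = torch.norm(p1 - p0) / (num_samples - 1)
    return bool((densities * seg_len).sum() < threshold)
```

With a flag like `--check-occlusions`, the caller simply skips this test when the flag is off, making the selection deterministic regardless of the NeRF's geometry.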

max_dist = dist
max_idx = i

if max_idx is None:
Collaborator
I think this will never be reached, because of the line above: `if max_dist is None or dist < max_dist:`

Contributor Author
Actually this will occur when every camera is occluded, because of the `continue` in the if statement above.
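The control flow being debated, with the `continue` and the all-occluded fallback, can be sketched as follows (an illustrative helper, not the PR's code; the `max_*` names mirror the PR even though they track minima):

```python
def pick_camera(dists, occluded):
    """Selection with occlusion fallback.

    Tracks the best unoccluded camera (max_idx) and, separately, the best
    camera overall (true_max_idx). If every camera is occluded, e.g. when the
    render view sits inside an object, max_idx stays None and the
    unconditional best is returned instead.
    """
    max_dist, max_idx = None, None
    true_max_dist, true_max_idx = None, None
    for i, dist in enumerate(dists):
        if true_max_dist is None or dist < true_max_dist:
            true_max_dist, true_max_idx = dist, i
        if occluded[i]:
            continue  # skipping here is what can leave max_idx as None
        if max_dist is None or dist < max_dist:
            max_dist, max_idx = dist, i
    if max_idx is None:
        max_idx = true_max_idx
    return max_idx
```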

Comment on lines 217 to 223
img = train_dataset.get_image(max_idx)
img = img.permute(2, 0, 1)
img = transforms.ToPILImage()(img)
resized_image = img.resize(
(int(cameras.image_width[0]), int(cameras.image_height[0])), PIL.Image.Resampling.LANCZOS
)
resized_image = transforms.ToTensor()(resized_image).permute(1, 2, 0)
Collaborator
I think this could go down to two lines and you could avoid the extra imports.

img = train_dataset.get_image(max_idx)
img = torch.nn.functional.interpolate(
    img.permute(2, 0, 1)[None],
    size=(int(cameras.image_height[0]), int(cameras.image_width[0]))
)[0].permute(1, 2, 0)

Contributor Author
Yep, this looks good. Got rid of the PIL import.

Comment on lines 129 to 132
assert pipeline.datamanager.train_dataset is not None
train_dataset = pipeline.datamanager.train_dataset
train_cameras = train_dataset.cameras.to(pipeline.device)

Contributor
I'm not sure all pipelines will have this. For example, I'm not sure what would happen with the generative pipelines.

Contributor Author
I moved this check inside the `if render_output_names` check, so it shouldn't error when pipelines that don't have train_datasets render.

max_dist, max_idx = -1, -1
true_max_dist, true_max_idx = -1, -1

if "nearest_camera" in rendered_output_names:
Contributor
Can we expose this some other way than through render_output_names? It is not discoverable at the moment. Maybe a render flag, `--render-nearest-camera`?

Contributor Author
Sounds good, I made it into a flag.
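nerfstudio builds its CLIs from dataclasses (via tyro), which exposes an underscored field such as `render_nearest_camera` as a `--render-nearest-camera` flag. A minimal sketch of what such a config field looks like; the class name and field set here are illustrative, not nerfstudio's exact definitions:

```python
from dataclasses import dataclass


@dataclass
class RenderConfig:
    """Illustrative subset of a render command's options."""

    render_nearest_camera: bool = False
    """Whether to render the nearest training camera alongside each view."""

    check_occlusions: bool = False
    """If true, reject training cameras whose line of sight to the render
    camera is blocked."""
```

A dataclass field with a docstring is both discoverable in `--help` output and self-documenting in code, which addresses the discoverability concern above.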

@@ -47,13 +47,18 @@
from torch import Tensor
from typing_extensions import Annotated

from scipy.spatial.transform import Rotation
Collaborator
For consistency it might be nice to use the viser transform library.

@@ -75,6 +80,8 @@ def _render_trajectory_video(
image_format: Literal["jpeg", "png"] = "jpeg",
jpeg_quality: int = 100,
colormap_options: colormaps.ColormapOptions = colormaps.ColormapOptions(),
render_nearest_camera=False,
check_occlusions: bool = False,
Collaborator
add to docstring

@@ -121,6 +128,20 @@ def _render_trajectory_video(
with ExitStack() as stack:
writer = None

train_cameras = Cameras(
Collaborator
this should be inside the if render_nearest_camera statement below

render_nearest_camera: bool = False
"""Whether to render the nearest training camera to the rendered camera."""
check_occlusions: bool = False
"""Whether to check occlusions."""
Collaborator
this is kinda vague: "if true, checks line-of-sight occlusions when computing camera distance and rejects cameras not visible to each other"

if render_nearest_camera:
img = train_dataset.get_image(max_idx)
resized_image = torch.nn.functional.interpolate(
img.permute(2, 0, 1)[None], size=(int(cameras.image_height[0]), int(cameras.image_width[0]))
Collaborator
This doesn't work properly if the aspect ratio of the render camera differs from the train camera's. It should resize the height to the render height and compute the width from the aspect ratio.
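The fix the reviewer suggests, resizing to the render height and deriving the width from the training image's own aspect ratio, could look roughly like this (a sketch under those assumptions, not the PR's final code):

```python
import torch


def resize_keep_aspect(img, target_height):
    """Resize an (H, W, C) image tensor to target_height, deriving the width
    from the image's own aspect ratio so the training image is not stretched
    to the render camera's shape."""
    h, w = img.shape[:2]
    target_width = int(round(target_height * w / h))
    resized = torch.nn.functional.interpolate(
        img.permute(2, 0, 1)[None],  # (H, W, C) -> (1, C, H, W)
        size=(int(target_height), target_width),
        mode="bilinear",
        antialias=True,
    )[0].permute(1, 2, 0)  # back to (H', W', C)
    return resized
```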

@kerrj kerrj (Collaborator) left a comment
nice

@AdamRashid96 AdamRashid96 merged commit 5f5562b into main Nov 13, 2023
4 checks passed
@AdamRashid96 AdamRashid96 deleted the adam/render_training branch November 13, 2023 18:23
@michaelrubloff

> Left a few comments. I think we could Tweet about this feature at some point, as the way to let people know about it.

Added an article to help inform people! Let me know if I missed anything or if it needs to be corrected: https://neuralradiancefields.io/nerfstudio-introduces-nearest-camera-feature/

@AdamRashid96
Contributor Author

> Left a few comments. I think we could Tweet about this feature at some point, as the way to let people know about it.
>
> Added an article to help inform people! Let me know if I missed anything or if it needs to be corrected: https://neuralradiancefields.io/nerfstudio-introduces-nearest-camera-feature/

This is awesome! One thing: I changed it from adding nearest_camera to rendered_output_names to a flag, `--render-nearest-camera`. If you could update the article, that would be great.

@michaelrubloff
Copy link

Updated!
