Describe the bug
After 2000 steps of the ns-train command (nerfstudio==1.1.3), the following error occurs.
Printing profiling stats, from longest to shortest duration in seconds
VanillaPipeline.get_average_eval_image_metrics: 0.2351
VanillaPipeline.get_average_image_metrics: 0.2226
VanillaPipeline.get_eval_image_metrics_and_images: 0.0689
Trainer.train_iteration: 0.0501
VanillaPipeline.get_train_loss_dict: 0.0414
Trainer.eval_iteration: 0.0010
Traceback (most recent call last):
  File "/usr/local/bin/ns-train", line 8, in <module>
    sys.exit(entrypoint())
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/scripts/train.py", line 262, in entrypoint
    main(
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/scripts/train.py", line 247, in main
    launch(
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/scripts/train.py", line 189, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/scripts/train.py", line 100, in train_loop
    trainer.train()
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/engine/trainer.py", line 298, in train
    self.eval_iteration(step)
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/utils/decorators.py", line 70, in wrapper
    ret = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/utils/profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/engine/trainer.py", line 545, in eval_iteration
    metrics_dict, images_dict = self.pipeline.get_eval_image_metrics_and_images(step=step)
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/utils/profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/pipelines/base_pipeline.py", line 341, in get_eval_image_metrics_and_images
    metrics_dict, images_dict = self.model.get_image_metrics_and_images(outputs, batch)
  File "/usr/local/lib/python3.10/dist-packages/nerfstudio/models/splatfacto.py", line 926, in get_image_metrics_and_images
    combined_rgb = torch.cat([gt_rgb, predicted_rgb], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 239 but got size 959 for tensor number 1 in the list.
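For readers unfamiliar with this error, here is a minimal sketch of the shape rule that the message describes: when concatenating along dimension 1, every other dimension must agree. It uses plain Python tuples rather than torch, the helper name `concat_shape` is hypothetical, and the (H, W, C) layout and the widths are assumptions. Only the sizes 239 and 959 come from the traceback.

```python
def concat_shape(shapes, dim):
    """Return the shape obtained by concatenating arrays with `shapes` along `dim`.

    Mirrors the rule in the error message: every dimension except `dim`
    must have the same size in all inputs.
    """
    first = shapes[0]
    for index, shape in enumerate(shapes[1:], start=1):
        if len(shape) != len(first):
            raise ValueError(f"tensor {index} has {len(shape)} dims, expected {len(first)}")
        for axis, (expected, got) in enumerate(zip(first, shape)):
            if axis != dim and expected != got:
                raise ValueError(
                    f"Sizes must match except in dimension {dim}. "
                    f"Expected size {expected} but got size {got} for tensor number {index}."
                )
    out = list(first)
    out[dim] = sum(shape[dim] for shape in shapes)
    return tuple(out)


# Assumed (H, W, C) layout. The widths 320 and 1280 are made up for illustration;
# only the heights 239 and 959 appear in the reported error.
gt_shape = (239, 320, 3)
pred_shape = (959, 1280, 3)

try:
    concat_shape([gt_shape, pred_shape], dim=1)
except ValueError as error:
    print(error)
```

Under these assumptions, the ground-truth image and the rendered image differ in a dimension other than the one being concatenated, which is what the side-by-side image in `get_image_metrics_and_images` cannot tolerate.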
To Reproduce
It happens locally while using the viewer, but I don't have a simple way to reproduce it.
Expected behavior
No RuntimeError is raised here.
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
When I set a PDB breakpoint there for quick debugging, I saw that the model was somehow in train mode instead of eval mode. Replacing line 341 of nerfstudio/pipelines/base_pipeline.py (commit 9b3cbc7)
with the following fixed the error for me, but it probably needs a more permanent fix.
try:
    metrics_dict, images_dict = self.model.get_image_metrics_and_images(outputs, batch)
except Exception:
    self.model.eval()  # The code fails here due to model.training == True for some reason.
    metrics_dict, images_dict = self.model.get_image_metrics_and_images(outputs, batch)
It is unclear why the model is still in train mode at this line, but this workaround worked for me.
I had the same error at line 388 of nerfstudio/pipelines/base_pipeline.py (commit 9b3cbc7), and the same fix worked.
An additional note (2 weeks after initial posting)
I found that the above try-except statement still fails in the except block in some cases, if I interact with the viewer a lot when the number of images is large. The following is probably a better (temporary) fix.
while True:
    try:
        metrics_dict, images_dict = self.model.get_image_metrics_and_images(outputs, batch)
        break
    except Exception:
        self.model.eval()  # The code fails here due to model.training == True for some reason.
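The loop above retries forever if the failure has a different cause. A bounded variant might look like the sketch below. The helper `run_in_eval_mode` and its `max_attempts` parameter are hypothetical, not part of nerfstudio; the only thing assumed of `model` is an `eval()` method, as on `torch.nn.Module`.

```python
def run_in_eval_mode(model, fn, max_attempts=5):
    """Call fn() with the model forced into eval mode, retrying a bounded number of times.

    `model` only needs an eval() method; `fn` is a zero-argument callable.
    """
    last_error = None
    for _ in range(max_attempts):
        model.eval()  # re-assert eval mode before every attempt
        try:
            return fn()
        except RuntimeError as error:  # e.g. the size mismatch raised by torch.cat
            last_error = error
    # Give up instead of looping forever, and keep the original error attached.
    raise RuntimeError(f"still failing after {max_attempts} attempts") from last_error
```

In the pipeline this would presumably be called as `metrics_dict, images_dict = run_in_eval_mode(self.model, lambda: self.model.get_image_metrics_and_images(outputs, batch))`. Like the loop above, it is a workaround: it does not explain why the model is back in train mode.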
Seeing the same issue here, while using both viewer and tensorboard. My take is that the eval images are already downsampled when loaded according to the path and are getting downsampled again.
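The downsampling idea can at least be checked against the two sizes in the error message. The sketch below assumes that downscaling uses floor division and that the factor is a power of two; neither is confirmed in this thread, and the helper names are hypothetical.

```python
def downscaled(size, factor):
    """Size of one image dimension after integer downscaling (assumes floor division)."""
    return size // factor


def matching_factors(full_size, observed_size, candidates=(1, 2, 4, 8, 16)):
    """Return the candidate factors that map full_size to observed_size."""
    return [f for f in candidates if downscaled(full_size, f) == observed_size]


# 959 and 239 are the two sizes reported in the RuntimeError.
print(matching_factors(959, 239))  # prints [4]
```

A single 4x downscale, or two successive 2x downscales (959 // 2 = 479, then 479 // 2 = 239), would both turn 959 into 239. The numbers are therefore consistent with one of the two images being downscaled one more time than the other, but they do not show where that happens.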