Fix benchmarking scripts #1005
devernay commented Nov 22, 2022
- closes nerfacto overfitting on blender scenes? #1000
- see also Poor performance on NeRF Blender Synthetic Data & potential bugs #806
- launch_train_blender.sh:
  - add -s option to launch a single job per GPU
  - add -v option to use tensorboard instead of wandb
  - set nerfacto options according to Poor performance on NeRF Blender Synthetic Data & potential bugs #806 (comment)
  - use a single timestamp for all training jobs
  - last GPU was ignored
  - kill all subprocesses when script is terminated
  - print the eval script command-line
- launch_eval_blender.sh:
  - add shebang
  - add -s option to launch a single job per GPU
  - last GPU was ignored
  - kill all subprocesses when script is terminated
- update benchmarking doc
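
The "kill all subprocesses when script is terminated" item can be illustrated with a small `trap` handler. This is a minimal sketch, not the code from this PR: the `cleanup` function name and the exit code 130 are assumptions chosen for illustration, and the real scripts may use a different mechanism.

```shell
#!/usr/bin/env bash
# Sketch only: stop all background jobs when the script receives
# Ctrl-C (INT) or TERM. "cleanup" is a hypothetical helper name.

cleanup() {
    # jobs -p prints the process IDs of this shell's background jobs.
    local pids
    pids="$(jobs -p)"
    if [ -n "$pids" ]; then
        # Word splitting is intended: one argument per PID.
        # shellcheck disable=SC2086
        kill $pids 2>/dev/null
    fi
}

# Run cleanup, then exit, instead of leaving the jobs running.
trap 'cleanup; exit 130' INT TERM
```

Without such a handler, jobs started with `&` can keep running after the launching script is interrupted, which is the behaviour described for the previous version.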
@@ -71,6 +73,7 @@ The flags used in the benchmarking script are defined as follows:
- `-m`: config name (e.g. `instant-ngp`). This should be the same as what was passed in for -c in the train script.
- `-o`: base output directory for where all of the benchmarks are stored (e.g. `outputs/`). Corresponds to the `--output-dir` in the base `Config` for training.
- `-t`: timestamp of benchmark; also the identifier (e.g. `2022-08-10_172517`).
- `-s`: Launch a single job per GPU.
- `-g`: specifies the gpus to use and if not specified (no -g flag), will automaticaly search for available gpus.
Is this flag still used?
That's a flag I added. It basically launches one job per GPU, then waits on the first one to be finished before relaunching on that GPU.
In the previous version, all jobs were launched in parallel at script launch (and Ctrl-C didn't kill the jobs). That works fine if you have several GPUs and lots of GPU memory, but not if you have a single 16 GB GPU.
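
The scheduling described above (one job per GPU, and a GPU is reused only once its previous job has finished) could be sketched as follows. This is an illustrative approximation, not the script from this PR: `run_job` and `launch_single_per_gpu` are hypothetical names, and `run_job` only prints its arguments where a real script would start training with `CUDA_VISIBLE_DEVICES` set to the chosen GPU.

```shell
#!/usr/bin/env bash
# Sketch only: round-robin scheduling with at most one job per GPU.

run_job() {
    local gpu="$1" scene="$2"
    # Placeholder for the real training command (assumption).
    echo "gpu=${gpu} scene=${scene}"
}

launch_single_per_gpu() {
    local -a gpus pids=()
    read -r -a gpus <<< "$1"    # first argument: space-separated GPU ids
    shift                       # remaining arguments: scenes to process
    [ "${#gpus[@]}" -gt 0 ] || return 1
    local idx=0 scene
    for scene in "$@"; do
        # If this GPU already has a job, wait for it before reusing the GPU.
        if [ -n "${pids[$idx]:-}" ]; then
            wait "${pids[$idx]}"
        fi
        run_job "${gpus[$idx]}" "$scene" &
        pids[$idx]=$!
        idx=$(( (idx + 1) % ${#gpus[@]} ))
    done
    wait    # block until every remaining job has finished
}
```

For example, `launch_single_per_gpu "0" chair drums lego` runs the three scenes one after another on GPU 0, which is the single-GPU case mentioned above.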
Ahh sorry, I was trying to highlight the `-g` flag.
Well, this flag was simply ignored in the previous version of the scripts. It just took the list of remaining arguments as the list of GPUs, so I kept it that way and removed the unused flag. I'll adjust this doc.
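
Treating the remaining arguments as the GPU list can be sketched with `getopts`. This is a reduced illustration under stated assumptions: `parse_args` is a hypothetical function, only the `-s` option is handled, and the real scripts accept more options than shown here.

```shell
#!/usr/bin/env bash
# Sketch only: parse -s, then treat every remaining argument as a GPU id.

parse_args() {
    single=false
    local opt OPTIND=1
    while getopts "s" opt; do
        case "$opt" in
            s) single=true ;;
            *) return 1 ;;
        esac
    done
    # Drop the parsed options; what is left is the GPU list.
    shift $((OPTIND - 1))
    gpus=("$@")
}
```

Storing `"$@"` as a whole array keeps every GPU id, including the last one.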
LGTM!
* launch_eval_blender.sh: fix script
  - add shebang
  - add -s option to launch a single job per GPU
  - last GPU was ignored
  - kill all subprocesses when script is terminated
* launch_train_blender.sh: fix script
  - add -s option to launch a single job per GPU
  - add -v option to use tensorboard instead of wandb
  - set nerfacto options according to nerfstudio-project#806 (comment)
  - use a single timestamp for all training jobs
  - last GPU was ignored
  - kill all subprocesses when script is terminated
  - print the eval script command-line
* launch_eval_blender.sh: fix script
* Update benchmarking.md
* launch_train_blender.sh: add -s to eval command-line
* Update launch_train_blender.sh
* Update benchmarking.md