subprocess.CalledProcessError: Command '['ns-render', '--tyro-print-completion', 'bash']' returned non-zero exit status 1. While running nerfStudio from docker image #1056

Closed
nehilsood opened this issue Nov 30, 2022 · 17 comments · Fixed by #1328

Comments

@nehilsood

[09:49:20] 🤷 .zshrc not found, skipping. install.py:202
🔍 Found .bashrc! install.py:204
[09:49:21] ✔ Wrote new completion to /data/dl/nerfstudio/scripts/completions/bash/_ns-dev-test! install.py:109
✔ Wrote new completion to /data/dl/nerfstudio/scripts/completions/bash/_ns-install-cli! install.py:109
✔ Wrote new completion to /data/dl/nerfstudio/scripts/completions/bash/_ns-process-data! install.py:109
[09:49:25] ✔ Wrote new completion to /data/dl/nerfstudio/scripts/completions/bash/_ns-eval! install.py:109
✔ Wrote new completion to /data/dl/nerfstudio/scripts/completions/bash/_ns-download-data! install.py:109
[09:49:29] ✔ Wrote new completion to /data/dl/nerfstudio/scripts/completions/bash/_ns-train! install.py:109
Traceback (most recent call last):
  File "/usr/local/bin/ns-install-cli", line 8, in <module>
    sys.exit(entrypoint())
  File "/data/dl/nerfstudio/scripts/completions/install.py", line 274, in entrypoint
    tyro.cli(main, description=__doc__)
  File "/usr/local/lib/python3.8/dist-packages/tyro/_cli.py", line 125, in cli
    _cli_impl(
  File "/usr/local/lib/python3.8/dist-packages/tyro/_cli.py", line 326, in _cli_impl
    out, consumed_keywords = _calling.call_from_args(
  File "/usr/local/lib/python3.8/dist-packages/tyro/_calling.py", line 194, in call_from_args
    return unwrapped_f(*args, **kwargs), consumed_keywords  # type: ignore
  File "/data/dl/nerfstudio/scripts/completions/install.py", line 243, in main
    completion_paths = list(
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/data/dl/nerfstudio/scripts/completions/install.py", line 245, in <lambda>
    lambda path_or_entrypoint_and_shell: _generate_completion(
  File "/data/dl/nerfstudio/scripts/completions/install.py", line 98, in _generate_completion
    new = subprocess.run(
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ns-render', '--tyro-print-completion', 'bash']' returned non-zero exit status 1.

This happens while running nerfstudio from Docker image 0.1.11.
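
The CalledProcessError above only reports the exit status; the child process's own traceback goes to its stderr. Here is a minimal sketch, assuming Python 3.7+, of re-running the failing command with stderr captured so the underlying error becomes visible. `run_and_report` is a hypothetical helper and not part of nerfstudio, and returning 127 for a missing executable is a convention chosen here.

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_report(cmd: List[str]) -> Tuple[int, str]:
    """Run cmd and return (exit code, captured stderr) instead of raising."""
    try:
        # capture_output keeps the child's stderr, which is where the real
        # traceback ends up when a CLI entry point fails on import.
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError as exc:
        # The executable is not on PATH; 127 mirrors the usual shell convention.
        return 127, str(exc)
    if proc.returncode != 0:
        print(f"{cmd} exited with status {proc.returncode}:", file=sys.stderr)
        print(proc.stderr, file=sys.stderr)
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    # Command copied from the traceback; only meaningful inside the container.
    run_and_report(["ns-render", "--tyro-print-completion", "bash"])
```

Running the same command by hand in the container's shell should show the same traceback.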

@ps1x

ps1x commented Dec 1, 2022

Having exactly the same issue.

@rishikeshrmadan

@ps1x @Napolean29 @brentyi are you running an old NVIDIA driver version and/or in headless mode? I tried on two different machines and got this error on the one with the older driver, which is running headless. I cannot update the drivers there myself to check whether they are the cause. The other machine works just fine.

@ps1x

ps1x commented Dec 2, 2022

I'm using Fedora 37. The driver is in "hybrid" mode. Driver version: 520.56.06
UPD:

(base) ➜  ~ nvidia-smi
Fri Dec  2 12:16:14 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.56.06    Driver Version: 520.56.06    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   40C    P8     1W /  N/A |    232MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2431      G   /usr/libexec/Xorg                  52MiB |
|    0   N/A  N/A      4207    C+G   ...eSite,UserAgentClientHint      178MiB |
+-----------------------------------------------------------------------------+

Maybe the problem is that my CUDA version is not the recommended 11.3?

@brentyi
Collaborator

brentyi commented Dec 2, 2022

I haven't been able to reproduce this error myself, but @ps1x @Napolean29 can either of you try with the latest docker image (0.1.12)?

I'd expect it to still fail, but with #1068 merged the error message should at least be more useful.

@nehilsood
Author

Screenshot 2022-12-04 at 7 18 46 PM

This is the configuration I am using.

@ps1x

ps1x commented Dec 5, 2022

> I haven't been able to reproduce this error myself, but @ps1x @Napolean29 can either of you try with the latest docker image (0.1.12)?
>
> I'd expect it to still fail, but with #1068 merged the error message should at least be more useful.

[14:03:31] ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-eval!             install.py:117
           ❌ Completion script generation failed: ['ns-train', '--tyro-print-completion', 'bash']        install.py:107
           ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-download-data!    install.py:117
           Traceback (most recent call last):                                                             install.py:111
             File "/home/user/.local/bin/ns-train", line 5, in <module>                                                 
               from scripts.train import entrypoint                                                                     
             File "/home/user/nerfstudio/scripts/train.py", line 50, in <module>                                        
               from nerfstudio.configs.method_configs import AnnotatedBaseConfigUnion                                   
             File "/home/user/nerfstudio/nerfstudio/configs/method_configs.py", line 45, in <module>                    
               from nerfstudio.field_components.temporal_distortions import TemporalDistortionKind                      
             File "/home/user/nerfstudio/nerfstudio/field_components/__init__.py", line 17, in <module>                 
               from .encodings import Encoding, ScalingAndOffset                                                        
             File "/home/user/nerfstudio/nerfstudio/field_components/encodings.py", line 34, in <module>                
               import tinycudann as tcnn                                                                                
             File "/home/user/.local/lib/python3.8/site-packages/tinycudann/__init__.py", line 9, in                    
           <module>                                                                                                     
               from tinycudann.modules import free_temporary_memory, NetworkWithInputEncoding, Network,                 
           Encoding                                                                                                     
             File "/home/user/.local/lib/python3.8/site-packages/tinycudann/modules.py", line 33, in                    
           <module>                                                                                                     
               if _C is None:                                                                                           
           NameError: name '_C' is not defined                                                                          
                                                                                                                        
           ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-render!           install.py:117
(●     ) ✍  Generating completions...
Traceback (most recent call last):
  File "/home/user/.local/bin/ns-install-cli", line 8, in <module>
    sys.exit(entrypoint())
  File "/home/user/nerfstudio/scripts/completions/install.py", line 282, in entrypoint
    tyro.cli(main, description=__doc__)
  File "/home/user/.local/lib/python3.8/site-packages/tyro/_cli.py", line 125, in cli
    _cli_impl(
  File "/home/user/.local/lib/python3.8/site-packages/tyro/_cli.py", line 326, in _cli_impl
    out, consumed_keywords = _calling.call_from_args(
  File "/home/user/.local/lib/python3.8/site-packages/tyro/_calling.py", line 194, in call_from_args
    return unwrapped_f(*args, **kwargs), consumed_keywords  # type: ignore
  File "/home/user/nerfstudio/scripts/completions/install.py", line 251, in main
    completion_paths = list(
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/user/nerfstudio/scripts/completions/install.py", line 253, in <lambda>
    lambda path_or_entrypoint_and_shell: _generate_completion(
  File "/home/user/nerfstudio/scripts/completions/install.py", line 112, in _generate_completion
    raise e
  File "/home/user/nerfstudio/scripts/completions/install.py", line 99, in _generate_completion
    new = subprocess.run(
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ns-train', '--tyro-print-completion', 'bash']' returned non-zero exit status 1.
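
The inner traceback in this log ends inside `import tinycudann`, so the completion failure looks like a symptom of the binding failing to load. Here is a minimal sketch of checking that import on its own inside the container. `try_import` is a hypothetical helper, and it catches `Exception` rather than only `ImportError` because the log ends in a `NameError`.

```python
import importlib
from typing import Optional, Tuple


def try_import(module_name: str) -> Tuple[bool, Optional[str]]:
    """Return (True, None) if the module imports, else (False, error text)."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # broader than ImportError on purpose, see above
        return False, f"{type(exc).__name__}: {exc}"
    return True, None


if __name__ == "__main__":
    ok, error = try_import("tinycudann")
    print("tinycudann imports fine" if ok else f"tinycudann failed: {error}")
```

If this reports a failure, every `ns-*` command that imports tinycudann can be expected to fail the same way.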

@sandros94
Contributor

sandros94 commented Dec 5, 2022

So I'm getting the same issue that @ps1x is getting (ns-train's error) on 0.1.12

I'm using the latest Docker Desktop via WSL2, on Windows 11 with a GTX 1660 Ti.

If I run docker run -it --gpus=all --rm nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -benchmark to test whether GPU virtualization is working, everything works fine.

EDIT: the driver is 526.98 Studio, and the GPU is not in headless mode (in fact it's the only GPU and it is in use).

@dragonheat123

dragonheat123 commented Dec 8, 2022

Hi, I had the same problem.

Turns out that when tinycudann is installed via pip install, its setup.py reads the ENV TCNN_CUDA_ARCHITECTURES=86 set in the Dockerfile and compiles for that compute capability, whether or not it matches your GPU. See setup.py under bindings/torch in the tiny-cuda-nn repository.

You can try replacing 86 with the number printed by this snippet:

import torch
major, minor = torch.cuda.get_device_capability()
compute_capabilities = [major * 10 + minor]
print(f"Obtained compute capability {compute_capabilities[0]} from PyTorch")

Alternatively, omit ENV TCNN_CUDA_ARCHITECTURES from the Dockerfile, since setup.py will try to detect the compute capability itself. (I didn't try this.)
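
To make the precedence explicit, here is a simplified sketch of the selection logic as described in this thread: an explicit environment variable wins, then a detected GPU, otherwise an error. This paraphrases the behaviour and is not the actual tiny-cuda-nn setup.py. `resolve_compute_capabilities` and its `detected` parameter are hypothetical, standing in for what `torch.cuda.get_device_capability()` would return, or `None` when no CUDA device is visible. Accepting several values separated by `,` or `;` is also an assumption.

```python
from typing import List, Mapping, Optional, Tuple


def resolve_compute_capabilities(
    env: Mapping[str, str],
    detected: Optional[Tuple[int, int]],
) -> List[int]:
    """Pick target architectures: explicit env var first, detected GPU second."""
    raw = env.get("TCNN_CUDA_ARCHITECTURES", "")
    if raw:
        # An explicit value wins even when it does not match the local GPU.
        return [int(part) for part in raw.replace(";", ",").split(",")]
    if detected is not None:
        major, minor = detected
        return [major * 10 + minor]
    # Neither source is available, e.g. building on a host with no visible GPU.
    raise EnvironmentError("Unknown compute capability.")
```

With `{"TCNN_CUDA_ARCHITECTURES": "86"}` and a detected `(7, 5)` device, this returns `[86]`, which matches the mismatch described above.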

@ps1x

ps1x commented Dec 8, 2022

It does not work without setting TCNN_CUDA_ARCHITECTURES in the Dockerfile:

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-req-build-i5yqewf2/bindings/torch/setup.py", line 30, in <module>
          raise EnvironmentError("Unknown compute capability. Specify the target compute capabilities in the TCNN_CUDA_ARCHITECTURES environment variable or install PyTorch with the CUDA backend to detect it automatically.")
      OSError: Unknown compute capability. Specify the target compute capabilities in the TCNN_CUDA_ARCHITECTURES environment variable or install PyTorch with the CUDA backend to detect it automatically.
      No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
      Building PyTorch extension for tiny-cuda-nn version 1.6
      [end of output]

@b0ot

b0ot commented Dec 21, 2022

> Hi, I had the same problem.
>
> Turns out that when installing tinycudann via pip install, it reads ENV TCNN_CUDA_ARCHITECTURES=86 off the Dockerfile, and compiles a version suitable to the compute capability of your GPU. check out setup.py for tinycudann here
>
> You can try to replace 86 by the number you see when you run the code snippet:
>
> import torch
> major, minor = torch.cuda.get_device_capability()
> compute_capabilities = [major * 10 + minor]
> print(f"Obtained compute capability {compute_capabilities[0]} from PyTorch")
>
> or either omit setting ENV TCNN_CUDA_ARCHITECTURES in the Dockerfile, since setup.py will test for it anyway. (I didn't try this)

How do you modify a docker image?
Based on your code, mine should be '66'

ENV TCNN_CUDA_ARCHITECTURES=66

NVIDIA GeForce GTX 1070
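
As a cross-check on the value, NVIDIA publishes compute capabilities in "major.minor" form, and the number used in TCNN_CUDA_ARCHITECTURES is that pair with the dot removed. Here is a minimal sketch of the conversion. `arch_from_compute_capability` is a hypothetical helper. To my knowledge the GTX 1070 is listed as 6.1, which would give 61 rather than 66, so it may be worth re-running the snippet to confirm.

```python
def arch_from_compute_capability(capability: str) -> int:
    """Convert a 'major.minor' compute capability such as '7.5' into 75."""
    major_text, minor_text = capability.strip().split(".")
    major, minor = int(major_text), int(minor_text)
    if not 0 <= minor <= 9:
        # A two-digit minor version would make major * 10 + minor ambiguous.
        raise ValueError(f"unexpected minor version in {capability!r}")
    return major * 10 + minor
```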

@XinyueZ

XinyueZ commented Jan 31, 2023

#1056 (comment)
works for me @dragonheat123

@Scolymus

Scolymus commented Jan 31, 2023

Same! It works, @dragonheat123.

In my case I was using the official Docker image as the base for another one, and I needed compute capability 75. I forced a reinstallation of torch and tinycudann:

FROM dromni/nerfstudio:0.1.15

ENV TCNN_CUDA_ARCHITECTURES=75
RUN python3.10 -m pip install --no-cache-dir --upgrade --force-reinstall --no-deps torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
RUN python3.10 -m pip install --no-cache-dir --upgrade --force-reinstall --no-deps git+https://github.com/NVlabs/tiny-cuda-nn.git#subdirectory=bindings/torch

# Do your stuff

PS. @b0ot does it solve this for you?

@Zunhammer
Contributor

Hi, could you please try again with docker pull dromni/nerfstudio:0.1.16? The issue should be fixed. I'm just waiting for confirmation that it is really solved, as I cannot test it on my own (I only have an RTX4000 available atm).
I'll update the Dockerfile and descriptions as soon as the fix is confirmed.
Thanks

@b0ot

b0ot commented Feb 1, 2023

@Scolymus thanks for the tip. Based on @Zunhammer, I tried the install again and got much further.

I'm fairly new to Docker, so it took me a while to look up how to run it again, especially on Windows after installing Docker.

For other newcomers: I did all of the following in Windows cmd.
Step 1
docker pull dromni/nerfstudio:0.1.16

That takes a while to download. Once it completes:

Step 2
docker run --gpus all -v C:\\Users\\your\\fullpath:\\createdFolder -v C:\\Users\\your\\fullpath\\.cache:/home/user/cache -p 7007:7007 --rm -it dromni/nerfstudio:0.1.16

Notes: You need \\ on Windows. Relative paths did not work for me; I needed the full absolute path.

That now launches into a Linux environment:

C:\Users\tom>docker run --gpus all -v C:\\Users\\tom\\docker_nerfstudio:\\docker_nerfstudio -v C:\\Users\\tom\\docker_nerfstudio\\.cache:/home/user/cache -p 7007:7007 --rm -it dromni/nerfstudio:0.1.16

==========
== CUDA ==
==========

CUDA Version 11.7.1

Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

[14:24:00] 🤷 .zshrc not found, skipping.                                                                 install.py:210
           🔍 Found .bashrc!                                                                              install.py:212
           ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-install-cli!      install.py:117
           ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-dev-test!         install.py:117
[14:24:01] ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-process-data!     install.py:117
[14:24:03] ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-download-data!    install.py:117
[14:24:05] ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-render!           install.py:117
           ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-eval!             install.py:117
[14:24:07] ✔ Wrote new completion to /home/user/nerfstudio/scripts/completions/bash/_ns-train!            install.py:117
           🙆 Completions installed to /home/user/.bashrc. Neat! Open a new shell to try them out.        install.py:184
All done!
user@8f9a64c9b11f:/workspace$

Since I can run this, I believe the issue discussed in this thread is now resolved.
I can now run commands like ns-download-data and see info related to arguments / subcommands.

However, I still seem to have some issues: if I try to run the "Training your first model" example, I get an error:

user@8f9a64c9b11f:/workspace$ ns-download-data nerfstudio --capture-name=poster
Traceback (most recent call last):
  File "/home/user/.local/bin/ns-download-data", line 8, in <module>
    sys.exit(entrypoint())
  File "/home/user/nerfstudio/scripts/downloads/download_data.py", line 321, in entrypoint
    main(tyro.cli(Commands))
  File "/home/user/nerfstudio/scripts/downloads/download_data.py", line 313, in main
    dataset.save_dir.mkdir(parents=True, exist_ok=True)
  File "/usr/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
PermissionError: [Errno 13] Permission denied: 'data'

However, this issue seems to be separate from the topic of this thread and more similar to Issue 1309.
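
The PermissionError comes from `mkdir` on the relative path `data`, which resolves against the current working directory (`/workspace` in the prompt above). Here is a minimal sketch of checking which directories the container user can actually write to before choosing an output location. `first_writable_dir` is a hypothetical helper, and the candidate list is only an example.

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def first_writable_dir(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first existing directory the current user can write to."""
    for candidate in candidates:
        # W_OK allows creating entries; X_OK allows entering the directory.
        if candidate.is_dir() and os.access(candidate, os.W_OK | os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    print(first_writable_dir([Path.cwd(), Path.home()]))
```

A directory returned here, such as a mounted volume, would be a reasonable target for the download instead of the default relative `data` folder.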

@XinyueZ

XinyueZ commented Feb 1, 2023

@b0ot I think you should run "chmod 777" on the "data" folder before you run the Docker container.

@Zunhammer
Contributor

Zunhammer commented Feb 1, 2023

So the image is working fine. Please use the option to set a user-defined data folder for ns-download-data (it should be --save-dir; see https://docs.nerf.studio/en/latest/reference/cli/ns_download_data.html).
Also, on Windows you need to use \ on the host side, as you already did, but on the Docker side I think it should still be /. Try:

docker run --gpus all -v 'C:\Users\your\fullpath:/createdFolder' -v 'C:\Users\your\fullpath\.cache:/home/user/cache' -p 7007:7007 --rm -it dromni/nerfstudio:0.1.16

and then

ns-download-data nerfstudio --save-dir /createdFolder --capture-name=poster
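
The host/container path distinction above can be sketched with `pathlib`, which models Windows and POSIX paths separately. This is only an illustration of the rule (absolute Windows path on the left, absolute `/`-style path on the right). `build_volume_arg` is a hypothetical helper, not something Docker or nerfstudio provides.

```python
from pathlib import PurePosixPath, PureWindowsPath


def build_volume_arg(host_path: str, container_path: str) -> str:
    """Build the 'host:container' value for docker run -v on a Windows host."""
    host = PureWindowsPath(host_path)
    container = PurePosixPath(container_path)
    if not host.is_absolute():
        # Relative host paths were reported not to work earlier in the thread.
        raise ValueError(f"host path must be absolute: {host_path!r}")
    if not container.is_absolute():
        # The container side is Linux, so it must start with a forward slash.
        raise ValueError(f"container path must start with '/': {container_path!r}")
    return f"{host}:{container}"
```

For example, `build_volume_arg(r"C:\Users\your\fullpath", "/createdFolder")` yields the first `-v` value from the command above, while a container path written as `\createdFolder` is rejected.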

@eyildiz-ugoe

This problem exists without the docker installation as well.
