Don't hardcode TCNN_CUDA_ARCHITECTURES in Dockerfile #1317

Closed
brentyi opened this issue Jan 31, 2023 · 7 comments · Fixed by #1328

@brentyi
Collaborator

brentyi commented Jan 31, 2023

This line:

ENV TCNN_CUDA_ARCHITECTURES=86

seems to be the cause of a few issues: #1056, #1297, #1168.

A workaround was proposed by @dragonheat123: #1056 (comment)

Maybe somebody who knows more about Docker can help automate this?

cc @Zunhammer
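For illustration, here is a minimal Python sketch of the check such automation would involve. It assumes tiny-cuda-nn reads `TCNN_CUDA_ARCHITECTURES` as integer compute capabilities (major × 10 + minor, so compute capability 8.6 becomes `86`) separated by `;` or `,`. The helper names are hypothetical and are not part of nerfstudio or tiny-cuda-nn. The sketch asks PyTorch (`torch.cuda.get_device_capability`) for the local GPU's capability and reports whether the configured value contains it.

```python
"""Hypothetical helpers: does a TCNN_CUDA_ARCHITECTURES value cover this GPU?"""
import os
from typing import List, Optional


def parse_architectures(value: str) -> List[int]:
    """Parse "86" or "90;89;86" (';' or ',' separators assumed) into ints."""
    parts = value.replace(";", ",").split(",")
    return [int(p.strip()) for p in parts if p.strip()]


def to_architecture(major: int, minor: int) -> int:
    """Map a (major, minor) compute capability to its integer form, e.g. (8, 6) -> 86."""
    return major * 10 + minor


def local_architecture() -> Optional[int]:
    """Return GPU 0's architecture via PyTorch, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch  # lazy import so the parsing helpers work without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(0)
    return to_architecture(major, minor)


def is_built_for(arch: int, value: str) -> bool:
    """Conservative exact-match check; ignores any binary or PTX compatibility."""
    return arch in parse_architectures(value)


if __name__ == "__main__":
    # "86" mirrors the hardcoded Dockerfile value when the variable is unset.
    configured = os.environ.get("TCNN_CUDA_ARCHITECTURES", "86")
    arch = local_architecture()
    if arch is None:
        print("No CUDA device visible to PyTorch (or PyTorch not installed).")
    else:
        print(f"GPU architecture {arch}; built for {parse_architectures(configured)}: "
              f"{is_built_for(arch, configured)}")
```

For example, a GTX 1660 Ti has compute capability 7.5, so `is_built_for(75, "86")` is `False`: an image built only for `86` contains no code compiled specifically for that card. The exact-match test is deliberately conservative.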

@Zunhammer
Contributor

I see how this could cause issues. So I created a new Docker image with tiny-cuda-nn built for multiple CUDA architectures; however, I cannot validate it, as I only have access to RTX3000 and RTX4000 cards. Could someone who previously had issues using Docker test the new image and give feedback?

docker pull dromni/nerfstudio:0.1.16

This should support the following GPUs and corresponding CUDA architectures:

| GPU | CUDA architecture |
|---|---|
| H100 | 90 |
| 40X0 | 89 |
| 30X0 | 86 |
| A100 | 80 |
| 20X0 | 75 |
| TITAN V / V100 | 70 |
| 10X0 / TITAN Xp | 61 |
| 9X0 | 52 |
| K80 | 37 |
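The table maps directly onto a multi-architecture value for the Dockerfile's `ENV` line. Here is a minimal Python sketch, assuming tiny-cuda-nn accepts a semicolon-separated list in `TCNN_CUDA_ARCHITECTURES`. The dictionary only transcribes the table above, and the function name is hypothetical.

```python
"""Hypothetical helper: build a multi-architecture TCNN_CUDA_ARCHITECTURES value."""
from typing import Dict

# Transcribed from the table above: GPU family -> CUDA compute capability (x10).
GPU_ARCHITECTURES: Dict[str, int] = {
    "H100": 90,
    "40X0": 89,
    "30X0": 86,
    "A100": 80,
    "20X0": 75,
    "TITAN V / V100": 70,
    "10X0 / TITAN Xp": 61,
    "9X0": 52,
    "K80": 37,
}


def architectures_env_value(archs: Dict[str, int]) -> str:
    """Join unique architectures, newest first, with ';' (assumed accepted separator)."""
    unique = sorted(set(archs.values()), reverse=True)
    return ";".join(str(a) for a in unique)


if __name__ == "__main__":
    # e.g. use the printed value as: ENV TCNN_CUDA_ARCHITECTURES=<value>
    print(architectures_env_value(GPU_ARCHITECTURES))
```

Running it prints `90;89;86;80;75;70;61;52;37`. Each additional architecture adds compile time and binary size, which is the cost of not hardcoding a single value.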

@Zunhammer
Contributor

So far I have one confirmation that it works; I'm looking for one more to be sure. As soon as I get it, I'll update the Dockerfile and the description.
@brentyi Could you test the image again?

@sandros94
Contributor

sandros94 commented Feb 1, 2023

@Zunhammer it is indeed working with 0.1.16 here on a 1660 Ti.
I was getting an error after training, which I thought was a permission error on the folder I'm binding to, but it turned out I was just running out of VRAM.

@brentyi
Collaborator Author

brentyi commented Feb 1, 2023

Thanks for fixing this so quickly @Zunhammer!

@aboutyy

aboutyy commented Jun 9, 2023

My GPU is an NVIDIA GeForce RTX 3090/PCIe/SSE2 (NVIDIA Corporation), and I am still having problems:

sudo docker run -it --rm --gpus all dromni/nerfstudio:0.3.1

==========
== CUDA ==

CUDA Version 11.8.0

Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

[07:33:03] 🤷 .zshrc not found, skipping. install.py:370
🔍 Found .bashrc! install.py:372
✔ Wrote new completion to /home/user/nerfstudio/nerfstudio/scripts/completions/bash/_ns-install-cli! install.py:134
✔ Wrote new completion to /home/user/nerfstudio/nerfstudio/scripts/completions/bash/_ns-dev-test! install.py:134
[07:33:06] ✔ Wrote new completion to /home/user/nerfstudio/nerfstudio/scripts/completions/bash/_ns-dev-sync-viser-message-defs! install.py:134
✔ Wrote new completion to /home/user/nerfstudio/nerfstudio/scripts/completions/bash/_ns-process-data! install.py:134
✔ Wrote new completion to /home/user/nerfstudio/nerfstudio/scripts/completions/bash/_ns-download-data! install.py:134
[07:33:08] ✔ Wrote new completion to /home/user/nerfstudio/nerfstudio/scripts/completions/bash/_ns-render! install.py:134
✔ Wrote new completion to /home/user/nerfstudio/nerfstudio/scripts/completions/bash/_ns-eval! install.py:134
✔ Wrote new completion to /home/user/nerfstudio/nerfstudio/scripts/completions/bash/_ns-viewer! install.py:134
❌ Completion script generation failed: ['ns-train', '--tyro-print-completion', 'bash'] install.py:124
/home/user/.local/lib/python3.10/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.) install.py:128
return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
File "/home/user/.local/bin/ns-train", line 5, in <module>
from nerfstudio.scripts.train import entrypoint
File "/home/user/nerfstudio/nerfstudio/scripts/train.py", line 62, in <module>
from nerfstudio.configs.method_configs import AnnotatedBaseConfigUnion
File "/home/user/nerfstudio/nerfstudio/configs/method_configs.py", line 55, in <module>
from nerfstudio.field_components.temporal_distortions import TemporalDistortionKind
File "/home/user/nerfstudio/nerfstudio/field_components/__init__.py", line 17, in <module>
from .encodings import Encoding as Encoding
File "/home/user/nerfstudio/nerfstudio/field_components/encodings.py", line 34, in <module>
import tinycudann as tcnn
File "/home/user/.local/lib/python3.10/site-packages/tinycudann/__init__.py", line 9, in <module>
from tinycudann.modules import free_temporary_memory, NetworkWithInputEncoding, Network, Encoding
File "/home/user/.local/lib/python3.10/site-packages/tinycudann/modules.py", line 18, in <module>
raise EnvironmentError("Unknown compute capability. Ensure PyTorch with CUDA support is installed.")
OSError: Unknown compute capability. Ensure PyTorch with CUDA support is installed.

[07:33:09] ❌ Completion script generation failed: ['ns-export', '--tyro-print-completion', 'bash'] install.py:124
/home/user/.local/lib/python3.10/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.) install.py:128
return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
File "/home/user/.local/bin/ns-export", line 5, in <module>
from nerfstudio.scripts.exporter import entrypoint
File "/home/user/nerfstudio/nerfstudio/scripts/exporter.py", line 46, in <module>
from nerfstudio.fields.sdf_field import SDFField
File "/home/user/nerfstudio/nerfstudio/fields/sdf_field.py", line 31, in <module>
from nerfstudio.field_components.embedding import Embedding
File "/home/user/nerfstudio/nerfstudio/field_components/__init__.py", line 17, in <module>
from .encodings import Encoding as Encoding
File "/home/user/nerfstudio/nerfstudio/field_components/encodings.py", line 34, in <module>
import tinycudann as tcnn
File "/home/user/.local/lib/python3.10/site-packages/tinycudann/__init__.py", line 9, in <module>
from tinycudann.modules import free_temporary_memory, NetworkWithInputEncoding, Network, Encoding
File "/home/user/.local/lib/python3.10/site-packages/tinycudann/modules.py", line 18, in <module>
raise EnvironmentError("Unknown compute capability. Ensure PyTorch with CUDA support is installed.")
OSError: Unknown compute capability. Ensure PyTorch with CUDA support is installed.

Traceback (most recent call last):
File "/home/user/.local/bin/ns-install-cli", line 8, in <module>
sys.exit(entrypoint())
File "/home/user/nerfstudio/nerfstudio/scripts/completions/install.py", line 412, in entrypoint
tyro.cli(main, description=doc)
File "/home/user/.local/lib/python3.10/site-packages/tyro/_cli.py", line 177, in cli
output = _cli_impl(
File "/home/user/.local/lib/python3.10/site-packages/tyro/_cli.py", line 429, in _cli_impl
out, consumed_keywords = _calling.call_from_args(
File "/home/user/.local/lib/python3.10/site-packages/tyro/_calling.py", line 204, in call_from_args
return unwrapped_f(*positional_args, **kwargs), consumed_keywords # type: ignore
File "/home/user/nerfstudio/nerfstudio/scripts/completions/install.py", line 393, in main
_generate_completions_files(completions_dir, scripts_dir, shells_supported, shells_found)
File "/home/user/nerfstudio/nerfstudio/scripts/completions/install.py", line 324, in _generate_completions_files
completion_paths = list(
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
yield _result_or_cancel(fs.pop())
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
return fut.result(timeout)
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/user/nerfstudio/nerfstudio/scripts/completions/install.py", line 326, in <lambda>
lambda path_or_entrypoint_and_shell: _generate_completion(
File "/home/user/nerfstudio/nerfstudio/scripts/completions/install.py", line 129, in _generate_completion
raise e
File "/home/user/nerfstudio/nerfstudio/scripts/completions/install.py", line 116, in _generate_completion
new = subprocess.run(
File "/usr/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ns-export', '--tyro-print-completion', 'bash']' returned non-zero exit status 1.

@Zunhammer
Contributor

Zunhammer commented Jun 9, 2023

It's not able to detect your GPU. This is usually caused by an outdated NVIDIA driver (it must be compatible with at least CUDA 11.8) or a misconfigured Docker setup. Try running an NVIDIA Docker image and calling nvidia-smi inside it to see whether your GPU is passed through correctly.
E.g.: docker run --gpus all --rm nvidia/cuda nvidia-smi
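To make "compatible with CUDA 11.8" concrete, one can compare the host driver version with the minimum NVIDIA lists for the CUDA 11.8 toolkit. Here is a minimal Python sketch, assuming that minimum is 520.61.05 on Linux (taken from NVIDIA's CUDA 11.8 release notes; NVIDIA's compatibility rules have exceptions, so check the current documentation). The helper names are hypothetical; `nvidia-smi --query-gpu=driver_version --format=csv,noheader` is the standard nvidia-smi query for the driver version.

```python
"""Hypothetical check: is the host NVIDIA driver new enough for a CUDA 11.8 image?"""
import subprocess
from typing import Optional, Tuple

# Assumption: minimum Linux driver listed for CUDA 11.8 GA in NVIDIA's release
# notes. Verify against NVIDIA's current compatibility documentation.
MIN_DRIVER_CUDA_11_8_LINUX = "520.61.05"


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "510.47.03" into (510, 47, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_supports(driver: str, minimum: str = MIN_DRIVER_CUDA_11_8_LINUX) -> bool:
    """True if the driver version is at least the given minimum."""
    return parse_version(driver) >= parse_version(minimum)


def host_driver_version() -> Optional[str]:
    """Query the first GPU's driver version via nvidia-smi; None if that fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    version = host_driver_version()
    if version is None:
        print("nvidia-smi failed or is missing; is the NVIDIA driver installed?")
    else:
        print(f"Driver {version} meets the CUDA 11.8 minimum: {driver_supports(version)}")
```

For example, `driver_supports("510.47.03")` returns `False`, and 510.47.03 is the driver version shown in the `nvidia-smi` output later in this thread. The sketch compares driver versions rather than the "CUDA Version" field that `nvidia-smi` prints in its header.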

@aboutyy

aboutyy commented Jun 9, 2023

> It's not able to detect your GPU, usually this is caused by an outdated nvidia driver (must be at least be compatible with CUDA 11.8) or misconfigured docker. Try to use a nvidia docker image and run nvidia-smi inside to see if your GPU is passed through correctly. E.g.: docker run --gpus all --rm nvidia/cuda nvidia-smi
sudo docker run --gpus all --rm nvidia/cuda:11.8.0-devel-ubuntu22.04 nvidia-smi

GPU is passed through correctly

CUDA Version 11.8.0

Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Fri Jun 9 09:49:46 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 On | N/A |
| 0% 59C P8 19W / 350W | 183MiB / 24576MiB | 6% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1509 G 90MiB |
| 0 N/A N/A 2009 G 38MiB |
| 0 N/A N/A 4348 G 52MiB |
