Don't hardcode TCNN_CUDA_ARCHITECTURES in Dockerfile #1317

brentyi · 2023-01-31T17:38:43Z

This line in `nerfstudio/Dockerfile` (line 10 in 0e9e26d):

    ENV TCNN_CUDA_ARCHITECTURES=86

seems to be the cause of a few issues: #1056, #1297, #1168

A workaround was proposed by @dragonheat123: #1056 (comment)

Maybe somebody who knows more about Docker can help automate this?

cc @Zunhammer
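One possible way to automate this is to derive the value from the GPU that is actually present instead of fixing it at 86. The sketch below is a minimal illustration, not the project's method: it assumes PyTorch is installed on the machine doing the build, and the `CUDA_ARCHITECTURES` build argument it prints is a hypothetical name. `capability_to_arch` and `detect_local_arch` are helper names made up for this example.

```python
from typing import Optional, Tuple


def capability_to_arch(capability: Tuple[int, int]) -> str:
    """Turn a (major, minor) compute capability such as (8, 6) into "86"."""
    major, minor = capability
    return f"{major}{minor}"


def detect_local_arch() -> Optional[str]:
    """Return the architecture of GPU 0, or None if no GPU is visible."""
    try:
        import torch  # assumption: PyTorch is available where this runs
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    # get_device_capability returns a (major, minor) tuple for the device.
    return capability_to_arch(torch.cuda.get_device_capability(0))


if __name__ == "__main__":
    arch = detect_local_arch()
    # CUDA_ARCHITECTURES is a hypothetical build-arg name for illustration.
    print(f"CUDA_ARCHITECTURES={arch}" if arch else "no CUDA device visible")
```

The printed value could then be passed to the image build, for example through a build argument, so that the Dockerfile no longer needs a hardcoded architecture.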

Zunhammer · 2023-02-01T10:58:57Z

I see how this could cause issues. I tried to create a new Docker image with tiny-cuda-nn built for multiple CUDA architectures, but I cannot fully validate it because I only have access to RTX 3000 and RTX 4000 cards. Could someone who had issues with the Docker image before test the new one and give feedback? To get it: docker pull dromni/nerfstudio:0.1.16

This should support the following GPUs and corresponding CUDA architectures:

| H100 | 40X0 | 30X0 | A100 | 20X0 | TITAN V / V100 | 10X0 / TITAN Xp | 9X0 | K80 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 90 | 89 | 86 | 80 | 75 | 70 | 61 | 52 | 37 |
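As a rough illustration of how the architectures listed above could be turned into a single `TCNN_CUDA_ARCHITECTURES` value, here is a minimal sketch. The mapping is copied from the list above; `ARCH_BY_GPU` and `arch_list` are names invented for this example. The `;` separator is an assumption (as far as I know tiny-cuda-nn's build script accepts semicolons or commas, but please verify this against the version you build).

```python
# GPU families and CUDA architectures exactly as listed in this comment.
ARCH_BY_GPU = {
    "H100": 90,
    "40X0": 89,
    "30X0": 86,
    "A100": 80,
    "20X0": 75,
    "TITAN V / V100": 70,
    "10X0 / TITAN Xp": 61,
    "9X0": 52,
    "K80": 37,
}


def arch_list(gpus=None, sep=";"):
    """Join the architectures for the given GPU families (default: all).

    The separator is an assumption; check what your build actually accepts.
    """
    selected = ARCH_BY_GPU if gpus is None else {g: ARCH_BY_GPU[g] for g in gpus}
    # De-duplicate and sort newest first so the output is stable.
    archs = sorted(set(selected.values()), reverse=True)
    return sep.join(str(a) for a in archs)


if __name__ == "__main__":
    print(arch_list())                  # all architectures from the list
    print(arch_list(["30X0", "20X0"]))  # a smaller, faster-to-build subset
```

Building for fewer architectures shortens compile time and shrinks the image, so a subset may be preferable when the target GPUs are known.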

Zunhammer · 2023-02-01T13:03:49Z

I got one confirmation that it is working now and am looking for a second one to be sure. As soon as I get it, I'll update the Dockerfile and the description.
@brentyi Could you test the image again?

sandros94 · 2023-02-01T17:23:44Z

@Zunhammer it is indeed working with 0.1.16 here on a 1660 Ti.
~~I'm just getting an error after the training, but it must be a permission error on the folder I'm binding to.~~
I was just running out of vram.

brentyi · 2023-02-01T18:32:04Z

Thanks for fixing this so quickly @Zunhammer!

aboutyy · 2023-06-09T07:33:41Z

My graphics card is an NVIDIA GeForce RTX 3090/PCIe/SSE2 (NVIDIA Corporation), and I am still having problems.

sudo docker run -it --rm --gpus all dromni/nerfstudio:0.3.1

==========
== CUDA ==

CUDA Version 11.8.0

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

[07:33:03] 🤷 .zshrc not found, skipping. install.py:370
🔍 Found .bashrc! install.py:372
✔ Wrote new completion to install.py:134
/home/user/nerfstudio/nerfstudio/scripts/completions/bash/_ns-install-cli!
✔ Wrote new completion to install.py:134
/home/user/nerfstudio/nerfstudio/scripts/completions/bash/_ns-dev-test!
[07:33:06] ✔ Wrote new completion to install.py:134
/home/user/nerfstudio/nerfstudio/scripts/completions/bash/_ns-dev-sync-viser-message-defs!
✔ Wrote new completion to install.py:134
/home/user/nerfstudio/nerfstudio/scripts/completions/bash/_ns-process-data!
✔ Wrote new completion to install.py:134
/home/user/nerfstudio/nerfstudio/scripts/completions/bash/_ns-download-data!
[07:33:08] ✔ Wrote new completion to install.py:134
/home/user/nerfstudio/nerfstudio/scripts/completions/bash/_ns-render!
✔ Wrote new completion to /home/user/nerfstudio/nerfstudio/scripts/completions/bash/_ns-eval! install.py:134
✔ Wrote new completion to install.py:134
/home/user/nerfstudio/nerfstudio/scripts/completions/bash/_ns-viewer!
❌ Completion script generation failed: ['ns-train', '--tyro-print-completion', 'bash'] install.py:124
/home/user/.local/lib/python3.10/site-packages/torch/cuda/init.py:107: UserWarning: CUDA install.py:128
initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions
before calling NumCudaDevices() that might have already set an error? Error 804: forward
compatibility was attempted on non supported HW (Triggered internally at
../c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
File "/home/user/.local/bin/ns-train", line 5, in
from nerfstudio.scripts.train import entrypoint
File "/home/user/nerfstudio/nerfstudio/scripts/train.py", line 62, in
from nerfstudio.configs.method_configs import AnnotatedBaseConfigUnion
File "/home/user/nerfstudio/nerfstudio/configs/method_configs.py", line 55, in
from nerfstudio.field_components.temporal_distortions import TemporalDistortionKind
File "/home/user/nerfstudio/nerfstudio/field_components/init.py", line 17, in
from .encodings import Encoding as Encoding
File "/home/user/nerfstudio/nerfstudio/field_components/encodings.py", line 34, in
import tinycudann as tcnn
File "/home/user/.local/lib/python3.10/site-packages/tinycudann/init.py", line 9, in

from tinycudann.modules import free_temporary_memory, NetworkWithInputEncoding, Network,
Encoding
File "/home/user/.local/lib/python3.10/site-packages/tinycudann/modules.py", line 18, in

raise EnvironmentError("Unknown compute capability. Ensure PyTorch with CUDA support is
installed.")
OSError: Unknown compute capability. Ensure PyTorch with CUDA support is installed.

[07:33:09] ❌ Completion script generation failed: ['ns-export', '--tyro-print-completion', 'bash'] install.py:124
/home/user/.local/lib/python3.10/site-packages/torch/cuda/init.py:107: UserWarning: CUDA install.py:128
initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions
before calling NumCudaDevices() that might have already set an error? Error 804: forward
compatibility was attempted on non supported HW (Triggered internally at
../c10/cuda/CUDAFunctions.cpp:109.)
return torch._C._cuda_getDeviceCount() > 0
Traceback (most recent call last):
File "/home/user/.local/bin/ns-export", line 5, in
from nerfstudio.scripts.exporter import entrypoint
File "/home/user/nerfstudio/nerfstudio/scripts/exporter.py", line 46, in
from nerfstudio.fields.sdf_field import SDFField
File "/home/user/nerfstudio/nerfstudio/fields/sdf_field.py", line 31, in
from nerfstudio.field_components.embedding import Embedding
File "/home/user/nerfstudio/nerfstudio/field_components/init.py", line 17, in
from .encodings import Encoding as Encoding
File "/home/user/nerfstudio/nerfstudio/field_components/encodings.py", line 34, in
import tinycudann as tcnn
File "/home/user/.local/lib/python3.10/site-packages/tinycudann/init.py", line 9, in

from tinycudann.modules import free_temporary_memory, NetworkWithInputEncoding, Network,
Encoding
File "/home/user/.local/lib/python3.10/site-packages/tinycudann/modules.py", line 18, in

raise EnvironmentError("Unknown compute capability. Ensure PyTorch with CUDA support is
installed.")
OSError: Unknown compute capability. Ensure PyTorch with CUDA support is installed.

Traceback (most recent call last):
File "/home/user/.local/bin/ns-install-cli", line 8, in
sys.exit(entrypoint())
File "/home/user/nerfstudio/nerfstudio/scripts/completions/install.py", line 412, in entrypoint
tyro.cli(main, description=doc)
File "/home/user/.local/lib/python3.10/site-packages/tyro/_cli.py", line 177, in cli
output = _cli_impl(
File "/home/user/.local/lib/python3.10/site-packages/tyro/_cli.py", line 429, in _cli_impl
out, consumed_keywords = _calling.call_from_args(
File "/home/user/.local/lib/python3.10/site-packages/tyro/_calling.py", line 204, in call_from_args
return unwrapped_f(*positional_args, **kwargs), consumed_keywords # type: ignore
File "/home/user/nerfstudio/nerfstudio/scripts/completions/install.py", line 393, in main
_generate_completions_files(completions_dir, scripts_dir, shells_supported, shells_found)
File "/home/user/nerfstudio/nerfstudio/scripts/completions/install.py", line 324, in _generate_completions_files
completion_paths = list(
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
yield _result_or_cancel(fs.pop())
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
return fut.result(timeout)
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 458, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/home/user/nerfstudio/nerfstudio/scripts/completions/install.py", line 326, in
lambda path_or_entrypoint_and_shell: _generate_completion(
File "/home/user/nerfstudio/nerfstudio/scripts/completions/install.py", line 129, in _generate_completion
raise e
File "/home/user/nerfstudio/nerfstudio/scripts/completions/install.py", line 116, in _generate_completion
new = subprocess.run(
File "/usr/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ns-export', '--tyro-print-completion', 'bash']' returned non-zero exit status 1.

Zunhammer · 2023-06-09T07:52:55Z

It's not able to detect your GPU. This is usually caused by an outdated NVIDIA driver (it must at least be compatible with CUDA 11.8) or a misconfigured Docker setup. Try an NVIDIA Docker image and run nvidia-smi inside it to see whether your GPU is passed through correctly.
E.g.: docker run --gpus all --rm nvidia/cuda:11.8.0-devel-ubuntu22.04 nvidia-smi
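The same check can be scripted. The sketch below is only an illustration under a few assumptions: it queries `nvidia-smi` for the GPU name and driver version and compares the driver against a minimum that you supply. The helper names (`parse_gpu_rows`, `driver_at_least`, `query_gpus`) are made up for this example, and the minimum driver for a given CUDA release is deliberately left as a parameter; take it from NVIDIA's CUDA release notes rather than from this snippet.

```python
import shutil
import subprocess
from typing import List, Tuple


def parse_gpu_rows(csv_text: str) -> List[Tuple[str, str]]:
    """Parse 'name, driver_version' CSV rows into (name, driver) tuples."""
    rows = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        # Split on the last comma so commas inside a GPU name are preserved.
        name, driver = (part.strip() for part in line.rsplit(",", 1))
        rows.append((name, driver))
    return rows


def driver_at_least(driver: str, minimum: str) -> bool:
    """Compare dotted driver versions numerically, component by component."""
    def as_tuple(version: str) -> Tuple[int, ...]:
        return tuple(int(part) for part in version.split("."))
    return as_tuple(driver) >= as_tuple(minimum)


def query_gpus() -> List[Tuple[str, str]]:
    """Return visible GPUs, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_rows(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No GPU visible: check the driver and the Docker GPU setup.")
    for name, driver in gpus:
        print(f"{name}: driver {driver}")
```

Running it both on the host and inside the container helps separate a driver problem (the host fails too) from a Docker passthrough problem (only the container fails).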

aboutyy · 2023-06-09T09:52:34Z

> It's not able to detect your GPU. This is usually caused by an outdated NVIDIA driver (it must at least be compatible with CUDA 11.8) or a misconfigured Docker setup. Try an NVIDIA Docker image and run nvidia-smi inside it to see whether your GPU is passed through correctly. E.g.: docker run --gpus all --rm nvidia/cuda:11.8.0-devel-ubuntu22.04 nvidia-smi

sudo docker run --gpus all --rm nvidia/cuda:11.8.0-devel-ubuntu22.04 nvidia-smi

GPU is passed through correctly

CUDA Version 11.8.0

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

Zunhammer mentioned this issue Feb 1, 2023

[Fix] Docker not supporting older CUDA architectures #1328

Merged

Zunhammer closed this as completed in #1328 Feb 1, 2023

robinsonkwame mentioned this issue Feb 7, 2024

Docker dromni/nerfstudi ns-train crashes #2883

Closed
