No CUDA device found; using CPU as fallback. #609

Open
Fedomer opened this issue Oct 4, 2024 · 9 comments

Comments

@Fedomer

Fedomer commented Oct 4, 2024

On first use, the very first cell [1] (GPU Configuration and Imports) of the Sionna_Ray_Tracing_Introduction tutorial did not find the GPU and printed:
No CUDA device found; using CPU as fallback.

but !nvidia-smi prints:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-PCIE-40GB          Off |   00000000:01:00.0 Off |                    0 |
| N/A   42C    P0             37W /  250W |     425MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-PCIE-40GB          Off |   00000000:81:00.0 Off |                    0 |
| N/A   51C    P0             47W /  250W |       1MiB /  40960MiB |      5%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

I use a container built from the Docker image.
Other RAPIDS Docker images work fine.
Could this be a driver problem?

@merlinND
Collaborator

merlinND commented Oct 7, 2024

Hello @Fedomer,

Sionna uses Mitsuba for its ray tracing capabilities, which itself uses OptiX under the hood.
For OptiX to load, the Docker container needs to enable support for it. I am not a Docker expert, but I think that enabling the graphics driver capability should help: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html#driver-capabilities
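
As a quick way to see whether the driver libraries are visible inside the container, the sketch below tries to open them with ctypes. The library names libcuda.so.1 and libnvoptix.so.1 are assumptions (they are not mentioned in this thread), and a successful load does not by itself guarantee that Sionna will pick up the GPU.

```python
import ctypes

# Assumed shared-library names; they are not taken from this thread.
LIBRARIES = {
    "CUDA driver": "libcuda.so.1",
    "OptiX": "libnvoptix.so.1",
}


def can_load(name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


def check_libraries(libraries=LIBRARIES):
    """Map each label to whether its library could be opened."""
    return {label: can_load(name) for label, name in libraries.items()}


if __name__ == "__main__":
    for label, ok in check_libraries().items():
        print(f"{label}: {'found' if ok else 'NOT found'}")
```

If the CUDA driver library loads but the OptiX one does not, that would be consistent with a container that exposes compute but not graphics capabilities.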

@Fedomer
Author

Fedomer commented Oct 7, 2024

Hello @merlinND,
Thank you, I did it. I created my container following the tutorial:
podman container create --name Sionna --device nvidia.com/gpu=all -it -p 8888:8888 --privileged=true --env NVIDIA_DRIVER_CAPABILITIES=graphics,compute,utility localhost/sionna:latest

NB: Podman accepts Docker's flags and works fine with the RAPIDS images.
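
To confirm from inside the container that the value passed with --env actually arrived, a minimal sketch like the one below can be run there. The set of required capabilities is an assumption taken from the command above. The fallback to utility,compute for an unset or empty variable follows the NVIDIA Container Toolkit documentation as I understand it.

```python
import os

# Capabilities assumed to be needed here, copied from the command above.
REQUIRED = {"graphics", "compute", "utility"}


def missing_capabilities(value, required=REQUIRED):
    """Return the required capabilities absent from an NVIDIA_DRIVER_CAPABILITIES value."""
    # An unset or empty variable is treated as the documented default.
    if not value:
        value = "utility,compute"
    granted = {item.strip() for item in value.split(",") if item.strip()}
    # "all" enables every capability.
    if "all" in granted:
        return set()
    return set(required) - granted


if __name__ == "__main__":
    value = os.environ.get("NVIDIA_DRIVER_CAPABILITIES")
    missing = missing_capabilities(value)
    print("NVIDIA_DRIVER_CAPABILITIES =", value)
    print("missing:", sorted(missing) if missing else "none")
```

This only reports what the environment variable says. Whether the container engine acted on it is a separate question.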

@merlinND
Collaborator

merlinND commented Oct 7, 2024

Glad it worked!

merlinND closed this as completed Oct 7, 2024
@Fedomer
Author

Fedomer commented Oct 7, 2024

Hello @merlinND,
I did it, but it didn't work!
I'm still investigating. I will try on a different machine with a different OS (Ubuntu 20.04; right now I use Red Hat Enterprise Linux 9.4 with Podman).

The "No CUDA device found;" message appears when I run: import sionna

merlinND reopened this Oct 7, 2024
@gmarcusm
Collaborator

gmarcusm commented Oct 7, 2024

Could you please run this inside the Docker container and share the result?

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

@Fedomer
Author

Fedomer commented Oct 7, 2024

Hi @gmarcusm thanks,
# python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
2024-10-07 16:48:56.624726: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-07 16:48:56.624791: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-07 16:48:56.626089: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-07 16:48:56.632935: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU')]

Also with import sionna:
# python3
Python 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sionna
2024-10-07 16:59:29.563043: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-10-07 16:59:29.563161: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-10-07 16:59:29.564489: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-10-07 16:59:29.571596: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
No CUDA device found; using CPU as fallback.

It seems that TensorFlow is not GPU-enabled, even though this is the official build from the provided Dockerfile.
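
The list printed by the first command contains both GPUs, so TensorFlow itself appears to see them. One way to narrow this down is to test TensorFlow and Mitsuba separately. The sketch below assumes the fallback message comes from the ray tracing side (Mitsuba/OptiX, as mentioned earlier in this thread). The variant name cuda_ad_rgb is an assumption and may differ from the variant Sionna actually selects.

```python
def tensorflow_gpus():
    """Names of the GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


def mitsuba_cuda_ok(variant="cuda_ad_rgb"):
    """True if Mitsuba activates the CUDA variant, False if it fails, None if Mitsuba is not installed."""
    try:
        import mitsuba as mi
    except ImportError:
        return None
    try:
        mi.set_variant(variant)
        return True
    except Exception as exc:  # the exact exception type is not assumed here
        print("Mitsuba could not activate", variant, "->", exc)
        return False


if __name__ == "__main__":
    print("TensorFlow GPUs:", tensorflow_gpus())
    print("Mitsuba CUDA variant usable:", mitsuba_cuda_ok())
```

If TensorFlow lists the GPUs while the Mitsuba check fails, the problem is more likely in how the container exposes OptiX than in the TensorFlow build.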

@Fedomer
Author

Fedomer commented Oct 8, 2024

Update

The Docker container seems to load the sionna package fine (with CUDA) on a computer running Ubuntu 20.04 LTS with an NVIDIA A5000 card and this driver:
| NVIDIA-SMI 470.256.02 Driver Version: 470.256.02 CUDA Version: 12.3 |
but it shows this strange issue on a GPU rack server with dual A100 GPUs running Red Hat Enterprise Linux 9.4 with Podman as the container engine.
The driver on RHEL 9.4 is:
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |

Other containers with a more recent TensorFlow, and the RAPIDS ones, work fine.

Still investigating...

... after some investigation:
It seems that the problem lies with the Podman container engine. On a different Linux distribution with Docker, the environment works! Contacting Red Hat about this issue is the next step. Still investigating.

@csankar69

Hi @Fedomer, did you solve this issue? Were you able to get it to work on Red Hat Linux? I am facing the same problem. TF by itself is able to find a GPU, but once I pip install sionna it can no longer find one. I am not sure whether Sionna downgrades the TF version and breaks things in the process.

@Fedomer
Author

Fedomer commented Jan 9, 2025

Hello @csankar69, for the moment I'm using Docker, because I think the problem is either Podman's GPU management or a bad configuration for Sionna. I'm waiting for a new server with Red Hat and will try again. Red Hat couldn't solve the problem.
