AWS: Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]] #70

gregwbrooks · 2022-09-15T21:28:49Z

Has this issue been opened before? Check the FAQ, the issues

Describe the bug
The build of hlky or auto fails in the final step with:
Attaching to webui-docker-automatic1111-1
Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]
This happens on an AWS g5.xlarge with an Nvidia A10G Tensor Core GPU.

This occurs after step 12 of the build, and the web interface will not launch following the error. The error is still present after a reboot and a rebuild of the images.

The behavior is the same on Debian 11 and Ubuntu 22.04, both with and without adding Nvidia drivers per https://levelup.gitconnected.com/how-to-install-an-nvidia-gpu-driver-on-an-aws-ec2-instance-20185c1c578c (This shouldn't have been necessary in any case, since AWS g5 instances include the appropriate Nvidia drivers.)

Which UI
hlky or auto

Steps to Reproduce

Install per instructions on AWS g5.xlarge
Error will show on shell screen after step 12/12 of the auto build

Hardware / Software:

OS: Debian 11 and Ubuntu 22.04
RAM: 16GB
GPU: Nvidia A10G Tensor Core
VRAM: 24GB
Docker Version, Docker compose version: Latest
Release version [e.g. 1.0.1]: Latest

AbdBarho · 2022-09-16T03:53:11Z

Drivers are required. Do you have nvidia container toolkit installed?

Try running this command:

docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

If it errors, then your drivers / kernel modules are not set up correctly.
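
If that command fails, it can help to narrow down which layer is missing before reinstalling anything. The following is a minimal sketch, not something from the maintainers: `have` and `check_gpu_stack` are hypothetical helper names, and it assumes the container toolkit ships the `nvidia-container-cli` binary on your distribution.

```shell
# Hypothetical helper: true if a command is on the PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Hypothetical helper: print one status line for the host driver
# and one for the container toolkit.
check_gpu_stack() {
  if ! have nvidia-smi; then
    echo "host driver: nvidia-smi not found"
  elif ! nvidia-smi >/dev/null 2>&1; then
    echo "host driver: nvidia-smi failed"
  else
    echo "host driver: ok"
  fi

  # Assumption: the toolkit installs nvidia-container-cli.
  if have nvidia-container-cli; then
    echo "container toolkit: ok"
  else
    echo "container toolkit: nvidia-container-cli not found"
  fi
}

check_gpu_stack
```

If the host driver line is fine but the toolkit line is not, the "could not select device driver" error most likely comes from the missing toolkit rather than from the driver itself.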

gregwbrooks · 2022-09-16T15:46:08Z

You're right! For those who might stumble on this thread via search, here's the only solution that worked: When using one of AWS's GPU-enabled VMs, you must use one of their Deep Learning OS images. These have the right drivers, the toolkit and all the rest already installed and optimized. They are available in Ubuntu and Amazon Linux flavors.

Once I re-imaged with that, everything worked like a charm, and a batch of eight 704x704 images at 130 steps takes maybe 300 seconds. That's about a 6-8x improvement over an 8GB GeForce GTX 1080.

AbdBarho · 2022-09-16T16:14:10Z

@gregwbrooks Glad it worked! I have added this to the FAQ

https://github.com/AbdBarho/stable-diffusion-webui-docker/wiki/FAQ#running-in-aws

dathide · 2022-09-18T23:23:54Z

I had this issue on Arch Linux. Installing nvidia-container-toolkit from the AUR and then restarting docker.service fixed it.

samos123 · 2022-11-17T18:55:10Z

You can also use the standard Ubuntu 22.04 image on AWS. You just need to install the Nvidia drivers and the container toolkit, like this:

# install nvidia drivers
distribution=$(. /etc/os-release;echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb    
sudo apt update
sudo apt install -y cuda-drivers
                                                                  
# Install nvidia toolkit
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt update
sudo apt install -y nvidia-docker2

# Test that nvidia can be accessed from container
sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

Steps taken from nvidia docs
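
One step the commands above do not show: Docker usually has to be restarted after installing nvidia-docker2 so that it picks up the new runtime (dathide reports the same on Arch). A hedged sketch, assuming a systemd-based host; `run`, `restart_and_verify`, and the `DRY_RUN` variable are illustrative names, not part of any tool.

```shell
# Illustrative wrapper: print the command when DRY_RUN is set,
# otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Restart the Docker daemon, then repeat the GPU test from the thread.
restart_and_verify() {
  run sudo systemctl restart docker &&
    run sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi
}
```

Call `restart_and_verify` to run both steps, or `DRY_RUN=1 restart_and_verify` to only print the commands it would run.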

gregwbrooks added the bug Something isn't working label Sep 15, 2022

AbdBarho added the awaiting-response Waiting for the issuer to respond label Sep 16, 2022

AbdBarho closed this as completed Sep 16, 2022

AbdBarho changed the title ~~Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]~~ AWS: Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]] Sep 16, 2022

AbdBarho removed the awaiting-response Waiting for the issuer to respond label Sep 16, 2022

AbdBarho mentioned this issue Sep 19, 2022

Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]] #81

Closed

JohnTigue mentioned this issue Feb 9, 2023

Monolith Docker image MountaintopLotus/braintrust#70

Open

JohnTigue mentioned this issue May 21, 2023

Automatic1111 on Docker on EC2 MountaintopLotus/braintrust_automatic1111#1

Open
