AWS: Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]] #70

Closed
gregwbrooks opened this issue Sep 15, 2022 · 5 comments
Labels
bug Something isn't working

Comments

@gregwbrooks

Has this issue been opened before? Check the FAQ and the issues.

Describe the bug
The build of hlky or auto fails at the final step on an AWS g5.xlarge instance (AWS Nvidia A10G Tensor Core GPU) with:

Attaching to webui-docker-automatic1111-1
Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]

This happens after step 12 of the build, and the web interface does not launch after the error. The error persists after a reboot and a rebuild of the images.

The behavior is reproducible on Debian 11 and Ubuntu 22.04, both with and without installing the Nvidia drivers per https://levelup.gitconnected.com/how-to-install-an-nvidia-gpu-driver-on-an-aws-ec2-instance-20185c1c578c (this shouldn't have been necessary in any case, since AWS g5 instances include the appropriate Nvidia drivers).
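If you want to test the assumption that the instance already has a working driver, the NVIDIA kernel module exposes its version at `/proc/driver/nvidia/version` once it is loaded. Below is a minimal sketch for a POSIX shell; the function name `nvidia_driver_loaded` is hypothetical, and its optional path argument exists only so the check can be pointed at a test file:

```shell
# Hypothetical helper: succeeds if the NVIDIA kernel driver appears to be loaded.
# The loaded driver exposes its version at /proc/driver/nvidia/version;
# the optional first argument overrides that path (e.g. for testing).
nvidia_driver_loaded() {
    version_file="${1:-/proc/driver/nvidia/version}"
    if [ -r "$version_file" ]; then
        # The first line names the driver version
        head -n 1 "$version_file"
        return 0
    fi
    echo "NVIDIA kernel driver not loaded (no $version_file)" >&2
    return 1
}
```

For example, `nvidia_driver_loaded || echo "install the driver first"` prints the driver version line on a correctly set up host.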

Which UI
hlky or auto

Steps to Reproduce

  1. Install per the instructions on an AWS g5.xlarge instance.
  2. The error appears in the shell after step 12/12 of the auto build.

Hardware / Software:

  • OS: Debian 11 and Ubuntu 22.04
  • RAM: 16GB
  • GPU: Nvidia A10G Tensor Core
  • VRAM: 24GB
  • Docker version, Docker Compose version: Latest
  • Release version: Latest
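Rather than "Latest", it can help to paste exact versions into a report like this one. A minimal sketch for a POSIX shell; the helper names `probe` and `collect_versions` are hypothetical, and each line falls back to `n/a` when a tool is missing or its command fails:

```shell
# Hypothetical helper: print "label: first line of output" for a command,
# or "label: n/a" if the command is missing or exits with an error.
probe() {
    label="$1"; shift
    if command -v "$1" >/dev/null 2>&1 && out=$("$@" 2>/dev/null); then
        printf '%s: %s\n' "$label" "$(printf '%s\n' "$out" | head -n 1)"
    else
        printf '%s: n/a\n' "$label"
    fi
}

# Hypothetical helper: collect the versions asked for in the bug template.
collect_versions() {
    probe "OS" sh -c '. /etc/os-release && echo "$PRETTY_NAME"'
    probe "Docker" docker --version
    probe "Docker Compose" docker compose version
    # Name, driver version and total VRAM of the first GPU
    probe "GPU / driver / VRAM" nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
}
```

Running `collect_versions` prints one `label: value` line per item, for example `Docker: n/a` on a machine without Docker.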
gregwbrooks added the bug label (Something isn't working) on Sep 15, 2022
@AbdBarho
Owner

AbdBarho commented Sep 16, 2022

Drivers are required. Do you have the NVIDIA Container Toolkit installed?

Try running this command:

docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi

If it errors, then your drivers / kernel modules are not set up correctly.
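The command above tests the whole chain at once. If it fails, checking each layer separately can show which piece is missing. A hedged sketch; `check_layer` and `check_gpu_stack` are hypothetical names, and it assumes that `nvidia-container-cli` (installed alongside the NVIDIA Container Toolkit) being on the PATH is a reasonable sign that the toolkit is present:

```shell
# Hypothetical helper: run a command quietly and report OK/FAIL for one layer.
check_layer() {
    desc="$1"; shift
    if "$@" >/dev/null 2>&1; then
        echo "OK   $desc"
    else
        echo "FAIL $desc"
        return 1
    fi
}

# Hypothetical helper: check host driver, toolkit, then Docker GPU access,
# stopping at the first layer that fails.
check_gpu_stack() {
    check_layer "host driver (nvidia-smi)" nvidia-smi || return 1
    # Assumption: nvidia-container-cli ships with the container toolkit
    check_layer "container toolkit (nvidia-container-cli on PATH)" \
        command -v nvidia-container-cli || return 1
    check_layer "GPU visible inside a container" \
        docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
}
```

`check_gpu_stack` stops at the first `FAIL` line, so a failure on the first layer points at the host driver rather than at Docker.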

AbdBarho added the awaiting-response label (Waiting for the issuer to respond) on Sep 16, 2022
@gregwbrooks
Author

You're right! For those who might stumble on this thread via search, here's the only solution that worked: when using one of AWS's GPU-enabled VMs, you must use one of their Deep Learning OS images. These have the right drivers, the toolkit, and everything else already installed and optimized. They are available in Ubuntu and Amazon Linux flavors.
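To find such an image from the command line, the AWS CLI's `aws ec2 describe-images` can filter Amazon-owned AMIs by name. This is a sketch, not a definitive recipe: the name pattern `Deep Learning*GPU*Ubuntu*` is an assumption (check the current AMI names in the EC2 console first), and `latest_dlami` is a hypothetical helper:

```shell
# Hypothetical helper: print the ID and name of the newest available
# Amazon-owned AMI whose name matches a pattern.
# Assumption: the default pattern matches the Deep Learning AMIs; verify it.
# Requires the AWS CLI with credentials and a default region configured.
latest_dlami() {
    pattern="${1:-Deep Learning*GPU*Ubuntu*}"
    aws ec2 describe-images \
        --owners amazon \
        --filters "Name=name,Values=$pattern" "Name=state,Values=available" \
        --query 'sort_by(Images, &CreationDate)[-1].[ImageId, Name]' \
        --output text
}
```

`latest_dlami` prints the AMI ID and name; the ID can then be passed to `aws ec2 run-instances --image-id <ami-id> --instance-type g5.xlarge`.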

Once I re-imaged with that, everything worked like a charm: a batch of eight 704x704 images at 130 steps takes roughly 300 seconds. That's about a 6-8x improvement over an 8GB GeForce GTX 1080.
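Working the reported figures through (these are the approximate numbers from the comment above, not new measurements), the throughput in image-steps is:

$$
\frac{8 \times 130\ \text{steps}}{300\ \text{s}} = \frac{1040\ \text{steps}}{300\ \text{s}} \approx 3.5\ \text{steps/s}
$$

A 6-8x speedup then implies the same batch took roughly $6 \times 300 = 1800$ s to $8 \times 300 = 2400$ s (30 to 40 minutes) on the GTX 1080.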

@AbdBarho
Owner

@gregwbrooks Glad it worked! I have added this to the FAQ

https://github.com/AbdBarho/stable-diffusion-webui-docker/wiki/FAQ#running-in-aws

AbdBarho changed the title from "Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]" to "AWS: Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]" on Sep 16, 2022
AbdBarho removed the awaiting-response label (Waiting for the issuer to respond) on Sep 16, 2022
@dathide

dathide commented Sep 18, 2022

I had this issue on Arch Linux; installing nvidia-container-toolkit from the AUR and then restarting docker.service fixed it.
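The Arch Linux fix above, written out as a sketch. It assumes an AUR helper such as `yay` that accepts pacman-style `-S --needed`; the `AUR_HELPER` variable and the `install_toolkit_arch` function are hypothetical names used for illustration:

```shell
# Hypothetical helper: install nvidia-container-toolkit from the AUR, then
# restart Docker so the daemon picks up the NVIDIA tooling.
# Assumption: an AUR helper (yay by default) is installed; run as a normal
# user, since AUR helpers refuse to build packages as root.
install_toolkit_arch() {
    helper="${AUR_HELPER:-yay}"
    "$helper" -S --needed nvidia-container-toolkit || return 1
    sudo systemctl restart docker.service
}
```

Afterwards, the `nvidia-smi` container test from earlier in the thread should succeed.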

@samos123

You can also use the standard Ubuntu 22.04 image on AWS; you just need to install the NVIDIA drivers and the container toolkit, like this:

# Install the NVIDIA drivers from NVIDIA's CUDA repository
distribution=$(. /etc/os-release; echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt update
sudo apt install -y cuda-drivers

# Install the NVIDIA Container Toolkit (add the signing key and apt repository)
distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
            sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
            sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo apt update
sudo apt install -y nvidia-docker2

# Restart Docker so the daemon picks up the NVIDIA configuration
sudo systemctl restart docker

# Test that the GPU can be accessed from a container
sudo docker run --rm --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

Steps taken from the NVIDIA docs.
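After these steps, one complementary check is whether the running Docker daemon lists the `nvidia` runtime that the nvidia-docker2 package registers in `/etc/docker/daemon.json`; if it does not, the daemon has probably not been restarted since the install. A minimal sketch; `has_nvidia_runtime` is a hypothetical helper that reads `docker info` output on standard input:

```shell
# Hypothetical helper: succeed if `docker info` output (read from stdin)
# lists an "nvidia" runtime on its "Runtimes:" line, which looks like:
#   Runtimes: io.containerd.runc.v2 nvidia runc
has_nvidia_runtime() {
    grep -E '^[[:space:]]*Runtimes:' | grep -qw nvidia
}
```

For example: `docker info 2>/dev/null | has_nvidia_runtime || sudo systemctl restart docker`. The `nvidia-smi` container test above remains the end-to-end check.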
