AWS: Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]] #70
Comments
Drivers are required. Do you have the NVIDIA Container Toolkit installed? Try running this command:
docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
If it errors, your drivers or kernel modules are not set up correctly.
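A minimal Python sketch of the same check, for anyone who wants to script it. It assumes the docker CLI is on PATH and reuses the CUDA image tag from the comment above. The helper names and the "missing-toolkit" label are hypothetical, and the classification only looks for the specific daemon error reported in this issue.

```python
import subprocess

# Image tag copied from the comment above; any CUDA base image you can pull should work.
TEST_IMAGE = "nvidia/cuda:11.0.3-base-ubuntu20.04"
# Substring of the daemon error reported in this issue.
MISSING_DRIVER_ERROR = 'could not select device driver "nvidia"'


def gpu_smoke_test_command(image: str = TEST_IMAGE) -> list[str]:
    """Argv for: docker run --rm --gpus all <image> nvidia-smi"""
    return ["docker", "run", "--rm", "--gpus", "all", image, "nvidia-smi"]


def classify_result(returncode: int, stderr: str) -> str:
    """Turn the outcome of the smoke test into a short diagnosis."""
    if returncode == 0:
        return "ok"
    if MISSING_DRIVER_ERROR in stderr:
        # Docker itself has no "nvidia" device driver registered.
        return "missing-toolkit"
    # Anything else: driver/kernel module problems, image pull failures, etc.
    return "other-error"


def run_smoke_test(image: str = TEST_IMAGE) -> str:
    """Run the container check on the GPU host and classify the result."""
    proc = subprocess.run(
        gpu_smoke_test_command(image), capture_output=True, text=True
    )
    return classify_result(proc.returncode, proc.stderr)
```

Run run_smoke_test() on the GPU host: "ok" means the container can see the GPU, while "missing-toolkit" points at the NVIDIA Container Toolkit rather than the drivers.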
You're right! For those who stumble on this thread via search, here's the only solution that worked for me: when using one of AWS's GPU-enabled VMs, you must use one of their Deep Learning OS images (AMIs). These have the right drivers, the toolkit, and everything else already installed and optimized, and they come in Ubuntu and Amazon Linux flavors. Once I re-imaged with one of those, everything worked like a charm: a batch of eight 704x704 images at 130 steps takes about 300 seconds, roughly a 6-8x improvement over an 8GB GeForce GTX 1080.
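If you launch instances programmatically, a sketch like the one below can look up a recent Deep Learning AMI with boto3's EC2 describe_images call. It assumes AWS credentials with permission to describe images; the name pattern and helper names are assumptions for illustration, not AWS-defined values, so check the exact AMI names for your region in the EC2 console.

```python
from typing import Optional

# Hypothetical name pattern; verify the actual Deep Learning AMI names for your region.
DLAMI_NAME_PATTERN = "Deep Learning*Ubuntu*"


def newest_image(images: list[dict]) -> Optional[dict]:
    """Pick the most recently created image from a describe_images result."""
    if not images:
        return None
    # CreationDate is an ISO 8601 timestamp string, so string order matches time order.
    return max(images, key=lambda img: img["CreationDate"])


def find_deep_learning_ami(region: str, pattern: str = DLAMI_NAME_PATTERN) -> Optional[dict]:
    """Look up the newest Amazon-owned AMI whose name matches `pattern`."""
    import boto3  # imported here so newest_image() works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.describe_images(
        Owners=["amazon"],
        Filters=[{"Name": "name", "Values": [pattern]}],
    )
    return newest_image(resp["Images"])
```

For example, find_deep_learning_ami("us-east-1") returns the matching image dict (with "ImageId" and "Name" keys), or None if nothing matched the pattern.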
@gregwbrooks Glad it worked! I have added this to the FAQ.
I had this issue on Arch Linux; installing nvidia-container-toolkit from the AUR and then restarting docker.service fixed it.
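Why the restart matters: as far as I can tell from Docker's source, the daemon registers its "nvidia" device driver only at startup, and only if the nvidia-container-runtime-hook binary (shipped with the NVIDIA Container Toolkit) is on its PATH. The sketch below checks for that binary and for nvidia-smi. The binary names are the usual ones but treat them as assumptions for your distro, and note that the daemon's PATH may differ from your shell's.

```python
import shutil
from typing import Callable, Optional

# Binaries assumed to be installed by the NVIDIA driver and the NVIDIA Container Toolkit.
REQUIRED_BINARIES = {
    "nvidia-smi": "NVIDIA driver utilities",
    "nvidia-container-runtime-hook": "NVIDIA Container Toolkit",
}


def missing_components(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the components whose binary is not found on PATH.

    `which` is injectable so the check can be tested without a GPU host.
    """
    return [
        component
        for binary, component in REQUIRED_BINARIES.items()
        if which(binary) is None
    ]
```

If missing_components() returns an empty list but Docker still reports the error, restarting the daemon (sudo systemctl restart docker) should let it re-detect the toolkit.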
You can also use the standard Ubuntu 22.04 image on AWS; you just need to install the NVIDIA drivers and the container toolkit yourself, following the steps in the NVIDIA docs.
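NVIDIA's install guide ends by registering the toolkit with Docker, normally via "sudo nvidia-ctk runtime configure --runtime=docker" followed by a Docker restart. The sketch below shows roughly what that edit to /etc/docker/daemon.json amounts to; it is an illustration under that assumption, not a replacement for nvidia-ctk, and the helper names are hypothetical.

```python
import json


def add_nvidia_runtime(config: dict) -> dict:
    """Return a copy of a daemon.json config with an "nvidia" runtime registered.

    Existing keys and other runtimes are preserved; the input is not mutated.
    The runtime entry mirrors the shape nvidia-ctk is assumed to write.
    """
    updated = dict(config)
    runtimes = dict(updated.get("runtimes", {}))
    runtimes["nvidia"] = {"path": "nvidia-container-runtime", "args": []}
    updated["runtimes"] = runtimes
    return updated


def patch_daemon_json(path: str = "/etc/docker/daemon.json") -> None:
    """Rewrite daemon.json in place (needs root); restart Docker afterwards."""
    try:
        with open(path) as f:
            config = json.load(f)
    except FileNotFoundError:
        config = {}
    with open(path, "w") as f:
        json.dump(add_nvidia_runtime(config), f, indent=4)
```

As far as I know, the --gpus flag mainly depends on the toolkit being installed and Docker being restarted; the runtime entry additionally makes --runtime=nvidia available.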
Has this issue been opened before? Check the FAQ, the issues
Describe the bug
Building either hlky or auto on an AWS g5.xlarge instance (NVIDIA A10G Tensor Core GPU) fails at the final step with:
Attaching to webui-docker-automatic1111-1
Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]
This happens after step 12 of the build, and the web interface does not launch after the error. The error persists after rebooting and rebuilding the images.
The behavior reproduces on both Debian 11 and Ubuntu 22.04, with and without installing the NVIDIA drivers per https://levelup.gitconnected.com/how-to-install-an-nvidia-gpu-driver-on-an-aws-ec2-instance-20185c1c578c (This shouldn't have been necessary in any case, since AWS g5 instances include the appropriate NVIDIA drivers.)
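Before rebuilding the images, it can help to confirm that the host driver sees the A10G at all, independent of Docker. A minimal sketch, assuming nvidia-smi is installed on the host, uses its CSV query mode; the function names are hypothetical.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_list(output: str) -> list[tuple[str, str]]:
    """Parse 'name, driver_version' lines into (name, driver_version) tuples."""
    gpus = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so the driver version is always the second field.
        parts = [field.strip() for field in line.rsplit(",", 1)]
        if len(parts) == 2:
            gpus.append((parts[0], parts[1]))
    return gpus


def host_gpus() -> list[tuple[str, str]]:
    """Run nvidia-smi on the host; raises if the binary is missing or fails."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_list(out)
```

On a host with a working driver, host_gpus() should list the A10G; a FileNotFoundError or CalledProcessError suggests the host driver is missing or not loaded, in which case Docker's GPU support cannot work either.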
Which UI
hlky or auto
Steps to Reproduce
Hardware / Software: