Add support for nvidia driver version 535.161.08 #10605

Open
dawndrain opened this issue Jul 2, 2024 · 2 comments
Labels
area: gpu Issue related to sandboxed GPU access type: enhancement New feature or request

Comments

@dawndrain

Description

I want to use gVisor with my A100 GPUs.
When I follow the instructions at https://gvisor.dev/docs/user_guide/gpu/ and run:

runsc nvproxy list-supported-drivers

I see:

535.161.07
550.54.14
550.54.15
535.104.12
535.129.03
535.154.05

Unfortunately, the closest supported version, 535.161.07, is one patch release behind 535.161.08, the version that my GPUs are currently running.
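For anyone hitting the same mismatch, here is a minimal sketch of how one might compare the host driver version against the list printed above. The helper names (`parse_version`, `closest_supported`) are hypothetical and not part of runsc; the sketch assumes three-component version strings and treats versions sharing the first two components as the same branch. Being in the same branch does not guarantee ABI equivalence.

```python
from typing import List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted driver version such as '535.161.08' into integers."""
    return tuple(int(part) for part in version.strip().split("."))


def closest_supported(host: str, supported: List[str]) -> Optional[str]:
    """Return `host` if it is supported; otherwise the supported version in the
    same major.minor branch with the nearest patch number, or None if the
    branch has no supported versions at all."""
    if host in supported:
        return host
    host_key = parse_version(host)
    same_branch = [v for v in supported if parse_version(v)[:2] == host_key[:2]]
    if not same_branch:
        return None
    return min(same_branch, key=lambda v: abs(parse_version(v)[2] - host_key[2]))


# Output of `runsc nvproxy list-supported-drivers` as quoted in this issue.
SUPPORTED = """
535.161.07
550.54.14
550.54.15
535.104.12
535.129.03
535.154.05
""".split()

print(closest_supported("535.161.08", SUPPORTED))
```

With the list above, the only supported version in the 535.161 branch is 535.161.07, so that is the candidate this sketch picks.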
If I run:

sudo runsc --nvproxy --debug --debug-log=/tmp/runsc.log run -bundle / my_container

and cat the error log I see:

I0701 16:17:05.097581  2939814 nvproxy.go:35] NVIDIA driver version: 535.161.08
W0701 16:17:05.097621  2939814 util.go:64] FATAL ERROR: creating loader: registering filesystems: registering nvproxy driver: unsupported Nvidia driver version: 535.161.08
creating loader: registering filesystems: registering nvproxy driver: unsupported Nvidia driver version: 535.161.08
unable to read from the sync descriptor: 0, error EOF

In other words, the difference between .08 and .07 actually matters.
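The rejected version can also be pulled out of the debug log programmatically. This is a rough sketch that assumes the error message keeps the exact wording shown in the log above; the function name `unsupported_version` is made up for illustration.

```python
import re
from typing import Optional

# Matches the error text quoted in this issue; the wording is an assumption
# about the log format and may differ between runsc releases.
UNSUPPORTED_RE = re.compile(r"unsupported Nvidia driver version: (\d+(?:\.\d+)+)")


def unsupported_version(log_text: str) -> Optional[str]:
    """Return the driver version that nvproxy rejected, or None if absent."""
    match = UNSUPPORTED_RE.search(log_text)
    return match.group(1) if match else None


# Log excerpt copied from this issue.
LOG = (
    "I0701 16:17:05.097581  2939814 nvproxy.go:35] NVIDIA driver version: 535.161.08\n"
    "W0701 16:17:05.097621  2939814 util.go:64] FATAL ERROR: creating loader: "
    "registering filesystems: registering nvproxy driver: "
    "unsupported Nvidia driver version: 535.161.08\n"
)

print(unsupported_version(LOG))
```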

Presumably the solution is similar to https://github.com/google/gvisor/pull/10181/files

Is this feature related to a specific bug?

No response

Do you have a specific solution in mind?

Presumably the solution is similar to https://github.com/google/gvisor/pull/10181/files

@dawndrain dawndrain added the type: enhancement New feature or request label Jul 2, 2024
@EtiennePerot
Contributor

Note that if the two driver versions are ABI-equivalent, you can set the --nvproxy-driver-version flag to the NVIDIA driver version that gVisor does support and it will override this version-detection code.

@ayushr2
Collaborator

ayushr2 commented Jul 3, 2024

As per https://gvisor.dev/docs/user_guide/gpu/#driver-versions, our policy is to add support only for driver versions used by COS, which is used in GKE.

We do have support for 535.161.07. Assuming no breaking changes have occurred between 535.161.07 and 535.161.08, you could try setting runsc flag --nvproxy-driver-version=535.161.07 as Etienne suggested.
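As a sketch of what that looks like in practice, the snippet below rebuilds the command from the original report with the override flag added. It only assembles and prints the command line; it does not run runsc. The function name `runsc_command` is hypothetical, and placing `--nvproxy-driver-version` alongside the other flags before `run` is an assumption. The override is only safe if the two driver versions really are ABI-equivalent.

```python
import shlex
from typing import List


def runsc_command(
    override_version: str,
    bundle: str = "/",
    container: str = "my_container",
    debug_log: str = "/tmp/runsc.log",
) -> List[str]:
    """Build the runsc invocation from the issue, adding the driver override."""
    return [
        "sudo", "runsc",
        "--nvproxy",
        # Tell nvproxy to behave as if the host ran this supported version.
        f"--nvproxy-driver-version={override_version}",
        "--debug", f"--debug-log={debug_log}",
        "run", "-bundle", bundle, container,
    ]


cmd = runsc_command("535.161.07")
print(shlex.join(cmd))
```

If the container then fails on GPU calls, the two versions are likely not ABI-equivalent and the override should be dropped.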
