NAS-129905 / 24.10 / Mark GPU as critical if it has devices which are in an iommu group which has a critical device #13976

Merged
merged 1 commit into from
Jul 12, 2024

Contributor

@Qubad786 Qubad786 commented Jul 4, 2024

Problem

When a GPU is passed through to a VM, every IOMMU group that contains one of the GPU's devices must be isolated in full, including any devices in those groups that are not GPU related.
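As a rough illustration of that requirement, the sketch below collects every device that would have to be isolated for a given GPU. It assumes the standard Linux sysfs layout (`/sys/kernel/iommu_groups/<group>/devices/`); the function names `read_iommu_groups` and `devices_to_isolate` are hypothetical and are not taken from the middleware code in this PR.

```python
from pathlib import Path


def read_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    root_path = Path(root)
    if not root_path.is_dir():
        # No IOMMU groups exposed (IOMMU disabled or path not present).
        return groups
    for group_dir in root_path.iterdir():
        devices_dir = group_dir / "devices"
        if devices_dir.is_dir():
            groups[group_dir.name] = sorted(p.name for p in devices_dir.iterdir())
    return groups


def devices_to_isolate(gpu_addresses, groups):
    """Return every device sharing an IOMMU group with any of the GPU's devices."""
    gpu_addresses = set(gpu_addresses)
    to_isolate = set()
    for members in groups.values():
        if gpu_addresses.intersection(members):
            # The whole group goes with the GPU, GPU related or not.
            to_isolate.update(members)
    return sorted(to_isolate)
```

Calling `devices_to_isolate(["0000:01:00.0"], read_iommu_groups())` would list the GPU device together with anything else that sits in the same group, which is exactly the set that passthrough takes away from the host.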

Until now, validation refused to isolate a GPU only if one of its own devices was critical for the system (CPU, memory, etc.). However, this leaves the following scenario uncovered:

A GPU has a device in an IOMMU group that also contains a critical device. When that GPU is configured for passthrough and an attempt is made to start the VM, the result is a crash.

Solution

Properly mark a GPU as critical in the case described above, so that isolating such GPUs is not allowed in the first place. In addition, a descriptive critical reason has been added to clarify why the GPU has been marked as critical.
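A minimal sketch of the check described here, not the actual change in this PR: the function name `gpu_critical_reason`, its parameters, and the wording of the reason strings are all assumptions made for illustration. `groups` maps an IOMMU group number to the PCI addresses in that group.

```python
def gpu_critical_reason(gpu_addresses, groups, critical_addresses):
    """Return a reason string if the GPU must be treated as critical, else None."""
    gpu = set(gpu_addresses)
    critical = set(critical_addresses)

    # Previous behaviour: one of the GPU's own devices is critical.
    own = sorted(gpu & critical)
    if own:
        return f"Critical devices found: {', '.join(own)}"

    # Case covered by this PR: a GPU device shares an IOMMU group
    # with a critical device that does not belong to the GPU.
    for group, members in sorted(groups.items()):
        members = set(members)
        shared = sorted(members & critical)
        if members & gpu and shared:
            return (
                f"Devices found in IOMMU group {group} "
                f"that contains critical devices: {', '.join(shared)}"
            )
    return None
```

A GPU for which this returns a non-`None` value would be excluded from the isolation choices, and the returned string would serve as the explanation shown to the user.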

@bugclerk
Contributor

bugclerk commented Jul 4, 2024

@bugclerk bugclerk changed the title Mark GPU as critical if it has devices which are in an iommu group which has a critical device NAS-129905 / 24.10 / Mark GPU as critical if it has devices which are in an iommu group which has a critical device Jul 4, 2024
@Qubad786 Qubad786 force-pushed the mrehan/improve-gpu-choices branch from 64b3394 to 300d1f3 Compare July 4, 2024 14:44
@Qubad786 Qubad786 removed the WIP label Jul 4, 2024
@Qubad786 Qubad786 requested a review from a team July 4, 2024 14:54
@Qubad786 Qubad786 requested a review from sonicaj July 12, 2024 05:28
@Qubad786 Qubad786 merged commit 2802a46 into master Jul 12, 2024
2 of 3 checks passed
@Qubad786 Qubad786 deleted the mrehan/improve-gpu-choices branch July 12, 2024 14:53
@bugclerk
Contributor

This PR has been merged and conversations have been locked.
If you would like to discuss this issue further, please use our forums or raise a Jira ticket.

@truenas truenas locked as resolved and limited conversation to collaborators Jul 12, 2024
3 participants