Add Fabric Manager Support #3873
Merged: monirul merged 1 commit into bottlerocket-os:develop from monirul:fabric-manager-changes on Apr 13, 2024
bcressey reviewed on Apr 8, 2024
monirul force-pushed the fabric-manager-changes branch from d6b98e9 to 681d18a on April 8, 2024 22:42
monirul changed the title from "Add Fabric Manager Support to Bottlerocket" to "Add Fabric Manager Support" on Apr 8, 2024
yeazelm reviewed on Apr 10, 2024
arnaldo2792 reviewed on Apr 11, 2024
monirul force-pushed the fabric-manager-changes branch from 681d18a to 10a599d on April 12, 2024 05:29
bcressey reviewed on Apr 12, 2024
monirul force-pushed the fabric-manager-changes branch 2 times, most recently from 5a837f8 to 2e576e2 on April 13, 2024 00:08
bcressey approved these changes on Apr 13, 2024
Signed-off-by: monirul <[email protected]>
monirul force-pushed the fabric-manager-changes branch from 2e576e2 to e8415e5 on April 13, 2024 00:49
yeazelm approved these changes on Apr 13, 2024
I have tested the changes with p4d instances. Here are the test details.
Output of
Output of nvidia smoke test:
Issue number: #3278
Closes #3278
Description of changes:
Bottlerocket currently lacks support for Fabric Manager, which is necessary for using GPUs on p4 and p5 instance types. This pull request adds Fabric Manager support, so that Bottlerocket can manage GPU resources efficiently and customers can use it as the container host OS on p4 and p5 instances.
Fabric Manager support is added to the kmod-5.15 and kmod-6.1 kernel module packages, so the k8s-1.24 and later variants will include the changes. The kmod-5.10 package, however, uses the 470 legacy driver, which is not compatible with the latest GPUs found in p4 and p5 instances. The Fabric Manager changes have therefore not been applied to kmod-5.10, and the k8s-1.23 variant will not include Fabric Manager capabilities.
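The coverage described above can be restated as a small lookup. This is only an illustrative sketch and not code from the pull request: the names `FABRIC_MANAGER_BY_KMOD` and `has_fabric_manager` are hypothetical, and the entries record nothing beyond what the description states (kmod-5.15 and kmod-6.1 receive the changes, kmod-5.10 does not).

```python
# Hypothetical summary of the PR description; these names do not exist
# in the Bottlerocket source tree.
FABRIC_MANAGER_BY_KMOD = {
    "kmod-5.10": False,  # uses the 470 legacy driver, so no Fabric Manager changes
    "kmod-5.15": True,
    "kmod-6.1": True,
}


def has_fabric_manager(kmod: str) -> bool:
    """Return True if the named kmod package received Fabric Manager support."""
    try:
        return FABRIC_MANAGER_BY_KMOD[kmod]
    except KeyError:
        # Anything not named in the description is deliberately left unanswered.
        raise ValueError(f"kmod package not covered by this description: {kmod}")


if __name__ == "__main__":
    for name in sorted(FABRIC_MANAGER_BY_KMOD):
        print(name, has_fabric_manager(name))
```

Raising on unknown package names, rather than returning False, avoids implying anything about kernels the description does not mention.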
The change is based on the bcressey@75eab67 branch.
Testing done:
Testing Summary:
request to query NVSwitch device information from NVSwitch driver failed with error:WARNING Nothing to do [NV_WARN_NOTHING_TO_DO]
could not insert 'nvidia_modeset': No such device
Reason: NVIDIA Tesla K80 is supported through the NVIDIA 470.xx Legacy drivers. This Bottlerocket variant uses the 535 driver.
request to query NVSwitch device information from NVSwitch driver failed with error:WARNING Nothing to do [NV_WARN_NOTHING_TO_DO]
request to query NVSwitch device information from NVSwitch driver failed with error:WARNING Nothing to do [NV_WARN_NOTHING_TO_DO]
request to query NVSwitch device information from NVSwitch driver failed with error:WARNING Nothing to do [NV_WARN_NOTHING_TO_DO]
could not insert 'nvidia_modeset': No such device
Reason: NVIDIA Tesla K80 is supported through the NVIDIA 470.xx Legacy drivers. This Bottlerocket variant uses the 535 driver.
request to query NVSwitch device information from NVSwitch driver failed with error:WARNING Nothing to do [NV_WARN_NOTHING_TO_DO]
request to query NVSwitch device information from NVSwitch driver failed with error:WARNING Nothing to do [NV_WARN_NOTHING_TO_DO]
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.