Adds nvidia-fabric-manager package #3820

VariableExp0rt

Issue number:

Closes #3278

Description of changes:

I have made a fairly naive attempt at resolving the above issue by adding the nvidia-fabric-manager package to Bottlerocket. The Fabric Manager User Guide contains a lot of additional information about running as non-root and similar topics; I was not sure whether that was a dealbreaker.

I have not yet been able to build an image successfully locally. I found bottlerocket-os/twoliter#173 from a while ago and seem to be hitting the same error.

I'd appreciate any pointers on whether I have gotten the licensing bits right here, along with any other issues. The specfile didn't work as is, and I had to make a few cosmetic changes to how files are copied into the %{buildroot}.

It's probably fairly obvious this is my first attempt at an rpmbuild 😄

Testing done:
Added the package to the most recent k8s variant (aws-k8s-1.29-nvidia).

[nvidia]
spdx-id = "LicensesRef-NVIDIA-Customer-Use"
licenses = [
    {path = "LICENSE", license-url = "https://www.nvidia.com/en-us/drivers/nvidia-license/"},
]
cargo make -e BUILDSYS_ARCH=aarch64 -e BUILDSYS_UPSTREAM_LICENSE_FETCH=true fetch-licenses
cargo make -e BUILDSYS_ARCH=aarch64 -e PACKAGE=nvidia-fabric-manager -e BUILDSYS_VARIANT=aws-k8s-1.29-nvidia build-package
[cargo-make] INFO - cargo make 0.37.10
[cargo-make] INFO - Build File: Makefile.toml
[cargo-make] INFO - Task: build-package
[cargo-make] INFO - Profile: development
[cargo-make] INFO - Running Task: install-twoliter
Found Twoliter v0.0.6 installed.
Skipping installation.
[cargo-make] INFO - Execute Command: "<>/code/bottlerocket/tools/twoliter/twoliter" "--log-level=info" "make" "build-package" "--project-path=<>/code/bottlerocket/Twoliter.toml" "--cargo-home=<>/code/bottlerocket/.cargo" "--"
[2024-03-14T12:42:26Z WARN  twoliter::project] A Release.toml file was found. Release.toml is deprecated. Please remove it from your project.
[cargo-make][1] INFO - Build File: <>/code/bottlerocket/build/tools/Makefile.toml
[cargo-make][1] INFO - Task: build-package
[cargo-make][1] INFO - Profile: development
[cargo-make][1] INFO - Running Task: check-cargo-version
[cargo-make][1] INFO - Running Task: setup
[cargo-make][1] INFO - Running Task: setup-build
[cargo-make][1] INFO - Running Task: fetch-sdk
[cargo-make][1] INFO - Running Task: publish-setup
12:42:27 [INFO] No infra config at '<>/code/bottlerocket/Infra.toml' - using local roles/keys
[cargo-make][1] INFO - Running Task: fetch-toolchain
[cargo-make][1] INFO - Running Task: fetch-sources
[cargo-make][1] INFO - Running Task: fetch-vendored
[cargo-make][1] INFO - Running Task: fetch-licenses
Skipping fetching licenses
[cargo-make][1] INFO - Running Task: build-package
   Compiling nvidia-fabric-manager v0.1.0 (<>/code/bottlerocket/packages/nvidia-fabric-manager)
    Finished dev [optimized] target(s) in 13.80s
[cargo-make][1] INFO - Build Done in 43.17 seconds.
[cargo-make] INFO - Build Done in 43.96 seconds.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

liam.baker added 2 commits March 14, 2024 12:45
Signed-off-by: liam.baker <[email protected]>
@monirul
Contributor

monirul commented Mar 15, 2024

Thank you for submitting the pull request. Will take a look at it. Meanwhile, I’m currently looking into two additional PRs that address the fabric manager issue. These may provide some insights that could be beneficial to your work:

  1. 807780a
  2. bcressey@75eab67

@bcressey
Contributor

In my view, the best approach is to have the fabric manager components packaged inside the kmod-*-nvidia packages, because the fabric manager binary version has to exactly match the rest of the driver package (kernel and userspace).
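To make the version constraint concrete, here is a minimal sketch of an exact-match check between the driver version and the fabric manager version. The function name `versions_match` and the version strings are hypothetical and purely illustrative; they are not taken from either commit or from the Bottlerocket packaging.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): succeeds only when the fabric manager
# version string is identical to the driver version string.
versions_match() {
    driver_version="$1"
    fabric_manager_version="$2"
    [ "$driver_version" = "$fabric_manager_version" ]
}

# Illustrative version strings only; real values would come from the driver package.
if versions_match "1.2.3" "1.2.3"; then
    echo "match: fabric manager can ship alongside this driver"
else
    echo "mismatch: fabric manager must be updated with the driver"
fi
```

Packaging both from the same kmod-*-nvidia spec means a check like this should never be needed at runtime, since a single version definition would cover the driver and fabric manager together.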

My commit is pretty close in terms of packaging, though the driver version is out of date now and would need to be sync'd up. The hard bit is the testing, and seeing whether the nvidia-fabricmanager.cfg I came up with is right and if the service does the right thing on instances that support fabric manager (p4, p5) and instances that don't (g4, g5).
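As a rough sketch of the behaviour that needs testing, the following shows one way to decide from an instance type name whether the service should start. The helper `wants_fabric_manager` is hypothetical, and matching on the instance family prefix is an assumption made for illustration; the actual service may well detect the relevant hardware directly instead.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): decide from the instance type name
# whether fabric manager should run. Assumption: only the p4 and p5 families
# need it, as mentioned in the comment above; everything else is skipped.
wants_fabric_manager() {
    case "$1" in
        p4*|p5*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example instance types from the four families named above.
for instance_type in p4d.24xlarge p5.48xlarge g4dn.xlarge g5.xlarge; do
    if wants_fabric_manager "$instance_type"; then
        echo "$instance_type: start fabric manager"
    else
        echo "$instance_type: skip fabric manager"
    fi
done
```

Whatever detection is used, the test matrix is the same: the service should come up cleanly on p4 and p5, and should exit or be skipped without failing boot on g4 and g5.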

@VariableExp0rt
Author

Thanks for the pointer, @bcressey. Regarding the changes in the second commit linked above, I can either bring those changes into this branch or let you PR those changes (whichever is easiest to de-duplicate effort). Let me know what works best?

Agreed on the testing of this!

@bcressey
Contributor

> I can either bring those changes into this branch or let you PR those changes (whichever is easiest to de-duplicate effort). Let me know what works best?

I believe @monirul is working on testing and a PR but I can let him comment. The other TODO is just extending the changes to the kmod-5.10 and kmod-5.15 packages also.

@monirul
Contributor

monirul commented Apr 13, 2024

Fabric manager support has now been implemented and merged into Bottlerocket in a separate pull request, so I am closing this one.
For reference, the merged PR for the fabric manager is PR #3873.

@monirul monirul closed this Apr 13, 2024
Successfully merging this pull request may close these issues.

Add support for fabric manager