cudaPackages: add jetson support by SomeoneSerge · Pull Request #242050 · NixOS/nixpkgs

SomeoneSerge · 2023-07-07T12:01:56Z

Description of changes

It's a pretty small patch recovering Jetson support as originally attempted in #194791; hopefully it can be merged without any extra effort once ofborg goes through. I successfully built cuda_nvcc on an NVIDIA Jetson host. One prerequisite for getting more complex stuff (like pytorch) to work would be #233581.

CC #158350 @NixOS/cuda-maintainers

Things done

Built on platform(s)
- x86_64-linux
- aarch64-linux
- x86_64-darwin
- aarch64-darwin

❯ nix eval .#cudaPackages.cuda_nvcc.manifestAttribute
"linux-x86_64"
❯ nix eval .#cudaPackages.cuda_nvcc.meta.platforms
[ "aarch64-linux" "powerpc64le-linux" "x86_64-linux" ]
❯ nix eval .#pkgsCross.aarch64-multiplatform.cudaPackages.cuda_nvcc.manifestAttribute
"linux-aarch64"
❯ nix eval .#pkgsCross.aarch64-multiplatform.cudaPackages.cuda_nvcc.meta.platforms
[ "aarch64-linux" "powerpc64le-linux" "x86_64-linux" ]

What doesn't work

Idk how to use cross, but I tried this on an x86-64 host and it failed:

❯ NIXPKGS_ALLOW_UNFREE=1 nix build --impure .#pkgsCross.aarch64-multiplatform.cudaPackages.cuda_nvcc
...
       > error: auto-patchelf could not satisfy dependency libstdc++.so.6 wanted by /nix/store/9kv83k8nnnaxhv7wy2h27k15ww61xc75-cuda_nvcc-aarch64-unknown-linux-gnu-11.8.89/bin/__nvcc_device_query
       > error: auto-patchelf could not satisfy dependency libgcc_s.so.1 wanted by /nix/store/9kv83k8nnnaxhv7wy2h27k15ww61xc75-cuda_nvcc-aarch64-unknown-linux-gnu-11.8.89/bin/__nvcc_device_query
...
       > auto-patchelf failed to find all the required dependencies.
...
       For full logs, run 'nix log /nix/store/rmb71904ksdm8sbkn8wscnrzihvp5nkb-cuda_nvcc-aarch64-unknown-linux-gnu-11.8.89.drv'.

It's a shot in the dark, but I think this might be related to #226165

SomeoneSerge · 2023-07-07T12:07:01Z

Result of nixpkgs-review pr 242050 --extra-nixpkgs-config '{ cudaCapabilities = [ "8.6" ]; cudaSupport = true; }' run on x86_64-linux

ConnorBaker · 2023-07-07T15:41:01Z

@SomeoneSerge I ended up adding support for multiple arch in a PR I have brewing: #240498

As you’ve noticed, it’s a bit of trouble currently to figure out whether to choose the Linux4Tegra or SBSA packages for aarch64. Is there a different double we have for Jetson specifically?

pkgs/development/compilers/cudatoolkit/redist/build-cuda-redist-package.nix

SomeoneSerge · 2023-07-07T22:37:13Z

@ConnorBaker, I'm still unsure what exactly the sbsa builds are for, but I do know that the builds marked linux-aarch64 are meant for jetsons, which is why in this PR I give those the priority

RE: #240498

Great! I guess the question is how long is it going to take you to merge that PR v. merge this and rebase yours? I tried to limit mine to enabling jetson support specifically with the purpose of avoiding collisions, but I guess it's not that simple 🙃

ConnorBaker · 2023-07-08T18:39:41Z

@SomeoneSerge I definitely need to take out the multi-arch stuff from that PR; it's a can of worms. Here's a summary of what I've learned:

Both the Linux 4 Tegra (Jetson devices, NVIDIA redist manifests refer to this as linux-aarch64) and SBSA (server-grade ARM setups, referred to as linux-sbsa) are effectively aarch64-linux. However, the packages are NOT interchangeable, as I understand it -- they're built with different configurations and target different hardware.

That means if we want to support both, we need to:

- Ensure cudaFlags verifies that capabilities for Jetson and non-Jetson devices are not mixed
- Choose the redist package to use depending on the capabilities present

For the first point: Unlike other GPUs which can be slotted into both x86_64 or SBSA (ARM) servers, Jetson capabilities are tied to aarch64-linux. If Jetson capabilities are present in config.cudaCapabilities, our hostPlatform.system must be aarch64-linux. (That is, we must either be building on an aarch64-linux device, in which case our buildPlatform and hostPlatform are the same, or we are cross-compiling to aarch64-linux from a different platform.) Effectively: the presence of any Jetson capabilities in config.cudaCapabilities necessitates that we are both building for aarch64-linux and all capabilities in config.cudaCapabilities are Jetson capabilities.

For the second point: we must know whether we are building for Jetson so we can correctly decide whether to use the linux-aarch64 or linux-sbsa redist package when our hostPlatform is aarch64-linux.
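
The two points above can be sketched in plain Nix. This is a minimal illustration, not the code from either PR: the function name chooseManifestTag and the hard-coded Jetson list (6.2, 7.2, 8.7) are assumptions made for the example, and only builtins are used so the expression can be evaluated on its own.

```nix
# Hypothetical sketch; names are illustrative and not taken from nixpkgs.
let
  # Assumed set of Jetson-only capabilities (sm_62, sm_72, sm_87).
  jetsonCapabilities = [ "6.2" "7.2" "8.7" ];
  isJetson = cap: builtins.elem cap jetsonCapabilities;

  # Returns the redist manifest tag for a host system and the requested
  # capabilities, or throws if the combination cannot be satisfied.
  chooseManifestTag = { system, cudaCapabilities }:
    let
      anyJetson = builtins.any isJetson cudaCapabilities;
      allJetson = builtins.all isJetson cudaCapabilities;
    in
    if anyJetson && !allJetson then
      throw "Jetson and non-Jetson capabilities must not be mixed"
    else if anyJetson && system != "aarch64-linux" then
      throw "Jetson capabilities require an aarch64-linux hostPlatform"
    else if system == "aarch64-linux" then
      (if anyJetson then "linux-aarch64" else "linux-sbsa")
    else if system == "x86_64-linux" then
      "linux-x86_64"
    else
      throw "unsupported system: ${system}";
in
{
  jetson = chooseManifestTag { system = "aarch64-linux"; cudaCapabilities = [ "7.2" ]; };
  sbsa = chooseManifestTag { system = "aarch64-linux"; cudaCapabilities = [ "8.6" ]; };
  x86 = chooseManifestTag { system = "x86_64-linux"; cudaCapabilities = [ "8.6" ]; };
}
```

Saved to a file and evaluated with nix-instantiate --eval --strict, the three attributes should come out as "linux-aarch64", "linux-sbsa" and "linux-x86_64"; a mixed list such as [ "7.2" "8.6" ] would hit the first throw.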

SomeoneSerge · 2023-07-08T19:05:54Z

> jetson capabilities are tied to aarch64-linux

Oh, I see. So we might want to

- introduce a separate package set, e.g. cudaPackages_jetson, where cudaFlags only contain sm_62, sm_72, and sm_87,
- and have the cudaPackages attribute default to linux-sbsa for aarch64 devices

> we need to ... Ensure cudaFlags verifies that capabilities for Jetson and non-Jetson devices are not mixed

Is it that we need this, or is it something "nice to have"?

> Unlike other GPUs which can be slotted into both x86_64 or SBSA (ARM) servers

I think I finally get it, thanks! SBSA binaries are for when we wire a pci-e gpu to a generic aarch64 host?

ConnorBaker · 2023-07-08T19:26:11Z

> Is it that we need this, or is it something "nice to have"?

If we continue to use only a single cudaPackages package set and choose the redistributable based on selected capabilities, I'm of the opinion that it is something that we need to have. Consider the case where we have mixed capabilities and

- the hostPlatform.system is aarch64-linux:
  - We'll be using the Jetson redistributables, which do not support other capabilities
- the hostPlatform.system is not aarch64-linux:
  - We'll be using non-Jetson redistributables, which do not support Jetson capabilities

If we introduce a cudaPackages_jetson package, what would you envision happening with cudaFlags? Would it be the same as cudaPackages.cudaFlags, but only allow config.cudaCapabilities to contain capabilities for Jetson devices? If so, my understanding is that using packages from cudaPackages_jetson would trigger a check to make sure only Jetson capabilities are requested by config.cudaCapabilities.

> I think I finally get it, thanks! SBSA binaries are for when we wire a pci-e gpu to a generic aarch64 host?

Yes -- apparently SBSA is the name of a specification for ARM-based servers https://en.wikipedia.org/wiki/Server_Base_System_Architecture.

SomeoneSerge · 2023-07-08T20:14:30Z

> If we introduce a cudaPackages_jetson package, what would you envision happening with cudaFlags

I pushed an example in the last commit: the idea would be just to override cudaFlags for the package set:

❯ nix eval .#pkgsCross.aarch64-multiplatform.cudaPackages.cuda_nvcc.manifestAttribute
"linux-sbsa"
❯ nix eval .#pkgsCross.aarch64-multiplatform.cudaPackages_jetson.cuda_nvcc.manifestAttribute
"linux-aarch64"

Instead of a hard-coded list, we could form one from gpus.nix. I looked into this, we might want to replace dontDefaultAfter with something like jetsonCompatible and jetsonOnly?

Jetson device owners may overlay their nixpkgs with cudaPackages = final.cudaPackages_jetson and get their opencv and pytorch running
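
A minimal sketch of what such an overlay could look like when importing nixpkgs. It assumes the cudaPackages_jetson attribute proposed in this PR exists and is defined independently of cudaPackages (otherwise referring to it through final could recurse); nothing here is part of a released nixpkgs.

```nix
# Sketch only: cudaPackages_jetson is the attribute proposed in this PR.
import <nixpkgs> {
  config.allowUnfree = true;
  config.cudaSupport = true;
  overlays = [
    # Point the default CUDA package set at the Jetson one, so that
    # downstream packages such as opencv and pytorch pick it up.
    (final: prev: { cudaPackages = final.cudaPackages_jetson; })
  ];
}
```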

What's maybe embarrassing is that cudaPackages_jetson would ignore user-specified config.cudaCapabilities

Alternatively,

we could introduce a config.jetson :: bool option and keep, as you point out, a single cudaPackages set

ConnorBaker · 2023-07-08T20:22:47Z

> Instead of a hard-coded list, we could form one from gpus.nix. I looked into this, we might want to replace dontDefaultAfter with something like jetsonCompatible and jetsonOnly?

Take a look at the changes I made to

- gpus.nix: https://github.com/NixOS/nixpkgs/pull/240498/files#diff-57a33fc57cb4d7f307b394db67c977419a609461e3f14023c66272ab70f46639
  - added isJetson flag
- redist/extension.nix: https://github.com/NixOS/nixpkgs/pull/240498/files#diff-3ff6fef61cbe9c56c8355bd97e17de5989cf60ff96668d44bf96c73c025d627dR82-R126
  - added logic to swap out redistributable package used when Jetson capabilities are requested
- flags.nix: https://github.com/NixOS/nixpkgs/pull/240498/files#diff-c5ba40fc7e3e3e088673e4224e920c837db69cf9e0da10eb7fe3b4a8647e1956
  - sanity-checking for Jetson capabilities; additional test cases

Overall, those changes allow us to build the user-requested Jetson capabilities. (They must be requested by the user through config.cudaCapabilities though, as Jetson capabilities are excluded by the isDefault predicate in flags.nix.)

SomeoneSerge · 2023-07-10T16:53:22Z

pkgs/top-level/all-packages.nix

This would hinder future attempts at cross-compilation:

❯ nix eval -f cross-jetson.nix cudaPackages.cuda_nvcc.manifestAttribute
"linux-aarch64"
❯ nix eval -f cross-jetson.nix buildPackages.cudaPackages.cuda_nvcc.manifestAttribute
"linux-aarch64"

Expected: "linux-aarch64", "linux-x86_64"

Consequence (watch out, I could be wrong about everything):

- We should always choose a tag (linux-x86_64, linux-aarch64, linux-sbsa) that is compatible with the current hostPlatform.system
- For the CUDA libraries that come with PTX text (e.g. libcublas) we should choose, among host-compatible tags, one that ships all of the requested cuda capabilities. If there isn't one, we should mark the package broken. We should not mark nvcc as broken
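
A rough sketch of that selection rule in plain Nix, using only builtins. The tags table, the capability lists in it and the names candidates and selectTag are all made up for illustration; real data would have to come from the redist manifests and gpus.nix.

```nix
# Hypothetical sketch; the table below is illustrative, not real manifest data.
let
  # For each tag: the host system it runs on and the capabilities it is
  # assumed to ship device code for.
  tags = {
    "linux-x86_64" = { system = "x86_64-linux"; capabilities = [ "7.5" "8.6" ]; };
    "linux-sbsa" = { system = "aarch64-linux"; capabilities = [ "7.5" "8.6" ]; };
    "linux-aarch64" = { system = "aarch64-linux"; capabilities = [ "7.2" "8.7" ]; };
  };

  # Tags that are compatible with the host and ship every requested capability.
  candidates = { system, cudaCapabilities }:
    builtins.filter
      (tag:
        tags.${tag}.system == system
        && builtins.all (cap: builtins.elem cap tags.${tag}.capabilities) cudaCapabilities)
      (builtins.attrNames tags);

  # Libraries with device code: pick a tag, or report the package as broken.
  selectTag = args:
    let found = candidates args; in
    if found == [ ] then { tag = null; broken = true; }
    else { tag = builtins.head found; broken = false; };
in
{
  # Library for the Jetson host: the only matching tag is linux-aarch64.
  hostLibrary = selectTag { system = "aarch64-linux"; cudaCapabilities = [ "7.2" ]; };
  # Same library for the x86_64 build platform: no tag ships 7.2, so broken.
  buildLibrary = selectTag { system = "x86_64-linux"; cudaCapabilities = [ "7.2" ]; };
  # nvcc ignores capabilities (empty list), so it is never marked broken here.
  buildNvcc = selectTag { system = "x86_64-linux"; cudaCapabilities = [ ]; };
}
```

With an empty capability list on aarch64-linux both linux-aarch64 and linux-sbsa would match, and builtins.head simply takes the alphabetically first one; a real implementation would need an explicit tie-break.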

Details

# cross-jetson.nix
(import ./. {
  config.allowUnfree = true;
  config.cudaSupport = true;
  config.cudaCapabilities = [ "7.2" ];
  # config.cudaCapabilities = [ "8.6" ];
  overlays = [ (final: prev: { cudaPackages = prev.cudaPackages_jetson; }) ];
}).pkgsCross.aarch64-multiplatfor

PoC: SomeoneSerge#4

SomeoneSerge · 2023-12-06T11:59:49Z

Superseded by #256324

cudaPackages: add jetson support

3f5805d

SomeoneSerge force-pushed the cudaPackages-jetson branch from 1c84b91 to 3f5805d July 7, 2023 13:11

ofborg bot added the 8.has: package (new) label Jul 7, 2023

ofborg bot requested review from ConnorBaker and samuela July 7, 2023 13:37

ofborg bot added the 11.by: package-maintainer, 10.rebuild-darwin: 0, and 10.rebuild-linux: 0 labels Jul 7, 2023

SomeoneSerge added the backport release-23.05 and 6.topic: cuda labels Jul 7, 2023

samuela reviewed Jul 7, 2023

pkgs/development/compilers/cudatoolkit/redist/build-cuda-redist-package.nix

cudaPackages: avoid repeated let-ins

617f84e

ofborg bot requested a review from samuela July 8, 2023 18:57

cudaPackages_jetson: init

679ad3d

SomeoneSerge force-pushed the cudaPackages-jetson branch from b7685fd to 679ad3d July 10, 2023 17:37

SomeoneSerge closed this Dec 6, 2023

SomeoneSerge wants to merge 3 commits into NixOS:master from SomeoneSerge:cudaPackages-jetson