cudatoolkit: enable build on aarch64-linux for versions 11.0+#158350

Closed
prusnak wants to merge 2 commits into NixOS:master from
prusnak:cuda-aarch64

Conversation

@prusnak
Member

@prusnak prusnak commented Feb 6, 2022

Motivation for this change
  • NVIDIA has provided aarch64-linux (SBSA) builds of cudatoolkit since version 11; let's enable these in nixpkgs
  • the code might be refactored in the future to reduce code duplication, but I want to get this PR in first and do the refactor later
Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.05 Release Notes (or backporting 21.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@ofborg ofborg bot added 8.has: clean-up This PR removes packages or removes other cruft 8.has: package (new) This PR adds a new package 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 1-10 This PR causes between 1 and 10 packages to rebuild on Linux. 10.rebuild-linux: 1 This PR causes 1 package to rebuild on Linux. labels Feb 6, 2022
@ofborg ofborg bot removed the 10.rebuild-linux: 1 This PR causes 1 package to rebuild on Linux. label Feb 6, 2022
@prusnak
Member Author

prusnak commented Feb 11, 2022

What do you think? @dguibert @samuela @danieldk

@samuela
Member

samuela commented Feb 11, 2022

It looks like there are actually two different things going on in this PR:

  1. aarch64-linux support added
  2. cudatoolkit_11_6 is added

I would actually prefer to have these be two separate PRs, but if other maintainers are ok with it then I'll go along.

Have you built these on aarch64-linux? The PR description indicates that the changes haven't been built or tested yet. Unfortunately I don't have an aarch64-linux system to test on.

@mschwaig
Member

I just tested these changes in the following way:

If I run

NIXPKGS_ALLOW_UNFREE=1 nix build --impure github:prusnak/nixpkgs/cuda-aarch64#legacyPackages.aarch64-linux.cudatoolkit_11_6

I get

@nix { "action": "setPhase", "phase": "unpackPhase" }
unpacking sources
Creating directory pkg
@nix { "action": "setPhase", "phase": "patchPhase" }
patching sources
@nix { "action": "setPhase", "phase": "updateAutotoolsGnuConfigScriptsPhase" }
updateAutotoolsGnuConfigScriptsPhase
@nix { "action": "setPhase", "phase": "configurePhase" }
configuring
no configure script, doing nothing
@nix { "action": "setPhase", "phase": "buildPhase" }
building
no Makefile, doing nothing
@nix { "action": "setPhase", "phase": "glibPreInstallPhase" }
glibPreInstallPhase
@nix { "action": "setPhase", "phase": "installPhase" }
installing
renamed '/nix/store/a6xivd1blbm9bi1skblrzsa1b5g3hr1m-cudatoolkit-11.6.0/lib64/libcudart.so' -> '/nix/store/bna6nh45plmdnq75alnnw2hnixmac7wx-cudatoolkit-11.6.0-lib/lib/libcudart.so'
renamed '/nix/store/a6xivd1blbm9bi1skblrzsa1b5g3hr1m-cudatoolkit-11.6.0/lib64/libcudart.so.11.0' -> '/nix/store/bna6nh45plmdnq75alnnw2hnixmac7wx-cudatoolkit-11.6.0-lib/lib/libcudart.so.11.0'
renamed '/nix/store/a6xivd1blbm9bi1skblrzsa1b5g3hr1m-cudatoolkit-11.6.0/lib64/libcudart.so.11.6.55' -> '/nix/store/bna6nh45plmdnq75alnnw2hnixmac7wx-cudatoolkit-11.6.0-lib/lib/libcudart.so.11.6.55'
renamed '/nix/store/a6xivd1blbm9bi1skblrzsa1b5g3hr1m-cudatoolkit-11.6.0/lib64/libcudart_static.a' -> '/nix/store/bna6nh45plmdnq75alnnw2hnixmac7wx-cudatoolkit-11.6.0-lib/lib/libcudart_static.a'

Builder called die: Cannot wrap '/nix/store/a6xivd1blbm9bi1skblrzsa1b5g3hr1m-cudatoolkit-11.6.0/bin/nvprof' because it is not an executable file
Backtrace:
7 assertExecutable /nix/store/1fdqj6rspaish57gmvwfxqkxzb5ljb24-hook/nix-support/setup-hook
145 wrapProgram /nix/store/1fdqj6rspaish57gmvwfxqkxzb5ljb24-hook/nix-support/setup-hook
1417 genericBuild /nix/store/x2iwjwrj213xdj9qfn5f43ds684h4i20-stdenv-linux/setup
2 main /nix/store/9krlzvny65gdc8s7kpb6lkx8cd02c25b-default-builder.sh

I tested this on an x86_64 NixOS host where the system configuration includes

boot.binfmt.emulatedSystems = [ "aarch64-linux" ];

to allow the build host to emulate the target environment. I get the same error building natively on an Nvidia Jetson Nano (aarch64 with Ubuntu).

This is also not just about cudatoolkit_11_6, since cudatoolkit_11_4 fails to build as well. It is an aarch64-specific issue, since building cudatoolkit_11_6 for x86_64 succeeds.

I also think that this PR should stick to just adding aarch64-linux. cudatoolkit 11.6 has already landed via another PR. I have not looked at how this is impacted by #167016.
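
A minimal sketch of one way to sidestep the wrapProgram failure reported above, assuming the failing check amounts to "regular file with the executable bit set" (as the error message suggests). The helper name list_wrappable is hypothetical and not part of nixpkgs; it only filters a directory so that entries such as a missing or non-executable nvprof are skipped rather than handed to a wrapper.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nixpkgs): print every entry in a
# directory that is a regular file with the executable bit set.
# Anything else (data files, directories, dangling symlinks) is skipped.
list_wrappable() {
  local dir="$1" f
  for f in "$dir"/*; do
    if [ -f "$f" ] && [ -x "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

In a builder, the output of list_wrappable "$out/bin" could be passed to wrapProgram one file at a time. Whether skipping nvprof is acceptable on aarch64 is a separate packaging decision.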

@samuela
Member

samuela commented Apr 21, 2022

This is outdated after the whole redist conversion. Before testing we'll need to rebase onto something more recent. Are there aarch64-linux redist packages?

cc @NixOS/cuda-maintainers

@prusnak
Member Author

prusnak commented Apr 21, 2022

I am closing this as I won't be needing this anymore and won't have time to rebase this on top of master.

@prusnak prusnak closed this Apr 21, 2022
@prusnak prusnak deleted the cuda-aarch64 branch April 21, 2022 20:45
@mschwaig
Member

I think the linux-sbsa attributes in those redistrib[version].json files would be for aarch64-linux.
Sadly, it looks like my hardware (Jetson Nano) won't be supported by those 11.x builds anyway.

Maybe someone with supported hardware will pick this topic back up in the future.

@prusnak
Member Author

prusnak commented Apr 21, 2022

> I think the linux-sbsa attributes in those redistrib[version].json files would be for aarch64-linux.

Yes, linux-sbsa is a fancy name for aarch64-linux, see https://en.wikipedia.org/wiki/Server_Base_System_Architecture

@kmittman

Hi @mschwaig , @prusnak , @samuela
I don't have a good answer other than "it's complicated". The CUDA toolkit builds labeled linux-aarch64 (L4T / Jetson) and those labeled linux-sbsa (ARM64 server) are compiled with different options enabled or disabled. Currently, what is published in the redistrib manifests is meant for servers, not Tegra boards. Hope that helps.
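
The platform labels discussed in this thread can be summarized as a small lookup. This is only a sketch: the function name nix_system_for is hypothetical, and the linux-x86_64 label is taken from NVIDIA's published redistrib manifests rather than from this conversation. Note that both ARM labels run on aarch64-linux, but per the comment above the builds are not interchangeable.

```shell
#!/usr/bin/env bash
# Hypothetical lookup: map an NVIDIA platform label to a Nix system string.
# linux-sbsa and linux-aarch64 both target aarch64-linux, but they are
# different builds (ARM64 servers vs. L4T / Jetson), so a packaging
# expression would still need to keep track of which label it came from.
nix_system_for() {
  case "$1" in
    linux-x86_64)  echo "x86_64-linux" ;;
    linux-sbsa)    echo "aarch64-linux" ;;  # ARM64 server builds
    linux-aarch64) echo "aarch64-linux" ;;  # L4T / Jetson builds
    *)             return 1 ;;              # unknown label
  esac
}
```

For example, nix_system_for linux-sbsa prints aarch64-linux, while an unrecognized label makes the function return a non-zero status.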

@mikepurvis
Contributor

I have access to a Jetson board which would benefit from this work. Would love to see it move forward.

@SomeoneSerge
Contributor

> I have access to a Jetson board which would benefit from this work. Would love to see it move forward.

@mikepurvis I think one would have to put together a new PR, using either the linux-sbsa tarballs or maybe the linux-aarch64 debs. I'm still interested too, but don't have the capacity to act.

FYI: Matrix could be one more place to speculate on how to go about this.

