cudatoolkit: prune broken symlinks in postFixup #217322

Merged
samuela merged 1 commit into NixOS:master from ConnorBaker:cudatoolkit-prune-broken-symlinks
Feb 23, 2023

Conversation

@ConnorBaker
Contributor

@ConnorBaker ConnorBaker commented Feb 20, 2023

Description of changes

As cudatoolkit is currently written, 11.8 introduces a broken symlink in include (also named include) and in lib (named lib64).

This trips up some consumers (e.g., it causes the build of tensorflow-gpu to fail) and will be a problem when switching to 11.8 as the default.
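
A minimal sketch of what pruning broken symlinks in a `postFixup` phase could look like, written as plain shell so it can be run outside a Nix build. The function name `prune_broken_symlinks`, the `missing-target` link targets, and the temporary `$out` directory are assumptions for illustration; this is not the code merged in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from the PR): delete every symlink under
# the given directory whose target does not exist.
prune_broken_symlinks() {
  local root="$1"
  # `test -e` follows symlinks, so it fails for dangling links.
  # Each removed path is printed before it is deleted.
  find "$root" -type l ! -exec test -e {} \; -print -delete
}

# Demo on a throwaway directory that mimics the layout described above:
# a dangling `include` inside include/ and a dangling `lib64` inside lib/.
out="$(mktemp -d)"
mkdir -p "$out/include" "$out/lib"
touch "$out/include/cuda.h"
ln -s cuda.h "$out/include/cuda_alias.h"      # valid link, should survive
ln -s missing-target "$out/include/include"   # dangling (assumed target)
ln -s missing-target "$out/lib/lib64"         # dangling (assumed target)

prune_broken_symlinks "$out"
```

Inside a derivation the same `find` invocation would presumably be applied to the real `$out`; valid links and regular files are left untouched.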

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 23.05 Release Notes (or backporting 22.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.
Failures

@ofborg ofborg bot added 10.rebuild-darwin: 11-100 This PR causes between 11 and 100 packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. labels Feb 20, 2023
@ConnorBaker
Contributor Author

Result of nixpkgs-review pr 217322 run on x86_64-linux

2 packages marked as broken and skipped:
  • python310Packages.caffeWithCuda
  • truecrack-cuda
7 packages failed to build:
  • caffeWithCuda
  • cudaPackages.tensorrt
  • python310Packages.tensorflowWithCuda
  • python310Packages.tensorrt
  • python311Packages.jaxlibWithCuda
  • python311Packages.tensorrt
  • xgboostWithCuda
30 packages built:
  • colmapWithCuda
  • cudaPackages.cudatoolkit
  • cudaPackages.cutensor
  • cudaPackages.nccl
  • forge
  • gpu-burn
  • gpu-screen-recorder
  • gpu-screen-recorder-gtk
  • gromacsCudaMpi
  • gwe
  • hip-nvidia
  • katagoWithCuda
  • librealsenseWithCuda
  • magma (magma-cuda)
  • nvtop
  • nvtop-nvidia
  • python310Packages.TheanoWithCuda
  • python310Packages.cupy
  • python310Packages.jaxlibWithCuda
  • python310Packages.numbaWithCuda
  • python310Packages.pycuda
  • python310Packages.pynvml
  • python310Packages.pyrealsense2WithCuda
  • python310Packages.torchWithCuda
  • python311Packages.TheanoWithCuda
  • python311Packages.cupy
  • python311Packages.pycuda
  • python311Packages.pynvml
  • python311Packages.pyrealsense2WithCuda
  • xpraWithNvenc

@ConnorBaker ConnorBaker force-pushed the cudatoolkit-prune-broken-symlinks branch from 654c9f4 to edd9beb Compare February 20, 2023 17:50
@ofborg ofborg bot requested review from abbradar and nviets February 20, 2023 18:23
Contributor

@nviets nviets left a comment

Approved - isn't this change for xgboost already included in #217333?

@ConnorBaker
Contributor Author

> Approved - isn't this change for xgboost already included in #217333?

I couldn't figure out a way to test with nixpkgs-review besides rebasing on top of the branches containing the changes I needed. Specifying multiple PRs to test seemed to test them sequentially, so I'm stacking them.

If there's a better way to handle PRs which depend on each other, please let me know. My current plan is to continue to rebase this as the PRs it depends on are merged into master.

@ConnorBaker
Contributor Author

Result of nixpkgs-review pr 217322 run on x86_64-linux

2 packages marked as broken and skipped:
  • python310Packages.caffeWithCuda
  • truecrack-cuda
2 packages failed to build:
  • python310Packages.tensorflowWithCuda
  • python311Packages.tensorrt
35 packages built:
  • caffeWithCuda
  • colmapWithCuda
  • cudatoolkit (cudaPackages.cudatoolkit ,cudatoolkit_11)
  • cudaPackages.cutensor
  • cudaPackages.nccl
  • cudaPackages.tensorrt (cudaPackages.tensorrt_8_4_0)
  • forge
  • gpu-burn
  • gpu-screen-recorder
  • gpu-screen-recorder-gtk
  • gromacsCudaMpi
  • gwe
  • hip-nvidia
  • katagoWithCuda
  • librealsenseWithCuda
  • magma (magma-cuda)
  • nvtop
  • nvtop-nvidia
  • python310Packages.TheanoWithCuda
  • python310Packages.cupy
  • python310Packages.jaxlibWithCuda
  • python310Packages.numbaWithCuda
  • python310Packages.pycuda
  • python310Packages.pynvml
  • python310Packages.pyrealsense2WithCuda
  • python310Packages.tensorrt
  • python310Packages.torchWithCuda
  • python311Packages.TheanoWithCuda
  • python311Packages.cupy
  • python311Packages.jaxlibWithCuda
  • python311Packages.pycuda
  • python311Packages.pynvml
  • python311Packages.pyrealsense2WithCuda
  • xgboostWithCuda
  • xpraWithNvenc

@ConnorBaker ConnorBaker marked this pull request as ready for review February 20, 2023 20:46
@ConnorBaker
Contributor Author

cc @NixOS/cuda-maintainers

As cudatoolkit is currently written, 11.8 introduces a broken symlink in `include` (also named `include`) and in `lib` (named `lib64`).

This trips up some consumers, like `tensorflow-gpu`.
@ConnorBaker ConnorBaker force-pushed the cudatoolkit-prune-broken-symlinks branch from edd9beb to 476de5c Compare February 22, 2023 01:39
@ConnorBaker
Contributor Author

Moved the comment out of the script and into Nix as requested.

Also removed the PR stack information from the OP and rebased on master instead since it can be merged independently of the others.

If you do run nixpkgs-review, you may see unrelated errors the PR stack included fixes for:

@ConnorBaker
Contributor Author

@samuela would you mind taking a look if you have a chance?

@samuela
Member

samuela commented Feb 22, 2023

running a nixpkgs-review run rn just to double check

@samuela
Member

samuela commented Feb 22, 2023

btw do TF/JAX build against CUDA 11.8 after this change? it would be great if we could finally upgrade our cudaPackages alias (separate, future PR)!

@ConnorBaker
Contributor Author

ConnorBaker commented Feb 22, 2023

I believe JAX does, though I remember TensorFlow failing at the very end of its very, very long build with an error about GLIBCXX. I haven't looked into it much since it's been a constant failure across all of my PRs, but I wonder if it has something to do with compiler versions gone awry? We might need to make sure the derivation is being built with cudatoolkit.cc and not whatever stdenv provides.

EDIT: the error is linked in the OP: https://gist.github.com/ConnorBaker/06ceb965a933ae0659dfce58f9a8c654#file-kv64kshayamrvh5jsfzkx9hzi2dsq81l-tensorflow-gpu-2-11-0-drv-log-L288

ImportError: /nix/store/ps7an26cirhh0xy1wrlc2icvfhrd39cj-gcc-11.3.0-lib/lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /nix/store/jlx2nfpi73sjb0f3096cly5ik8arw9k9-icu4c-72.1/lib/libicuuc.so.72)

EDIT2: Seems like this has been reported elsewhere #216361.
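
The ImportError above says that `libicuuc.so.72` needs the symbol version `GLIBCXX_3.4.30`, which the loaded gcc-11.3.0 `libstdc++.so.6` does not export (GCC 11's libstdc++ tops out at `GLIBCXX_3.4.29`; `GLIBCXX_3.4.30` arrived with GCC 12). One rough way to check which version tags a given library carries is sketched below. The helper names `list_glibcxx_versions` and `provides_glibcxx` are hypothetical, and scanning the file for embedded strings is only an approximation of what the dynamic linker actually checks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the GLIBCXX_* version tags embedded in a file.
list_glibcxx_versions() {
  # -a: treat the binary as text; -o: print only the matching parts.
  # LC_ALL=C keeps byte-wise matching independent of the locale.
  LC_ALL=C grep -ao 'GLIBCXX_[0-9][0-9.]*' "$1" | sort -u
}

# Hypothetical helper: succeed if file $1 contains the exact tag $2.
provides_glibcxx() {
  # -x: whole-line match; -F: treat the tag as a fixed string, not a regex.
  list_glibcxx_versions "$1" | grep -qxF "$2"
}
```

Running `provides_glibcxx` against the `libstdc++.so.6` path from the error with `GLIBCXX_3.4.30` as the second argument would be expected to fail, in line with the message above.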

@samuela
Member

samuela commented Feb 22, 2023

ugh yeah TF is the worst to build... well that's not due to anything in this change. Seems like progress since IIRC before TF was failing at the beginning of the build with the same error that JAX had. So that's great news!

@ConnorBaker
Contributor Author

ConnorBaker commented Feb 23, 2023

All of the failures I see from running nixpkgs-review are expected.

Result of nixpkgs-review pr 217322 run on x86_64-linux

2 packages marked as broken and skipped:
  • python310Packages.caffeWithCuda
  • truecrack-cuda
4 packages failed to build:
  • cudaPackages.tensorrt (cudaPackages.tensorrt_8_4_0)
  • python310Packages.tensorflowWithCuda
  • python310Packages.tensorrt
  • python311Packages.tensorrt
33 packages built:
  • caffeWithCuda
  • colmapWithCuda
  • cudatoolkit (cudaPackages.cudatoolkit ,cudatoolkit_11)
  • cudaPackages.cutensor
  • cudaPackages.nccl
  • forge
  • gpu-burn
  • gpu-screen-recorder
  • gpu-screen-recorder-gtk
  • gromacsCudaMpi
  • gwe
  • hip-nvidia
  • katagoWithCuda
  • librealsenseWithCuda
  • magma (magma-cuda)
  • nvtop
  • nvtop-nvidia
  • python310Packages.TheanoWithCuda
  • python310Packages.cupy
  • python310Packages.jaxlibWithCuda
  • python310Packages.numbaWithCuda
  • python310Packages.pycuda
  • python310Packages.pynvml
  • python310Packages.pyrealsense2WithCuda
  • python310Packages.torchWithCuda
  • python311Packages.TheanoWithCuda
  • python311Packages.cupy
  • python311Packages.jaxlibWithCuda
  • python311Packages.pycuda
  • python311Packages.pynvml
  • python311Packages.pyrealsense2WithCuda
  • xgboostWithCuda
  • xpraWithNvenc

@samuela
Member

samuela commented Feb 23, 2023

Result of nixpkgs-review pr 217322 run on x86_64-linux

2 packages marked as broken and skipped:
  • python310Packages.caffeWithCuda
  • truecrack-cuda
4 packages failed to build:
  • cudaPackages.tensorrt (cudaPackages.tensorrt_8_4_0)
  • python310Packages.tensorflowWithCuda
  • python310Packages.tensorrt
  • python311Packages.tensorrt
33 packages built:
  • caffeWithCuda
  • colmapWithCuda
  • cudatoolkit (cudaPackages.cudatoolkit ,cudatoolkit_11)
  • cudaPackages.cutensor
  • cudaPackages.nccl
  • forge
  • gpu-burn
  • gpu-screen-recorder
  • gpu-screen-recorder-gtk
  • gromacsCudaMpi
  • gwe
  • hip-nvidia
  • katagoWithCuda
  • librealsenseWithCuda
  • magma (magma-cuda)
  • nvtop
  • nvtop-nvidia
  • python310Packages.TheanoWithCuda
  • python310Packages.cupy
  • python310Packages.jaxlibWithCuda
  • python310Packages.numbaWithCuda
  • python310Packages.pycuda
  • python310Packages.pynvml
  • python310Packages.pyrealsense2WithCuda
  • python310Packages.torchWithCuda
  • python311Packages.TheanoWithCuda
  • python311Packages.cupy
  • python311Packages.jaxlibWithCuda
  • python311Packages.pycuda
  • python311Packages.pynvml
  • python311Packages.pyrealsense2WithCuda
  • xgboostWithCuda
  • xpraWithNvenc

@samuela
Member

samuela commented Feb 23, 2023

LGTM thanks so much @ConnorBaker !

@samuela samuela merged commit dc3ac9d into NixOS:master Feb 23, 2023
@ConnorBaker ConnorBaker deleted the cudatoolkit-prune-broken-symlinks branch February 24, 2023 01:25
@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/tweag-nix-dev-update-45/26397/1

5 participants