Skip to content

cudnn: 8.3.0 -> 8.3.2#164338

Merged
samuela merged 3 commits intoNixOS:masterfrom
samuela:samuela/cudnn
Mar 17, 2022
Merged

cudnn: 8.3.0 -> 8.3.2#164338
samuela merged 3 commits intoNixOS:masterfrom
samuela:samuela/cudnn

Conversation

@samuela
Copy link
Member

@samuela samuela commented Mar 16, 2022

Description of changes

Update cuDNN 8.3 version. Refactor the cuDNN derivations to be clearer and enforce CUDA version constraints.

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.05 Release Notes (or backporting 21.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

samuela added 2 commits March 16, 2022 00:50
Update cuDNN 8.3 version. Refactor the cuDNN derivations to be clearer
and enforce CUDA version constraints.
@samuela
Copy link
Member Author

samuela commented Mar 16, 2022

cc @NixOS/cuda-maintainers

@samuela
Copy link
Member Author

samuela commented Mar 16, 2022

I'll go ahead and merge tomorrow unless anyone objects.

cudnn_8_3_cudatoolkit_11;

cudnn = cudnn_7_6_cudatoolkit_10;
# TODO(samuela): This is old and should be upgraded to 8.3 at some point.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's too old, why do we keep it as a default?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because no one's gotten around to updating it yet. But that would be great for a follow-up PR!

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/announcing-the-nixos-cuda-maintainers-team-and-a-call-for-maintainers/18074/10

@ofborg ofborg bot added 8.has: clean-up This PR removes packages or removes other cruft 8.has: package (new) This PR adds a new package 11.by: package-maintainer This PR was created by a maintainer of all the package it changes. 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. labels Mar 17, 2022
@SomeoneSerge
Copy link
Contributor

Nixpkgs also packages cudatoolkit-11.6. Archlinux allows installing cudnn-8.3.2 with it and it seems to work there:

https://archlinux.org/packages/community/x86_64/cudnn/
Dependencies (1)
cuda>=11.5

Which attribute should the users of cudatoolkit-11.6 use for cudnn?

@SomeoneSerge
Copy link
Contributor

Some package set maintainers may complain about the effort it takes to update and validate urls and hashes, but I think mkSrc is a step in good direction. This is because we can follow up with discarding this confusing cudnn_${cudnn-ver}_cudatoolkit_${cuda-ver} pattern and instead have a single overridable cudnn that chooses the sources and does all the checks

@samuela
Copy link
Member Author

samuela commented Mar 17, 2022

Which attribute should the users of cudatoolkit-11.6 use for cudnn?

Sadly, none. I'm basing this off of NVIDIA's official version constraints. There's sort of a policy question here: How strictly should we track upstream's version constraints? As you note, it is possible to smudge those constraints in certain scenarios and it "seems to work." My personal inclination atm is to wait for a new cuDNN release that officially supports CUDA 11.6 because

  • Diverging from the official version constraints mean that the responsibility of testing cuDNN/CUDA combos falls on us, as opposed to upstream. That increases our maintenance burden and makes it that much more difficult when debugging issues, eg "Is X failing bc of this cuDNN/CUDA combo, or just bc the code is broken?". Reporting issues to NVIDIA upstream is also a lot easier if it occurs within their stated constraints.
  • Users can still override the official version constraints if they so desire. The package will still evaluate, just with meta.broken set. And that can be overridden as well with only a little fuss.

Some package set maintainers may complain about the effort it takes to update and validate urls and hashes, but I think mkSrc is a step in good direction. This is because we can follow up with discarding this confusing cudnn_${cudnn-ver}cudatoolkit${cuda-ver} pattern and instead have a single overridable cudnn that chooses the sources and does all the checks

Thanks! Yeah, I'm hoping that this is a step in the right direction. Dealing with these version constraints is a huge mental burden normally so it's nice to have the system help you out here and there.

@samuela samuela merged commit 7c6eaf9 into NixOS:master Mar 17, 2022
@samuela samuela deleted the samuela/cudnn branch March 17, 2022 20:10
@SomeoneSerge
Copy link
Contributor

Sadly, none

Then we can clarify the cudnn-cudatoolkit situation in the wiki (st people don't just feel they're on their own), and later work out the UX of overriding cuda version

@samuela
Copy link
Member Author

samuela commented Mar 17, 2022

Yeah, good idea. This would be a great bit of info for the wiki page.

@SomeoneSerge
Copy link
Contributor

I haven't edited the main page yet, but I added comments in the talk page: https://nixos.wiki/wiki/Talk:Nvidia#cudnn


cudnn = cudnn_7_6_cudatoolkit_10;
# TODO(samuela): This is old and should be upgraded to 8.3 at some point.
cudnn = cudnn_7_6_cudatoolkit_10_1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this have to be cudnn_8_3_cudatoolkit_10 in order to be compatible with the currently packaged cudatoolkit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, yes. The issue is that cuDNN 7.6.5 doesn't actually support CUDA 10.2, the current cudatoolkit version. And I didn't want to upgrade cuDNN past 7.6.5 here in the interest of keeping this PR as small as possible. The solution is to upgrade cudnn to 8.3 which supports CUDA 10.2-11.5. (Looks like you're already on that, great!)

In the future, it think it would be nice to explore a solution like cudatoolkit_11.withPackages similar to python3.withPackages, but we have to move one step at a time to get the ball rolling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

8.has: clean-up This PR removes packages or removes other cruft 8.has: package (new) This PR adds a new package 10.rebuild-darwin: 1-10 This PR causes between 1 and 10 packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. 11.by: package-maintainer This PR was created by a maintainer of all the package it changes.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants