
python3Packages.pytorchWithCuda: consistent cuda #166734

Closed

Conversation

@SomeoneSerge (Contributor) commented Apr 1, 2022

Pytorch has been using (through unversioned attributes) cudatoolkit=10.2 (10.1 in nixpkgs-unstable) and cudnn_7_6_cudatoolkit_10_1; the older 10.1 took priority. Pytorch 1.10 requires at least 10.2, which led to a build failure: #166403

Description of changes
  • Calling pytorch with consistent cudnn and cudatoolkit versions; a minimal sketch follows this list.
  • Choosing to use cudnn_8_3_cudatoolkit_11 (cuda 11.5, at the moment) because this is what I expect the unversioned cuda* attributes to point at after cudnn: 7.6.5 -> 8.3.2 #166533 lands (we might be able to re-use the cache later).
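
A minimal sketch of the pinning (attribute names taken from the diff reviewed below; the remaining callPackage arguments are elided, and rec here only illustrates the recursive binding at the real call site):

  pytorch = callPackage ../development/python-modules/pytorch rec {
    # Pin cudnn first, then derive cudatoolkit from it, so the two
    # can no longer disagree:
    cudnn = pkgs.cudnn_8_3_cudatoolkit_11;
    cudatoolkit = cudnn.cudatoolkit; # cudatoolkit 11.5, at the moment
  };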

This is a hotfix to keep pytorch buildable on nixpkgs-unstable.
The longer-term solution is #166533.
This causes an additional magma build, but (a) we're going to provide a cache, and (b) once #166533 lands, the global magma will use the same version of cuda, so we can drop the ad hoc fix introduced here.
In the future, we'll want to remove these custom arguments, because they make overriding (e.g. the global cudnn/cudatoolkit) harder; see the illustrative override below.
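
For illustration only (a hypothetical user override, not part of this PR): because the call site now pins its own cudnn/cudatoolkit arguments, an overlay that changes the global pkgs.cudnn no longer reaches pytorch, and one has to override the arguments directly:

  # Hypothetical: the global pkgs.cudnn is ignored here, because the
  # call site pins its own argument; it must be overridden explicitly.
  pytorchCustom = python3Packages.pytorch.override {
    cudnn = pkgs.cudnn_8_3_cudatoolkit_11;
    cudatoolkit = pkgs.cudnn_8_3_cudatoolkit_11.cudatoolkit;
  };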

Note that this change is "local" in the sense that it should only affect the "cuda" world, which is not built by hydra but is tested by @NixOS/cuda-maintainers. It might still affect cuda-enabled pytorch on darwin, because we currently have no capability to test on darwin.

This can be merged if 113 has fewer failures than 97, in particular for pytorch. This should fix the configurePhase, but the later build, apparently, is still going to fail: #161843; that should be addressed in a separate PR. Or maybe I should call this off and wait for the cudnn PR to get merged - I'll sleep on this thought and wait for the build.

Things done
  • Built on platform(s)
    • x86_64-linux (waiting)
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.05 Release Notes (or backporting 21.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
  • Fits CONTRIBUTING.md.

cc @NixOS/cuda-maintainers
cc @tscholak about cuda-enabled pytorch on darwin

@samuela (Member) commented Apr 1, 2022

I was just about to submit a PR with

  pytorch = callPackage ../development/python-modules/pytorch {
    cudaSupport = pkgs.config.cudaSupport or false;

    # TODO: next time pytorch is updated (to 1.11.0, currently in staging as of
    # 2022-03-31), make the following changes:

    # -> cudatoolkit_11
    cudatoolkit = pkgs.cudatoolkit_10;

    # -> cudnn_8_3_cudatoolkit_11
    cudnn = pkgs.cudnn_8_1_cudatoolkit_10;

    # -> cutensor_cudatoolkit_11 (cutensor is a new dependency in v1.11.0)
    # cutensor = pkgs.cutensor_cudatoolkit_11;

    # -> setting a custom magma should be unnecessary with v1.11.0
    magma = pkgs.magma.override { cudatoolkit = pkgs.cudatoolkit_10; };

    # -> nccl_cudatoolkit_11
    nccl = pkgs.nccl.override { cudatoolkit = pkgs.cudatoolkit_10; };
  };

which also builds successfully and has the advantage of only relying on a single cudatoolkit version. I'm happy to merge either solution as this will all be obviated once staging gets merged into master.
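
One way to check that only a single cudatoolkit ends up in the runtime closure (assuming the build result is at ./result):

  $ nix-store -qR ./result | grep -i cudatoolkit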

Comment on lines +8365 to +8366
cudnn = pkgs.cudnn_8_3_cudatoolkit_11;
cudatoolkit = cudnn.cudatoolkit;
Member commented:
IIUC pkgs.cudnn_8_3_cudatoolkit_11 will pull in cudatoolkit_11_5 and cudnn.cudatoolkit will pull in cudatoolkit_10_1. Just as a heads up.

This is still strictly better than broken, but worth noting that it still pulls in multiple CUDA versions, AFAIU.

@SomeoneSerge (Contributor, Author) commented:

Wait, what o_0

Member commented:
IIUC cudnn currently points to cudnn_7_6_cudatoolkit_10_1

Member commented:
Check out the diff in #166784

Member commented:
oh wait... maybe I'm being an idiot. I guess this is like a let-rec binding so cudnn refers to the definition just prior. ok in that case nvm!

@SomeoneSerge (Contributor, Author) commented:
😓 I thought the same for a moment, but:

nix-repl> pkgs.cudnn_8_3_cudatoolkit_11.cudatoolkit.version
"11.5.0"

@SomeoneSerge (Contributor, Author) commented:

Fixed with #167016
