python3Packages.pytorchWithCuda: consistent cuda #166734
Conversation
pytorch has been using (through unversioned attributes) cudatoolkit=10.2 and cudnn_7_6_cudatoolkit_10_1; the older 10.1 took priority; pytorch 1.10 requires at least 10.2
I was just about to submit a PR with

```nix
pytorch = callPackage ../development/python-modules/pytorch {
  cudaSupport = pkgs.config.cudaSupport or false;
  # TODO: next time pytorch is updated (to 1.11.0, currently in staging as of
  # 2022-03-31), make the following changes:
  # -> cudatoolkit_11
  cudatoolkit = pkgs.cudatoolkit_10;
  # -> cudnn_8_3_cudatoolkit_11
  cudnn = pkgs.cudnn_8_1_cudatoolkit_10;
  # -> cutensor_cudatoolkit_11 (cutensor is a new dependency in v1.11.0)
  # cutensor = pkgs.cutensor_cudatoolkit_11;
  # -> setting a custom magma should be unnecessary with v1.11.0
  magma = pkgs.magma.override { cudatoolkit = pkgs.cudatoolkit_10; };
  # -> nccl_cudatoolkit_11
  nccl = pkgs.nccl.override { cudatoolkit = pkgs.cudatoolkit_10; };
};
```

which also builds successfully and has the advantage of relying on only a single cudatoolkit version. I'm happy to merge either solution, as this will all be obviated once
```nix
cudnn = pkgs.cudnn_8_3_cudatoolkit_11;
cudatoolkit = cudnn.cudatoolkit;
```
IIUC `pkgs.cudnn_8_3_cudatoolkit_11` will pull in `cudatoolkit_11_5`, and `cudnn.cudatoolkit` will pull in `cudatoolkit_10_1`. Just as a heads up.

This is still strictly better than broken, but worth noting that it still pulls in multiple CUDA versions AFAIU.
Wait, what o_0
IIUC `cudnn` currently points to `cudnn_7_6_cudatoolkit_10_1`
Check out the diff in #166784
oh wait... maybe I'm being an idiot. I guess this is like a let-rec binding, so `cudnn` refers to the definition just prior. ok, in that case nvm!
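For context, the let-rec question can be sketched in plain Nix (the values below are hypothetical, not the actual nixpkgs code): in an ordinary attribute set a sibling name resolves in the *enclosing* scope, while a `rec` set (or a `let` binding) resolves to the sibling attribute.

```nix
# Hypothetical sketch of Nix attrset scoping, not the nixpkgs code itself.
let
  a = "outer";
  # Plain attrset: `a` on the right-hand side is the enclosing `let` binding.
  plain = { a = "inner"; b = a; };          # plain.b == "outer"
  # rec attrset: siblings can refer to each other.
  recursive = rec { a = "inner"; b = a; };  # recursive.b == "inner"
in
  { inherit plain recursive; }
```

So in a non-`rec` argument set, `cudnn.cudatoolkit` would pick up whatever `cudnn` means in the surrounding scope rather than the binding on the previous line.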
😓 I thought for a moment the same, but

```
nix-repl> pkgs.cudnn_8_3_cudatoolkit_11.cudatoolkit.version
"11.5.0"
```
Force-pushed from 401386f to 2bfcff9
Fixed with #167016
Pytorch has been using (through unversioned attributes) cudatoolkit=10.2 (10.1 in nixpkgs-unstable) and cudnn_7_6_cudatoolkit_10_1; the older 10.1 took priority. pytorch 1.10 requires at least 10.2, which led to a build failure: #166403
Description of changes
- Builds pytorch with consistent cudnn and cudatoolkit versions.
- Uses cudnn_8_3_cudatoolkit_11 (cuda 11.5, atm), because this is what I expect to be the new value of the unversioned cuda*-attributes after "cudnn: 7.6.5 -> 8.3.2" #166533 lands (we might be able to re-use the cache later).
- This is a hotfix for pytorch to stay buildable on nixpkgs-unstable. The longer-term solution is #166533.
- This causes an additional magma build, but (a) we're going to provide a cache, and (b) with #166533 the global magma will use the same version of cuda, so we can get rid of the ad hoc fix introduced here.
- In future, we'll want to remove these custom arguments, because they make overriding (e.g. the global cudnn/cuda) harder.
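As a sketch of why the custom arguments make overriding harder (the attribute names below are illustrative, mirroring the pinned inputs discussed in this PR): a user switching CUDA versions has to re-point every pinned argument by hand instead of overriding cuda in one place.

```nix
# Hypothetical sketch: with ad hoc arguments pinned in all-packages.nix,
# switching to CUDA 11 means overriding each pinned input explicitly.
python3Packages.pytorch.override {
  cudatoolkit = pkgs.cudatoolkit_11;
  cudnn = pkgs.cudnn_8_3_cudatoolkit_11;
  nccl = pkgs.nccl.override { cudatoolkit = pkgs.cudatoolkit_11; };
  magma = pkgs.magma.override { cudatoolkit = pkgs.cudatoolkit_11; };
}
```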
- Note that this change is "local" in the sense that it should only affect the "cuda" world, which is not built by hydra but is tested by @NixOS/cuda-maintainers. This might affect cuda-enabled pytorch on darwin, because we currently do not have any capability to test darwin.
- This can be merged if 113 has fewer failures than 97, in particular pytorch.
- This should fix the configurePhase, but the later build, apparently, is going to fail: #161843. That should be addressed in a separate PR. Or maybe I should call it off and wait for the cudnn PR to get merged - I'll sleep on this thought and wait for the build.

Things done
- Tested via `nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"`. Note: all changes have to be committed; also see nixpkgs-review usage.
- Tested execution of all binary files (usually in `./result/bin/`).

cc @NixOS/cuda-maintainers
cc @tscholak about cuda-enabled pytorch on darwin