fix pytorchWithCuda, fix cupy, upgrade cudnn #166784

Conversation
```nix
cudnn_8_3_cudatoolkit_10_2 = generic rec {
  version = "8.3.2";
  cudatoolkit = cudatoolkit_10_2;
  # See https://docs.nvidia.com/deeplearning/cudnn/archives/cudnn-832/support-matrix/index.html#cudnn-cuda-hardware-versions.
  minCudaVersion = "10.2.00000";
  maxCudaVersion = "11.5.99999";
  mkSrc = cudatoolkit:
    let v = if lib.versions.majorMinor cudatoolkit.version == "10.2" then "10.2" else "11.5"; in
    fetchurl {
      # Starting at version 8.3.1 there's a new directory layout including
      # a subdirectory `local_installers`.
      url = "https://developer.download.nvidia.com/compute/redist/cudnn/v${version}/local_installers/${v}/cudnn-linux-x86_64-8.3.2.44_cuda${v}-archive.tar.xz";
      hash = {
        "10.2" = "sha256-1vVu+cqM+PketzIQumw9ykm6REbBZhv6/lXB7EC2aaw=";
        "11.5" = "sha256-VQCVPAjF5dHd3P2iNPnvvdzb5DpTsm3AqCxyP6FwxFc=";
      }."${v}";
    };
};
```
This change is purely a matter of formatting.
```nix
assert cudnn.cudatoolkit == cudatoolkit;
assert cutensor.cudatoolkit == cudatoolkit;
assert nccl.cudatoolkit == cudatoolkit;
```
It was discovered here that cupy was accidentally pulling in multiple cudatoolkit versions. These asserts should prevent that issue going forward.
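A minimal sketch of the mechanism, using made-up attribute values purely for illustration (the real packages are derivations, not plain attrsets): each CUDA library records the toolkit it was built against, and consumers refuse to evaluate on a mismatch.

```nix
# sketch.nix — hypothetical values, for illustration only.
let
  cudatoolkit_11_5 = { version = "11.5"; };

  # cudnn records the toolkit it was built against (cf. the passthru
  # attribute discussed in this thread):
  cudnn = { version = "8.3.2"; cudatoolkit = cudatoolkit_11_5; };

  # The consumer refuses to evaluate if handed a mismatched pair, so a
  # silent closure with two cudatoolkit versions becomes a loud error.
  cupy = { cudatoolkit, cudnn }:
    assert cudnn.cudatoolkit == cudatoolkit;
    "ok";
in
  cupy { cudatoolkit = cudatoolkit_11_5; inherit cudnn; }  # evaluates to "ok"
```

Passing any other toolkit attrset would abort evaluation at the `assert` rather than building a mixed closure.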
```nix
passthru = {
  inherit cudatoolkit;
};
```
This is necessary for the assert in cupy/default.nix.
👍🏻
Indeed, we've been considering both: the first as a temporary fix, and the second as a longer-term solution.
Hmm, I see. Currently it's the other way around (cudnn has cudatoolkit in passthru). Also, there are several releases of cudnn in nixpkgs. Again,
On the topic of a cudaPackages package set, check out the proposal here: #163704. Re passthru: I think that's absolutely the way to go for the time being. This PR is already rather beefy, so I'd prefer to accomplish that separately, but I totally agree that we ought to do so.
@FRidh @SomeoneSerge Created an issue to track the torchvision cudatoolkit discrepancy: #166948
If no one objects, I'll go ahead and merge tomorrow |
EDIT: tested like below:

```nix
with (import ./. {
  config.allowUnfree = true;
  config.cudaSupport = true;
});
mkShell {
  buildInputs = [
    python310Packages.caffe
  ];
}
```

or like this:

```shell
$ nix build github:SomeoneSerge/nixpkgs-unfree/e244a94864998f2b3edd927a6b4b4680718f406b#python39Packages.caffe
```
Hmm, I'm not seeing this:
Relaying a conversation from …

I think in this case there will be exactly one package left that still uses cuda 10 (seems to be …). I'm not opposed to either solution. I prefer this alternative.
The current version of caffe, v1.0, does not build with any later versions of cuDNN, and CUDA 10.1 is the latest CUDA that is supported by cuDNN 7.6.
@SomeoneSerge I added a
I'm in favor of doing this, although I hesitate to include those changes in this PR as it has gotten so huge already. What if we merged this one and then followed up immediately after with a PR that bumps the cudatoolkit and cudatoolkit_11 versions? I believe that would get us the same end result but with smaller steps along the way.
I thought the major bumps would let us drop some of the changes and reduce the size of the PR... but actually, merging the asserts and fixing pytorch+cupy before cleaning up sounds tempting.
Ah, I see what you mean. Perhaps that is true. But I suspect that until we get pytorch v1.11 we won't be able to get out of specifying custom cudatoolkit versions, since I don't think v1.10.2 supports cudatoolkit = cudatoolkit_11 = cudatoolkit_11_5. I'm not entirely sure though.
This is why I was talking about a downgrade to 11.4.
Ohh, I see. Sorry, I misunderstood!
Overall, I feel OK merging this now and immediately following up with a clean-up PR that would adjust the cudnn and cudatoolkit attributes so as to minimize the total number of ad hoc parameters passed to callPackages: the first step makes the attributes of nixpkgs buildable, the second fixes the overlays UX, which seems like a reasonable allocation of priorities.
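The package-set direction discussed in #163704 could look roughly like the following; this is a hedged sketch with made-up attribute names (plain attrsets standing in for derivations), not the actual proposal.

```nix
# Hypothetical sketch: bundle the CUDA libraries into one coherent set,
# so consumers take a single argument instead of many ad hoc ones.
let
  mkCudaPackages = cudatoolkit: {
    inherit cudatoolkit;
    cudnn = { version = "8.3.2"; inherit cudatoolkit; };
    cutensor = { inherit cudatoolkit; };
    nccl = { inherit cudatoolkit; };
  };

  cudaPackages_11_5 = mkCudaPackages { version = "11.5"; };
in
  # All members agree on the toolkit by construction, so the per-package
  # asserts added in this PR could eventually become unnecessary:
  assert cudaPackages_11_5.cudnn.cudatoolkit == cudaPackages_11_5.cudatoolkit;
  cudaPackages_11_5.cudatoolkit.version  # → "11.5"
```

An overlay would then swap the whole set at once, rather than overriding each library's cudatoolkit argument separately.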
```diff
@@ -21,7 +21,7 @@ stdenv.mkDerivation {
   src = fetchurl {
     url = "https://developer.download.nvidia.com/compute/cutensor/${mostOfVersion}/local_installers/libcutensor-${stdenv.hostPlatform.parsed.kernel.name}-${stdenv.hostPlatform.parsed.cpu.name}-${version}.tar.gz";
-    inherit sha256;
+    inherit hash;
```
Why `hash` vs `sha256`?
IIUC, `hash` is preferred to `sha256` as it is more future-proof, but I don't remember the reference for that.
```nix
# fetchurl/default.nix
, # SRI hash.
  hash ? ""
, # Legacy ways of specifying the hash.
  outputHash ? ""
, outputHashAlgo ? ""
, md5 ? ""
, sha1 ? ""
, sha256 ? ""
, sha512 ? ""
```

Hmm, I guess
Yep, we are switching everywhere (though slowly) to SRI hashes.
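For context, an SRI hash encodes both the algorithm and a base64 digest in a single string, so `fetchurl` takes one `hash` attribute regardless of algorithm. A sketch of the two styles, with placeholder URL and hash values (not real digests):

```nix
# Sketch only: both hash values below are placeholders, not real digests.
{ fetchurl }:

{
  # Legacy style: the algorithm is implied by the attribute name.
  old = fetchurl {
    url = "https://example.org/src.tar.xz";  # placeholder URL
    sha256 = "0000000000000000000000000000000000000000000000000000";
  };

  # Preferred SRI style: one attribute, algorithm encoded in the value.
  new = fetchurl {
    url = "https://example.org/src.tar.xz";  # placeholder URL
    hash = "sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=";
  };
}
```

With SRI, moving to a stronger algorithm later only changes the string's prefix, not the attribute name at every call site.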
```nix
passthru = {
  inherit cudatoolkit;
};
```
👍🏻
Description of changes
This PR accomplishes a few separate but related tasks:
Although I try to keep PRs as small as possible, in this case it became clear that it was much more straightforward to make these changes simultaneously than separately. I understand that this increases the burden on reviewers to some extent, but I tried to keep each commit as small as possible to ease the process.
This PR is the equivalent of #166533 but targeting master instead of staging.
According to nixpkgs-review, this affects the packages `pytorchWithCuda`, `Theano`, and `cupy`. I've confirmed that all of those nix-build successfully. For some reason, `nixpkgs-review wip` doesn't work in this case, but everything seems to build just fine.

Fixes #161843, #166403, and possibly others.
Things done

- `sandbox = true` set in `nix.conf`? (See Nix manual)
- Tested via `nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"`. Note: all changes have to be committed, also see nixpkgs-review usage
- Tested execution of all binary files (usually in `./result/bin/`)
- Ran `nixos/doc/manual/md-to-db.sh` to update generated release notes