{_cuda.extensions,pkgsCuda,pkgsForCudaArch}: init #406568
ConnorBaker merged 10 commits into NixOS:master
Conversation
Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>
Co-authored-by: Connor Baker <ConnorBaker01@gmail.com> Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>
Thank you for the review! Excellent suggestions -- I made them all, squashed, and rebased.
```nix
# `pkgsForCudaArch` maps each CUDA capability in `_cuda.db.cudaCapabilityToInfo` to a Nixpkgs variant configured for
# that target system. For example, `pkgsForCudaArch.sm_90a.python3Packages.torch` refers to PyTorch built for the
# Hopper architecture, leveraging architecture-specific features.
# NOTE: Not every package set is supported on every architecture!
# See `Using pkgsForCudaArch` in doc/languages-frameworks/cuda.section.md for more information.
pkgsForCudaArch = lib.listToAttrs (
  lib.map (cudaCapability: {
    name = self._cuda.lib.mkRealArchitecture cudaCapability;
    value = nixpkgsFun {
      config = super.config // {
        cudaSupport = true;
        rocmSupport = false;
        # Not supported by architecture-specific feature sets, so disable for all.
        # Users can choose to build for family-specific feature sets if they wish.
        cudaForwardCompat = false;
        cudaCapabilities = [ cudaCapability ];
      };
    };
  }) (lib.attrNames self._cuda.db.cudaCapabilityToInfo)
);
```
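As a usage sketch (illustrative only; `sm_89` and `torch` are example choices, not taken from the PR), the resulting attribute set can be consumed like this:

```nix
# Each real-architecture name produced by mkRealArchitecture selects a full
# Nixpkgs variant pinned to exactly one CUDA capability ("sm_89" <-> "8.9").
{
  capabilities = pkgsForCudaArch.sm_89.config.cudaCapabilities; # [ "8.9" ]
  torch = pkgsForCudaArch.sm_89.python3Packages.torch; # built for sm_89 only
}
```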
FTR we'd discussed this with Connor, and it is a known concern that we are bending Nixpkgs' "public API" here to compensate for Nix's CLI UX issues. This is somewhat evidenced by there being no in-tree consumers that select specific CUDA architectures. Still, it's the intuitive way to achieve the desired UX "now", and I can't really judge what's worth what...
As such, even today this UX could be implemented in a separate layer. We could literally write a builder-pattern wrapper for nixpkgs along the lines of `(foo.buildPlatform.x86_64-linux.hostPlatform.aarch64-linux.cuda."86".withOverlay (fi: pre: ...)).result`, and a CLI tool around such a wrapper.
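A rough sketch of what such a fluent wrapper could look like (entirely hypothetical; nothing like this exists in Nixpkgs, and all of the names here are invented):

```nix
# Hypothetical builder-pattern wrapper around a nixpkgs instantiation.
# Each step accumulates arguments; `.result` forces the actual import.
let
  mkBuilder = acc: rec {
    result = import <nixpkgs> acc;
    cuda = capability: mkBuilder (acc // {
      config = (acc.config or { }) // {
        cudaSupport = true;
        cudaCapabilities = [ capability ];
      };
    });
    withOverlay = overlay: mkBuilder (acc // {
      overlays = (acc.overlays or [ ]) ++ [ overlay ];
    });
  };
in
# e.g. (((mkBuilder { }).cuda "8.6").withOverlay (final: prev: { })).result
mkBuilder { }
```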
```nix
pkgsForCudaArch = lib.listToAttrs (
  lib.map (cudaCapability: {
    name = self._cuda.lib.mkRealArchitecture cudaCapability;
    value = nixpkgsFun {
```
Another recollection: prior to the introduction of `pkgsRocm`, the idea was to make `nixpkgsFun` public, and that too might help with the separation into layers, in particular with moving `pkgsRocm` and `pkgsCuda*` from stages to the level of all-packages.nix.
```nix
# We must use an instance of Nixpkgs where the CUDA package set we're building is the default; if we do not, members
# of the versioned, non-default package sets may rely (transitively) on members of the default, unversioned CUDA
# package set.
# See `Using cudaPackages.pkgs` in doc/languages-frameworks/cuda.section.md for more information.
pkgs' =
  let
    cudaPackagesUnversionedName = "cudaPackages";
```
Not obvious from the title, but the biggest and most important change of the PR is probably this: the re-instantiation of nixpkgs from inside the various cudaPackages_XX_Y package sets. An obvious layering violation and a very unfortunate situation, but there doesn't seem to be any (affordable) way around it. This is mostly needed because of a handful of fairly exceptional packages like mpi and ucc (circular dependencies in the "coarsened" graph where cudaPackages form a single super-node), but... is it really needed?
Hmmm, on the (hundred and) second thought... the bug here was that when we (cudaPackages) consumed ucc/ucx, which we knew depended in turn on "us", we hadn't configured them with the appropriate overrides. Now we are doing this, but we pass the overrides using a "sledgehammer", i.e. via overlays. We could, in fact, just do the explicit local overrides. I'd argue that would be more in the current spirit of Nixpkgs.
The proposed overlays-based approach is more systemic (it ensures we don't forget local overrides again, and that transitive dependencies are handled correctly), at the cost of introducing (another) layering violation.
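The two options being contrasted could be sketched roughly as follows (illustrative only; `self` stands for the CUDA package set doing the overriding, and the attribute paths are simplified):

```nix
let
  # Option A: explicit local overrides -- fix up just the known offenders.
  # Stays local, but silently misses any transitive dependency we forgot.
  ucx' = ucx.override { cudaPackages = self; };
  ucc' = ucc.override { ucx = ucx'; };

  # Option B: overlay-based "sledgehammer" -- re-instantiate Nixpkgs so that
  # *every* consumer, transitive or not, resolves to our cudaPackages.
  pkgs' = import pkgs.path {
    inherit (pkgs) system config;
    overlays = [ (final: prev: { cudaPackages = self; }) ];
  };
in
{ inherit ucc' pkgs'; }
```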
Late addition: on the ROCm side we seem to be getting by with just overriding things like ucx as needed, rather than an overlaid pkgs, but I can see the stack of things to override getting increasingly difficult to apply over time; maybe we'll end up copying this approach later.
This was done explicitly because there's no way to know that all transitive dependencies have been overridden (or even overridden correctly!). Rather than playing a never-ending game of whack-a-mole (and I don't know how one would find out about misconfigurations without someone reporting an issue), this was the best solution I could come up with. The stack has enough problems; I didn't want missing overrides in massive transitive closures to be one of them.
> (and I don't know how one would find out about misconfigurations without someone reporting an issue)

Some horrifying test that asserts equality of `pkgsCuda.cudaPackages.x` with `cudaPackages.x` for everything in it?
But yeah, very whack-a-mole.
> getting by with just overriding things like ucx as we feel

Yeah, this gets problematic with diamond deps like tf -> cudA, tf -> mpi -> ucx -> cudB, cuda -> ucx.

> Some horrifying test that asserts on equality

I think we should be able to just break the cyclic dependencies, in particular the cuda -> ucx part. I'd be in for switching from `cudaPackages.pkgs` to asserts too, as long as we manage to keep the obvious targets (torch, tf, triton?) functional...
SomeoneSerge left a comment
To sum up: I truly wish I could say "no, and we should do X instead" to any of this, but the fact is the PR reflects the complexity of the problem... Thank you for shoveling this, Connor. And thank you for the reviews, Dan.
Also, a note for the record concerning processes, because I've been increasingly often late with reviews. So far we've seen arguments but no explicit rules about "how long one should wait for reviews prior to merging, what kinds of changes are OK without reviews, &c.", and my personal opinions currently are that:
In the case of this PR specifically, the nixpkgs-side communication that happened was more or less the following:
I think these interactions involved managing conflicts of interest (e.g. contributor vs. employee) and managing perceived senses of "ownership" ([imo] nobody rightfully "owns" anything in Nixpkgs, but feelings get hurt). None of these interactions were as such obligatory, and time surely could have been saved. There's likely a bunch of PRs in the backlog for which it should've been saved. Time and, I imagine, frustration. There's got to be, somewhere, a balance where things do move forward but no angry "willy-nilly merge" discourse threads get opened. I'm hoping we're not too far from that balance.
Successfully created backport PR for
Bisect says fc2c350
Can you tell we're not running the tests in any CI ;) That failure makes sense: that attribute is only available from 11.4 on. Truly, everything prior should be removed at this point. Although Jetson devices (special aarch64-linux) don't get that until 11.8, unless you use JetPack NixOS, which repackages Debian installers for 11.4 to get those attributes. Hmmm. It's probably fine to remove everything before 11.8 from an in-tree perspective; I'll try to do that when I get some time.
Description of changes

Aligns with upstream's changes in NixOS/nixpkgs#406568. `nvidia-jetpack.cudaPackages` now has a `pkgs` attribute which functions in the same way as upstream's `cudaPackages.pkgs` attribute: https://nixos.org/manual/nixpkgs/stable/#cuda-using-cudapackages-pkgs.

The default version of `nvidia-jetpack` is now determined by the default version of the CUDA package set (`cudaPackages`). Changing the default version of the CUDA package set (either through something like `cudaPackages_12.pkgs` or an overlay) now changes the default version of `nvidia-jetpack`.

Added assertions to the NixOS configuration which, when CUDA support is requested, check that the default CUDA package set can be used with the current version of JetPack NixOS.

Added a new check, `jetpackSelectionDependsOnCudaVersion`, which uses assertions to verify that the version-changing behavior made possible by `cudaPackages.pkgs` and the like functions as expected.

Testing
- CI (internal)
- Manual testing in HITL (internal; in progress)
- Manual verification
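The version-selection behavior described above can be sketched as follows (the `torch` attribute is an illustrative example; `cudaPackages_12.pkgs` follows the upstream documentation linked above):

```nix
# `cudaPackages_12.pkgs` is a Nixpkgs instance in which cudaPackages_12 IS the
# default `cudaPackages`, so no transitive dependency can leak back into the
# unversioned default set.
cudaPackages_12.pkgs.python3Packages.torch
```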
This PR:

- adds `_cuda.extensions`, `pkgsCuda`, and `pkgsForCudaArch`
- fixes package set leakage from the versioned CUDA package sets (`cudaPackages_11_8`, `cudaPackages_12_2`, etc.) and documents `cudaPackages.pkgs`

Note
There is an increase in evaluation time for attributes which draw from a non-default CUDA package set, due to the implementation of the fix for package set leakage.
As an example, consider `catboost`, which requires `cudaPackages_11` and saw a slight slowdown, from about 0.49 seconds to about 0.55 seconds:

Before

```json
{
  "cpuTime": 0.4951170086860657,
  "envs": { "bytes": 14589216, "elements": 1034109, "number": 789543 },
  "gc": { "cycles": 1, "heapSize": 402915328, "totalBytes": 182882816 },
  "list": { "bytes": 1729208, "concats": 22189, "elements": 216151 },
  "nrAvoided": 846465,
  "nrExprs": 217900,
  "nrFunctionCalls": 711585,
  "nrLookups": 381381,
  "nrOpUpdateValuesCopied": 4538315,
  "nrOpUpdates": 45631,
  "nrPrimOpCalls": 417981,
  "nrThunks": 908748,
  "sets": { "bytes": 89235808, "elements": 5467638, "number": 109600 },
  "sizes": { "Attr": 16, "Bindings": 16, "Env": 8, "Value": 24 },
  "symbols": { "bytes": 464088, "number": 44826 },
  "time": { "cpu": 0.4951170086860657, "gc": 0.002, "gcFraction": 0.004039449190621771 },
  "values": { "bytes": 52030200, "number": 2167925 }
}
```

After
```json
{
  "cpuTime": 0.5533679723739624,
  "envs": { "bytes": 22146784, "elements": 1569057, "number": 1199291 },
  "gc": { "cycles": 1, "heapSize": 402915328, "totalBytes": 275970768 },
  "list": { "bytes": 2658768, "concats": 31839, "elements": 332346 },
  "nrAvoided": 1285255,
  "nrExprs": 217978,
  "nrFunctionCalls": 1073751,
  "nrLookups": 590551,
  "nrOpUpdateValuesCopied": 6945741,
  "nrOpUpdates": 65022,
  "nrPrimOpCalls": 644075,
  "nrThunks": 1375075,
  "sets": { "bytes": 135716224, "elements": 8314615, "number": 167649 },
  "sizes": { "Attr": 16, "Bindings": 16, "Env": 8, "Value": 24 },
  "symbols": { "bytes": 464266, "number": 44837 },
  "time": { "cpu": 0.5533679723739624, "gc": 0.002, "gcFraction": 0.003614231577985893 },
  "values": { "bytes": 77944752, "number": 3247698 }
}
```

The largest measured increase in evaluation time comes from `pkgs/top-level/release-cuda.nix`, which instantiates every CUDA package set, jumping from about 7.7 seconds to about 25.6 seconds:

Before
```json
{
  "cpuTime": 7.651072025299072,
  "envs": { "bytes": 438816984, "elements": 31661704, "number": 23190419 },
  "gc": { "cycles": 9, "heapSize": 1829953536, "totalBytes": 3203691584 },
  "list": { "bytes": 60658712, "concats": 1043070, "elements": 7582339 },
  "nrAvoided": 24049709,
  "nrExprs": 1224124,
  "nrFunctionCalls": 20635918,
  "nrLookups": 10945781,
  "nrOpUpdateValuesCopied": 47363986,
  "nrOpUpdates": 1770862,
  "nrPrimOpCalls": 9565514,
  "nrThunks": 27756355,
  "sets": { "bytes": 1151189808, "elements": 67027422, "number": 4921941 },
  "sizes": { "Attr": 16, "Bindings": 16, "Env": 8, "Value": 24 },
  "symbols": { "bytes": 937835, "number": 81382 },
  "time": { "cpu": 7.651072025299072, "gc": 2.613, "gcFraction": 0.34152076877068227 },
  "values": { "bytes": 977363952, "number": 40723498 }
}
```

After
```json
{
  "cpuTime": 25.631572723388672,
  "envs": { "bytes": 1542351136, "elements": 110687785, "number": 82106107 },
  "gc": { "cycles": 13, "heapSize": 7483875328, "totalBytes": 12488948416 },
  "list": { "bytes": 194446808, "concats": 3671610, "elements": 24305851 },
  "nrAvoided": 87539852,
  "nrExprs": 1224245,
  "nrFunctionCalls": 75285426,
  "nrLookups": 35311482,
  "nrOpUpdateValuesCopied": 236831655,
  "nrOpUpdates": 6330797,
  "nrPrimOpCalls": 35780617,
  "nrThunks": 94455780,
  "sets": { "bytes": 4987623664, "elements": 298972000, "number": 12754479 },
  "sizes": { "Attr": 16, "Bindings": 16, "Env": 8, "Value": 24 },
  "symbols": { "bytes": 937946, "number": 81387 },
  "time": { "cpu": 25.631572723388672, "gc": 7.534, "gcFraction": 0.2939343629556241 },
  "values": { "bytes": 3475471944, "number": 144811331 }
}
```

Given the small number of packages which use a non-default version of the CUDA package set:
- `catboost`: pkgs/top-level/all-packages.nix, line 7802 (at 846783c)
- `dcgm`: pkgs/by-name/dc/dcgm/package.nix, lines 9-10 (at 846783c)
- `magma`: pkgs/development/libraries/science/math/magma/generic.nix, lines 45-46 (at 846783c)
- `python3Packages.paddlepaddle`: pkgs/development/python-modules/paddlepaddle/default.nix, line 13 (at 846783c)
- `python3Packages.tensorflow-build`: pkgs/top-level/python-packages.nix, line 17604 (at 846783c)
- `python3Packages.tensorrt`: pkgs/top-level/python-packages.nix, line 17671 (at 846783c)
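For scale, the `cpuTime` figures reported above work out to roughly a 1.12x slowdown for `catboost` and a 3.35x slowdown for `release-cuda.nix`:

```python
# Relative slowdowns derived from the cpuTime values in the stats above.
before_catboost, after_catboost = 0.4951170086860657, 0.5533679723739624
before_release, after_release = 7.651072025299072, 25.631572723388672

catboost_slowdown = after_catboost / before_catboost
release_slowdown = after_release / before_release
print(f"catboost: {catboost_slowdown:.2f}x, release-cuda: {release_slowdown:.2f}x")
```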
I don't believe this increase to be a cause for concern for most users.