Skip to content

cuda-modules: fix and clean up multiplex builder package selection logic#405707

Merged
ConnorBaker merged 1 commit intoNixOS:masterfrom
ConnorBaker:fix/cuda-packages-multiplex-builder-package-selection-logic
May 9, 2025
Merged

cuda-modules: fix and clean up multiplex builder package selection logic#405707
ConnorBaker merged 1 commit intoNixOS:masterfrom
ConnorBaker:fix/cuda-packages-multiplex-builder-package-selection-logic

Conversation

@ConnorBaker
Copy link
Contributor

This is a fix for a regression introduced in #368366, which manifests as cudnn (and likely other packages produced by the multiplex builder) selecting packages for the wrong architecture.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@ConnorBaker ConnorBaker self-assigned this May 9, 2025
@ConnorBaker ConnorBaker added the 6.topic: cuda Parallel computing platform and API label May 9, 2025
@ConnorBaker ConnorBaker moved this from New to 🏗 In progress in CUDA Team May 9, 2025
@ConnorBaker ConnorBaker marked this pull request as ready for review May 9, 2025 20:22
@ConnorBaker ConnorBaker moved this from 🏗 In progress to 👀 Awaits reviews in CUDA Team May 9, 2025
@github-actions github-actions bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. labels May 9, 2025
@ConnorBaker ConnorBaker requested a review from prusnak May 9, 2025 20:40
Copy link
Contributor

@GaetanLepage GaetanLepage left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Diff looks fine to me. Although I must admit that I am not familiar with this module, so consider it as an out-of-context sanity check.

Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>
@ConnorBaker ConnorBaker force-pushed the fix/cuda-packages-multiplex-builder-package-selection-logic branch from 7afa7c7 to 030575e Compare May 9, 2025 21:44
@ConnorBaker
Copy link
Contributor Author

Rebased on master to fix merge conflict since #405664 touched parts of the same file.

@ConnorBaker
Copy link
Contributor Author

ConnorBaker commented May 9, 2025

Merging since this allows building PyTorch.

@ConnorBaker ConnorBaker merged commit b3443d7 into NixOS:master May 9, 2025
24 of 28 checks passed
@github-project-automation github-project-automation bot moved this from 👀 Awaits reviews to ✅ Done in CUDA Team May 9, 2025
@ConnorBaker ConnorBaker deleted the fix/cuda-packages-multiplex-builder-package-selection-logic branch May 9, 2025 22:32
@K900
Copy link
Contributor

K900 commented May 11, 2025

This broke eval of python312Packages.jax-cuda12-pjrt and possibly others on Darwin.

Comment on lines +58 to +60
allPackages = lib.filter satisfiesCudaVersion (
evaluatedModules.config.${pname}.releases.${redistArch} or [ ]
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any filtering done in the previous version, so the issue is probably here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, iirc we switched from filtering the candidates to sorting them specifically as a dumb way to avoid missing attributes:

#282185
#276800

Been working offline for two days, need time to reload context, feel free to act ahead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just special case the empty list branch

@SomeoneSerge
Copy link
Contributor

SomeoneSerge commented May 11, 2025

This broke eval of python312Packages.jax-cuda12-pjrt and possibly others on Darwin.

Confirmed:

Details
❯ nix-instantiate --arg config '{ allowUnfree = true; }' -A python312Packages.jax-cuda12-pjrt --argstr system x86_64-darwin
       error: attribute 'cudnn' missing
       at /home/someone/Sources/nixpkgs/pkgs/development/python-modules/jax-cuda12-pjrt/default.nix:104:60:
          103|     # https://jax.readthedocs.io/en/latest/installation.html#pip-installation-nvidia-gpu-cuda-installed-locally-harder
          104|     broken = !(cudaAtLeast "12.1") || !(lib.versionAtLeast cudaPackages.cudnn.version "9.1");
             |                                                            ^
          105|   };

Odd that this passed the checks (CC @infinisil if you're still working on github workflows?)

@ConnorBaker
Copy link
Contributor Author

Unless someone beats me to it I’ll make a PR to predicate broken on cudaSupport.

@SomeoneSerge
Copy link
Contributor

predicate broken on cudaSupport.

Not sure I understand what's the idea?

@ConnorBaker
Copy link
Contributor Author

Sorry that was unclear, I meant something like broken = config.cudaSupport && (<existing arguments OR’d together>).

Although, I’ll also see if there’s a simple way to have some sort of placeholder attribute in the CUDA package sets when none of the manifest entries correspond to the Nix system doing evaluation.

@SomeoneSerge
Copy link
Contributor

SomeoneSerge commented May 11, 2025

something like broken = config.cudaSupport && (<existing arguments OR’d together>)

But where do you want to add that? EDIT: If you mean the python package, I think we should not do that. The bug is on cudaPackages side, and the error the python package should display is that it depends on a package unsupported by the platform IMO

have some sort of placeholder attribute in the CUDA package sets when none of the manifest entries correspond to the Nix system doing evaluation.

👍🏻 that's what we need to restore

@ConnorBaker
Copy link
Contributor Author

See #406207 for a fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.topic: cuda Parallel computing platform and API 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux.

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

5 participants