Skip to content

cudaPackages: multiplex builder always provides attributes#406207

Merged
ConnorBaker merged 2 commits intoNixOS:masterfrom
ConnorBaker:fix/cuda-darwin-eval
May 12, 2025
Merged

cudaPackages: multiplex builder always provides attributes#406207
ConnorBaker merged 2 commits intoNixOS:masterfrom
ConnorBaker:fix/cuda-darwin-eval

Conversation

@ConnorBaker
Copy link
Contributor

@ConnorBaker ConnorBaker commented May 11, 2025

#405707 broke evaluation of at least one attribute on Darwin (python3Packages.jax-cuda12-pjrt) because the multiplex builder would only add attributes for packages on supported platforms and access to cudaPackages was not guarded by platform checks.

As a quick fix, this PR changes the behavior of the multiplex builder so that it will add attributes for packages supporting the current CUDA version, even if they are unsupported on the current platform.

Tested with

nix-build ci -A eval.full   --max-jobs 24   --cores 16   --arg chunkSize 10000   --arg evalSystems '["x86_64-linux" "aarch64-linux" "x86_64-darwin" "aarch64-darwin"]'

EDIT:

The above didn't error on master, so I don't know that it's a good indication that this fixes the eval error.

However, I did verify manually:

[connorbaker@nixos-desktop:~/nixpkgs]$ git switch master
Switched to branch 'master'
Your branch is up to date with 'origin/master'.

[connorbaker@nixos-desktop:~/nixpkgs]$ nix eval --system aarch64-darwin .#python312Packages.jax-cuda12-pjrt
«error: attribute 'cudnn' missing»

[connorbaker@nixos-desktop:~/nixpkgs]$ git switch fix/cuda-darwin-eval 
Switched to branch 'fix/cuda-darwin-eval'

[connorbaker@nixos-desktop:~/nixpkgs]$ nix eval --system aarch64-darwin .#python312Packages.jax-cuda12-pjrt
«error: Package ‘python3.12-jax-cuda12-pjrt-0.6.0’ in /nix/store/1jnd81lxfm8rqk1zy25syqhpnpy2bklm-source/pkgs/development/python-modules/jax-cuda12-pjrt/default.nix:96 is not available on the requested hostPlatform:
  hostPlatform.config = "arm64-apple-darwin"
  package.meta.platforms = [
    "aarch64-linux"
    "armv5tel-linux"
    "armv6l-linux"
    "armv7a-linux"
    "armv7l-linux"
    "i686-linux"
    "loongarch64-linux"
    "m68k-linux"
    "microblaze-linux"
    "microblazeel-linux"
    "mips-linux"
    "mips64-linux"
    "mips64el-linux"
    "mipsel-linux"
    "powerpc64-linux"
    "powerpc64le-linux"
    "riscv32-linux"
    "riscv64-linux"
    "s390-linux"
    "s390x-linux"
    "x86_64-linux"
  ]
  package.meta.badPlatforms = [ ]
, refusing to evaluate.

a) To temporarily allow packages that are unsupported for this system, you can use an environment variable
   for a single invocation of the nix tools.

     $ export NIXPKGS_ALLOW_UNSUPPORTED_SYSTEM=1
     
   Note: When using `nix shell`, `nix build`, `nix develop`, etc with a flake,
         then pass `--impure` in order to allow use of environment variables.
    
b) For `nixos-rebuild` you can set
  { nixpkgs.config.allowUnsupportedSystem = true; }
in configuration.nix to override this.

c) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
  { allowUnsupportedSystem = true; }
to ~/.config/nixpkgs/config.nix.
»

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added the 6.topic: cuda Parallel computing platform and API label May 11, 2025
@ConnorBaker ConnorBaker changed the title cudaPackages: multiplex builder provides broken attributes rather tha… cudaPackages: multiplex builder always provides attributes May 11, 2025
@ConnorBaker ConnorBaker moved this from New to 👀 Awaits reviews in CUDA Team May 11, 2025
@ConnorBaker ConnorBaker marked this pull request as ready for review May 11, 2025 16:03
@ConnorBaker ConnorBaker self-assigned this May 11, 2025
@ConnorBaker ConnorBaker force-pushed the fix/cuda-darwin-eval branch from d8270fe to 4bc9a05 Compare May 11, 2025 16:07
…n no attributes

Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>
@ConnorBaker ConnorBaker force-pushed the fix/cuda-darwin-eval branch from 4bc9a05 to 143898c Compare May 11, 2025 16:07
@github-actions github-actions bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux. labels May 11, 2025
Comment on lines 94 to +102
${majorMinorVersion} =
# Only keep the existing package if it is newer than the one we are considering.
if existingPackage != null && lib.versionOlder package.version existingPackage.version then
# Only keep the existing package if it is newer than the one we are considering or it is supported on the
# current platform and the one we are considering is not.
if
existingPackage != null
&& (
packageOlder package existingPackage
|| (!packageSupportedPlatform package && packageSupportedPlatform existingPackage)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, though, we won't be able to see trofi's commit message in git-blame 9fd753e

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment to reference it when calling lib.sort.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still recover the named function

Comment on lines +59 to +75
# allPackages :: List (Package + { redistArch: String })
allPackages =
let
redistArchs = lib.attrNames evaluatedModules.config.${pname}.releases;
in
lib.concatMap (
redistArch:
let
packages = evaluatedModules.config.${pname}.releases.${redistArch};
in
lib.concatMap (
package: lib.optionals (satisfiesCudaVersion package) [ ({ inherit redistArch; } // package) ]
) packages
) redistArchs;

packageOlder = p1: p2: lib.versionOlder p1.version p2.version;
packageSupportedPlatform = p: p.redistArch == redistArch;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you keep the previous version but rename it to allPackages', and add allPackages = allPackages' ++ [(head ...)] where ... is flattened releases. The previous filter was more readable than 2x concatMap

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-introduced allReleases and introduced allReleases' to make it more clear what's happening.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "previous" I meant the one you introduced in the previous PR, where you only keep the matching platform's packages. That part looked fairly simple and comprehensible, we could keep it and append just one potentially-unsupported package at the end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your call though, I can also look in more detail at the current version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say look more closely at the current version since it makes sure that the incompatible package used at least matches the version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match the version

Good point. FYI my plan for the fallback is to runCommand pname { ... } "false", but it might be easier to do after the evalModules schema update

…her than no attributes

fixup! cudaPackages: simplify multiplex builder's allPackages

Signed-off-by: Connor Baker <ConnorBaker01@gmail.com>
@ConnorBaker ConnorBaker requested a review from SomeoneSerge May 11, 2025 17:28
@ConnorBaker
Copy link
Contributor Author

Merging to unbreak Darwin eval.

@ConnorBaker ConnorBaker merged commit 1a120d4 into NixOS:master May 12, 2025
22 checks passed
@github-project-automation github-project-automation bot moved this from 👀 Awaits reviews to ✅ Done in CUDA Team May 12, 2025
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebase --autosquash

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.topic: cuda Parallel computing platform and API 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux.

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

2 participants