Skip to content

cuda-modules: fix sort predicate stability#368366

Merged
SomeoneSerge merged 2 commits intoNixOS:masterfrom
trofi:cuda-modules-fix-sort-predicate
May 8, 2025
Merged

cuda-modules: fix sort predicate stability#368366
SomeoneSerge merged 2 commits intoNixOS:masterfrom
trofi:cuda-modules-fix-sort-predicate

Conversation

@trofi
Copy link
Contributor

@trofi trofi commented Dec 26, 2024

Incorrect sorting predicate was found as part of
NixOS/nix#12106 where nix was crashing on the code like:

$ nix eval --expr 'builtins.sort (a: b: true) [ 1 2 3 ]'
...
Aborted (core dumped)

Note: the crash happens here because sorting predicate does not implement isLess and triggers assertion failures for std::stable_sort that backs builtins.sort.

THe change restore isLess semantic for preferable.

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 25.05 Release Notes (or backporting 24.11 and 25.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@github-actions github-actions bot added the 6.topic: cuda Parallel computing platform and API label Dec 26, 2024
@SomeoneSerge
Copy link
Contributor

SomeoneSerge commented Dec 26, 2024

This should be good, but off the top of my head I can't immediately tell why Ofborg failed with

       error: attribute 'lib' missing

       at /ofborg/checkout/2/repo/38dca4e3aa6bca43ea96d2fcc04e8229/mr-est/ofborg-evaluator-4/pkgs/development/python-modules/torch/default.nix:369:30:

          368|       export CUDNN_INCLUDE_DIR=${lib.getLib cudnn}/include
          369|       export CUDNN_LIB_DIR=${cudnn.lib}/lib

...a riddle for the next morning

@trofi trofi force-pushed the cuda-modules-fix-sort-predicate branch from 5e4541b to 8bb2663 Compare April 18, 2025 21:46
@github-actions github-actions bot added the 6.topic: python Python is a high-level, general-purpose programming language. label Apr 18, 2025
@trofi
Copy link
Contributor Author

trofi commented Apr 18, 2025

Trying eval fix as:

--- a/pkgs/development/python-modules/torch/default.nix
+++ b/pkgs/development/python-modules/torch/default.nix
@@ -354,7 +354,7 @@ buildPythonPackage rec {
     ''
     + lib.optionalString (cudaSupport && cudaPackages ? cudnn) ''
       export CUDNN_INCLUDE_DIR=${lib.getLib cudnn}/include
-      export CUDNN_LIB_DIR=${cudnn.lib}/lib
+      export CUDNN_LIB_DIR=${lib.getLib cudnn}/lib
     ''
     + lib.optionalString rocmSupport ''
       export ROCM_PATH=${rocmtoolkit_joined}

I'm not sure if existing export CUDNN_INCLUDE_DIR=${lib.getLib cudnn}/include is correct (I would normally expect ${lib.getDev cudnn}/include). But I'm leaving it as is as it does not cause eval problems for now.

@github-actions github-actions bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. labels Apr 18, 2025
@nix-owners nix-owners bot requested review from teh, thoughtpolice and tscholak April 18, 2025 22:08
@trofi trofi force-pushed the cuda-modules-fix-sort-predicate branch from 91f1573 to abfad7d Compare April 18, 2025 23:04
Copy link
Contributor

@SomeoneSerge SomeoneSerge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the bump. Let's wait for OfBorg and merge

@wegank wegank added the 12.approvals: 1 This PR was reviewed and approved by one person. label Apr 20, 2025
@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/prs-ready-for-review/3032/5425

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/prs-ready-for-review/3032/5447

trofi added 2 commits May 8, 2025 17:20
Without the change the eval fails as:

    $ nix build --no-link -f. opensplatWithCuda --argstr system aarch64-linux
    error:
       error: attribute 'lib' missing
       at /home/slyfox/dev/git/nixpkgs-master/pkgs/development/python-modules/torch/default.nix:357:30:
          356|       export CUDNN_INCLUDE_DIR=${lib.getLib cudnn}/include
          357|       export CUDNN_LIB_DIR=${cudnn.lib}/lib
             |                              ^
          358|     ''
Incorrect sorting predicate was found as part of
NixOS/nix#12106 where `nix` was crashing on
the code like:

    $ nix eval --expr 'builtins.sort (a: b: true) [ 1 2 3 ]'
    ...
    Aborted (core dumped)

Note: the crash happens here because sorting predicate does not
implement `isLess` and triggers assertion failures for
`std::stable_sort` that backs `builtins.sort`.

THe change restore `isLess` semantic for `preferable`.
@SomeoneSerge SomeoneSerge force-pushed the cuda-modules-fix-sort-predicate branch from abfad7d to 9fd753e Compare May 8, 2025 17:20
@SomeoneSerge
Copy link
Contributor

github got confused by torch/default.nix -> torch/source/default.nix

@SomeoneSerge SomeoneSerge merged commit 2cd2dec into NixOS:master May 8, 2025
22 checks passed
@github-project-automation github-project-automation bot moved this from New to ✅ Done in CUDA Team May 8, 2025
@trofi trofi deleted the cuda-modules-fix-sort-predicate branch May 8, 2025 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.topic: cuda Parallel computing platform and API 6.topic: python Python is a high-level, general-purpose programming language. 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 11-100 This PR causes between 11 and 100 packages to rebuild on Linux. 12.approvals: 1 This PR was reviewed and approved by one person.

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

4 participants