{cudaPackages.nccl, onnxruntime}: remove reference to nvcc in binary #457803

Merged
ConnorBaker merged 2 commits into NixOS:master from GaetanLepage:nccl-patch
Nov 5, 2025

Contributor

@GaetanLepage GaetanLepage commented Nov 2, 2025

Things done

Since CUDA 12.8, the .nv_fatbin section of libnccl.so contains a reference to the cuda_nvcc derivation:

❮ strings -td $(nix-build --arg config '{ allowUnfree = true; cudaSupport = true; }' -A cudaPackages.nccl)/lib/libnccl.so | grep -F -- "-L /nix/store"
282100688 &devrt -L /nix/store/ygd3s9zm1pf77n3q3ac63v58www5scbc-9
312945964 &devrt -L /nix/store/ygd3s9zm1pf77n3q3ac63v58www5scbc-9

❮ nix-store --query --requisites "$(nix-build --arg config '{ allowUnfree = true; cudaSupport = true; }' -A cudaPackages.nccl)" | grep nvcc
/nix/store/ygd3s9zm1pf77n3q3ac63v58www5scbc-cuda12.8-cuda_nvcc-12.8.93

This makes cuda_nvcc (and cudaPackages.stdenv.cc, i.e. gcc-wrapper) a runtime dependency of cudaPackages.nccl.
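
As background: Nix decides what belongs to a runtime closure by scanning build outputs for the hash parts of store paths, which is why the truncated string ending in `-9` above is enough to pull in cuda_nvcc. The following is a simplified sketch of that scan, assuming it boils down to a substring search per candidate hash; the real scanner lives inside Nix and is more involved. The temporary file is a made-up stand-in for libnccl.so, and the two hashes are copied from outputs shown in this PR.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a build output that embeds part of a store path.
file="$(mktemp)"
printf 'devrt -L /nix/store/ygd3s9zm1pf77n3q3ac63v58www5scbc-9\n' > "$file"

# Hash parts of two build inputs (cuda_nvcc and cuda_cudart in this PR).
candidates="ygd3s9zm1pf77n3q3ac63v58www5scbc
80x699lyc99dahf85iqdv6z1f0vv6vz2"

# A hash that occurs anywhere in the file counts as a reference,
# regardless of what follows it.
found=""
for h in $candidates; do
  if grep -a -q -F -- "$h" "$file"; then
    found="$found $h"
  fi
done
rm -f "$file"

echo "references:$found"
```

Only the first hash is reported, even though the package name after it is cut off in the file.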

Besides being undesirable in principle, this actually breaks firefox, which sets disallowedRequisites = [ stdenv.cc ].

This patch simply removes all references to the cuda_nvcc out path from cudaPackages.nccl's libnccl.so binary.

With this patch:

❮ strings -td $(nix-build --arg config '{ allowUnfree = true; cudaSupport = true; }' -A cudaPackages.nccl)/lib/libnccl.so | grep -F -- "-L /nix/store"
282100676 &devrt -L /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-9
312945944 &devrt -L /nix/store/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee-9

❮ nix-store --query --requisites "$(nix-build --arg config '{ allowUnfree = true; cudaSupport = true; }' -A cudaPackages.nccl)" | grep nvcc
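
The `eeee…` filler in the output above is what remove-references-to leaves behind. As a rough sketch of the idea (not the actual nixpkgs script, which may differ in detail), the store hash is overwritten with a filler of the same length, so the bytes no longer match a real store path while the size of the file is preserved. The temporary file and the `/bin/../lib` suffix are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Store path to scrub; copied from the closure query earlier in this PR.
target="/nix/store/ygd3s9zm1pf77n3q3ac63v58www5scbc-cuda12.8-cuda_nvcc-12.8.93"

# The hash is everything in the base name before the first dash.
hash="$(basename "$target" | cut -d- -f1)"

# Filler with exactly as many characters as the hash.
placeholder="$(printf 'e%.0s' $(seq 1 "${#hash}"))"

# Hypothetical stand-in for libnccl.so: a file embedding the store path.
file="$(mktemp)"
printf 'devrt -L %s/bin/../lib\n' "$target" > "$file"

# Overwrite every occurrence of the hash; the rest of the path is untouched.
result="$(sed "s|/nix/store/${hash}-|/nix/store/${placeholder}-|g" "$file")"
rm -f "$file"

echo "$result"
```

Because the filler hash does not correspond to any store path, the closure scan no longer finds a reference to cuda_nvcc.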

Fixes #457406
Related: #457424

cc @SomeoneSerge @ConnorBaker

  • Built on platform:
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • Tested, as applicable:
  • Ran nixpkgs-review on this PR. See nixpkgs-review usage.
  • Tested basic functionality of all binary files, usually in ./result/bin/.
  • Nixpkgs Release Notes
    • Package update: when the change is major or breaking.
  • NixOS Release Notes
    • Module addition: when adding a new NixOS module.
    • Module update: when the change is significant.
  • Fits CONTRIBUTING.md, pkgs/README.md, maintainers/README.md and other READMEs.

Add a 👍 reaction to pull requests you find important.

@nixpkgs-ci nixpkgs-ci bot added labels 10.rebuild-linux: 11-100, 10.rebuild-darwin: 0, 11.by: package-maintainer, 6.topic: cuda on Nov 2, 2025
@GaetanLepage GaetanLepage changed the title from "cudaPackages.nccl: remove reference to nvcc in binary" to "{cudaPackages.nccl,onnxruntime}: remove reference to nvcc in binary" on Nov 2, 2025
@GaetanLepage GaetanLepage changed the title from "{cudaPackages.nccl,onnxruntime}: remove reference to nvcc in binary" to "{cudaPackages.nccl, onnxruntime}: remove reference to nvcc in binary" on Nov 2, 2025
@nixpkgs-ci nixpkgs-ci bot added labels 10.rebuild-linux: 101-500, 10.rebuild-darwin: 11-100 and removed labels 10.rebuild-linux: 11-100, 10.rebuild-darwin: 0, 11.by: package-maintainer on Nov 2, 2025
@nix-owners nix-owners bot requested review from ck3d and puffnfresh November 2, 2025 17:12
@GaetanLepage
Contributor Author

Well, onnxruntime still depends on cuda_nvcc at runtime (through cuda_cudart)...

❯ nix why-depends --precise $(nom-build --arg config '{ allowUnfree = true; cudaSupport = true; }' -A firefox-unwrapped) $(nom-build --arg config '{ allowUnfree = true; cudaSupport = true; }' -A cudaPackages.cuda_nvcc)
Finished at 18:16:53 after 1s
Finished at 18:16:53 after 0s
/nix/store/yy1z5y3iql9r4kpslxnjdwcygx52ssl8-firefox-unwrapped-144.0.2
└───lib/firefox/libonnxruntime.so: …st be specified....../nix/store/jk4a7v44fc83ykc15b31r4m21yqc92sp-onnxruntime-1.22.2/lib/.....onn…
    → /nix/store/jk4a7v44fc83ykc15b31r4m21yqc92sp-onnxruntime-1.22.2
    └───lib/libonnxruntime_providers_cuda.so: …nn-9.13.0.50-lib/lib:/nix/store/80x699lyc99dahf85iqdv6z1f0vv6vz2-cuda12.8-cuda_cudart-12.8.90/li…
        → /nix/store/80x699lyc99dahf85iqdv6z1f0vv6vz2-cuda12.8-cuda_cudart-12.8.90
        └───nix-support/propagated-build-inputs: …fhjm-setup-cuda-hook /nix/store/ygd3s9zm1pf77n3q3ac63v58www5scbc-cuda12.8-cuda_nvcc-12.8.93 /nix…
            → /nix/store/ygd3s9zm1pf77n3q3ac63v58www5scbc-cuda12.8-cuda_nvcc-12.8.93

This can still be merged as is, @ConnorBaker, as it's making progress.
But we are not all the way there.

Comment on lines +326 to +328
postFixup = lib.optionalString cudaSupport ''
remove-references-to -t "${cudaPackages.cuda_nvcc}" $out/lib/libonnxruntime_providers_cuda.so
'';
Contributor

Omit this one, because we already found where it comes from. Well, to some precision...

Contributor Author

Wdym? This patch is necessary to get rid of nvcc, otherwise:

❮ nix why-depends --precise $(nom-build --arg config '{ allowUnfree = true; cudaSupport = true; }' -A onnxruntime) $(nom-build --arg config '{ allowUnfree = true; cudaSupport = true; }' -A cudaPackages.cuda_nvcc)
Finished at 23:00:59 after 1s
Finished at 23:00:59 after 0s
/nix/store/rg1j1d4cf4l164zpbbri77yym2kpaqcb-onnxruntime-1.22.2
└───lib/libonnxruntime_providers_cuda.so: …,rt,@.3dev......! -L /nix/store/a9d5nqjvd81kq3rxpch647xxasvfvvpi-9.......c_nvcc-..../bin/..//lib…
    → /nix/store/a9d5nqjvd81kq3rxpch647xxasvfvvpi-cuda12.8-cuda_nvcc-12.8.93

Contributor

@ConnorBaker ConnorBaker Nov 4, 2025

Any idea why NVCC is doing this now?
Is it possible it's a result of my making NVCC a single output? As in, previously references were to a distinct lib, include, or bin output, so Nix's closure scanning would keep those dependencies but not the dev output, which contained the setup hook?

EDIT: In particular, I'm concerned the dependency on NVCC is not some quirk of ONNX Runtime's build system or packaging, but rather an issue with the way NVCC is packaged or used in Nixpkgs, and that we'll see such a dependency on many packages built with NVCC. What are your thoughts?

Contributor

> Wdym? This patch is necessary to get rid of nvcc, otherwise:

No it shouldn't, not with the other PR?

Contributor

> Is it possible it's a result of my making NVCC a single output? As

Considered in the other thread, inclined to reject because the outputs were [ "out" "static" ], no dev and no include

Contributor

And then 100% reject after Gaetan finished bisection and found that the regression happened after merging 12.4 -> 12.8

Comment on lines +325 to +326
# This string makes cuda_nvcc a runtime dependency of onnxruntime.
# PR #457424 (https://github.com/NixOS/nixpkgs/commit/e617b8c8a53049ec10773bde26a22cce56410757)
Contributor

Nooo I meant the 12.4->12.8 PR. Basically, the discussion and the result of bisection. The propagatedBuildInputs issue is probably irrelevant to the reader of onnxruntime, we just happened to mix them up

# fixed over-propagation of cudaPackages.cuda_nvcc, but this is necessary to effectively prevent
# nccl's runtime dependency on nvcc.
postFixup = lib.optionalString cudaSupport ''
remove-references-to -t "${cudaPackages.cuda_nvcc}" ''${!outputLib}/lib/libonnxruntime_providers_cuda.so
Contributor

OK, since I'm complaining about the comment anyway, here's one more: let's add disallowedRequisites so that next time we notice the error before it reaches firefox, and don't have to run an entire investigation to decide whether the reference is legitimate.

Contributor

Edit: was meant for nccl
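
For context, disallowedRequisites makes a build fail when a listed path appears anywhere in the output's runtime closure. A rough shell analogue of that check, useful for manual inspection, filters a closure listing such as the one `nix-store --query --requisites` prints. The helper name `assert_no_requisite` is hypothetical and not part of nixpkgs; the sample listings reuse store paths shown elsewhere in this PR.

```shell
#!/usr/bin/env bash
set -uo pipefail

# Read store paths (one per line) on stdin and fail if any of them
# contains the forbidden name. Hypothetical helper, not part of nixpkgs.
assert_no_requisite() {
  local forbidden="$1" path status=0
  while IFS= read -r path; do
    case "$path" in
      *"$forbidden"*)
        echo "forbidden requisite: $path" >&2
        status=1
        ;;
    esac
  done
  return "$status"
}

# Sample listings standing in for real `nix-store --query --requisites` output.
clean_closure="/nix/store/80x699lyc99dahf85iqdv6z1f0vv6vz2-cuda12.8-cuda_cudart-12.8.90"
leaky_closure="$clean_closure
/nix/store/ygd3s9zm1pf77n3q3ac63v58www5scbc-cuda12.8-cuda_nvcc-12.8.93"

if printf '%s\n' "$clean_closure" | assert_no_requisite cuda_nvcc; then
  clean_status=0
else
  clean_status=1
fi

if printf '%s\n' "$leaky_closure" | assert_no_requisite cuda_nvcc 2>/dev/null; then
  leaky_status=0
else
  leaky_status=1
fi

echo "clean=$clean_status leaky=$leaky_status"
```

Against a real build, the listing would come from a pipeline such as `nix-store --query --requisites "$(nix-build --arg config '{ allowUnfree = true; cudaSupport = true; }' -A cudaPackages.nccl)" | assert_no_requisite cuda_nvcc`. Unlike this after-the-fact check, disallowedRequisites is enforced by Nix at build time.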

Contributor

graham33 commented Nov 4, 2025

FWIW this PR fixed my nixos build on aarch64 (I was hitting the firefox issue). Thanks for the hard work all.

@GaetanLepage GaetanLepage force-pushed the nccl-patch branch 2 times, most recently from c0579ee to 74bc118 Compare November 4, 2025 22:39
@nixpkgs-ci nixpkgs-ci bot added label 10.rebuild-darwin: 0 and removed label 10.rebuild-darwin: 11-100 on Nov 4, 2025
Contributor

@SomeoneSerge SomeoneSerge left a comment

The direct reference in onnxruntime reappears when cudaCapabilities is extended from "8.6" "8.9" to the full default: #457424 (comment)

So I'm in favour of merging as is

@nixpkgs-ci nixpkgs-ci bot added labels 12.approvals: 1, 12.approved-by: package-maintainer on Nov 4, 2025
@GaetanLepage
Contributor Author

nixpkgs-review result

Generated using nixpkgs-review.

Command: nixpkgs-review pr 457803 --extra-nixpkgs-config '{ allowUnfree = true; cudaSupport = true; }' --package firefox
Commit: 1eb39d0ea75b8e0646bc62bbe345aa6af146f3da


x86_64-linux

✅ 1 package built:
  • firefox

@mdaniels5757
Member

Re-running CI due to a since-fixed unrelated treefmt-nix failure

@github-project-automation github-project-automation bot moved this from New to ✅ Done in CUDA Team Nov 5, 2025
@mdaniels5757 mdaniels5757 reopened this Nov 5, 2025
@github-project-automation github-project-automation bot moved this from ✅ Done to 📋 The forgotten in CUDA Team Nov 5, 2025
@ConnorBaker ConnorBaker added this pull request to the merge queue Nov 5, 2025
Merged via the queue into NixOS:master with commit c81871f Nov 5, 2025
49 of 57 checks passed
@github-project-automation github-project-automation bot moved this from 📋 The forgotten to ✅ Done in CUDA Team Nov 5, 2025
@GaetanLepage GaetanLepage deleted the nccl-patch branch November 5, 2025 08:22
postFixup = lib.optionalString cudaSupport ''
remove-references-to -t "${lib.getBin cuda_nvcc}" ''${!outputLib}/lib/libonnxruntime_providers_cuda.so
'';
disallowedRequisites = [ (lib.getBin cuda_nvcc) ];
Member

Suggested change
- disallowedRequisites = [ (lib.getBin cuda_nvcc) ];
+ disallowedRequisites = lib.optionals cudaSupport [ (lib.getBin cuda_nvcc) ];

Contributor

shouldn't it be !cudaSupport ?

Contributor

@matdibu no, the irony is that evaluating disallowedRequisites without wegank's fix would require allowUnfree

Contributor

ah, that makes sense, thank you!

Contributor Author

Nice catch @wegank. Opened #458835.

Member

mweinelt commented Nov 5, 2025

With 1eb39d0 CUDA transitively leaks into the default Firefox build, which removes the job from the unstable jobset, which breaks the channel.

❯ nix-instantiate -A firefox-unwrapped
error:
       … while evaluating an expression to select 'drvPath' on it
         at «internal»:1:552:
       … while evaluating strict
         at «internal»:1:552:
       (stack trace truncated; use '--show-trace' to show the full trace)

       error: Package ‘cuda12.8-cuda_nvcc-12.8.93’ in /home/hexa/git/nixos/master/pkgs/development/cuda-modules/packages/cuda_nvcc.nix:175 has an unfree license (‘CUDA EULA’), refusing to evaluate.

       a) To temporarily allow unfree packages, you can use an environment variable
          for a single invocation of the nix tools.

            $ export NIXPKGS_ALLOW_UNFREE=1

          Note: When using `nix shell`, `nix build`, `nix develop`, etc with a flake,
                then pass `--impure` in order to allow use of environment variables.

       b) For `nixos-rebuild` you can set
         { nixpkgs.config.allowUnfree = true; }
       in configuration.nix to override this.

       Alternatively you can configure a predicate to allow specific packages:
         { nixpkgs.config.allowUnfreePredicate = pkg: builtins.elem (lib.getName pkg) [
             "cuda_nvcc"
           ];
         }

       c) For `nix-env`, `nix-build`, `nix-shell` or any other Nix command you can add
         { allowUnfree = true; }
       to ~/.config/nixpkgs/config.nix.


       note: trace involved the following derivations:
       derivation 'firefox-unwrapped-144.0.2'
       derivation 'onnxruntime-1.22.2'

Member

LovingMelody commented Nov 5, 2025

I just tried building Firefox now that this PR has made its way to nixos-unstable-small, and it seems the onnxruntime tests are failing:

[  FAILED  ] 17 tests, listed below:
[  FAILED  ] QDQTransformerTests.Conv_U8X8U8
[  FAILED  ] QDQTransformerTests.ConvMaxPoolReshape_UInt8
[  FAILED  ] QDQTransformerTests.ConvMaxPoolReshape_Int8
[  FAILED  ] QDQTransformerTests.ConvRelu
[  FAILED  ] QDQTransformerTests.ConvAveragePoolReshape_UInt8
[  FAILED  ] QDQTransformerTests.ConvAveragePoolReshape_Int8
[  FAILED  ] QDQTransformerTests.ConvTranspose_QBackward
[  FAILED  ] QDQTransformerTests.QBackward_MutilpleSteps
[  FAILED  ] QDQTransformerTests.ConvTranspose_DQForward
[  FAILED  ] QDQTransformerTests.DQForward_MutilpleSteps
[  FAILED  ] NhwcTransformerTests.Conv
[  FAILED  ] NhwcTransformerTests.ConvBlockBinary
[  FAILED  ] NhwcTransformerTests.ConvMaxPool
[  FAILED  ] NhwcTransformerTests.ConvGlobalAveragePool
[  FAILED  ] NhwcTransformerTests.ConvAveragePool
[  FAILED  ] NhwcTransformerTests.ConvPad
[  FAILED  ] NhwcTransformerTests.ConvBlockActivation

Unsure if it's from this PR though

EDIT for clarification: This is the Firefox package directly, cuda support is not enabled.

Labels

  • 6.topic: cuda (Parallel computing platform and API)
  • 10.rebuild-darwin: 0 (This PR does not cause any packages to rebuild on Darwin.)
  • 10.rebuild-linux: 101-500 (This PR causes between 101 and 500 packages to rebuild on Linux.)
  • 12.approvals: 1 (This PR was reviewed and approved by one person.)
  • 12.approved-by: package-maintainer (This PR was reviewed and approved by a maintainer listed in any of the changed packages.)

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

Build failure when CUDA enabled: firefox

9 participants