Merged
9 changes: 9 additions & 0 deletions pkgs/by-name/on/onnxruntime/package.nix
@@ -17,6 +17,7 @@
perl,
pkg-config,
python3Packages,
removeReferencesTo,
re2,
zlib,
protobuf,
@@ -44,6 +45,7 @@ let

stdenv = throw "Use effectiveStdenv instead";
effectiveStdenv = if cudaSupport then cudaPackages.backendStdenv else inputs.stdenv;
inherit (cudaPackages) cuda_nvcc;

cudaArchitecturesString = cudaPackages.flags.cmakeCudaArchitecturesString;

@@ -121,6 +123,7 @@ effectiveStdenv.mkDerivation rec {
++ lib.optionals cudaSupport [
cudaPackages.cuda_nvcc
cudaPackages.cudnn-frontend
removeReferencesTo
]
++ lib.optionals isCudaJetson [
cudaPackages.autoAddCudaCompatRunpath
@@ -318,6 +321,12 @@ effectiveStdenv.mkDerivation rec {
../include/onnxruntime/core/session/onnxruntime_*.h
'';

  # See comments in `cudaPackages.nccl`
  postFixup = lib.optionalString cudaSupport ''
    remove-references-to -t "${lib.getBin cuda_nvcc}" ''${!outputLib}/lib/libonnxruntime_providers_cuda.so
  '';
Comment on lines +325 to +327
Contributor

Omit this one, since we already found where the reference comes from. Well, to some precision...

Contributor Author

Wdym? This patch is necessary to get rid of nvcc, otherwise:

❮ nix why-depends --precise $(nom-build --arg config '{ allowUnfree = true; cudaSupport = true; }' -A onnxruntime) $(nom-build --arg config '{ allowUnfree = true; cudaSupport = true; }' -A cudaPackages.cuda_nvcc)
Finished at 23:00:59 after 1s
Finished at 23:00:59 after 0s
/nix/store/rg1j1d4cf4l164zpbbri77yym2kpaqcb-onnxruntime-1.22.2
└───lib/libonnxruntime_providers_cuda.so: …,rt,@.3dev......! -L /nix/store/a9d5nqjvd81kq3rxpch647xxasvfvvpi-9.......c_nvcc-..../bin/..//lib…
    → /nix/store/a9d5nqjvd81kq3rxpch647xxasvfvvpi-cuda12.8-cuda_nvcc-12.8.93
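
For readers unfamiliar with the pattern under discussion, here is a minimal sketch of it, not the code from this PR. The package name `example-cuda-lib`, the file `libexample.so`, and the `src` value are hypothetical placeholders, and the sketch assumes a single-output package. Only `removeReferencesTo`, `remove-references-to -t`, `lib.getBin`, `backendStdenv`, and `disallowedRequisites` are taken from the diff.

```nix
# Hypothetical package illustrating the pattern; names below are placeholders.
{
  lib,
  cudaPackages,
  removeReferencesTo,
}:

cudaPackages.backendStdenv.mkDerivation {
  pname = "example-cuda-lib"; # hypothetical
  version = "0.0.0"; # hypothetical
  src = ./.; # assumption: sources live next to this file

  nativeBuildInputs = [
    cudaPackages.cuda_nvcc
    removeReferencesTo
  ];

  # Overwrite the hash of the cuda_nvcc store path inside the built library,
  # so that Nix's reference scanner no longer finds it in the output.
  postFixup = ''
    remove-references-to -t "${lib.getBin cudaPackages.cuda_nvcc}" \
      "$out"/lib/libexample.so
  '';

  # Fail the build if cuda_nvcc is still anywhere in the runtime closure.
  disallowedRequisites = [ (lib.getBin cudaPackages.cuda_nvcc) ];
}
```

The two attributes are complementary: `postFixup` removes the known leak, while `disallowedRequisites` turns any remaining or future leak into a build failure instead of a silently larger closure.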

Contributor
@ConnorBaker, Nov 4, 2025

Any idea why NVCC is doing this now?
Is it possible it's a result of my making NVCC a single output? That is, previously references pointed at a distinct lib, include, or bin output, and Nix's closure scanning would keep those as dependencies but not the dev output, which contained the setup hook?

EDIT: In particular, I'm concerned that the dependency on NVCC is not some quirk of ONNX Runtime's build system or packaging, but rather an issue with the way NVCC is packaged or used in Nixpkgs, and that we'll see such a dependency in many packages built with NVCC. What are your thoughts?

Contributor

> Wdym? This patch is necessary to get rid of nvcc, otherwise:

No, it shouldn't be, not with the other PR?

Contributor

> Is it possible it's a result of my making NVCC a single output? As

This was considered in the other thread. I'm inclined to reject it because the outputs were [ "out" "static" ], with no dev and no include.

Contributor

And then rejected it 100% after Gaetan finished the bisection and found that the regression appeared with the 12.4 -> 12.8 merge.

disallowedRequisites = [ (lib.getBin cuda_nvcc) ];
Member

Suggested change
- disallowedRequisites = [ (lib.getBin cuda_nvcc) ];
+ disallowedRequisites = lib.optionals cudaSupport [ (lib.getBin cuda_nvcc) ];

Contributor

Shouldn't it be `!cudaSupport`?

Contributor

@matdibu No. The irony is that evaluating `disallowedRequisites` without wegank's fix would require `allowUnfree`.

Contributor

ah, that makes sense, thank you!

Contributor Author

Nice catch @wegank. Opened #458835.
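
The point of the suggestion can be illustrated with a small self-contained expression. This is a sketch under assumptions: `optionals` is inlined here with the same shape as `lib.optionals`, `cuda_nvcc` is a hypothetical stand-in that throws the way an unfree package does when `allowUnfree` is unset, and `forceAll` only approximates what happens when `mkDerivation` serialises the attribute.

```nix
let
  # Inlined equivalent of nixpkgs' lib.optionals, to keep this self-contained.
  optionals = cond: elems: if cond then elems else [ ];

  cudaSupport = false;

  # Hypothetical stand-in for an unfree package that refuses to evaluate.
  cuda_nvcc = throw "cuda_nvcc is unfree; set allowUnfree = true";

  # Force every element of the list, then return the list.
  forceAll = xs: builtins.deepSeq xs xs;
in
{
  # The list literal is never touched when cudaSupport is false,
  # so cuda_nvcc is never evaluated and nothing throws.
  guarded = forceAll (optionals cudaSupport [ cuda_nvcc ]);

  # Here the element is forced regardless of cudaSupport, so evaluation throws.
  unguarded = forceAll [ cuda_nvcc ];
}
```

Saved to a file, `nix-instantiate --eval --strict -A guarded` on it yields an empty list, while selecting `unguarded` aborts with the thrown message. That is why the guard is `cudaSupport` rather than `!cudaSupport`: the requisite check is only meaningful, and only evaluable without `allowUnfree`, in CUDA builds.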


passthru = {
inherit cudaSupport cudaPackages ncclSupport; # for the python module
inherit protobuf;
16 changes: 16 additions & 0 deletions pkgs/development/cuda-modules/packages/nccl.nix
@@ -10,6 +10,7 @@
flags,
lib,
python3,
removeReferencesTo,
which,
# passthru.updateScript
gitUpdater,
@@ -72,6 +73,7 @@ backendStdenv.mkDerivation (finalAttrs: {
nativeBuildInputs = [
cuda_nvcc
python3
removeReferencesTo
which
];

@@ -117,8 +119,22 @@ backendStdenv.mkDerivation (finalAttrs: {
  postFixup = ''
    _overrideFirst outputStatic "static" "lib" "out"
    moveToOutput lib/libnccl_static.a "''${!outputStatic:?}"
  ''
  # Since CUDA 12.8, the cuda_nvcc path leaks in:
  # - libnccl.so's .nv_fatbin section
  # - libnccl_static.a
  # &devrt -L /nix/store/00000000000000000000000000000000-...nvcc-.../bin/...
  # This string makes cuda_nvcc a runtime dependency of nccl.
  # See https://github.com/NixOS/nixpkgs/pull/457803
  + ''
    remove-references-to -t "${lib.getBin cuda_nvcc}" \
      ''${!outputLib}/lib/libnccl.so.* \
      ''${!outputStatic}/lib/*.a
  '';

  # C.f. remove-references-to above. Ensure *all* references to cuda_nvcc are removed
  disallowedRequisites = [ (lib.getBin cuda_nvcc) ];

passthru = {
platformAssertions = [
{