{cudaPackages.nccl, onnxruntime}: remove reference to nvcc in binary#457803
{cudaPackages.nccl, onnxruntime}: remove reference to nvcc in binary#457803ConnorBaker merged 2 commits intoNixOS:masterfrom
Conversation
dfba773 to
3c0b776
Compare
|
Well, This can still be merged as is @ConnorBaker as its making progress. |
| postFixup = lib.optionalString cudaSupport '' | ||
| remove-references-to -t "${cudaPackages.cuda_nvcc}" $out/lib/libonnxruntime_providers_cuda.so | ||
| ''; |
There was a problem hiding this comment.
Omit this 1 because we already found where it comes from. Well to some precision...
There was a problem hiding this comment.
Wdym? This patch is necessary to get rid of nvcc, otherwise:
❮ nix why-depends --precise $(nom-build --arg config '{ allowUnfree = true; cudaSupport = true; }' -A onnxruntime) $(nom-build --arg config '{ allowUnfree = true; cudaSupport = true; }' -A cudaPackages.cuda_nvcc)
Finished at 23:00:59 after 1s
Finished at 23:00:59 after 0s
/nix/store/rg1j1d4cf4l164zpbbri77yym2kpaqcb-onnxruntime-1.22.2
└───lib/libonnxruntime_providers_cuda.so: …,rt,@.3dev......! -L /nix/store/a9d5nqjvd81kq3rxpch647xxasvfvvpi-9.......c_nvcc-..../bin/..//lib…
→ /nix/store/a9d5nqjvd81kq3rxpch647xxasvfvvpi-cuda12.8-cuda_nvcc-12.8.93
There was a problem hiding this comment.
Any idea why NVCC is doing this now?
Is it possible it's a result of my making NVCC a single output? As in, whereas previously references were to a distinct lib, include, or bin output and Nix's closure scanning would keep those dependencies, but not the dev output which contained the setup hook?
EDIT: In particular, I'm concerned the dependency on NVCC is not some quirk of ONNX Runtime's build system or packaging, but rather an issue with the way NVCC is packaged or used in Nixpkgs, and that we'll see such a dependency on many pacakges built with NVCC. What are your thoughts?
There was a problem hiding this comment.
Wdym? This patch is necessary to get rid of nvcc, otherwise:
No it shouldn't, not with the other PR?
There was a problem hiding this comment.
Is it possible it's a result of my making NVCC a single output? As
Considered in the other thread, inclined to reject because the outputs were [ "out" "static" ], no dev and no include
There was a problem hiding this comment.
And then 100% reject after Gaetan finished bisection and found that the regression happened after merging 12.4 -> 12.8
3c0b776 to
37716a9
Compare
| # This string makes cuda_nvcc a runtime dependency of onnxruntime. | ||
| # PR #457424 (https://github.com/NixOS/nixpkgs/commit/e617b8c8a53049ec10773bde26a22cce56410757) |
There was a problem hiding this comment.
Nooo I meant the 12.4->12.8 PR. Basically, the discussion and the result of bisection. The propagatedBuildInputs issue is probably irrelevant to the reader of onnxruntime, we just happened to mix them up
| # fixed over-propagation of cudaPackages.cuda_nvcc, but this is necessary to effectively prevent | ||
| # nccl's runtime dependency on nvcc. | ||
| postFixup = lib.optionalString cudaSupport '' | ||
| remove-references-to -t "${cudaPackages.cuda_nvcc}" ''${!outputLib}/lib/libonnxruntime_providers_cuda.so |
There was a problem hiding this comment.
Ok since I'm complaining about the comment anyway, here's 1 more: let's disallowRequisites so that next time we notice the error before it reaches firefox, and we don't have to run an entire investigation to decide whether the reference is legitimate
There was a problem hiding this comment.
Edit: was meant for nccl
|
FWIW this PR fixed my nixos build on aarch64 (I was hitting the firefox issue). Thanks for the hard work all. |
c0579ee to
74bc118
Compare
74bc118 to
1eb39d0
Compare
SomeoneSerge
left a comment
There was a problem hiding this comment.
The direct reference in onnxruntime reappers when cudaCapabilities is extended from "8.6" "8.9" to the full default: #457424 (comment)
So I'm in faovur of merging as is
|
|
Re-running CI due to a since-fixed unrelated treefmt-nix failure |
| postFixup = lib.optionalString cudaSupport '' | ||
| remove-references-to -t "${lib.getBin cuda_nvcc}" ''${!outputLib}/lib/libonnxruntime_providers_cuda.so | ||
| ''; | ||
| disallowedRequisites = [ (lib.getBin cuda_nvcc) ]; |
There was a problem hiding this comment.
| disallowedRequisites = [ (lib.getBin cuda_nvcc) ]; | |
| disallowedRequisites = lib.optionals cudaSupport [ (lib.getBin cuda_nvcc) ]; |
There was a problem hiding this comment.
shouldn't it be !cudaSupport ?
There was a problem hiding this comment.
@matdibu no, the irony is that evaluating disallowedRequisites without wegank's fix would require allowUnfree
There was a problem hiding this comment.
ah, that makes sense, thank you!
|
With 1eb39d0 CUDA transitively leaks into the default Firefox build, which removes the job from the unstable jobset, which breaks the channel. |
|
Just attempting building Firefox since this PR made its way to nixos-unstable-small, it seems like the onmxruntime tests are failing Unsure if it's from this PR though EDIT for clarification: This is the Firefox package directly, cuda support is not enabled. |
Things done
Since CUDA 12.8, the
.nv_fatbinsection oflibnccl.socontains a reference to thecuda_nvccderivation:This makes
cuda_nvcc(andcudaPackages.stdenv.cc, i.e.gcc-wrapper) a runtime dependency ofcudaPackages.nccl.Although not being desirable in principle, it actually breaks
firefox, which hasdisallowedRequisites = [ stdenv.cc ].This patch simply removes all references to the
cuda_nvccout path fromcudaPackages.nccl'slibnccl.sobinary.With this patch:
Fixes #457406
Related: #457424
cc @SomeoneSerge @ConnorBaker
passthru.tests.nixpkgs-reviewon this PR. See nixpkgs-review usage../result/bin/.Add a 👍 reaction to pull requests you find important.