python3Packages.jax: fix libstdc++ mismatch when built with CUDA #225661
samuela merged 2 commits into NixOS:master
Result of 4 packages built:
❯ nix-build with-my-cuda.nix -A python3Packages.jax
/nix/store/i3sg8xpx1fzva7yjp2wdxcprd63c9kg8-python3.10-jax-0.4.1
CC @NixOS/cuda-maintainers
samuela left a comment
woohoo! can't wait to have JAX fixed!
        sed -i 's@include/pybind11@pybind11@g' $src
      done
    '' + lib.optionalString cudaSupport ''
      export NIX_LDFLAGS+=" -L${backendStdenv.nixpkgsCompatibleLibstdcxx}/lib"
Does the order matter here? I'm assuming stdenv's (undesirable) libstdc++ will also be in NIX_LDFLAGS somewhere?
I don't think there's any extra -L in NIX_LDFLAGS, but maybe I should quickly check it by adding an echo somewhere
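For anyone wanting to run that check, here is a minimal sketch of what the echo could look like. The phase name (`preConfigure`) and the function arguments are assumptions made for illustration; only the `export` line is taken from the diff above. Setting `NIX_DEBUG=1` should additionally make the cc and bintools wrappers print the flags they add on their own, which the environment variable alone does not show.

```nix
# Hypothetical debugging fragment, not part of the PR.
{ lib, backendStdenv, cudaSupport ? true }:
{
  preConfigure = lib.optionalString cudaSupport ''
    export NIX_LDFLAGS+=" -L${backendStdenv.nixpkgsCompatibleLibstdcxx}/lib"
    # Temporary: print the accumulated flags so the -L ordering can be read from the build log.
    echo "NIX_LDFLAGS=$NIX_LDFLAGS"
    # Ask the wrappers to log the extra flags they prepend and append at link time.
    export NIX_DEBUG=1
  '';
}
```

If another libstdc++ directory shows up earlier in the printed flags, the order does matter and the new `-L` would have to be prepended instead of appended.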
    backendStdenv = final.callPackage ./stdenv.nix {
      nixpkgsStdenv = prev.pkgs.stdenv;
      nvccCompatibleStdenv = prev.pkgs.buildPackages."${finalVersion.gcc}Stdenv";
      nixpkgsCompatibleLibstdcxx = prev.pkgs.buildPackages.gcc.cc.lib; # Or is it targetPackages?
Well, what can I say: either at some point we get a comment from someone who understands cross-compilation, or eventually we start cross-compiling ourselves and find the right way empirically 😆 I tried asking on matrix, but I was clumsy and failed to attract the attention of the right people
Yeah cross-compilation is a mess! I honestly have no idea either :P The current comment looks a bit like a TODO that slipped through code review. Perhaps we could leave a comment to the effect of yours above ^
IMO the way that cross-compilation works in nix doesn't fit modern projects anymore. I think the autotools-based approach comes from a time when there wasn't really remote execution, platforms/config split, offloading etc. This is why e.g. Bazel and Buck use a different model that makes it easier to transition between toolchains:
Seems like my concerns raised in #225074 weren't too far off after all 😅
That's interesting, I'll have a look at the issue! All I can say about Bazel's toolchains is that there was some "tutorial: make a toolchain for X" page in their documentation that many a time I started reading, and I have not managed to reach the end even once 🤣
In practice, I don't know of any attempts at cross-compiling CUDA packages in nixpkgs. Personally, I don't even have a reason to try until we've addressed #225915
I might also still not be understanding the nix toolchains well enough though. I'm trying to come up with something that makes it easy to swap out LLVM/GCC/libstdc++/libcxx etc. without needing all those different cross-compilation targets, but also haven't found a good solution yet.
Result of 25 packages failed to build:
39 packages built:
      nixpkgsCompatibleLibstdcxx = prev.pkgs.buildPackages.gcc.cc.lib; # Or is it targetPackages?
      nvccCompatibleCC = prev.pkgs.buildPackages."${finalVersion.gcc}".cc;
I believe buildPackages is correct here, in both cases.
For nvccCompatibleCC: we want a copy of the compiler that runs on our build platform. I.e. it should come from a stage where hostPlatform is equal to our package's buildPlatform and where targetPlatform is equal to our package's hostPlatform; aka buildPackages.
nixpkgsCompatibleLibstdcxx is a little weirder; normally we grab libraries that will run on a package's hostPlatform from the current stage (aka pkgsHostTarget) but because libstdc++ comes from gcc which is a compiler, the libstdc++ that's produced actually is built for use on the gcc package's targetPlatform. So buildPackages here is correct.
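To make the platform bookkeeping above concrete, here is a small sketch that annotates each attribute with the platforms involved. `gcc11` is only a stand-in for `"${finalVersion.gcc}"` and the top-level `pkgs` argument is an assumption; in the PR these values are taken from `prev.pkgs` inside an overlay.

```nix
# Illustrative only; mirrors the two attributes from the diff.
{ pkgs ? import <nixpkgs> { } }:
{
  # A compiler we run during the build:
  #   its hostPlatform   == our package's buildPlatform
  #   its targetPlatform == our package's hostPlatform
  # which is exactly the stage that buildPackages refers to.
  nvccCompatibleCC = pkgs.buildPackages.gcc11.cc;

  # libstdc++ is an output of gcc, and gcc builds it for its own targetPlatform.
  # Taking gcc from buildPackages therefore yields a libstdc++ for our hostPlatform.
  nixpkgsCompatibleLibstdcxx = pkgs.buildPackages.gcc.cc.lib;
}
```

For a native build all three platforms coincide, so the distinction only becomes observable once someone actually cross-compiles.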
Such a relief to hear this! I was slowly arriving at this interpretation, but every morning I feel confused about the build/host/target terminology again
      # older libstdc++. This, in practice, means that we should use libstdc++ from
      # the same stdenv that the rest of nixpkgs uses.
      # We currently do not try to support anything other than gcc and linux.
      libcxx = nixpkgsCompatibleLibstdcxx;
Unfortunately I don't think libcxx here has the semantics you're looking for.
nixpkgsCompatibleLibstdcxx is added to propagatedDeps but I think that's it; if you look in cc-wrapper, libcxx is only consulted for libc++ include paths when libcxx.isLLVM is present and true:
nixpkgs/pkgs/build-support/cc-wrapper/default.nix
Lines 372 to 388 in 82fa717
Even if nixpkgsCompatibleLibstdcxx.isLLVM were present and true this wouldn't work as intended; the logic for libcxx in cc-wrapper (i.e. -stdlib, cxxabi, include/c++/v1) really is specific to clang and libc++.
Further, setting libcxx does not inhibit cc-wrapper from tacking the gcc library paths, where the cc's libraries (i.e. libstdc++ and friends) live, onto cflags/ldflags.
Unfortunately I don't think cc-wrapper as it exists today has the right knobs to support this use case (cc = gcc but libstdc++ from a different gcc) but I think we can get pretty close with useCcForLibs and gccForLibs = gccFromWhichYouWantToGrabLibstdcxx. Here's a quick PoC (imperfect -- it seems to lose libc headers somehow -- but hopefully gets the point across): dc6a8f9
Note that with the commit above, using "${np.cudaPackages.backendStdenv.cc}/bin/g++" to compile binaries results in them being linked against libstdc++ from gcc12 (observable by running ldd on the binaries). With the PR as it exists right now, such binaries are still linked against libstdc++ from backendStdenv.cc.cc.
Depending on what the requirement here is (just a newer libstdc++? compiler builtins like libgcc_s too? etc) getting cc-wrapper to support this use case might make more sense than adding link options for the desired libstdc++ to packages in an ad-hoc way.
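For readers who do not want to open the PoC commit, the `useCcForLibs`/`gccForLibs` idea can be sketched roughly as follows. This is not the content of dc6a8f9, only an illustration under stated assumptions: `gcc11` stands in for the nvcc-compatible compiler, the default `gcc` stands in for the one whose libstdc++ is wanted, and the result is wired into a stdenv with `overrideCC`.

```nix
# Sketch of "cc = older gcc, libstdc++ from a different gcc"; names are illustrative.
{ pkgs ? import <nixpkgs> { } }:
let
  # Assumption: the older, unwrapped gcc that nvcc accepts as host compiler.
  nvccCompatibleCC = pkgs.buildPackages.gcc11.cc;

  # Assumption: the unwrapped gcc used by the rest of nixpkgs, providing the newer libstdc++.
  gccFromWhichYouWantToGrabLibstdcxx = pkgs.buildPackages.gcc.cc;

  # Wrap the older compiler, but point the wrapper's library lookup at the newer gcc.
  cc = pkgs.wrapCCWith {
    cc = nvccCompatibleCC;
    useCcForLibs = true;
    gccForLibs = gccFromWhichYouWantToGrabLibstdcxx;
  };
in
pkgs.overrideCC pkgs.stdenv cc
```

As noted above, the PoC still had rough edges (libc headers going missing), so a sketch like this should be verified with ldd on a built binary before anything relies on it.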
> Unfortunately I don't think libcxx here has the semantics you're looking for.
Yes, I was almost sure about that when I added the original line
> nixpkgsCompatibleLibstdcxx is added to propagatedDeps but I think that's it
Indeed! And by sheer accident this has been sufficient for most of our broken cuda packages. This is why I turned a blind eye to the obvious erroneousness of the current "approach" and hastily merged #223664
I think it shouldn't be hard, though, to develop this into a correctly working stdenv while keeping all of the interfaces intact
> Even if nixpkgsCompatibleLibstdcxx.isLLVM were present and true this wouldn't work as intended; this logic ... is specific to clang
I suspected so. Again, effectively we've only relied on the propagated build inputs so far, but we definitely shouldn't keep it this way for long
> Unfortunately I don't think cc-wrapper as it exists today has the right knobs to support this use case
I was afraid of that. I think it would make sense to extend cc-wrapper for our use-case
> Here's a quick PoC ... dc6a8f9
Oh wow, thank you so much! I think this is exactly the direction we should be heading in!
> With the PR as it exists right now, such binaries are still linked against libstdc++ from backendStdenv.cc.cc
Thankfully, backendStdenv.cc.cc does not really link to its .lib and happily picks up any libstdc++ we put in downstream package's buildInputs... unless they are built with Bazel
But, as I admit above, this behaviour is accidental and we must not rely on it
> Depending on what the requirement here is (just a newer libstdc++? compiler builtins like libgcc_s too? etc) getting cc-wrapper to support this use case might make more sense than adding link options for the desired libstdc++ to packages in an ad-hoc way.
At this point, I think we only need to override what c++ stdlib gets written into runpaths of downstream derivations
> And by sheer accident this has been sufficient for most of our broken cuda packages. This is why I turned a blind eye to the obvious erroneousness of the current "approach" and hastily merged #223664
> I think it shouldn't be hard, though, to develop this into a correctly working stdenv while keeping all of the interfaces intact
> I think it would make sense to extend cc-wrapper for our use-case
> But, as I admit above, this behaviour is accidental and we must not rely on it
Ah okay, thanks for clarifying; this is good to hear 🙂
> Thankfully, backendStdenv.cc.cc does not really link to its .lib and happily picks up any libstdc++ we put in downstream package's buildInputs... unless they are built with Bazel
Oh, interesting...
In general I'd think that your propagated deps would make their way into the linkopts that Bazel uses (propagatedDeps should get picked up by the bintools stdenv hooks and added to NIX_LDFLAGS which buildBazelPackage should map to linkopts).
Maybe it's an ordering thing? I think normally things specified in the NIX_LDFLAGS env var would have precedence over flags in libc-ldflags; with Bazel's linkopts this wouldn't be the case.
Easiest way to test would be to add NIX_LDFLAGS as an action_env in the Bazel options I think.
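A sketch of that test, assuming the package passes its Bazel options through a `bazelFlags` attribute of `buildBazelPackage` (an assumption about the package definition, not something shown in this thread). `--action_env=NAME` without a value tells Bazel to take the variable from the invoking environment.

```nix
# Hypothetical fragment of the arguments given to buildBazelPackage.
{
  bazelFlags = [
    # Forward the NIX_LDFLAGS value from the build environment into Bazel's actions.
    "--action_env=NIX_LDFLAGS"
  ];
}
```

If the resulting binaries then pick up the intended libstdc++, that would point to flag ordering between NIX_LDFLAGS and Bazel's linkopts as the cause.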
@rrbutani thank you so much for your reviews, they were more useful than you might suspect! I think we're going to merge this PR just with the ad hoc hacks for jaxlib (modulo
Force-pushed from d38f01f to 8fd02ce
no problem!
sounds good! I suspect tweaking |
@samuela I think we can merge this now then! By the way, I don't know if you noticed (it's been mostly discussed on matrix), but we've been maintaining tickets at CUDA Team (view) (thanks @ConnorBaker for the idea)
Thanks so much @SomeoneSerge ! I use JAX a lot so I will def be using this fix!
No time for a cc-wrapper PR yet, but opened an issue to track: #226165
Description of changes
Hopefully this will address #220341 for jax{,lib}
Things done
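Is sandbox = true set in nix.conf? (See Nix manual)
Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
Tested basic functionality of all binary files (usually in ./result/bin/)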