Skip to content

libcxx: link correct libcxxabi version#201268

Merged
vcunat merged 3 commits intoNixOS:stagingfrom
music-tribe:libcxx-link-libcxxabi
Nov 28, 2022
Merged

libcxx: link correct libcxxabi version#201268
vcunat merged 3 commits intoNixOS:stagingfrom
music-tribe:libcxx-link-libcxxabi

Conversation

@joshchngs
Copy link
Contributor

@joshchngs joshchngs commented Nov 15, 2022

Description of changes

This is a follow-up to @stephank's PR here: #196909
Firstly, skip the dylib fixups introduced in that PR for symlinks. This is the same behaviour as the standard fixDarwinDylibNames hook.

The main change is to do the same fixup on libc++.dylib as done on libc++abi.dylib
Current behaviour emits a /nix/store/$hash-libc++-$version/lib/libc++.dylib which loads libc++abi from the build stdenv, which on Darwin is /nix/store/$hash-libcxx-11.1.0/lib/libc++abi.dylib – not necessarily the correct version.

Combined with the symlink breakage, using llvmPackages_14.libcxxStdenv in my flake causes dyld to partially load 2 copies of libc++abi. It only seems to load the BSS segment twice, so most things work, apart from a very subtle bug inside std::current_exception().

Things done
  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandbox = true set in nix.conf? (See Nix manual)
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 22.11 Release Notes (or backporting 22.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
    • (Release notes changes) Ran nixos/doc/manual/md-to-db.sh to update generated release notes
  • Fits CONTRIBUTING.md.

@joshchngs
Copy link
Contributor Author

@stephank FYI I tried to pass in cctools to get an absolute path to otool, but it just caused horrific breakages. I didn't look into it much, but the build stdenv wasn't able to compile its test programs because the C headers were messed up.

@ofborg ofborg bot added the 10.rebuild-darwin-stdenv This PR causes stdenv to rebuild on Darwin and must target a staging branch. label Nov 15, 2022
@ofborg ofborg bot added 10.rebuild-darwin: 501+ This PR causes many rebuilds on Darwin and should normally target the staging branches. 10.rebuild-darwin: 5001+ This PR causes many rebuilds on Darwin and must target the staging branches. 10.rebuild-linux: 101-500 This PR causes between 101 and 500 packages to rebuild on Linux. labels Nov 15, 2022
@joshchngs joshchngs force-pushed the libcxx-link-libcxxabi branch from 3fc936d to e496998 Compare November 15, 2022 10:57
Copy link
Contributor

@stephank stephank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, and appears to work well on my Swift branch. (Tested on aarch64-darwin)

This won't make 22.11. Is that okay for you?

@joshchngs
Copy link
Contributor Author

I'm still a bit unclear on the release model tbh, but as long as it goes into one of the cached nixpkgs branches I'm good.

However if it can go in I think it should. Any packages built with a libcxxStenv, with any LLVM version apart from 11 will have runtime bugs. There may be others, but the bug I hit looks like:

  • throw exception
  • catch(...)
  • std::current_exception() == nullptr

This happens because the libcxxabi BSS segment gets loaded twice, once from each linked version, and the TLS references in the _cxa machinery resolve different addresses for the key/flag storage in throw code Vs catch code.

@stephank
Copy link
Contributor

Yeah, that's pretty bad. I was unable to get around to this sooner because of circumstances.

Ping @vcunat. I believe the last staging-next merge is happening today (#200868). This is a darwin stdenv rebuild. How do you feel about merging this into staging-next before it lands?

Devil's advocate: As mentioned, it only happens on non-default v11 LLVM, and only on Darwin I believe. It may be tempting to call this niche, also because it's gone under the radar for quite some time.

@vcunat
Copy link
Member

vcunat commented Nov 21, 2022

Rebuilding all of *-darwin takes several days (if Hydra didn't do anything else). And it's not just about doing the rebuilds... because after that one can often find some regressions. Either way, it would significantly interfere with the release process, so it's more of a question for @mweinelt than myself. (And I haven't really looked at what this PR does.)

There might be other motivation to do full *-darwin rebuild anyway, with similar choices of doing it now or a bit later: PR #202126

@mweinelt
Copy link
Member

IMO via staging and staging-22.11, once it exists later tonight.

@stephank
Copy link
Contributor

@stephank FYI I tried to pass in cctools to get an absolute path to otool, but it just caused horrific breakages. I didn't look into it much, but the build stdenv wasn't able to compile its test programs because the C headers were messed up.

Yeah, directly including cctools as an input will almost always fail. The tools normally have wrapped versions already in PATH, and the wrappers ensure all the Nix-specific search paths are present. Adding cctools as an input puts unwrapped tools in PATH before the wrapped tools.

But this is about the stdenvBootstrapTools issue, right?

I was able to get a shell for the failing derivation, and the issue is that aarch64-darwin is bootstrapped from x86_64-darwin, so the tools all have target prefixes like a cross toolchain. I think we can fix the build by simply doing ${targetPrefix}otool. (We already do this for install_name_tool everywhere I believe.)

I can make a PR for that, but would have to wait for this one to go through. You could also add it to this branch, if you want.

@vcunat
Copy link
Member

vcunat commented Nov 22, 2022

It might make sense to switch to native bootstrapping instead. On Hydra side we don't even have native x86_64-darwin anymore but just rosetta emulation.

@joshchngs
Copy link
Contributor Author

But this is about the stdenvBootstrapTools issue, right?

I was able to get a shell for the failing derivation, and the issue is that aarch64-darwin is bootstrapped from x86_64-darwin, so the tools all have target prefixes like a cross toolchain. I think we can fix the build by simply doing ${targetPrefix}otool. (We already do this for install_name_tool everywhere I believe.)

I presume we're talking about the same Hydra failures after your PR #196909

I can make a PR for that, but would have to wait for this one to go through. You could also add it to this branch, if you want.

I've made the change on this branch, running a test build locally before I push it.

@stephank
Copy link
Contributor

It might make sense to switch to native bootstrapping instead. On Hydra side we don't even have native x86_64-darwin anymore but just rosetta emulation.

@vcunat I took as stab at this in #202347, but think it requires a little bit more work.

Even if it worked first try, I think having both fixes doesn't hurt. 🙂

Same adjustment as made for libc++abi in NixOS#185766, for the same reason:
the unamended dylib links to the libc++abi in the build stdenv, which
is the wrong version.

Tested on Darwin with LLVM 14 stdenv, but the phase is added to all
versions, including 11 - so this will cause a mass rebuild.

See: NixOS#185766
Apart from being a no-op, this seems to cause the install to make a
copy, rather than keeping the symlinks intact.
This is to fix the stdenvBootstrapTools issue where otool & others are
not available in PATH, but only under the targetPrefix.
@joshchngs joshchngs force-pushed the libcxx-link-libcxxabi branch from e496998 to 2252245 Compare November 22, 2022 21:38
@joshchngs
Copy link
Contributor Author

joshchngs commented Nov 22, 2022

I'm hitting this problem with dejagnu. Nothing to do with the targetPrefix though, it's the same without changes. I don't understand the issue, and my flake appears to still build OK against this version. Pushing anyway, assuming CI will stop if there's an issue.

Edit: also rebased onto recent staging

@github-actions
Copy link
Contributor

Successfully created backport PR #203425 for staging-22.11.

@Janik-Haag Janik-Haag added the 12.first-time contribution This PR is the author's first one; please be gentle! label Jun 12, 2023
@rrbutani rrbutani added the 6.topic: llvm/clang Issues related to llvmPackages, clangStdenv and related label May 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.topic: llvm/clang Issues related to llvmPackages, clangStdenv and related 10.rebuild-darwin: 501+ This PR causes many rebuilds on Darwin and should normally target the staging branches. 10.rebuild-darwin: 5001+ This PR causes many rebuilds on Darwin and must target the staging branches. 10.rebuild-darwin-stdenv This PR causes stdenv to rebuild on Darwin and must target a staging branch. 10.rebuild-linux: 101-500 This PR causes between 101 and 500 packages to rebuild on Linux. 12.first-time contribution This PR is the author's first one; please be gentle!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants