Switched to static linking of llvm #100
|
@conda-forge-admin, please rerender |
|
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( |
|
ppc64le still broken, will look into it |
The build works locally when cross-compiling and gets further than on the Actions runner. The tests fail (as expected) due to the invalid binary format. |
|
I will try to dynamic link on ppc64le still. I think the native build has issues. The local crossbuild on my machine works though, but I don't want to sacrifice the tests, which I would need to do, if I let the github runner build on linux64. Reverted the PR to draft, because there will be some noise. |
|
@conda-forge-admin, please rerender |
|
Hi! This is the friendly automated conda-forge-webservice. I tried to rerender for you, but it looks like there was nothing to do. This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/13329229741. Examine the logs at this URL for more detail. |
05f65d3 to
7673ab6
…nda-forge-pinning 2025.02.12.20.08.11
7673ab6 to
2a97a9a
|
@jakirkham : This is now ready for review. ppc64le still builds shared, because I was unable to static link on a native ppc64le platform. |
jakirkham
left a comment
As already discussed at length previously ( #73 ), will not be accepting a change to static linking
Think we need to find a better way to address the issue users are encountering. One avenue may be changing the linker flags (stripping, symbol visibility, etc.)
|
My analysis showed that the cause of the issue might not be in llvmlite, so changing the flags here won't change much. The issue is that libllvm15 links into libllvm19, so I presume the right place for a fix is there. If my analysis is right, libllvm15 should conflict with libllvm19, as it is not safe to have both in the same environment. Until this fix is found, it is disastrous for the conda-forge ecosystem not to provide a workaround. Currently not merging this means that people using pytorch and numba in the same conda environment either
numba and pytorch is a very common combination. Therefore: Please reconsider merging this. It can be reverted once the issue is solved. |
|
Also in #73 you write: "Not following the argument for static linking". The reason this is working is only because numba and pytorch-cuda required the same libllvm version at this precise time. When libllvm14 was used by llvmlite and libllvm15 was used by pytorch-cuda, it broke. Now the same thing happens with libllvm15 and 19. |
Pity you did not respond on the thread I opened in the core channel. I will bring this to a vote within core. |
|
If this is indeed an LLVM issue, let's file on that feedstock. If there was already an issue filed and it was closed prematurely, am happy to reopen (provided a link) |
|
This is the issue I opened on the llvm-dev feedstock: conda-forge/llvmdev-feedstock#312 |
|
I am happy to take on the maintenance burden of pushing updates to the llvm versions for static linking if that helps unblock this. |
|
OK friends. I have another idea on how to ease the maintenance burden of static linking so we can unblock this PR. I have added a new feature to the bot in this PR: conda-forge/conda-forge-bot#3755. It allows the bot to update static libs according to an abstract spec as follows. In the extra:
static_linking_host_requirements:
  - llvmdev 15.*
  - llvm 15.*
This specifies the abstract requirements for the static library you want in host. Then in your recipe, you list the exact packages you care about like this:
requirements:
  host:
    - python
    - setuptools
    - llvmdev 15.0.7 h2621b3d_4 # [osx and arm64]
    - llvm 15.0.7 h4a7a88c_4 # [osx and arm64]
    - llvmdev 15.0.7 hbedff68_4 # [osx and x86]
    - llvm 15.0.7 hed0f868_4 # [osx and x86]
The bot then does the following computation:
The result of this is an update to the host section like this:
requirements:
  host:
    - python
    - setuptools
    - llvmdev 15.0.7 h4429f82_5 # [osx and arm64]
    - llvm 15.0.7 h0cf516b_5 # [osx and arm64]
    - llvmdev 15.0.7 hc29ff6c_5 # [osx and x86]
    - llvm 15.0.7 hb21d583_5 # [osx and x86]
    - zlib
    - vs2015_runtime # [win]
The bot stores the new static lib versions it used to update the feedstock as part of the PR info it has. This should prevent it from issuing duplicate PRs. Further, by bailing if not all of the versions+build numbers of the new static libs match, we should prevent PRs being issued in the middle of a build of the static libs on the backend. Finally, by restricting the search for updates to the static libs to the abstract specs in extra, the bot will only issue an update for increases in minor+patch versions and/or build numbers in the say
If this is of interest to you all, let me know and I can finish up the bot PR, make a PR to this feedstock, and we can start trying it out. cc @isuruf @h-vetinari @jakirkham @conda-forge/llvmlite |
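For readers who want a concrete picture of the three safeguards described in the comment above (search restricted to the abstract spec, bail unless all new builds agree, no duplicate PRs), here is a rough Python sketch. It is not the bot's actual code: the function names, the data layout, and the choice to compare builds per package across platforms are all assumptions made for illustration, and the `16.0.0` entry in the test data is hypothetical.

```python
from fnmatch import fnmatch


def build_number(build_string):
    # Assumes the conda convention that a build string ends in "_<number>",
    # e.g. "h4429f82_5" -> 5.
    return int(build_string.rsplit("_", 1)[1])


def version_key(version):
    # Assumes purely numeric dotted versions such as "15.0.7".
    return tuple(int(part) for part in version.split("."))


def newest_matching(candidates, abstract_version):
    # Keep only builds that satisfy the abstract spec (e.g. "15.*"), then
    # pick the highest (version, build number) pair.
    matching = [c for c in candidates if fnmatch(c[0], abstract_version)]
    if not matching:
        return None
    return max(matching, key=lambda c: (version_key(c[0]), build_number(c[1])))


def propose_update(pinned, available, abstract_specs, already_proposed):
    """Return new exact pins, or None if no PR should be issued.

    pinned / available are keyed by (package name, platform); this layout
    is hypothetical and chosen only to keep the sketch short.
    """
    new_pins = {}
    for (name, platform) in pinned:
        best = newest_matching(available.get((name, platform), []),
                               abstract_specs[name])
        if best is None:
            return None
        new_pins[(name, platform)] = best

    # Bail if the platforms of one package disagree on version + build
    # number: the static libs are probably still being built.
    targets_by_name = {}
    for (name, _platform), (version, build) in new_pins.items():
        targets_by_name.setdefault(name, set()).add(
            (version, build_number(build)))
    if any(len(targets) != 1 for targets in targets_by_name.values()):
        return None

    # Nothing to do, or this exact update was already proposed.
    fingerprint = frozenset(new_pins.items())
    if new_pins == pinned or fingerprint in already_proposed:
        return None
    already_proposed.add(fingerprint)
    return new_pins
```

With the llvmdev pins from the example above, a call where only one platform has a `_5` build available returns `None`, while a call where both platforms have it returns the two new pins once and `None` on a repeat.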
|
Thank you for trying to find a solution on this @beckermr! I'm fine with whatever setup that lets us get rid of the segfaults. To me the |
|
Hi! This is the friendly automated conda-forge-linting service. I wanted to let you know that I linted all conda-recipes in your PR ( Here's what I've got... For recipe/meta.yaml:
This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/13569596802. Examine the logs at this URL for more detail. |
|
@isuruf ppc64le is not statically linked in these builds due to some error @timostrunk had when they tried above. |
|
PR to fix the linter: conda-forge/conda-smithy#2253 |
|
OK. This one is all green or will be soon. I do not want to merge when another maintainer has requested changes. @jakirkham Can you look at what we've done here and reconsider your review possibly? We've enabled fully automatic updates via the bot and I've added myself to the feedstock to manage things around that as I expect we'll encounter a few bugs along the way. |
|
@conda-forge-admin relint |
|
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR ( |
|
Pinging @jakirkham again here to get the discussion rolling again. Could you please comment on whether you are ok with this setup? |
elif [[ "${target_platform}" == linux-ppc64le ]]; then
    CXXFLAGS="${CXXFLAGS} -fplt"
@isuruf Any idea why this was needed? Would be good to leave a comment.
Think this is because we normally pass -fno-plt in the CXXFLAGS. This is actually done for all Linux architectures
However there have been some cases where this doesn't work on linux_ppc64le (notably with LLVM). Please see this bug:
Think Isuru is adding -fplt as a quick way of overriding the -fno-plt behavior. Adding -fplt is a bit quicker than what we normally do, which is remove the -fno-plt flag
Since we have figured out that this works, think we should adopt the syntax that we have elsewhere and remove -fno-plt from flags. This will also make it easier for future readers to find more context
- elif [[ "${target_platform}" == linux-ppc64le ]]; then
-     CXXFLAGS="${CXXFLAGS} -fplt"
+ elif [[ "${target_platform}" == linux-ppc64le ]]; then
+     # Taken from llvmdev's recipe
+     # https://github.com/conda-forge/llvmdev-feedstock/blob/8c2c0f2db9db1fdf12289381dcee4e2d9a2e5fec/recipe/build.sh#L29-L33
+     # disable `-fno-plt` due to some GCC bug causing linker errors, see
+     # https://github.com/llvm/llvm-project/issues/51205
+     CFLAGS="$(echo $CFLAGS | sed 's/-fno-plt //g')"
+     CXXFLAGS="$(echo $CXXFLAGS | sed 's/-fno-plt //g')"
Should add that I have committed the syntax change above, though I am leaving this unresolved so the thread remains visible
|
Had a side chat related to this PR as well as consulted @gmarkall who is a numba and llvmlite maintainer:
|
I think only one version of the symbol can exist in a process - so, my understanding is that if a symbol has already been resolved with the LLVM 15-specific version, it's not going to be resolved again if an LLVM 19-specific caller has a relocation to a symbol of the same name that subsequently needs to be resolved. So symbol versioning doesn't help in our situation here. |
|
After some sidebar conversations with various folks, I think we've reached a consensus of sorts. The plan as I understand it is to:
Thanks everyone for working hard on this tricky issue! |
|
Per further discussion, I will merge this PR on Monday, March 24, 2025. Happy weekend! |
jakirkham
left a comment
Apologies for the slow reply here
Met late last week with both Keith and Matt to discuss the changes and maintenance here
Agree that static linking is the least bad option we can come up with atm. So agree we should do that
There were a couple questions that had come up when we looked at some of the changes here. It took a bit longer to dig into these. Have made comments on them below
Also agree it would be great to have Matt on as a maintainer. Would like to add Keith as well (if he agrees). This should help with keeping up on changes here
    - marcelotrevisani
    - xhochy
    - mbargull
    - beckermr
@kkraus14 is it ok if we add you to the maintainers here?
-     - beckermr
+     - beckermr
+     - kkraus14
|
@conda-forge-admin , please re-render |
…nda-forge-pinning 2025.03.21.21.56.39
|
@conda-forge-admin rerender |
|
Hi! This is the friendly automated conda-forge-webservice. I tried to rerender for you, but it looks like there was nothing to do. This message was generated by GitHub Actions workflow run https://github.com/conda-forge/conda-forge-webservices/actions/runs/14034002527. Examine the logs at this URL for more detail. |
|
I am happy to add @kkraus14 in another PR if he'd like. Going to merge. |
I switched to static linking of libllvm here and added it to the ignore_run_exports on unix. I did not change the build behaviour on Windows as it seems to already build with default settings there and I have no means to test it.
This fixes #99 and #84.
Reasons against this PR:
Reasons to merge this PR: