Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does pybind11-abi shield from using different versions of GCC #77

Open
tdegeus opened this issue Feb 21, 2022 · 24 comments
Open

Does pybind11-abi shield from using different versions of GCC #77

tdegeus opened this issue Feb 21, 2022 · 24 comments

Comments

@tdegeus
Copy link
Member

tdegeus commented Feb 21, 2022

I'm having problems due to different libraries having been compiled using different versions of GCC, see pybind/pybind11#3751 .

My question: does having

host:
- pybind11-abi

shield me from this in the future?

Is there a way to detect such an issue also for a local build of library being used to a build on conda-forge ?

@beckermr
Copy link
Member

beckermr commented Mar 5, 2022

Cc @conda-forge/core we may need to rethink this or do proper migrations for compiler bumps?

Colleagues of mine are seeing this too.

@tdegeus
Copy link
Member Author

tdegeus commented Mar 5, 2022

That would be great! This really send me down a swamp with a lot of hand-work that I seek to avoid with conda-forge

@beckermr
Copy link
Member

beckermr commented Mar 5, 2022

I wonder if we need to rebuild the Abi package with compiler bumps and then kick off a pybind11 Abi migration.

@beckermr
Copy link
Member

beckermr commented Mar 5, 2022

We’ll see. I don’t fully understand the constraints here.

@tdegeus
Copy link
Member Author

tdegeus commented Mar 5, 2022

A note is that I did not have the same issues on all platforms. So I don't know of certain compilers or certain updates are more prone to this

@isuruf
Copy link
Member

isuruf commented Mar 5, 2022

As mentioned in pybind/pybind11#3751 the fix is to turn off PYBIND11_BUILD_ABI.

@tdegeus
Copy link
Member Author

tdegeus commented Mar 5, 2022

The workaround I would say, it does come at a small risk I think

@isuruf
Copy link
Member

isuruf commented Mar 5, 2022

What small risk?

@tdegeus
Copy link
Member Author

tdegeus commented Mar 5, 2022

@isuruf
Copy link
Member

isuruf commented Mar 5, 2022

That's overly pessimistic. We have thousands of packages built with gcc 9 and they work fine with gcc 10 and pybind11 is the only package that gets affected this? That's highly unlikely.

@tdegeus
Copy link
Member Author

tdegeus commented Mar 5, 2022

You might be right, honestly I did not invest in understanding how rare the segmentation faults were, I was just naturally cautious. But in case you are right it should be common practice on conda-forge. Can this be done when the feedstock is built?

@henryiii
Copy link
Contributor

henryiii commented Mar 5, 2022

This was very much overly pessimistic. There were rare, hard to track down segfaults, but people were doing things like mixing different compilers, etc. So this was completely locked down (with the comment that it was overkill). If someone wants to more properly decide when things are compatible and when they are not, then that would be welcomed. One "special" thing about pybind11 though is that it depends on std::type_index to tell what is bound across extensions, and I'm not sure what stability guarantees that has.

@beckermr
Copy link
Member

beckermr commented Mar 7, 2022

We've found another issue with pybind11 across gcc major versions. It appears to be related to exception handling in C++ so not sure if that is the same as above. We are working on a minimal working example.

xref: conda-forge/admin-requests#400

@ktlim
Copy link

ktlim commented Mar 8, 2022

Here's my reproduction code for the exception handling.

Build handler.sh and example.sh in the same environment, run example.py, and get "In translator" (success).
Build handler.sh with g++ 9.4, example.sh with g++ 10.3, run example.py, and do not get "In translator" (failure).
minexample.txt

@beckermr
Copy link
Member

beckermr commented Mar 8, 2022

@isuruf This reminds of the exception passing issue we had on osx due to libcxx_abi differences. I wonder if maybe there is an ABI break in the equivalent libs for gcc10 and gcc9?

@isuruf
Copy link
Member

isuruf commented Mar 8, 2022

No. see the linked pybind11 issue of how to workaround this

@beckermr
Copy link
Member

beckermr commented Mar 8, 2022

Forgive my ignorance on this. Is your suggestion

  1. we set this conda-forge wide in our compiler flags or a similar mechanism
  2. we set this on a per package basis
  3. we put in a patch into the pybind11 feedstock to turn off some of its abi protections

@isuruf
Copy link
Member

isuruf commented Mar 8, 2022

  1. Build a new pybind11-abi version with CXXFLAGS="${CXXFLAGS} -DPYBIND11_COMPILER_TYPE= -DPYBIND11_BUILD_ABI=" in new pybind11-abi package
  2. Rebuild all feedstocks.
  3. Patch pybind11 when a new version drops.

@beckermr
Copy link
Member

beckermr commented Mar 8, 2022

I don't follow the above TBH. We have to have -DPYBIND11_COMPILER_TYPE= -DPYBIND11_BUILD_ABI=" in the build of any downstream feedstock to fix the underly issue. Fixing the abi metapackage is fine, but the actual compiled code from the downstream packages needs to have these flags too. Otherwise we'll still hit errors at runtime. I think we need to patch the current build of pybind11 2.4 and mark the older builds as broken before step 2 above.

This raises a policy question. Is our intent to force all of conda-forge into a more generous ABI posture wrt to compilers and pybind11? This may have impacts on external users of our pybind11 packages who expect the stronger ABI protection.

@isuruf
Copy link
Member

isuruf commented Mar 8, 2022

I meant add that to activation script, not in the build script.

I think we need to patch the current build of pybind11 2.4 and mark the older builds as broken before step 2 above.

pybind11 is a header only library. That makes no sense.

Is our intent to force all of conda-forge into a more generous ABI posture wrt to compilers and pybind11?

Yes

This may have impacts on external users of our pybind11 packages who expect the stronger ABI protection.

As I said before, this is overly pessimistic and I don't see why people insist on a "stronger ABI protection" when
even an upstream developer @henryiii agrees that it is overkill.

@beckermr
Copy link
Member

beckermr commented Mar 8, 2022

pybind11 is a header only library. That makes no sense.

We could patch the headers in the header-only library to always set PYBIND11_BUILD_ABI to the empty string. This would require a patch on the feedstock, a rebuild of the feedstock and marking older builds of the feedstock as broken IIUIC. I thought this kind of patch is what you mean for your 3 above and I am wondering why we don't want to do it for the current version.

As I said before, this is overly pessimistic and I don't see why people insist on a "stronger ABI protection" when
even an upstream developer @henryiii agrees that it is overkill.

Right of course. If we patch the headers in the feedstock, we force folks. If we include the defines with the abi package via an activation script, we only get a more generous ABI compat for people who use that package, but people who don't use pybind11-abi in host would not get those defines.

I am totally fine with the more generous ABI compact in general. I simply don't understand completely how the steps above get us there. It seems to me that we cannot rely on the pybind11-abi package since it is optional. Instead, we need to patch the headers directly on the current version and then rebuild any package that uses pybind11.

@beckermr
Copy link
Member

beckermr commented Mar 8, 2022

I'll add something subtle that not everyone may fully grok and took me a while. pybind11 is header-only, but if you pass stuff between two pybind11 built codes at the C++ level, you get a dependence on their internal ABI. I suspect folks might be doing this in conda-forge without knowing to use the pybind11-abi metapackage to mark the builds. If we only use that mechanism, then we might still leave things buggy. It would be buggy for doing something wrong, so not the worst thing that ever happened.

@tdegeus
Copy link
Member Author

tdegeus commented Mar 9, 2022

Permit me to weigh in here. I find it difficult to judge to which end PYBIND11_BUILD_ABI is an overkill or not. But since it was introduced by the developers I don't think patching the header-only library itself with the empty string is the way to go: imagine you are just using conda-forge to get pybind11 and you are compiling locally; in that case I think you would want to get the library as is was developed.

Therefore, I see the following options:

  1. Argue a change of the default behaviour upstream. By far the cleanest, but this may not be successful, or fast.
  2. Document how to switch off PYBIND11_BUILD_ABI for different ways of building pybind11 extensions (this is a bit messy I realise, as I don't think that there is an overwhelming consensus on the build protocol). In this case a bot could open an issue on all feedstocks that are pybind11 extensions, pointing maintainers to this.
  3. Offer some ways such that 'by default' the PYBIND11_BUILD_ABI is empty in several build protocols (it could be for example in CMAKE_ARGS). Though see 2: this is a bit messy.
  4. Maintain a second package e.g. pybind11-noabiprotect that does have the patch defining PYBIND11_BUILD_ABI as an empty string. In that case a bot could open PRs on all pybind11 packages replacing pybind11 with pybind11-noabiprotect. A message could be part of linter for new feedstocks.

Now my favourite:

  • Patch pybind11 such that it includes a header from its folder of headers if it exists (this would not break anything for anyone), as it will not exist by default (one could name it even conda_forge_switch_off_abi.hpp or so.
  • Have pybind11-abi create that header with the empty PYBIND11_BUILD_ABI
    This would induce no changes to feedstocks, always work independent of the build system, and not inflict a default that some might not want. A bot could just open rebuilds on feedstocks that are pybind11 extensions.
    (not that an issue can be opened by the bot urging maintainers to use pybind11-abi)

@bluescarni
Copy link

This was very much overly pessimistic. There were rare, hard to track down segfaults, but people were doing things like mixing different compilers, etc. So this was completely locked down (with the comment that it was overkill). If someone wants to more properly decide when things are compatible and when they are not, then that would be welcomed. One "special" thing about pybind11 though is that it depends on std::type_index to tell what is bound across extensions, and I'm not sure what stability guarantees that has.

@henryiii

I vaguely remember at one point reading about an ongoing effort to turn pybind11 into a compiled library. Is this still happening?

It seems like with a compiled pybind11 all these ABI compatibility issues could be solved within the existing conda-forge infrastructure without having to hard-code compile definitions or rely on a special -abi metapackage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants