
GCC version in manylinux_2_24 #1012

Closed
henryiii opened this issue Feb 27, 2021 · 35 comments

@henryiii
Contributor

henryiii commented Feb 27, 2021

The GCC version in manylinux1 is 4.8: painfully old and unable to build C++14 code (see #118). It's actually possible to build GCC 9 on CentOS 5, though support was removed in GCC 10.

The GCC version in manylinux2010 is 8.3, new enough for basically full C++17 support minus a few library features.

The GCC version in manylinux2014 is 9.3, with full support for C++17 and even bits of C++20.

The GCC version in manylinux_2_24 is a dismal 6.3, much worse than both year-based manylinuxes above and not even new enough for C++17 language support (mostly added in GCC 7)! This is a huge step backward for an image with a higher GLIBC version. (PS: listing the GLIBC versions of the older manylinuxes would be nice, IMO)
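
For reference, the toolchain and glibc a given image ships can be checked directly; a throwaway sketch, assuming the image's devtoolset compiler is already on PATH (as in the official images) and using manylinux2014 only as an example tag:

# illustrative: print the compiler and glibc versions an image ships with
docker run --rm quay.io/pypa/manylinux2014_x86_64 bash -c 'gcc --version | head -n1; ldd --version | head -n1'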

Is there some way to replicate the “RHEL dev toolset” in the Debian based images?

@henryiii
Contributor Author

henryiii commented Feb 27, 2021

Maybe https://stackoverflow.com/questions/61590804/how-to-install-gcc-9-on-debianlatest-docker

As long as it doesn't update glibc by upgrading to a new OS (which it might...), this should work like the dev toolset, I think?
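
For reference, the mechanism from that answer boils down to pulling gcc from a newer Debian suite while pinning everything else to the current release. A sketch only; the suite name and package versions are assumptions, and as noted above it may still drag in a newer libc6/libstdc++, which is exactly the concern:

# add a newer suite, but pin it low so nothing upgrades by default
echo "deb http://deb.debian.org/debian testing main" > /etc/apt/sources.list.d/testing.list
printf 'Package: *\nPin: release a=testing\nPin-Priority: 100\n' > /etc/apt/preferences.d/testing.pref
apt-get update
# pull only gcc-9/g++-9 (and whatever dependencies they insist on) from testing
apt-get install -y -t testing gcc-9 g++-9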

@mayeut
Member

mayeut commented Feb 27, 2021

Once you install gcc-9, it will update at least libstdc++, which is a no-go. The RHEL dev toolset took care of that, and of other libraries like libgcc_s, so that binaries produced with devtoolset stayed compatible with the base image with no extra action.
If you have time to craft a PR installing gcc-9 and patching it so that it's still compatible with the base image, I'd be happy to take a look and merge this.
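
One concrete way to see the constraint: the GLIBCXX/CXXABI versions exported by the base image's libstdc++ are the ceiling that wheels built on it may rely on. A quick sketch (the library path is the usual Debian multiarch location and may differ):

# list the newest GLIBCXX versions the base image's libstdc++ provides
strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep -E '^GLIBCXX_[0-9.]+$' | sort -Vu | tail -n 3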

manylinux_2_24 is mostly beneficial for alternative architectures, where some functions were fixed in glibc (as reported by numpy).

@mattip
Contributor

mattip commented Apr 8, 2021

Unfortunately, the 6.3.10 version of gcc is too old to compile OpenBLAS on arm64; I get error: unknown value 'cortex-a73' for -mtune. It seems support for cortex-a73 was added in gcc 7. So now we are stuck: NumPy needs both manylinux_2_24 and a modern gcc.

@henryiii
Contributor Author

henryiii commented Apr 8, 2021

Not at all surprised. You want the most recent GCC possible for alternative archs.

@h-vetinari

Or go for 2_28 right away (at least for arm64)? Debian 10 has GCC 8.3.

@mayeut
Member

mayeut commented Apr 9, 2021

@h-vetinari
Debian 10 can't be used as a base image for manylinux_2_28: there are symbol conflicts with other 2.28 distros (at least Photon 3.0).
Furthermore, it would leave Ubuntu 18.04 users without wheels, and there are still a great number of them.

@mattip, there are still a number of possibilities for OpenBLAS/numpy:

  • Update the manylinux_2_24 toolchain in manylinux images.

This would be the preferred option but, as mentioned in a previous comment, I simply can't take this one on. I'd be happy to review a PR doing it, as it would benefit the whole manylinux ecosystem (checking that it respects the original symbol requirements for manylinux_2_24, and the reliability/maintainability of the proposed packages).

  • Remove CORTEXA73 tuning.

This would allow building with the stock manylinux_2_24 image but would probably introduce a speed penalty (I don't know how much, and there is probably something better to be done for dispatching over at OpenBLAS - it falls back to ARMV8 where it should maybe fall back to CORTEXA72?).
Also, the dynamic list seems to be built depending on compiler support for some architectures. This should probably be reported over at OpenBLAS to get that feature for aarch64 as well.

  • Use OpenBLAS built on manylinux2014 images

It should produce a shared library that's perfectly valid on manylinux_2_24.
In fact, now that I think about it, maybe building NumPy/OpenBLAS (others?) on manylinux2014 but tagging it manylinux_2_24 would work? (depending on how the accuracy issues mentioned in #494 were fixed in glibc, i.e. whether the symbol version was bumped or not)

  • Update to ppa:ubuntu-toolchain-r/test gcc-9

While I would not recommend it as a generic toolchain for the manylinux_2_24 image (the libstdc++ issue), this one might be well suited for NumPy/OpenBLAS. OpenBLAS built successfully on the aarch64 manylinux_2_24 image using this toolchain, with no extra non-compliant symbols (a quick way to double-check that claim is sketched after this list):

apt-get update
apt-get install dirmngr
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com 1E9377A2BA9EF27F
echo "deb http://ppa.launchpad.net/ubuntu-toolchain-r/test/ubuntu xenial main" >> /etc/apt/sources.list
apt-get update
apt-get install -y --no-install-recommends gcc-9 gfortran-9
export CC=gcc-9
export FC=gfortran-9
git clone https://github.com/xianyi/OpenBLAS.git
cd OpenBLAS/
git checkout af2b0d0
make BUFFERSIZE=20 DYNAMIC_ARCH=1 USE_OPENMP=0 NUM_THREADS=64 BINARY=64 TARGET=ARMV8

  • Create & use manylinux_2_27 image based on Ubuntu 18.04

Ubuntu 18.04 is not listed as a potential manylinux candidate over at https://github.com/mayeut/pep600_compliance, but that's only because the symbol analysis is done on libstdc++ (gcc-8 based) rather than on the default compiler used (gcc 7).
Users of amazonlinux:2 won't be able to use wheels from this one.
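
As a rough way to double-check the "no extra non-compliant symbols" claim for the gcc-9 PPA option above, one can list the versioned glibc symbols the freshly built library actually requires and compare them against the glibc 2.24 baseline (illustrative; the library file name depends on the build):

# nothing printed here should be newer than GLIBC_2.24 for a manylinux_2_24 target
objdump -T libopenblas*.so | grep -oE 'GLIBC_[0-9.]+' | sort -Vu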

@henryiii
Contributor Author

henryiii commented Apr 12, 2021

Or go for 2_28 right away (at least for arm64)? Debian 10 has GCC 8.3.

We need newer compilers with older images; requiring the wheel to be based on an OS newer than the compiler doesn't work well. If you want to use GCC 9 or 10, even 2_28 would now be too old. This was available with RHEL-based images, so it should(*) be possible here, too. CentOS 6 has GCC 8, and that's 2_12.

*: No statement about how hard it is, just that it should be possible.

@h-vetinari

h-vetinari commented Apr 12, 2021

If you want to use GCC 9 or 10

That's quite a wide gap to bridge. SciPy is only just now thinking about bumping its gcc lower bound from 4.8 to 5.5 (not least to stay buildable on more exotic platforms) - the point being that I think very few projects can assume such modern compilers (with all that entails) for their entire user base.

This was available with RHEL based images, so it should(*) be possible here, too.

RH presumably spent a non-trivial amount of time on that... Don't get me wrong, that'd be awesome to have, but I'm thinking that even "cutting-edge" packages should normally be fine with the compilers of a ~2 year old mainstream distro.

That's part of the appeal of the perennial manylinux, IMO, because maintainers can choose themselves where they want to set their cutoff (assuming pypa would provide all those images).

@mayeut:

There are symbol conflicts with other 2.28 distros (at least Photon 3.0)

Photon already has a number of compatibility issues, apparently. From the POV of a maintainer, I want to cover the largest possible portion of my user base (vis-à-vis trade-offs in terms of maintainability) - and it's IMO a legitimate choice not to care about fringe distros (where users can still build from source, and are presumably used to that anyway).

@mayeut
Member

mayeut commented Apr 24, 2021

Photon already has a number of compatibility issues, apparently.

Yes, it's fully headless and does not provide any libraries like X11. It's not a major issue but it's mentioned anyway. If one is using Photon, I guess they know it only works this way and won't rely on packages requiring these dependencies (e.g. use opencv-python-headless instead of opencv-python). It's more of a limitation than a compatibility issue.

From the POV of a maintainer, I want to cover the largest possible portion of my user base (vis-à-vis trade-offs in terms of maintainability) - and it's IMO a legitimate choice not to care about fringe distros (where users can still build from source, and are presumably used to that anyway).

Agreed, it's obviously up to the package maintainers to decide what they want to support. However, there's no reason for manylinux to knowingly provide images that are not compliant with PEP600.

@mayeut
Member

mayeut commented Apr 24, 2021

I did some experiments with a PPA in the openblas PR. It seems to be the way to go to update the toolchain here.
I was able to rebuild binutils-2.36.1 & gcc-10 from the upcoming Ubuntu 21.04, but targeting Ubuntu 16.04, and that works well when installed on Debian 9.
I don't think I have the proper skills to do that in a clean way, and any help on the matter would be greatly appreciated.
I will try to up my skills w.r.t. providing a PPA with an up-to-date binutils; however, for gcc-10, there's still the libstdc++ compatibility issue, which is really out of my league here.

1st experiments:

@h-vetinari

Regarding the toolchain issues, you might be interested in how https://github.com/symengine/symengine-wheels handles this (it builds on top of conda packages, which are compiled with much newer compilers).

@isuruf

isuruf commented May 8, 2021

Is there some way to replicate the “RHEL dev toolset” in the Debian based images?

Yes, build gcc 10 with the patches from https://git.centos.org/rpms/devtoolset-10-gcc/blob/c7/f/SOURCES.
In devtoolset, libstdc++.so is a linker script that points to a static libstdc++ for the new parts and to the shared libstdc++ for the older parts, and the C++ ABI is set to the GCC 4 ABI.
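
For illustration, the linker-script trick looks roughly like this (a sketch of the idea, not the exact Red Hat file; paths and the nonshared library name vary by devtoolset version):

# with devtoolset enabled, the libstdc++.so the compiler links against is a text file,
# roughly of this shape:
#   OUTPUT_FORMAT(elf64-x86-64)
#   INPUT ( /usr/lib64/libstdc++.so.6 -lstdc++_nonshared44 )
cat "$(g++ -print-file-name=libstdc++.so)"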

@mayeut
Member

mayeut commented May 8, 2021

@isuruf,

would you be willing to contribute a PPA which provides such installable artefacts, and maintain it?

PS: they do not only patch libstdc++.so; a bunch of other *.so files use the same trick (like libgcc_s.so). They also patch some sources (for these tricks, for these tricks to actually work, as well as generic bug fixes and Red Hat-specific fixes) and the build process. As you said, all the material is in the patches from Red Hat; we would need to find someone willing to do the work, and some support.

@isuruf

isuruf commented May 8, 2021

would you be willing to contribute a PPA which provide such installable artefacts and maintain those ?

Nope. Sorry.

@mayeut
Member

mayeut commented May 8, 2021

@h-vetinari,

Thanks for the link.
Using static linking should also work with the ppa:ubuntu-toolchain-r/test gcc-9.
This trick is also mentioned in #118.

Since those are tricks, I'm certainly not comfortable making those toolchains the default ones, but these options are here for anyone interested in using them.
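
For anyone wanting to try that route, the shape of it is just the usual static-runtime flags (a sketch; the source and output file names are made up):

# statically link the newer libstdc++/libgcc bits into the extension so the
# resulting .so only needs the base image's runtime
g++-9 -std=c++17 -fPIC -shared -static-libstdc++ -static-libgcc -o _ext.so ext.cpp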

@mayeut
Member

mayeut commented May 8, 2021

2nd round of experiments with PPA
I'm a bit more confident in what I've done this time around, although it still needs someone to review it.

gcc is just a backport of hirsute gcc-10.3.0-1ubuntu1 with some features disabled (https://launchpadlibrarian.net/536669609/gcc-10_10.3.0-1ubuntu1_10.3.0-1ubuntu1~16.04.8.mayeut1.diff.gz).
It still requires the Red Hat patches mentioned in #1012 (comment) to ensure compatibility (I really don't think it's as easy as it might sound).

@mattip
Contributor

mattip commented Jul 7, 2021

2nd round of experiments with PPA ...

@mayeut any progress around this? Is there a PR somewhere?

@mayeut
Member

mayeut commented Jul 14, 2021

@mattip,

No progress since my last comment.
There's no PR; the PPA for tools & binutils should be reviewed, by someone with PPA knowledge, directly from the diff of each package's sources.

For gcc, I'm not likely to work more on this: I lack the experience to understand the patches from RH, and the Debian packaging skills to turn this into a clean installable package that won't mess with the system libraries.

@ghost

ghost commented Sep 4, 2021

I compile a performance-sensitive module.
I had assumed that the gcc used by manylinux_2_24 was newer; after seeing this issue, I switched back to manylinux2014.

@h-vetinari

h-vetinari commented Sep 10, 2021

From the POV of a maintainer, I want to cover the largest possible portion of my user base (vis-à-vis trade-offs in terms of maintainability) - and it's IMO a legitimate choice not to care about fringe distros (where users can still build from source, and are presumably used to that anyway).

Agreed, it's obviously up to the package maintainers to decide what they want to support. However, there's no reason for manylinux to knowingly provide images that are not compliant with PEP600.

I maintain that 2_28 could very well be the better trade-off overall. 2_24 has been stuck for months; 2_28 would work more or less out of the box with the image's compilers. Yes, it would cost support for Ubuntu 18.04 and Photon, but that's still better than nothing.

And obviously 2_24 could still be added afterwards, once the work for it converges.

@bhack

bhack commented Nov 25, 2021

It seems that only with manylinux_2_31 would we get a standard Ubuntu/Debian distro:

https://github.com/mayeut/pep600_compliance#pep600-compliance-check

Stats:
https://mayeut.github.io/manylinux-timeline/

@mattip
Contributor

mattip commented Nov 25, 2021

From that table, at each manylinux click we would still have support for:

manylinux       still valid distros (only taking into account EOL and not extended EOL)
manylinux2_26   amazon2 (EOL 2023-06-30), OpenSuse 15.2 (EOL 2021-12-31)
manylinux2_27   ubuntu18.04 (EOL 2023-04-30)
manylinux2_28   centos8/rhubi8 (EOL 2029-05-31), debian10 (EOL 2022-07-31), oraclelinux8 (EOL 2029-07-31)

Projects who wish to support centos7/rhubi7 will need manylinux2_17 (manylinux2014) until 2024.

It seems amazon2 supports yum and installs gcc 7.3.1 as a base package, so maybe manylinux2_26 is a viable target? The image is available from Docker (docker pull amazonlinux). I did not check what versions of libraries it comes with other than gcc.
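
A quick, illustrative way to confirm what amazonlinux:2 ships:

# prints the default gcc and glibc versions in a throwaway amazonlinux:2 container
docker run --rm amazonlinux:2 bash -c 'yum install -y -q gcc >/dev/null; gcc --version | head -n1; ldd --version | head -n1'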

@h-vetinari

From that table, at each manylinux click we would lose support for :

I think you're off-by-one here, as in: the right column needs to be shifted one row down. In particular manylinux2_28 still supports centos 8 / debian 10.

@mattip
Contributor

mattip commented Nov 25, 2021

@h-vetinari thanks, I rephrased the top line of the comment. The table column heading was correct.

@h-vetinari

Based on current data from the pep-compliance repo, the EOL of manylinux_2_24 is imminent. I've opened a separate issue to discuss this: #1332

@mayeut
Member

mayeut commented Aug 28, 2022

#1369 will drop support for manylinux_2_24 images on January 1st, 2023. This issue will never be fixed.

@snnn

snnn commented Apr 10, 2024

If we split the problem into two parts, the compiler and the runtime (libgcc_s and libstdc++), we could find a different solution that solves just the first part. We could cross-compile with a very new C/C++ compiler (either GCC or LLVM/Clang) against an old rootfs that contains old versions of glibc/libstdc++/libgcc_s/... Then we would not have to depend on RHEL's devtoolset, and would not have to bind ourselves to a specific Linux distro.

You could still use the latest C++ language features, as long as the new feature doesn't require new runtime support. For example, you can use C++17 and target Ubuntu 16.04, but you cannot use C++17's std::filesystem library. I think this limitation is acceptable because you cannot make it work on Apple's operating systems anyway; if an open source project supports Linux, most likely it supports macOS too, so the same concern applies.

With this approach, we would still get many benefits from the new compiler and toolchain:

  1. Better security. The new compilers are better patched, and therefore more secure.
  2. Broader hardware support. For example, if you need to target a new ARM CPU that was released only recently.
  3. New C++ language features, like range-based for.
  4. Better diagnostic warnings at build time.
  5. New security flags, like control flow guard.

@h-vetinari

You may still use the latest C++ grammar, if the new grammar doesn't require a new runtime support. For example, you can use C++17 and target Ubuntu 16.04 but you cannot use C++17's std::filesystem library.

Yeah, except then every project would need to learn which C++ features require runtime support, which is really not realistic. People would get it wrong all the time, and stuff would definitely break.

@henryiii
Contributor Author

Also, I know one of the issues is that the special arches generally require newer compilers. The reason NumPy couldn't use manylinux_2_24 wasn't due to C++ features (they didn't use any), but because the old compiler was buggy on other architectures; they required 6.3+ or something like that. (Maybe it's just the compiler and the libs are fine, but I'd be a bit worried.)

@snnn

snnn commented Apr 11, 2024

Yeah, except then every project would need to learn which C++ features require runtime support, which is really not realistic. People would get it wrong all the time, and stuff would definitely break

If it does not work, you will get a compile-time error instead of a runtime error, so it is easy to handle. macOS already works this way. The C++17 filesystem limitation is a well-known issue among iOS developers, so it is not unacceptable.

Also, my proposal has another benefit: better 32-bit support. Nowadays it's common to see a single compiler process take more than 1 GB of memory, but a 32-bit system only has 2 GB of user-mode address space. When GCC/LD hits that limit, there is nothing else you can do. (With cross-compilation, the compiler itself can run as a 64-bit process even when targeting 32-bit.)

@mattip
Contributor

mattip commented Apr 11, 2024

If it does not work, you will get a compile time error instead of a runtime error

The comments at the beginning of this issue, which has been closed for a year and a half, talked about how there is no package with a split between the compiler and the runtime. Is there now such a package, or are you suggesting building a new compiler package? Who would maintain it? Where would it live?

@fweimer-rh

Yeah, except then every project would need to learn which C++ features require runtime support, which is really not realistic. People would get it wrong all the time, and stuff would definitely break

If it does not work, you will get a compile time error instead of a runtime error, so it is easy to handle.

@snnn I'm not sure that's accurate. Why would you get a compiler error if you reference a symbol in the newer libstdc++? If you set things up so that the older libstdc++.so.6 is used for linking, in theory you could get a linker error, but I don't think building Python extensions is compatible with -Wl,-z,defs. So you wouldn't get an error, either.

@snnn

snnn commented Apr 12, 2024

Is there now such a package or are you suggesting building a new compiler package?

Here are the details:
First, get a manylinux docker image and run it in interactive mode with bash:

docker run --rm -it quay.io/pypa/manylinux_2_28_x86_64 /bin/bash

Then in the container do:

dnf install gcc gcc-c++
rm -rf /opt/rh

Then, outside of the docker container, open a shell and run

docker ps

to get the container id, then use the docker export command to export the container's rootfs as a tar file, and extract the tar file somewhere.
Then you can shut down the docker container.
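
Spelled out, the export step above looks roughly like this (the container id and target directory are placeholders):

# export the running container's rootfs and unpack it for use as a sysroot
docker export -o rootfs.tar <container-id>
mkdir -p /crossrootfs
tar -xf rootfs.tar -C /crossrootfs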
Then install LLVM/Clang from somewhere. You can use the latest version.
Then create a hello-world C/C++ program and use clang to compile it:

clang -o t test.c  --sysroot=/crossrootfs  -v -fuse-ld=lld

"/crossrootfs" is where the tarball was extracted to.
Then you can use objdump to check which glibc version the built binary requires, and transfer the binary to a target system to test.

If you think this idea is acceptable, then you can rebuild manylinux_2_28_x86_64 with a different base image that you want to target, for example Ubuntu 16.04 or UBI8. If you use a standard UBI8 image from RH and do not manually install additional RPM packages there, you are guaranteed that it will receive good vulnerability management service from RH for free for many, many years. That guarantee is for the image you downloaded from RH, not for the RPM packages.

Is there now such a package or are you suggesting building a new compiler package?

We do not need to build a custom compiler. We can get everything we need from the operating system's package manager, or from other vendors. For example, you may use Ubuntu 22.04 as the host OS and get the latest LLVM from https://releases.llvm.org/. The compiler should have very decent support for all new hardware.

@mayeut
Member

mayeut commented Jun 1, 2024

I think that there are 3/4 different problems that are currently addressed by devtoolset/gcc-toolset:

  1. cross-building is not well supported in the python ecosystem (see the draft https://peps.python.org/pep-0720/). devtoolset/gcc-toolset are native toolchains that only require minimal setup to work.
  2. getting better support for new hardware: devtoolset is outdated for sure (gcc-10). The current gcc-toolset in AlmaLinux could be updated to gcc-13 (currently gcc-12) - update risk to be evaluated. Using the latest LLVM with cross-compilation definitely works in this case.
  3. newer C++ standard: there are 2 cases (3/4) here, as already mentioned earlier in the thread: the part that does not need runtime support & the part that does. The gcc-toolset provides for both through a custom linker script for libgcc_s/libstdc++/.... It's most likely possible to use LLVM with gcc-toolset so as not to restrict the runtime features available to the bare feature set of the base distro.

@snnn, do you have a PoC somewhere (rather than the few commands posted here)? I'd like to dig into this a bit more.
