tensorboard_data_server is not conform to manylinux2010 #4928

Closed
DavHau opened this issue May 4, 2021 · 10 comments

DavHau commented May 4, 2021

In the manylinux2010-tagged .whl release of tensorboard_data_server, the following file does not conform to the manylinux policy:

tensorboard_data_server/bin/server dynamically links against libcrypto.so.1.1 and libssl.so.1.1, which are not among the allowed dependencies according to PEP 571 -- The manylinux2010 Platform Tag.

On systems with explicit dependency management, such as NixOS, this leads to trouble.

I guess either the manylinux2010 tag or the dependency on the mentioned libraries should be removed.
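
As a rough illustration of the check described above, the following Python sketch compares the library names in `ldd` output against the manylinux2010 allowed list. The list is transcribed from PEP 571 and should be verified against the PEP before being relied on; the function name `disallowed_libraries` and the parsing rules are assumptions made for this example, not part of any existing tool.

```python
from __future__ import annotations

# Allowed external shared libraries for manylinux2010, transcribed from
# PEP 571. Assumption: verify this list against the PEP itself.
MANYLINUX2010_ALLOWED = frozenset({
    "libgcc_s.so.1", "libstdc++.so.6", "libm.so.6", "libdl.so.2",
    "librt.so.1", "libc.so.6", "libnsl.so.1", "libutil.so.1",
    "libpthread.so.0", "libresolv.so.2", "libX11.so.6", "libXext.so.6",
    "libXrender.so.1", "libICE.so.6", "libSM.so.6", "libGL.so.1",
    "libgobject-2.0.so.0", "libgthread-2.0.so.0", "libglib-2.0.so.0",
})

# ldd entries that are not ordinary library dependencies: the kernel's
# virtual DSO and the dynamic loader itself.
_IGNORED_PREFIXES = ("linux-vdso", "ld-linux")


def disallowed_libraries(ldd_output: str) -> list[str]:
    """Return the sonames in `ldd` output that are not on the allowed list."""
    offenders = []
    for line in ldd_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # The first token is a soname or an absolute path to the loader.
        soname = line.split()[0].rsplit("/", 1)[-1]
        if soname.startswith(_IGNORED_PREFIXES):
            continue
        if soname not in MANYLINUX2010_ALLOWED:
            offenders.append(soname)
    return offenders
```

To run this against a real binary, the output of `ldd <path-to-binary>` could be captured with `subprocess.run(..., capture_output=True, text=True)` and passed to `disallowed_libraries`. Note that this only checks library names; it says nothing about symbol versions.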

@wchargin wchargin added core:rustboard //tensorboard/data/server/... type:build/install labels May 4, 2021
Contributor

wchargin commented May 4, 2021

Hi @DavHau—thanks for the report! We might be able to drop the
dependencies by switching from openssl-sys to rustls; I'll look into
that. If not feasible (or not sufficient), I'll look into changing the
platform tag; I'm a bit more cautious about this, but we'll see what we
can do.

wchargin added a commit that referenced this issue May 4, 2021
Summary:
The RustBoard data server now uses `rustls` as a TLS backend instead of
using OpenSSL. This culls the `libcrypto.so.1.1` and `libssl.so.1.1`
shared library dependencies, fixing #4928.

Test Plan:
Run with `--logdir gs://tensorboard-bench-logdir/edge_cgan` to verify
that RustBoard still works. Then, dump the `ldd` output:

```
$ cd tensorboard/data/server/
$ cargo build --release
$ ldd target/release/rustboard
	linux-vdso.so.1 (0x00007ffe8afbd000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f422533e000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f422531c000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f42251d8000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f42251d2000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f422500d000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f4225b66000)
```

…and verify that all the shared libraries are on the [manylinux2010
whitelist].

[manylinux2010 whitelist]: https://www.python.org/dev/peps/pep-0571/#the-manylinux2010-policy

wchargin-branch: rust-rustls
wchargin-source: 51e7f815217d9a1ce3252757b4de3672635424ca
Contributor

wchargin commented May 4, 2021

@DavHau: Sent #4931 to fix this. If you want to give this a shot, you
can download the pre-built tensorboard-data-server wheel from the
GitHub Actions CI job:
https://github.com/tensorflow/tensorboard/suites/2650138713/artifacts/58249455

Or, you can of course build from source.

Author

DavHau commented May 4, 2021

Thanks for the quick fix!

@DavHau DavHau closed this as completed May 4, 2021
wchargin added a commit that referenced this issue May 5, 2021
Summary:
The RustBoard data server now uses `rustls` as a TLS backend instead of
using OpenSSL. This culls the `libcrypto.so.1.1` and `libssl.so.1.1`
shared library dependencies, fixing #4928.

Test Plan:
Run with `--logdir gs://tensorboard-bench-logs/edge_cgan` to verify that
RustBoard still works. Then, dump the `ldd` output:

```
$ cd tensorboard/data/server/
$ cargo build --release
$ ldd target/release/rustboard
	linux-vdso.so.1 (0x00007ffe8afbd000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f422533e000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f422531c000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f42251d8000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f42251d2000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f422500d000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f4225b66000)
```

…and verify that all the shared libraries are on the [manylinux2010
whitelist].

[manylinux2010 whitelist]: https://www.python.org/dev/peps/pep-0571/#the-manylinux2010-policy

wchargin-branch: rust-rustls
Contributor

wchargin commented May 5, 2021

@DavHau: You're welcome; thanks again for the clear report! I've pushed
tensorboard-data-server==0.6.1 with a manylinux2010-compliant binary.
It's compatible with TensorBoard 2.5.0.

https://pypi.org/project/tensorboard-data-server/0.6.1/

Author

DavHau commented May 6, 2021

BTW, just out of curiosity, I ran auditwheel on your .whl file:

❯❯❯ auditwheel show tensorboard_data_server-0.6.1-py3-none-manylinux2010_x86_64.whl

tensorboard_data_server-0.6.1-py3-none-manylinux2010_x86_64.whl is
consistent with the following platform tag: "manylinux_2_24_x86_64".

The wheel references external versioned symbols in these system-
provided shared libraries: libdl.so.2 with versions {'GLIBC_2.2.5'},
libm.so.6 with versions {'GLIBC_2.2.5'}, librt.so.1 with versions
{'GLIBC_2.2.5'}, libgcc_s.so.1 with versions {'GCC_3.3', 'GCC_4.2.0',
'GCC_3.0'}, libpthread.so.0 with versions {'GLIBC_2.2.5',
'GLIBC_2.3.2', 'GLIBC_2.3.3'}, libc.so.6 with versions {'GLIBC_2.7',
'GLIBC_2.3.2', 'GLIBC_2.4', 'GLIBC_2.14', 'GLIBC_2.10', 'GLIBC_2.3.4',
'GLIBC_2.9', 'GLIBC_2.2.5', 'GLIBC_2.18'}

This constrains the platform tag to "manylinux_2_24_x86_64". In order
to achieve a more compatible tag, you would need to recompile a new
wheel from source on a system with earlier versions of these
libraries, such as a recent manylinux image.

According to the spec, the wheel is still not manylinux2010-compatible, but since your last change it at least qualifies as "manylinux_2_24_x86_64".
You could simply rename it.

EDIT: This doesn't raise any problem for me personally. I just thought I'd let you know.
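
The reasoning in the auditwheel report above can be sketched in a few lines of Python: the highest `GLIBC_x.y` symbol version a wheel references determines the narrowest policy it can claim. This is only a sketch under stated assumptions. The policy list below contains just four glibc ceilings (2.5, 2.12, 2.17, 2.24), which is an assumption for this example; auditwheel's real policy table is larger, and it also checks other symbol families (GCC, GLIBCXX, CXXABI) and the library whitelist. The function names are hypothetical.

```python
from __future__ import annotations

import re

# Assumption: glibc ceilings of the policies considered here, ascending.
POLICY_GLIBC_CEILINGS = [(2, 5), (2, 12), (2, 17), (2, 24)]


def max_glibc_version(report: str) -> tuple[int, ...]:
    """Return the highest GLIBC_x.y[.z] version mentioned in the text."""
    # "GLIBCXX_..." does not match because "GLIBC" must be followed by "_".
    found = re.findall(r"GLIBC_(\d+(?:\.\d+)*)", report)
    if not found:
        raise ValueError("no GLIBC symbol versions found")
    return max(tuple(int(part) for part in v.split(".")) for v in found)


def narrowest_policy(report: str, arch: str = "x86_64") -> str | None:
    """Return the first policy whose glibc ceiling covers every symbol."""
    needed = max_glibc_version(report)[:2]  # compare major.minor only
    for major, minor in POLICY_GLIBC_CEILINGS:
        if needed <= (major, minor):
            return f"manylinux_{major}_{minor}_{arch}"
    return None  # newer than every policy in the list
```

Fed the libc.so.6 version set from the report above (highest entry `GLIBC_2.18`), `narrowest_policy` returns `manylinux_2_24_x86_64`, because 2.18 exceeds the 2.17 ceiling of the next-lower policy in the list.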

Contributor

wchargin commented May 6, 2021

Hmm. I'd picked manylinux2010 because that's what TensorFlow uses, and
running auditwheel on TensorFlow gives similar output:

$ auditwheel show ./tensorflow-2.4.1-cp36-cp36m-manylinux2010_x86_64.whl 

tensorflow-2.4.1-cp36-cp36m-manylinux2010_x86_64.whl is consistent
with the following platform tag: "manylinux_2_12_x86_64".

The wheel references external versioned symbols in these
system-provided shared libraries: librt.so.1 with versions
{'GLIBC_2.2.5'}, libgcc_s.so.1 with versions {'GCC_3.0', 'GCC_3.3'},
libm.so.6 with versions {'GLIBC_2.2.5'}, libpthread.so.0 with versions
{'GLIBC_2.12', 'GLIBC_2.2.5', 'GLIBC_2.3.2', 'GLIBC_2.3.3'},
libdl.so.2 with versions {'GLIBC_2.2.5'}, libstdc++.so.6 with versions
{'GLIBCXX_3.4.11', 'CXXABI_1.3', 'GLIBCXX_3.4', 'GLIBCXX_3.4.10',
'CXXABI_1.3.3', 'CXXABI_1.3.2', 'GLIBCXX_3.4.9'}, libc.so.6 with
versions {'GLIBC_2.3.4', 'GLIBC_2.3.2', 'GLIBC_2.3.3', 'GLIBC_2.11',
'GLIBC_2.4', 'GLIBC_2.10', 'GLIBC_2.7', 'GLIBC_2.3', 'GLIBC_2.6',
'GLIBC_2.9', 'GLIBC_2.2.5'}, _pywrap_tensorflow_internal.so with
versions {'tensorflow'}

This constrains the platform tag to "manylinux_2_12_x86_64". In order
to achieve a more compatible tag, you would need to recompile a new
wheel from source on a system with earlier versions of these
libraries, such as a recent manylinux image.

Are you running into symbol version issues with either tensorflow or
tensorboard-data-server in practice? If not, I'm inclined to keep
following what tensorflow does, and if there's an easy way to build
against a manylinux2010-compliant image from our CI, I'd be happy
to entertain it.

edit: I guess this is actually fine for TensorFlow, since

manylinux2010_x86_64 is now an alias for manylinux_2_12_x86_64

per PEP 600. I'm still inclined to defer action until there's a
problem, though I acknowledge that this is not strictly to spec.
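
For reference, the alias relationship can be written down as a small Python sketch. The three legacy-to-PEP 600 mappings are the ones defined in PEP 600; the helper names `to_pep600` and `glibc_ceiling` are hypothetical and exist only for this illustration.

```python
from __future__ import annotations

import re

# Legacy manylinux tags and their PEP 600 equivalents.
LEGACY_ALIASES = {
    "manylinux1": "manylinux_2_5",
    "manylinux2010": "manylinux_2_12",
    "manylinux2014": "manylinux_2_17",
}


def to_pep600(platform_tag: str) -> str:
    """Rewrite a legacy tag such as manylinux2010_x86_64 in PEP 600 form."""
    for legacy, modern in LEGACY_ALIASES.items():
        prefix = legacy + "_"
        if platform_tag.startswith(prefix):
            return modern + "_" + platform_tag[len(prefix):]
    return platform_tag  # already PEP 600 style, or not a manylinux tag


def glibc_ceiling(platform_tag: str) -> tuple[int, int]:
    """Return the (major, minor) glibc version a manylinux tag promises."""
    match = re.fullmatch(r"manylinux_(\d+)_(\d+)_(.+)", to_pep600(platform_tag))
    if match is None:
        raise ValueError(f"not a manylinux tag: {platform_tag!r}")
    return int(match.group(1)), int(match.group(2))
```

Under this mapping, a wheel tagged `manylinux2010_x86_64` promises to work with glibc 2.12, so a binary that needs `GLIBC_2.18` symbols exceeds what the tag advertises.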

Thanks for your patience; distributing Python binary distributions is
new to me.

Author

DavHau commented May 6, 2021

No, I'm not running into any issues. Only people with an old glibc version might.
Thanks for your time as well.

Contributor

wchargin commented May 8, 2021

Serendipitously, an article "Building Rust binaries in CI that work with
older GLIBC" was hot on /r/rust today:

So, if anyone ends up wanting to look into actually building to
manylinux2010 rather than either (a) doing nothing or (b) just changing
the platform tag, this may be a good place to start.

Author

DavHau commented May 8, 2021

Interesting. Many people seem to have these issues with glibc versions, and still nobody mentioned nix, which is designed to solve exactly these kinds of issues. It allows you to create environments with arbitrary package versions. You can even have tools in your environment that depend on different glibc versions at the same time without conflicts.


aphedges commented Feb 8, 2022

This issue has caused a problem for me. When I run TensorBoard, I get the following error: lib/python3.9/site-packages/tensorboard_data_server/bin/server: /lib64/libc.so.6: version 'GLIBC_2.18' not found (required by lib/python3.9/site-packages/tensorboard_data_server/bin/server). (I removed irrelevant path parts and fixed a quote so that the Markdown formatting works.) Although I did not encounter any other problems while using TensorBoard, the error message is concerning.

I am running on CentOS 7, and the most recent version of glibc on my system is GLIBC_2.17 [1]. This is consistent with PEP 599, which states that the manylinux2014 platform tag is based on CentOS 7 and supports up to GLIBC_2.17. According to PEP 571, manylinux2010 only supports up to GLIBC_2.12, which, as you noted above, is what TensorFlow is compiled with.

Given that you said you target manylinux2010 because TensorFlow does, I searched the TensorFlow repository. There is a complex build process spread across many files, but their build process seems to specifically download an older version of glibc just to create wheels. For an example, look at tensorflow/tools/ci_build/Dockerfile.rbe.cuda11.1-cudnn8-ubuntu18.04-manylinux2010-multipython and tensorflow/tools/ci_build/devtoolset/build_devtoolset.sh.

I don't know if modifying the build process is extremely important (I can't seem to find any other glibc-related bugs in the issue tracker), but hopefully the scripts in the main TensorFlow repository can help fix the workflow if you deem it worth doing.

Footnotes

  1. I found it by running strings /lib64/libc.so.6 | grep '^GLIBC_' | sort --sort=version | uniq, which is based on https://gist.github.com/michaelchughes/85287f1c6f6440c060c3d86b4e7d764b#check-the-old-location-of-libcso6.
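
A rough Python equivalent of the `strings | grep | sort | uniq` pipeline in the footnote is sketched below, in case the shell tools are not at hand. It scans the raw bytes of a libc file for `GLIBC_x.y` version strings, so unlike the pipeline it skips non-numeric entries such as `GLIBC_PRIVATE` and does not apply the minimum-length filtering that `strings` does. The function names are hypothetical.

```python
from __future__ import annotations

import re


def _version_key(name: str) -> tuple[int, ...]:
    """Turn 'GLIBC_2.17' into (2, 17) so versions sort numerically."""
    return tuple(int(part) for part in name[len("GLIBC_"):].split("."))


def glibc_symbol_versions(libc_bytes: bytes) -> list[str]:
    """Return the distinct GLIBC_x.y[.z] strings found, oldest first."""
    found = {
        match.decode("ascii")
        for match in re.findall(rb"GLIBC_\d+(?:\.\d+)*", libc_bytes)
    }
    return sorted(found, key=_version_key)


def provides(libc_bytes: bytes, required: str) -> bool:
    """Check whether the libc defines the given version, e.g. '2.18'."""
    return ("GLIBC_" + required) in glibc_symbol_versions(libc_bytes)


# Usage on a real system (path taken from the error message above):
#   with open("/lib64/libc.so.6", "rb") as f:
#       data = f.read()
#   print(glibc_symbol_versions(data)[-1])
#   print(provides(data, "2.18"))
```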
