tf.load_op_library unable to load manylinux2010 repaired custom ops #31807
Comments
I remember running into an issue when added kernel ops had colliding names (this used to be fine in 1.14, but not with the new tf-nightly). Could there be multiple versions of the zero_out kernel ops?
Thanks! Looking at the binaries' symbols, though, I don't see any duplication that isn't already present in the .so before `auditwheel repair`. Is there a way to increase the verbosity of the `load_library` call so we can see whether there is a conflict or something else? The only major difference I see is that the repaired binary requires the newly copied
My previous issue was with LMDBDataset. I originally implemented LMDBDataset (C++) in TF's core repo (tf.contrib) some time ago. Later, as part of the modularization effort, LMDBDataset was moved to tensorflow/io, so there are two copies when both tensorflow and tensorflow-io are loaded. That used to be fine. However, very recently I noticed that LMDBDataset in tensorflow/io no longer works with tf-nightly (I can't remember which version, but it must be very recent), and I had to rename it to LMDBDatasetV2 in tensorflow/io to get around it. I don't know whether this could be related as well.
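A toy model of the collision described above may help. It assumes, as a simplification (this is not TensorFlow's actual op registry code), that registering an op name a second time is rejected, and shows why renaming the tensorflow-io op to `LMDBDatasetV2` sidesteps the clash. The `OpRegistry` class is hypothetical.

```python
class OpRegistry:
    """Toy registry (hypothetical): maps op names to the library that registered them."""

    def __init__(self):
        self._ops = {}

    def register(self, op_name, library):
        # Simplifying assumption: a second registration under a taken name fails.
        if op_name in self._ops:
            raise ValueError(
                f"Op {op_name!r} already registered by {self._ops[op_name]!r}"
            )
        self._ops[op_name] = library

    def owner(self, op_name):
        return self._ops.get(op_name)


registry = OpRegistry()
registry.register("LMDBDataset", "tensorflow (core)")

# Registering the same name again from tensorflow-io collides...
try:
    registry.register("LMDBDataset", "tensorflow-io")
    collided = False
except ValueError:
    collided = True

# ...while a distinct name such as LMDBDatasetV2 does not.
registry.register("LMDBDatasetV2", "tensorflow-io")
```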
Ah, I wrote a patch for auditwheel to get around the issue: tensorflow/io@02dcf4a
@yongtang Amazing, thanks so much! Could you explain what that file edit does and why the patch works? (I'm assuming it somehow tricks auditwheel into thinking the shared library is a common one present on all systems.) We should probably describe this and include the patch in the custom-op repo.
EDIT: Found out which policy.json was being edited. Thanks again for the patch, @yongtang!
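A minimal sketch of what such a policy edit might look like, assuming auditwheel's bundled policy.json is a list of policy objects with `name` and `lib_whitelist` keys (that layout has changed between auditwheel versions, so check the installed copy). Libraries on a policy's whitelist are treated as provided by the system, so `auditwheel repair` does not copy them into the wheel. This is an illustration, not the contents of tensorflow/io@02dcf4a; the helper `whitelist_libs` and the trimmed policy data are hypothetical.

```python
import copy
import json


def whitelist_libs(policies, policy_name, libs):
    """Return a copy of `policies` with `libs` added to one policy's whitelist.

    Assumes each policy is a dict with "name" and "lib_whitelist" keys.
    """
    patched = copy.deepcopy(policies)  # leave the caller's data untouched
    for policy in patched:
        if policy["name"] == policy_name:
            for lib in libs:
                if lib not in policy["lib_whitelist"]:  # avoid duplicates
                    policy["lib_whitelist"].append(lib)
            return patched
    raise KeyError(f"no policy named {policy_name!r}")


# Hypothetical, heavily trimmed policy data for illustration only.
policies = json.loads("""
[
  {"name": "linux", "lib_whitelist": []},
  {"name": "manylinux2010", "lib_whitelist": ["libc.so.6", "libm.so.6"]}
]
""")

patched = whitelist_libs(policies, "manylinux2010", ["libtensorflow_framework.so.1"])
```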
In auditwheel 5.2.0, which was recently released, one can use:

```
auditwheel repair --exclude libtensorflow_framework.so.2 --exclude libtensorflow_framework.so.1 --exclude libtensorflow_framework.so some_wheel.whl
```

If one uses cibuildwheel, the equivalent `pyproject.toml` setting is:

```toml
[tool.cibuildwheel.linux]
repair-wheel-command = "auditwheel repair --exclude libtensorflow_framework.so.2 --exclude libtensorflow_framework.so.1 --exclude libtensorflow_framework.so -w {dest_dir} {wheel}"
```
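After repairing with these excludes, one way to confirm that no copy of the framework library was vendored is to inspect the wheel, which is a zip archive. A minimal sketch using only the standard library; the helper names are hypothetical, and matching on the `libtensorflow_framework` prefix is an assumption based on auditwheel renaming the libraries it vendors.

```python
import posixpath
import zipfile


def vendored_tf_libs(member_names):
    """Filter archive member names down to copies of libtensorflow_framework."""
    # auditwheel renames vendored libraries (e.g. adding a hash), so match on
    # the basename prefix rather than an exact filename.
    return [
        name
        for name in member_names
        if posixpath.basename(name).startswith("libtensorflow_framework")
    ]


def bundled_tf_framework(wheel_path):
    """Return any libtensorflow_framework files packed inside a wheel."""
    with zipfile.ZipFile(wheel_path) as whl:
        return vendored_tf_libs(whl.namelist())
```

For a wheel repaired with the `--exclude` flags, `bundled_tf_framework(...)` should return an empty list.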
System information
Describe the current behavior
Currently, when I build a custom op in the
`tensorflow/tensorflow:custom-op-ubuntu16`
Docker image using the documented steps, I get an installable pip package, `tensorflow_zero_out-0.0.1-cp27-cp27mu-linux_x86_64.whl`.
This works fine. However, if I repair that wheel to be manylinux2010 compliant,
`tf.load_op_library`
fails to find the custom op. Notice that
`zero_out`
and `zero_out_eager_fallback`
are not found in the library loaded from the manylinux2010 wheel.
Code to reproduce the issue
Other info / logs
Here are the auditwheel repair logs:
repair.txt
Here are the readelf inspections of the so files:
readelf.txt
readelf-manylinux2010.txt
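Comparing the dynamic sections in these dumps is one way to see what the repair changed. A minimal sketch that extracts `NEEDED`, `RPATH`, and `RUNPATH` entries from `readelf -d` text and diffs the `NEEDED` lists; the helper names are hypothetical.

```python
import re

# Matches readelf -d lines such as:
#  0x0000000000000001 (NEEDED)  Shared library: [libfoo.so.1]
_DYN_RE = re.compile(r"\((NEEDED|RPATH|RUNPATH)\)\s+[^\[]*\[([^\]]*)\]")


def dynamic_deps(readelf_text):
    """Collect NEEDED libraries and RPATH/RUNPATH entries from `readelf -d` output."""
    deps = {"NEEDED": [], "RPATH": [], "RUNPATH": []}
    for line in readelf_text.splitlines():
        match = _DYN_RE.search(line)
        if match:
            deps[match.group(1)].append(match.group(2))
    return deps


def diff_needed(before_text, after_text):
    """Return (removed, added) NEEDED entries between two readelf dumps."""
    before = set(dynamic_deps(before_text)["NEEDED"])
    after = set(dynamic_deps(after_text)["NEEDED"])
    return sorted(before - after), sorted(after - before)
```

Passing the dumps taken before and after repair to `diff_needed` should show the original `libtensorflow_framework` dependency replaced by the renamed, vendored copy, if auditwheel vendored it. A vendored name like `libtensorflow_framework-1a2b3c4d.so.1` would be an illustrative placeholder, not a value taken from the attached logs.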
Here are the so files:
so-files.zip
cc @perfinion @gunan @yifeif
--------------------------EDIT--------------------
Here are the extracted wheel directories, which work with the Python
`tf.load_op_library`
commands from above. The manylinux2010 repair makes the custom op depend on a newly copied `libtensorflow_framework.so` that is bundled in the new wheel: custom-op-dirs.zip
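`tf.load_op_library` returns a Python module whose attributes are the generated op wrappers, so when loading from the extracted directories a quick check is which of the expected names are absent. A minimal sketch; `missing_ops` is a hypothetical helper, the `.so` path in the comment is a placeholder, and the stand-in modules only keep the example runnable without TensorFlow.

```python
import types


def missing_ops(op_module, expected=("zero_out", "zero_out_eager_fallback")):
    """Return the expected op wrapper names that the loaded module lacks."""
    return [name for name in expected if not hasattr(op_module, name)]


# With TensorFlow installed, one would pass the real module, e.g.:
#   import tensorflow as tf
#   ops = tf.load_op_library("path/to/_zero_out_ops.so")  # placeholder path
#   print(missing_ops(ops))
# Stand-in modules keep this sketch runnable without TensorFlow:
fake_ok = types.SimpleNamespace(zero_out=object(), zero_out_eager_fallback=object())
fake_broken = types.SimpleNamespace()
```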