Build Tensorflow and libtensorflow from source#64716
Conversation
|
Darwin still uses binary builds for now. I don't have a Darwin machine so I can't test if source build works and fix it if it doesn't - unless someone wants to take on that role! |
|
(For some reason I cannot request a review from @yorickvP on Github) |
df1b94b to
568648c
Compare
568648c to
45298f1
Compare
|
Updated binary |
It seems that Google now build on newer macOS versions, I had the same problem with Tensorflow 1.13.1 when building on my own MacBook with Mojave. A newer cctools is needed, see #49371. I think this is the relevant PR: #61172 |
|
This is great! I'm hoping to get a fully static build at some point, but having the same protobuf on tensorflow as the rest of the code should fix our linker errors! :) You can't request a review from me because I don't have commit rights, I suppose. |
yorickvP
left a comment
There was a problem hiding this comment.
whoever built tensorflow upstream must have patched out that saved_model_portable_proto line. it's been in there for several version snow.
|
As I mentioned in #63616, there are still issues with the python2 build. Seems to be a general problem with parallelism on python2, only manifests itself here because its such a big project and basically hopeless to build without a lot of parallelism (used 12 cores). I already added one patch to python2 (#64067), but I still saw the issue (sometimes, its transient) after that. I suspect the second part of the fix in the python ticket is also needed. That was never backported by debian, so I had to do it myself (being not very experienced with C). I've pushed those changes to the branch of the other PR. Its not very polished, which is why I didn't push it before. I don't have time to work on this or give extensive review right now. But I can answer questions if you have any. |
|
@timokau I saw that one but IIUC this is a general Python problem, not Tensorflow-specific one, so I didn't really look into it. It shouldn't happen during the "big" Tensorflow build because I only build "source" package now and build the wheel separately with our |
45298f1 to
7b5a242
Compare
|
Neat, so you only need bazel for the C part and don't need to unnecessarily rebuild everything for libtensorflow, py2 tensorflow and py3 tensorflow? |
|
Sadly no, you still need to rebuild everything because their C libraries link to different Pythons and I presume use different APIs. Still, in the end you get something that resembles a usual source Python package with |
|
Oh, so you mean you just skip the final I don't think the |
|
The primary reason I switched back to the previous behavior was to ensure |
|
Will this work with bazel 0.28.0? #64633 |
|
My PR had some issues with bazel 0.27, not sure if @abbradar worked around those. But the bazel team has promised no further backwards incompatibilities for the next 3 releases (then they'll break backwards compatibility once more and release 1.0). |
Nice news! I didn't test 0.28 though yet, wanted to do it today but sadly I won't have time for one more rebuild.
Huh, didn't notice anything. Maybe you remember something specific? I apply patch for 0.27 from upstream and also add a flag to enable backwards compatibility with some behavior. |
|
Now that other parts are here I'll wait till staging goes into master and merge this (so that we don't break Python 2 + Tensorflow). |
|
Why not just retarget this PR to staging and merge now? Then we'll get it automatically on the next staging-next merge. |
|
Well, I usually avoid crowding staging with not-mass-rebuildy PRs to ease bisect in case it's needed (searching for problems among mass rebuild commits can be painful).
…On July 17, 2019 2:23:15 PM GMT+03:00, Timo Kaufmann ***@***.***> wrote:
Why not just retarget this PR to staging and merge now? Then we'll get
it automatically on the next staging-next merge.
--
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#64716 (comment)
--
Nikolay.
|
|
Fair enough. |
|
I tried to build tensorflow with SSE4.1 support, but unfortunately the build failed. Apparently that has to be specified at configure time now. This is my attempt to get it working: Feel free to cherry-pick and/or rebase. |
81651a2 to
388a1ed
Compare
388a1ed to
5f49e3c
Compare
This also changes it to a from-source build.
This merges work done by yorickvP and timokau in NixOS#63208 and NixOS#63616 respectively. Now the derivation builds both libtensorflow and the Python package and puts them into different outputs. Quite a bit of improvements were done on the top, including: * Use official tag revision as source, not a branch; * Use all system libraries possible (before only one was actually used); * Move various environment variables to the derivation itself from hooks; * Use source Python build instead of wheel build to ensure fixup hooks do their important jobs on libraries; * And more that I forgot!
They sometimes take separate flags.
Now need to be passed in the configure phase. abbradar: Don't change CUDA build hash.
5f49e3c to
cd0e461
Compare
|
Fixed in 7109546. Thanks for the heads up! |
Motivation for this change
Restore Tensorflow source build while leaving binary builds . Based on @yorickvP and @timokau work.with some parts taken from both.
Things done
sandboxinnix.confon non-NixOS)nix-shell -p nix-review --run "nix-review wip"./result/bin/)nix path-info -Sbefore and after)cc @uri-canva for changes regaring Bazel.
Tested both CUDA and non-CUDA builds of the Python library, both source and binary builds. Didn't test libtensorflow, I assume that if Python library works and
lddoutput is okay libtensorflow is okay too.