
Compiling an HLO module without bazel #5814

Open
maxstupp opened this issue Feb 23, 2021 · 14 comments
Labels
contributions welcome The JAX team has not prioritized work on this. Community contributions are welcome. enhancement New feature or request

Comments

@maxstupp

maxstupp commented Feb 23, 2021

Hello,

Is there currently a way to use the converted HLO modules from jax_to_hlo.py and run them outside the tensorflow repository?
(using steps in the example of #5337 at the moment)

I'm using the https://github.com/FloopCZ/tensorflow_cc approach to use the TensorFlow C++ API outside the source tree and build my projects with CMake.

But some functionality, like the PJRT client or the hlo_module_loader, is not available. Is loading these modules from within the tensorflow repository the only way?

Thanks!
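[For reference, dumping the lowered IR text that jax_to_hlo.py serializes can be sketched with newer JAX APIs like this; the function `fn` and the input shapes are illustrative, and this assumes a recent `jax` install:]

```python
# Sketch: print the lowered IR text for a JAX function, roughly the
# payload that jax_to_hlo.py writes out. `fn` and the shapes below
# are illustrative; assumes a recent `jax` release.
import jax
import jax.numpy as jnp

def fn(x, y):
    return jnp.dot(x, y) + 1.0

lowered = jax.jit(fn).lower(jnp.ones((2, 2)), jnp.ones((2, 2)))
print(lowered.as_text()[:200])  # module-level IR text
```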

@zhangqiaorjc
Collaborator

@scottlegrand

scottlegrand commented Feb 23, 2021

I'm guessing he wants to compile without bazel, and this example is compiled with bazel. Also, is XLA tightly coupled to TensorFlow, or is there a way to compile and run without bazel AND TensorFlow installed? That's what I'm trying to figure out as well. I chose JAX for my HPC project in the hope of avoiding unnecessary dependencies and code bloat when inlining its networks into an existing codebase, so it can use neural networks as energy models for molecular dynamics. If that's not possible, let me know.

After asking around, my guess is no on both counts at the moment: XLA is tightly coupled to TensorFlow, and bazel is, well, bazel.

@skye
Member

skye commented Feb 24, 2021

This currently isn't possible because, as noted above, the required XLA and PJRT dependencies can only be built as part of the TF tree using bazel. I agree this is cumbersome.

I think the best solution would be for us to bundle the required dependencies into a new shared library + headers that we would periodically build (with bazel) and release; you would then build against the released library instead of building from source. This is non-trivial to set up and support indefinitely, so I can't promise anything right now, but maybe we can put something together. Especially if more people chime in on this issue expressing interest! (hint hint, anyone who comes across this issue)
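[To make the idea concrete, consuming such a prebuilt package from CMake might look like the sketch below. The `xla_runtime` library name, the `XLA_DIST` path, and the header layout are all hypothetical; no such release exists at this time:]

```cmake
# Hypothetical consumption of a prebuilt XLA/PJRT distribution.
# XLA_DIST, the xla_runtime library name, and the include layout
# are illustrative only -- no such release exists yet.
set(XLA_DIST "/opt/xla" CACHE PATH "Root of the unpacked XLA release")

find_library(XLA_RUNTIME_LIB xla_runtime HINTS "${XLA_DIST}/lib")

add_executable(run_hlo main.cc)
target_include_directories(run_hlo PRIVATE "${XLA_DIST}/include")
target_link_libraries(run_hlo PRIVATE "${XLA_RUNTIME_LIB}")
```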

@hawkinsp @zhangqiaorjc do you have any further thoughts on this?

@skye skye added the enhancement New feature or request label Feb 24, 2021
@zhangqiaorjc
Collaborator

Short of moving XLA out of TF, I don't see how we can avoid the Bazel and TF tree dependencies...

@maxstupp
Author

I am trying something very similar to scottlegrand. My project uses CMake, and I was hoping there would be a way to build a shared library with the necessary (or even all) headers and use it as an external package with CMake.

What I am looking for is essentially to use my JAX networks/functions in other C++ projects: train them in Python, then call them from C++. The dependency on the whole TF tree and bazel makes this quite difficult.
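[For readers landing here later: newer JAX releases can serialize a traced function to a portable artifact via jax.export. A sketch, with the function and shapes illustrative; executing the artifact outside Python still requires an XLA-capable runtime on the other side:]

```python
# Sketch: serialize a JAX function to bytes that a separate runtime
# could load later. Uses jax.export (available in newer JAX releases);
# the function and shapes below are illustrative.
import jax
import jax.numpy as jnp
from jax import export

def fn(x):
    return jnp.tanh(x) * 2.0

exported = export.export(jax.jit(fn))(jax.ShapeDtypeStruct((3,), jnp.float32))
blob = exported.serialize()          # bytes suitable for writing to disk
with open("/tmp/fn.bin", "wb") as f:
    f.write(blob)
```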

I also tried the bazel example at https://github.com/google/jax/tree/master/examples/jax_cpp, but the build process is failing for me (this is not so important, because it depends on bazel again). I attached my error message in a text file if someone wants to take a look.
Error dump.txt

@scottlegrand

I'm seeing different issues with bazel 3.7.2 myself; I cannot build this example either. My application (www.ambermd.org) has ~30,000 users across a wide variety of machines, clusters, and Linux variants. I just can't force bazel/TensorFlow into its build and execution and expect it to work out well. I will keep an eye on this project to see if XLA is eventually made independent of TensorFlow/bazel, though, because I already love this framework.

Errors attached
bazel_errors.txt

@zhangqiaorjc
Collaborator

@scottlegrand you could try setting the bazel flag --check_visibility=false?

@zhangqiaorjc
Collaborator

@Maxstu-zz is your setup able to build TensorFlow itself? The errors seem to be in compiling LLVM support, which is required by MLIR in the TF tree (we don't use it directly)...

@maxstupp
Author

I am able to build the normal tensorflow pip package from source, as well as the monolithic version with `bazel build --config=opt --config=monolithic tensorflow:libtensorflow_cc.so tensorflow:install_headers`.

@scottlegrand

Sadly, that just leads to another cascade of errors, related (I think) to the Abseil library (attached)...

bazel_errors.txt

@hawkinsp
Collaborator

@scottlegrand I'm wondering if you need to run, say, TensorFlow's ./configure script first in the source tree. Those errors aren't really related to Abseil; e.g., the first one comes from standard C++ code inside LLVM. That implies to me that something about your compiler toolchain isn't configured correctly, or bazel hasn't detected it correctly. Hence my guess that you need to run the ./configure script.

XLA isn't that closely coupled to TensorFlow; it's in the same repository mostly for convenience. We could separate it, but it's not clear to me what that would achieve beyond moving code around.

@scottlegrand

@hawkinsp I can't build tensorflow even after running ./configure, and neither wiping the bazel cache nor re-running ./configure inside the tensorflow directory fixes this.

But these sorts of (to me) arbitrary errors with bazel and TensorFlow are why I want this reduced to a separate library, for the sake of building and deploying AMBER in the wild. It is too complex at the moment, in my opinion. I know we could work toward fixing this on my machine, but as likely one of only three people supporting this app, we just don't have the resources to let it be this complicated, nor the funding to hire someone for that role.

bazel_errors2.txt

@maxstupp
Author

It would be awesome if there was a way to build the whole XLA part once with bazel into a library and then use it with any other compiler as an external package, similar to https://github.com/FloopCZ/tensorflow_cc. This would let us write and train our networks in Python with JAX, extract their HLO modules, and then call them from C++ in other projects (like molecular dynamics simulations) by just linking against a library that includes the hlo_module_loader, pjrt_client, etc.

@zhangqiaorjc
Collaborator

@scottlegrand you could ask the TensorFlow team for help with the bazel issues.

@Maxstu-zz that's a useful enhancement, we welcome community contributions!

@zhangqiaorjc zhangqiaorjc added the contributions welcome The JAX team has not prioritized work on this. Community contributions are welcome. label Jun 17, 2021
@zhangqiaorjc zhangqiaorjc removed their assignment Jun 17, 2021

5 participants