
Java maven snapshots and releases #2675

Closed
yuzawa-san opened this issue Dec 16, 2019 · 20 comments
Labels
feature request request for unsupported feature or enhancement

Comments

@yuzawa-san
Contributor

Is your feature request related to a problem? Please describe.
No

System information
Linux x86_64, Mac x86_64

Describe the solution you'd like
The Java API #2215 was merged recently, and it seems likely it will be in a release in the near future. I would like to test this in my development environment and with some production traffic. Historically, we have had trouble with JNI integrations and would like to ensure there are no leaks or other bugs. The easiest way would be to use a snapshot build published to Maven (ideally Maven Central). Is this work underway? If not, could you please prioritize it? Additionally, when the time comes, a published Maven artifact would be most desired.

Describe alternatives you've considered
Building locally and placing snapshots on our corporate internal Maven repository. This would require us to pull periodically; ideally your CI job would build the snapshots.

Additional context
n/a

@faxu
Contributor

faxu commented Dec 17, 2019

CC @Craigacp

@Craigacp
Contributor

Craigacp commented Dec 17, 2019

I'm looking into the maven central requirements now, but there are a bunch of things to work out around the artifact signing and group id before it can be uploaded there.

It should be straightforward to build the next release for Java on RHEL7/OL7/CentOS7, Ubuntu (provided you don't use gcc 9), and macOS. It also builds fine on Windows using the latest release of Visual Studio's build tools.

It's not going to be possible to build it for each individual execution provider and host it on maven central, so I expect the binary will probably be CPU only.

@faxu faxu added the feature request request for unsupported feature or enhancement label Dec 19, 2019
@faxu
Contributor

faxu commented Dec 19, 2019

Since this is the first release of the API, it's still in preview and will take some time to mature before being published to Maven. Are you able to build from source as suggested?

In the future, when it's available in Maven, it will still be limited to just a few build flavors.

@yuzawa-san
Contributor Author

After playing around for a bit (disclaimer: I work primarily in Java and am not a regular user of CMake), I was able to compile it, albeit for a single architecture (Linux x86_64). My Mac build environment is a mess for reasons unrelated to anything here, so I have not built it there yet. I did briefly test the Java functionality. We are going to stress test and do additional checks before we consider using your library.

In regards to the longer-term vision of how this should be distributed, here are my thoughts. It should be released to Maven, preferably Maven Central. This makes it very easy for users to incorporate it into their projects. Snapshots are a nice-to-have, but technically not a requirement. I see that the Azure pipeline seems to have a release and snapshot mechanism for the Python library; ideally, Java would have a similar mechanism. I guess the project maintainers would probably handle this.

There are some nuances in the actual artifacts that are published to maven. Specifically there is the nature of how different architectures (linux, windows, mac, etc) are handled. Typically the maven classifier is used to differentiate artifacts based on OS/arch. Additionally there is the other issue of how much to bundle in the artifact.

I was looking at "onnxruntime4j-1.1.0-with-binaries.jar": it contains both libonnxruntime.so and libonnxruntime4j_jni.so.

I think we should approach this more like bytedeco's JavaCPP presets handle it: http://repo1.maven.org/maven2/org/bytedeco/opencv/4.1.2-1.5.2/ Feel free to download a few different jars and run unzip -l whatever.jar to see what each contains.

There is a base artifact with the java classes. This would be like the plain onnxruntime4j-1.1.0.jar. Then there are a series of artifacts with classifiers for the different os/arch. Each of those would contain the libonnxruntime4j_jni.so or libonnxruntime4j_jni.dylib or onnxruntime4j_jni.dll respectively. Alternatively you could forego the classifiers and put them all in one jar, but this is probably against best practice (polluting the java class path with unused data) and may be difficult to have your azure pipeline do since it seems to be split up by arch.

There is another question about whether whatever is published should contain a compiled onnxruntime library. For most users this would probably be desired as an easy quick start. However, power users may want to provide their own onnxruntime library. I believe this could be accomplished by breaking out another artifact: one with just the onnxruntime library that depends on a core artifact containing the Java classes and pointing to classified artifacts containing the onnxruntime4j_jni library.

Ideally, these are the following artifacts:

  • group: ai.onnx name: runtime-cpu version: 1.1.0 = This artifact would be used when the user wants a stock CPU onnxruntime library for whatever os they desire. This would be like the library available from the repo's releases page in the download area. This artifact would depend on the runtime-core artifact below. The POM refers to the following classified artifacts:
    • group: ai.onnx name: runtime-cpu version: 1.1.0 classifier: macosx-x86_64
    • group: ai.onnx name: runtime-cpu version: 1.1.0 classifier: linux-x86_64
    • group: ai.onnx name: runtime-cpu version: 1.1.0 classifier: windows-x86_64
  • group: ai.onnx name: runtime-gpu version: 1.1.0 = This artifact would be used when the user wants a stock GPU onnxruntime library for whatever os they desire. This would be like the library available from the repo's releases page in the download area. This artifact would depend on the runtime-core artifact below. The POM refers to the following classified artifacts:
    • group: ai.onnx name: runtime-gpu version: 1.1.0 classifier: macosx-x86_64
    • group: ai.onnx name: runtime-gpu version: 1.1.0 classifier: linux-x86_64
    • group: ai.onnx name: runtime-gpu version: 1.1.0 classifier: windows-x86_64
  • group: ai.onnx name: runtime-core version: 1.1.0 = This artifact would contain the java classes and a loader much like the one that currently exists in OnnxRuntime.java. The POM refers to the following classified artifacts. Each of these would contain the relevant onnxruntime4j_jni library for that os/arch.
    • group: ai.onnx name: runtime-core version: 1.1.0 classifier: macosx-x86_64
    • group: ai.onnx name: runtime-core version: 1.1.0 classifier: linux-x86_64
    • group: ai.onnx name: runtime-core version: 1.1.0 classifier: windows-x86_64

Use cases:

  • User wants to get started on a cpu with no customization: use runtime-cpu
  • User wants to get started on a gpu with no customization: use runtime-gpu
  • User wants to provide their own onnxruntime library: use runtime-core
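
For illustration, a consumer of the proposed runtime-cpu artifact could declare it along these lines (the coordinates and classifier follow the hypothetical layout sketched above; note that Maven does not auto-detect os/arch, so the classifier would be chosen explicitly or via an activated profile):

```xml
<!-- Hypothetical coordinates from the layout proposed above. -->
<dependency>
  <groupId>ai.onnx</groupId>
  <artifactId>runtime-cpu</artifactId>
  <version>1.1.0</version>
  <!-- Maven cannot infer the platform; the classifier is selected
       explicitly here, or via a profile keyed on os.name/os.arch. -->
  <classifier>linux-x86_64</classifier>
</dependency>
```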

To accommodate that last case, it may be necessary to refactor OnnxRuntime.java to first try loading onnxruntime from the resources if it is present; onnxruntime4j_jni would always be loaded.
The developer could have loaded onnxruntime beforehand using System.load or System.loadLibrary, in which case the link dependency would already be satisfied. If it was not already loaded, then I believe the java.library.path is used. If no onnxruntime library is found, then loading onnxruntime4j_jni would throw an UnsatisfiedLinkError.
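
That loading order could be sketched roughly as below (class, method, and resource names here are hypothetical, not the actual OnnxRuntime.java API):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public final class NativeLoader {

    /**
     * Tries to extract a bundled shared library from the classpath and load it.
     * Returns false when the jar does not bundle it (e.g. a core-only artifact),
     * so the caller can fall back to java.library.path.
     */
    static boolean loadFromResources(String resourceName) {
        try (InputStream in = NativeLoader.class.getResourceAsStream(resourceName)) {
            if (in == null) {
                return false; // not bundled in this jar
            }
            Path tmp = Files.createTempFile("onnxruntime-native", ".tmp");
            tmp.toFile().deleteOnExit();
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            System.load(tmp.toAbsolutePath().toString());
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void load() {
        // onnxruntime itself is optional in the jar: the developer may have
        // loaded it already via System.load/loadLibrary, or it may sit on
        // java.library.path. A missing copy only surfaces later, as an
        // UnsatisfiedLinkError when the JNI library is linked against it.
        loadFromResources("/libonnxruntime.so");
        // The JNI glue library is always loaded.
        System.loadLibrary("onnxruntime4j_jni");
    }
}
```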

Let me know if you have any other questions or ideas.

@yuzawa-san
Contributor Author

We could take inspiration from the TensorFlow Maven configuration: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/java/maven

They have an artifact with the classes (libtensorflow) and two more artifacts for the JNI code (libtensorflow_jni and libtensorflow_jni_gpu). It appears both of those contain the main library (libtensorflow_framework.so) and the JNI library (libtensorflow_jni.so) in the same artifact, which I guess limits the ability to "bring your own libtensorflow_framework". Additionally, they do not use classifiers and put all os/arch combinations (linux, windows, darwin) into the same artifacts.

@yuzawa-san
Contributor Author

I may attempt to package the artifacts in the manner posted above and post a gist, or maybe open a PR. Is there a preference among the maintainers for Maven versus Gradle? I guess the question is what is easier to work with in the Azure pipeline environments. I am more accustomed to Gradle. Gradle has a nice built-in wrapper; I think Maven has one as well, but it is not built in.

@Craigacp
Contributor

Craigacp commented Jan 14, 2020

The TF Java config is changing towards a JavaCPP-style one, as the project is moving to use JavaCPP for its native bindings.

I think it would be preferable to have a pom file which gets integrated into the jar by calling jar again, rather than using Maven itself. Otherwise it's tricky to propagate the appropriate compile-time flags from the CMake system through to the JNI compilation. I originally had the Java parts as a Gradle project, but it was difficult to integrate into the build system for the rest of the project.

@saudet
Copy link

saudet commented Feb 3, 2020

It's very easy to get this working with GitHub Actions and JavaCPP, indeed. Check tensorflow/java#25 for an example of what we can do with about 100 lines of YAML these days. If the only thing missing from JavaCPP is a plugin for Gradle, and there is demand for it, this is something I can work on.

BTW, there isn't any issue with calling the CMake build from Maven, as with these snapshots here: https://github.com/bytedeco/javacpp-presets/tree/master/onnxruntime
I wouldn't expect any problems using Gradle either.
@Craigacp Could you elaborate on the problems that you faced?

/cc @frankfliu

@Craigacp
Contributor

Craigacp commented Feb 3, 2020

@saudet that's the wrong way around. It needs to call Maven from CMake with the appropriate flags. @yuzawa-san has already refactored the build to use Gradle for the Java parts; it's awaiting merge.

@saudet
Copy link

saudet commented Feb 4, 2020

@Craigacp That's not an answer to my question. I asked you that question before, and I also asked why you didn't like JavaCPP, and you didn't answer; maybe that's one of the reasons, so please elaborate if you could. It would be helpful.

@Craigacp
Contributor

Craigacp commented Feb 4, 2020

The aim is to drive the whole build through the build.py script and CMake combination that the rest of the library uses. The JNI binding needs to be compiled with the same flags as the onnxruntime native library, so it's easier to leave that in CMake. I originally had CMake compile the Java parts and generate the native headers too, but CMake isn't very good at building Java projects, so it's tricky to make the build more flexible. The new PR moves the Java build over to Gradle while still using CMake to drive the whole build and to compile the JNI parts. Calling CMake from Maven or Gradle doesn't help, as Maven/Gradle isn't the start of the build process. The aim is to build the onnxruntime native library along with whichever of the C#, Python, and Java bindings are requested at the same time.

@saudet

saudet commented Feb 5, 2020

The JNI binding needs to be compiled using the same flags as the onnx runtime native library

I see where you're coming from, but no, it doesn't. You're wrong. In fact, JNI bindings sometimes require a different set of flags than the rest of the code. CMake, Bazel, Autoconf, etc. don't come with any special support for JNI for that. We need to tack these things on manually anyway, so why not do everything in one single place, with a tool like JavaCPP and its own little "build system" for JNI, which we can also use as a plugin for Maven, Gradle, etc. to build alongside Java source files? That makes much more sense to me than adding proper support for JNI and Java to all native build tools! I also discussed the value of this with @karllessard previously, see karllessard/tensorflow-java-sandbox#2 (comment).

Anyway, thinking about it some more, the C API is fine for what we need for our own applications, for KonduitAI/konduit-serving#212. The Java bindings that you're offering in this repository are geared towards idiomatic Java, but we're not going to have our users use ONNX Runtime directly anyway, and JavaCPP gets us 90% of the way in terms of usability of the resulting Java API. However, in production we do care about performance, which isn't one of your goals here, so it doesn't make sense to rely on bindings that don't allow us to meet our targets. JavaCPP does prioritize performance, and it's typically even faster than manually written JNI; for example, see tensorflow/java#18 (comment). Moreover, as a company, we can't honestly support a system that has as many known issues as listed on pull #2215 (comment), and that you still refuse to do anything about.

Given your employer, you should know about the importance of all this for enterprises. In the meantime, though, the C API is fine for us, but for the sake of the community, I do hope one day you understand the issues at play here.

@Craigacp
Contributor

Craigacp commented Feb 5, 2020

The JNI bindings are compiled with the same flags, as that allows them to use conditional compilation to turn the various execution providers on or off. When a provider is turned off, the binding throws a catchable exception saying that the provider isn't available and that you need to recompile the native library. There is currently no way to probe the available providers from the C or C++ APIs. Without this functionality the Java binding could not present a sensible error message when the user requested a provider which was not available, and as Java doesn't have conditional compilation, the Java code has to present entry points for all the providers. If you know of a cross-platform way to probe the functions exported by a native library from Java, then I'd be happy to replace this code. I note that the JavaCPP preset does not have any of this functionality, and so is limited to the CPU provider. It doesn't even appear to use the DNNL provider that it's compiled against, as the function that enables DNNL on a SessionOptions is not present.

The native loading procedure is properly documented in the package-info.java and in a Java-specific README.md in the PR for the new build system. It should have been documented in those places in the current release, but that was an oversight on my part.

If you have performance concerns or other issues that aren't simply a cover for complaints about not using your JavaCPP library, then open an issue or file a PR.
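
The "catchable error on a missing provider" pattern under discussion can be sketched generically as below; this is a hypothetical stand-in, not the actual onnxruntime4j code, and the native method name is invented for illustration:

```java
public final class ProviderProbe {

    // Hypothetical JNI entry point; present only in native builds
    // compiled with the corresponding execution provider enabled.
    private static native void enableCudaNative();

    /** Returns true if the provider's native symbol resolved and ran. */
    public static boolean tryEnableCuda() {
        try {
            enableCudaNative();
            return true;
        } catch (UnsatisfiedLinkError e) {
            // A build without the provider (or no native library at all)
            // ends up here; a real binding would wrap this in a descriptive,
            // catchable exception telling the user to recompile.
            return false;
        }
    }
}
```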

@saudet

saudet commented Feb 17, 2020

The JNI bindings are compiled with the same flags, as that allows them to use conditional compilation to turn the various execution providers on or off. When a provider is turned off, the binding throws a catchable exception saying that the provider isn't available and that you need to recompile the native library. There is currently no way to probe the available providers from the C or C++ APIs. Without this functionality the Java binding could not present a sensible error message when the user requested a provider which was not available, and as Java doesn't have conditional compilation, the Java code has to present entry points for all the providers. If you know of a cross-platform way to probe the functions exported by a native library from Java, then I'd be happy to replace this code. I note that the JavaCPP preset does not have any of this functionality, and so is limited to the CPU provider. It doesn't even appear to use the DNNL provider that it's compiled against, as the function that enables DNNL on a SessionOptions is not present.

That's not an issue with JavaCPP, but thanks for pointing this out! The C/C++ APIs of ONNX Runtime do in fact support that just fine. As @EmergentOrder found out quickly enough when he started to use them, the only line missing was an include = {..., "onnxruntime/core/providers/dnnl/dnnl_provider_factory.h"}, see bytedeco/javacpp-presets#841. We'll be able to include cuda_provider_factory and others the same way when we start having builds with CUDA, etc. enabled.

The native loading procedure is properly documented in the package-info.java and in a Java-specific README.md in the PR for the new build system. It should have been documented in those places in the current release, but that was an oversight on my part.

If you have performance concerns or other issues that aren't simply a cover for complaints about not using your JavaCPP library, then open an issue or file a PR.

I'm perfectly fine with not using JavaCPP, but neither you nor your employer are interested in providing an alternative. For Python, we have tools like conda, pip, cython, pybind11, etc that everyone uses. But for some reason that I still don't understand, people like you at Oracle and OpenJDK feel that, in the case of Java, it's perfectly alright for everyone to reinvent the wheel and build everything from scratch. That isn't going to work! In my opinion, you'll need to do something about it, sooner or later.

@XciD

XciD commented Mar 3, 2020

Hello,

Any update on this?

@saudet

saudet commented Mar 3, 2020

@Craigacp FYI, I've added a CUDA-enabled build to the JavaCPP Presets and, as expected, it works just fine. If we try to call OrtSessionOptionsAppendExecutionProvider_CUDA() on a build that doesn't include support for CUDA, we get, as below, a very standard java.lang.UnsatisfiedLinkError that's easy to understand and easy for end users to deal with. Do you agree?

java.lang.UnsatisfiedLinkError: org.bytedeco.onnxruntime.global.onnxruntime.OrtSessionOptionsAppendExecutionProvider_CUDA(Lorg/bytedeco/onnxruntime/OrtSessionOptions;I)Lorg/bytedeco/onnxruntime/OrtStatus;
    at org.bytedeco.onnxruntime.global.onnxruntime.OrtSessionOptionsAppendExecutionProvider_CUDA (Native Method)
    at CXXApiSample.main (CXXApiSample.java:25)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run (ExecJavaMojo.java:282)
    at java.lang.Thread.run (Thread.java:748)

Moreover, snapshots are readily available for Linux, Mac, and Windows, both with and without CUDA:
https://github.com/bytedeco/javacpp-presets/tree/master/onnxruntime#the-pomxml-build-file
with a release "coming soon"...

@saudet

saudet commented Mar 21, 2020

I've pushed commit bytedeco/javacpp-presets@f1e2aa4, and the Java API created by @Craigacp now gets bundled with the JavaCPP Presets for ONNX Runtime, so we can use it alongside the C and C++ APIs, all from Java. The ScoreMNIST.java sample works just fine as is with the artifacts from Maven Central on Linux, Mac, Windows, with or without DNNL and/or CUDA, all automatically detected at load time. It's also possible to add support for more platforms (Android, etc) and include more providers as optional extensions (nGraph, etc)...

@Craigacp With that I believe I've shown that JavaCPP has everything we need, but if you still feel there is something missing or something that needs to be fixed, please let me know. I will be happy to work on making JavaCPP better!

@saudet

saudet commented Apr 15, 2020

I've released artifacts on Maven Central, which work just as well with Gradle, sbt, etc:
http://search.maven.org/#search%7Cga%7C1%7Corg.bytedeco%20onnxruntime
Let me know if there are any issues with those though. I'll be happy to fix them.

@saudet

saudet commented May 28, 2020

BTW, there are now Gradle plugins for JavaCPP. It's very easy to use them for any native library, including ONNX Runtime, either to build it from source or to use existing precompiled artifacts:
https://github.com/bytedeco/gradle-javacpp
If there are any issues with them though, please let me know.

@Craigacp
Contributor

Craigacp commented Jun 17, 2020

Binaries are now available on Maven Central for CPU and GPU with the v1.3.1 release.
