Modern Java support #861
Comments
I've been looking into this recently, and I thought I'd record some of my findings that others may find interesting:
|
One middle ground would be to use the |
You raise a good point. In my mind, I was lumping together the processes of (1) uncompressing the archive and (2) parsing the classes themselves, but they're really separate things. If the cost of (2) outweighs the cost of (1), then it might not be as vital to optimize (1) (which is what #861 (comment) benchmarks) as I originally thought. |
I suspect the key thing is to control the amount of memory usage, which is more important than the time required. My inclination would be to use |
Upon further thought, the |
I don't think we should take the current SAW interface as given. If we need to force changes to the way JVM code is loaded into SAW to make things work better, I think that should be on the table. |
OK, here's one idea. As SAW is currently designed, we already have to specify the JARs and classpaths that a script needs ahead of time using the Instead of doing this, when SAW starts a script using modern Java, SAW can take all of the specified JARs and classpaths, pass them The upside is that we only need to traverse the classpaths once with this approach. The downside is that this process may take longer than is strictly required if the user specifies more classpaths than what are actually needed to run the classes they care about, but I don't think much can be done about that, save for advising users not to over-approximate their dependencies. |
I like the approach you're suggesting, @RyanGlScott. I think at the very least it's worth trying and benchmarking to see if it has any problematic overhead in common cases. |
Some further investigation reveals that
Although |
I have good news and bad news. tl;dr I have a working prototype, but there are some serious performance issues that we should consider carefully.

The good news is that I've pushed a proof-of-concept implementation of the "use

The bad news is that the startup overhead imposed by this approach is quite severe. I was worried about

Another annoying problem is that we extract the contents of the JIMAGE file to a temporary directory which persists after running SAW. This leaves behind about 90 megabytes of cruft each time SAW is invoked. For a single invocation, this isn't really noticeable, but if you were to invoke SAW many times consecutively, I could foresee this adding up quickly. In fact, I'm too afraid to run the test suite at the moment, since I'm worried that I'll chew through too much disk space...

The temporary directory size problem is likely solvable if we make sure to clean up the temporary directory that SAW extracts JIMAGE files to after SAW stops execution. (It's not yet clear to me what the best way to do this is, since SAW has various directions the control can flow in, but I'm sure there's a way we could accomplish this with something |
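For illustration only, here is a minimal sketch of one way the cleanup could be structured. It is not SAW's actual code: `withScratchDir` is a hypothetical helper, and it assumes the caller supplies a directory name that is not already in use. It relies on `bracket` from `base` so that the directory is removed whether the action returns normally or throws.

```haskell
import Control.Exception (bracket)
import System.Directory
  ( createDirectory
  , getTemporaryDirectory
  , removeDirectoryRecursive
  )
import System.FilePath ((</>))

-- | Hypothetical helper: run an action with a fresh scratch directory
-- under the system temporary directory, and delete the directory (and
-- everything extracted into it) afterwards, even if the action throws.
-- Assumption: @name@ is unique; 'createDirectory' fails if it exists.
withScratchDir :: String -> (FilePath -> IO a) -> IO a
withScratchDir name action = do
  tmpRoot <- getTemporaryDirectory
  let dir = tmpRoot </> name
  bracket (createDirectory dir >> pure dir) removeDirectoryRecursive action
```

The `temporary` package's `withSystemTempDirectory` offers the same shape and also takes care of choosing a unique name, so it may be the more practical option if an extra dependency is acceptable.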
If the bad news from #861 (comment) has you frightened, perhaps we should consider some alternatives. Some ideas that come to mind:
|
Based on the above, I'm somewhat inclined to skip
The exact syntax of the string passed to One detail of the above is that it requires knowing where the
Does that sound reasonable to you? Am I missing anything? |
Ah, I really should have listed your suggestion—use A few refinements to your suggestions:
I think that if we want to do this The Right Way™, we should emulate Java's module path. As the name suggests, the module path is where Java looks for modules (in the form of JIMAGE files), in the exact same way that the classpath is where Java looks for classes (in the form of JAR and If SAW were to gain a
If we only ever extract a single class at a time, then I don't think we need to share a single temporary directory across all
As mentioned above, we can avoid ever adding the results of
One correction here: if you don't use This is basically the same approach that I tried in #861 (comment), but without the intermediate |
Also, thanks for teaching me about
I strongly suspect that this omits many classes that this would transitively depend on (e.g., |
Yeah, I think I agree with you about the Right Way of adding a notion of module path in addition to the existing class path. One thing I'd really like, however, would be for it to be possible for it to "just work" in many cases without user input. The need to specify the location of You make a really good point about not needing to keep And you may be right that the output of One final point is that, although I think it's worth having support for multiple JAR or JIMAGE dependency files, the vast, vast majority of the cases are likely to involve only the single My inclination for search order would be to first search the module path, then JAR files, then classpath directories, though I admit I haven't given it that much thought. |
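To make the proposed search order concrete, here is a small sketch under stated assumptions. The `ClassSource` type, `searchRank`, and `findClass` are all hypothetical names, and the `probe` argument stands in for whatever actually reads a class out of a given source. The sketch only shows the ordering: module path entries first, then JAR files, then classpath directories.

```haskell
import Data.List (sortOn)
import Data.Maybe (listToMaybe, mapMaybe)

-- | Hypothetical description of where a class might be loaded from.
data ClassSource
  = ModulePathEntry FilePath
  | JarFile FilePath
  | ClassDir FilePath
  deriving (Eq, Show)

-- | Lower ranks are searched first: module path, then JARs, then
-- classpath directories.
searchRank :: ClassSource -> Int
searchRank (ModulePathEntry _) = 0
searchRank (JarFile _) = 1
searchRank (ClassDir _) = 2

-- | Return the first hit, trying sources in rank order. 'sortOn' is
-- stable, so sources of the same kind keep the order the user gave.
findClass
  :: (ClassSource -> String -> Maybe a)  -- ^ hypothetical per-source lookup
  -> [ClassSource]
  -> String                              -- ^ class name, e.g. "java/lang/Object"
  -> Maybe a
findClass probe sources className =
  listToMaybe (mapMaybe (\src -> probe src className) (sortOn searchRank sources))
```

Because the result is built lazily, later sources are only probed if earlier ones miss.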
I agree. It may be worth adding
Iterating
On the other hand,
I hunted around through
Even if we never do end up using a JAR or JIMAGE file beyond the ones shipped with Java itself, I'll sleep a little easier at night knowing that we could do so in the future if the need arose :) |
I spoke with @atomb about this recently. Here are some takeaways:
How, indeed, should we amortize the cost? The most expensive part of using Some questions:
|
Most of the action happens in:
* The new `SAWScript.JavaTools` module, which exports functionality for locating a Java executable and finding its system properties.
* `SAWScript.Options`, which now exposes a new `--java-bin-dirs` flag. (Alternatively, one can use the `PATH`.) The `processEnv` function now performs additional post-processing based on whether `--java-bin-dirs`/`PATH` are set.

Fixes #1022, and paves the way to a better user experience for #861.
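As a rough sketch of what "locating a Java executable and finding its system properties" can involve (this is not the code in `SAWScript.JavaTools`; `lookupProperty` and `javaProperty` are hypothetical names): the `java` launcher prints its system properties when run with `-XshowSettings:properties -version`, as indented `key = value` lines. The sketch searches both output streams rather than assuming which one the launcher writes to, and it only consults the `PATH`.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)
import Data.Maybe (listToMaybe, mapMaybe)
import System.Directory (findExecutable)
import System.Process (readProcessWithExitCode)

-- | Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- | Find the first @key = value@ line for the given key in the text
-- printed by @java -XshowSettings:properties -version@.
lookupProperty :: String -> String -> Maybe String
lookupProperty key output = listToMaybe (mapMaybe match (lines output))
  where
    match ln = case break (== '=') ln of
      (k, '=' : v) | trim k == key -> Just (trim v)
      _ -> Nothing

-- | Hypothetical helper: look up @java@ on the PATH and ask it for one
-- system property, such as "java.home" or "java.version".
javaProperty :: String -> IO (Maybe String)
javaProperty key = do
  mjava <- findExecutable "java"
  case mjava of
    Nothing -> pure Nothing
    Just java -> do
      (_exitCode, out, err) <-
        readProcessWithExitCode java ["-XshowSettings:properties", "-version"] ""
      pure (lookupProperty key (out ++ err))
```

For example, `javaProperty "java.home"` would give the directory in which to look for the JDK's own files, which is the piece of information the rest of this thread needs.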
One correction to my musings above: the module path actually isn't where one specifies the locations of JIMAGE files. Rather, it's the location where Java searches for JARs and This is arguably a good thing from SAW's point of view, as it means that we can avoid having a separate command-line flag for JIMAGE files at all. Instead, you point SAW at the directory where |
This allows SAW to deal with JDK 9 or later, which packages its standard library not in a JAR file, but in a JIMAGE file. Extracting `.class` files from JIMAGE files proves to be surprisingly tricky, and I've carefully documented the intricacies of doing so in `Note [Loading classes from JIMAGE files]` in `SAWScript.JavaCodebase`.

This fixes #861. This depends on #1030 to work.

Remaining tasks:
* Ideally, the code in `SAWScript.JavaCodebase` would be upstreamed to `crucible-jvm`, where the all-important `Codebase` data type lives. Unfortunately, some parts of SAW (e.g., `java_verify`) still rely on the `jvm-verifier` library, which defines a separate `Codebase` type. SAW is in the process of phasing out the use of `jvm-verifier` in favor of `crucible-jvm` (see #993), but until that happens, I needed to introduce some ugly hacks in order to make everything typecheck. In particular, the (hopefully temporary) `SAWScript.JavaCodebase` module defines a shim version of `Codebase` that puts the experimental new things that I added in an `ExperimentalCodebase` constructor, while preserving the ability to use the `jvm-verifier` version of `Codebase` in the `LegacyCodebase` constructor. If JDK 8 or earlier is used, then `LegacyCodebase` is chosen, and if JDK 9 or later is used, then `ExperimentalCodebase` is chosen.
* Unfortunately, `java_verify` doesn't work with `ExperimentalCodebase`. Nor would we necessarily want to make this happen, as that would require upstreaming changes to `jvm-verifier`, which we are in the process of phasing out. As a result, this is blocked on #993.
* The CI should be updated to test more versions of the JDK than just 8.

Other things:
* I removed the dependency on the `xdg-basedir`, as it was unused. This dependency was likely added quite some time ago, and it appears that `saw-script` switched over to using XDG-related functionality from the `directory` library since then. I opted to use `directory` to find the `.cache` directory as well, so I have made that clear in the `.cabal` file.
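The choice between the two constructors can be sketched as follows. This is a simplified illustration, not the real `SAWScript.JavaCodebase`: the constructor payloads here are placeholder `FilePath`s, and `majorVersion` and `selectCodebase` are hypothetical names. The one firm fact it encodes is the `java.version` numbering scheme: JDK 8 and earlier report `1.x` (for example `1.8.0_292`), while JDK 9 and later report the major version first (for example `15.0.1`).

```haskell
import Data.Char (isDigit)
import Text.Read (readMaybe)

-- | Simplified stand-in for the shim type described above. The payloads
-- are placeholders; the real constructors carry richer data.
data Codebase
  = LegacyCodebase FilePath        -- JDK 8 or earlier
  | ExperimentalCodebase FilePath  -- JDK 9 or later
  deriving (Eq, Show)

-- | Split a version string on dots.
splitDots :: String -> [String]
splitDots s = case break (== '.') s of
  (chunk, []) -> [chunk]
  (chunk, _ : rest) -> chunk : splitDots rest

-- | Extract the major version from a @java.version@ string:
-- "1.8.0_292" gives 8, "15.0.1" gives 15.
majorVersion :: String -> Maybe Int
majorVersion v = case splitDots v of
  ("1" : minor : _) -> readMaybe (takeWhile isDigit minor)
  (major : _) -> readMaybe (takeWhile isDigit major)
  [] -> Nothing

-- | Pick the constructor based on the JDK major version.
selectCodebase :: Int -> FilePath -> Codebase
selectCodebase major libPath
  | major <= 8 = LegacyCodebase libPath
  | otherwise = ExperimentalCodebase libPath
```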
I've implemented the caching idea from #861 (comment) in #1046, operating at the granularity of extracting one class at a time from The PR isn't ready to be merged yet, as there are some tasks that it is blocked on (#993 and #1030), but I'd welcome any feedback you could provide. |
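A per-class cache of this kind might look roughly like the sketch below. It is an illustration under assumptions, not the code in #1046: `classCachePath`, `fetchClass`, and `defaultCacheRoot` are hypothetical names, the cache directory name is made up, and the `extract` callback stands in for whatever actually pulls one `.class` file out of the JDK's image.

```haskell
import Control.Monad (unless)
import System.Directory
  ( XdgDirectory (XdgCache)
  , createDirectoryIfMissing
  , doesFileExist
  , getXdgDirectory
  )
import System.FilePath (takeDirectory, (<.>), (</>))

-- | Where a class such as "java/lang/Object" lives inside the cache.
classCachePath :: FilePath -> String -> FilePath
classCachePath cacheRoot className = cacheRoot </> className <.> "class"

-- | Return the cached @.class@ file, running the (hypothetical)
-- extractor only on a cache miss.
fetchClass
  :: (String -> FilePath -> IO ())  -- ^ extract one class to the given path
  -> FilePath                       -- ^ cache root
  -> String                         -- ^ class name
  -> IO FilePath
fetchClass extract cacheRoot className = do
  let target = classCachePath cacheRoot className
  cached <- doesFileExist target
  unless cached $ do
    createDirectoryIfMissing True (takeDirectory target)
    extract className target
  pure target

-- | A cache root under the user's XDG cache directory (typically
-- @~/.cache@). The directory name is an assumption for illustration.
defaultCacheRoot :: IO FilePath
defaultCacheRoot = getXdgDirectory XdgCache "saw-jimage-sketch"
```

A real cache would presumably also need to key entries on the JDK in use, so that classes extracted from one JDK are not served to another; the sketch leaves that out.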
Most of the action happens in:
* The new `SAWScript.JavaTools` module, which exports functionality for locating a Java executable and finding its system properties.
* `SAWScript.Options`, which now exposes a new `--java-bin-dirs` flag. (Alternatively, one can use the `PATH`.) The `processEnv` function now performs additional post-processing based on whether `--java-bin-dirs`/`PATH` are set.

Fixes #1022, and paves the way to a better user experience for #861.

Other things:
* I ended up cargo-culting some `process`-related code from `SAWScript.Builtins` for use in `SAWScript.JavaTools`. To avoid blatant code duplication (and because I end up needing the exact same code later in another patch), I factored out this code into the new `SAWScript.ProcessUtils` module. I considered putting it in the existing `SAWScript.Utils` module, but that would have led to import cycles. Sigh.
This allows SAW to deal with JDK 9 or later, which packages its standard library not in a JAR file, but in a JIMAGE file. Extracting `.class` files from JIMAGE files proves to be surprisingly tricky, and I've carefully documented the intricacies of doing so in `Note [Loading classes from JIMAGE files]` in `SAWScript.JavaCodebase`.

This fixes #861.

Remaining tasks:
* Ideally, the code in `SAWScript.JavaCodebase` would be upstreamed to `crucible-jvm`, where the all-important `Codebase` data type lives. Until that happens, I needed to introduce some ugly hacks in order to make everything typecheck. In particular, the (hopefully temporary) `SAWScript.JavaCodebase` module defines a version of `Codebase` that keeps track of JIMAGE-related paths.

Other things:
* I removed the dependency on the `xdg-basedir`, as it was unused. This dependency was likely added quite some time ago, and it appears that `saw-script` switched over to using XDG-related functionality from the `directory` library since then. I opted to use `directory` to find the `.cache` directory as well, so I have made that clear in the `.cabal` file.
* The `biJavaCodebase :: Codebase` field of `BuiltinContext` is completely unused, which I noticed when making changes to the `Codebase` type. Let's just remove it. This fixes #1003.
This allows `crucible-jvm` to deal with JDK 9 or later, which packages its standard library not in a JAR file, but in a JIMAGE file. Extracting `.class` files from JIMAGE files proves to be surprisingly tricky, and I've carefully documented the intricacies of doing so in `Note [Loading classes from JIMAGE files]` in `Lang.JVM.Codebase`. This is part of a fix for GaloisInc/saw-script#861.
See GaloisInc/crucible#638 for the |
This allows SAW to deal with JDK 9 or later, which packages its standard library not in a JAR file, but in a JIMAGE file. This leverages `crucible-jvm` changes from GaloisInc/crucible#634.

This fixes #861.

Other things:
* I removed the dependency on the `xdg-basedir`, as it was unused. This dependency was likely added quite some time ago, and it appears that `saw-script` switched over to using XDG-related functionality from the `directory` library since then. I opted to use `directory` to find the `.cache` directory as well, so I have made that clear in the `.cabal` file.
* The `biJavaCodebase :: Codebase` field of `BuiltinContext` is completely unused, which I noticed when making changes to the `Codebase` type. Let's just remove it. This fixes #1003.
Once again, I have good news and bad news. The good news is that I've factored out basically all of the necessary functionality for reading from JIMAGE files into The bad news is that despite all this, GaloisInc/crucible#638 is actually quite broken still. I only realized this after trying to run the
In light of this, I'm forced to admit that there's still quite a bit of investigation that needs to happen to diagnose why Here is one proposal: given that this works well enough with the sorts of Java programs that SAW includes in its test suite, we could merge GaloisInc/crucible#638 and #1046 now, but explicitly label support for JDK 9+ as experimental in the SAW/Crucible documentation. (We may even consider printing a warning when a user launches these tools with JDK 9+.) We could then open a separate issue in the Does this sound reasonable? Or do you think we should hold off until more effort has been spent debugging these issues? |
Thanks for the detailed analysis! My inclination would be to go ahead and merge this (with appropriate warnings about JDK 9+). This is at least a strict improvement over what's in there right now. Most of the sort of verification we normally do with SAW will then work with newer JDKs, but stuff using more of the standard library still won't. That's better than not being able to do anything with modern JDKs. |
This adds basic functionality for `crucible-jvm` to deal with JDK 9 or later, which packages its standard library not in a JAR file, but in a JIMAGE file. Extracting `.class` files from JIMAGE files proves to be surprisingly tricky, and I've carefully documented the intricacies of doing so in `Note [Loading classes from JIMAGE files]` in `Lang.JVM.Codebase`. This is part of a fix for GaloisInc/saw-script#861. In general, support for JDK 9 or later is still experimental, as there are still unresolved bugs to diagnose. See #641.
In that case, I've:
|
This bumps the `crucible` submodule to include GaloisInc/crucible#638, which adds basic support for handling JDK 9 or later. JDK 9+ packages its standard library not in a JAR file, but in a JIMAGE file. For more details on how `crucible-jvm` handles JIMAGE files, refer to `Note [Loading classes from JIMAGE files]` in `Lang.JVM.Codebase`.

This fixes #861, although there are still unsolved issues that arise when using modern JDKs with certain classes, such as `String`. As a result, I have decided to label support for JDK 9+ as experimental:
* I have updated the SAW documentation to mention these shortcomings.
* I have opened GaloisInc/crucible#641 to track the remaining issues.

Other things:
* GaloisInc/crucible#636 and GaloisInc/crucible#638 upstreamed the code from `SAWScript.JavaTools` and `SAWScript.ProcessUtils` into `crucible-jvm`, so we can remove these modules in favor of importing `Lang.JVM.JavaTools` and `Lang.JVM.ProcessUtils` from `crucible-jvm`.
* I removed the dependency on the `xdg-basedir`, as it was unused. This dependency was likely added quite some time ago, and it appears that `saw-script` switched over to using XDG-related functionality from the `directory` library since then. I opted to use `directory` to find the `.cache` directory as well, so I have made that clear in the `.cabal` file.
* The `biJavaCodebase :: Codebase` field of `BuiltinContext` is completely unused, which I noticed when making changes to the `Codebase` type. Let's just remove it.
For the sake of posterity, it's worth recording that my earlier claim that the test suite passes with JDK 15 is not quite true. The
|
I discovered, while working on standing up a new computer, that the way the SAW infrastructure interacts with the Java ecosystem is by now quite outdated. Starting with Java 9, the JRE no longer distributes its base libraries in a monolithic `rt.jar`, but instead distributes these classes in a new 'JMOD' file format. This is part of an overall infrastructure philosophy change that introduces a new link phase and utility, `jlink`, which gathers together all the transitive dependencies of an executable into a "linked" executable.

The following is the best explanation of this system I've found: https://stackoverflow.com/questions/44732915/why-did-java-9-introduce-the-jmod-file-format

For our purposes, I think it would actually make our lives simpler. As we do with LLVM (via `clang` and `llvm-link`), we can rely on JDK utilities like `jlink` and `jimage` to collect together all the pieces we need into one place, rather than having to do complicated classpath handling, discovering where the system has installed various bits we need, etc.