Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make JVM version configurable per-module #3716

Merged
merged 37 commits into from
Nov 20, 2024

Conversation

albertpchen
Copy link
Contributor

@albertpchen albertpchen commented Oct 10, 2024

Fixes #3480.

Changes:

  • adds coursier-jvm as a dependency to the util module, this library is for fetching JVMs using coursier and the coursier jvm-index

  • add a def javaHome: Task[Option[PathRef]] task to ZincWorkerModule. This defaults to None, which is the existing behavior of using mill's java home.

  • Users who want to use a custom JVM need to define a custom ZincWorkerModule and override def jvmId, optionally def jvmIndexVersion

  • updates the mockito and commons-io examples to use the new configuration options. Now these examples should run even when mill itself is not running on java 11.

  • This also required adding -encoding utf-8 to the projects' javac options.

  • update the run and test tasks to use the zincWorker's java home if a different one specified

Notes:

  • JavaModule#compile forks a new JVM for each compilation to support custom JVM versions, and ScalaModule#compile does not support custom JVM versions at all. This would require a deeper refactoring of ZincWorkerModule to fix

@albertpchen
Copy link
Contributor Author

I think I'm running into the same issue as this sbt/sbt-assembly#496. coursier-jvm pulls in shapless which has the problematic class. not sure what the correct fix is but I can think of a few options

  1. add export LC_ALL=C.UTF-8 to the workflow as mentioned in the issue
  2. since we might not want shapeless as a dependency anyway, reimplement the parts of coursier-jvm that we need. I don't think this should be too hard, since coursier already has all the fetching and caching logic

@lihaoyi
Copy link
Member

lihaoyi commented Oct 11, 2024

At a first skim this looks great. I didnt know coursier could fetch JVMs or that it was so easy to integrate

Is it possible to configure a module to point it at a JVM not listed in coursier's index? I imagine there would be some use cases for that

The cross module stuff is lazily initialized then cached, so it should be OK to perform heavy downloads and initialization, but is it possible to move the download into a task? That would help benefit from parallelization and log handling and error reporting and all that

@alexarchambault should probably review this as well, since he knows more about coursier than i do. One question is whether coursier JVM downloads are safe to run in parallel, or if they need to be locked/retries in mill and/or made safe upstream

@albertpchen
Copy link
Contributor Author

Is it possible to configure a module to point it at a JVM not listed in coursier's index? I imagine there would be some use cases for that

yes this is possible. I added def javaHome: Task[Option[PathRef]] task to the base ZincWorkerModule trait that can be overridden to set the JVM to an arbitrary path. we could also expose a configuration option to change the url we fetch the index from or make the in-memory JvmIndex mapping a configuration option. I'm not sure how many knobs should be exposed though, so I just added the one javaHome option.

The cross module stuff is lazily initialized then cached, so it should be OK to perform heavy downloads and initialization, but is it possible to move the download into a task? That would help benefit from parallelization and log handling and error reporting and all that

How do you make the cross value for a cross module the output of a task? I'm not sure how, since it needs to be invoked outside of a Task { ... } block

@lihaoyi
Copy link
Member

lihaoyi commented Oct 11, 2024

Ah I see, you mean the index itself, and not the JVMs that the index references.

I think what we should do here is avoid using a Cross module to list out the various ZincWorkerModule.ForJvms, and instead just make it something that the user has to instantiate in their build. Something like:

object myZincWorker = new ZincWorkerModule.ForJvm("graalvm-community:23.0.0") 
object core extends JavaModule {
  override def zincWorker = ModuleRef(myZincWorker)
  object test extends JavaTests with TestModule.Junit4
}

That is a tiny bit more boilerplate in the build file, but avoids the need for Mill to know about the contents of the coursier index entirely, and avoids having a huge cross module inside ZincWorkerModule that is mostly unused.

We still want to provide a way to download the index and list out its entries for debugging purposes, but that can be a normal task or command since it won't change the shape of the moduletree/taskgraph

@lihaoyi
Copy link
Member

lihaoyi commented Oct 11, 2024

@alexarchambault one question about https://github.com/coursier/jvm-index: do we offer any guarantees on that? e.g. does Coursier need to fetch it from Github every time? I'm concerned about having an operational dependency here, e.g. I don't want people's Mill builds to start failing if Github has an outage (that's also why we moved the Mill assembly jar from Github to Maven Central in 0.11.x)

To avoid a dependency on Github, maybe we can publish the index to maven central somewhere, since that's already a piece of infrastructure we are bound to? Or we can bundle snapshots of it with Mill when we make releases

@lihaoyi
Copy link
Member

lihaoyi commented Oct 11, 2024

Come to think of it, if the index is bundled with Mill, then having it used to initialize the Cross module becomes not an issue at all

@alexarchambault
Copy link
Collaborator

one question about https://github.com/coursier/jvm-index: do we offer any guarantees on that? e.g. does Coursier need to fetch it from Github every time? I'm concerned about having an operational dependency here, e.g. I don't want people's Mill builds to start failing if Github has an outage (that's also why we moved the Mill assembly jar from Github to Maven Central in 0.11.x)

It's fetched by coursier just like a snapshot artifact. So it lives in the coursier cache (cs get https://github.com/coursier/jvm-index/raw/master/index.json fetches it and prints its path), and coursier checks for updates if the last check is longer than the coursier TTL (24h by default). If it's not in cache, it needs to be downloaded from GitHub, yes.

@alexarchambault
Copy link
Collaborator

It used to be pushed to Maven Central too, some time ago. This can always be re-instated (coursier/jvm-index#277).

@alexarchambault
Copy link
Collaborator

Come to think of it, if the index is bundled with Mill, then having it used to initialize the Cross module becomes not an issue at all

It's up to you, but beware that it's getting more and more heavyweight (~1.8 MB uncompressed, 110 kB gzipped)

@alexarchambault
Copy link
Collaborator

@albertpchen It looks like you inlined quite some code from coursier-jvm. Is it because of the shapeless issue you mentioned? It seems it's pulled by coursier-jvm by mistake (or not to break bin compat, I don't recall). You can safely exclude it manually when you pull it in Mill.

@lihaoyi
Copy link
Member

lihaoyi commented Oct 11, 2024

@alexarchambault I think 110kb is fine to bundle, the other issue is that it would fall stale relatively wuickly as new versions are added

I glanced through the indices, and it seems like a lot of the download paths are computed based on the artifact version. Could the logic to compute these be bundled into Mill, rather than bundling the generated JSON? That would have the effect that new versions of known distirbutions would automatically be supported, and we would only need to make a new release if new distributions are added or the URL pattern for a distribution changes (both of which should be much rarer than existing distirbutions adding new versions).

We could still bundle a list of known good versions, for debugging and error reporting purposes, but it would mean we could support unknown good versions as well

@alexarchambault
Copy link
Collaborator

alexarchambault commented Oct 11, 2024

I glanced through the indices, and it seems like a lot of the download paths are computed based on the artifact version

Somehow, except there are often subtleties. In https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.7+7/OpenJDK17U-jdk_x64_mac_hotspot_17.0.7_7.tar.gz for example, 17.0.7+7 is the "build" version, and it ends up as both 17.0.7+7 and 17.0.7_7 in the URL.

@lihaoyi
Copy link
Member

lihaoyi commented Oct 11, 2024

I glanced through the indices, and it seems like a lot of the download paths are computed based on the artifact version

Somehow, except there are often subtleties. In https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.7+7/OpenJDK17U-jdk_x64_mac_hotspot_17.0.7_7.tar.gz for example, 17.0.7+7 is the "build" version, and it ends up as both 17.0.7+7 and 17.0.7_7 in the URL.

Yes, but that is accounted for by the Scala code in coursier/jvm-index right? So if we bundle the few kb of classfiles we get all these special cases included, as well as support for any future versions (unless they add additional special cases)

@alexarchambault
Copy link
Collaborator

alexarchambault commented Oct 11, 2024

Yes, but that is accounted for by the Scala code in coursier/jvm-index right?

What do you mean? Right now, the index basically look like this:

$ cat "$(cs get https://github.com/coursier/jvm-index/raw/master/index.json)" | jq '.darwin.amd64["jdk@temurin"]'
…
  "1.17": "tgz+https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17%2B35/OpenJDK17-jdk_x64_mac_hotspot_17_35.tar.gz",
  "1.17.0.1": "tgz+https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.1%2B12/OpenJDK17U-jdk_x64_mac_hotspot_17.0.1_12.tar.gz",
  "1.17.0.10": "tgz+https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.10%2B7/OpenJDK17U-jdk_x64_mac_hotspot_17.0.10_7.tar.gz",
  "1.17.0.11": "tgz+https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.11%2B9/OpenJDK17U-jdk_x64_mac_hotspot_17.0.11_9.tar.gz",
  "1.17.0.12": "tgz+https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.12%2B7/OpenJDK17U-jdk_x64_mac_hotspot_17.0.12_7.tar.gz",
…

There's some logic in the coursier-jvm module, so that the superfluous 1. is ignored, and versions such as just 17 actually match the highest 17.x version… if there's no exact match (there's one here, but 17.0 will actually match the highest 17.0.x version)

@alexarchambault
Copy link
Collaborator

But we can probably "compress" that a bit, with a pattern and "build" version listings (in the snippet above, 17+35, 17.0.1+12, etc., rather than 17, 17.0.1, etc.) and allow users to pass newer build versions if they'd like. But these build versions don't look handy…

@lihaoyi
Copy link
Member

lihaoyi commented Oct 11, 2024

@alexarchambault i mean the scala code in https://github.com/coursier/jvm-index/tree/master/src; isnt that what generates the json?

@alexarchambault
Copy link
Collaborator

i mean the scala code in https://github.com/coursier/jvm-index/tree/master/src; isnt that what generates the json?

It does, yeah

@lihaoyi
Copy link
Member

lihaoyi commented Oct 11, 2024

Could that coxe be bundled with Mill to generate the download URL from a jdk version that gets dynamically passed in? I see some parts of it are passing github tokens around, but its unclear to me how much of the logic needs to talk to github vs how much is just mangling strings locally

@albertpchen
Copy link
Contributor Author

@albertpchen It looks like you inlined quite some code from coursier-jvm. Is it because of the shapeless issue you mentioned? It seems it's pulled by coursier-jvm by mistake (or not to break bin compat, I don't recall). You can safely exclude it manually when you pull it in Mill.

yeah thanks for the tip. I've updated the PR

def ivyDeps = Agg(
build.Deps.coursier,
build.Deps.coursierJvm.exclude(
"com.github.alexarchambault" -> "argonaut-shapeless_6.3_2.13",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment here explaining why it is excluded, for future reference

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that bumping coursier to 2.1.15 or higher should make that exclude unnecessary (trying that in #3943)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems you did it already in this PR, so the exclude can be removed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exclude can be removed here


// The first eight bytes are magic numbers followed by two bytes for major version and two bytes for minor version
// We are overriding to java 23 which corresponds to class file version 67
os.read.bytes(coreClassFile.get, 4, 4).toSeq == Seq[Byte](0, 0, 0, 67)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also add a test to exercise .test and .run and verify that System.getProperty("java.version") returns the expected value. We should also exercise this for two different java versions, to verify that it's not just picking up the environmental default

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should overriding the java version also affect .launcher and .assembly?

  • launcher is explicitly not meant to be portable: it's meant to be something you run on the same machine where it was built, so any JVM you installed would still be present

  • assembly is meant to be portable, e.g. run on a different machine from which it was built, so we probably can't rely on the configured java home for that

@lihaoyi
Copy link
Member

lihaoyi commented Nov 18, 2024

Took a pass at cleaning up the PR, I think it's in pretty good state now. @alexarchambault and @lefou can take another review pass.

One existing limitation is that setting the JVM version does not apply to Scala module compilation, as AFAICT Zinc does not support setting the javaHome for compiling Scala sbt/zinc#1498. We may need to manually fork and manage our own JVM subprocesses to make that work, which I think we can leave to a follow up

Copy link
Member

@lefou lefou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from an quick scan. One concern about bin compat; see below.

@@ -1126,14 +1130,18 @@ trait JavaModule
)

@internal
def bspJvmBuildTarget: JvmBuildTarget =
def bspJvmBuildTarget: Task[JvmBuildTarget] = Task.Anon {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think changing this signature might produce trouble, since plugins extending JavaModule in traits could break.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave it in place and add a new method

build.mill Outdated
@@ -118,6 +118,7 @@ object Deps {

val coursier = ivy"io.get-coursier::coursier:2.1.16"
val coursierInterface = ivy"io.get-coursier:interface:1.0.22"
val coursierJvm = ivy"io.get-coursier::coursier-jvm:2.1.15"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is out-of-sync with the coursier version just above. Maybe the version could be put in a coursierVersion variable.

def ivyDeps = Agg(
build.Deps.coursier,
build.Deps.coursierJvm.exclude(
"com.github.alexarchambault" -> "argonaut-shapeless_6.3_2.13",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exclude can be removed here

def jvmIndex0(
ctx: Option[mill.api.Ctx.Log] = None,
coursierCacheCustomizer: Option[FileCache[Task] => FileCache[Task]] = None,
jvmIndexVersion: String = "release.latest"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be latest.release

Suggested change
jvmIndexVersion: String = "release.latest"
jvmIndexVersion: String = "latest.release"

cache = coursierCache0, // the coursier.cache.Cache instance to use
repositories = Resolve().repositories, // repositories to use
indexChannel = JvmChannel.module(
JvmChannel.centralModule(JvmChannel.defaultOs(), JvmChannel.defaultArchitecture()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default values don't need to be explicitly passed here

Suggested change
JvmChannel.centralModule(JvmChannel.defaultOs(), JvmChannel.defaultArchitecture()),
JvmChannel.centralModule(),

Comment on lines 189 to 191
), // use new indices published to Maven Central
os = None, // use defaults
arch = None // use defaults
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These values can be removed (coursier/coursier#3160 undeprecates the JvmIndex.load overload in the next coursier release)

Suggested change
), // use new indices published to Maven Central
os = None, // use defaults
arch = None // use defaults
) // use new indices published to Maven Central

mainArgs,
workingDir,
streamOut,
true,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this should be check

Suggested change
true,
check,

Comment on lines 56 to 63
if (process.exitCode == 0) {
val a = process.out.lines()
a
} else {
throw new Exception(
"Discover main classes subprocess failed (exit code " + process.exitCode + ")"
)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this requires passing check = false to callSubprocess just above (process.exitCode will be 0 in all cases else)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go with check = true and remove the special handling instead. This should "never" fail, and if it does better to have a verbose stack trace than an anemic custom error

trait ZincWorkerModule extends mill.Module with OfflineSupportModule with CoursierModule {
def jvmId: mill.define.Target[String] = Task[String] { "" }

def jvmIndexVersion: mill.define.Target[String] = "0.0.4-70-51469f"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this version should be managed from build.sc, so that we can update it from there

@lihaoyi lihaoyi merged commit a458d47 into com-lihaoyi:main Nov 20, 2024
27 checks passed
@lihaoyi
Copy link
Member

lihaoyi commented Nov 20, 2024

Thanks for your contribution @albertpchen! I'll pay out half the bounty for this PR since we split the work, email me your international bank transfer details to [email protected] and I will transfer the funds

)

@internal
def bspJvmBuildTargetTask: Task[JvmBuildTarget] = Task.Anon {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new impl will miss any customizations. It should instead reuse the value of deprecated method.

@lefou lefou added this to the 0.12.3 milestone Nov 21, 2024
@alexarchambault
Copy link
Collaborator

FYI, I just expanded the coursier/jvm-index README

@albertpchen
Copy link
Contributor Author

@lihaoyi sorry about the slow responses, and thanks for finishing up this PR for me! I think coursier-jvm is doing a lot of the heavy lifting here, so if you'd like to give the bounty to @alexarchambault that's fine with me.

@lihaoyi
Copy link
Member

lihaoyi commented Dec 14, 2024

@albertpchen no worries! If you don't want to take the bounty, I'll pass it to the Scala Center to support their good work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ability to select Java versions on a per-module basis (1000USD bounty)
4 participants