Add a `sourceGenerators` task to help out IDE config generation #4621

Baccata · 2025-02-25T10:21:32Z

Context : #4530

Adds a DeferredGeneratedSourcesModule that extends JavaModule, providing a def deferredGeneratedSourceTasks alternative to generatedSources. This allows BSP/GenIdea/Bloop to predict source roots without actually running the code generation logic.
Adds a def out to the Ctx class, allowing to query the location out folder within tasks.
Adds unit tests / integration tests where relevant

Previous description (kept for the record) :

@lihaoyi, I want to open this PR to continue the discussion. At the time of writing this, this adds a generatedSourceRoots parameter for tasks, as you've described, and uses it to compute the source directories for the bloop configuration, as opposed to actually running the allSources task. A similar change would need to be achieved in GenIdea and BSP.

So now, the problem I see is as follows : in order for this to be useful, it needs to be applied to all source generators provided OOTB by the mill codebase, but also third party providers of source-generator (like smithy4s), as the ability to jump to the sources is paramount for the user experience in IDEs.

Applying this change is a fair bit of work in this codebase, and a big ask for the larger ecosystem, and although I think SBT's approach of treating all source generators lazily is preferable to mill's current approach of running all source generators eagerly, I don't think I can weigh the benefits of changing the approach against the cost induced by breaking IDE behaviour for existing plugins.

With that in mind here's what I think would be a good mitigation :

// Rather than traversing the transitive inputs of `allSources`, we define this setting as a way to register lazy 
// source generators. We force the contract to return `Unit` to prevent the assumption that a returned `PathRef` would be
// added to the sources. 
def lazySourceGenerators = Seq[T[Unit]] 
 
 // Runs the lazy source generators, returning path refs computed from their `generatedSourceRoots`. 
 // When the tasks don't have `generatedSourceRoots` set, we assume that the `T.dest` is to be added as source root. 
 final def runLazySourceGenerators : T[Seq[PathRef]] = ???
 
// Does not run the lazy source generators, but rather computes the source roots they'll generate code in. 
// Wrapped in T because the evaluator is needed to get the T.dest of the lazy source generators. I presume that is 
// something we could thread in through the ctx (like `def destOf(task: T[_]) : Path`)  
final def lazyGeneratedSourceRoots: T[Seq[Path]] = ???
 
 // Keeps the current behaviour intact 
def allSources = T {
    sources() ++ generatedSources() ++ runLazySourceGenerators()
}

The Bloop, GenIdea and BSP then get amended to call sources(), generatedSources() and lazyGeneratedSourceRoots(), but avoids runLazySourceGenerators.

This approach would allow to alleviate the pain point raised in the discussion above, whilst retaining the current behaviour of the ecosystem. Mill plugin providers could be encouraged to use runLazySourceGenerators instead of generatedSources, but users would have a recourse to turn eager source generators into lazy ones if they need it to be (provided a withGeneratedSourceRoots is provided to them, that is)

lihaoyi · 2025-02-25T10:38:28Z

@Baccata one new development worth mentioning is that #4524 targeting 0.13.x, we've moved the computation of JavaModule source paths in general to static def sourcesFolders: Seq[os.SubPath] members computer before evaluation. This was done for reasons unrelated to codegen, but perhaps now that we're doing it a similar approach could be taken for generated sources?

Following the naming convention would just define a def JavaModule#generatedSourcesFolders: T[Seq[os.SubPath]] and use that for IDE codegen, which I think is similar to what you are proposing here, just to a slightly different name

Baccata · 2025-02-25T11:15:22Z

@lihaoyi, with the difference that the subPaths need to be tied to the tasks, as they end up being relative to their T.dest (as opposed to the module's path), hence why my initial proposal in the discussion was to have (roughly) :

case class SourceGenerator(task: T[Unit], sourceRoots: Seq[os.SubPath]) 

trait JavaModule {
  def sourceGenerators : Seq[SourceGenerator]
}

But, putting aside the details, I agree that source generators are better captured statically like the sourceFolders (and like SBT does it). The question remains regarding "what do you want to do with the current generatedSources task ?" : do you want to keep it around to avoid changing everything in mill and its ecosystem and introduce a static alternative (def sourceGenerators as above), or do you want to deprecate it completely in favour of the static version ?

Baccata · 2025-02-25T14:48:29Z

scalalib/src/mill/scalalib/JavaModule.scala

+    val dest = mill
+      .eval
+      .EvaluatorPaths
+      .resolveDestPaths(workspace / "out", task)


don't know whether there's a better way to do it, to be honest

Yeah this is fine. A bit verbose but that's the best we got

Baccata · 2025-02-25T14:49:18Z

scalalib/src/mill/scalalib/JavaModule.scala

  /**
   * The folders containing all source files fed into the compiler
   */
-  def allSources: T[Seq[PathRef]] = Task { sources() ++ generatedSources() }
+  def allSources: T[Seq[PathRef]] =
+    Task { sources() ++ generatedSources() ++ deferredGeneratedSources() }


Keeps the current (0.12.x) behaviour of generatedSources to avoid breaking the ecosystem

Baccata · 2025-02-25T14:50:06Z

contrib/bloop/src/mill/contrib/bloop/BloopImpl.scala

@@ -133,14 +137,19 @@ class BloopImpl(evs: () => Seq[Evaluator], wd: os.Path) extends ExternalModule {
      !module.asInstanceOf[outer.Module].skipBloop
    else true

+  private def deferredAllSources(m: JavaModule): T[Seq[PathRef]] = Task {
+    m.sources() ++ m.generatedSources() ++ m.allDeferredSourceRoots()


Keeps the current (0.12.x) behaviour of generatedSources being run to produce IDE config, in order to avoid breaking the ecosystem. In addition, adds roots that will be produced by deferredSourceGenerators when allSources/compile is run.

Baccata · 2025-02-25T15:54:58Z

See later changes for a showcase of what I meant in #4621 (comment) ... the names are horribly bad but will change them later

Baccata · 2025-02-26T14:47:34Z

bsp/worker/src/mill/bsp/worker/MillBuildServer.scala

@@ -234,12 +234,14 @@ private class MillBuildServer(
        case module: MillBuildRootModule =>


Not sure why this warrants a specific case, considering it extends JavaModule anyway

Could be some left over from a time, where MillBuildRootModule was either not a JavaModule or needed different treatment. If both cases are identical now, we can merge them.

Baccata · 2025-02-26T17:07:20Z

@lihaoyi, @lefou, I think this PR is just about ready, but I've got a failing integration test on my fork that I have no idea how to troubleshoot.

Any chance you could provide guidance ?

lihaoyi · 2025-02-27T06:06:52Z

@Baccata can you try reproducing the failure locally? You can run ci/test-mill-bootstrap.sh

Baccata · 2025-02-27T09:35:04Z

Yes, it reproduces locally. Still extremely puzzled as to what's wrong though.

Baccata · 2025-02-27T09:44:39Z

I'm also able to reproduce on the 0.12.x branch, so I presume it's nothing to do with this particular changeset.

lihaoyi · 2025-02-27T09:49:36Z

Oh I just realized this was targeting 0.12.x. Maybe its fixed by #4625 ?

…asks

Baccata · 2025-02-27T12:37:49Z

Okay I think this is ready. There's some android-specific job that keeps failing, but it seems to be failing on other PRs raised against 0.12.x, so I presume it's unrelated.

…asks

Baccata · 2025-02-28T16:14:57Z

Should I close this PR and re-open it anew to remove the noise of the previous conversations ?

lihaoyi · 2025-02-28T17:45:50Z

No leaving the PR as is is fine, I'll take a look when I have time

lihaoyi · 2025-03-01T01:51:50Z

scalalib/src/mill/scalalib/JavaModule.scala

+   * Computes the list of source roots that will be produced by [[sourceGenerators]] without
+   * actually running the generators in question.
+   */
+  def predictedGeneratedSourceRoots: T[Seq[PathRef]] = T {


The terminology is kind of weird here, as are the types. How about

def deferredGeneratedSourcePaths: T[Seq[os.Path]]

I don't like predicted, since it sounds like it could refer to all generated sources, which it doesn't. The earlier term deferred is a bit better since it sounds more like it is a separate thing from the other generated sources.

If it's just returning the paths, we can just return os.Paths rather than PathRefs

Baccata · 2025-03-05T09:22:15Z

What do you think of moving all changes currently done in JavaModule into a dedicated trait DeferredGeneratedSources and extends in from JavaModule? This will keep them closely together. It will also keep JavaModule more lean and free of that stuff. We could even avoid default-extending it and just mix it in in case it's needed.

Well my personal opinion is really that the current generatedSources should be deprecated in favour of the deferred version, for a similar argument that lead me to start the discussion : I think any kind of IDE integration should be as resilient as possible. I believe that if I have errors in a protobuf file that I use to generate Java or Scala code, I should still be able to load my project into the IDE and be able to change code that's not necessarily related to the protobuf pipeline. I therefore believe that the current generatedSources task, although intuitive, was ill-designed from the get-go, and that the use of the deferred methods should be encouraged instead within the mill ecosystem.

Therefore, I would rather the JavaModule had them than for them to be separate in a mix-in that IDE-config code needs to be aware of. Now I'd personally be fine having them in a separate trait that the JavaModule would inherit from, as conceptually they don't have to be tied to Java/Scala and other integrations could benefit from them.

Now that being said, that's just a personal opinion. If yourself and Haoyi make an executive decision to make this a mix-in, I'll abide and will amend the contribution in consequence.

lihaoyi · 2025-03-05T09:44:22Z

For now lets go with a separate trait. I think it's still a bit early to deprecate the existing workflow and tell people to use the new one. We probably want someone using it for a while and happy with it before we deprecate the old way, which would probably put it some time after the 0.13.0 release

Baccata · 2025-03-05T09:46:39Z

@lihaoyi : just to be clear : you want a separate trait that JavaModule inherits from, or a separate trait that would be mixed-in in userland ?

EDIT : and yeah, I agree that it's premature to deprecate at this point

lihaoyi · 2025-03-05T09:56:27Z

I don't have a strong opinion between those two options. @lefou do you have a preference?

lefou · 2025-03-05T10:30:10Z

TBH, I find the new proposed solution (although being a solution to some immediate issue) ugly and complicated. (I also don't think the current generatedSources is broken in any way. If you have generated sources that aren't required to be properly compiled, those are probably not sources but just resources. I admit, the IDE integration currently doesn't handle failures well.)

Therefore I tend to not force it on all users of JavaModule and make it completely opt-in. So, let user-land mix-in DeferredGeneratedSources if they really need to.

lihaoyi · 2025-03-05T11:03:58Z

mixin sounds good to me

Baccata · 2025-03-05T11:10:21Z

TBH, I find the new proposed solution (although being a solution to some immediate issue) ugly and complicated

I agree I don't like the naming, but don't really want to bikeshed on it, as long as the maintainers are happy with it.

If you have generated sources that aren't required to be properly compiled, those are probably not sources but just resources

I think that's a mis-interpretation of the problem. To simplify it as intuitively as possible :

Should I be able to load a project in my IDE if the Scala code it contains doesn't compile ? => everyone agrees on yes
Should I be able to load a project in my IDE if the protobuf code it depends on to generate Scala code is invalid => ???

Saying that you don't think the current generatedSources is broken in any way implies that your answer to 2) is "no", whereas I think "yes".

I tend to not force it on all users of JavaModule and make it completely opt-in.

Happy to abide 👌

Baccata · 2025-03-05T11:34:18Z

Extracted as another trait in 2d54b8f

lefou · 2025-03-05T11:50:46Z

Should I be able to load a project in my IDE if the protobuf code it depends on to generate Scala code is invalid => ???

Saying that you don't think the current generatedSources is broken in any way implies that your answer to 2) is "no", whereas I think "yes".

My answer is of course "yes". I do think our IDE integration should not fail, if some source generators don't work. We want see some warnings, of course. Also, compilation most likely won't work, even if the IDE bypasses the build tool to compile things, but that not the question here.

I think we need to solve the issue in the IDE integration, not in the task signature. If IDE integration just deals with a failed source generator and continues with either a stub-path or without the path of the generated sources at all (depends on how we fail or how we are able to recover). Once we repaired the source generator, we re-import and the missing generated sources should appear in the IDE. I don't see where in that picture a different task signature for generatedSources is needed. Once we give away an incomplete result (here a target path without the files that are supposed to be on that path), we give up on our principle of a immutable, cachable and reproducible results.

lihaoyi · 2025-03-05T11:56:01Z

I think we need to solve the issue in the IDE integration, not in the task signature. If IDE integration just deals with a failed source generator and continues with either a stub-path or without the path of the generated sources at all (depends on how we fail or how we are able to recover). Once we repaired the source generator, we re-import and the missing generated sources should appear in the IDE. I don't see where in that picture a different task signature for generatedSources is needed. Once we give away an incomplete result (here a target path without the files that are supposed to be on that path), we give up on our principle of a immutable, cachable and reproducible results.

The basic problem this PR is trying to solve is that currently, as I understand it, is that Mill does not provide the path to generated sources if generation fails: you only get a PathRef on success and nothing on failure. So there's no way to continue with a stub path, because no such path is available. def generatedSources gives you a Seq[PathRef] that works great on success, but gives nothing at all on failure, so there's nothing for the IDE to work with

This PR doesn't change the idea of immutable/cacheable/reproducible results, all it does is let you list out which tasks will contain generated sources, and at what paths, without needing to actually run them. That is what allows the IDE to have paths it can use even if the tasks themselves fail

lefou · 2025-03-05T12:19:06Z

Might be there is an unspoken assumption: We don't want to re-import/refresh the project in the IDE after we fixed source generation? Which means, all source paths need to stay stable. This is equivalent with what we modelled in bspCompileClassesPath or bspCompileClasspath which both return some UnresolvedPaths. These are typically implemented as follows: In case we know the implementation, we provide some realistic path (without producing the result we later expect in that location), otherwise we fall back to call the backing task.

lefou · 2025-03-05T12:54:36Z

I don't want to hold this PR any longer with discussions. The current solution as dedicated trait is isolated enough that it can be replaced with some later concept, once we have it.

I think we need to sit another round on the design table and find out, how a nice solution could look like. I think #1779 might be a part of the solution.

Just one last though. Have you considered to explicitly encode the fallback into the task result, e.g.

def instableGeneratedSources: Task[Either[Seq[os.Path], Seq[PathRef]]]

It's appealing as it is much less complex as a concept. Don't know, if it is enough to represent the problem?

lihaoyi · 2025-03-05T12:57:20Z

I think the issue with def instableGeneratedSources: Task[Either[Seq[os.Path], Seq[PathRef]]] would be how does one actually implement it? Presumably you'd do a try-catch, and in the catch case you would go look up the .dest folders of all the generated sources and return them as a list. That's basically identical to what this PR does, except it skips the try-catch part and just returns the .dest folders directly

lefou · 2025-03-05T13:02:53Z

except it skips the try-catch part and just returns the .dest folders directly

Do we make sure, the real implementation is returning the same .dest folder? Do we fail if not?

lihaoyi · 2025-03-05T13:05:00Z

Do we make sure, the real implementation is returning the .dest folder? Do we fail if not?

The PR doesn't do such checks IIRC. It could, but it doesn't

Baccata · 2025-03-05T13:24:08Z

Do we make sure, the real implementation is returning the same .dest folder? Do we fail if not?

Actually that's why I've elected to not require the NamedTasks yield a PathRef. Instead, the interface takes an existential. Initially I wanted to force them to return Unit to prevent the assumption that the returned value would be used in any way, but figured it may not be necessary.

I could change the interface to have it require Seq[NamedTask[PathRef]] which would allow to perform a check, but I think it'd be weird to force the user to essentially return PathRef(T.dest) (most of the time).

lihaoyi · 2025-03-05T13:30:54Z

I'd say for now just tag DeferredGeneratedSourcesModule with @experimental and try it out for now. That'll give us some experience to see how well it works in practice, and whether we want to pursue it further, tweak it, or try something else later

scalalib/src/mill/scalalib/DeferredGeneratedSourcesModule.scala

Baccata · 2025-03-05T14:42:38Z

Module marked @experimental in 3b7b174

lefou

Looks good to me.

lihaoyi · 2025-03-06T08:31:40Z

Thanks @Baccata ! I kicked off a publish run on the 0.12.x branch here https://github.com/com-lihaoyi/mill/actions/runs/13694389375 so once that's done the logs will have the a new development release you can try out. I think we don't need to port it to main immediately, since I wouldn't be surprised if it needed further changes or tweaks before we end up happy with it

lihaoyi · 2025-03-06T11:42:23Z

0.12.8-9-95f1fe @Baccata

Baccata and others added 2 commits February 24, 2025 17:39

WIP

520b668

[autofix.ci] apply automated fixes

b698811

Showcase how deferredSourceGenerators could work

c262ea6

Baccata commented Feb 25, 2025

View reviewed changes

Fix bloop tests

0d46995

Baccata added 3 commits February 26, 2025 14:21

Better names

62058f9

Integrated sourceGenerators in GenIdea

4d445fc

Integrated sourceGenerators in MillBuildServer

b9007ec

Baccata commented Feb 26, 2025

View reviewed changes

Baccata added 3 commits February 26, 2025 15:49

Cleanup

d0f2ab1

Make Mima happy

0057471

Fix tests

3392a08

Baccata changed the title ~~POC : static computation of source roots for code-generating tasks~~ Add a sourceGenerators task to help out IDE config generation Feb 26, 2025

Merge remote-tracking branch 'upstream/0.12.x' into baccata/codegen-t…

97ee7f3

…asks

Baccata marked this pull request as ready for review February 27, 2025 12:33

Merge remote-tracking branch 'upstream/0.12.x' into baccata/codegen-t…

8cb44e2

…asks

lihaoyi reviewed Mar 1, 2025

View reviewed changes

Add an def out to Ctx, make deferred source gens a mixin

2d54b8f

Use _ wildcard instead of *

8eb3a0e

Fix bloop tests

25f04dd

lefou reviewed Mar 5, 2025

View reviewed changes

scalalib/src/mill/scalalib/DeferredGeneratedSourcesModule.scala Outdated Show resolved Hide resolved

Baccata added 2 commits March 5, 2025 15:20

Avoid un-needed PathRef wrapping

9e736fa

Annotate DeferredGeneratedSourcesModule with @experimental

3b7b174

lefou approved these changes Mar 5, 2025

View reviewed changes

lihaoyi merged commit 95f1fe3 into com-lihaoyi:0.12.x Mar 6, 2025
29 of 31 checks passed

lefou added this to the 0.12.9 milestone Mar 6, 2025

Baccata deleted the baccata/codegen-tasks branch March 6, 2025 10:11

		@@ -234,12 +234,14 @@ private class MillBuildServer(
		case module: MillBuildRootModule =>

Add a sourceGenerators task to help out IDE config generation #4621

Add a sourceGenerators task to help out IDE config generation #4621

Conversation

Baccata commented Feb 25, 2025 • edited Loading

Previous description (kept for the record) :

lihaoyi commented Feb 25, 2025

Baccata commented Feb 25, 2025 • edited Loading

Baccata Feb 25, 2025

Choose a reason for hiding this comment

lihaoyi Mar 1, 2025

Choose a reason for hiding this comment

Baccata Feb 25, 2025

Choose a reason for hiding this comment

Baccata Feb 25, 2025 • edited Loading

Choose a reason for hiding this comment

Baccata commented Feb 25, 2025

Baccata Feb 26, 2025

Choose a reason for hiding this comment

lefou Feb 27, 2025

Choose a reason for hiding this comment

Baccata commented Feb 26, 2025

lihaoyi commented Feb 27, 2025

Baccata commented Feb 27, 2025

Baccata commented Feb 27, 2025

lihaoyi commented Feb 27, 2025

Baccata commented Feb 27, 2025

Baccata commented Feb 28, 2025

lihaoyi commented Feb 28, 2025

lihaoyi Mar 1, 2025

Choose a reason for hiding this comment

Baccata commented Mar 5, 2025 • edited Loading

lihaoyi commented Mar 5, 2025

Baccata commented Mar 5, 2025 • edited Loading

lihaoyi commented Mar 5, 2025

lefou commented Mar 5, 2025

lihaoyi commented Mar 5, 2025

Baccata commented Mar 5, 2025

Baccata commented Mar 5, 2025

lefou commented Mar 5, 2025

lihaoyi commented Mar 5, 2025 • edited Loading

lefou commented Mar 5, 2025 • edited Loading

lefou commented Mar 5, 2025

lihaoyi commented Mar 5, 2025

lefou commented Mar 5, 2025 • edited Loading

lihaoyi commented Mar 5, 2025

Baccata commented Mar 5, 2025 • edited Loading

lihaoyi commented Mar 5, 2025

Baccata commented Mar 5, 2025

lefou left a comment

Choose a reason for hiding this comment

lihaoyi commented Mar 6, 2025

lihaoyi commented Mar 6, 2025

Add a `sourceGenerators` task to help out IDE config generation #4621

Add a `sourceGenerators` task to help out IDE config generation #4621

Baccata commented Feb 25, 2025 •

edited

Loading

Baccata commented Feb 25, 2025 •

edited

Loading

Baccata Feb 25, 2025 •

edited

Loading

Baccata commented Mar 5, 2025 •

edited

Loading

Baccata commented Mar 5, 2025 •

edited

Loading

lihaoyi commented Mar 5, 2025 •

edited

Loading

lefou commented Mar 5, 2025 •

edited

Loading

lefou commented Mar 5, 2025 •

edited

Loading

Baccata commented Mar 5, 2025 •

edited

Loading