Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a sourceGenerators task to help out IDE config generation #4621

Merged
merged 19 commits into from
Mar 6, 2025

Conversation

Baccata
Copy link
Contributor

@Baccata Baccata commented Feb 25, 2025

Context : #4530

  • Adds a DeferredGeneratedSourcesModule that extends JavaModule, providing a def deferredGeneratedSourceTasks alternative to generatedSources. This allows BSP/GenIdea/Bloop to predict source roots without actually running the code generation logic.
  • Adds a def out to the Ctx class, allowing to query the location out folder within tasks.
  • Adds unit tests / integration tests where relevant

Previous description (kept for the record) :

@lihaoyi, I want to open this PR to continue the discussion. At the time of writing this, this adds a generatedSourceRoots parameter for tasks, as you've described, and uses it to compute the source directories for the bloop configuration, as opposed to actually running the allSources task. A similar change would need to be achieved in GenIdea and BSP.

So now, the problem I see is as follows : in order for this to be useful, it needs to be applied to all source generators provided OOTB by the mill codebase, but also third party providers of source-generator (like smithy4s), as the ability to jump to the sources is paramount for the user experience in IDEs.

Applying this change is a fair bit of work in this codebase, and a big ask for the larger ecosystem, and although I think SBT's approach of treating all source generators lazily is preferable to mill's current approach of running all source generators eagerly, I don't think I can weigh the benefits of changing the approach against the cost induced by breaking IDE behaviour for existing plugins.

With that in mind here's what I think would be a good mitigation :

// Rather than traversing the transitive inputs of `allSources`, we define this setting as a way to register lazy 
// source generators. We force the contract to return `Unit` to prevent the assumption that a returned `PathRef` would be
// added to the sources. 
def lazySourceGenerators = Seq[T[Unit]] 
 
 // Runs the lazy source generators, returning path refs computed from their `generatedSourceRoots`. 
 // When the tasks don't have `generatedSourceRoots` set, we assume that the `T.dest` is to be added as source root. 
 final def runLazySourceGenerators : T[Seq[PathRef]] = ???
 
// Does not run the lazy source generators, but rather computes the source roots they'll generate code in. 
// Wrapped in T because the evaluator is needed to get the T.dest of the lazy source generators. I presume that is 
// something we could thread in through the ctx (like `def destOf(task: T[_]) : Path`)  
final def lazyGeneratedSourceRoots: T[Seq[Path]] = ???
 
 // Keeps the current behaviour intact 
def allSources = T {
    sources() ++ generatedSources() ++ runLazySourceGenerators()
} 

The Bloop, GenIdea and BSP then get amended to call sources(), generatedSources() and lazyGeneratedSourceRoots(), but avoids runLazySourceGenerators.

This approach would allow to alleviate the pain point raised in the discussion above, whilst retaining the current behaviour of the ecosystem. Mill plugin providers could be encouraged to use runLazySourceGenerators instead of generatedSources, but users would have a recourse to turn eager source generators into lazy ones if they need it to be (provided a withGeneratedSourceRoots is provided to them, that is)

@lihaoyi
Copy link
Member

lihaoyi commented Feb 25, 2025

@Baccata one new development worth mentioning is that #4524 targeting 0.13.x, we've moved the computation of JavaModule source paths in general to static def sourcesFolders: Seq[os.SubPath] members computer before evaluation. This was done for reasons unrelated to codegen, but perhaps now that we're doing it a similar approach could be taken for generated sources?

Following the naming convention would just define a def JavaModule#generatedSourcesFolders: T[Seq[os.SubPath]] and use that for IDE codegen, which I think is similar to what you are proposing here, just to a slightly different name

@Baccata
Copy link
Contributor Author

Baccata commented Feb 25, 2025

@lihaoyi, with the difference that the subPaths need to be tied to the tasks, as they end up being relative to their T.dest (as opposed to the module's path), hence why my initial proposal in the discussion was to have (roughly) :

case class SourceGenerator(task: T[Unit], sourceRoots: Seq[os.SubPath]) 

trait JavaModule {
  def sourceGenerators : Seq[SourceGenerator]
}

But, putting aside the details, I agree that source generators are better captured statically like the sourceFolders (and like SBT does it). The question remains regarding "what do you want to do with the current generatedSources task ?" : do you want to keep it around to avoid changing everything in mill and its ecosystem and introduce a static alternative (def sourceGenerators as above), or do you want to deprecate it completely in favour of the static version ?

val dest = mill
.eval
.EvaluatorPaths
.resolveDestPaths(workspace / "out", task)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't know whether there's a better way to do it, to be honest

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is fine. A bit verbose but that's the best we got

/**
* The folders containing all source files fed into the compiler
*/
def allSources: T[Seq[PathRef]] = Task { sources() ++ generatedSources() }
def allSources: T[Seq[PathRef]] =
Task { sources() ++ generatedSources() ++ deferredGeneratedSources() }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeps the current (0.12.x) behaviour of generatedSources to avoid breaking the ecosystem

@@ -133,14 +137,19 @@ class BloopImpl(evs: () => Seq[Evaluator], wd: os.Path) extends ExternalModule {
!module.asInstanceOf[outer.Module].skipBloop
else true

private def deferredAllSources(m: JavaModule): T[Seq[PathRef]] = Task {
m.sources() ++ m.generatedSources() ++ m.allDeferredSourceRoots()
Copy link
Contributor Author

@Baccata Baccata Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeps the current (0.12.x) behaviour of generatedSources being run to produce IDE config, in order to avoid breaking the ecosystem. In addition, adds roots that will be produced by deferredSourceGenerators when allSources/compile is run.

@Baccata
Copy link
Contributor Author

Baccata commented Feb 25, 2025

See later changes for a showcase of what I meant in #4621 (comment) ... the names are horribly bad but will change them later

@@ -234,12 +234,14 @@ private class MillBuildServer(
case module: MillBuildRootModule =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this warrants a specific case, considering it extends JavaModule anyway

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be some left over from a time, where MillBuildRootModule was either not a JavaModule or needed different treatment. If both cases are identical now, we can merge them.

@Baccata
Copy link
Contributor Author

Baccata commented Feb 26, 2025

@lihaoyi, @lefou, I think this PR is just about ready, but I've got a failing integration test on my fork that I have no idea how to troubleshoot.

Any chance you could provide guidance ?

@Baccata Baccata changed the title POC : static computation of source roots for code-generating tasks Add a sourceGenerators task to help out IDE config generation Feb 26, 2025
@lihaoyi
Copy link
Member

lihaoyi commented Feb 27, 2025

@Baccata can you try reproducing the failure locally? You can run ci/test-mill-bootstrap.sh

@Baccata
Copy link
Contributor Author

Baccata commented Feb 27, 2025

Yes, it reproduces locally. Still extremely puzzled as to what's wrong though.

@Baccata
Copy link
Contributor Author

Baccata commented Feb 27, 2025

I'm also able to reproduce on the 0.12.x branch, so I presume it's nothing to do with this particular changeset.

@lihaoyi
Copy link
Member

lihaoyi commented Feb 27, 2025

Oh I just realized this was targeting 0.12.x. Maybe its fixed by #4625 ?

@Baccata Baccata marked this pull request as ready for review February 27, 2025 12:33
@Baccata
Copy link
Contributor Author

Baccata commented Feb 27, 2025

Okay I think this is ready. There's some android-specific job that keeps failing, but it seems to be failing on other PRs raised against 0.12.x, so I presume it's unrelated.

@Baccata
Copy link
Contributor Author

Baccata commented Feb 28, 2025

Should I close this PR and re-open it anew to remove the noise of the previous conversations ?

@lihaoyi
Copy link
Member

lihaoyi commented Feb 28, 2025

No leaving the PR as is is fine, I'll take a look when I have time

* Computes the list of source roots that will be produced by [[sourceGenerators]] without
* actually running the generators in question.
*/
def predictedGeneratedSourceRoots: T[Seq[PathRef]] = T {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The terminology is kind of weird here, as are the types. How about

def deferredGeneratedSourcePaths: T[Seq[os.Path]]

I don't like predicted, since it sounds like it could refer to all generated sources, which it doesn't. The earlier term deferred is a bit better since it sounds more like it is a separate thing from the other generated sources.

If it's just returning the paths, we can just return os.Paths rather than PathRefs

@Baccata
Copy link
Contributor Author

Baccata commented Mar 5, 2025

What do you think of moving all changes currently done in JavaModule into a dedicated trait DeferredGeneratedSources and extends in from JavaModule? This will keep them closely together. It will also keep JavaModule more lean and free of that stuff. We could even avoid default-extending it and just mix it in in case it's needed.

Well my personal opinion is really that the current generatedSources should be deprecated in favour of the deferred version, for a similar argument that lead me to start the discussion : I think any kind of IDE integration should be as resilient as possible. I believe that if I have errors in a protobuf file that I use to generate Java or Scala code, I should still be able to load my project into the IDE and be able to change code that's not necessarily related to the protobuf pipeline. I therefore believe that the current generatedSources task, although intuitive, was ill-designed from the get-go, and that the use of the deferred methods should be encouraged instead within the mill ecosystem.

Therefore, I would rather the JavaModule had them than for them to be separate in a mix-in that IDE-config code needs to be aware of. Now I'd personally be fine having them in a separate trait that the JavaModule would inherit from, as conceptually they don't have to be tied to Java/Scala and other integrations could benefit from them.

Now that being said, that's just a personal opinion. If yourself and Haoyi make an executive decision to make this a mix-in, I'll abide and will amend the contribution in consequence.

@lihaoyi
Copy link
Member

lihaoyi commented Mar 5, 2025

For now lets go with a separate trait. I think it's still a bit early to deprecate the existing workflow and tell people to use the new one. We probably want someone using it for a while and happy with it before we deprecate the old way, which would probably put it some time after the 0.13.0 release

@Baccata
Copy link
Contributor Author

Baccata commented Mar 5, 2025

@lihaoyi : just to be clear : you want a separate trait that JavaModule inherits from, or a separate trait that would be mixed-in in userland ?

EDIT : and yeah, I agree that it's premature to deprecate at this point

@lihaoyi
Copy link
Member

lihaoyi commented Mar 5, 2025

I don't have a strong opinion between those two options. @lefou do you have a preference?

@lefou
Copy link
Member

lefou commented Mar 5, 2025

TBH, I find the new proposed solution (although being a solution to some immediate issue) ugly and complicated. (I also don't think the current generatedSources is broken in any way. If you have generated sources that aren't required to be properly compiled, those are probably not sources but just resources. I admit, the IDE integration currently doesn't handle failures well.)

Therefore I tend to not force it on all users of JavaModule and make it completely opt-in. So, let user-land mix-in DeferredGeneratedSources if they really need to.

@lihaoyi
Copy link
Member

lihaoyi commented Mar 5, 2025

mixin sounds good to me

@Baccata
Copy link
Contributor Author

Baccata commented Mar 5, 2025

TBH, I find the new proposed solution (although being a solution to some immediate issue) ugly and complicated

I agree I don't like the naming, but don't really want to bikeshed on it, as long as the maintainers are happy with it.

If you have generated sources that aren't required to be properly compiled, those are probably not sources but just resources

I think that's a mis-interpretation of the problem. To simplify it as intuitively as possible :

  1. Should I be able to load a project in my IDE if the Scala code it contains doesn't compile ? => everyone agrees on yes
  2. Should I be able to load a project in my IDE if the protobuf code it depends on to generate Scala code is invalid => ???

Saying that you don't think the current generatedSources is broken in any way implies that your answer to 2) is "no", whereas I think "yes".

I tend to not force it on all users of JavaModule and make it completely opt-in.

Happy to abide 👌

@Baccata
Copy link
Contributor Author

Baccata commented Mar 5, 2025

Extracted as another trait in 2d54b8f

@lefou
Copy link
Member

lefou commented Mar 5, 2025

  1. Should I be able to load a project in my IDE if the protobuf code it depends on to generate Scala code is invalid => ???

Saying that you don't think the current generatedSources is broken in any way implies that your answer to 2) is "no", whereas I think "yes".

My answer is of course "yes". I do think our IDE integration should not fail, if some source generators don't work. We want see some warnings, of course. Also, compilation most likely won't work, even if the IDE bypasses the build tool to compile things, but that not the question here.

I think we need to solve the issue in the IDE integration, not in the task signature. If IDE integration just deals with a failed source generator and continues with either a stub-path or without the path of the generated sources at all (depends on how we fail or how we are able to recover). Once we repaired the source generator, we re-import and the missing generated sources should appear in the IDE. I don't see where in that picture a different task signature for generatedSources is needed. Once we give away an incomplete result (here a target path without the files that are supposed to be on that path), we give up on our principle of a immutable, cachable and reproducible results.

@lihaoyi
Copy link
Member

lihaoyi commented Mar 5, 2025

I think we need to solve the issue in the IDE integration, not in the task signature. If IDE integration just deals with a failed source generator and continues with either a stub-path or without the path of the generated sources at all (depends on how we fail or how we are able to recover). Once we repaired the source generator, we re-import and the missing generated sources should appear in the IDE. I don't see where in that picture a different task signature for generatedSources is needed. Once we give away an incomplete result (here a target path without the files that are supposed to be on that path), we give up on our principle of a immutable, cachable and reproducible results.

The basic problem this PR is trying to solve is that currently, as I understand it, is that Mill does not provide the path to generated sources if generation fails: you only get a PathRef on success and nothing on failure. So there's no way to continue with a stub path, because no such path is available. def generatedSources gives you a Seq[PathRef] that works great on success, but gives nothing at all on failure, so there's nothing for the IDE to work with

This PR doesn't change the idea of immutable/cacheable/reproducible results, all it does is let you list out which tasks will contain generated sources, and at what paths, without needing to actually run them. That is what allows the IDE to have paths it can use even if the tasks themselves fail

@lefou
Copy link
Member

lefou commented Mar 5, 2025

Might be there is an unspoken assumption: We don't want to re-import/refresh the project in the IDE after we fixed source generation? Which means, all source paths need to stay stable. This is equivalent with what we modelled in bspCompileClassesPath or bspCompileClasspath which both return some UnresolvedPaths. These are typically implemented as follows: In case we know the implementation, we provide some realistic path (without producing the result we later expect in that location), otherwise we fall back to call the backing task.

@lefou
Copy link
Member

lefou commented Mar 5, 2025

I don't want to hold this PR any longer with discussions. The current solution as dedicated trait is isolated enough that it can be replaced with some later concept, once we have it.

I think we need to sit another round on the design table and find out, how a nice solution could look like. I think #1779 might be a part of the solution.

Just one last though. Have you considered to explicitly encode the fallback into the task result, e.g.

def instableGeneratedSources: Task[Either[Seq[os.Path], Seq[PathRef]]]

It's appealing as it is much less complex as a concept. Don't know, if it is enough to represent the problem?

@lihaoyi
Copy link
Member

lihaoyi commented Mar 5, 2025

I think the issue with def instableGeneratedSources: Task[Either[Seq[os.Path], Seq[PathRef]]] would be how does one actually implement it? Presumably you'd do a try-catch, and in the catch case you would go look up the .dest folders of all the generated sources and return them as a list. That's basically identical to what this PR does, except it skips the try-catch part and just returns the .dest folders directly

@lefou
Copy link
Member

lefou commented Mar 5, 2025

except it skips the try-catch part and just returns the .dest folders directly

Do we make sure, the real implementation is returning the same .dest folder? Do we fail if not?

@lihaoyi
Copy link
Member

lihaoyi commented Mar 5, 2025

Do we make sure, the real implementation is returning the .dest folder? Do we fail if not?

The PR doesn't do such checks IIRC. It could, but it doesn't

@Baccata
Copy link
Contributor Author

Baccata commented Mar 5, 2025

Do we make sure, the real implementation is returning the same .dest folder? Do we fail if not?

Actually that's why I've elected to not require the NamedTasks yield a PathRef. Instead, the interface takes an existential. Initially I wanted to force them to return Unit to prevent the assumption that the returned value would be used in any way, but figured it may not be necessary.

I could change the interface to have it require Seq[NamedTask[PathRef]] which would allow to perform a check, but I think it'd be weird to force the user to essentially return PathRef(T.dest) (most of the time).

@lihaoyi
Copy link
Member

lihaoyi commented Mar 5, 2025

I'd say for now just tag DeferredGeneratedSourcesModule with @experimental and try it out for now. That'll give us some experience to see how well it works in practice, and whether we want to pursue it further, tweak it, or try something else later

@Baccata
Copy link
Contributor Author

Baccata commented Mar 5, 2025

Module marked @experimental in 3b7b174

Copy link
Member

@lefou lefou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@lihaoyi lihaoyi merged commit 95f1fe3 into com-lihaoyi:0.12.x Mar 6, 2025
29 of 31 checks passed
@lihaoyi
Copy link
Member

lihaoyi commented Mar 6, 2025

Thanks @Baccata ! I kicked off a publish run on the 0.12.x branch here https://github.com/com-lihaoyi/mill/actions/runs/13694389375 so once that's done the logs will have the a new development release you can try out. I think we don't need to port it to main immediately, since I wouldn't be surprised if it needed further changes or tweaks before we end up happy with it

@lefou lefou added this to the 0.12.9 milestone Mar 6, 2025
@Baccata Baccata deleted the baccata/codegen-tasks branch March 6, 2025 10:11
@lihaoyi
Copy link
Member

lihaoyi commented Mar 6, 2025

0.12.8-9-95f1fe @Baccata

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants