Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[XABT] Scan for JLO needed wrappers in LinkAssembliesNoShrink. #9893

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jpobst
Copy link
Contributor

@jpobst jpobst commented Mar 7, 2025

Today, application builds use Cecil to scan through all Android assemblies at least twice:

  • Debug: _LinkAssembliesNoShrink, _GenerateJavaStubs
  • Release: _ILLink, _GenerateJavaStubs, _RemoveRegisterAttribute

This is a costly operation that we would like to only perform once to help speed up application builds. The long-term goal is to move the (many) steps currently performed by _GenerateJavaStubs into "linker" steps.

This PR moves the Cecil scanning required for the first step: "generating Java Callable Wrappers".

Implementation

This does not move generating JCW Java code to the linker, it only moves scanning for the JLO information needed to generate JCWs to the linker. This information is persisted as an XML file in /obj next to the processed assembly.

Example:

  • obj/Debug/net10.0-android/android/assets/MyApplication.dll
  • obj/Debug/net10.0-android/android/assets/MyApplication.jlo.xml

Doing it this way has two benefits: it helps keep the linker from getting too complex (eg: generating Java code) and it helps incremental builds. If an assembly has not changed since the last build, the JLO information in the .jlo.xml is still valid and will be reused. The assembly does not need to be re-scanned. (For example, the ~70 AndroidX assemblies that the MAUI template uses do not need to be re-scanned on every Debug build.)

The process of actually generating the JCWs is done by a new GenerateJavaCallableWrappers task in the _GenerateJavaStubs target that consumes the .jlo.xml files and outputs the .java files.

Release Builds

In an ideal Release build, we would (probably?) run this new step as part of the ILLink invocation. However there are some tricky limitations to running in the linker pipeline, chiefly being that the linker cannot load external assemblies like Java.Interop.Tool.JavaCallableWrappers.dll (and dependencies) that we need. This can be worked around by including the needed source files in the linker assembly, but this will require additional work.

Instead, for now, we will add an invocation a new _LinkAssembliesAdditionalSteps target to linked builds that runs after ILLink. This calls the existing LinkAssembliesNoShrink task to scan for JLOs but does not execute the other linker tasks that have already been run. This allows us to move forward with converting to "linker steps" for making Debug builds faster without the extra work required to move these steps into ILLink. A future enhancement can perform this work if desired. (There is also some question as to whether we actually want to create new ILLink linker steps.)

Note this temporarily leaves the $(_AndroidJLOCheckedBuild) flag that compares the generated Java files to the previous ones. This will likely be useful ensuring future moved steps are correct as well.

@jpobst jpobst force-pushed the dev/jpobst/generate-jcws branch 3 times, most recently from b2d097d to bb13f01 Compare March 8, 2025 01:57
Comment on lines +1501 to +1502
<LinkAssembliesNoShrink
AlreadyLinked="true"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to make a new task:

class LinkAssembliesNoShrink : FindJavaObjectsTask // Or some better name

Just to reduce confusion around <LinkAssembliesNoShrink/> and AlreadyLinked=true/false.

The base type would write the XML files, and LinkAssembliesNoShrink would do the trimmer steps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I agree that this can be confusing. This seems like a good solution to that confusion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess one consideration is whether we think running these additional steps after ILLink will be temporary or not. If we believe the final goal is to implement them as proper linker steps that ILLink runs then this confusion goes away and we would want to keep them all in <LinkAssembliesNoShrink>.

Comment on lines -1469 to +1470
Outputs="@(ResolvedAssemblies->'$(MonoAndroidIntermediateAssemblyDir)%(DestinationSubPath)')">
Outputs="@(ResolvedAssemblies->'$(MonoAndroidIntermediateAssemblyDir)%(DestinationSubPath)');@(ResolvedAssemblies->'$(MonoAndroidIntermediateAssemblyDir)%(DestinationSubDirectory)%(Filename).jlo.xml')">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need the two sets of output files here? It seems like the files copied to $(MonoAndroidIntermediateAssemblyDir)%(DestinationSubPath) would always change when the *.jlo.xml file also changes?

This would make MSBuild check a list of extra filestamps, and maybe it's not needed?

Copy link
Contributor Author

@jpobst jpobst Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on this one. It will absolutely work without the additional check for the normal expected scenarios. It would fall apart in the "probably would never happen" scenario if someone were to delete the .jlo.xml file without deleting the .dll file.

If we are ok with this then we can save a few milliseconds by not adding the additional outputs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other thing I don't know is if the target will "build partially" if you have N inputs and N * 2 outputs.

I've only ever done N inputs and N + 1 outputs (+1 is build.props).

Copy link
Contributor Author

@jpobst jpobst Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure either, but I have tested this with incremental builds and they do still build partially. 😄

If nothing else, it's good to know this is a valid pattern that can be used in the future if needed.

Comment on lines +143 to +148
// Assemblies are modified in-place, which updates the modified time. We need
// to touch the .jlo.xml file so that we will not re-run the JLO scanner on
// the next incremental build.
// TODO: This task should probably create copies of the assemblies instead of modifying them in place.
var jlo_xml_file = Path.ChangeExtension (path, ".jlo.xml");
File.SetLastWriteTimeUtc (jlo_xml_file, DateTime.UtcNow);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this actually happen in the place these files are written?

Above could do something like:

if (!Files.CopyIfStreamChanged (sw.BaseStream, destinationJLOXml)) {
        // If the file was written, the lastwritetime is updated. Touch the file otherwise.
        File.SetLastWriteTimeUtc (destinationJLOXml, DateTime.UtcNow);
}

Copy link
Contributor Author

@jpobst jpobst Mar 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that MarshalMethodsAssemblyRewriter is modifying the .dll in-place much later in the build process (_GenerateJavaStubs) than the .jlo.xml is being written (_LinkAssembliesNoShrink). This causes it to always be rescanned on the next incremental build of _LinkAssembliesNoShrink.

There are 2 other possible fixes for this:

  • If we remove the .jlo.xml files as Outputs for _LinkAssembliesNoShrink then this will not be needed.
  • When MarshalMethodsAssemblyRewriter gets moved to _LinkAssembliesNoShrink it can probably be ordered so that it isn't modifying the assembly after it is being scanned for JLO.

(This does rely on the assumption that MarshalMethodsAssemblyRewriter is not changing the assembly in a way that changes the JLO information we need out of it, which is currently a correct assumption.)

Comment on lines +47 to +48
var wrappers = ConvertToCallableWrappers (types);
XmlExporter.Export (destinationJLOXml, wrappers, true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this use:

using (var sw = MemoryStreamPool.Shared.CreateStreamWriter ())
       XmlExporter.Export (sw, wrappers, true);
       Files.CopyIfStreamChanged (sw.BaseStream, destinationJLOXml);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this "optimization".

CopyIfStreamChanged adds the following overhead:

  • Read existing file from disk
  • Hash existing file
  • Hash stream contents

At best, it eliminates writing the file to disk. While I admit I haven't tested it, I suspect the reading + hashing (+ maybe writing) is more expensive than simply always writing the file.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is used for writing *.java files, as we were trying to prevent javac and d8/r8 from running again. This made it where assemblies could change, *.java files kept same timestamps.

It may not be needed here if some other slow target isn't using the .xml files as inputs.

@@ -1535,10 +1571,21 @@ because xbuild doesn't support framework reference assemblies.
<_MergedManifestDocuments Condition=" '$(AndroidManifestMerger)' == 'legacy' " Include="@(ExtractedManifestDocuments)" />
</ItemGroup>

<GenerateJavaCallableWrappers
CodeGenerationTarget="$(_AndroidJcwCodegenTarget)"
OutputDirectory="$(IntermediateOutputPath)android\src"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many concerns around incremental build performance. Three of which are:

  1. Reducing time spent parsing the .dll files.
  2. Reducing time spent generating the .java source
  3. Reducing time spent compiling the .java source into .class files. (javac is slow.)
  4. "Synchronization"

This PR addresses (1) and maybe (2) by making assembly parsing incremental: we only need to process assemblies which have changed.

Should we consider expanding upon (2), (3) and (4)?

The "Synchronization" question is rephrased as "how do we ensure that removed types are deleted"? For example, if the MyApp.dll assembly has:

class Demo : Java.Lang.Object {
}

then MyApp.jlo.xml will mention Demo, and $(IntermediateOutputPath)android\src will contain crc64…/Demo.java, and $(IntermediateOutputPath)android\bin\classes will contain crc64…/Demo.class.

We then update MyApp.dll to remove the Demo type. What happens? What could or should happen?

If we don't remove src/crc64…/Demo.java and classes/crc64…/Demo.class, then we have an unsynchronized state of the world. (I think we've had previous bugs due to this; perhaps @dellis1972 remembers?)

If we delete src and classes, that means we need to regenerate .java files for everything, and recompile for everything. We maintain synchronization, but at a cost. (At least we saved on assembly parsing!)

An idea I've been kicking around (mentally and with Dean) was to have per-assembly directories. Instead of $(IntermediateOutputPath)android\src, have $(IntermediateOutputPath)android\jcw\$(AssemblyName)\src and $(IntermediateOutputPath)android\jcw\$(AssemblyName)\classes. When the assembly changes, we delete just the …\jcw\$(AssemblyName) directory, and no others. This should reduce the number of .java files we need to generate and javac invocations on incremental builds.

I don't know if this is actually viable; javac -d OUTDIR can only have one directory, so a per-assembly classes directory means we'd have N javac invocations (one per assembly) instead of 1, and then there's the question of ordering (A.dll may depend on B.dll, creating an ordering constraint on javac).

Even if per-assembly classes directories aren't viable, per-assembly src directories should be viable: delete just the per-assembly src directory and regenerate it based on the .jlo.xml file.

Which brings us to <GenerateJavaCallableWrappers/>: is there a way to make GenerateJavaCallableWrappers.OutputDirectory a per-assembly value?

Copy link
Contributor Author

@jpobst jpobst Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed that this PR only attempts to tackle (1) and that this is just a piece of the build performance puzzle. However tackling (1) is a very large effort as it also requires updating the other "_GenerateJavaStubs" pieces (marshal method rewriting, type map generation, acw map generation, android manifest merging) before we can actually get the win by removing the additional Cecil scan. As such, I think we should focus on that first.


Having said that and looking ahead...

Doing (2) incrementally should be ~easy, but also probably not a huge win. This step is now very cheap (<50 ms on Android template).

Doing (3) better, if possible, would be a good win. However we haven't examined it enough to know what is viable here. Even better if work here can also make D8 incremental as it is even slower than javac. (dotnet new android template FastDev build: _CompileJava: 1.83s, _CompileToDalvik: 4.07s)

(4) is just something to keep in mind for any future attempts to optimize (2) and (3). We need to ensure that incremental builds remove .java files that should no longer be generated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would check a dotnet new maui project to see the real cost of _CompileJava or _CompileToDalvik.

Once you bring their dependency tree of AndroidX libraries they get a lot slower.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would check a dotnet new maui project to see the real cost of _CompileJava or _CompileToDalvik.

Yep, 16.72s and 22.94s respectively, on dotnet new maui.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants