
Clean build: SourceFile(s) read twice #2691

Open
jukzi opened this issue Jul 8, 2024 · 20 comments

Comments

@jukzi
Contributor

jukzi commented Jul 8, 2024

During Project > "Clean all projects", SourceFile(s) are read multiple times.
Once from ReadManager - expected.
And later a second time from Parser.getMethodBodies() - without further digging, I guess this is wrong. The code is prepared to use a ReadManager there, but readManager == null.

Thread [Compiler Processing Task] (Suspended (breakpoint at line 11694 in org.eclipse.jdt.internal.compiler.parser.Parser))	
	org.eclipse.jdt.internal.compiler.parser.Parser.getMethodBodies(org.eclipse.jdt.internal.compiler.ast.CompilationUnitDeclaration) line: 11694	
	org.eclipse.jdt.internal.compiler.Compiler.process(org.eclipse.jdt.internal.compiler.ast.CompilationUnitDeclaration, int) line: 890	
	org.eclipse.jdt.internal.compiler.ProcessTaskManager.compile() line: 163	

[screenshot]
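For context, the relevant part of Parser.getMethodBodies() looks roughly like this (paraphrased, not an exact copy of the JDT sources): with readManager == null it falls back to ICompilationUnit.getContents(), which for a builder SourceFile reads the .java file from disk again.

// Paraphrased sketch of Parser.getMethodBodies(CompilationUnitDeclaration unit);
// field and method names may differ slightly from the actual sources.
char[] contents = this.readManager != null
    ? this.readManager.getContents(unit.compilationResult.compilationUnit)
    : unit.compilationResult.compilationUnit.getContents(); // second read from disk
this.scanner.setSource(contents, unit.compilationResult);
// ... the method bodies are then parsed against this source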

@jukzi
Contributor Author

jukzi commented Jul 9, 2024

Both reads happen during org.eclipse.jdt.internal.compiler.Compiler.compile(ICompilationUnit[], boolean).
The first read happens during org.eclipse.jdt.internal.compiler.Compiler.beginToCompile(org.eclipse.jdt.internal.compiler.env.ICompilationUnit[]) line: 393,
and the second read is triggered from
org.eclipse.jdt.internal.compiler.Compiler.processCompiledUnits(int, boolean) line: 606.

Reproducible with org.eclipse.jdt.core.tests.builder.AnnotationDependencyTests.testTypeAnnotationDependency()

@jukzi
Contributor Author

jukzi commented Jul 9, 2024

In Compiler.internalBeginToCompile() a decision is made whether a full parse is done or only a dietParse. By default it is only a dietParse, because parseThreshold <= 0. That diet parse skips the method bodies, so a second parse has to occur later to get them. That second parse is done in another thread ("Compiler Processing Task"); nevertheless the main compilation has to wait on that result anyway.
I measured an experiment with diet parse vs full parse.
[screenshots: diet parse vs. full parse measurements]
i.e. directly using a full parse uses more CPU time but is overall 10% faster, because it removes the extra read and the second parse.
So I do not see a good reason for a first diet parse.
@iloveeclipse WDYT, any idea why there is first a diet parse only?
Also, code like
this.parseThreshold = 0; // will request a full parse
seems to be wrong: any parseThreshold <= 0 may do the opposite - it does a dietParse if totalUnits > 0.
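For reference, the decision in Compiler.internalBeginToCompile() looks roughly like this (paraphrased, not an exact copy of the JDT sources):

// Units below the threshold get a full parse, everything else only a diet parse.
// With the default parseThreshold == 0 the condition is never true, so every
// unit is diet-parsed and the method bodies have to be parsed again later.
CompilationUnitDeclaration parsedUnit;
if (this.totalUnits < this.parseThreshold) {
    parsedUnit = this.parser.parse(sourceUnits[i], unitResult);
} else {
    parsedUnit = this.parser.dietParse(sourceUnits[i], unitResult);
}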

@iloveeclipse
Member

WDYT, any idea why there is first a diet parse only?

See https://github.com/eclipse-jdt/eclipse.jdt.core/wiki/ECJ-Parse

I'm not a compiler expert (but @srikanth-sankaran is), but I assume diet parse looks for obvious errors first (as it skips method bodies): e.g. if the class definition is already wrong, there is no reason to parse methods and do more work if the basics are broken anyway.

@srikanth-sankaran
Contributor

For a number of IDE operations it may suffice to work with empty method bodies altogether or drill down into just one method alone (where the cursor is).

Think of providing an outline view, generating a Java model, code completion, jump to declaration F3, search for overrides, hierarchy computations etc.

So a dietParse need not always be followed by parse of all method bodies.

In the batch compiler, it is certainly the case that we will populate all method bodies. But even here, the design breakdown into various phases may aesthetically fit better with the model of dietParse followed by parse of method bodies.

This is just one design. Javac doesn't have an equivalent mode.

Do you have numbers for a large project's full build ?

Any change here would also have to take into account memory utilization.

We also have the capability to null out method bodies once we no longer require them - and we do deploy it.
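A minimal sketch of what nulling out method bodies can look like on the AST (a hypothetical illustration of the idea, not the actual JDT mechanism; nested types omitted for brevity):

// Hypothetical helper: drop parsed statements once they are no longer
// required, so the AST shrinks back to roughly its diet-parse size.
static void releaseMethodBodies(CompilationUnitDeclaration unit) {
    if (unit.types == null) return;
    for (TypeDeclaration type : unit.types) {
        if (type.methods == null) continue;
        for (AbstractMethodDeclaration method : type.methods) {
            method.statements = null; // free the method body AST
        }
    }
}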

@jukzi
Contributor Author

jukzi commented Jul 9, 2024

For a number of IDE operations it may suffice to work with empty method bodies altogether or drill down into just one method alone (where the cursor is).

All such operations where only a single file is parsed complete in a blink of an eye anyway. A performance difference can only be perceived when parsing many files, like build, search or type hierarchy.

Do you have numbers for a large project's full build ?

See the screenshots in #2691 (comment):
building the platform workspace takes 65 sec vs. 73 sec in Compiler.compile.

@srikanth-sankaran
Contributor

Well, by all means propose a patch, let us see what it takes to make it green with the full test suite, and take a call.

Short answer is there is nothing sacrosanct about the split into diet parse plus populating that diet AST with parsed method bodies. As I said javac doesn't follow such an approach.

It may be significant work though, given this is a long-standing design choice that is pervasive in the implementation.

But before you invest much effort let us ask @stephan-herrmann for his opinion on this too to make sure I haven't overlooked some key observation here.

jukzi pushed a commit to jukzi/eclipse.jdt.core that referenced this issue Jul 9, 2024
Use parse() instead of dietParse() to avoid parse during
Parser.getMethodBodies()

eclipse-jdt#2691
@stephan-herrmann
Contributor

As this challenges one of the foundations of JDT's design, any change must be done with utmost caution.

Some comments:

All such operations where only a single file is parsed complete in a blink of an eye anyway.

Consider the example of code completion: indeed parsing may be super fast, but when later the AST is resolved not having to resolve uninteresting methods may very well make a significant difference. Type inference is a candidate that can take several seconds per file.

MatchLocator takes the opposite approach, its method purgeMethodStatements() implements what @srikanth-sankaran mentioned above: remove statements after a full parse. But this method cannot directly be re-used for completion, as it uses a search-specific approach.

I think it would be very hard to demonstrate that a change of strategy would be beneficial for all uses of the compiler.

I could imagine, though, an approach that specifically tells the compiler when it is called by a builder or as batch compiler that it should do a full parse right away (and make sure it doesn't still try a second parse for method bodies :) ).
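A hypothetical sketch of such a hint - the flag name and wiring below are invented for illustration and do not exist in the current API:

// Invented flag: the builder / batch entry point requests full parsing up front.
options.requestFullParseUpFront = true; // hypothetical CompilerOptions field

// Inside Compiler.internalBeginToCompile() the parse decision could then become:
boolean fullParse = this.options.requestFullParseUpFront
        || this.totalUnits < this.parseThreshold;
CompilationUnitDeclaration parsedUnit = fullParse
        ? this.parser.parse(sourceUnits[i], unitResult)
        : this.parser.dietParse(sourceUnits[i], unitResult);
// ...and Parser.getMethodBodies() would skip its second parse (and second read)
// whenever the bodies are already present.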

@stephan-herrmann
Contributor

[...] removes the extra read [...]

Do we know how much of the overhead relates to parsing vs. re-reading? Some implementations of ICompilationUnit cache the file content, so that's another way to avoid the extra read.

@jukzi
Contributor Author

jukzi commented Jul 9, 2024

Do we know how much of the overhead relates to parsing vs. re-reading? Some implementations of ICompilationUnit cache the file content, so that's another way to avoid the extra read.

SourceFile intentionally does not cache the content to save memory, and the second read does not use the ReadManager, which results in ~6 seconds of read overhead in my example:
[screenshot: read overhead measurement]

@stephan-herrmann
Contributor

SourceFile intentionally does not cache the content

@srikanth-sankaran if the CU doesn't cache, do you see value in caching the source in CompilationUnitDeclaration instead? At least this could mean we no longer need Parser.stashTextualRepresentation(FunctionalExpression) :)

Of course such cached source can be freed during compilation (if no functional expressions: after parseMethodBodies(), otherwise after resolve).
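A hypothetical sketch of that idea - cachedSource is an invented field, not existing JDT code:

// In the first (diet) parse, keep the once-read source on the AST root:
char[] contents = sourceUnit.getContents();          // the only read from disk
compilationUnitDeclaration.cachedSource = contents;  // invented field

// In Parser.getMethodBodies(), reuse it instead of re-reading the file:
char[] source = unit.cachedSource != null
        ? unit.cachedSource
        : unit.compilationResult.compilationUnit.getContents();

// Release it as soon as it is no longer needed - after parseMethodBodies(),
// or after resolve() when functional expressions needed the textual form:
unit.cachedSource = null;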

@srikanth-sankaran
Contributor

All such operations where only a single file is parsed complete in a blink of an eye anyway. A performance difference can only be perceived when parsing many files, like build, search or type hierarchy.

I am all for experimentation - but I echo @stephan-herrmann's concerns: "As this challenges one of the foundations of JDT's design, any change must be done with utmost caution." and "I think it would be very hard to demonstrate that a change of strategy would be beneficial for all uses of the compiler."

In fact I would rephrase his last sentence and say I think it would be very hard to demonstrate that a change of strategy would NOT BE HARMFUL for ANY uses of the compiler.

So let us by all means explore but Festina lente! Let us make haste slowly!

I could imagine, though, an approach that specifically tells the compiler when it is called by a builder or as batch compiler that it should do a full parse right away (and make sure it doesn't still try a second parse for method bodies :) ).

As per my earlier comment, let us not focus exclusively on time to the exclusion of space. Case in point, while we are in this call stack:

LookupEnvironment.completeTypeBindings() line: 558	
Compiler.internalBeginToCompile(ICompilationUnit[], int) line: 880	
Compiler.beginToCompile(ICompilationUnit[]) line: 393	
Compiler.compile(ICompilationUnit[], boolean) line: 443	
Compiler.compile(ICompilationUnit[]) line: 425	
BatchImageBuilder(AbstractImageBuilder).compile(SourceFile[], SourceFile[], boolean) line: 414	
BatchImageBuilder.compile(SourceFile[], SourceFile[], boolean) line: 211	
BatchImageBuilder(AbstractImageBuilder).compile(SourceFile[]) line: 345	
BatchImageBuilder.build() line: 79	
JavaBuilder.buildAll() line: 286	
JavaBuilder.build(int, Map, IProgressMonitor) line: 192	

we have ALL the source files parsed in diet mode and that is perfectly good enough to carry out the operation of "completing type bindings" which has these stages:

		CHECK_AND_SET_IMPORTS,
		CONNECT_TYPE_HIERARCHY,
		SEAL_TYPE_HIERARCHY,
		BUILD_FIELDS_AND_METHODS,
		INTEGRATE_ANNOTATIONS_IN_HIERARCHY,
		CHECK_PARAMETERIZED_TYPES

Now I worry what is the implication of a full parse at this point for memory utilization - we will be holding complete parse trees of all compilation units which is not necessary for the operation at hand.

While we may observe a 6 second saving for the platform workspace, are there projects out there where the memory pressure could zoom out of control, resulting in thrashing?

@srikanth-sankaran
Contributor

As this challenges one of the foundations of JDT's design, any change must be done with utmost caution.

Some comments:

All such operations where only a single file is parsed complete in a blink of an eye anyway.

Consider the example of code completion: indeed parsing may be super fast, but when later the AST is resolved not having to resolve uninteresting methods may very well make a significant difference. Type inference is a candidate that can take several seconds per file.

Correct: in a large file, if code assist starts resolving fuller parse trees, we will start seeing sluggish interactive performance.
I also second the point about inference in some corner (and not-so-corner) cases taking significant time - this can cause a slowdown in interactive performance at a distant point in the source.

MatchLocator takes the opposite approach, its method purgeMethodStatements() implements what @srikanth-sankaran mentioned above: remove statements after a full parse. But this method cannot directly be re-used for completion, as it uses a search-specific approach.

I think it would be very hard to demonstrate that a change of strategy would be beneficial for all uses of the compiler.

I could imagine, though, an approach that specifically tells the compiler when it is called by a builder or as batch compiler that it should do a full parse right away (and make sure it doesn't still try a second parse for method bodies :) ).

@srikanth-sankaran
Contributor

SourceFile intentionally does not cache the content

@srikanth-sankaran if the CU doesn't cache, do you see value in caching the source in CompilationUnitDeclaration instead? At least this could mean we no longer need Parser.stashTextualRepresentation(FunctionalExpression) :)

I can connect the dots from across different tickets and make out you don't like this design 😆

To answer: maybe.

@jukzi
Contributor Author

jukzi commented Jul 10, 2024

Now I worry what is the implication of a full parse at this point for memory utilization - we will be holding complete parse trees of all compilation units which is not necessary for the operation at hand.

That's an argument. So maybe the initial idea of having two parse phases is to use less memory. I took a measurement: building the platform workspace, I put a breakpoint while compiling the biggest project, which is jdt.ui with 1759 sourceUnits. The breakpoint was at the completeTypeBindings() call. I forced a GC and took a heap dump; I did it once with a diet parse and another time with a full parse.
With a full parse the heap dump was ~481 MB.
With a diet parse the heap dump was ~415 MB.

The difference is a total of ~ 66 MB (+15%) distributed like this:
[screenshot: heap dump comparison]
An example GC root of the most added memory is:
[screenshot: example GC root]

At this point I don't draw any conclusion yet, but just want to share the result.

@stephan-herrmann
Contributor

@srikanth-sankaran if the CU doesn't cache, do you see value in caching the source in CompilationUnitDeclaration instead? At least this could mean we no longer need Parser.stashTextualRepresentation(FunctionalExpression) :)

I can connect the dots from across different tickets and make out you don't like this design 😆

There's nothing "wrong" with your design, only a funny connection between a trick in the parser and a rabbit-out-of-the-hat in LE.copy() :) - my comment was only meant to suggest that I prefer one design that cleanly covers several issues, rather than several strategies. OTOH, memory considerations already speak against my suggestion.

@stephan-herrmann
Contributor

stephan-herrmann commented Jul 10, 2024

The difference is a total of ~ 66 MB (+15%) distributed like this:

Compared to an 11% speedup, this sounds like a consideration we would perhaps want to pass on to users to decide? OTOH, build time has no hard limit, memory may have one ...

@srikanth-sankaran
Contributor

The difference is a total of ~ 66 MB (+15%) distributed like this:

Compared to an 11% speedup, this sounds like a consideration we would perhaps want to pass on to users to decide? OTOH, build time has no hard limit, memory may have one ...

I am not aware of serious complaints about build times. It is great to proactively track and improve, of course. That said, if our performance improvement resource budget is limited - which I suspect it is - I would imagine there ought to be other things we can pursue that don't have an equivalent downside risk. This is not to be seen as discouraging further exploration here.

@jukzi
Contributor Author

jukzi commented Jul 11, 2024

I am not aware of serious complaints about build times

Nothing to complain about, but if we could make it faster, my colleagues who are waiting 15 min each day for their builds to complete could use their time better than for waiting. So if you have any idea how to make it faster, please tell. According to my measurements there is no hotspot except the file accesses on Windows.

I think we could also avoid the second reading of the files by caching a SoftReference of the content, as I have plenty of unused memory... but for now it's vacation time.

@iloveeclipse
Member

I think we could also avoid the second reading of the files by caching a SoftReference of the content

That would be a less invasive change than everything I saw proposed before.

@stephan-herrmann
Contributor

my colleagues who are waiting 15min each day for their builds

In my experience the most significant waste of time typically happens when builders are triggered unnecessarily, in particular when builders trigger each other, or when full builds happen after a restart although the previous exit should have saved a clean state, etc. Perhaps this was already improved recently, and anyway those issues are hard to reproduce & fix, but to me that sounds more like dealing with minutes rather than the few seconds reported above.

ymmv

jukzi pushed a commit to jukzi/eclipse.jdt.core that referenced this issue Sep 4, 2024
During compile parsing happens in two stages:
1. diet parse (any blocks like method bodies are skipped)
2. parse bodies
Both phases did read the source .java file from file system. With this
change the file contents is kept until no longer needed. It is cached in
a SoftReference to avoid OutOfMemoryError.

eclipse-jdt#2691
jukzi linked a pull request Sep 4, 2024 that will close this issue
jukzi pushed a commit that referenced this issue Oct 7, 2024
During compile parsing happens in two stages:
1. diet parse (any blocks like method bodies are skipped)
2. parse bodies
Both phases did read the source .java file from file system. With this
change the file contents is kept in CompilationResult.contentRef until
no longer needed. It is cached in
a SoftReference to avoid OutOfMemoryError.

#2691
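The merged change keeps the file contents in CompilationResult.contentRef. A minimal sketch of that caching pattern follows (simplified; the field name is taken from the commit message above, the accessor shape is an assumption, not a copy of the actual patch):

import java.lang.ref.SoftReference;

public class CompilationResult {
    // Source read by the first (diet) parse; the GC may clear it under memory pressure.
    private SoftReference<char[]> contentRef;

    public void cacheContents(char[] contents) {
        this.contentRef = new SoftReference<>(contents);
    }

    // Returns the cached source, or null if it was never set or was already collected.
    public char[] getCachedContents() {
        SoftReference<char[]> ref = this.contentRef;
        return ref == null ? null : ref.get();
    }

    public void releaseContents() {
        this.contentRef = null; // not needed once the method bodies are parsed
    }
}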