Clean build: SourceFile(s) read twice #2691
Both reads happen during reproducible with org.eclipse.jdt.core.tests.builder.AnnotationDependencyTests.testTypeAnnotationDependency()
In Compiler.internalBeginToCompile(), a decision is made whether to do a full parse or only a dietParse. By default it is only a dietParse because parseThreshold<=0. The diet parse skips the method bodies, so a second parse has to occur later to get them. That second parse is done in another thread ("Compiler Processing Task"). Nevertheless, the main compilation has to wait for that result anyway.
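To make the double read concrete, here is a minimal toy sketch of the two strategies. It is not JDT code: `ToyUnit`, its fields and its three methods are hypothetical names invented for illustration, and the "parsing" is reduced to splitting the text at the outermost braces. It only shows that diet parse plus a later body parse fetches the source twice, while a full parse fetches it once.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a compilation unit; not a JDT class. */
final class ToyUnit {
    private final Supplier<String> sourceReader; // e.g. reads the .java file from disk
    int reads;      // how often the source was fetched
    String header;  // filled by the diet pass
    String body;    // filled by the body pass, null until then

    ToyUnit(Supplier<String> sourceReader) {
        this.sourceReader = sourceReader;
    }

    private String readSource() {
        reads++;
        return sourceReader.get();
    }

    /** Pass 1: keep only what precedes the first '{' (bodies are skipped). */
    void dietParse() {
        header = headerOf(readSource());
    }

    /** Pass 2: fetch the source again and fill in the skipped body. */
    void parseBodies() {
        body = bodyOf(readSource());
    }

    /** Alternative: one fetch, header and body in a single pass. */
    void fullParse() {
        String src = readSource();
        header = headerOf(src);
        body = bodyOf(src);
    }

    private static String headerOf(String src) {
        int open = src.indexOf('{');
        return (open < 0 ? src : src.substring(0, open)).trim();
    }

    private static String bodyOf(String src) {
        int open = src.indexOf('{');
        int close = src.lastIndexOf('}');
        return (open < 0 || close < open) ? "" : src.substring(open + 1, close).trim();
    }
}
```

Calling `dietParse()` and then `parseBodies()` leaves `reads` at 2, whereas `fullParse()` leaves it at 1 with the same `header` and `body`.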
See https://github.com/eclipse-jdt/eclipse.jdt.core/wiki/ECJ-Parse
I'm not a compiler expert (but @srikanth-sankaran is), but I assume the diet parse looks for obvious errors first (as it skips method bodies), e.g. if the class definition is already wrong, there is no reason to parse methods and do more work when the basics are broken anyway.
For a number of IDE operations it may suffice to work with empty method bodies altogether or drill down into just one method alone (where the cursor is). Think of providing an outline view, generating a Java model, code completion, jump to declaration (F3), search for overrides, hierarchy computations etc. So a dietParse need not always be followed by a parse of all method bodies. In the batch compiler, it is certainly the case that we will populate all method bodies. But even here, the design breakdown into various phases may aesthetically fit better with the model of dietParse followed by a parse of method bodies. This is just one design. Javac doesn't have an equivalent mode. Do you have numbers for a large project's full build? Any change here would also have to take memory utilization into account. We also have the capability, and use it, to null out method bodies once we no longer require them.
All such operations where only a single file is parsed complete in the blink of an eye anyway. A performance difference can only be perceived when parsing many files, as in a build, a search or a type hierarchy computation.
See screenshots in #2691 (comment)
Well, by all means propose a patch and let us see what it takes to make it green with the full test suite, and then take a call. Short answer: there is nothing sacrosanct about the split into a diet parse plus populating that diet AST with parsed method bodies. As I said, javac doesn't follow such an approach. It may be significant work, though, given this is a long-standing design choice that is pervasive in the implementation. But before you invest much effort, let us ask @stephan-herrmann for his opinion on this too, to make sure I haven't overlooked some key observation here.
Use parse() instead of dietParse() to avoid parse during Parser.getMethodBodies() eclipse-jdt#2691
As this challenges one of the foundations of JDT's design, any change must be done with utmost caution. Some comments:
Consider the example of code completion: indeed parsing may be super fast, but when the AST is later resolved, not having to resolve uninteresting methods may very well make a significant difference. Type inference is a candidate that can take several seconds per file. MatchLocator takes the opposite approach: its method purgeMethodStatements() implements what @srikanth-sankaran mentioned above, i.e. it removes statements after a full parse. But this method cannot directly be re-used for completion, as it uses a search-specific approach. I think it would be very hard to demonstrate that a change of strategy would be beneficial for all uses of the compiler. I could imagine, though, an approach that specifically tells the compiler, when it is called by a builder or as the batch compiler, that it should do a full parse right away (and make sure it doesn't still try a second parse for method bodies :) ).
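The "full parse first, then drop what is not needed" idea can be sketched roughly as follows. This is a toy illustration only, not the actual MatchLocator.purgeMethodStatements() logic: `ToyMethod`, `BodyPurger` and the `stillNeeded` predicate are hypothetical names, and the real method decides what to keep with search-specific criteria.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for a parsed method; not a JDT AST node. */
final class ToyMethod {
    final String name;
    List<String> statements; // null once purged

    ToyMethod(String name, List<String> statements) {
        this.name = name;
        this.statements = statements;
    }
}

final class BodyPurger {
    /**
     * Drops the statements of every method the caller no longer needs,
     * so that they become eligible for garbage collection.
     * Returns the number of methods that were purged.
     */
    static int purge(List<ToyMethod> methods, Predicate<ToyMethod> stillNeeded) {
        int purged = 0;
        for (ToyMethod method : methods) {
            if (method.statements != null && !stillNeeded.test(method)) {
                method.statements = null;
                purged++;
            }
        }
        return purged;
    }
}
```

A completion-like caller would pass a predicate that is true only for the method under the cursor, which is the part that differs per use case and is why a single shared purge strategy is hard to come by.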
Do we know how much of the overhead relates to parsing vs. re-reading? Some implementations of
@srikanth-sankaran if the CU doesn't cache, do you see value in caching the source in CompilationUnitDeclaration instead? At least this could mean we no longer need Parser.stashTextualRepresentation(FunctionalExpression) :) Of course such cached source can be freed during compilation (if there are no functional expressions: after parseMethodBodies(), otherwise after resolve).
I am all for experimentation - but I echo @stephan-herrmann's concerns about In fact I would rephrase his last sentence and say So let us by all means explore but Festina lente! Let us make haste slowly!
As per my earlier comment, let us not focus exclusively on time to the exclusion of space. Case in point, while we are in this call stack:
we have ALL the source files parsed in diet mode and that is perfectly good enough to carry out the operation of "completing type bindings" which has these stages:
Now I worry about the implication of a full parse at this point for memory utilization: we would be holding complete parse trees of all compilation units, which is not necessary for the operation at hand. While we may observe a saving of 6 seconds for the platform workspace, are there projects out there where the memory pressure could spiral out of control and result in thrashing?
Correct, in a large file if code assist starts resolving fuller parse trees, we will start seeing sluggish interactive performance.
I can connect the dots from across different tickets and make out that you don't like this design 😆 To answer: maybe.
There's nothing "wrong" with your design, only a funny connection between a trick in the parser and a rabbit-out-of-the-hat in LE.copy() :) My comment was only meant to suggest that I prefer one design that cleanly covers several issues, rather than several strategies. OTOH, memory considerations already speak against my suggestion.
Compared to an 11% speedup, this sounds like a consideration we would perhaps want to pass on to users to decide? OTOH, build time has no hard limit, memory may have ...
I am not aware of serious complaints about build times. It is great to proactively track and improve, of course. That said, if our resource budget for performance improvements is limited, which I suspect it is, I would imagine there are other things we can pursue that don't carry an equivalent downside risk. This is not meant to discourage further exploration here.
Nothing to complain about, but if we could make it faster, my colleagues who wait 15 minutes each day for their builds to complete could use their time better than for waiting. So if you have any idea how to make it faster, please tell. According to my measurements there is no hotspot except the file accesses on Windows. I think we could also avoid the second reading of the files by caching a SoftReference to the content, as I have plenty of unused memory... but for now it's vacation time.
That would be a less invasive change than anything I have seen proposed so far.
In my experience the most significant waste of time typically happens when builders are triggered unnecessarily, in particular when builders trigger each other, or when full builds happen after a restart although the previous exit should have saved a clean state, etc. Perhaps this was already improved recently, and anyway those issues are hard to reproduce and fix, but to me that sounds more like dealing with minutes rather than the few seconds reported above. YMMV.
During compile, parsing happens in two stages:
1. diet parse (any blocks like method bodies are skipped)
2. parse bodies
Both phases used to read the source .java file from the file system. With this change the file contents are kept until no longer needed. They are cached in a SoftReference to avoid an OutOfMemoryError. eclipse-jdt#2691
During compile, parsing happens in two stages:
1. diet parse (any blocks like method bodies are skipped)
2. parse bodies
Both phases used to read the source .java file from the file system. With this change the file contents are kept in CompilationResult.contentRef until no longer needed. They are cached in a SoftReference to avoid an OutOfMemoryError. eclipse-jdt#2691
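The caching idea described in this commit message could look roughly like the sketch below. It is an assumption-laden illustration, not the actual patch: `SoftCachedSource`, `loader`, `loads` and `release()` are hypothetical names, and only the field name `contentRef` is borrowed from the commit message. The point is that a `SoftReference` lets the second parse reuse the contents when memory allows, while the garbage collector stays free to reclaim them under pressure, in which case the source is simply read again.

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

/** Hypothetical holder for file contents; not the real CompilationResult. */
final class SoftCachedSource {
    private final Supplier<char[]> loader; // e.g. reads the .java file from disk
    private SoftReference<char[]> contentRef = new SoftReference<>(null);
    int loads; // how often the loader actually ran

    SoftCachedSource(Supplier<char[]> loader) {
        this.loader = loader;
    }

    /** Returns the cached contents, reloading them if the GC cleared the reference. */
    char[] getContents() {
        char[] cached = contentRef.get();
        if (cached != null) {
            return cached;
        }
        char[] fresh = loader.get();
        loads++;
        contentRef = new SoftReference<>(fresh);
        return fresh;
    }

    /** Drops the cache explicitly once the contents are no longer needed. */
    void release() {
        contentRef.clear();
    }
}
```

With this shape, the diet parse and the later body parse both call `getContents()`; the file is only touched twice if the reference was cleared in between or `release()` was called too early.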
During Project/"Clean all projects", SourceFile(s) are read multiple times.
Once from ReadManager, which is expected.
And later another time from Parser.getMethodBodies(). Without further digging, I guess this is wrong. The code there is prepared to use a ReadManager, but readManager=null.