-
-
Notifications
You must be signed in to change notification settings - Fork 355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
review: perf: Precompile patterns for identifier checks #4072
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @SirYwell,
This is great! I left one suggestion for how we might be able to squeeze out even more performance.
On a side note, if this turns out to still be a bottleneck after this optimization, we might want to try not using regex at all as it's kind of slow. It's probably even faster to just write custom search functions that operate directly on the string.
src/main/java/spoon/support/reflect/reference/CtReferenceImpl.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's ignore the optimization I suggested earlier for now. See my two new comments.
private static final Pattern IS_ARRAY_OR_INSTANCE = Pattern.compile("\\[\\]|@"); | ||
private static final Pattern IS_INNER_OR_GENERIC = Pattern.compile("\\.|<|>"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two patterns are not used to identify arrays or generics, so I think the names are a bit misleading. Could you rename them appropriately? E.g. IS_INNER_OR_GENERIC
would be more fittingly named NESTED_OR_GENERIC_SPLITTER
or something like that.
@@ -101,9 +105,9 @@ private void checkIdentiferForJLSCorrectness(String simplename) { | |||
*/ | |||
//JDTTreeBuilderHelper.computeAnonymousName returns "$numbers$Name" so we have to skip them if they start with numbers | |||
//allow empty identifier because they are sometimes used. | |||
if (!simplename.matches("<.*>|\\d.*|^.{0}$")) { | |||
if (!IS_INNER_OR_GENERIC_OR_EMPTY.matcher(simplename).matches()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I look at this, the more I'm thinking it's completely unnecessary to use a regex for it. The regex only actually checks the first and last characters so it seems redundant to use a regex at all. Seems to me like something like this should be faster and work just as well:
private static boolean isEmptyOrLocalOrGeneric(String identifier) {
return identifier.isEmpty()
|| Character.isDigit(identifier.charAt(0))
|| (identifier.startsWith("<") && identifier.endsWith(">"));
}
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh nice, a human, static compilation of the regexp :) 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a misunderstanding on my side, it's actually not about generics but about names like <init>
there. I re-used your code there, the check for anonymous/local classes was moved to the other part of the code.
I'll look into it in a few days again, thanks for your feedback so far. |
I spend a little bit more time on this and rewrote the logic without any regular expressions. I wrote microbenchmarks with JMH (https://github.com/SirYwell/spoon-benchmark, can be run with I changed the In general, the new implementation is more restrictive (e.g. I also tried to add dense documentation to avoid further confusion. The bechmark result when running it on my machine (I rearranged the lines):
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Starting to look very good! This is significantly more explicit than using a regex. As we don't have any good performance regression tests at the moment, this PR becomes more valuable when it also improves the code itself. Very nice restructuring of the entire thing, as we already iterated over the entire string in checkIdentifierChars
in addition to matching with regex, your restructuring is simply much better.
I have a few comments on the final code but after that I think we can merge.
As a general comment, I'd recommend toning down the use of inline comments. In many cases in this PR, they either cover up insufficiently detailed code (e.g. saying that 0
is a character to expect nothing in particular can be rewritten with a variable name) or state the obvious (e.g. that a keyword is not allowed in an identifier, when the code literally says if this is a keyword return false
).
if (start == i) { // first char of a part | ||
if (!Character.isJavaIdentifierStart(name.charAt(i))) { | ||
return false; | ||
} | ||
} else { | ||
if (!Character.isJavaIdentifierPart(name.charAt(i))) { | ||
return false; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Collapse?
if (start == i) { // first char of a part | |
if (!Character.isJavaIdentifierStart(name.charAt(i))) { | |
return false; | |
} | |
} else { | |
if (!Character.isJavaIdentifierPart(name.charAt(i))) { | |
return false; | |
} | |
} | |
if (i == start && !Character.isJavaIdentifierStart(name.charAt(i)) | |
|| !Character.isJavaIdentifierPart(name.charAt(i))) { | |
return false; | |
} |
i++; | ||
} | ||
int start = i; // used to mark the beginning of a part | ||
char expectNext = 0; // 0 = do not expect anything |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid unnamed magic values :)
char expectNext = 0; // 0 = do not expect anything | |
final char anything = 0; | |
char expectNext = anything; |
int start = i; // used to mark the beginning of a part | ||
char expectNext = 0; // 0 = do not expect anything | ||
for (; i < name.length(); i++) { | ||
if (expectNext != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idem
if (expectNext != 0) { | |
if (expectNext != anything) { |
if (name.charAt(i) != expectNext) { | ||
return false; | ||
} else if (name.charAt(i) == expectNext) { | ||
expectNext = 0; // reset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expectNext = 0; // reset | |
expectNext = anything; |
- manually, because Github does not seem to allow additional changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spend a little bit more time on this and rewrote the logic without any regular expressions. I wrote microbenchmarks with JMH (https://github.com/SirYwell/spoon-benchmark, can be run with ./gradlew jmh).
Completely missed this comment, but those numbers look very promising! I'm looking forward to trying this out with some of my dependent projects, based on some ad-hoc tests I ran it looks like a pretty substantial performance improvement.
Anyway, this all looks good to me now!
Many thanks @SirYwell, this is an excellent contribution. |
@I-Al-Istannen and I started looking into ways to improve performance of https://github.com/I-Al-Istannen/JavadocApi.
When looking into profilings, I noticed that
Pattern.compile()
had a very high invocation count (in fact, it was the highest of all traced methods). I tracked down the callees and ended up in in methods ofCtReferenceImpl
where regular expressions were used in combination withString#matches()
,String#replaceAll()
andString#split()
. Precompiling these patterns seems to have a very high impact in our case. Some of the comparisons we've done with and without this patch:From @I-Al-Istannen with the linked JavadocApi:
(scanning JavaFX, upper is without patch, lower is with patch)
(scanning OpenJDK 16, upper is without patch, lower is with patch)
With YourKit, I profiled the
CtGenerationTest
, this is a screenshot of the differences focused on thecheckIdentifierForJLSCorrectness
method: