-
-
Notifications
You must be signed in to change notification settings - Fork 355
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
review: chore: Try to list new errors in javadoc regression check #4025
review: chore: Try to list new errors in javadoc regression check #4025
Conversation
Hi @I-Al-Istannen, This is definitely an improvement, but looking at your Python script, I think that going "all the way" isn't that much harder. By all the way, I mean the following redesign of the whole thing:
This isn't really that hard to implement. It's relatively easy to parse the changed lines from a Git diff (see e.g. this parse function), and from there all that remains is to filter by file and line range (see e.g. this filtering function). WDYT?
Activate GitHub Actions in your fork, and run it there. You can also change which branch you compare with if you don't want to mess up the master branch. |
Of course, if you feel like that's too much work, any improvement is of course welcome. I'm also thinking it would be beneficial to convert the whole script into Python, I can't quite recall why I wrote it in bash :) |
That's an interesting approach, I'll try it out. I'll have to see where exactly checkstyle reports its errors, I image it might not catch the following. We'll see. /*
- * @param foobar a param
*
*/
public void foo(String foobar); I wanted to avoid tinkering with your existing scripts too much, that's why it takes the checkstyle output files and keeps the bash script. I wasn't sure whether any other part of your project depends on them, sometimes bash scripts are used in interesting ways from different places :) As you seem to be fine with a change there, I could try to remove the middleman.
That's what I always end up with when hacking together CI pipelines without worrying too much about it, so maybe that was a simiiar motivation ;) Bash's quite convenient if you just shell out to other projects and combine some tools. I will give diff parsing a go. Thankfully it isn't quite as high stakes as this one... |
A, right, that will of course not be caught! So, added violations are easy, but actual regressions are harder. Need to think a bit about that. |
Sorry for the wall of text :/ So far I am not convinced you can accurately attribute the regressions to their line numbers without investing quite a bit of effort.
If we notice that a deleted line touches any javadoc, the diff might not have the corresponding function/class declaration checkstyle reports errors for (the context of a diff is limited to X lines). So we'd need to look at the original file on disk and find the corresponding function/class declaration. That is quite doable. Now that we know the errors checkstyle reports for an element we know was touched in the diff, we need to figure out if they are new. To do this we need to figure out the corresponding element in the comparison branch. As we know this change only impacted Javadoc, the qualified name of the element is the same (iff the containing class didn't change its name). So we could try to find the element with the same name and parameters in the comparison branch and see if the errors existed there. This would require special handling for types, fields and methods as their qualified name and signature is made up of different elements. I think we'd end up with a considerably more complex algorithm that can very well contain some bugs. I am not sure that'd be worth it. The CI job should block all regressions and provide useful starting points for fixing them, it shouldn't be complex enough to easily contain bugs and doesn't have to absolutely perfect at highlighting the location. We could also just assume that nobody in their right mind would introduce a regression in things they didn't actually add in the PR and perform the current dance of comparing absolute numbers, using the logic in this PR to match up error messages but be smarter about added methods, so that their location always corresponds to the actual change the contributor made. I am not sure how much extra work that'd be. It is quite probable that there is a neat algorithm I couldn't think of. Matching elements against each other sounds like it should be a solved problem. |
Thanks for the analysis, I agree that it sounds like way more effort than it's worth. And you know what, it's not only deletions that are problematic. Think about this case where we add a new parameter, but forget to document it: /**
* Sum numbers.
* @param first The first number
* @param second The second number
* @return Sum of the provided numbers
*/
public int sum(
int first,
- int second) {
+ int second,
+ int third) {
[...] Even in this simple case of introducing a violation of the javadoc rule, we can't just look at the added lines, we need to match methods to added lines (as the parameters are spread across multiple lines). So clearly, I didn't think this one through well enough, and my initial best-effort of just counting before-and-after is probably the most fool-proof way of an objective measure of "you made it worse". On a side note, we could probably get to a pretty decent implementation using gumtree-spoon to match elements across revisions, but creating such a utility would be a project in and of itself. A possibly pretty cool one though, I'm not aware of any off-the-shelf tool for Java for tracking changes across revisions like this. Sorry if I created extra work for you! In any case, let's go with your original idea, let me know when it's ready to review. |
556bf8d
to
94da700
Compare
Before this patch the CI Javadoc regression check would only look at the amount of violations and fail the build if they increased: ``` JAVADOC QUALITY SCORE (lower is better) Compare: 2584 Current: 2592 Javadoc quality has deteriorated! Run the chore/ci-checkstyle-javadoc.sh script locally to find errors See INRIA#3923 for details ``` This wasn't really helpful when actually finding the errors, as you would need to run the checkstyle command locally on both branches and diff the output yourself. This patch adds a small python script that tries to (somewhat intelligently) perform this diff in the CI and print a more useful failure message. The line numbers in the output will not be correct if the exact same violation message appears multiple times in a single file, but it is better than having no feedback.
5bfde4b
to
d3b404e
Compare
No harm done, it was an interesting tangent :) Gumtree-spoon sounds like the solution to the solved problem, that could be an interesting project. But I will pass on that for now ;) I think the code works in the CI as well, as I tested it on my fork now. GH apparently formats the output, but it is still readable IMHO: I can also convert the bash script and integrate it in the python script if you'd like. If not, I'd mark the PR as ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks mostly good! I think we can start with just adding this Python script, and then converting the whole thing into Python in a separate PR. This is kind of standalone.
Anyway, the thing I noted in your Python code is that it's mostly about computing amounts of occurrences and differences between those. There's built-in support for that in the stdlib, so I've added some comments on how the code can be simplified (probably, I haven't actually tested any of this :). Have a look and see what you think.
chore/find_javadoc_regression.py
Outdated
def try_match_lines(reference: Dict[str, int], other: Dict[str, int]) -> Dict[str, int]: | ||
"""Tries to find out which lines are *new* in other and which were already present in the reference.""" | ||
other = other.copy() | ||
for line, amount in reference.items(): | ||
for _ in range(0, amount): | ||
if line in other: | ||
if other[line] > 1: | ||
other[line] -= 1 | ||
else: | ||
other.pop(line) | ||
|
||
# Lines that are only in the reference aren't critical, as they were | ||
# apparently fixed in "other" | ||
return other |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use collections.Counter
, this can be computed as a subtraction
def try_match_lines(reference: Dict[str, int], other: Dict[str, int]) -> Dict[str, int]: | |
"""Tries to find out which lines are *new* in other and which were already present in the reference.""" | |
other = other.copy() | |
for line, amount in reference.items(): | |
for _ in range(0, amount): | |
if line in other: | |
if other[line] > 1: | |
other[line] -= 1 | |
else: | |
other.pop(line) | |
# Lines that are only in the reference aren't critical, as they were | |
# apparently fixed in "other" | |
return other | |
def try_match_lines(reference: collections.Counter[str], other: collections.Counter[str]) -> Dict[str, int]: | |
"""Tries to find out which lines are *new* in other and which were already present in the reference.""" | |
return other - reference |
Again, maybe it doesn't even need to be a function of its own anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'll leave it as a function because I like the comment trying to explain why we subtract them in that order. It is probably perfectly clear without it, but I am slightly more positive that I will understand what I did in a year with it present.
This reverts commit fbe2291.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I had one teeny tiny note in the script, but it's not important. You can address it if you'd like, I'll merge tomorrow morning regardless.
If you feel like it, translating the entire script into Python (say Python 3.6 for good compatibility) is appreciated.
Hide Counter typehints Python 3.6 doesn't understand yet
Right, I forgot that using collections like that was a new thing. But darn is it nice to be able to just type list[str]
instead of from typing import List; List[str]
:)
@@ -72,6 +83,8 @@ function main() { | |||
echo "Javadoc quality has deteriorated!" | |||
echo "Run the chore/ci-checkstyle-javadoc.sh script locally to find errors" | |||
echo "See $RELATED_ISSUE_URL for details" | |||
echo -e "\n" | |||
python3 chore/find_javadoc_regression.py "$OUTPUT_FILE_COMPARE" "$OUTPUT_FILE_OWN" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a good reason why we'd want to compose this whole thing into Python: the name of the Python executable is something of a moving target right now. Ever since Python 2 went belly-up, the conventions have been in flux. Some systems now only have python
.
We want it to be possible to run this script locally with as little effort as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your ci-extra
script actually broke in a docker container I tried it in because the ubuntu latest container symlinked python
to python2
and your scripts require python 3. That's why I opted for python3
as the executable name here.
But yea, the whole situation is a bit unfortunate...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, ci-extra.sh
requiring a little bit of tinkering to get running is fine IMO, as it's not meant to run locally. But as this script is meant to be run locally both to fix regressions and to actually improve the Javadoc quality as part of #3923, we'll want to make it as hassle-free as humanly possible.
Python 3.6+ should arguably be more hassle-free than bash, as the former also works on (native) Windows.
Co-authored-by: Simon Larsén <[email protected]>
Thanks @I-Al-Istannen, very nice work! |
Motivation
Before this patch the CI Javadoc regression check would only look at the amount of violations and fail the build if they increased:
This wasn't really helpful when actually finding the errors, as you would need to run the checkstyle command locally on both branches and diff the output yourself.
Changes
This patch adds a small python script that tries to (somewhat intelligently) perform this diff in the CI and print a more useful failure message.
The line numbers in the output will not be correct if the exact same violation message appears multiple times in a single file, but it is better than having no feedback.
Help wanted
How do you test changes that affect your CI process? I tested them locally by:
./chore/ci-checkstyle-javadoc.sh COMPARE_WITH_MASTER
This, however, doesn't quite help me verify it doesn't crash in your CI environment.
Example output