Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

review: chore: Try to list new errors in javadoc regression check #4025

Merged
merged 11 commits into from
Jul 7, 2021

Conversation

I-Al-Istannen
Copy link
Collaborator

Motivation

Before this patch the CI Javadoc regression check would only look at the amount of violations and fail the build if they increased:

JAVADOC QUALITY SCORE (lower is better)
    Compare: 2584
    Current: 2592

Javadoc quality has deteriorated!
Run the chore/ci-checkstyle-javadoc.sh script locally to find errors
See https://github.com/inria/spoon/issues/3923 for details

This wasn't really helpful when actually finding the errors, as you would need to run the checkstyle command locally on both branches and diff the output yourself.

Changes

This patch adds a small python script that tries to (somewhat intelligently) perform this diff in the CI and print a more useful failure message.

The line numbers in the output will not be correct if the exact same violation message appears multiple times in a single file, but it is better than having no feedback.

Help wanted

How do you test changes that affect your CI process? I tested them locally by:

  1. Checking out this PR's branch
  2. Randomly deleting some helpful javadoc
  3. Committing the changes
  4. Running ./chore/ci-checkstyle-javadoc.sh COMPARE_WITH_MASTER

This, however, doesn't quite help me verify it doesn't crash in your CI environment.

Example output

JAVADOC QUALITY SCORE (lower is better)
    Compare: 2584
    Current: 2592
    
Javadoc quality has deteriorated!
Run the chore/ci-checkstyle-javadoc.sh script locally to find errors
See https://github.com/inria/spoon/issues/3923 for details


You likely added the following new errors. Line numbers might be incorrect if there are multiple violations of the same type in the file.
  [ERROR] /tmp/spoon/src/main/java/spoon/reflect/visitor/CtVisitor.java:116:10: Erwartetes Tag @param für '<T>'. [JavadocMethod]
  [ERROR] /tmp/spoon/src/main/java/spoon/reflect/visitor/CtVisitor.java:543:54: Erwartetes Tag @param für 'pattern'. [JavadocMethod]
  [ERROR] /tmp/spoon/src/main/java/spoon/reflect/factory/CoreFactory.java:131: Fehlendes @return-Tag. [JavadocMethod]
  [ERROR] /tmp/spoon/src/main/java/spoon/reflect/factory/CoreFactory.java:136:10: Erwartetes Tag @param für '<T>'. [JavadocMethod]
  [ERROR] /tmp/spoon/src/main/java/spoon/reflect/code/CtTypePattern.java:40: Fehlendes @return-Tag. [JavadocMethod]
  [ERROR] /tmp/spoon/src/main/java/spoon/reflect/code/CtTypePattern.java:46: Fehlendes @return-Tag. [JavadocMethod]
  [ERROR] /tmp/spoon/src/main/java/spoon/reflect/code/CtTypePattern.java:47:10: Erwartetes Tag @param für '<C>'. [JavadocMethod]
  [ERROR] /tmp/spoon/src/main/java/spoon/reflect/code/CtTypePattern.java:47:71: Erwartetes Tag @param für 'variable'. [JavadocMethod]

@slarse
Copy link
Collaborator

slarse commented Jul 2, 2021

Hi @I-Al-Istannen,

This is definitely an improvement, but looking at your Python script, I think that going "all the way" isn't that much harder. By all the way, I mean the following redesign of the whole thing:

  1. Run Javadoc checkstyle on the current branch
  2. Run git diff with the master branch
  3. Find all checkstyle violations that appear on added lines present in the diff
    • If there are any violations, print violations and fail build
    • Otherwise, all good!

This isn't really that hard to implement. It's relatively easy to parse the changed lines from a Git diff (see e.g. this parse function), and from there all that remains is to filter by file and line range (see e.g. this filtering function).

WDYT?

How do you test changes that affect your CI process?

Activate GitHub Actions in your fork, and run it there. You can also change which branch you compare with if you don't want to mess up the master branch.

@slarse
Copy link
Collaborator

slarse commented Jul 2, 2021

Of course, if you feel like that's too much work, any improvement is of course welcome.

I'm also thinking it would be beneficial to convert the whole script into Python, I can't quite recall why I wrote it in bash :)

@I-Al-Istannen
Copy link
Collaborator Author

That's an interesting approach, I'll try it out. I'll have to see where exactly checkstyle reports its errors, I image it might not catch the following. We'll see.

 /*
- * @param foobar a param
  *
  */
 public void foo(String foobar);

I wanted to avoid tinkering with your existing scripts too much, that's why it takes the checkstyle output files and keeps the bash script. I wasn't sure whether any other part of your project depends on them, sometimes bash scripts are used in interesting ways from different places :)

As you seem to be fine with a change there, I could try to remove the middleman.

I'm also thinking it would be beneficial to convert the whole script into Python, I can't quite recall why I wrote it in bash :)

That's what I always end up with when hacking together CI pipelines without worrying too much about it, so maybe that was a simiiar motivation ;) Bash's quite convenient if you just shell out to other projects and combine some tools.

I will give diff parsing a go. Thankfully it isn't quite as high stakes as this one...

@slarse
Copy link
Collaborator

slarse commented Jul 2, 2021

I image it might not catch the following. We'll see.

A, right, that will of course not be caught! So, added violations are easy, but actual regressions are harder. Need to think a bit about that.

@I-Al-Istannen
Copy link
Collaborator Author

I-Al-Istannen commented Jul 2, 2021

Sorry for the wall of text :/

So far I am not convinced you can accurately attribute the regressions to their line numbers without investing quite a bit of effort.


  • We can ignore deletions that happen inside a function, as they won't ever impact the javadoc coverage.
  • We can ignore deletions that happen in the declaration of a function, as they will count as an "added" line somewhere and checkstyle should report them.
  • We can not ignore deletions that happen anywhere in the javadoc, as they might cause a regression.

If we notice that a deleted line touches any javadoc, the diff might not have the corresponding function/class declaration checkstyle reports errors for (the context of a diff is limited to X lines). So we'd need to look at the original file on disk and find the corresponding function/class declaration. That is quite doable.
Using this we can assume that we know the exact line (and therefore all violations) checkstyle reported for any (relevant) touched function/class. If the whole element was deleted, checkstyle will not report anything and we can drop the match.

Now that we know the errors checkstyle reports for an element we know was touched in the diff, we need to figure out if they are new. To do this we need to figure out the corresponding element in the comparison branch. As we know this change only impacted Javadoc, the qualified name of the element is the same (iff the containing class didn't change its name). So we could try to find the element with the same name and parameters in the comparison branch and see if the errors existed there.

This would require special handling for types, fields and methods as their qualified name and signature is made up of different elements.

I think we'd end up with a considerably more complex algorithm that can very well contain some bugs. I am not sure that'd be worth it. The CI job should block all regressions and provide useful starting points for fixing them, it shouldn't be complex enough to easily contain bugs and doesn't have to absolutely perfect at highlighting the location.


We could also just assume that nobody in their right mind would introduce a regression in things they didn't actually add in the PR and perform the current dance of comparing absolute numbers, using the logic in this PR to match up error messages but be smarter about added methods, so that their location always corresponds to the actual change the contributor made. I am not sure how much extra work that'd be.


It is quite probable that there is a neat algorithm I couldn't think of. Matching elements against each other sounds like it should be a solved problem.

@slarse
Copy link
Collaborator

slarse commented Jul 3, 2021

Thanks for the analysis, I agree that it sounds like way more effort than it's worth.

And you know what, it's not only deletions that are problematic. Think about this case where we add a new parameter, but forget to document it:

 /**
  * Sum numbers.
  * @param first The first number
  * @param second The second number
  * @return Sum of the provided numbers
  */
 public int sum(
     int first,
-    int second) {
+    int second,
+    int third) {
[...]

Even in this simple case of introducing a violation of the javadoc rule, we can't just look at the added lines, we need to match methods to added lines (as the parameters are spread across multiple lines). So clearly, I didn't think this one through well enough, and my initial best-effort of just counting before-and-after is probably the most fool-proof way of an objective measure of "you made it worse".

On a side note, we could probably get to a pretty decent implementation using gumtree-spoon to match elements across revisions, but creating such a utility would be a project in and of itself. A possibly pretty cool one though, I'm not aware of any off-the-shelf tool for Java for tracking changes across revisions like this.

Sorry if I created extra work for you! In any case, let's go with your original idea, let me know when it's ready to review.

@I-Al-Istannen I-Al-Istannen force-pushed the chore/ci-javadoc-error-messages branch 3 times, most recently from 556bf8d to 94da700 Compare July 4, 2021 09:19
Before this patch the CI Javadoc regression check would only look at the
amount of violations and fail the build if they increased:

```
JAVADOC QUALITY SCORE (lower is better)
    Compare: 2584
    Current: 2592

Javadoc quality has deteriorated!
Run the chore/ci-checkstyle-javadoc.sh script locally to find errors
See INRIA#3923 for details
```

This wasn't really helpful when actually finding the errors, as you
would need to run the checkstyle command locally on both branches and
diff the output yourself. This patch adds a small python script that
tries to (somewhat intelligently) perform this diff in the CI and print
a more useful failure message.

The line numbers in the output will not be correct if the exact same
violation message appears multiple times in a single file, but it is
better than having no feedback.
@I-Al-Istannen I-Al-Istannen force-pushed the chore/ci-javadoc-error-messages branch from 5bfde4b to d3b404e Compare July 4, 2021 09:57
@I-Al-Istannen
Copy link
Collaborator Author

Sorry if I created extra work for you!

No harm done, it was an interesting tangent :) Gumtree-spoon sounds like the solution to the solved problem, that could be an interesting project. But I will pass on that for now ;)


I think the code works in the CI as well, as I tested it on my fork now. GH apparently formats the output, but it is still readable IMHO:
grafik

I can also convert the bash script and integrate it in the python script if you'd like. If not, I'd mark the PR as ready for review.

Copy link
Collaborator

@slarse slarse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks mostly good! I think we can start with just adding this Python script, and then converting the whole thing into Python in a separate PR. This is kind of standalone.

Anyway, the thing I noted in your Python code is that it's mostly about computing amounts of occurrences and differences between those. There's built-in support for that in the stdlib, so I've added some comments on how the code can be simplified (probably, I haven't actually tested any of this :). Have a look and see what you think.

chore/find_javadoc_regression.py Outdated Show resolved Hide resolved
chore/find_javadoc_regression.py Outdated Show resolved Hide resolved
Comment on lines 41 to 54
def try_match_lines(reference: Dict[str, int], other: Dict[str, int]) -> Dict[str, int]:
"""Tries to find out which lines are *new* in other and which were already present in the reference."""
other = other.copy()
for line, amount in reference.items():
for _ in range(0, amount):
if line in other:
if other[line] > 1:
other[line] -= 1
else:
other.pop(line)

# Lines that are only in the reference aren't critical, as they were
# apparently fixed in "other"
return other
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use collections.Counter, this can be computed as a subtraction

Suggested change
def try_match_lines(reference: Dict[str, int], other: Dict[str, int]) -> Dict[str, int]:
"""Tries to find out which lines are *new* in other and which were already present in the reference."""
other = other.copy()
for line, amount in reference.items():
for _ in range(0, amount):
if line in other:
if other[line] > 1:
other[line] -= 1
else:
other.pop(line)
# Lines that are only in the reference aren't critical, as they were
# apparently fixed in "other"
return other
def try_match_lines(reference: collections.Counter[str], other: collections.Counter[str]) -> Dict[str, int]:
"""Tries to find out which lines are *new* in other and which were already present in the reference."""
return other - reference

Again, maybe it doesn't even need to be a function of its own anymore

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll leave it as a function because I like the comment trying to explain why we subtract them in that order. It is probably perfectly clear without it, but I am slightly more positive that I will understand what I did in a year with it present.

chore/find_javadoc_regression.py Show resolved Hide resolved
@I-Al-Istannen I-Al-Istannen changed the title WIP: chore: Try to list new errors in javadoc regression check review: chore: Try to list new errors in javadoc regression check Jul 6, 2021
@I-Al-Istannen
Copy link
Collaborator Author

Workflow run for dfcca17 with a failure can be found here and without a failure here.

Copy link
Collaborator

@slarse slarse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

I had one teeny tiny note in the script, but it's not important. You can address it if you'd like, I'll merge tomorrow morning regardless.

If you feel like it, translating the entire script into Python (say Python 3.6 for good compatibility) is appreciated.

Hide Counter typehints Python 3.6 doesn't understand yet

Right, I forgot that using collections like that was a new thing. But darn is it nice to be able to just type list[str] instead of from typing import List; List[str] :)

chore/find_javadoc_regression.py Outdated Show resolved Hide resolved
@@ -72,6 +83,8 @@ function main() {
echo "Javadoc quality has deteriorated!"
echo "Run the chore/ci-checkstyle-javadoc.sh script locally to find errors"
echo "See $RELATED_ISSUE_URL for details"
echo -e "\n"
python3 chore/find_javadoc_regression.py "$OUTPUT_FILE_COMPARE" "$OUTPUT_FILE_OWN"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a good reason why we'd want to compose this whole thing into Python: the name of the Python executable is something of a moving target right now. Ever since Python 2 went belly-up, the conventions have been in flux. Some systems now only have python.

We want it to be possible to run this script locally with as little effort as possible.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your ci-extra script actually broke in a docker container I tried it in because the ubuntu latest container symlinked python to python2 and your scripts require python 3. That's why I opted for python3 as the executable name here.

But yea, the whole situation is a bit unfortunate...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, ci-extra.sh requiring a little bit of tinkering to get running is fine IMO, as it's not meant to run locally. But as this script is meant to be run locally both to fix regressions and to actually improve the Javadoc quality as part of #3923, we'll want to make it as hassle-free as humanly possible.

Python 3.6+ should arguably be more hassle-free than bash, as the former also works on (native) Windows.

Co-authored-by: Simon Larsén <[email protected]>
@slarse slarse merged commit eaa49d3 into INRIA:master Jul 7, 2021
@slarse
Copy link
Collaborator

slarse commented Jul 7, 2021

Thanks @I-Al-Istannen, very nice work!

@I-Al-Istannen I-Al-Istannen deleted the chore/ci-javadoc-error-messages branch July 7, 2021 07:57
@monperrus monperrus mentioned this pull request Aug 19, 2021
woutersmeenk pushed a commit to woutersmeenk/spoon that referenced this pull request Aug 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants