review: chore: Try to list new errors in javadoc regression check #4025

I-Al-Istannen · 2021-07-02T10:45:05Z

Motivation

Before this patch the CI Javadoc regression check would only look at the amount of violations and fail the build if they increased:

JAVADOC QUALITY SCORE (lower is better)
    Compare: 2584
    Current: 2592

Javadoc quality has deteriorated!
Run the chore/ci-checkstyle-javadoc.sh script locally to find errors
See https://github.com/inria/spoon/issues/3923 for details

This wasn't really helpful when actually finding the errors, as you would need to run the checkstyle command locally on both branches and diff the output yourself.

Changes

This patch adds a small python script that tries to (somewhat intelligently) perform this diff in the CI and print a more useful failure message.

The line numbers in the output will not be correct if the exact same violation message appears multiple times in a single file, but it is better than having no feedback.

Help wanted

How do you test changes that affect your CI process? I tested them locally by:

Checking out this PR's branch
Randomly deleting some helpful javadoc
Committing the changes
Running ./chore/ci-checkstyle-javadoc.sh COMPARE_WITH_MASTER

This, however, doesn't quite help me verify it doesn't crash in your CI environment.

Example output

JAVADOC QUALITY SCORE (lower is better)
    Compare: 2584
    Current: 2592
    
Javadoc quality has deteriorated!
Run the chore/ci-checkstyle-javadoc.sh script locally to find errors
See https://github.com/inria/spoon/issues/3923 for details


You likely added the following new errors. Line numbers might be incorrect if there are multiple violations of the same type in the file.
  [ERROR] /tmp/spoon/src/main/java/spoon/reflect/visitor/CtVisitor.java:116:10: Erwartetes Tag @param für '<T>'. [JavadocMethod]
  [ERROR] /tmp/spoon/src/main/java/spoon/reflect/visitor/CtVisitor.java:543:54: Erwartetes Tag @param für 'pattern'. [JavadocMethod]
  [ERROR] /tmp/spoon/src/main/java/spoon/reflect/factory/CoreFactory.java:131: Fehlendes @return-Tag. [JavadocMethod]
  [ERROR] /tmp/spoon/src/main/java/spoon/reflect/factory/CoreFactory.java:136:10: Erwartetes Tag @param für '<T>'. [JavadocMethod]
  [ERROR] /tmp/spoon/src/main/java/spoon/reflect/code/CtTypePattern.java:40: Fehlendes @return-Tag. [JavadocMethod]
  [ERROR] /tmp/spoon/src/main/java/spoon/reflect/code/CtTypePattern.java:46: Fehlendes @return-Tag. [JavadocMethod]
  [ERROR] /tmp/spoon/src/main/java/spoon/reflect/code/CtTypePattern.java:47:10: Erwartetes Tag @param für '<C>'. [JavadocMethod]
  [ERROR] /tmp/spoon/src/main/java/spoon/reflect/code/CtTypePattern.java:47:71: Erwartetes Tag @param für 'variable'. [JavadocMethod]

slarse · 2021-07-02T11:33:30Z

Hi @I-Al-Istannen,

This is definitely an improvement, but looking at your Python script, I think that going "all the way" isn't that much harder. By all the way, I mean the following redesign of the whole thing:

Run Javadoc checkstyle on the current branch
Run git diff with the master branch
Find all checkstyle violations that appear on added lines present in the diff
- If there are any violations, print violations and fail build
- Otherwise, all good!

This isn't really that hard to implement. It's relatively easy to parse the changed lines from a Git diff (see e.g. this parse function), and from there all that remains is to filter by file and line range (see e.g. this filtering function).

WDYT?

How do you test changes that affect your CI process?

Activate GitHub Actions in your fork, and run it there. You can also change which branch you compare with if you don't want to mess up the master branch.

slarse · 2021-07-02T11:36:58Z

Of course, if you feel like that's too much work, any improvement is of course welcome.

I'm also thinking it would be beneficial to convert the whole script into Python, I can't quite recall why I wrote it in bash :)

I-Al-Istannen · 2021-07-02T11:55:39Z

That's an interesting approach, I'll try it out. I'll have to see where exactly checkstyle reports its errors, I image it might not catch the following. We'll see.

 /*
- * @param foobar a param
  *
  */
 public void foo(String foobar);

I wanted to avoid tinkering with your existing scripts too much, that's why it takes the checkstyle output files and keeps the bash script. I wasn't sure whether any other part of your project depends on them, sometimes bash scripts are used in interesting ways from different places :)

As you seem to be fine with a change there, I could try to remove the middleman.

I'm also thinking it would be beneficial to convert the whole script into Python, I can't quite recall why I wrote it in bash :)

That's what I always end up with when hacking together CI pipelines without worrying too much about it, so maybe that was a simiiar motivation ;) Bash's quite convenient if you just shell out to other projects and combine some tools.

I will give diff parsing a go. Thankfully it isn't quite as high stakes as this one...

slarse · 2021-07-02T12:11:27Z

I image it might not catch the following. We'll see.

A, right, that will of course not be caught! So, added violations are easy, but actual regressions are harder. Need to think a bit about that.

I-Al-Istannen · 2021-07-02T18:11:23Z

Sorry for the wall of text :/

So far I am not convinced you can accurately attribute the regressions to their line numbers without investing quite a bit of effort.

We can ignore deletions that happen inside a function, as they won't ever impact the javadoc coverage.
We can ignore deletions that happen in the declaration of a function, as they will count as an "added" line somewhere and checkstyle should report them.
We can not ignore deletions that happen anywhere in the javadoc, as they might cause a regression.

If we notice that a deleted line touches any javadoc, the diff might not have the corresponding function/class declaration checkstyle reports errors for (the context of a diff is limited to X lines). So we'd need to look at the original file on disk and find the corresponding function/class declaration. That is quite doable.
Using this we can assume that we know the exact line (and therefore all violations) checkstyle reported for any (relevant) touched function/class. If the whole element was deleted, checkstyle will not report anything and we can drop the match.

Now that we know the errors checkstyle reports for an element we know was touched in the diff, we need to figure out if they are new. To do this we need to figure out the corresponding element in the comparison branch. As we know this change only impacted Javadoc, the qualified name of the element is the same (iff the containing class didn't change its name). So we could try to find the element with the same name and parameters in the comparison branch and see if the errors existed there.

This would require special handling for types, fields and methods as their qualified name and signature is made up of different elements.

I think we'd end up with a considerably more complex algorithm that can very well contain some bugs. I am not sure that'd be worth it. The CI job should block all regressions and provide useful starting points for fixing them, it shouldn't be complex enough to easily contain bugs and doesn't have to absolutely perfect at highlighting the location.

We could also just assume that nobody in their right mind would introduce a regression in things they didn't actually add in the PR and perform the current dance of comparing absolute numbers, using the logic in this PR to match up error messages but be smarter about added methods, so that their location always corresponds to the actual change the contributor made. I am not sure how much extra work that'd be.

It is quite probable that there is a neat algorithm I couldn't think of. Matching elements against each other sounds like it should be a solved problem.

slarse · 2021-07-03T15:29:46Z

Thanks for the analysis, I agree that it sounds like way more effort than it's worth.

And you know what, it's not only deletions that are problematic. Think about this case where we add a new parameter, but forget to document it:

 /**
  * Sum numbers.
  * @param first The first number
  * @param second The second number
  * @return Sum of the provided numbers
  */
 public int sum(
     int first,
-    int second) {
+    int second,
+    int third) {
[...]

Even in this simple case of introducing a violation of the javadoc rule, we can't just look at the added lines, we need to match methods to added lines (as the parameters are spread across multiple lines). So clearly, I didn't think this one through well enough, and my initial best-effort of just counting before-and-after is probably the most fool-proof way of an objective measure of "you made it worse".

On a side note, we could probably get to a pretty decent implementation using gumtree-spoon to match elements across revisions, but creating such a utility would be a project in and of itself. A possibly pretty cool one though, I'm not aware of any off-the-shelf tool for Java for tracking changes across revisions like this.

Sorry if I created extra work for you! In any case, let's go with your original idea, let me know when it's ready to review.

Before this patch the CI Javadoc regression check would only look at the amount of violations and fail the build if they increased: ``` JAVADOC QUALITY SCORE (lower is better) Compare: 2584 Current: 2592 Javadoc quality has deteriorated! Run the chore/ci-checkstyle-javadoc.sh script locally to find errors See INRIA#3923 for details ``` This wasn't really helpful when actually finding the errors, as you would need to run the checkstyle command locally on both branches and diff the output yourself. This patch adds a small python script that tries to (somewhat intelligently) perform this diff in the CI and print a more useful failure message. The line numbers in the output will not be correct if the exact same violation message appears multiple times in a single file, but it is better than having no feedback.

I-Al-Istannen · 2021-07-04T10:10:24Z

Sorry if I created extra work for you!

No harm done, it was an interesting tangent :) Gumtree-spoon sounds like the solution to the solved problem, that could be an interesting project. But I will pass on that for now ;)

I think the code works in the CI as well, as I tested it on my fork now. GH apparently formats the output, but it is still readable IMHO:

I can also convert the bash script and integrate it in the python script if you'd like. If not, I'd mark the PR as ready for review.

slarse

This looks mostly good! I think we can start with just adding this Python script, and then converting the whole thing into Python in a separate PR. This is kind of standalone.

Anyway, the thing I noted in your Python code is that it's mostly about computing amounts of occurrences and differences between those. There's built-in support for that in the stdlib, so I've added some comments on how the code can be simplified (probably, I haven't actually tested any of this :). Have a look and see what you think.

chore/find_javadoc_regression.py

slarse · 2021-07-06T08:20:15Z

chore/find_javadoc_regression.py

+def try_match_lines(reference: Dict[str, int], other: Dict[str, int]) -> Dict[str, int]:
+    """Tries to find out which lines are *new* in other and which were already present in the reference."""
+    other = other.copy()
+    for line, amount in reference.items():
+        for _ in range(0, amount):
+            if line in other:
+                if other[line] > 1:
+                    other[line] -= 1
+                else:
+                    other.pop(line)
+
+    # Lines that are only in the reference aren't critical, as they were
+    # apparently fixed in "other"
+    return other


If you use collections.Counter, this can be computed as a subtraction

Suggested change

def try_match_lines(reference: Dict[str, int], other: Dict[str, int]) -> Dict[str, int]:

"""Tries to find out which lines are *new* in other and which were already present in the reference."""

other = other.copy()

for line, amount in reference.items():

for _ in range(0, amount):

if line in other:

if other[line] > 1:

other[line] -= 1

else:

other.pop(line)

# Lines that are only in the reference aren't critical, as they were

# apparently fixed in "other"

return other

def try_match_lines(reference: collections.Counter[str], other: collections.Counter[str]) -> Dict[str, int]:

"""Tries to find out which lines are *new* in other and which were already present in the reference."""

return other - reference

Again, maybe it doesn't even need to be a function of its own anymore

I think I'll leave it as a function because I like the comment trying to explain why we subtract them in that order. It is probably perfectly clear without it, but I am slightly more positive that I will understand what I did in a year with it present.

chore/find_javadoc_regression.py

This reverts commit fbe2291.

I-Al-Istannen · 2021-07-06T11:15:14Z

Workflow run for dfcca17 with a failure can be found here and without a failure here.

slarse

LGTM!

I had one teeny tiny note in the script, but it's not important. You can address it if you'd like, I'll merge tomorrow morning regardless.

If you feel like it, translating the entire script into Python (say Python 3.6 for good compatibility) is appreciated.

Hide Counter typehints Python 3.6 doesn't understand yet

Right, I forgot that using collections like that was a new thing. But darn is it nice to be able to just type list[str] instead of from typing import List; List[str] :)

chore/find_javadoc_regression.py

slarse · 2021-07-06T11:57:52Z

chore/ci-checkstyle-javadoc.sh

@@ -72,6 +83,8 @@ function main() {
        echo "Javadoc quality has deteriorated!"
        echo "Run the chore/ci-checkstyle-javadoc.sh script locally to find errors"
        echo "See $RELATED_ISSUE_URL for details"
+        echo -e "\n"
+        python3 chore/find_javadoc_regression.py "$OUTPUT_FILE_COMPARE" "$OUTPUT_FILE_OWN"


Here's a good reason why we'd want to compose this whole thing into Python: the name of the Python executable is something of a moving target right now. Ever since Python 2 went belly-up, the conventions have been in flux. Some systems now only have python.

We want it to be possible to run this script locally with as little effort as possible.

Your ci-extra script actually broke in a docker container I tried it in because the ubuntu latest container symlinked python to python2 and your scripts require python 3. That's why I opted for python3 as the executable name here.

But yea, the whole situation is a bit unfortunate...

Well, ci-extra.sh requiring a little bit of tinkering to get running is fine IMO, as it's not meant to run locally. But as this script is meant to be run locally both to fix regressions and to actually improve the Javadoc quality as part of #3923, we'll want to make it as hassle-free as humanly possible.

Python 3.6+ should arguably be more hassle-free than bash, as the former also works on (native) Windows.

Co-authored-by: Simon Larsén <[email protected]>

slarse · 2021-07-07T06:20:06Z

Thanks @I-Al-Istannen, very nice work!

I-Al-Istannen force-pushed the chore/ci-javadoc-error-messages branch 3 times, most recently from 556bf8d to 94da700 Compare July 4, 2021 09:19

I-Al-Istannen force-pushed the chore/ci-javadoc-error-messages branch from 5bfde4b to d3b404e Compare July 4, 2021 09:57

slarse suggested changes Jul 6, 2021

View reviewed changes

I-Al-Istannen added 2 commits July 6, 2021 12:18

Use a Counter

dc3cf46

Use a list comprehension for relevant line filtering

45152de

I-Al-Istannen changed the title ~~WIP: chore: Try to list new errors in javadoc regression check~~ review: chore: Try to list new errors in javadoc regression check Jul 6, 2021

I-Al-Istannen added 7 commits July 6, 2021 12:26

Rename try_match_lines to find_added_violation_lines

a1dace5

Try to trigger GH workflow

9a96941

Try to trigger GA workflow

59edbfa

Hide Counter typehints Python 3.6 doesn't understand yet

dfcca17

Revert workflow changes

f06623e

Trigger workflow for run with no javadoc changes

fbe2291

Revert "Trigger workflow for run with no javadoc changes"

c4b9664

This reverts commit fbe2291.

slarse approved these changes Jul 6, 2021

View reviewed changes

Omit redundant counter copy

bbf1491

Co-authored-by: Simon Larsén <[email protected]>

slarse merged commit eaa49d3 into INRIA:master Jul 7, 2021

I-Al-Istannen deleted the chore/ci-javadoc-error-messages branch July 7, 2021 07:57

I-Al-Istannen mentioned this pull request Jul 7, 2021

review: chore: Migrate Javadoc regression script to python #4032

Merged

monperrus mentioned this pull request Aug 19, 2021

release Spoon 9.1.0 #4104

Merged

woutersmeenk pushed a commit to woutersmeenk/spoon that referenced this pull request Aug 29, 2021

chore: Try to list new errors in javadoc regression check (INRIA#4025)

94b24e5

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

review: chore: Try to list new errors in javadoc regression check #4025

review: chore: Try to list new errors in javadoc regression check #4025

I-Al-Istannen commented Jul 2, 2021

slarse commented Jul 2, 2021 •

edited

Loading

slarse commented Jul 2, 2021

I-Al-Istannen commented Jul 2, 2021

slarse commented Jul 2, 2021

I-Al-Istannen commented Jul 2, 2021 •

edited

Loading

slarse commented Jul 3, 2021 •

edited

Loading

I-Al-Istannen commented Jul 4, 2021

slarse left a comment

slarse Jul 6, 2021

I-Al-Istannen Jul 6, 2021

I-Al-Istannen commented Jul 6, 2021

slarse left a comment •

edited

Loading

slarse Jul 6, 2021

I-Al-Istannen Jul 6, 2021

slarse Jul 6, 2021

slarse commented Jul 7, 2021

review: chore: Try to list new errors in javadoc regression check #4025

review: chore: Try to list new errors in javadoc regression check #4025

Conversation

I-Al-Istannen commented Jul 2, 2021

Motivation

Changes

Help wanted

Example output

slarse commented Jul 2, 2021 • edited Loading

slarse commented Jul 2, 2021

I-Al-Istannen commented Jul 2, 2021

slarse commented Jul 2, 2021

I-Al-Istannen commented Jul 2, 2021 • edited Loading

slarse commented Jul 3, 2021 • edited Loading

I-Al-Istannen commented Jul 4, 2021

slarse left a comment

Choose a reason for hiding this comment

slarse Jul 6, 2021

Choose a reason for hiding this comment

I-Al-Istannen Jul 6, 2021

Choose a reason for hiding this comment

I-Al-Istannen commented Jul 6, 2021

slarse left a comment • edited Loading

Choose a reason for hiding this comment

slarse Jul 6, 2021

Choose a reason for hiding this comment

I-Al-Istannen Jul 6, 2021

Choose a reason for hiding this comment

slarse Jul 6, 2021

Choose a reason for hiding this comment

slarse commented Jul 7, 2021

slarse commented Jul 2, 2021 •

edited

Loading

I-Al-Istannen commented Jul 2, 2021 •

edited

Loading

slarse commented Jul 3, 2021 •

edited

Loading

slarse left a comment •

edited

Loading