Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BestFitScanner doesn't handle joint declarations well (affects DeadStoreProcessor (rule 1854)) #522

Closed
slarse opened this issue May 7, 2021 · 16 comments · Fixed by #524
Closed
Labels
bug Something isn't working

Comments

@slarse
Copy link
Collaborator

slarse commented May 7, 2021

Maybe you already have small reproducible examples and I am not sure if this is even related but running java -jar sorald-0.2.0-jar-with-dependencies.jar repair --source=Test.java --rule-key=1854 (Unused assignments should be removed) on

public class Test {
    public static void main(String[] args) {
        String x = "hello", y = "world";
        System.out.println(x);
    }
}

yields:

public class Test {
    public static void main(String[] args) {String y = "world";
        String x;
        System.out.println(x);
    }
}

Originally posted by @lyxell in #337 (comment)

@slarse
Copy link
Collaborator Author

slarse commented May 7, 2021

There are two problems here:

There's a missing newline after the opening curly brace. I thought I fixed that in INRIA/spoon#3856, but apparently I didn't, at least not for statement lists.

The joint declaration is strangely modeled in Spoon. It's modeled as two separate CtVariable, but with the exact same source position. As we match Sonar rule violations to Spoon elements by best positional match, and these will always tie, it will simply pick the first one. So we need to expand the matching rules in order to accommodate for joint declarations.

@slarse slarse added the bug Something isn't working label May 7, 2021
@lyxell
Copy link

lyxell commented May 7, 2021

The joint declaration is strangely modeled in Spoon. It's modeled as two separate CtVariable, but with the exact same source position. As we match Sonar rule violations to Spoon elements by best positional match, and these will always tie, it will simply pick the first one. So we need to expand the matching rules in order to accommodate for joint declarations.

Would it be impossible to change this in Spoon? Also, how is the source position represented by Spoon? Is it by line/column number, token index or file offset?

Where can I find the code that handles the matching of Sonar rule violations to Spoon elements?

@slarse
Copy link
Collaborator Author

slarse commented May 7, 2021

The joint declaration is strangely modeled in Spoon. It's modeled as two separate CtVariable, but with the exact same source position. As we match Sonar rule violations to Spoon elements by best positional match, and these will always tie, it will simply pick the first one. So we need to expand the matching rules in order to accommodate for joint declarations.

Would it be impossible to change this in Spoon?

Change: probably not. That would be backwards incompatible, and Spoon is very much built around being backwards compatible. Personally, I think it should be modeled with a separate element in the metamodel (say, CtJointDeclaration), in turn containing the individual variable declarations. That would better reflect the actual syntactical structure, but it's breaking.

@monperrus do you have any immediate thoughts on the modelling of joint declarations in Spoon?

Also, how is the source position represented by Spoon? Is it by line/column number, token index or file offset?

Line, column and character index. See SourcePosition.

Where can I find the code that handles the matching of Sonar rule violations to Spoon elements?

It's in sorald.sonar.BestFitScanner.

@lyxell
Copy link

lyxell commented May 7, 2021

Given a declaration like this where both variables are unused, Sonar seems to report the following columns for 1854:

String x = "hello", y = "world";
         ^^^^^^^^^    ^^^^^^^^^

Is getOriginalSourceFragment given by Spoon really unstable?

Otherwise, maybe CtVariables could get special treatment in BestFitScanner
and instead of doing elemSourceStart = elem.getSourceStart() do something like
elemSourceStart = elem.getSourceStart() + elem.getOriginalSourceFragment().indexOf(elem.getSimpleName()) in the hope of a scenario like this:

String x = "hello", y = "world";
         ^^^^^^^^^    ^^^^^^^^^  Sonar
       ^^^^^^^^^^^^^^^^^^^^^^^^^ CtVariable x
                    ^^^^^^^^^^^^ CtVariable y

This is assuming that getOriginalSourceFragment and getSimpleName actually work the way I imagine.

@lyxell
Copy link

lyxell commented May 8, 2021

Another heuristic that might work is to not use the intersection algorithm for CtVariables and instead find the matching node by searching for the different return values of getSimpleName from the left end of the fragment returned by Sonar:

Candidates: {x, y}

String x = "hello", y = "world";
         ^^^^^^^^^ Sonar

String x = "hello", y = "world";
       ^^^^^^^^^^^ Search

Result: {x}

There's probably a plethora of corner cases though and I don't know if it would be possible to differentiate between this case (finding nodes in variable declarations) and the case where the violation is a normal assignment.

@slarse
Copy link
Collaborator Author

slarse commented May 10, 2021

Is getOriginalSourceFragment given by Spoon really unstable?

I'd say no, I've used it for years.

maybe CtVariables could get special treatment in BestFitScanner

I think special treatment for CtVariable is the easiest way to do this. I'm thinking your second solution would work pretty well. Whenever we find a joint declaration in a CtVariable that appears on the same line as a rule violation (that targets CtVariable or any supertype of it), we compute the hamming distance between the source fragment from sonar and the name of the variables, and pick the one with the greatest match. WDYT?

@slarse slarse changed the title DeadStoreProcessor (rule 1854) doesn't handle joint declarations well BestFitScanner doesn't handle joint declarations well (affects DeadStoreProcessor (rule 1854)) May 10, 2021
@lyxell
Copy link

lyxell commented May 10, 2021

Yeah, I think the Hamming distance would work, however, I think the Sonar fragment does not actually include the name, so the fragment would need to be expanded first, in which case some kind of string search would be needed anyway:

String x = "hello", y = "world";
         ^^^^^^^^^    ^^^^^^^^^

Also, I think the detection algorithm would need some careful planning

Whenever we find a joint declaration in a CtVariable that appears on the same line as a rule violation

There's probably a lot of annoying and weird code in the wild that could be troublesome.
Some things that immediately come to mind:

// (pathological) declaration and assignment on the same line
String x = 1, y = 2; z = 20;
// for loop with multiple declarations
for (int x = 0, y = 10; x < 10; y = y + 1)
// declaration split over multiple lines
String CONSTANT_A = "constant a",
    CONSTANT_B = "constant b",
    CONSTANT_C = "constant c";

@slarse
Copy link
Collaborator Author

slarse commented May 10, 2021

Yeah, I think the Hamming distance would work, however, I think the Sonar fragment does not actually include the name, so the fragment would need to be expanded first, in which case some kind of string search would be needed anyway:

String x = "hello", y = "world";
         ^^^^^^^^^    ^^^^^^^^^

Right, that's a good point. The way you suggest to extract the names should work fine. We can use getOriginalSourceFragment() on the CtCompilationUnit related to the current file to get all of the text (the CU contains the root fragment), or simply read the file from disk separately and cache that.

There's probably a lot of annoying and weird code in the wild that could be troublesome.

We can just run the intersection algorithm first to find matching candidates whose source positions overlap with the rule violation (recall that each variable in a joint declaration gets the source position of the entire declaration in Spoon), and then further filter those with the hamming distance approach. That should work in theory, I think, and then we'll just work out the practical problems by adding more and more annoying test cases.

@lyxell
Copy link

lyxell commented May 10, 2021

Right, that's a good point. The way you suggest to extract the names should work fine. We can use getOriginalSourceFragment() on the CtCompilationUnit related to the current file to get all of the text (the CU contains the root fragment), or simply read the file from disk separately and cache that.

Yeah, the string search should work I think, searching backwards from the fragment one would need to make sure to read until the whitespace is hit to disambiguate between two identifiers where one is the suffix of the other, say xyz and yz.

We can just run the intersection algorithm first to find matching candidates whose source positions overlap with the rule violation (recall that each variable in a joint declaration gets the source position of the entire declaration in Spoon), and then further filter those with the hamming distance approach. That should work in theory, I think, and then we'll just work out the practical problems by adding more and more annoying test cases.

Yeah, this sounds like it should work.

@slarse
Copy link
Collaborator Author

slarse commented May 10, 2021

Great, so we have a plan! Want me to go ahead and draft an implementation of this? Or do you want to give it a go?

@lyxell
Copy link

lyxell commented May 10, 2021

Feel free to go ahead, I'll try to follow the progress.

@slarse
Copy link
Collaborator Author

slarse commented May 10, 2021

Okay, I'll likely get started tomorrow morning. I'll ping you in the PR once I've got a bare-minimum draft ready.

@slarse
Copy link
Collaborator Author

slarse commented May 11, 2021

I realized that there's one very annoying corner case where a variable isn't initialized. Something like this:

int x = 2, y, z = 3;

While the y obviously can't be a dead store, it does mean that the proposed backward-search isn't a one-size-fits-all solution for joint declarations (there could be other rule violations that don't require an initializer).

@slarse
Copy link
Collaborator Author

slarse commented May 11, 2021

Dead stores aren't detected in loop headers, but unused local variables are (rule S1481). We'd need to add a processor for that rule to verify that the approach works for variables in loop headers.

@lyxell
Copy link

lyxell commented May 11, 2021

Dead stores aren't detected in loop headers

Does this refer to the detection conducted by Sonar?

We'd need to add a processor for that rule to verify that the approach works for variables in loop headers.

Yeah, I guess a processor for rule S1481 should be added anyway since this case:

public class Unused {
    public void unused() {
        String x;
        /* ... code that does not use x */
    }
}

does only trigger S1481 and not S1854.

@slarse
Copy link
Collaborator Author

slarse commented May 11, 2021

Does this refer to the detection conducted by Sonar?

Yes.

Yeah, I guess a processor for rule S1481 should be added anyway since this case:

Yup, and it's also a trivial processor to implement. Excluding boilerplate, it should just be this:

protected void repairInternal(CtLocalVariable<?> localVar) {
    localVar.delete();
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Development

Successfully merging a pull request may close this issue.

2 participants