-
Notifications
You must be signed in to change notification settings - Fork 319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ANTLR grammar based C++ language module #937
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Die vielen else if
etwa in enterJumpStatement
könnte man durch ein eleganteres Konstrukt ersetzen. Mein Vorschlag wäre, dass man in den enter/exit Methoden wenn es geht nur noch Maps erstellt und die eigentliche Logik in eine Hilfsmethode auslagert. Sähe dann so aus:
public void enterJumpStatement(CPP14Parser.JumpStatementContext context) {
processContextEnterOrExit(context, Map.of(
context.Break(), CPPTokenType.BREAK,
context.Continue(), CPPTokenType.CONTINUE,
context.Goto(), CPPTokenType.GOTO,
context.Return(), CPPTokenType.RETURN
));
}
private void processContextEnterOrExit(ParserRuleContext context, Map<TerminalNode, CPPTokenType> tokenTypeMap) {
for (Map.Entry<TerminalNode, CPPTokenType> tokenTypeEntry: tokenTypeMap.entrySet()) {
if (tokenTypeEntry.getKey() != null) {
addEnter(tokenTypeEntry.getValue(), context.getStart());
return;
}
}
}
Methoden wie enterSelectionStatement
könnte man so wohl auch entschachteln.
Sonst denke ich es würde zur Leserlichkeit beitragen wenn man die Contexts einzeln importieren würde, sodass man im Code statt etwa CPP14Parser.FunctionDefinitionContext
nur noch FunctionDefinitionContext
stehen hat. Dann wäre es auch konsistent mit dem TokenGeneratingTreeScanner
im Java Sprachmodul, da wird es auch so gemacht.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the default minimum token match, I would say leave it as is, we may adapt that in the future. I will have a look at the if-else problem and see if I can find a solution.
languages/cpp2/src/main/java/de/jplag/cpp2/CPPTokenListener.java
Outdated
Show resolved
Hide resolved
@Override | ||
public void enterSelectionStatement(CPP14Parser.SelectionStatementContext context) { | ||
if (context.Switch() != null) { | ||
addEnter(CPPTokenType.SWITCH_BEGIN, context.getStart()); | ||
} else if (context.If() != null) { | ||
addEnter(CPPTokenType.IF_BEGIN, context.getStart()); | ||
if (context.Else() != null) { | ||
addEnter(CPPTokenType.ELSE, context.Else().getSymbol()); | ||
} | ||
} | ||
} | ||
|
||
@Override | ||
public void exitSelectionStatement(CPP14Parser.SelectionStatementContext context) { | ||
if (context.Switch() != null) { | ||
addEnter(CPPTokenType.SWITCH_END, context.getStop()); | ||
} else if (context.If() != null) { | ||
addEnter(CPPTokenType.IF_END, context.getStop()); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The grammar is the problem that leads to the weird token order. In the grammar, an if-else combination is represented by a single selection statement, IF statement1 ELSE statement2
. Thus, something like this:
if(...)
a
else if (...)
b
else
c
is depicted as two nested selection statements:
IF a ELSE (IF b ELSE c)
Now, this is where things get interresting. The visitor order is not how we read this statement. The visit order is like this:
- enter outer selection statement
- enter inner statement
- exit inner selection statement
- exit outer selection statement
However, as the else is part of the selection statement, it will be either visited on entering the statement (with the if keyword and thus before the if body) or when exiting the selection statement (after the else body).
The trick to solving this is utilizing the enter/exitBlockDeclaration method and tracking the last relevant keyword. This is done in some of the newer language modules. This solves not only the token order issue but also the issue that your if-end tokens are currently not placed correctly! They are all at the end after both selection statements. They are not closing off their if blocks.
Rough solution (without the required imports or changes to token types), fixing token order, if-end placement and missing else_end:
Click to show...
@Override | |
public void enterSelectionStatement(CPP14Parser.SelectionStatementContext context) { | |
if (context.Switch() != null) { | |
addEnter(CPPTokenType.SWITCH_BEGIN, context.getStart()); | |
} else if (context.If() != null) { | |
addEnter(CPPTokenType.IF_BEGIN, context.getStart()); | |
if (context.Else() != null) { | |
addEnter(CPPTokenType.ELSE, context.Else().getSymbol()); | |
} | |
} | |
} | |
@Override | |
public void exitSelectionStatement(CPP14Parser.SelectionStatementContext context) { | |
if (context.Switch() != null) { | |
addEnter(CPPTokenType.SWITCH_END, context.getStop()); | |
} else if (context.If() != null) { | |
addEnter(CPPTokenType.IF_END, context.getStop()); | |
} | |
} | |
@Override | |
public void enterSelectionStatement(CPP14Parser.SelectionStatementContext context) { | |
if (context.Switch() != null) { | |
addEnter(CPPTokenType.SWITCH_BEGIN, context.getStart()); | |
trackedState.add(CPPTokenType.SWITCH_END); | |
} else if (context.If() != null) { | |
addEnter(CPPTokenType.IF_BEGIN, context.getStart()); | |
trackedState.add(CPPTokenType.IF_END); | |
if (context.Else() != null) { | |
trackedState.add(CPPTokenType.ELSE); | |
trackedState.add(CPPTokenType.ELSE_END); | |
lastElseToken = context.Else().getSymbol(); | |
} | |
} | |
} | |
static Deque<CPPTokenType> trackedState = new ArrayDeque<>(); | |
static Token lastElseToken = null; | |
@Override | |
public void enterStatement(StatementContext context) { | |
if (context.getParent().getRuleIndex() == CPP14Parser.RULE_selectionStatement) { | |
if (trackedState.peekFirst() == CPPTokenType.ELSE) { | |
addEnter(trackedState.removeFirst(), lastElseToken); | |
} | |
} | |
} | |
@Override | |
public void exitStatement(StatementContext context) { | |
if (context.getParent().getRuleIndex() == CPP14Parser.RULE_selectionStatement) { | |
if (trackedState.peekFirst() == CPPTokenType.ELSE_END) { | |
addExit(trackedState.removeFirst(), context.getStop()); | |
} | |
if (trackedState.peekFirst() == CPPTokenType.IF_END) { | |
addExit(trackedState.removeFirst(), context.getStop()); | |
} | |
} | |
} |
This is how the tokens are then extracted:
1 void a(int a, int b, int x, int y) {
|FUNCTION{
|VARDEF|VARDEF|VARDEF|VARDEF
2 if (a < b) {
|IF{
3 {x = 5;}
|ASSIGN
4 } else if (a > b) {
|}IF
|ELSE{
|IF{
5 y = 10;
|ASSIGN
6 x = y + b;
|ASSIGN
7 } else {
|}ELSE
|}IF
|ELSE{
8 y = -20;
|ASSIGN
9 }
|}ELSE
10 }
|}FUNCTION
|EOF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I hoped to avoid additional state.
This solves not only the token order issue but also the issue that your if-end tokens are currently not placed correctly!
I assume you're referring to the fact that multiple }if
tokens are always placed at the very end? This is also the case in the Java language module currently.
In your code snippet, you're extracting a token for the end of an else
, which currently isn't extracted in the Java module, but I assume it makes sense to change that too if the }if
token is changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, even more issues in the Java module unearthed. Okay, then let me investigate first if the if-end and the else-end have an impact on the detection quality. If not, we do not need to fix it. I just assumed it would be correct in the Java module. I will have a look at it. Interesting!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SirYwell you are right, java does not place the if tokens correctly either:
7 if (args == null) {
|IF{
9 } else if (args.length > 1) {
|ELSE
|IF{
11 } else {
|ELSE
13 }
|}if
|}if
As a result, leave it as the Java module does it now, I can always change it in the future. However, please ensure the else tokens are correctly placed. If the else token comes before the tokens of the if block, then we cannot differ between code in the if block and code in the else block.
EDIT: Also, sonar found one code smell. Just rename the local variable tokens
to tokenStream
.
Der konkrete Vorschlag würde so nicht funktionieren, weil die Map keine null Keys erlaubt (und auch sonst maximal ein mal null vorkommen dürfte), aber man kann key und value natürlich tauschen. Mein Vorschlag wäre stattdessen jedoch folgender: record Extraction<T>(Predicate<T> nonNullTest, TokenType toExtract) {
static <T> Extraction<T> of(Function<T, ?> contextToAnything, TokenType toExtract) {
return new Extraction<>(t -> Objects.nonNull(contextToAnything.apply(t)), toExtract);
}
}
private static final List<Extraction<JumpStatementContext>> JUMP_STATEMENT_TOKENS = List.of(
Extraction.of(JumpStatementContext::Break, BREAK),
Extraction.of(JumpStatementContext::Continue, CONTINUE),
Extraction.of(JumpStatementContext::Goto, GOTO),
Extraction.of(JumpStatementContext::Return, RETURN)
);
@Override
public void enterJumpStatement(JumpStatementContext context) {
extractFirstNonNull(context, context.getStart(), JUMP_STATEMENT_TOKENS);
}
private <T> void extractFirstNonNull(T context, Token token, List<Extraction<T>> extractions) {
for (Extraction<? super T> extraction : extractions) {
if (extraction.nonNullTest().test(context)) {
addEnter(extraction.toExtract(), token);
return;
}
}
} Dadurch muss man nicht jedes Mal eine Collection konstruieren. Was hältst du davon @mbrdl?
Das habe ich mal gemacht. |
I updated the PR and included the requested changes. The if/else extraction should now behave like the Java language module. |
@SirYwell can you fix the two sonar issues, then I can merge. |
Kudos, SonarCloud Quality Gate passed! |
@tsaglam Sonarcloud seems to be happy now :) |
This Pull Request adds an alternative C++ language module based on the ANTLR grammar from https://github.com/antlr/grammars-v4/tree/master/cpp.
To avoid naming clashes, this language module and its base package are currently called
cpp2
.Notes:
minimumTokenMatch
is 12, I'm not sure if that should be changed.ELSE
token is currently extracted before the body of theif
is visited - I added a disabled test that shows this issue. If you know a good solution for that, please let me know.