Skip to content

Conversation

@maropu
Copy link
Member

@maropu maropu commented Jun 11, 2020

What changes were proposed in this pull request?

This PR intends to extract SQL reserved/non-reserved keywords from the ANTLR grammar file (SqlBase.g4) directly.

This approach is based on the @cloud-fan suggestion: #28779 (comment)

Why are the changes needed?

It is hard to maintain a full set of the keywords in TableIdentifierParserSuite, so it would be nice if we could extract them from the SqlBase.g4 file directly.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests.

// The non-reserved keywords are listed in `nonReserved`.
// These 2 together contain all the keywords.
strictNonReserved
//--ANSI-STRICT-NON-RESERVED-START
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not ansi, we should just call it STRICT-NON-RESERVED

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I missed that. I'll update soon.

val nonReservedKeywordsInAnsiMode = parseSyntax(
"//--ANSI-NON-RESERVED-START", "//--ANSI-NON-RESERVED-END")

val reservedKeywordsInAnsiMode = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ansi reserved keywords are "full keywords -- non-reserved keywords".

Copy link
Member

@viirya viirya Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"non-reserved keywords" means non-reserved keywords in ansi + strict non-reserved keywords?

Does previous allCandidateKeywords include strict non-reserved keywords?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, fixed.

@SparkQA
Copy link

SparkQA commented Jun 11, 2020

Test build #123860 has finished for PR 28802 at commit fdd1bcb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

private val sqlSyntaxDefs = {
val sqlBasePath = {
val sparkHome = {
assert(sys.props.contains("spark.test.home") ||
Copy link
Contributor

@dilipbiswal dilipbiswal Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu Minor Nit. I was trying to run this and i didn't have the environment variable set. The error message ends up printing the full env info making it difficult to find what the error is ? Should we use something like:
if (condition) {
fail(...)
}
or may be there is a better way :-).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the check, @dilipbiswal . SQLQueryTestSuite has the same logic, so I personally think its better to fix them at the same time. https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala#L132
Could you make a PR after this merged?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maropu Sure.. will do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

if (line.trim.startsWith(startTag)) {
start = true
} else if (line.trim.startsWith(endTag)) {
start = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Just return once reaching endTag?

"//--ANSI-NON-RESERVED-START", "//--ANSI-NON-RESERVED-END")

val nonReservedKeywordsInAnsiMode = allCandidateKeywords -- reservedKeywordsInAnsiMode
val reservedKeywordsInAnsiMode = allCandidateKeywords -- nonReservedKeywordsInAnsiMode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove the tags, will the related tests fail? I have this question because these tests seems do foreach on these lists. If we remove the tags and get an empty list, seems the tests won't fail too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice suggestion. In that case, the tests should fail. I'll check and update.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check the latest commit? Is that okay to you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thanks. Looks good.

}

val nonReservedKeywordsInAnsiMode = allCandidateKeywords -- reservedKeywordsInAnsiMode
private def parseSyntax(startTag: String, endTag: String): Set[String] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's only used once, can we inline it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. okay.

@SparkQA
Copy link

SparkQA commented Jun 12, 2020

Test build #123871 has finished for PR 28802 at commit d748882.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 12, 2020

Test build #123874 has finished for PR 28802 at commit 349b919.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 12, 2020

Test build #123891 has finished for PR 28802 at commit 09a2ed4.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 12, 2020

Test build #123892 has finished for PR 28802 at commit 80b1c3a.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 12, 2020

Test build #123876 has finished for PR 28802 at commit 3c1ce72.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jun 12, 2020

Test build #123905 has finished for PR 28802 at commit bd2c6bc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu closed this in 78d08a8 Jun 12, 2020
@maropu
Copy link
Member Author

maropu commented Jun 12, 2020

Thanks! Merged to master.

maropu added a commit that referenced this pull request Jun 15, 2020
### What changes were proposed in this pull request?

This PR intends to extract SQL reserved/non-reserved keywords from the ANTLR grammar file (`SqlBase.g4`) directly.

This approach is based on the cloud-fan suggestion: #28779 (comment)

### Why are the changes needed?

It is hard to maintain a full set of the keywords in `TableIdentifierParserSuite`, so it would be nice if we could extract them from the `SqlBase.g4` file directly.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes #28802 from maropu/SPARK-31950-2.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
holdenk pushed a commit to holdenk/spark that referenced this pull request Jun 25, 2020
### What changes were proposed in this pull request?

This PR intends to extract SQL reserved/non-reserved keywords from the ANTLR grammar file (`SqlBase.g4`) directly.

This approach is based on the cloud-fan suggestion: apache#28779 (comment)

### Why are the changes needed?

It is hard to maintain a full set of the keywords in `TableIdentifierParserSuite`, so it would be nice if we could extract them from the `SqlBase.g4` file directly.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

Closes apache#28802 from maropu/SPARK-31950-2.

Authored-by: Takeshi Yamamuro <[email protected]>
Signed-off-by: Takeshi Yamamuro <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants