[SPARK-37471][SQL] spark-sql support ; in nested bracketed comment
#34721
Conversation
- Test build #145662 has finished for PR 34721 at commit
- Kubernetes integration test starting
- Kubernetes integration test status failure
- ping @cloud-fan @wangyum
- Test build #145716 has finished for PR 34721 at commit
- Kubernetes integration test starting
- Kubernetes integration test status failure
- ping @cloud-fan
- thanks, merging to master/3.2!
### What changes were proposed in this pull request?
Currently, when spark-sql is run with `-e` or `-f`, it does not support nested bracketed comments such as
```
/* SELECT /*+ BROADCAST(b) */ 4;
*/
SELECT 1
;
```
Running `spark-sql -f` with `--verbose` produces the error below:
```
park master: yarn, Application Id: application_1632999510150_6968442
/* sielect /* BROADCAST(b) */ 4
Error in query:
mismatched input '4' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 30)
== SQL ==
/* sielect /* BROADCAST(b) */ 4
------------------------------^^^
```
In the current code:
```
else if (line.charAt(index) == '/' && !insideSimpleComment) {
val hasNext = index + 1 < line.length
if (insideSingleQuote || insideDoubleQuote) {
// Ignores '/' in any case of quotes
} else if (insideBracketedComment && line.charAt(index - 1) == '*' ) {
// Decrements `bracketedCommentLevel` at the beginning of the next loop
leavingBracketedComment = true
} else if (hasNext && !insideBracketedComment && line.charAt(index + 1) == '*') {
bracketedCommentLevel += 1
}
}
```
When it encounters `*/`, it marks `leavingBracketedComment` as true; then, while processing the next character, the bracketed comment level is decremented by 1:
```
if (leavingBracketedComment) {
bracketedCommentLevel -= 1
leavingBracketedComment = false
}
```
However, entering a bracketed comment on `/*` requires `!insideBracketedComment`, which means that in a case such as
```
/* aaa /* bbb */ ; ccc */ select 1;
```
when the second `/*` is reached, `insideBracketedComment` is already true, so this `/*` is not treated as the start of a bracketed comment.
Then, at the first `*/`, the bracketed comment ends and the query is split as
```
/* aaa /* bbb */; => comment
ccc */ select 1; => query
```
and the query fails.
This PR removes the `!insideBracketedComment` condition, so `bracketedCommentLevel` can exceed 1, and since
```
def insideBracketedComment: Boolean = bracketedCommentLevel > 0
```
characters at every nesting level of a bracketed comment are treated as comment text.
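As a rough illustration, here is a minimal, hypothetical sketch of the resulting splitting behavior. It only tracks bracketed-comment nesting; quote, escape, and `--` comment handling from the real `splitSemiColon` in `SparkSQLCLIDriver` is omitted, and `splitOnSemicolon` is a made-up helper name:
```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified splitter: a ';' only separates statements when the
// bracketed-comment nesting level is zero.
def splitOnSemicolon(text: String): Seq[String] = {
  val statements = ArrayBuffer.empty[String]
  var bracketedCommentLevel = 0
  var leavingBracketedComment = false
  var begin = 0
  def insideBracketedComment: Boolean = bracketedCommentLevel > 0

  for (index <- 0 until text.length) {
    // The level is decremented one character after the closing "*/" was seen.
    if (leavingBracketedComment) {
      bracketedCommentLevel -= 1
      leavingBracketedComment = false
    }
    val c = text.charAt(index)
    if (c == ';' && !insideBracketedComment) {
      statements += text.substring(begin, index)
      begin = index + 1
    } else if (c == '/') {
      if (insideBracketedComment && index > 0 && text.charAt(index - 1) == '*') {
        // Closing "*/" of the current nesting level.
        leavingBracketedComment = true
      } else if (index + 1 < text.length && text.charAt(index + 1) == '*') {
        // Opening "/*": counted even when already inside a comment, which is
        // exactly what dropping the !insideBracketedComment condition allows.
        bracketedCommentLevel += 1
      }
    }
  }
  statements += text.substring(begin)
  statements.toSeq
}

// With nesting tracked, the ';' inside the nested comment no longer splits the query:
splitOnSemicolon("/* aaa /* bbb */ ; ccc */ select 1; select 2")
// -> Seq("/* aaa /* bbb */ ; ccc */ select 1", " select 2")
```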
### Why are the changes needed?
Spark already supports nested bracketed comments in SQL (see #37389), so spark-sql should support them too.
### Does this PR introduce _any_ user-facing change?
Users can now use nested bracketed comments in spark-sql.
### How was this patch tested?
The spark-sql console mode has special logic for handling `;`:
```
while (line != null) {
if (!line.startsWith("--")) {
if (prefix.nonEmpty) {
prefix += '\n'
}
if (line.trim().endsWith(";") && !line.trim().endsWith("\\;")) {
line = prefix + line
ret = cli.processLine(line, true)
prefix = ""
currentPrompt = promptWithCurrentDB
} else {
prefix = prefix + line
currentPrompt = continuedPromptWithDBSpaces
}
}
line = reader.readLine(currentPrompt + "> ")
}
```
If we write SQL such as
```
/* SELECT /*+ BROADCAST(b) */ 4\\;
*/
SELECT 1
;
```
the trailing `\\;` is escaped, so the line does not end the statement in console mode.
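That dispatch condition can be illustrated with a small, hypothetical snippet (the string and variable names are made up for illustration):
```scala
// A line whose trimmed text ends with "\;" is not sent to processLine;
// it is appended to `prefix` instead.
val line = """/* SELECT /*+ BROADCAST(b) */ 4\;"""
val endsStatement = line.trim().endsWith(";") && !line.trim().endsWith("\\;")
// endsStatement == false, so the console keeps accumulating the statement.
```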
Manual test with `spark-sql -f`:
```
(spark.submit.pyFiles,)
(spark.submit.deployMode,client)
(spark.master,local[*])
Classpath elements:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/11/26 16:32:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/11/26 16:32:10 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
21/11/26 16:32:10 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
21/11/26 16:32:13 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
21/11/26 16:32:13 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore yi.zhu10.12.189.175
Spark master: local[*], Application Id: local-1637915529831
/* select /* BROADCAST(b) */ 4;
*/
select 1
1
Time taken: 3.851 seconds, Fetched 1 row(s)
C02D45VVMD6T:spark yi.zhu$
```
With this PR, an unterminated bracketed comment is not executed. For the SQL file
```
/* select /* BROADCAST(b) */ 4;
*/
select 1
;
/* select /* braoad */ ;
select 1;
```
it only executes
```
/* select /* BROADCAST(b) */ 4;
*/
select 1
;
```
The remaining part
```
/* select /* braoad */ ;
select 1;
```
is still treated as in-progress SQL.
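Using the hypothetical `splitOnSemicolon` sketch from earlier (which omits quote and line-comment handling), the same behavior can be illustrated:
```scala
val sql =
  """/* select /* BROADCAST(b) */ 4;
    |*/
    |select 1
    |;
    |/* select /* braoad */ ;
    |select 1;""".stripMargin

// The first chunk is the complete statement containing the nested comment; the
// trailing chunk never closes its "/*", so the CLI keeps it as in-progress SQL
// instead of executing it.
splitOnSemicolon(sql).foreach(chunk => println(s"[$chunk]"))
```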
Closes #34721 from AngersZhuuuu/SPARK-37471.
Authored-by: Angerszhuuuu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 6e19125)
Signed-off-by: Wenchen Fan <[email protected]>