@AngersZhuuuu AngersZhuuuu commented Nov 26, 2021

What changes were proposed in this pull request?

In the current spark-sql CLI, when using `-e` and `-f`, nested bracketed comments such as the following are not supported:

```
/* SELECT /*+ BROADCAST(b) */ 4;
*/
SELECT  1
;
```

When running `spark-sql -f` with `--verbose`, it fails with the error below:

```
Spark master: yarn, Application Id: application_1632999510150_6968442
/* sielect /* BROADCAST(b) */ 4
Error in query:
mismatched input '4' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 30)

== SQL ==
/* sielect /* BROADCAST(b) */ 4
------------------------------^^^
```

In the current code:

```
else if (line.charAt(index) == '/' && !insideSimpleComment) {
        val hasNext = index + 1 < line.length
        if (insideSingleQuote || insideDoubleQuote) {
          // Ignores '/' in any case of quotes
        } else if (insideBracketedComment && line.charAt(index - 1) == '*' ) {
          // Decrements `bracketedCommentLevel` at the beginning of the next loop
          leavingBracketedComment = true
        } else if (hasNext && !insideBracketedComment &&  line.charAt(index + 1) == '*') {
          bracketedCommentLevel += 1
        }
      }
```

When it meets a `*/`, it marks `leavingBracketedComment` as true, and then, while handling the next char, the bracketed comment level is decremented by 1:

```
      if (leavingBracketedComment) {
        bracketedCommentLevel -= 1
        leavingBracketedComment = false
      }
```

But when meeting `/*`, it also requires `!insideBracketedComment`, which means that for a case like

```
/* aaa /* bbb */  ; ccc */ select 1;
```

when the second `/*` is met, `insideBracketedComment` is already true, so this `/*` is not treated as the start of a bracketed comment. Then, when the first `*/` is met, the bracketed comment ends and the input is split as

```
/* aaa /* bbb */;    =>  comment
ccc */ select 1;   => query
```

Then the query fails.

So here we remove the `!insideBracketedComment` condition, which allows `bracketedCommentLevel` to grow above 1, and since

```
def insideBracketedComment: Boolean = bracketedCommentLevel > 0
```

all characters inside any level of bracketed comment are treated as comment text.
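To make the level-based idea concrete, here is a minimal standalone sketch that can be pasted into a Scala REPL. The helper name `splitStatements` is hypothetical and this is not Spark's actual `splitSemiColon` code; quotes and `--` line comments are ignored for brevity. Every `/*` increments the level and every `*/` closes one level, so a `;` inside any nesting depth never terminates a statement:

```scala
// Minimal sketch of a nesting-aware splitter (hypothetical helper, not Spark's
// actual splitSemiColon). Quotes and "--" line comments are ignored for brevity.
def splitStatements(sql: String): (Seq[String], String) = {
  val completed = scala.collection.mutable.ArrayBuffer.empty[String]
  val current = new StringBuilder
  var level = 0 // bracketed-comment nesting level
  var i = 0
  while (i < sql.length) {
    val c = sql.charAt(i)
    val next = if (i + 1 < sql.length) sql.charAt(i + 1) else '\u0000'
    if (c == '/' && next == '*') {
      level += 1; current.append("/*"); i += 2        // every '/*' opens a level, even inside a comment
    } else if (c == '*' && next == '/' && level > 0) {
      level -= 1; current.append("*/"); i += 2        // every '*/' closes one level
    } else if (c == ';' && level == 0) {
      completed += current.toString; current.clear(); i += 1  // split only outside comments
    } else {
      current.append(c); i += 1
    }
  }
  // Whatever is left (e.g. text inside an unterminated comment) stays "in progress".
  (completed.toSeq, current.toString)
}

// The problematic case from above now stays in one piece:
// splitStatements("/* aaa /* bbb */  ; ccc */ select 1;")._1
//   == Seq("/* aaa /* bbb */  ; ccc */ select 1")
```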

Why are the changes needed?

Spark already supports nested bracketed comments in SQL (#37389); spark-sql should support them too.

Does this PR introduce any user-facing change?

Users can use nested bracketed comments in spark-sql.

How was this patch tested?

Since spark-sql console mode has special logic for handling `;`:

```
    while (line != null) {
      if (!line.startsWith("--")) {
        if (prefix.nonEmpty) {
          prefix += '\n'
        }

        if (line.trim().endsWith(";") && !line.trim().endsWith("\\;")) {
          line = prefix + line
          ret = cli.processLine(line, true)
          prefix = ""
          currentPrompt = promptWithCurrentDB
        } else {
          prefix = prefix + line
          currentPrompt = continuedPromptWithDBSpaces
        }
      }
      line = reader.readLine(currentPrompt + "> ")
    }
```

If we write the SQL as below,

```
/* SELECT /*+ BROADCAST(b) */ 4\\;
*/
SELECT  1
;
```

the `\\;` is escaped.
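As a small illustration (REPL-pasteable sketch; `endsStatement` is just a hypothetical name for the check in the loop quoted above), this is why the nested-comment example cannot simply be typed into the console: a line ending in `;` is submitted immediately, and keeping it open requires a `\;`, which then remains in the text:

```scala
// Sketch of the statement-terminator check from the console loop above
// (`endsStatement` is a hypothetical name for illustration only).
def endsStatement(line: String): Boolean =
  line.trim().endsWith(";") && !line.trim().endsWith("\\;")

endsStatement("/* SELECT /*+ BROADCAST(b) */ 4;")    // true  -> sent to processLine right away
endsStatement("/* SELECT /*+ BROADCAST(b) */ 4\\;")  // false -> line continues, but the \; stays in the text
endsStatement("SELECT  1")                           // false -> prompt continues
```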

Manual test with `spark-sql -f`:

```
(spark.submit.pyFiles,)
(spark.submit.deployMode,client)
(spark.master,local[*])
Classpath elements:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/11/26 16:32:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/11/26 16:32:10 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
21/11/26 16:32:10 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
21/11/26 16:32:13 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
21/11/26 16:32:13 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore yi.zhu10.12.189.175
Spark master: local[*], Application Id: local-1637915529831
/* select /* BROADCAST(b) */ 4;
*/
select  1

1
Time taken: 3.851 seconds, Fetched 1 row(s)
C02D45VVMD6T:spark yi.zhu$
```

With the current PR, an uncompleted bracketed comment is no longer executed. For the SQL file

```
/* select /* BROADCAST(b) */ 4;
*/
select  1
;

/* select /* braoad */ ;
select 1;
```

it only executes

```
/* select /* BROADCAST(b) */ 4;
*/
select  1
;
```

The remaining part

```
/* select /* braoad */ ;
select 1;
```

is still treated as in-progress SQL.
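For illustration, feeding that file through the hypothetical `splitStatements` sketch from earlier shows the same behaviour: only the statement whose comments are all closed comes out, and everything after the unterminated `/* select /* braoad */` stays at nesting level 1, so its `;` never splits it and it remains in progress:

```scala
val file =
  """/* select /* BROADCAST(b) */ 4;
    |*/
    |select  1
    |;
    |
    |/* select /* braoad */ ;
    |select 1;""".stripMargin

val (completed, inProgress) = splitStatements(file)
// completed  -> one element: "/* select /* BROADCAST(b) */ 4;\n*/\nselect  1\n"
// inProgress -> "\n\n/* select /* braoad */ ;\nselect 1;"  (still inside an open comment)
```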

@github-actions github-actions bot added the SQL label Nov 26, 2021
@SparkQA

SparkQA commented Nov 26, 2021

Test build #145662 has finished for PR 34721 at commit 9e35d96.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 26, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50132/

@SparkQA

SparkQA commented Nov 26, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50132/

@AngersZhuuuu

ping @cloud-fan @wangyum

@AngersZhuuuu AngersZhuuuu changed the title from "[SPARK-37471][SQL] spark-sql support nested bracketed comment" to "[SPARK-37471][SQL] spark-sql support ; in nested bracketed comment" Nov 29, 2021
@SparkQA

SparkQA commented Nov 29, 2021

Test build #145716 has finished for PR 34721 at commit 668aa45.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 29, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50185/

@SparkQA

SparkQA commented Nov 29, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50185/

@AngersZhuuuu

ping @cloud-fan

@cloud-fan

thanks, merging to master/3.2!

@cloud-fan cloud-fan closed this in 6e19125 Dec 3, 2021
cloud-fan pushed a commit that referenced this pull request Dec 3, 2021
Closes #34721 from AngersZhuuuu/SPARK-37471.

Authored-by: Angerszhuuuu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 6e19125)
Signed-off-by: Wenchen Fan <[email protected]>
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022