[SPARK-37471][SQL] spark-sql support ; in nested bracketed comment
#34721
Conversation
- Test build #145662 has finished for PR 34721 at commit
- Kubernetes integration test starting
- Kubernetes integration test status failure
- ping @cloud-fan @wangyum
- Test build #145716 has finished for PR 34721 at commit
- Kubernetes integration test starting
- Kubernetes integration test status failure
- ping @cloud-fan
- thanks, merging to master/3.2!
### What changes were proposed in this pull request?
Currently, when spark-sql is run with `-e` or `-f`, it does not support nested bracketed comments such as
```
/* SELECT /*+ BROADCAST(b) */ 4;
*/
SELECT 1
;
```
Running `spark-sql -f` with `--verbose` produces the error below:
```
park master: yarn, Application Id: application_1632999510150_6968442
/* sielect /* BROADCAST(b) */ 4
Error in query:
mismatched input '4' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 30)
== SQL ==
/* sielect /* BROADCAST(b) */ 4
------------------------------^^^
```
In the current code:
```
else if (line.charAt(index) == '/' && !insideSimpleComment) {
val hasNext = index + 1 < line.length
if (insideSingleQuote || insideDoubleQuote) {
// Ignores '/' in any case of quotes
} else if (insideBracketedComment && line.charAt(index - 1) == '*' ) {
// Decrements `bracketedCommentLevel` at the beginning of the next loop
leavingBracketedComment = true
} else if (hasNext && !insideBracketedComment && line.charAt(index + 1) == '*') {
bracketedCommentLevel += 1
}
}
```
When it encounters `*/`, it marks `leavingBracketedComment` as true; then, while processing the next character, the bracketed comment level is decremented by 1:
```
if (leavingBracketedComment) {
bracketedCommentLevel -= 1
leavingBracketedComment = false
}
```
However, entering a bracketed comment on `/*` requires `!insideBracketedComment`, which means that in a case such as
```
/* aaa /* bbb */ ; ccc */ select 1;
```
when the second `/*` is reached, `insideBracketedComment` is already true, so this `/*` is not treated as the start of a bracketed comment.
Then, at the first `*/`, the bracketed comment ends and the query is split as
```
/* aaa /* bbb */; => comment
ccc */ select 1; => query
```
and the query fails.
This PR removes the `!insideBracketedComment` condition, so `bracketedCommentLevel` can exceed 1, and since
```
def insideBracketedComment: Boolean = bracketedCommentLevel > 0
```
characters at every nesting level of a bracketed comment are treated as comment text.
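As a rough illustration, here is a minimal, hypothetical sketch of the resulting splitting behavior. It only tracks bracketed-comment nesting; quote, escape, and `--` comment handling from the real `splitSemiColon` in `SparkSQLCLIDriver` is omitted, and `splitOnSemicolon` is a made-up helper name:
```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified splitter: a ';' only separates statements when the
// bracketed-comment nesting level is zero.
def splitOnSemicolon(text: String): Seq[String] = {
  val statements = ArrayBuffer.empty[String]
  var bracketedCommentLevel = 0
  var leavingBracketedComment = false
  var begin = 0
  def insideBracketedComment: Boolean = bracketedCommentLevel > 0

  for (index <- 0 until text.length) {
    // The level is decremented one character after the closing "*/" was seen.
    if (leavingBracketedComment) {
      bracketedCommentLevel -= 1
      leavingBracketedComment = false
    }
    val c = text.charAt(index)
    if (c == ';' && !insideBracketedComment) {
      statements += text.substring(begin, index)
      begin = index + 1
    } else if (c == '/') {
      if (insideBracketedComment && index > 0 && text.charAt(index - 1) == '*') {
        // Closing "*/" of the current nesting level.
        leavingBracketedComment = true
      } else if (index + 1 < text.length && text.charAt(index + 1) == '*') {
        // Opening "/*": counted even when already inside a comment, which is
        // exactly what dropping the !insideBracketedComment condition allows.
        bracketedCommentLevel += 1
      }
    }
  }
  statements += text.substring(begin)
  statements.toSeq
}

// With nesting tracked, the ';' inside the nested comment no longer splits the query:
splitOnSemicolon("/* aaa /* bbb */ ; ccc */ select 1; select 2")
// -> Seq("/* aaa /* bbb */ ; ccc */ select 1", " select 2")
```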
### Why are the changes needed?
Spark already supports nested bracketed comments in SQL (see #37389), so spark-sql should support them too.
### Does this PR introduce _any_ user-facing change?
Users can now use nested bracketed comments in spark-sql.
### How was this patch tested?
The spark-sql console mode has special logic for handling `;`:
```
while (line != null) {
if (!line.startsWith("--")) {
if (prefix.nonEmpty) {
prefix += '\n'
}
if (line.trim().endsWith(";") && !line.trim().endsWith("\\;")) {
line = prefix + line
ret = cli.processLine(line, true)
prefix = ""
currentPrompt = promptWithCurrentDB
} else {
prefix = prefix + line
currentPrompt = continuedPromptWithDBSpaces
}
}
line = reader.readLine(currentPrompt + "> ")
}
```
If we write SQL such as
```
/* SELECT /*+ BROADCAST(b) */ 4\\;
*/
SELECT 1
;
```
the trailing `\\;` is escaped, so the line does not end the statement in console mode.
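That dispatch condition can be illustrated with a small, hypothetical snippet (the string and variable names are made up for illustration):
```scala
// A line whose trimmed text ends with "\;" is not sent to processLine;
// it is appended to `prefix` instead.
val line = """/* SELECT /*+ BROADCAST(b) */ 4\;"""
val endsStatement = line.trim().endsWith(";") && !line.trim().endsWith("\\;")
// endsStatement == false, so the console keeps accumulating the statement.
```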
Manual test with `spark-sql -f`:
```
(spark.submit.pyFiles,)
(spark.submit.deployMode,client)
(spark.master,local[*])
Classpath elements:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/11/26 16:32:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/11/26 16:32:10 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
21/11/26 16:32:10 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
21/11/26 16:32:13 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
21/11/26 16:32:13 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore yi.zhu10.12.189.175
Spark master: local[*], Application Id: local-1637915529831
/* select /* BROADCAST(b) */ 4;
*/
select 1
1
Time taken: 3.851 seconds, Fetched 1 row(s)
C02D45VVMD6T:spark yi.zhu$
```
With this PR, an unterminated bracketed comment is not executed. For the SQL file
```
/* select /* BROADCAST(b) */ 4;
*/
select 1
;
/* select /* braoad */ ;
select 1;
```
it only executes
```
/* select /* BROADCAST(b) */ 4;
*/
select 1
;
```
The remaining part
```
/* select /* braoad */ ;
select 1;
```
is still treated as in-progress SQL.
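Using the hypothetical `splitOnSemicolon` sketch from earlier (which omits quote and line-comment handling), the same behavior can be illustrated:
```scala
val sql =
  """/* select /* BROADCAST(b) */ 4;
    |*/
    |select 1
    |;
    |/* select /* braoad */ ;
    |select 1;""".stripMargin

// The first chunk is the complete statement containing the nested comment; the
// trailing chunk never closes its "/*", so the CLI keeps it as in-progress SQL
// instead of executing it.
splitOnSemicolon(sql).foreach(chunk => println(s"[$chunk]"))
```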
Closes #34721 from AngersZhuuuu/SPARK-37471.
Authored-by: Angerszhuuuu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 6e19125)
Signed-off-by: Wenchen Fan <[email protected]>