[SPARK-39963][SQL] Simplify SimplifyCasts.isWiderCast
#37389
Conversation
### What changes were proposed in this pull request?
Currently, when spark-sql is used with `-e` or `-f`, it does not support nested bracketed comments such as
```
/* SELECT /*+ BROADCAST(b) */ 4;
*/
SELECT 1
;
```
Running `spark-sql -f` with `--verbose` gives the error below:
```
park master: yarn, Application Id: application_1632999510150_6968442
/* sielect /* BROADCAST(b) */ 4
Error in query:
mismatched input '4' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 30)
== SQL ==
/* sielect /* BROADCAST(b) */ 4
------------------------------^^^
```
In the current code:
```
else if (line.charAt(index) == '/' && !insideSimpleComment) {
val hasNext = index + 1 < line.length
if (insideSingleQuote || insideDoubleQuote) {
// Ignores '/' in any case of quotes
} else if (insideBracketedComment && line.charAt(index - 1) == '*' ) {
// Decrements `bracketedCommentLevel` at the beginning of the next loop
leavingBracketedComment = true
} else if (hasNext && !insideBracketedComment && line.charAt(index + 1) == '*') {
bracketedCommentLevel += 1
}
}
```
When it meets a `*/`, it marks `leavingBracketedComment` as true; then, when handling the next character, the bracketed comment level is decremented by 1:
```
if (leavingBracketedComment) {
bracketedCommentLevel -= 1
leavingBracketedComment = false
}
```
But entering a `/*` requires `!insideBracketedComment`, which means that if we have a case like
```
/* aaa /* bbb */ ; ccc */ select 1;
```
when the second `/*` is met, `insideBracketedComment` is already true, so this `/*` is not treated as the start of a bracketed comment.
Then, when the first `*/` is met, the bracketed comment ends and the query is split as
```
/* aaa /* bbb */; => comment
ccc */ select 1; => query
```
and the query fails.
So here we remove the `!insideBracketedComment` condition, which allows `bracketedCommentLevel > 1`, and since
```
def insideBracketedComment: Boolean = bracketedCommentLevel > 0
```
all characters inside any level of nested brackets are treated as part of the comment.
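For illustration, here is a minimal, self-contained sketch of the level-counting idea (not the actual SparkSQLCLIDriver code; it ignores quotes and `--` comments, and `stripBracketedComments` is a hypothetical helper used only for this example):
```
object NestedCommentSketch {
  // Strips bracketed comments by counting the nesting level, so that every
  // character inside nested /* ... */ pairs is treated as comment text.
  def stripBracketedComments(sql: String): String = {
    val out = new StringBuilder
    var level = 0
    var i = 0
    while (i < sql.length) {
      if (i + 1 < sql.length && sql(i) == '/' && sql(i + 1) == '*') {
        level += 1                          // entering a (possibly nested) bracketed comment
        i += 2
      } else if (i + 1 < sql.length && sql(i) == '*' && sql(i + 1) == '/' && level > 0) {
        level -= 1                          // leaving one nesting level
        i += 2
      } else {
        if (level == 0) out.append(sql(i))  // keep only characters outside all comments
        i += 1
      }
    }
    out.toString
  }

  def main(args: Array[String]): Unit = {
    // The ';' inside the nested comment is ignored, so the statement is not split there.
    println(stripBracketedComments("/* aaa /* bbb */ ; ccc */ select 1;").trim) // select 1;
  }
}
```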
### Why are the changes needed?
Spark apache#37389 added support for nested bracketed comments in SQL, so spark-sql should support them too.
### Does this PR introduce _any_ user-facing change?
Users can use nested bracketed comments in spark-sql.
### How was this patch tested?
Since spark-sql console mode has special logic for handling `;`:
```
while (line != null) {
if (!line.startsWith("--")) {
if (prefix.nonEmpty) {
prefix += '\n'
}
if (line.trim().endsWith(";") && !line.trim().endsWith("\\;")) {
line = prefix + line
ret = cli.processLine(line, true)
prefix = ""
currentPrompt = promptWithCurrentDB
} else {
prefix = prefix + line
currentPrompt = continuedPromptWithDBSpaces
}
}
line = reader.readLine(currentPrompt + "> ")
}
```
If we write the SQL as below,
```
/* SELECT /*+ BROADCAST(b) */ 4\\;
*/
SELECT 1
;
```
the `\\;` is escaped.
Manual test with `spark-sql -f`:
```
(spark.submit.pyFiles,)
(spark.submit.deployMode,client)
(spark.master,local[*])
Classpath elements:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/11/26 16:32:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/11/26 16:32:10 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
21/11/26 16:32:10 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
21/11/26 16:32:13 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
21/11/26 16:32:13 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore yi.zhu10.12.189.175
Spark master: local[*], Application Id: local-1637915529831
/* select /* BROADCAST(b) */ 4;
*/
select 1
1
Time taken: 3.851 seconds, Fetched 1 row(s)
C02D45VVMD6T:spark yi.zhu$
```
With this PR, an incomplete bracketed comment is no longer executed. For the SQL file
```
/* select /* BROADCAST(b) */ 4;
*/
select 1
;
/* select /* braoad */ ;
select 1;
```
only this part is executed:
```
/* select /* BROADCAST(b) */ 4;
*/
select 1
;
```
The remaining part
```
/* select /* braoad */ ;
select 1;
```
is still treated as in-progress SQL.
Closes apache#34721 from AngersZhuuuu/SPARK-37471.
Authored-by: Angerszhuuuu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 6e19125)
Signed-off-by: Wenchen Fan <[email protected]>
This refactor changes the behavior when `from` is non-numeric and `from == to`: previously it returned true, now it returns false. Is this intentional?
To preserve the behavior, and to be more Scala-idiomatic by using pattern matching instead of `isInstanceOf`, I would suggest:
```
private def isWiderCast(from: DataType, to: NumericType): Boolean = from match {
  case _: NumericType => Cast.canUpCast(from, to)
  case _ => from == to
}
```
Previously, when `from` was non-numeric, `from == to` had to be false anyway (because `to` is a `NumericType`), so the behavior is unchanged.
Do we have existing test coverage for the case discussed in this thread?
The existing test cases don't cover the case discussed above.
So, let's add one.
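For illustration only, here is a hypothetical sketch of such a check (not the exact test added in this PR, which belongs in `SimplifyCastsSuite`): it builds a plan that casts a string column to int and asserts that `SimplifyCasts` keeps the cast, since string -> int is not a redundant cast.
```
import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeReference, Cast}
import org.apache.spark.sql.catalyst.optimizer.SimplifyCasts
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, Project}
import org.apache.spark.sql.types.{IntegerType, StringType}

object NonNumericCastCheck {
  def main(args: Array[String]): Unit = {
    // A string column cast to int: the cast changes the type, so it is not redundant.
    val s = AttributeReference("s", StringType)()
    val plan = Project(Seq(Alias(Cast(s, IntegerType), "i")()), LocalRelation(s))

    // SimplifyCasts must leave the plan unchanged: string -> int is not a no-op cast.
    val optimized = SimplifyCasts(plan)
    assert(optimized == plan, "SimplifyCasts must keep a string -> int cast")
  }
}
```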
Thank you for pinging me, @beliefer. Could you rebase to the master branch?

Thank you for the reminder.

thanks, merging to master!

@cloud-fan @dongjoon-hyun @viirya @gengliangwang Thank you!
…sting date to decimal

### What changes were proposed in this pull request?
This PR is a followup of #37389 which disables ANSI mode when testing a case from date to decimal.

### Why are the changes needed?
To make the test pass. Currently it fails with ANSI mode on, see also https://github.com/apache/spark/runs/7701218236?check_suite_focus=true.

### Does this PR introduce _any_ user-facing change?
No, test-only.

### How was this patch tested?
I manually ran the test in my local.

Closes #37426 from HyukjinKwon/SPARK-39963.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
### What changes were proposed in this pull request?
`SimplifyCasts` is an optimizer rule that removes `Cast`s that are unnecessary because the input is already of the correct type. Currently, the implementation of `SimplifyCasts` seems a little redundant: `Cast.canUpCast` can completely cover the function of `isWiderCast`. On the other hand, `SimplifyCastsSuite` doesn't cover the case where `from` is non-numeric and `to` is numeric. This PR adds a new test case for that.

### Why are the changes needed?
Simplify the implementation of `SimplifyCasts`.

### Does this PR introduce _any_ user-facing change?
No. It just updates the inner implementation.

### How was this patch tested?
N/A
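To illustrate the point that `Cast.canUpCast` already covers `isWiderCast`, here is a minimal, hedged sketch (not the PR's actual diff; `WiderCastSketch` and its `main` are scaffolding invented for this example):
```
import org.apache.spark.sql.catalyst.expressions.Cast
import org.apache.spark.sql.types.{DataType, IntegerType, LongType, NumericType, StringType}

object WiderCastSketch {
  // Cast.canUpCast already answers "is this a lossless widening cast?",
  // so a hand-rolled isWiderCast check is not needed.
  def isWiderCast(from: DataType, to: NumericType): Boolean = Cast.canUpCast(from, to)

  def main(args: Array[String]): Unit = {
    println(isWiderCast(IntegerType, LongType))   // true: int -> long loses no precision or range
    println(isWiderCast(LongType, IntegerType))   // false: long -> int can overflow
    println(isWiderCast(StringType, IntegerType)) // false: non-numeric input is not a widening cast
  }
}
```
Because `Cast.canUpCast` already returns false for non-numeric inputs and for narrowing numeric casts, delegating to it directly removes the need for a separate `isWiderCast` implementation.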