
Conversation

@beliefer
Contributor

@beliefer beliefer commented Aug 3, 2022

What changes were proposed in this pull request?

SimplifyCasts is an optimizer rule that removes casts that are unnecessary because the input is already of the correct type.
Currently, the implementation of SimplifyCasts seems a little redundant: Cast.canUpCast can completely cover the functionality of isWiderCast.

On the other hand, SimplifyCastsSuite doesn't cover the case where from is non-numeric and to is numeric. This PR adds a new test case for it.
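For illustration, here is a toy sketch of the simplification. The types and `canUpCast` below are simplified stand-ins for the Catalyst classes, not the actual Spark implementations:

```scala
// Toy stand-ins for Spark's type hierarchy; NOT the real Catalyst classes.
sealed trait DataType
sealed trait NumericType extends DataType
case object IntType extends NumericType
case object LongType extends NumericType
case object StringType extends DataType

object Cast {
  // Simplified model of Cast.canUpCast: here only Int -> Long (and identity
  // on the same type) counts as a safe up-cast.
  def canUpCast(from: DataType, to: DataType): Boolean =
    (from, to) match {
      case (IntType, LongType) => true
      case (f, t)              => f == t
    }
}

// After the refactor, isWiderCast can simply delegate to Cast.canUpCast.
def isWiderCast(from: DataType, to: NumericType): Boolean =
  Cast.canUpCast(from, to)
```

Since `to` is constrained to `NumericType`, the only casts accepted are those `canUpCast` already recognizes, which is why the separate width check was redundant.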

Why are the changes needed?

Simplify the implementation of SimplifyCasts.

Does this PR introduce any user-facing change?

No. This PR only changes the internal implementation.

How was this patch tested?

N/A

@github-actions github-actions bot added the SQL label Aug 3, 2022
@beliefer
Contributor Author

beliefer commented Aug 3, 2022

ping @gengliangwang @MaxGekk @dongjoon-hyun cc @cloud-fan @viirya

ulysses-you pushed a commit to ulysses-you/spark that referenced this pull request Aug 3, 2022
### What changes were proposed in this pull request?
Currently, spark-sql with `-e` or `-f` does not support nested bracketed comments such as
```
/* SELECT /*+ BROADCAST(b) */ 4;
*/
SELECT  1
;
```
Running `spark-sql -f` with `--verbose` produces the error below:
```
park master: yarn, Application Id: application_1632999510150_6968442
/* sielect /* BROADCAST(b) */ 4
Error in query:
mismatched input '4' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 30)

== SQL ==
/* sielect /* BROADCAST(b) */ 4
------------------------------^^^
```

In current code
```
else if (line.charAt(index) == '/' && !insideSimpleComment) {
        val hasNext = index + 1 < line.length
        if (insideSingleQuote || insideDoubleQuote) {
          // Ignores '/' in any case of quotes
        } else if (insideBracketedComment && line.charAt(index - 1) == '*' ) {
          // Decrements `bracketedCommentLevel` at the beginning of the next loop
          leavingBracketedComment = true
        } else if (hasNext && !insideBracketedComment &&  line.charAt(index + 1) == '*') {
          bracketedCommentLevel += 1
        }
      }
```

When it meets a `*/`, it marks `leavingBracketedComment` as true; then, when processing the next character, the bracketed-comment level is decremented:
```
      if (leavingBracketedComment) {
        bracketedCommentLevel -= 1
        leavingBracketedComment = false
      }

```

But when meeting a `/*`, the code requires `!insideBracketedComment`, which means that for a case such as
```
/* aaa /* bbb */  ; ccc */ select 1;
```

when the second `/*` is met, `insideBracketedComment` is already true, so this `/*` is not treated as the start of a bracketed comment.
Then, when the first `*/` is met, the bracketed comment ends and the query is split as
```
/* aaa /* bbb */;    =>  comment
ccc */ select 1;   => query
```

The query then fails.

So here we remove the `!insideBracketedComment` condition, allowing `bracketedCommentLevel > 1`, and since
```
 def insideBracketedComment: Boolean = bracketedCommentLevel > 0
```
characters inside any level of brackets are treated as comment text.
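The level-counting idea can be sketched in isolation as follows. This is a hypothetical toy model for a single line, not the actual SparkSQLCLIDriver code (which must also handle quotes and `--` comments):

```scala
// Toy sketch of nested bracketed-comment handling with a level counter
// instead of a boolean flag. Characters at any level > 0 are comment text.
def stripBracketedComments(line: String): String = {
  val out = new StringBuilder
  var level = 0
  var i = 0
  while (i < line.length) {
    if (i + 1 < line.length && line.charAt(i) == '/' && line.charAt(i + 1) == '*') {
      level += 1 // every `/*` opens a level, even inside a comment
      i += 2
    } else if (i + 1 < line.length && line.charAt(i) == '*' &&
               line.charAt(i + 1) == '/' && level > 0) {
      level -= 1 // `*/` closes the innermost level
      i += 2
    } else {
      if (level == 0) out.append(line.charAt(i))
      i += 1
    }
  }
  out.toString
}
```

With the counter, `/* aaa /* bbb */ ; ccc */ select 1;` is correctly reduced to `select 1;`, whereas a boolean flag would end the comment at the first `*/`.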

### Why are the changes needed?
In apache#37389 we added support for nested bracketed comments in SQL; spark-sql should support them too.

### Does this PR introduce _any_ user-facing change?
Users can now use nested bracketed comments in spark-sql.

### How was this patch tested?

spark-sql console mode has special logic for handling `;`:
```
    while (line != null) {
      if (!line.startsWith("--")) {
        if (prefix.nonEmpty) {
          prefix += '\n'
        }

        if (line.trim().endsWith(";") && !line.trim().endsWith("\\;")) {
          line = prefix + line
          ret = cli.processLine(line, true)
          prefix = ""
          currentPrompt = promptWithCurrentDB
        } else {
          prefix = prefix + line
          currentPrompt = continuedPromptWithDBSpaces
        }
      }
      line = reader.readLine(currentPrompt + "> ")
    }
```

If we write SQL as below,
```
/* SELECT /*+ BROADCAST(b) */ 4\\;
*/
SELECT  1
;
```
the `\\;` is escaped.
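The terminator check in the loop above can be modeled in isolation. This is a hypothetical sketch of just that condition, not the actual CliDriver code:

```scala
// Toy model of the console's statement-terminator check: a line ends a
// statement only if it ends with ';' that is not escaped as "\;".
def endsStatement(line: String): Boolean = {
  val t = line.trim()
  t.endsWith(";") && !t.endsWith("\\;")
}
```

This is why the multi-line comment example above must escape its embedded `;` as `\;` in console mode: otherwise the first line would be treated as a complete statement on its own.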

Manual test with `spark-sql -f`:
```
(spark.submit.pyFiles,)
(spark.submit.deployMode,client)
(spark.master,local[*])
Classpath elements:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/11/26 16:32:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/11/26 16:32:10 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
21/11/26 16:32:10 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
21/11/26 16:32:13 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
21/11/26 16:32:13 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore yi.zhu10.12.189.175
Spark master: local[*], Application Id: local-1637915529831
/* select /* BROADCAST(b) */ 4;
*/
select  1

1
Time taken: 3.851 seconds, Fetched 1 row(s)
C02D45VVMD6T:spark yi.zhu$
```

With this PR, an incomplete bracketed comment is no longer executed. For the SQL file
```
/* select /* BROADCAST(b) */ 4;
*/
select  1
;

/* select /* braoad */ ;
select 1;
```

it only executes
```
/* select /* BROADCAST(b) */ 4;
*/
select  1
;
```

The next part
```
/* select /* braoad */ ;
select 1;
```
is still treated as in-progress SQL.

Closes apache#34721 from AngersZhuuuu/SPARK-37471.

Authored-by: Angerszhuuuu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 6e19125)
Signed-off-by: Wenchen Fan <[email protected]>
Contributor

@xkrogen xkrogen Aug 3, 2022

This refactor changes the behavior when from is non-numeric and from == to: previously it returned true, now it returns false. Is this intentional?

To preserve behavior, and be more Scala-idiomatic by using pattern matching instead of isInstanceOf, I would suggest:

  private def isWiderCast(from: DataType, to: NumericType): Boolean = from match {
    case _: NumericType => Cast.canUpCast(from, to)
    case _ => from == to
  }

Contributor Author

Previously, when from was non-numeric, from == to must have been false, since to is a NumericType.

Member

Do we have an existing test coverage for the case discussed in this thread?

Contributor Author

The existing test cases didn't cover the case discussed above, so let's add one.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-39963][SQL] Simplify the implementation of SimplifyCasts [SPARK-39963][SQL] Simplify SimplifyCasts.isWiderCast Aug 4, 2022
@dongjoon-hyun
Member

Thank you for pinging me, @beliefer . Could you rebase to the master branch once more? SparkR failure is fixed.

@beliefer
Contributor Author

beliefer commented Aug 4, 2022

Thank you for pinging me, @beliefer . Could you rebase to the master branch once more? SparkR failure is fixed.

Thank you for the reminder.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 4a7e92d Aug 5, 2022
@beliefer
Contributor Author

beliefer commented Aug 6, 2022

@cloud-fan @dongjoon-hyun @viirya @gengliangwang Thank you !
@xkrogen Thank you for your comments.

gengliangwang pushed a commit that referenced this pull request Aug 8, 2022
…sting date to decimal

### What changes were proposed in this pull request?

This PR is a followup of #37389 which disables ANSI mode when testing a case from date to decimal.

### Why are the changes needed?

To make the test pass. Currently it fails with ANSI mode on, see also https://github.com/apache/spark/runs/7701218236?check_suite_focus=true.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

I manually ran the test locally.

Closes #37426 from HyukjinKwon/SPARK-39963.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
asiunov pushed a commit to ascend-io/spark that referenced this pull request Aug 25, 2022