
Conversation

@beliefer
Contributor

@beliefer beliefer commented Aug 3, 2022

What changes were proposed in this pull request?

SimplifyCasts is an optimizer rule that removes casts that are unnecessary because the input is already of the correct type.
Currently, the implementation of SimplifyCasts seems a little redundant: Cast.canUpCast can completely cover the functionality of isWiderCast.

On the other hand, SimplifyCastsSuite doesn't cover the case where from is non-numeric and to is numeric. This PR adds a new test case for it.
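For illustration, here is a toy sketch of the simplification. The types and `canUpCast` below are simplified stand-ins for the Catalyst classes, not the actual Spark implementations:

```scala
// Toy stand-ins for Spark's type hierarchy; NOT the real Catalyst classes.
sealed trait DataType
sealed trait NumericType extends DataType
case object IntType extends NumericType
case object LongType extends NumericType
case object StringType extends DataType

object Cast {
  // Simplified model of Cast.canUpCast: here only Int -> Long (and identity
  // on the same type) counts as a safe up-cast.
  def canUpCast(from: DataType, to: DataType): Boolean =
    (from, to) match {
      case (IntType, LongType) => true
      case (f, t)              => f == t
    }
}

// After the refactor, isWiderCast can simply delegate to Cast.canUpCast.
def isWiderCast(from: DataType, to: NumericType): Boolean =
  Cast.canUpCast(from, to)
```

Since `to` is constrained to `NumericType`, the only casts accepted are those `canUpCast` already recognizes, which is why the separate width check was redundant.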

Why are the changes needed?

Simplify the implementation of SimplifyCasts.

Does this PR introduce any user-facing change?

No. This PR only changes the internal implementation.

How was this patch tested?

N/A

@github-actions github-actions bot added the SQL label Aug 3, 2022
@beliefer
Contributor Author

beliefer commented Aug 3, 2022

ping @gengliangwang @MaxGekk @dongjoon-hyun cc @cloud-fan @viirya

ulysses-you pushed a commit to ulysses-you/spark that referenced this pull request Aug 3, 2022
### What changes were proposed in this pull request?
Currently, spark-sql with `-e` or `-f` does not support nested bracketed comments such as
```
/* SELECT /*+ BROADCAST(b) */ 4;
*/
SELECT  1
;
```
Running `spark-sql -f` with `--verbose` produces the error below:
```
park master: yarn, Application Id: application_1632999510150_6968442
/* sielect /* BROADCAST(b) */ 4
Error in query:
mismatched input '4' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 30)

== SQL ==
/* sielect /* BROADCAST(b) */ 4
------------------------------^^^
```

In current code
```
else if (line.charAt(index) == '/' && !insideSimpleComment) {
        val hasNext = index + 1 < line.length
        if (insideSingleQuote || insideDoubleQuote) {
          // Ignores '/' in any case of quotes
        } else if (insideBracketedComment && line.charAt(index - 1) == '*' ) {
          // Decrements `bracketedCommentLevel` at the beginning of the next loop
          leavingBracketedComment = true
        } else if (hasNext && !insideBracketedComment &&  line.charAt(index + 1) == '*') {
          bracketedCommentLevel += 1
        }
      }
```

When it meets a `*/`, it marks `leavingBracketedComment` as true; then, when processing the next character, the bracketed-comment level is decremented:
```
      if (leavingBracketedComment) {
        bracketedCommentLevel -= 1
        leavingBracketedComment = false
      }

```

But when meeting a `/*`, the code requires `!insideBracketedComment`, which means that for a case such as
```
/* aaa /* bbb */  ; ccc */ select 1;
```

when the second `/*` is met, `insideBracketedComment` is already true, so this `/*` is not treated as the start of a bracketed comment.
Then, when the first `*/` is met, the bracketed comment ends and the query is split as
```
/* aaa /* bbb */;    =>  comment
ccc */ select 1;   => query
```

The query then fails.

So here we remove the `!insideBracketedComment` condition, allowing `bracketedCommentLevel > 1`, and since
```
 def insideBracketedComment: Boolean = bracketedCommentLevel > 0
```
characters inside any level of brackets are treated as comment text.
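The level-counting idea can be sketched in isolation as follows. This is a hypothetical toy model for a single line, not the actual SparkSQLCLIDriver code (which must also handle quotes and `--` comments):

```scala
// Toy sketch of nested bracketed-comment handling with a level counter
// instead of a boolean flag. Characters at any level > 0 are comment text.
def stripBracketedComments(line: String): String = {
  val out = new StringBuilder
  var level = 0
  var i = 0
  while (i < line.length) {
    if (i + 1 < line.length && line.charAt(i) == '/' && line.charAt(i + 1) == '*') {
      level += 1 // every `/*` opens a level, even inside a comment
      i += 2
    } else if (i + 1 < line.length && line.charAt(i) == '*' &&
               line.charAt(i + 1) == '/' && level > 0) {
      level -= 1 // `*/` closes the innermost level
      i += 2
    } else {
      if (level == 0) out.append(line.charAt(i))
      i += 1
    }
  }
  out.toString
}
```

With the counter, `/* aaa /* bbb */ ; ccc */ select 1;` is correctly reduced to `select 1;`, whereas a boolean flag would end the comment at the first `*/`.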

### Why are the changes needed?
In apache#37389 we added support for nested bracketed comments in SQL; spark-sql should support them too.

### Does this PR introduce _any_ user-facing change?
Users can now use nested bracketed comments in spark-sql.

### How was this patch tested?

spark-sql console mode has special logic for handling `;`:
```
    while (line != null) {
      if (!line.startsWith("--")) {
        if (prefix.nonEmpty) {
          prefix += '\n'
        }

        if (line.trim().endsWith(";") && !line.trim().endsWith("\\;")) {
          line = prefix + line
          ret = cli.processLine(line, true)
          prefix = ""
          currentPrompt = promptWithCurrentDB
        } else {
          prefix = prefix + line
          currentPrompt = continuedPromptWithDBSpaces
        }
      }
      line = reader.readLine(currentPrompt + "> ")
    }
```

If we write SQL as below,
```
/* SELECT /*+ BROADCAST(b) */ 4\\;
*/
SELECT  1
;
```
the `\\;` is escaped.
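The terminator check in the loop above can be modeled in isolation. This is a hypothetical sketch of just that condition, not the actual CliDriver code:

```scala
// Toy model of the console's statement-terminator check: a line ends a
// statement only if it ends with ';' that is not escaped as "\;".
def endsStatement(line: String): Boolean = {
  val t = line.trim()
  t.endsWith(";") && !t.endsWith("\\;")
}
```

This is why the multi-line comment example above must escape its embedded `;` as `\;` in console mode: otherwise the first line would be treated as a complete statement on its own.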

Manual test with `spark-sql -f`:
```
(spark.submit.pyFiles,)
(spark.submit.deployMode,client)
(spark.master,local[*])
Classpath elements:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/11/26 16:32:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/11/26 16:32:10 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
21/11/26 16:32:10 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
21/11/26 16:32:13 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
21/11/26 16:32:13 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore yi.zhu10.12.189.175
Spark master: local[*], Application Id: local-1637915529831
/* select /* BROADCAST(b) */ 4;
*/
select  1

1
Time taken: 3.851 seconds, Fetched 1 row(s)
C02D45VVMD6T:spark yi.zhu$
```

With this PR, an incomplete bracketed comment is no longer executed. For the SQL file
```
/* select /* BROADCAST(b) */ 4;
*/
select  1
;

/* select /* braoad */ ;
select 1;
```

it only executes
```
/* select /* BROADCAST(b) */ 4;
*/
select  1
;
```

The next part
```
/* select /* braoad */ ;
select 1;
```
is still treated as in-progress SQL.

Closes apache#34721 from AngersZhuuuu/SPARK-37471.

Authored-by: Angerszhuuuu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 6e19125)
Signed-off-by: Wenchen Fan <[email protected]>
Contributor

@xkrogen xkrogen Aug 3, 2022

This refactor changes the behavior when from is non-numeric and from == to: previously it returned true, now it returns false. Is this intentional?

To preserve behavior, and be more Scala-idiomatic by using pattern matching instead of isInstanceOf, I would suggest:

  private def isWiderCast(from: DataType, to: NumericType): Boolean = from match {
    case _: NumericType => Cast.canUpCast(from, to)
    case _ => from == to
  }

Contributor Author

Previously, when from was non-numeric, from == to must have been false, since to is a NumericType.

Member

Do we have an existing test coverage for the case discussed in this thread?

Contributor Author

The existing test cases didn't cover the case discussed above, so let's add one.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-39963][SQL] Simplify the implementation of SimplifyCasts [SPARK-39963][SQL] Simplify SimplifyCasts.isWiderCast Aug 4, 2022
@dongjoon-hyun
Member

Thank you for pinging me, @beliefer . Could you rebase to the master branch once more? SparkR failure is fixed.

@beliefer
Contributor Author

beliefer commented Aug 4, 2022

Thank you for pinging me, @beliefer . Could you rebase to the master branch once more? SparkR failure is fixed.

Thank you for the reminder.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 4a7e92d Aug 5, 2022
@beliefer
Contributor Author

beliefer commented Aug 6, 2022

@cloud-fan @dongjoon-hyun @viirya @gengliangwang Thank you !
@xkrogen Thank you for your comments.

gengliangwang pushed a commit that referenced this pull request Aug 8, 2022
…sting date to decimal

### What changes were proposed in this pull request?

This PR is a followup of #37389 which disables ANSI mode when testing a case from date to decimal.

### Why are the changes needed?

To make the test pass. Currently it fails with ANSI mode on, see also https://github.com/apache/spark/runs/7701218236?check_suite_focus=true.

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

I manually ran the test locally.

Closes #37426 from HyukjinKwon/SPARK-39963.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Gengliang Wang <[email protected]>
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
asiunov pushed a commit to ascend-io/spark that referenced this pull request Aug 25, 2022