[SPARK-31595][SQL] Spark sql should allow unescaped quote mark in quoted string #28393

adrian-wang · 2020-04-28T11:07:55Z

What changes were proposed in this pull request?

def splitSemiColon cannot correctly handle an unescaped quote mark inside a quoted string, such as "'" or '"'. When a string contains an unmatched quote of the other kind, splitSemiColon does not split the input at the following semicolon as expected.
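The quote-tracking idea described above can be sketched as follows. This is a minimal illustration, not Spark's actual `splitSemiColon`: the names `SplitSketch` and `splitStatements` are hypothetical, and the sketch assumes there are no SQL comments in the input (the real method also tracks an `insideComment` flag). The key point is that a single quote toggles state only when not inside a double-quoted string, and vice versa.

```scala
import scala.collection.mutable.ListBuffer

object SplitSketch {
  // Hypothetical helper for illustration only; it ignores SQL comments.
  def splitStatements(line: String): List[String] = {
    var insideSingleQuote = false
    var insideDoubleQuote = false
    var escaped = false // true when the previous character was an unescaped backslash
    var begin = 0
    val out = ListBuffer.empty[String]

    for (i <- 0 until line.length) {
      val c = line.charAt(i)
      if (c == '\'') {
        // A single quote inside a double-quoted string is ordinary text.
        if (!escaped && !insideDoubleQuote) insideSingleQuote = !insideSingleQuote
      } else if (c == '"') {
        // A double quote inside a single-quoted string is ordinary text.
        if (!escaped && !insideSingleQuote) insideDoubleQuote = !insideDoubleQuote
      } else if (c == ';' && !insideSingleQuote && !insideDoubleQuote) {
        // Only a semicolon outside any quoted string ends a statement.
        out += line.substring(begin, i)
        begin = i + 1
      }
      // A backslash escapes the next character; an escaped backslash does not.
      escaped = if (escaped) false else c == '\\'
    }
    // Keep any trailing text that is not terminated by a semicolon.
    if (begin < line.length) out += line.substring(begin)
    out.toList
  }
}
```

With this sketch, `SplitSketch.splitStatements("SELECT '\"legal string a';select 1 + 234;")` yields two statements, because the double quote inside the single-quoted string no longer flips the double-quote state.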

Why are the changes needed?

Some regular expressions contain a quote mark inside a string literal. Semicolons should still be processed correctly in that case.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a unit test and also tested manually.

SparkQA · 2020-04-28T11:48:08Z

Test build #121991 has finished for PR 28393 at commit 562841b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

adrian-wang · 2020-04-30T03:42:54Z

@xuanyuanking

dilipbiswal · 2020-04-30T06:57:17Z

...e-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala


    for (index <- 0 until line.length) {
      if (line.charAt(index) == '\'' && !insideComment) {
        // take a look to see if it is escaped


@adrian-wang Should we update the comment to reflect the newly added condition?

Yep, maybe we can rephrase this comment here.

dilipbiswal · 2020-04-30T06:57:48Z

...e-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala

          insideSingleQuote = !insideSingleQuote
        }
      } else if (line.charAt(index) == '\"' && !insideComment) {
        // take a look to see if it is escaped


@adrian-wang Same.

xuanyuanking

LGTM for the changes, cc @cloud-fan

xuanyuanking · 2020-04-30T07:23:14Z

...e-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala


    for (index <- 0 until line.length) {
      if (line.charAt(index) == '\'' && !insideComment) {
        // take a look to see if it is escaped


Yep, maybe we can rephrase this comment here.

adrian-wang · 2020-05-01T09:45:34Z

@dilipbiswal @xuanyuanking Thanks for your advice, I have updated the code.

SparkQA · 2020-05-01T10:25:24Z

Test build #122169 has finished for PR 28393 at commit 1947cbe.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2020-05-01T13:02:18Z

sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala

    )
  }
+
+  test("Should allow unescaped quote mark in quoted string") {


Please add a prefix: SPARK-31595: Should allow....

dilipbiswal · 2020-05-02T07:28:49Z

LGTM
cc @maropu

SparkQA · 2020-05-02T07:49:20Z

Test build #122206 has finished for PR 28393 at commit c20543d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

maropu · 2020-05-02T09:11:21Z

sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala

+
+  test("SPARK-31595 Should allow unescaped quote mark in quoted string") {
+    runCliWithin(1.minute)(
+      """SELECT '"legal string a';select 1 + 234;""".stripMargin -> "235"


Is this an issue only on the thrift server side? What about the Spark side?

scala> sql("""SELECT '"legal string a';select 1 + 234;""").show()
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input 'select' expecting {<EOF>, ';'}(line 1, pos 25)

== SQL ==
SELECT '"legal string a';select 1 + 234;
-------------------------^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:268)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:135)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:49)

This is not the same case. sql() accepts only a single SQL statement; even

sql("select 1; select 2;")

returns an error.

maropu · 2020-05-02T09:12:39Z

sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala

  }
+
+  test("SPARK-31595 Should allow unescaped quote mark in quoted string") {
+    runCliWithin(1.minute)(


Could you explicitly set spark.sql.parser.escapedStringLiterals to false for the tests below?

Actually, this has nothing to do with spark.sql.parser.escapedStringLiterals. Even if the config is set to true, the parser should accept this string.

Ah, I got it. I think the option is misleading... Could you remove it from the PR description?

Done, thanks!

maropu

Looks fine and thanks for the update, @adrian-wang cc: @wangyum

cloud-fan · 2020-05-05T06:12:25Z

sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala

+      """SELECT '"legal string a';select 1 + 234;""".stripMargin -> "235"
+    )
+    runCliWithin(1.minute)(
+      """SELECT "legal 'string b";select 22222 + 1;""".stripMargin -> "22223"


nit: let's not use the multiline string style for a single line string.

Updated, thanks!
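To illustrate the nit above: `stripMargin` leaves a single-line literal with no `|` margin unchanged, so the triple-quoted form can be written as an ordinary string with the embedded double quote escaped. The sketch below only demonstrates that equivalence; the object and value names are hypothetical, and it is not necessarily how the test was rewritten in the PR.

```scala
object SingleLineLiteral {
  // Triple-quoted form, as in the test under review.
  val tripleQuoted: String = """SELECT '"legal string a';select 1 + 234;""".stripMargin

  // Ordinary literal with the same content; the embedded double quote is escaped.
  val plain: String = "SELECT '\"legal string a';select 1 + 234;"

  // True when both literals denote the same string.
  def equivalent: Boolean = tripleQuoted == plain
}
```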

SparkQA · 2020-05-06T03:14:04Z

Test build #122329 has finished for PR 28393 at commit d185264.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

cloud-fan · 2020-05-06T04:33:40Z

cc @juliuszsompolski as well

cloud-fan · 2020-05-06T04:34:39Z

thanks, merging to master/3.0!

…ted string

### What changes were proposed in this pull request?

`def splitSemiColon` cannot handle unescaped quote mark like "'" or '"' correctly. When there are unmatched quotes in a string, `splitSemiColon` will not drop off semicolon as expected.

### Why are the changes needed?

Some regex expression will use quote mark in string. We should process semicolon correctly.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Added Unit test and also manual test.

Closes #28393 from adrian-wang/unescaped.

Authored-by: Daoyuan Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 53a9bf8)
Signed-off-by: Wenchen Fan <[email protected]>

SPARK-31595 Spark sql cli should allow unescaped quote mark in quoted…

562841b

… string

adrian-wang changed the title ~~SPARK-31595 Spark sql should allow unescaped quote mark in quoted string~~ [SPARK-31595][SQL] Spark sql should allow unescaped quote mark in quoted string Apr 28, 2020

dilipbiswal reviewed Apr 30, 2020

xuanyuanking reviewed Apr 30, 2020

refine comments

1947cbe

probot-autolabeler bot added the SQL label May 1, 2020

maropu reviewed May 1, 2020

add jira title in test suite

c20543d

maropu reviewed May 2, 2020

maropu approved these changes May 3, 2020

cloud-fan reviewed May 5, 2020

remove multiline string style

d185264

cloud-fan closed this in 53a9bf8 May 6, 2020

This was referenced Jan 5, 2021

[SPARK-33100][SQL][3.0] Ignore a semicolon inside a bracketed comment in spark-sql #31033

Closed

[SPARK-33100][SQL][2.4] Ignore a semicolon inside a bracketed comment in spark-sql #31040

Closed

6 participants
