Skip to content

Conversation

@adrian-wang
Copy link
Contributor

@adrian-wang adrian-wang commented Apr 28, 2020

What changes were proposed in this pull request?

def splitSemiColon cannot handle unescaped quote mark like "'" or '"' correctly. When there are unmatched quotes in a string, splitSemiColon will not drop off semicolon as expected.

Why are the changes needed?

Some regex expression will use quote mark in string. We should process semicolon correctly.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added Unit test and also manual test.

@adrian-wang adrian-wang changed the title SPARK-31595 Spark sql should allow unescaped quote mark in quoted string [SPARK-31595][SQL] Spark sql should allow unescaped quote mark in quoted string Apr 28, 2020
@SparkQA
Copy link

SparkQA commented Apr 28, 2020

Test build #121991 has finished for PR 28393 at commit 562841b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@adrian-wang
Copy link
Contributor Author

@xuanyuanking


for (index <- 0 until line.length) {
if (line.charAt(index) == '\'' && !insideComment) {
// take a look to see if it is escaped
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrian-wang Should we update the comment to reflect the newly added condition?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, maybe we can rephrase this comment here.

insideSingleQuote = !insideSingleQuote
}
} else if (line.charAt(index) == '\"' && !insideComment) {
// take a look to see if it is escaped
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrian-wang Same.

Copy link
Member

@xuanyuanking xuanyuanking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for the changes, cc @cloud-fan


for (index <- 0 until line.length) {
if (line.charAt(index) == '\'' && !insideComment) {
// take a look to see if it is escaped
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, maybe we can rephrase this comment here.

@adrian-wang
Copy link
Contributor Author

@dilipbiswal @xuanyuanking Thanks for your advice, I have updated the code.

@SparkQA
Copy link

SparkQA commented May 1, 2020

Test build #122169 has finished for PR 28393 at commit 1947cbe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

)
}

test("Should allow unescaped quote mark in quoted string") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plz add a prefix SPARK-31595: Should allow....

@dilipbiswal
Copy link
Contributor

LGTM
cc @maropu

@SparkQA
Copy link

SparkQA commented May 2, 2020

Test build #122206 has finished for PR 28393 at commit c20543d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


test("SPARK-31595 Should allow unescaped quote mark in quoted string") {
runCliWithin(1.minute)(
"""SELECT '"legal string a';select 1 + 234;""".stripMargin -> "235"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an only issue in the thrift server side? How about the spark side?

scala> sql("""SELECT '"legal string a';select 1 + 234;""").show()
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input 'select' expecting {<EOF>, ';'}(line 1, pos 25)

== SQL ==
SELECT '"legal string a';select 1 + 234;
-------------------------^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:268)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:135)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:49)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the same case. sql() only accepts single sql statement, even

sql("select 1; select 2;")

will return error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see.

}

test("SPARK-31595 Should allow unescaped quote mark in quoted string") {
runCliWithin(1.minute)(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explicitly set false at spark.sql.parser.escapedStringLiterals for the tests below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this has nothing to do with spark.sql.parser.escapedStringLiterals, even if the config is set to true, the parser should accept this string.

Copy link
Member

@maropu maropu May 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ur, I got it. I think the option is misleading... Could you remove it from the PR descritpion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done that, thanks!

Copy link
Member

@maropu maropu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine and thanks for the update, @adrian-wang cc: @wangyum

"""SELECT '"legal string a';select 1 + 234;""".stripMargin -> "235"
)
runCliWithin(1.minute)(
"""SELECT "legal 'string b";select 22222 + 1;""".stripMargin -> "22223"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's not use the multiline string style for a single line string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

@SparkQA
Copy link

SparkQA commented May 6, 2020

Test build #122329 has finished for PR 28393 at commit d185264.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

cc @juliuszsompolski as well

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.0!

@cloud-fan cloud-fan closed this in 53a9bf8 May 6, 2020
cloud-fan pushed a commit that referenced this pull request May 6, 2020
…ted string

### What changes were proposed in this pull request?
`def splitSemiColon` cannot handle unescaped quote mark like "'" or '"' correctly. When there are unmatched quotes in a string, `splitSemiColon` will not drop off semicolon as expected.

### Why are the changes needed?
Some regex expression will use quote mark in string. We should process semicolon correctly.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Added Unit test and also manual test.

Closes #28393 from adrian-wang/unescaped.

Authored-by: Daoyuan Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 53a9bf8)
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants