[SPARK-31595][SQL] Spark sql should allow unescaped quote mark in quoted string #28393
Test build #121991 has finished for PR 28393 at commit
for (index <- 0 until line.length) {
  if (line.charAt(index) == '\'' && !insideComment) {
    // take a look to see if it is escaped
@adrian-wang Should we update the comment to reflect the newly added condition?
Yep, maybe we can rephrase this comment here.
      insideSingleQuote = !insideSingleQuote
    }
  } else if (line.charAt(index) == '\"' && !insideComment) {
    // take a look to see if it is escaped
@adrian-wang Same.
xuanyuanking left a comment
LGTM for the changes, cc @cloud-fan
@dilipbiswal @xuanyuanking Thanks for your advice, I have updated the code.
Test build #122169 has finished for PR 28393 at commit
  )
}
test("Should allow unescaped quote mark in quoted string") {
Plz add a prefix SPARK-31595: Should allow....
LGTM
Test build #122206 has finished for PR 28393 at commit
test("SPARK-31595 Should allow unescaped quote mark in quoted string") {
  runCliWithin(1.minute)(
    """SELECT '"legal string a';select 1 + 234;""".stripMargin -> "235"
Is this an issue only on the thrift server side? How about the Spark side?
scala> sql("""SELECT '"legal string a';select 1 + 234;""").show()
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input 'select' expecting {<EOF>, ';'}(line 1, pos 25)
== SQL ==
SELECT '"legal string a';select 1 + 234;
-------------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:268)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:135)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:49)
This is not the same case. sql() only accepts a single SQL statement; even sql("select 1; select 2;") will return an error.
Ah, I see.
}
test("SPARK-31595 Should allow unescaped quote mark in quoted string") {
  runCliWithin(1.minute)(
Could you explicitly set spark.sql.parser.escapedStringLiterals to false for the tests below?
Actually, this has nothing to do with spark.sql.parser.escapedStringLiterals. Even if the config is set to true, the parser should accept this string.
Ah, I got it. I think the option is misleading... Could you remove it from the PR description?
Done that, thanks!
maropu left a comment
Looks fine and thanks for the update, @adrian-wang cc: @wangyum
    """SELECT '"legal string a';select 1 + 234;""".stripMargin -> "235"
  )
  runCliWithin(1.minute)(
    """SELECT "legal 'string b";select 22222 + 1;""".stripMargin -> "22223"
nit: let's not use the multiline string style for a single line string.
Updated, thanks!
Test build #122329 has finished for PR 28393 at commit
cc @juliuszsompolski as well
thanks, merging to master/3.0!
…ted string
### What changes were proposed in this pull request?
`def splitSemiColon` cannot handle unescaped quote mark like "'" or '"' correctly. When there are unmatched quotes in a string, `splitSemiColon` will not drop off semicolon as expected.
### Why are the changes needed?
Some regex expression will use quote mark in string. We should process semicolon correctly.
### Does this PR introduce any user-facing change?
No
### How was this patch tested?
Added Unit test and also manual test.
Closes #28393 from adrian-wang/unescaped.
Authored-by: Daoyuan Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 53a9bf8)
Signed-off-by: Wenchen Fan <[email protected]>
What changes were proposed in this pull request?
def splitSemiColon cannot handle an unescaped quote mark such as "'" or '"' correctly. When a quoted string contains an unmatched quote of the other kind, splitSemiColon does not split on the semicolon as expected.
Why are the changes needed?
Some regex expressions use a quote mark inside a string. The semicolon should still be processed correctly.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a unit test and also tested manually.
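To make the behaviour described above concrete, here is a minimal sketch of a quote-aware semicolon splitter. It is not the code from this PR: the object name SplitSketch and the method splitStatements are hypothetical, comment handling is left out, and the only idea taken from the discussion is that a quote mark should toggle its state only when it is not escaped and not inside the other kind of quote.

```scala
object SplitSketch {
  /** Splits `line` on semicolons that lie outside quoted strings.
    * Simplified illustration: SQL comments are not handled.
    */
  def splitStatements(line: String): List[String] = {
    var insideSingleQuote = false
    var insideDoubleQuote = false
    var escaped = false
    var beginIndex = 0
    val out = List.newBuilder[String]

    for (index <- 0 until line.length) {
      val c = line.charAt(index)
      if (c == '\'' && !escaped && !insideDoubleQuote) {
        // A single quote only opens or closes a string when it is
        // not escaped and not nested inside a double-quoted string.
        insideSingleQuote = !insideSingleQuote
      } else if (c == '"' && !escaped && !insideSingleQuote) {
        // Same rule for double quotes nested inside single quotes.
        insideDoubleQuote = !insideDoubleQuote
      } else if (c == ';' && !insideSingleQuote && !insideDoubleQuote) {
        // A semicolon outside any quoted string ends a statement.
        out += line.substring(beginIndex, index)
        beginIndex = index + 1
      }
      // A backslash escapes the next character; an escaped backslash does not.
      escaped = (c == '\\') && !escaped
    }
    // Keep whatever follows the last separator, if anything.
    if (beginIndex < line.length) out += line.substring(beginIndex)
    out.result()
  }
}
```

With the two inputs from the test in this PR, the sketch yields two statements each, because the stray quote inside the string literal no longer flips the quote state. Each statement could then be passed to a single-statement entry point such as sql().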