[SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command #29146
@gatorsmile Do we need to support the case of a configuration name that contains spaces?
Test build #126055 has finished for PR 29146 at commit
Test build #126099 has finished for PR 29146 at commit
Test build #126104 has finished for PR 29146 at commit
Test build #126108 has finished for PR 29146 at commit
Looks like all the tests passed in GitHub Actions.
    assert(spark.conf.get(SQLConf.SHUFFLE_PARTITIONS) === 10)
  } finally {
-   sql(s"set ${SQLConf.SHUFFLE_PARTITIONS}=$original")
+   sql(s"set ${SQLConf.SHUFFLE_PARTITIONS.key}=$original")
The existing code looks incorrect.
cc: @cloud-fan @viirya
retest this please
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
Test build #126491 has finished for PR 29146 at commit
If we don't allow spaces in the config name by default (quoting is required), I think we can do the same for other special chars as well. Then the parser rule can be very simple:
BTW, after #29202, let's make sure SET and RESET are consistent after this PR.
+1. Can we try to do that in the parser rule?
Oh, it looks pretty smart. I'll update it based on that.
} else if (raw.nonEmpty) {
  SetCommand(Some(raw.trim -> None))
if (ctx.configKey() != null) {
  val keyStr = normalizeConfigString(ctx.configKey().getText)
do we have existing code to get the text of BACKQUOTED_IDENTIFIER?
No. At least, we don't have such an existing test case.
case "-v" => SetCommand(Some("-v" -> None))
case s if s.isEmpty() => SetCommand(None)
case _ => throw new ParseException("Expected format is 'SET', 'SET key', or " +
  "'SET key=value'. If you want to include spaces in key and value, please use quotes, " +
spaces -> special chars
case s if s.isEmpty() => SetCommand(None)
case _ => throw new ParseException("Expected format is 'SET', 'SET key', or " +
  "'SET key=value'. If you want to include spaces in key and value, please use quotes, " +
  "e.g., SET \"ke y\"=`va lu e`.", ctx)
please use quotes or string literal, e.g. ...
remainder(ctx.RESET().getSymbol).trim match {
  case s if s.isEmpty() => ResetCommand(None)
  case _ => throw new ParseException("Expected format is 'RESET' or 'RESET key'. " +
    "If you want to include spaces in key, please use quotes, " +
ditto
Test build #126650 has finished for PR 29146 at commit
Test build #126674 has finished for PR 29146 at commit
class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
  import org.apache.spark.sql.catalyst.parser.ParserUtils._

  private def normalizeConfigString(s: String) =
I checked tableIdentifier, it includes BACKQUOTED_IDENTIFIER, and in AstBuilder we just use xxx.getText. Why does getText not work here?
I looked into tableIdentifier, but I couldn't find why back-quotes are removed in the case. I replaced BACKQUOTED_IDENTIFIER with quotedIdentifier in this commit, then it seems back-quotes are removed by the ANTLR parser. Do you know something about that?
I found it. See PostProcessor.exitQuotedIdentifier. quotedIdentifier is a special parser rule that has a post-processor to remove back-quotes.
Oh, nice. I see...
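To make the back-quote discussion above concrete, here is a minimal standalone sketch of what stripping the quotes from a quoted identifier amounts to. The object QuotedIdentifierSketch and the helper unquote are hypothetical names and are not Spark's PostProcessor; treating a doubled back-quote as an escaped one is an assumption of this sketch.

```scala
object QuotedIdentifierSketch {
  // Hypothetical helper, not Spark's API: strips one pair of enclosing
  // back-quotes and (by assumption) unescapes doubled back-quotes inside.
  def unquote(text: String): String = {
    val quoted = text.length >= 2 && text.startsWith("`") && text.endsWith("`")
    if (quoted) text.substring(1, text.length - 1).replace("``", "`") else text
  }

  def main(args: Array[String]): Unit = {
    // A quoted key that contains a space, as in the test case in this thread.
    println(unquote("`spark.sql. key`"))
    // An unquoted key is returned unchanged.
    println(unquote("spark.sql.ansi.enabled"))
  }
}
```

Calling getText on a raw token, by contrast, would return the text with the back-quotes still attached, which matches the behavior described in this thread.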
Test build #126694 has finished for PR 29146 at commit
Test build #126817 has finished for PR 29146 at commit
retest this please
Test build #126851 has finished for PR 29146 at commit
retest this please
retest this please
Test build #126875 has finished for PR 29146 at commit
Test build #126913 has finished for PR 29146 at commit
Test build #126915 has finished for PR 29146 at commit
Test build #126934 has finished for PR 29146 at commit
okay, ready to review. cc: @cloud-fan @viirya @HyukjinKwon
    | unsupportedHiveNativeCommands .*?    #failNativeCommand
    ;

quotedConfigKey
hmm, is it necessary to create an alias? How about SET key= quotedIdentifier (EQ value=.*)?
Yea, I did so first, but that suggested definition did not invoke PostProcessor.exitQuotedIdentifier to strip the back-quotes:
- | SET quotedConfigKey (EQ value=.*)? #setQuotedConfiguration
+ | SET key=quotedIdentifier (EQ value=.*)? #setQuotedConfiguration
override def visitSetQuotedConfiguration(ctx: SetQuotedConfigurationContext)
: LogicalPlan = withOrigin(ctx) {
- val keyStr = ctx.quotedConfigKey().getText
+ val keyStr = ctx.key.getText
assertEqual("SET `spark.sql. key`=value",
SetCommand(Some("spark.sql. key" -> Some("value"))))
[info] - Report Error for invalid usage of SET command *** FAILED *** (106 milliseconds)
[info] == FAIL: Plans do not match ===
[info] !SetCommand (`spark.sql. key`,Some(value)) SetCommand (spark.sql. key,Some(value)) (PlanTest.scala:157)
[info] org.scalatest.exceptions.TestFailedException:
...
So, we need replace("`", "") for that approach. Please check the latest commit.
  SetCommand(Some(key -> Option(value)))
} else if (raw.nonEmpty) {
  SetCommand(Some(raw.trim -> None))
val configKeyValueDef = """([a-zA-Z_\d\\.:]+)\s*=(.*)""".r
Can we put it in the class body so we don't need to compile the regex repeatedly?
Ah, right.
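As a rough illustration of the suggestion above, the pattern can be held in a val that is initialized once and reused on every call. This is only a sketch: the regex is the one quoted in the diff, while the object SetCommandRegexSketch and the method parseKeyValue are hypothetical names that do not appear in the PR.

```scala
import scala.util.matching.Regex

object SetCommandRegexSketch {
  // Compiled once, when the object is initialized, rather than on every call.
  private val configKeyValueDef: Regex = """([a-zA-Z_\d\\.:]+)\s*=(.*)""".r

  // Hypothetical helper: returns the (key, value) pair when the text after
  // SET has the form key=value, and None otherwise.
  def parseKeyValue(raw: String): Option[(String, String)] = raw.trim match {
    case configKeyValueDef(key, value) => Some(key -> value.trim)
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    println(parseKeyValue("spark.sql.shuffle.partitions=10"))
    println(parseKeyValue("spark.sql.ansi.enabled true"))
  }
}
```

A regex used as an extractor in a Scala match must match the whole input, so a key that contains a space does not match this pattern.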
test("Checks if SET/RESET can parse all the configurations") {
  // Force to build static SQL configurations
  StaticSQLConf
  (SQLConf.sqlConfEntries.values.asScala ++ ConfigEntry.knownConfigs.values.asScala)
SQLConf also uses ConfigEntry, I think ConfigEntry.knownConfigs already covers all the registered configs.
override def visitSetQuotedConfiguration(ctx: SetQuotedConfigurationContext)
    : LogicalPlan = withOrigin(ctx) {
  val keyStr = ctx.key.getText.replaceAll("`", "")
how about visitQuotedIdentifier(ctx.key)
Yea, I tried the two variants below before, but neither of them worked...
- val keyStr = ctx.key.getText.replaceAll("`", "")
+ val keyStr = visitQuotedIdentifier(ctx.key).toString
- val keyStr = ctx.key.getText.replaceAll("`", "")
+ val keyStr = visitQuotedIdentifier(ctx.quotedIdentifier()).toString
Test build #126963 has finished for PR 29146 at commit
@cloud-fan's brush-up
Test build #126976 has finished for PR 29146 at commit
thanks, merging to master!
Thanks for the reviews!
viirya left a comment
late LGTM
What changes were proposed in this pull request?
This PR modified the parser code to handle invalid usages of a SET/RESET command.
For example:
The above SQL command does not change the configuration value; it just tries to display the value of a configuration named 'spark.sql.ansi.enabled true'. This PR disallows special characters, including spaces, in the configuration name and reports a user-friendly error instead. The error message tells users a workaround: use quotes or a string literal if they still need to specify a configuration name that contains such characters.
Before this PR:
After this PR:
Why are the changes needed?
To report more user-friendly errors.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added tests in SparkSqlParserSuite.
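The behavior described in this summary can be sketched outside of Spark as a small classifier over the text that follows the SET keyword. Everything here is an assumption made for illustration: the object SetCommandSketch, the Either-based result type, and the back-quoted key pattern are hypothetical, the -v option and RESET are left out, and the real change lives in the ANTLR grammar and SparkSqlAstBuilder rather than in regexes like these.

```scala
object SetCommandSketch {
  // Characters allowed in an unquoted key (pattern borrowed from the review thread).
  private val KeyOnly = """([a-zA-Z_\d\\.:]+)""".r
  private val KeyValue = """([a-zA-Z_\d\\.:]+)\s*=(.*)""".r
  // Hypothetical pattern: a back-quoted key, optionally followed by =value.
  private val QuotedKey = """`([^`]+)`(?:\s*=(.*))?""".r

  private val errorMessage =
    "Expected format is 'SET', 'SET key', or 'SET key=value'."

  // Takes the text after the SET keyword.
  // Right(None) stands for a bare SET; Left carries the error message.
  def parse(remainder: String): Either[String, Option[(String, Option[String])]] =
    remainder.trim match {
      case "" => Right(None)
      case KeyValue(key, value) => Right(Some(key -> Some(value.trim)))
      case KeyOnly(key) => Right(Some(key -> None))
      // The optional group is null when no =value part is present.
      case QuotedKey(key, value) => Right(Some(key -> Option(value).map(_.trim)))
      case _ => Left(errorMessage)
    }

  def main(args: Array[String]): Unit = {
    println(parse("spark.sql.ansi.enabled=true"))
    println(parse("spark.sql.ansi.enabled true"))
    println(parse("`spark.sql. key`=value"))
  }
}
```

In this sketch an unquoted key with a space is rejected with an explicit message instead of being treated as a key to display, while the same key wrapped in back-quotes is accepted.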