Skip to content

Conversation

@maropu
Copy link
Member

@maropu maropu commented Jul 17, 2020

What changes were proposed in this pull request?

This PR modified the parser code to handle invalid usages of a SET/RESET command.
For example;

SET spark.sql.ansi.enabled true

The above SQL command does not change the configuration value and it just tries to display the value of the configuration
spark.sql.ansi.enabled true. This PR disallows using special characters including spaces in the configuration name and reports a user-friendly error instead. In the error message, it tells users a workaround to use quotes or a string literal if they still needs to specify a configuration with them. 

Before this PR:

scala> sql("SET spark.sql.ansi.enabled true").show(1, -1)
+---------------------------+-----------+
|key                        |value      |
+---------------------------+-----------+
|spark.sql.ansi.enabled true|<undefined>|
+---------------------------+-----------+

After this PR:

scala> sql("SET spark.sql.ansi.enabled true")
org.apache.spark.sql.catalyst.parser.ParseException:
Expected format is 'SET', 'SET key', or 'SET key=value'. If you want to include special characters in key, please use quotes, e.g., SET `ke y`=value.(line 1, pos 0)

== SQL ==
SET spark.sql.ansi.enabled true
^^^

Why are the changes needed?

For better user-friendly errors.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added tests in SparkSqlParserSuite.

@maropu
Copy link
Member Author

maropu commented Jul 17, 2020

@gatorsmile Do we need to support the case: a configuration with spaces?

@SparkQA
Copy link

SparkQA commented Jul 17, 2020

Test build #126055 has finished for PR 29146 at commit 7f8907e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu changed the title [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET command [WIP][SPARK-32257][SQL] Reports explicit errors for invalid usage of SET command Jul 17, 2020
@SparkQA
Copy link

SparkQA commented Jul 18, 2020

Test build #126099 has finished for PR 29146 at commit 7e0d97a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 18, 2020

Test build #126104 has finished for PR 29146 at commit 0d357ae.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 18, 2020

Test build #126108 has finished for PR 29146 at commit 2e92675.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Jul 20, 2020

Looks all the tests are passed in Github Actions.

assert(spark.conf.get(SQLConf.SHUFFLE_PARTITIONS) === 10)
} finally {
sql(s"set ${SQLConf.SHUFFLE_PARTITIONS}=$original")
sql(s"set ${SQLConf.SHUFFLE_PARTITIONS.key}=$original")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing code looks incorrect.

@maropu maropu changed the title [WIP][SPARK-32257][SQL] Reports explicit errors for invalid usage of SET command [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET command Jul 20, 2020
@maropu
Copy link
Member Author

maropu commented Jul 20, 2020

cc: @cloud-fan @viirya

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jul 24, 2020

Test build #126491 has finished for PR 29146 at commit 2e92675.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

cloud-fan commented Jul 24, 2020

If we don't allow space in the config name by default(requires quoting), I think we can do that for other special chars as well. Then the parser rule can be very simple:

SET (multiPartIdent | STRING | BACKQUOTED_IDENTIFIER) (EQ value=.*)?

@cloud-fan
Copy link
Contributor

BTW after #29202 , let's make sure SET and RESET are consistent after this PR.

@viirya
Copy link
Member

viirya commented Jul 24, 2020

If we don't allow space in the config name by default(requires quoting), I think we can do that for other special chars as well. Then the parser rule can be very simple:

SET (multiPartIdent | STRING | BACKQUOTED_IDENTIFIER) (EQ value=.*)?

+1

Can we try to do that in parser rule?

@maropu
Copy link
Member Author

maropu commented Jul 25, 2020

If we don't allow space in the config name by default(requires quoting), I think we can do that for other special chars as well. Then the parser rule can be very simple:

SET (multiPartIdent | STRING | BACKQUOTED_IDENTIFIER) (EQ value=.*)?

Oh, it looks pretty smart. I'll update it based on that.

@maropu maropu changed the title [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET command [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command Jul 27, 2020
} else if (raw.nonEmpty) {
SetCommand(Some(raw.trim -> None))
if (ctx.configKey() != null) {
val keyStr = normalizeConfigString(ctx.configKey().getText)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have existing code to get the text of BACKQUOTED_IDENTIFIER?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. At least, we don't have such an existing test case.

case "-v" => SetCommand(Some("-v" -> None))
case s if s.isEmpty() => SetCommand(None)
case _ => throw new ParseException("Expected format is 'SET', 'SET key', or " +
"'SET key=value'. If you want to include spaces in key and value, please use quotes, " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaces -> special chars

case s if s.isEmpty() => SetCommand(None)
case _ => throw new ParseException("Expected format is 'SET', 'SET key', or " +
"'SET key=value'. If you want to include spaces in key and value, please use quotes, " +
"e.g., SET \"ke y\"=`va lu e`.", ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use quotes or string literal, e.g. ...

remainder(ctx.RESET().getSymbol).trim match {
case s if s.isEmpty() => ResetCommand(None)
case _ => throw new ParseException("Expected format is 'RESET' or 'RESET key'. " +
"If you want to include spaces in key, please use quotes, " +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@SparkQA
Copy link

SparkQA commented Jul 27, 2020

Test build #126650 has finished for PR 29146 at commit 62ed9cd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 28, 2020

Test build #126674 has finished for PR 29146 at commit 31e3fcb.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu maropu changed the title [SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command [WIP][SPARK-32257][SQL] Reports explicit errors for invalid usage of SET/RESET command Jul 28, 2020
class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder(conf) {
import org.apache.spark.sql.catalyst.parser.ParserUtils._

private def normalizeConfigString(s: String) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked tableIdentifier, it includes BACKQUOTED_IDENTIFIER, and in AstBuilder we just use xxx.getText. Why does getText not work here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into tableIdentifier, but I couldn't find why back-quotes are removed in the case. I replaced BACKQUOTED_IDENTIFIER with quotedIdentifier in this commit, then it seems back-quotes are removed by the ANTLR parser. Do you know something about that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it. See PostProcessor.exitQuotedIdentifier. quotedIdentifier is a special parser rule which has post porcessor to remove back-quotes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nice. I see...

@SparkQA
Copy link

SparkQA commented Jul 28, 2020

Test build #126694 has finished for PR 29146 at commit 7a32197.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 31, 2020

Test build #126817 has finished for PR 29146 at commit 6d2221d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Jul 31, 2020

retest this please

@SparkQA
Copy link

SparkQA commented Jul 31, 2020

Test build #126851 has finished for PR 29146 at commit 6d2221d.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Jul 31, 2020

retest this please

1 similar comment
@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jul 31, 2020

Test build #126875 has finished for PR 29146 at commit 6d2221d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 1, 2020

Test build #126913 has finished for PR 29146 at commit 61f5ffc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 1, 2020

Test build #126915 has finished for PR 29146 at commit 4822210.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 2, 2020

Test build #126934 has finished for PR 29146 at commit 5a80e78.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member Author

maropu commented Aug 2, 2020

okay, ready to review. cc: @cloud-fan @viirya @HyukjinKwon

| unsupportedHiveNativeCommands .*? #failNativeCommand
;

quotedConfigKey
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, is it necessary to create an alias? How about SET key= quotedIdentifier (EQ value=.*)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I did so first, but that suggested definition did not invoke PostProcessor.exitQuotedIdentifier to split backquotes;

-    | SET quotedConfigKey (EQ value=.*)?                               #setQuotedConfiguration
+    | SET key=quotedIdentifier (EQ value=.*)?                          #setQuotedConfiguration

   override def visitSetQuotedConfiguration(ctx: SetQuotedConfigurationContext)
     : LogicalPlan = withOrigin(ctx) {
-    val keyStr = ctx.quotedConfigKey().getText
+    val keyStr = ctx.key.getText

assertEqual("SET `spark.sql.    key`=value",
  SetCommand(Some("spark.sql.    key" -> Some("value"))))
fo] - Report Error for invalid usage of SET command *** FAILED *** (106 milliseconds)
[info]   == FAIL: Plans do not match ===
[info]   !SetCommand (`spark.sql.    key`,Some(value))   SetCommand (spark.sql.    key,Some(value)) (PlanTest.scala:157)
[info]   org.scalatest.exceptions.TestFailedException:
...

So, we need replace("`", "") for that approach. Please check the lates commit.

SetCommand(Some(key -> Option(value)))
} else if (raw.nonEmpty) {
SetCommand(Some(raw.trim -> None))
val configKeyValueDef = """([a-zA-Z_\d\\.:]+)\s*=(.*)""".r
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put it in the class body so we don't need to compile the regex repeatedly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right.

test("Checks if SET/RESET can parse all the configurations") {
// Force to build static SQL configurations
StaticSQLConf
(SQLConf.sqlConfEntries.values.asScala ++ ConfigEntry.knownConfigs.values.asScala)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SQLConf also uses ConfigEntry, I think ConfigEntry.knownConfigs already covers all the registered configs.


override def visitSetQuotedConfiguration(ctx: SetQuotedConfigurationContext)
: LogicalPlan = withOrigin(ctx) {
val keyStr = ctx.key.getText.replaceAll("`", "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about visitQuotedIdentifier(ctx.key)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I tried (the two cases below) before, but it didn't work, too...

-    val keyStr = ctx.key.getText.replaceAll("`", "")
+    val keyStr = visitQuotedIdentifier(ctx.key).toString

-    val keyStr = ctx.key.getText.replaceAll("`", "")
+    val keyStr = visitQuotedIdentifier(ctx.quotedIdentifier()).toString

@SparkQA
Copy link

SparkQA commented Aug 3, 2020

Test build #126963 has finished for PR 29146 at commit b0e9686.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Aug 3, 2020

Test build #126976 has finished for PR 29146 at commit 3065ed9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in c6109ba Aug 3, 2020
@maropu
Copy link
Member Author

maropu commented Aug 3, 2020

Thanks for the reviews!

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

late LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants