Conversation

@dilipbiswal
Contributor

@dilipbiswal dilipbiswal commented May 31, 2020

What changes were proposed in this pull request?

Introduce a new SQL config, spark.sql.optimizer.ignoreHints. When it is set to true,
the application of hints is disabled. This is similar to Oracle's OPTIMIZER_IGNORE_HINTS.
It can be helpful for studying the performance difference between running a query with hints applied and without.
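A minimal usage sketch of the A/B comparison this description has in mind. The config key is the one named in this description; the merged version may use a different name, and the session setup here is illustrative:

```scala
// Sketch: compare plans for the same hinted query with hints honored vs ignored.
// Assumes a local SparkSession; the config key follows this PR description.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.range(100).createOrReplaceTempView("t")

// Hints applied (default behavior).
spark.conf.set("spark.sql.optimizer.ignoreHints", "false")
spark.sql("SELECT /*+ REPARTITION(10) */ * FROM t").explain()

// Hints ignored: the REPARTITION hint above should be dropped during analysis.
spark.conf.set("spark.sql.optimizer.ignoreHints", "true")
spark.sql("SELECT /*+ REPARTITION(10) */ * FROM t").explain()
```

Comparing the two `explain()` outputs shows whether the hint changed the physical plan.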

Why are the changes needed?

It can be helpful to study the performance difference between running a query with hints applied and without.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New tests added in ResolveHintsSuite.

@dilipbiswal
Contributor Author

cc @gatorsmile @maropu

@dilipbiswal dilipbiswal changed the title [SPARK-31875] Provide a option to disabling user supplied Hints [SPARK-31875][SQL] Provide a option to disabling user supplied Hints May 31, 2020
@SparkQA

SparkQA commented May 31, 2020

Test build #123330 has finished for PR 28683 at commit d2c3eda.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor Author

retest this please

@dilipbiswal dilipbiswal changed the title [SPARK-31875][SQL] Provide a option to disabling user supplied Hints [SPARK-31875][SQL] Provide a option to disable user supplied Hints May 31, 2020
buildConf("spark.sql.optimizer.hints.enabled")
.internal()
.doc("Hints are additional directives that aids optimizer in better planning of a query." +
" This configuration when set to `false`, disables the application of user" +
Member

nit: for consistency, could you move the space in the head into the tail?

def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
case hint @ UnresolvedHint(hintName, _, _) => hintName.toUpperCase(Locale.ROOT) match {
case hint @ UnresolvedHint(hintName, _, _)
if conf.optimizerHintsEnabled => hintName.toUpperCase(Locale.ROOT) match {
Member

nit format;

      case hint @ UnresolvedHint(hintName, _, _) if conf.optimizerHintsEnabled =>
        hintName.toUpperCase(Locale.ROOT) match {
          case "REPARTITION" =>
            createRepartition(shuffle = true, hint)
            ...

case h: UnresolvedHint if STRATEGY_HINT_NAMES.contains(h.name.toUpperCase(Locale.ROOT)) =>
case h: UnresolvedHint
if (conf.optimizerHintsEnabled &&
STRATEGY_HINT_NAMES.contains(h.name.toUpperCase(Locale.ROOT))) =>
Member

nit format;

      case h: UnresolvedHint if conf.optimizerHintsEnabled &&
          STRATEGY_HINT_NAMES.contains(h.name.toUpperCase(Locale.ROOT)) =>
        if (h.parameters.isEmpty) {

UnresolvedHint("COALESCE", Seq(Literal(10)), table("TaBlE")),
testRelation,
caseSensitive = true,
enableHints = false
Member

Could you use withSQLConf instead for the tests? I think enableHints is only used for this test suite, so we don't need to add the option in AnalysisTest.
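A hedged sketch of the test style being suggested, assuming the suite mixes in a helper that provides `withSQLConf` (e.g. SQLTestUtils in an end-to-end suite) and using the config key from this PR's description, which may differ from the merged name; the table name is hypothetical:

```scala
// Sketch only: withSQLConf scopes a SQL conf change to the test body and
// restores the previous value afterwards. Config key and table name are
// assumptions for illustration.
test("optimizer hints can be disabled via conf") {
  withSQLConf("spark.sql.optimizer.ignoreHints" -> "true") {
    val df = sql("SELECT /*+ COALESCE(10) */ * FROM testData")
    // With hints disabled, the COALESCE hint should not introduce a
    // Repartition node into the analyzed plan.
    assert(!df.queryExecution.analyzed.toString.contains("Repartition"))
  }
}
```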

Contributor Author

@maropu Should I move this test to SQLQuerySuite? Actually, I initially tried withSQLConf, but realized that this suite creates an Analyzer by constructing a SQLConf inside AnalysisTest.

Member

Ah, yeah. That sounds better.

@maropu
Member

maropu commented May 31, 2020

Adding this config looks fine to me. Also cc: @maryannxue

@SparkQA

SparkQA commented May 31, 2020

Test build #123336 has finished for PR 28683 at commit d2c3eda.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 31, 2020

Test build #123347 has finished for PR 28683 at commit 4ba042b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.internal()
.doc("Hints are additional directives that aids optimizer in better planning of a query. " +
"This configuration when set to `false`, disables the application of user " +
"specified hints.")
Member

@maropu maropu Jun 1, 2020

nit: How about rephrasing it like this? (It seems the other similar .enabled configs say "When true, ...")

When false, the optimizer will ignore user-specified hints that are additional directives
for better planning of a query.

Contributor Author

@maropu Sounds good to me. Will change. Thank you.

@SparkQA

SparkQA commented Jun 1, 2020

Test build #123364 has finished for PR 28683 at commit 3cc45f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Looks okay to me too but @maryannxue can you take a look for sure?

@dongjoon-hyun
Member

Could you resolve a conflict, @dilipbiswal ?

.createWithDefault(true)

val OPTIMIZER_HINTS_ENABLED =
buildConf("spark.sql.optimizer.hints.enabled")
Member

Can we have more direct names like OPTIMIZER_IGNORE_HINTS? Maybe, spark.sql.optimizer.ignoreHints.enabled like spark.files.ignoreMissingFiles or spark.files.ignoreCorruptFiles?

Contributor Author

@dongjoon-hyun Thank you. I have made the change.

Contributor

A little confused: why is this config named "optimizer" when it is used in the analyzer?

Contributor Author

@ulysses-you To the user, it's an optimization we are disabling; that's why "optimizer" is in the config name.
Implementation-wise, we handle it by keeping the nodes unresolved, but semantically we are disabling an optimization.

@dongjoon-hyun
Member

Thank you, @dilipbiswal . The feature looks useful to me.

@SparkQA

SparkQA commented Jun 20, 2020

Test build #124310 has finished for PR 28683 at commit dd06548.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor Author

retest this please

@SparkQA

SparkQA commented Jun 20, 2020

Test build #124315 has finished for PR 28683 at commit dd06548.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

Hi, @dilipbiswal , @maropu , @HyukjinKwon .

The current approach looks straightforward, but it seems unable to disable user-supplied hints coming from SparkExtension analyzer rules. Although we could document this limitation in the conf description, I believe it would be great if this new option disabled hints for all the other Spark feature combinations, such as SparkExtension. Maybe we need to remove the hints at the beginning.

@dilipbiswal . Could you add a test case for SparkExtension?

Also, cc @gatorsmile and @cloud-fan since there might be more extension points from their side. Or, we may want to proceed in this AS-IS status.

@dilipbiswal
Contributor Author

dilipbiswal commented Jun 20, 2020

@dongjoon-hyun Thank you for the comments. I was thinking: if users add hints through extensions, isn't it reasonable to expect them to honor this new configuration in their code? I have never tried it myself, but I believe one can extend the catalyst parser, implement hints with a different syntax, and represent them with different logical nodes, making them completely opaque to us.

@cloud-fan
Contributor

cloud-fan commented Jul 2, 2020

Shall we add a rule to remove all hint nodes at the beginning if the conf is set? This is future-proof in case we add new hint resolution rules later. I don't think we should deal with hint nodes added by custom rules; hints should be added by end users. Custom rules can add whatever plan nodes directly, and we can't control that.
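A hedged sketch of the kind of rule being proposed here. The rule name, config key, and placement are assumptions drawn from this discussion, not necessarily the merged implementation:

```scala
// Illustrative sketch: an analyzer rule that strips every UnresolvedHint
// node up front, before any hint resolution rule can act on it, when the
// conf is set. Names here are assumptions for illustration.
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnresolvedHint}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.internal.SQLConf

object DisableHints extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = {
    val ignoreHints =
      SQLConf.get.getConfString("spark.sql.optimizer.ignoreHints", "false").toBoolean
    if (!ignoreHints) {
      plan
    } else {
      // Replace each hint node with its child, discarding the hint itself.
      plan.resolveOperatorsUp {
        case h: UnresolvedHint => h.child
      }
    }
  }
}
```

Running such a rule at the start of analysis would also cover hint nodes introduced by any future hint resolution rules, which is the future-proofing argument above.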

@dilipbiswal
Contributor Author

@cloud-fan

Shall we add a rule to remove all hint nodes at the beginning if the conf is set? This is future-proof in case we add new hint resolution rules in the future.

SGTM.

@dilipbiswal
Contributor Author

retest this please

@dongjoon-hyun
Member

Retest this please.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM with only a few minor leftovers.

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125277 has finished for PR 28683 at commit dbdab2d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125262 has finished for PR 28683 at commit f3d030f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Jul 8, 2020

retest this please

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125303 has finished for PR 28683 at commit dbdab2d.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125325 has finished for PR 28683 at commit dbdab2d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Jul 8, 2020

retest this please

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125346 has finished for PR 28683 at commit dbdab2d.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125356 has finished for PR 28683 at commit dbdab2d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125370 has finished for PR 28683 at commit dbdab2d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor Author

retest this please

@SparkQA

SparkQA commented Jul 9, 2020

Test build #125408 has finished for PR 28683 at commit dbdab2d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Jul 9, 2020

Test build #125449 has finished for PR 28683 at commit dbdab2d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 9, 2020

Test build #125488 has started for PR 28683 at commit dbdab2d.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 9, 2020

Test build #125497 has started for PR 28683 at commit dbdab2d.

@dongjoon-hyun
Member

dongjoon-hyun commented Jul 10, 2020

The last run passed all Java/Scala/Python UTs, but it will fail due to a timeout during the R part.
So, I verified the R UTs and the packaging part manually.

══ testthat results  ═══════════════════════════════════════════════════════════
[ OK: 13 | SKIPPED: 0 | WARNINGS: 0 | FAILED: 0 ]
✔ |  OK F W S | Context
✔ |  11       | binary functions [1.3 s]
✔ |   4       | functions on binary files [1.0 s]
✔ |   2       | broadcast variables [0.3 s]
✔ |   5       | functions in client.R
✔ |  46       | test functions in sparkR.R [3.5 s]
✔ |   2       | include R packages [0.6 s]
✔ |   2       | JVM API [0.1 s]
✔ |  75       | MLlib classification algorithms, except for tree-based algorithms [53.2 s]
✔ |  70       | MLlib clustering algorithms [21.0 s]
✔ |   6       | MLlib frequent pattern mining [2.3 s]
✔ |   8       | MLlib recommendation algorithms [4.0 s]
✔ | 136       | MLlib regression algorithms, except for tree-based algorithms [70.0 s]
✔ |   8       | MLlib statistics algorithms [0.8 s]
✔ |  94       | MLlib tree-based algorithms [41.8 s]
✔ |  29       | parallelize() and collect() [0.3 s]
✔ | 428       | basic RDD functions [38.9 s]
✔ |  39       | SerDe functionality [2.2 s]
✔ |  20       | partitionBy, groupByKey, reduceByKey etc. [5.9 s]
✔ |   4       | functions in sparkR.R
✔ |  16       | SparkSQL Arrow optimization [30.1 s]
✔ |   6       | test show SparkDataFrame when eager execution is enabled. [0.8 s]
✔ | 1177       | SparkSQL functions [121.2 s]
✔ |  42       | Structured Streaming [53.6 s]
✔ |  16       | tests RDD function take() [0.6 s]
✔ |  14       | the textFile() function [2.7 s]
✔ |  46       | functions in utils.R [0.5 s]
✔ |   0     1 | Windows-specific tests
────────────────────────────────────────────────────────────────────────────────
test_Windows.R:22: skip: sparkJars tag in SparkContext
Reason: This test is only for Windows, skipped
────────────────────────────────────────────────────────────────────────────────

══ Results ═════════════════════════════════════════════════════════════════════
Duration: 456.7 s

OK:       2306
Failed:   0
Warnings: 0
Skipped:  1

Merged to master. Thank you, @dilipbiswal and all!

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125497/
Test FAILed.

@dilipbiswal
Contributor Author

dilipbiswal commented Jul 10, 2020

@dongjoon-hyun @maropu @cloud-fan @ulysses-you Thanks a lot.
