Conversation

@dilipbiswal
Contributor

@dilipbiswal dilipbiswal commented May 31, 2020

What changes were proposed in this pull request?

Introduce a new SQL config, spark.sql.optimizer.ignoreHints. When it is set to true,
the application of hints is disabled. This is similar to Oracle's OPTIMIZER_IGNORE_HINTS.
It can be helpful for studying the performance difference between running a query with hints applied and without.
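A minimal usage sketch of the A/B comparison this description has in mind. The config key is the one named in this description; the merged version may use a different name, and the session setup here is illustrative:

```scala
// Sketch: compare plans for the same hinted query with hints honored vs ignored.
// Assumes a local SparkSession; the config key follows this PR description.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.range(100).createOrReplaceTempView("t")

// Hints applied (default behavior).
spark.conf.set("spark.sql.optimizer.ignoreHints", "false")
spark.sql("SELECT /*+ REPARTITION(10) */ * FROM t").explain()

// Hints ignored: the REPARTITION hint above should be dropped during analysis.
spark.conf.set("spark.sql.optimizer.ignoreHints", "true")
spark.sql("SELECT /*+ REPARTITION(10) */ * FROM t").explain()
```

Comparing the two `explain()` outputs shows whether the hint changed the physical plan.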

Why are the changes needed?

It can be helpful to study the performance difference between running a query with hints applied and without.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New tests added in ResolveHintsSuite.

@dilipbiswal
Contributor Author

cc @gatorsmile @maropu

@dilipbiswal dilipbiswal changed the title [SPARK-31875] Provide a option to disabling user supplied Hints [SPARK-31875][SQL] Provide a option to disabling user supplied Hints May 31, 2020
@SparkQA

SparkQA commented May 31, 2020

Test build #123330 has finished for PR 28683 at commit d2c3eda.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor Author

retest this please

@dilipbiswal dilipbiswal changed the title [SPARK-31875][SQL] Provide a option to disabling user supplied Hints [SPARK-31875][SQL] Provide a option to disable user supplied Hints May 31, 2020
buildConf("spark.sql.optimizer.hints.enabled")
.internal()
.doc("Hints are additional directives that aids optimizer in better planning of a query." +
" This configuration when set to `false`, disables the application of user" +
Member

nit: for consistency, could you move the space in the head into the tail?

def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
case hint @ UnresolvedHint(hintName, _, _) => hintName.toUpperCase(Locale.ROOT) match {
case hint @ UnresolvedHint(hintName, _, _)
if conf.optimizerHintsEnabled => hintName.toUpperCase(Locale.ROOT) match {
Member

nit format;

      case hint @ UnresolvedHint(hintName, _, _) if conf.optimizerHintsEnabled =>
        hintName.toUpperCase(Locale.ROOT) match {
          case "REPARTITION" =>
            createRepartition(shuffle = true, hint)
            ...

case h: UnresolvedHint if STRATEGY_HINT_NAMES.contains(h.name.toUpperCase(Locale.ROOT)) =>
case h: UnresolvedHint
if (conf.optimizerHintsEnabled &&
STRATEGY_HINT_NAMES.contains(h.name.toUpperCase(Locale.ROOT))) =>
Member

nit format;

      case h: UnresolvedHint if conf.optimizerHintsEnabled &&
          STRATEGY_HINT_NAMES.contains(h.name.toUpperCase(Locale.ROOT)) =>
        if (h.parameters.isEmpty) {

UnresolvedHint("COALESCE", Seq(Literal(10)), table("TaBlE")),
testRelation,
caseSensitive = true,
enableHints = false
Member

Could you use withSQLConf instead for the tests? I think enableHints is only used for this test suite, so we don't need to add the option in AnalysisTest.
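A hedged sketch of the test style being suggested, assuming the suite mixes in a helper that provides `withSQLConf` (e.g. SQLTestUtils in an end-to-end suite) and using the config key from this PR's description, which may differ from the merged name; the table name is hypothetical:

```scala
// Sketch only: withSQLConf scopes a SQL conf change to the test body and
// restores the previous value afterwards. Config key and table name are
// assumptions for illustration.
test("optimizer hints can be disabled via conf") {
  withSQLConf("spark.sql.optimizer.ignoreHints" -> "true") {
    val df = sql("SELECT /*+ COALESCE(10) */ * FROM testData")
    // With hints disabled, the COALESCE hint should not introduce a
    // Repartition node into the analyzed plan.
    assert(!df.queryExecution.analyzed.toString.contains("Repartition"))
  }
}
```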

Contributor Author

@maropu Should I move this test to SQLQuerySuite? Actually, I initially tried withSQLConf, but realized that this suite creates an Analyzer by constructing a SQLConf inside AnalysisTest.

Member

Ah, yeah. That sounds better.

@maropu
Member

maropu commented May 31, 2020

Adding this config looks fine to me. Also cc: @maryannxue

@SparkQA

SparkQA commented May 31, 2020

Test build #123336 has finished for PR 28683 at commit d2c3eda.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 31, 2020

Test build #123347 has finished for PR 28683 at commit 4ba042b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.internal()
.doc("Hints are additional directives that aids optimizer in better planning of a query. " +
"This configuration when set to `false`, disables the application of user " +
"specified hints.")
Member

@maropu maropu Jun 1, 2020

nit: How about rephrasing it like this? (It seems the other similar .enabled configs say "When true, ...")

When false, the optimizer will ignore user-specified hints that are additional directives
for better planning of a query.

Contributor Author

@maropu Sounds good to me. Will change. Thank you.

@SparkQA

SparkQA commented Jun 1, 2020

Test build #123364 has finished for PR 28683 at commit 3cc45f1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Member

Looks okay to me too but @maryannxue can you take a look for sure?

@dongjoon-hyun
Member

Could you resolve a conflict, @dilipbiswal ?

.createWithDefault(true)

val OPTIMIZER_HINTS_ENABLED =
buildConf("spark.sql.optimizer.hints.enabled")
Member

Can we have more direct names like OPTIMIZER_IGNORE_HINTS? Maybe, spark.sql.optimizer.ignoreHints.enabled like spark.files.ignoreMissingFiles or spark.files.ignoreCorruptFiles?

Contributor Author

@dongjoon-hyun Thank you. I have made the change.

Contributor

A little confused: why is this config named "optimizer" when it is used in the analyzer?

Contributor Author

@ulysses-you To the user, it's an optimization we are disabling; that's why "optimizer" is in the config name.
Implementation-wise, we handle it by keeping the nodes unresolved, but semantically we are disabling an optimization.

@dongjoon-hyun
Member

Thank you, @dilipbiswal . The feature looks useful to me.

@SparkQA

SparkQA commented Jun 20, 2020

Test build #124310 has finished for PR 28683 at commit dd06548.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor Author

retest this please

@SparkQA

SparkQA commented Jun 20, 2020

Test build #124315 has finished for PR 28683 at commit dd06548.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@dongjoon-hyun dongjoon-hyun left a comment

Hi, @dilipbiswal , @maropu , @HyukjinKwon .

The current approach looks straightforward, but it seems unable to disable user-supplied hints coming from SparkExtension analyzer rules. Although we could document this limitation in the conf description, I believe it would be great if this new option disabled hints for all the other Spark feature combinations, such as SparkExtension. Maybe we need to remove the hints at the beginning.

@dilipbiswal . Could you add a test case for SparkExtension?

Also, cc @gatorsmile and @cloud-fan since there might be more extension points from their side. Or, we may want to proceed in this AS-IS status.

@dilipbiswal
Contributor Author

dilipbiswal commented Jun 20, 2020

@dongjoon-hyun Thank you for the comments. I was thinking: if users add hints through extensions, isn't it reasonable to expect them to honor this new configuration in their code? I have never tried it myself, but I believe one can extend the catalyst parser, implement hints with a different syntax, and represent them with different logical nodes, making them completely opaque to us.

@cloud-fan
Contributor

cloud-fan commented Jul 2, 2020

Shall we add a rule to remove all hint nodes at the beginning if the conf is set? This is future-proof in case we add new hint resolution rules later. I don't think we should deal with hint nodes added by custom rules; hints should be added by end users. Custom rules can add whatever plan nodes directly, and we can't control that.
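A hedged sketch of the kind of rule being proposed here. The rule name, config key, and placement are assumptions drawn from this discussion, not necessarily the merged implementation:

```scala
// Illustrative sketch: an analyzer rule that strips every UnresolvedHint
// node up front, before any hint resolution rule can act on it, when the
// conf is set. Names here are assumptions for illustration.
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnresolvedHint}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.internal.SQLConf

object DisableHints extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = {
    val ignoreHints =
      SQLConf.get.getConfString("spark.sql.optimizer.ignoreHints", "false").toBoolean
    if (!ignoreHints) {
      plan
    } else {
      // Replace each hint node with its child, discarding the hint itself.
      plan.resolveOperatorsUp {
        case h: UnresolvedHint => h.child
      }
    }
  }
}
```

Running such a rule at the start of analysis would also cover hint nodes introduced by any future hint resolution rules, which is the future-proofing argument above.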

@dilipbiswal
Contributor Author

@cloud-fan

Shall we add a rule to remove all hint nodes at the beginning if the conf is set? This is future-proof in case we add new hint resolution rules in the future.

SGTM.

@dilipbiswal
Contributor Author

retest this please

@dongjoon-hyun
Member

Retest this please.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM with only a few minor leftovers.

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125277 has finished for PR 28683 at commit dbdab2d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125262 has finished for PR 28683 at commit f3d030f.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Jul 8, 2020

retest this please

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125303 has finished for PR 28683 at commit dbdab2d.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125325 has finished for PR 28683 at commit dbdab2d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Jul 8, 2020

retest this please

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125346 has finished for PR 28683 at commit dbdab2d.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125356 has finished for PR 28683 at commit dbdab2d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 8, 2020

Test build #125370 has finished for PR 28683 at commit dbdab2d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Contributor Author

retest this please

@SparkQA

SparkQA commented Jul 9, 2020

Test build #125408 has finished for PR 28683 at commit dbdab2d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Member

Retest this please.

@SparkQA

SparkQA commented Jul 9, 2020

Test build #125449 has finished for PR 28683 at commit dbdab2d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 9, 2020

Test build #125488 has started for PR 28683 at commit dbdab2d.

@cloud-fan
Contributor

retest this please

@SparkQA

SparkQA commented Jul 9, 2020

Test build #125497 has started for PR 28683 at commit dbdab2d.

@dongjoon-hyun
Member

dongjoon-hyun commented Jul 10, 2020

The last run passed all Java/Scala/Python UTs, but it will fail due to a timeout during the R part.
So, I verified the R UTs and the packaging part manually.

══ testthat results  ═══════════════════════════════════════════════════════════
[ OK: 13 | SKIPPED: 0 | WARNINGS: 0 | FAILED: 0 ]
✔ |  OK F W S | Context
✔ |  11       | binary functions [1.3 s]
✔ |   4       | functions on binary files [1.0 s]
✔ |   2       | broadcast variables [0.3 s]
✔ |   5       | functions in client.R
✔ |  46       | test functions in sparkR.R [3.5 s]
✔ |   2       | include R packages [0.6 s]
✔ |   2       | JVM API [0.1 s]
✔ |  75       | MLlib classification algorithms, except for tree-based algorithms [53.2 s]
✔ |  70       | MLlib clustering algorithms [21.0 s]
✔ |   6       | MLlib frequent pattern mining [2.3 s]
✔ |   8       | MLlib recommendation algorithms [4.0 s]
✔ | 136       | MLlib regression algorithms, except for tree-based algorithms [70.0 s]
✔ |   8       | MLlib statistics algorithms [0.8 s]
✔ |  94       | MLlib tree-based algorithms [41.8 s]
✔ |  29       | parallelize() and collect() [0.3 s]
✔ | 428       | basic RDD functions [38.9 s]
✔ |  39       | SerDe functionality [2.2 s]
✔ |  20       | partitionBy, groupByKey, reduceByKey etc. [5.9 s]
✔ |   4       | functions in sparkR.R
✔ |  16       | SparkSQL Arrow optimization [30.1 s]
✔ |   6       | test show SparkDataFrame when eager execution is enabled. [0.8 s]
✔ | 1177       | SparkSQL functions [121.2 s]
✔ |  42       | Structured Streaming [53.6 s]
✔ |  16       | tests RDD function take() [0.6 s]
✔ |  14       | the textFile() function [2.7 s]
✔ |  46       | functions in utils.R [0.5 s]
✔ |   0     1 | Windows-specific tests
────────────────────────────────────────────────────────────────────────────────
test_Windows.R:22: skip: sparkJars tag in SparkContext
Reason: This test is only for Windows, skipped
────────────────────────────────────────────────────────────────────────────────

══ Results ═════════════════════════════════════════════════════════════════════
Duration: 456.7 s

OK:       2306
Failed:   0
Warnings: 0
Skipped:  1

Merged to master. Thank you, @dilipbiswal and all!

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125497/
Test FAILed.

@dilipbiswal
Contributor Author

dilipbiswal commented Jul 10, 2020

@dongjoon-hyun @maropu @cloud-fan @ulysses-you Thanks a lot.
