-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-31875][SQL] Provide a option to disable user supplied Hints #28683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #123330 has finished for PR 28683 at commit
|
|
retest this please |
| buildConf("spark.sql.optimizer.hints.enabled") | ||
| .internal() | ||
| .doc("Hints are additional directives that aids optimizer in better planning of a query." + | ||
| " This configuration when set to `false`, disables the application of user" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for consistency, could you move the space in the head into the tail?
| def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators { | ||
| case hint @ UnresolvedHint(hintName, _, _) => hintName.toUpperCase(Locale.ROOT) match { | ||
| case hint @ UnresolvedHint(hintName, _, _) | ||
| if conf.optimizerHintsEnabled => hintName.toUpperCase(Locale.ROOT) match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit format;
case hint @ UnresolvedHint(hintName, _, _) if conf.optimizerHintsEnabled =>
hintName.toUpperCase(Locale.ROOT) match {
case "REPARTITION" =>
createRepartition(shuffle = true, hint)
...
| case h: UnresolvedHint if STRATEGY_HINT_NAMES.contains(h.name.toUpperCase(Locale.ROOT)) => | ||
| case h: UnresolvedHint | ||
| if (conf.optimizerHintsEnabled && | ||
| STRATEGY_HINT_NAMES.contains(h.name.toUpperCase(Locale.ROOT))) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit format;
case h: UnresolvedHint if conf.optimizerHintsEnabled &&
STRATEGY_HINT_NAMES.contains(h.name.toUpperCase(Locale.ROOT)) =>
if (h.parameters.isEmpty) {
| UnresolvedHint("COALESCE", Seq(Literal(10)), table("TaBlE")), | ||
| testRelation, | ||
| caseSensitive = true, | ||
| enableHints = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use withSQLConf instead for the tests? I think enableHints is only used for this test suite, so we don't need to add the option in AnalysisTest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu Should i move this test to SQLQuerySuite ? Actually i tried initially to do withSQLConf, but realized that this suite creates an Analyzer by constructing a SQLConf inside AnalysisTest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yea. that sounds better.
|
Adding this config looks fine to me. also cc: @maryannxue |
|
Test build #123336 has finished for PR 28683 at commit
|
|
Test build #123347 has finished for PR 28683 at commit
|
| .internal() | ||
| .doc("Hints are additional directives that aids optimizer in better planning of a query. " + | ||
| "This configuration when set to `false`, disables the application of user " + | ||
| "specified hints.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: How about rephrasing it like this? (It seems the ohter similar .enabled configs say When true, brabrabra)
When false, the optimizer will ignore user-specified hints that are additional directives
for better planning of a query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@maropu Sounds good to me. Will change. Thank you.
|
Test build #123364 has finished for PR 28683 at commit
|
|
Looks okay to me too but @maryannxue can you take a look for sure? |
|
Could you resolve a conflict, @dilipbiswal ? |
| .createWithDefault(true) | ||
|
|
||
| val OPTIMIZER_HINTS_ENABLED = | ||
| buildConf("spark.sql.optimizer.hints.enabled") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have more direct names like OPTIMIZER_IGNORE_HINTS? Maybe, spark.sql.optimizer.ignoreHints.enabled like spark.files.ignoreMissingFiles or spark.files.ignoreCorruptFiles?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongjoon-hyun Thank you. I have made the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a little confused, why this config named optimizer, does it used in analyze ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ulysses-you To the user, its an optimization we are disabling. Thats why the "optimizer" name in the config.
Implementation wise, we handle it by keeping the nodes unresolved. But semantic wise we are disabling an optimization.
|
Thank you, @dilipbiswal . The feature looks useful to me. |
3cc45f1 to
dd06548
Compare
|
Test build #124310 has finished for PR 28683 at commit
|
|
retest this please |
|
Test build #124315 has finished for PR 28683 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @dilipbiswal , @maropu , @HyukjinKwon .
The current approach looks straightforward, but seems to unable to disable user supplied hints from SparkExtension analyzer rules. Although we can document this limitation as a conf description, I believe it would be great if this new option disable hints for all the other spark feature combinations like SparkExtension. Maybe, we need to remove the hint at the beginning.
@dilipbiswal . Could you add a test case for SparkExtension?
Also, cc @gatorsmile and @cloud-fan since there might be more extension points from their side. Or, we may want to proceed in this AS-IS status.
|
@dongjoon-hyun Thank you for the comments. I was thinking, if users add hints through extensions, isn't it reasonable to expect them to code to this new configuration ? I have never tried it myself, but i believe one can extend the catalyst parser and have hints implemented by a different syntax and have different logical nodes to represent it making it completely opaque to us ? |
|
Shall we add a rule to remove all hint nodes at the beginning if the conf is set? This is future-proof in case we add new hint resolution rules in the future. I don't think we should deal with hint nodes added by custom rules. Hints should be added by end-users. Customer rules can add whatever plan nodes directly and we can't control it. |
SGTM. |
|
retest this please |
|
Retest this please. |
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
Outdated
Show resolved
Hide resolved
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM with only a few minor left-overs.
|
Test build #125277 has finished for PR 28683 at commit
|
|
Test build #125262 has finished for PR 28683 at commit
|
|
retest this please |
|
Test build #125303 has finished for PR 28683 at commit
|
|
retest this please |
|
Test build #125325 has finished for PR 28683 at commit
|
|
retest this please |
|
Test build #125346 has finished for PR 28683 at commit
|
|
retest this please |
|
Test build #125356 has finished for PR 28683 at commit
|
|
retest this please |
|
Test build #125370 has finished for PR 28683 at commit
|
|
retest this please |
|
Test build #125408 has finished for PR 28683 at commit
|
|
Retest this please. |
|
Test build #125449 has finished for PR 28683 at commit
|
|
retest this please |
|
Test build #125488 has started for PR 28683 at commit |
|
retest this please |
|
Test build #125497 has started for PR 28683 at commit |
|
The last run passed all Java/Scala/Python UTs, but it will fail due to timeout during R part. Merged to master. Thank you, @dilipbiswal and all! |
|
Test FAILed. |
|
@dongjoon-hyun @maropu @cloud-fan @ulysses-you Thanks a lot. |
What changes were proposed in this pull request?
Introduce a new SQL config
spark.sql.optimizer.ignoreHints. When this is set to trueapplication of hints are disabled. This is similar to Oracle's OPTIMIZER_IGNORE_HINTS.
This can be helpful to study the impact of performance difference when hints are applied vs when they are not.
Why are the changes needed?
Can be helpful to study the impact of performance difference when hints are applied vs when they are not.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
New tests added in ResolveHintsSuite.