-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-27674][SQL] the hint should not be dropped after cache lookup #24580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -23,7 +23,8 @@ import org.apache.hadoop.fs.{FileSystem, Path} | |
|
|
||
| import org.apache.spark.internal.Logging | ||
| import org.apache.spark.sql.{Dataset, SparkSession} | ||
| import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, SubqueryExpression} | ||
| import org.apache.spark.sql.catalyst.expressions.{Attribute, SubqueryExpression} | ||
| import org.apache.spark.sql.catalyst.optimizer.EliminateResolvedHint | ||
| import org.apache.spark.sql.catalyst.plans.logical.{IgnoreCachedData, LogicalPlan, ResolvedHint} | ||
| import org.apache.spark.sql.execution.columnar.InMemoryRelation | ||
| import org.apache.spark.sql.execution.command.CommandUtils | ||
|
|
@@ -212,17 +213,18 @@ class CacheManager extends Logging { | |
| def useCachedData(plan: LogicalPlan): LogicalPlan = { | ||
| val newPlan = plan transformDown { | ||
| case command: IgnoreCachedData => command | ||
| // Do not lookup the cache by hint node. Hint node is special, we should ignore it when | ||
| // canonicalizing plans, so that plans which are same except hint can hit the same cache. | ||
| // However, we also want to keep the hint info after cache lookup. Here we skip the hint | ||
| // node, so that the returned caching plan won't replace the hint node and drop the hint info | ||
| // from the original plan. | ||
| case hint: ResolvedHint => hint | ||
|
|
||
| case currentFragment => | ||
| lookupCachedData(currentFragment) | ||
| .map(_.cachedRepresentation.withOutput(currentFragment.output)) | ||
| .getOrElse(currentFragment) | ||
| lookupCachedData(currentFragment).map { cached => | ||
| // After cache lookup, we should still keep the hints from the input plan. | ||
| val hints = EliminateResolvedHint.extractHintsFromPlan(currentFragment)._2 | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's natural to return the hints in a top-down fashion. And the caller side is free to process the returned hints, including reverse it. |
||
| val cachedPlan = cached.cachedRepresentation.withOutput(currentFragment.output) | ||
| // The returned hint list is in top-down order, we should create the hint nodes from | ||
| // right to left. | ||
| hints.foldRight[LogicalPlan](cachedPlan) { case (hint, p) => | ||
| ResolvedHint(p, hint) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this the same (semantically) as original cached plan? We can take one example in added test:
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The semantic of a hint node is special. By design only join node has hints, so
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, I see. Makes sense and it's fine. |
||
| } | ||
| }.getOrElse(currentFragment) | ||
| } | ||
|
|
||
| newPlan transformAllExpressions { | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the original cached plan has a hint, should we keep/respect them? We need to define a clear behavior in our cache manager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't matter, because
semanticEquals, so having the hint node in the plan has no effect.InMemoryRelation, which has no hint.I think the behavior is pretty clear: for any query, the hint behavior should be the same no matter some sub-plans are cached or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, we ignore the hints that are specified in the original cached plans. If users want to use hints, they should specify them in the queries.