-
Notifications
You must be signed in to change notification settings - Fork 29.2k
[SPARK-19846][SQL] Add a flag to disable constraint propagation #17186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 8 commits
44e494b
ae9f037
8318152
3eda726
eb200d6
0e204bc
d3b0a72
d4c9a5e
da09d9f
92f368e
a02c8cb
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -186,6 +186,15 @@ object SQLConf { | |
| .booleanConf | ||
| .createWithDefault(false) | ||
|
|
||
| val CONSTRAINT_PROPAGATION_ENABLED = buildConf("spark.sql.constraintPropagation.enabled") | ||
| .internal() | ||
| .doc("When true, the query optimizer will use constraint propagation in query plans to " + | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: 'get around the issue' might sound pretty vague to a non-expert user. How about something along these lines? .doc("When true, the query optimizer will infer and propagate data constraints in the query " +
"plan to optimize them. Constraint propagation can sometimes be computationally expensive" +
"for certain kinds of query plans (such as those with a large number of predicates and " +
"aliases) which might negatively impact overall runtime.")
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks good. |
||
| "perform optimization. Constraint propagation can be computation expensive for large " + | ||
| "query plans. For such queries, disable this flag to get around this issue. Default " + | ||
| "is enabled") | ||
| .booleanConf | ||
| .createWithDefault(true) | ||
|
|
||
| val PARQUET_SCHEMA_MERGING_ENABLED = buildConf("spark.sql.parquet.mergeSchema") | ||
| .doc("When true, the Parquet data source merges schemas collected from all data files, " + | ||
| "otherwise the schema is picked from the summary file or a random data file " + | ||
|
|
@@ -843,6 +852,8 @@ class SQLConf extends Serializable with Logging { | |
|
|
||
| def caseSensitiveAnalysis: Boolean = getConf(SQLConf.CASE_SENSITIVE) | ||
|
|
||
| def constraintPropagationEnabled: Boolean = getConf(CONSTRAINT_PROPAGATION_ENABLED) | ||
|
|
||
| /** | ||
| * Returns the [[Resolver]] for the current configuration, which can be used to determine if two | ||
| * identifiers are equal. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,6 +17,7 @@ | |
|
|
||
| package org.apache.spark.sql.catalyst.optimizer | ||
|
|
||
| import org.apache.spark.sql.catalyst.SimpleCatalystConf | ||
| import org.apache.spark.sql.catalyst.dsl.expressions._ | ||
| import org.apache.spark.sql.catalyst.dsl.plans._ | ||
| import org.apache.spark.sql.catalyst.expressions._ | ||
|
|
@@ -31,7 +32,17 @@ class InferFiltersFromConstraintsSuite extends PlanTest { | |
| Batch("InferAndPushDownFilters", FixedPoint(100), | ||
| PushPredicateThroughJoin, | ||
| PushDownPredicate, | ||
| InferFiltersFromConstraints, | ||
| InferFiltersFromConstraints(SimpleCatalystConf(caseSensitiveAnalysis = true)), | ||
| CombineFilters) :: Nil | ||
| } | ||
|
|
||
| object OptimizeDisableConstraintPropagation extends RuleExecutor[LogicalPlan] { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: perhaps |
||
| val batches = | ||
| Batch("InferAndPushDownFilters", FixedPoint(100), | ||
| PushPredicateThroughJoin, | ||
| PushDownPredicate, | ||
| InferFiltersFromConstraints(SimpleCatalystConf(caseSensitiveAnalysis = true, | ||
| constraintPropagationEnabled = false)), | ||
| CombineFilters) :: Nil | ||
| } | ||
|
|
||
|
|
@@ -201,4 +212,10 @@ class InferFiltersFromConstraintsSuite extends PlanTest { | |
| val optimized = Optimize.execute(originalQuery) | ||
| comparePlans(optimized, correctAnswer) | ||
| } | ||
|
|
||
| test("No inferred filter when constraint propagation is disabled") { | ||
| val originalQuery = testRelation.where('a === 1 && 'a === 'b).analyze | ||
| val optimized = OptimizeDisableConstraintPropagation.execute(originalQuery) | ||
| comparePlans(optimized, originalQuery) | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,6 +17,7 @@ | |
|
|
||
| package org.apache.spark.sql.catalyst.optimizer | ||
|
|
||
| import org.apache.spark.sql.catalyst.SimpleCatalystConf | ||
| import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases | ||
| import org.apache.spark.sql.catalyst.dsl.expressions._ | ||
| import org.apache.spark.sql.catalyst.dsl.plans._ | ||
|
|
@@ -31,7 +32,7 @@ class OuterJoinEliminationSuite extends PlanTest { | |
| Batch("Subqueries", Once, | ||
| EliminateSubqueryAliases) :: | ||
| Batch("Outer Join Elimination", Once, | ||
| EliminateOuterJoin, | ||
| EliminateOuterJoin(SimpleCatalystConf(caseSensitiveAnalysis = true)), | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we add a test for outer join elimination as well?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Added a test. |
||
| PushPredicateThroughJoin) :: Nil | ||
| } | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -17,6 +17,7 @@ | |
|
|
||
| package org.apache.spark.sql.catalyst.optimizer | ||
|
|
||
| import org.apache.spark.sql.catalyst.SimpleCatalystConf | ||
| import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases | ||
| import org.apache.spark.sql.catalyst.dsl.expressions._ | ||
| import org.apache.spark.sql.catalyst.dsl.plans._ | ||
|
|
@@ -33,7 +34,19 @@ class PruneFiltersSuite extends PlanTest { | |
| EliminateSubqueryAliases) :: | ||
| Batch("Filter Pushdown and Pruning", Once, | ||
| CombineFilters, | ||
| PruneFilters, | ||
| PruneFilters(SimpleCatalystConf(caseSensitiveAnalysis = true)), | ||
| PushDownPredicate, | ||
| PushPredicateThroughJoin) :: Nil | ||
| } | ||
|
|
||
| object OptimizeDisableConstraintPropagation extends RuleExecutor[LogicalPlan] { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: same as above |
||
| val batches = | ||
| Batch("Subqueries", Once, | ||
| EliminateSubqueryAliases) :: | ||
| Batch("Filter Pushdown and Pruning", Once, | ||
| CombineFilters, | ||
| PruneFilters(SimpleCatalystConf(caseSensitiveAnalysis = true, | ||
| constraintPropagationEnabled = false)), | ||
| PushDownPredicate, | ||
| PushPredicateThroughJoin) :: Nil | ||
| } | ||
|
|
@@ -133,4 +146,28 @@ class PruneFiltersSuite extends PlanTest { | |
| val correctAnswer = testRelation.where(Rand(10) > 5).where(Rand(10) > 5).select('a).analyze | ||
| comparePlans(optimized, correctAnswer) | ||
| } | ||
|
|
||
| test("No pruning when constraint propagation is disabled") { | ||
| val tr1 = LocalRelation('a.int, 'b.int, 'c.int).subquery('tr1) | ||
| val tr2 = LocalRelation('a.int, 'd.int, 'e.int).subquery('tr2) | ||
|
|
||
| val query = tr1 | ||
| .where("tr1.a".attr > 10 || "tr1.c".attr < 10) | ||
| .join(tr2.where('d.attr < 100), Inner, Some("tr1.a".attr === "tr2.a".attr)) | ||
|
|
||
| val queryWithUselessFilter = | ||
| query.where( | ||
| ("tr1.a".attr > 10 || "tr1.c".attr < 10) && | ||
| 'd.attr < 100) | ||
|
|
||
| val optimized = OptimizeDisableConstraintPropagation.execute(queryWithUselessFilter.analyze) | ||
| // When constraint propagation is disabled, the useless filter won't be pruned. | ||
| // It gets pushed down. Because the rule `CombineFilters` runs only once, there are redundant | ||
| // and duplicate filters. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This behaviour does not make sense to me. If I write a query like
I expect that Spark evaluates the predicate only once. The wording of "constraint propagation" is misleading. In this example, there is no activity of propagation at all. Perhaps we want to distinguish the "constraints" between the ones written originally and the ones that are inferred from relationships with other predicates. When the "propagation" (or perhaps a more meaningful term "predicate inference") is set to OFF, we want to exclude those inferred predicates in the
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What needs to clarify is, this behaviour is just limited to this test case. That is why I added the comment. In normal optimization,
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am aware of it. My point is when users turn on this setting in hope of alleviating the long compilation time, they will get this "unintentional" side effect that could lengthen the execution time of evaluating the same predicate twice. Overall, I agree with your approach but the point I raised could be a followup work.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a workaround in short term. Actually I have proposed another approach to bring new data structure for constraint propagation in #16998. But it is more complex and may need more time to consider and review. |
||
| val correctAnswer = tr1 | ||
| .where("tr1.a".attr > 10 || "tr1.c".attr < 10).where("tr1.a".attr > 10 || "tr1.c".attr < 10) | ||
| .join(tr2.where('d.attr < 100).where('d.attr < 100), | ||
| Inner, Some("tr1.a".attr === "tr2.a".attr)).analyze | ||
| comparePlans(optimized, correctAnswer) | ||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is unused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh. missing it.