[SPARK-32859][SQL] Introduce physical rule to decide bucketing dynamically #29804
```diff
@@ -951,6 +951,17 @@ object SQLConf {
       .checkValue(_ > 0, "the value of spark.sql.sources.bucketing.maxBuckets must be greater than 0")
       .createWithDefault(100000)
 
+  val AUTO_BUCKETED_SCAN_ENABLED =
+    buildConf("spark.sql.sources.bucketing.autoBucketedScan.enabled")
+      .doc("When true, decide whether to do bucketed scan on input tables based on query plan " +
+        "automatically. Do not use bucketed scan if 1. query does not have operators to utilize " +
+        "bucketing (e.g. join, group-by, etc), or 2. there's an exchange operator between these " +
+        s"operators and table scan. Note when '${BUCKETING_ENABLED.key}' is set to " +
+        "false, this configuration does not take any effect.")
+      .version("3.1.0")
```
Member: btw, do we need to make this config external? If we just add this config to keep the current behaviour, is it okay to add it as an internal one?

Contributor (Author): @maropu - sure, just for my own education, what does it indicate to make a config internal/external?

Member: IIUC we don't have any strict rule for that. But I think this new rule works well for most queries, so adding it as external seems less meaningful, because most users won't turn this feature off.

Contributor (Author): @maropu - sure, updated.
```diff
+      .booleanConf
+      .createWithDefault(false)
```
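The note in the `.doc` string above (the new flag is a no-op while bucketing itself is off) reduces to a one-line predicate. The following is an illustrative Python sketch, not Spark code; the function name is made up:

```python
def auto_bucketed_scan_effective(bucketing_enabled: bool,
                                 auto_bucketed_scan_enabled: bool) -> bool:
    # Per the conf doc string: when spark.sql.sources.bucketing.enabled is
    # false, spark.sql.sources.bucketing.autoBucketedScan.enabled has no effect.
    return bucketing_enabled and auto_bucketed_scan_enabled
```

This mirrors the guard in the rule's `apply` method, which bails out unless both configs are on.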
|
Contributor: we can follow AQE and only disable it for the table cache. See https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala#L82

Contributor (Author): @cloud-fan - thanks for pointing it out. Created https://issues.apache.org/jira/browse/SPARK-33075 as a followup. cc @viirya in case there's any other regression from enabling auto bucketed scan, besides cached queries.
```diff
   val CROSS_JOINS_ENABLED = buildConf("spark.sql.crossJoin.enabled")
     .internal()
     .doc("When false, we will throw an error if a query contains a cartesian product without " +
```
```diff
@@ -3164,6 +3175,8 @@ class SQLConf extends Serializable with Logging {
 
   def bucketingMaxBuckets: Int = getConf(SQLConf.BUCKETING_MAX_BUCKETS)
 
+  def autoBucketedScanEnabled: Boolean = getConf(SQLConf.AUTO_BUCKETED_SCAN_ENABLED)
+
   def dataFrameSelfJoinAutoResolveAmbiguity: Boolean =
     getConf(DATAFRAME_SELF_JOIN_AUTO_RESOLVE_AMBIGUITY)
```
```diff
@@ -156,7 +156,9 @@ case class RowDataSourceScanExec(
  * @param optionalBucketSet Bucket ids for bucket pruning.
  * @param optionalNumCoalescedBuckets Number of coalesced buckets.
  * @param dataFilters Filters on non-partition columns.
- * @param tableIdentifier identifier for the table in the metastore.
+ * @param tableIdentifier Identifier for the table in the metastore.
+ * @param disableBucketedScan Disable bucketed scan based on physical query plan, see rule
+ *                            [[DisableUnnecessaryBucketedScan]] for details.
  */
 case class FileSourceScanExec(
     @transient relation: HadoopFsRelation,
```
|
```diff
@@ -166,7 +168,8 @@ case class FileSourceScanExec(
     optionalBucketSet: Option[BitSet],
     optionalNumCoalescedBuckets: Option[Int],
     dataFilters: Seq[Expression],
-    tableIdentifier: Option[TableIdentifier])
+    tableIdentifier: Option[TableIdentifier],
+    disableBucketedScan: Boolean = false)
   extends DataSourceScanExec {
 
   // Note that some vals referring the file-based relation are lazy intentionally
```
|
```diff
@@ -257,7 +260,8 @@ case class FileSourceScanExec(
 
   // exposed for testing
   lazy val bucketedScan: Boolean = {
-    if (relation.sparkSession.sessionState.conf.bucketingEnabled && relation.bucketSpec.isDefined) {
+    if (relation.sparkSession.sessionState.conf.bucketingEnabled && relation.bucketSpec.isDefined
+      && !disableBucketedScan) {
       val spec = relation.bucketSpec.get
       val bucketColumns = spec.bucketColumnNames.flatMap(n => toAttribute(n))
       bucketColumns.size == spec.bucketColumnNames.size
```
|
@@ -348,20 +352,23 @@ case class FileSourceScanExec( | |
| "DataFilters" -> seqToString(dataFilters), | ||
| "Location" -> locationDesc) | ||
|
|
||
| val withSelectedBucketsCount = relation.bucketSpec.map { spec => | ||
| val numSelectedBuckets = optionalBucketSet.map { b => | ||
| b.cardinality() | ||
| // TODO(SPARK-32986): Add bucketed scan info in explain output of FileSourceScanExec | ||
| if (bucketedScan) { | ||
| relation.bucketSpec.map { spec => | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @maropu - just for my own education, why does it matter? Updated anyway.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yea, I remember the previous discussion: https://issues.apache.org/jira/browse/SPARK-16694
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yea, please only use
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @cloud-fan , @maropu - I changed the code during iterations. The current change is just adding a |
||
| val numSelectedBuckets = optionalBucketSet.map { b => | ||
| b.cardinality() | ||
| } getOrElse { | ||
| spec.numBuckets | ||
| } | ||
| metadata + ("SelectedBucketsCount" -> | ||
| (s"$numSelectedBuckets out of ${spec.numBuckets}" + | ||
| optionalNumCoalescedBuckets.map { b => s" (Coalesced to $b)"}.getOrElse(""))) | ||
| } getOrElse { | ||
| spec.numBuckets | ||
| metadata | ||
| } | ||
| metadata + ("SelectedBucketsCount" -> | ||
| (s"$numSelectedBuckets out of ${spec.numBuckets}" + | ||
| optionalNumCoalescedBuckets.map { b => s" (Coalesced to $b)"}.getOrElse(""))) | ||
| } getOrElse { | ||
| } else { | ||
| metadata | ||
| } | ||
|
|
||
| withSelectedBucketsCount | ||
| } | ||
|
|
||
| override def verboseStringWithOperatorId(): String = { | ||
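The `SelectedBucketsCount` metadata entry built in the hunk above can be modeled by a small formatter. This is a toy Python sketch with a hypothetical helper name; the real code computes the count from a `BitSet` cardinality rather than a list:

```python
def selected_buckets_count(num_buckets, selected_bucket_ids=None, coalesced_to=None):
    """Render the SelectedBucketsCount string shown in explain output.

    selected_bucket_ids stands in for optionalBucketSet (None = no pruning,
    so all buckets are selected); coalesced_to stands in for
    optionalNumCoalescedBuckets.
    """
    n = len(selected_bucket_ids) if selected_bucket_ids is not None else num_buckets
    desc = f"{n} out of {num_buckets}"
    if coalesced_to is not None:
        desc += f" (Coalesced to {coalesced_to})"
    return desc
```

For example, pruning 2 of 8 buckets and coalescing to 4 renders as "2 out of 8 (Coalesced to 4)".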
|
|
```diff
@@ -539,6 +546,7 @@ case class FileSourceScanExec(
         .getOrElse(sys.error(s"Invalid bucket file ${f.filePath}"))
     }
 
+    // TODO(SPARK-32985): Decouple bucket filter pruning and bucketed table scan
     val prunedFilesGroupedToBuckets = if (optionalBucketSet.isDefined) {
       val bucketSet = optionalBucketSet.get
       filesGroupedToBuckets.filter {
```
|
```diff
@@ -624,6 +632,7 @@ case class FileSourceScanExec(
         optionalBucketSet,
         optionalNumCoalescedBuckets,
         QueryPlan.normalizePredicates(dataFilters, output),
-        None)
+        None,
+        disableBucketedScan)
     }
   }
```
New file (161 lines): DisableUnnecessaryBucketedScan.scala, in package org.apache.spark.sql.execution.bucketing

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution.bucketing

import org.apache.spark.sql.catalyst.plans.physical.{ClusteredDistribution, HashClusteredDistribution}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.{FileSourceScanExec, FilterExec, ProjectExec, SortExec, SparkPlan}
import org.apache.spark.sql.execution.aggregate.BaseAggregateExec
import org.apache.spark.sql.execution.exchange.Exchange
import org.apache.spark.sql.internal.SQLConf

/**
 * Disable unnecessary bucketed table scan based on actual physical query plan.
 * NOTE: this rule is designed to be applied right after [[EnsureRequirements]],
 * where all [[ShuffleExchangeExec]] and [[SortExec]] have been added to plan properly.
 *
 * When BUCKETING_ENABLED and AUTO_BUCKETED_SCAN_ENABLED are set to true, go through
 * query plan to check where bucketed table scan is unnecessary, and disable bucketed table
 * scan if:
 *
 * 1. The sub-plan from root to bucketed table scan, does not contain
 *    [[hasInterestingPartition]] operator.
 *
 * 2. The sub-plan from the nearest downstream [[hasInterestingPartition]] operator
 *    to the bucketed table scan, contains only [[isAllowedUnaryExecNode]] operators
 *    and at least one [[Exchange]].
 *
 * Examples:
 * 1. no [[hasInterestingPartition]] operator:
 *                Project
 *                   |
 *                 Filter
 *                   |
 *             Scan(t1: i, j)
 *  (bucketed on column j, DISABLE bucketed scan)
 *
 * 2. join:
 *         SortMergeJoin(t1.i = t2.j)
 *             /            \
 *         Sort(i)        Sort(j)
 *           /                \
 *       Shuffle(i)       Scan(t2: i, j)
 *         /          (bucketed on column j, enable bucketed scan)
 *   Scan(t1: i, j)
 * (bucketed on column j, DISABLE bucketed scan)
 *
 * 3. aggregate:
 *         HashAggregate(i, ..., Final)
 *                      |
 *                  Shuffle(i)
 *                      |
 *         HashAggregate(i, ..., Partial)
 *                      |
 *                    Filter
 *                      |
 *              Scan(t1: i, j)
 *  (bucketed on column j, DISABLE bucketed scan)
 *
 * The idea of [[hasInterestingPartition]] is inspired from "interesting order" in
 * the paper "Access Path Selection in a Relational Database Management System"
 * (https://dl.acm.org/doi/10.1145/582095.582099).
 */
case class DisableUnnecessaryBucketedScan(conf: SQLConf) extends Rule[SparkPlan] {

  /**
   * Disable bucketed table scan with pre-order traversal of plan.
   *
   * @param withInterestingPartition The traversed plan has operator with interesting partition.
   * @param withExchange The traversed plan has [[Exchange]] operator.
   * @param withAllowedNode The traversed plan has only [[isAllowedUnaryExecNode]] operators.
   */
  private def disableBucketWithInterestingPartition(
      plan: SparkPlan,
      withInterestingPartition: Boolean,
      withExchange: Boolean,
      withAllowedNode: Boolean): SparkPlan = {
    plan match {
      case p if hasInterestingPartition(p) =>
        // Operator with interesting partition, propagates `withInterestingPartition` as true
        // to its children, and resets `withExchange` and `withAllowedNode`.
        p.mapChildren(disableBucketWithInterestingPartition(_, true, false, true))
      case exchange: Exchange =>
        // Exchange operator propagates `withExchange` as true to its child.
        exchange.mapChildren(disableBucketWithInterestingPartition(
          _, withInterestingPartition, true, withAllowedNode))
      case scan: FileSourceScanExec =>
        if (isBucketedScanWithoutFilter(scan)) {
          if (!withInterestingPartition || (withExchange && withAllowedNode)) {
            scan.copy(disableBucketedScan = true)
          } else {
            scan
          }
        } else {
          scan
        }
      case o =>
        o.mapChildren(disableBucketWithInterestingPartition(
          _,
          withInterestingPartition,
          withExchange,
          withAllowedNode && isAllowedUnaryExecNode(o)))
    }
  }

  private def hasInterestingPartition(plan: SparkPlan): Boolean = {
    plan.requiredChildDistribution.exists {
      case _: ClusteredDistribution | _: HashClusteredDistribution => true
      case _ => false
    }
  }

  /**
   * Check if the operator is an allowed single-child operator.
   * We may revisit this method later as we probably can
   * remove this restriction to allow arbitrary operator between
   * bucketed table scan and operator with interesting partition.
   */
  private def isAllowedUnaryExecNode(plan: SparkPlan): Boolean = {
```

Member: Can you add a description of why this is needed? `hasInterestingPartition` and "at least one Exchange" sound like obvious conditions, but this allowed unary exec node is not. Why can we disable bucketed scan only under those exec nodes?

Contributor (Author): @viirya - this is a good question. I agree we can be bolder and probably don't need a whitelist of operators here, e.g. SMJ - Shuffle - BHJ - Scan, or SMJ - Shuffle - Union - Scan (and another scan) should also work, but my feeling is to start with the more confident change first and improve later. With a whitelist of operators here, we have high confidence that this feature works without introducing regressions, but much less confidence if we allow arbitrary operators in the middle (at least for me). For now, to be honest, I cannot find a case where arbitrary operators would not work. But I want to play it safer in the beginning, and any future improvement here is much welcomed. cc @cloud-fan and @maropu for thoughts. Added a comment for now.

Member: In my opinion, it is okay for this PR to focus on basic (minimal) support for the auto bucketed scan. In followup activities, I think we can optimize it step-by-step by adding test cases and checking performance improvements. (Anyway, it would be better to leave a comment there about it, as @viirya suggested above.)

Member: We should make the code clear for developers and maintainers, so leaving a comment is nicer if we want to constrain the scope of this rule for now.

Contributor (Author): I already added a comment in the last iteration. Please suggest a concrete alternative comment if it's not looking good. Thanks.

```scala
    plan match {
      case _: SortExec | _: ProjectExec | _: FilterExec => true
      case partialAgg: BaseAggregateExec =>
        partialAgg.requiredChildDistributionExpressions.isEmpty
      case _ => false
    }
  }

  private def isBucketedScanWithoutFilter(scan: FileSourceScanExec): Boolean = {
    // Do not disable bucketed table scan if it has filter pruning,
    // because bucketed table scan is still useful here to save CPU/IO cost with
    // only reading selected bucket files.
    scan.bucketedScan && scan.optionalBucketSet.isEmpty
```

Member: What if a scan operator reads most buckets, e.g. 999 of 1000 buckets? Do we still select bucketed scans even in this case?

Contributor (Author): @maropu - this is a good question, and I think it is out of scope for this PR and needs more thought later. We don't have a cost model to decide between (bucket filter + bucketed scan) and (normal filter + non-bucketed scan). It can depend on the number of buckets, size of filtered buckets, CPU cost for the filter, etc.

Contributor: I'm fine with it for now. Technically I think filtering by bucket ID and bucketed scan don't need to be coupled. We can always filter files by bucket id, and then do bucketed scan or not according to this rule.

Member: Yea, okay. Could you file a jira later, @c21?

Contributor (Author): @maropu - sure, filed https://issues.apache.org/jira/browse/SPARK-32985 .

```scala
  }

  def apply(plan: SparkPlan): SparkPlan = {
    lazy val hasBucketedScanWithoutFilter = plan.find {
      case scan: FileSourceScanExec => isBucketedScanWithoutFilter(scan)
      case _ => false
    }.isDefined

    if (!conf.bucketingEnabled || !conf.autoBucketedScanEnabled || !hasBucketedScanWithoutFilter) {
      plan
    } else {
      disableBucketWithInterestingPartition(plan, false, false, true)
    }
  }
}
```
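The whole rule boils down to a pre-order traversal carrying three flags. Below is a toy model in plain Python, with a hypothetical `PlanNode` class and boolean attributes standing in for Spark operators; it is an illustrative sketch of the decision in `disableBucketWithInterestingPartition`, not the actual rule:

```python
class PlanNode:
    """Toy stand-in for a SparkPlan node (illustrative; not the Spark API)."""
    def __init__(self, name, children=(), interesting=False, exchange=False,
                 allowed=False, bucketed=False, has_bucket_filter=False):
        self.name = name
        self.children = list(children)
        self.interesting = interesting    # requires Clustered/HashClusteredDistribution
        self.exchange = exchange          # shuffle exchange operator
        self.allowed = allowed            # Sort / Project / Filter / partial aggregate
        self.bucketed = bucketed          # bucketed FileSourceScanExec
        self.has_bucket_filter = has_bucket_filter  # optionalBucketSet is defined
        self.disable_bucketed_scan = False

def disable_bucket_scan(node, with_interesting=False, with_exchange=False,
                        with_allowed=True):
    """Pre-order traversal mirroring disableBucketWithInterestingPartition."""
    if node.interesting:
        # Interesting-partition operator: reset the exchange/allowed flags.
        for child in node.children:
            disable_bucket_scan(child, True, False, True)
    elif node.exchange:
        # Exchange: remember we crossed one on the way down.
        for child in node.children:
            disable_bucket_scan(child, with_interesting, True, with_allowed)
    elif node.bucketed:
        # Keep bucketed scan when it prunes buckets via a filter; otherwise
        # disable it when no one upstream cares, or an exchange re-partitions
        # the data anyway (through only allowed operators).
        if not node.has_bucket_filter and (
                not with_interesting or (with_exchange and with_allowed)):
            node.disable_bucketed_scan = True
    else:
        for child in node.children:
            disable_bucket_scan(child, with_interesting, with_exchange,
                                with_allowed and node.allowed)
    return node
```

Running this on the join example from the scaladoc disables the scan hidden behind the shuffle (t1) while keeping the directly sorted scan (t2) bucketed, matching the annotated tree in the class comment.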
Review comments on the conf description:

Since our user documents are generated based on this statement, could you describe a bit more about how to decide whether to do bucketed scans or not?

@maropu - sure, wondering what you think of the below?

It looks okay, but I have the same suggestion as #29804 (comment).

@maropu - sure, updated.