[SPARK-12639][SQL] Improve Explain for Datasources with Handled Predicates #10655
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -27,7 +27,7 @@ import org.apache.spark.sql.catalyst.planning.PhysicalOperation | |
| import org.apache.spark.sql.catalyst.plans.logical | ||
| import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan | ||
| import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow, expressions} | ||
| - import org.apache.spark.sql.execution.PhysicalRDD.{INPUT_PATHS, PUSHED_FILTERS} | ||
| + import org.apache.spark.sql.execution.PhysicalRDD.{HANDLED_FILTERS, INPUT_PATHS} | ||
| import org.apache.spark.sql.execution.SparkPlan | ||
| import org.apache.spark.sql.sources._ | ||
| import org.apache.spark.sql.types.{StringType, StructType} | ||
|
|
@@ -307,8 +307,8 @@ private[sql] object DataSourceStrategy extends Strategy with Logging { | |
|
|
||
| // A set of column attributes that are only referenced by pushed down filters. We can eliminate | ||
| // them from requested columns. | ||
| + val handledPredicates = filterPredicates.filterNot(unhandledPredicates.contains) | ||
| val handledSet = { | ||
| - val handledPredicates = filterPredicates.filterNot(unhandledPredicates.contains) | ||
| val unhandledSet = AttributeSet(unhandledPredicates.flatMap(_.references)) | ||
| AttributeSet(handledPredicates.flatMap(_.references)) -- | ||
| (projectSet ++ unhandledSet).map(relation.attributeMap) | ||
|
|
@@ -321,8 +321,8 @@ private[sql] object DataSourceStrategy extends Strategy with Logging { | |
| val metadata: Map[String, String] = { | ||
| val pairs = ArrayBuffer.empty[(String, String)] | ||
|
|
||
| - if (pushedFilters.nonEmpty) { | ||
| - pairs += (PUSHED_FILTERS -> pushedFilters.mkString("[", ", ", "]")) | ||
| + if (handledPredicates.nonEmpty) { | ||
| + pairs += (HANDLED_FILTERS -> handledPredicates.mkString("[", ", ", "]")) | ||
|
Contributor
Should we also keep the pushed filters? For some data sources, such as ORC, a pushed filter is evaluated at a coarse-grained level rather than on every row. I think it is better to keep that information.
Member
Author
I thought 11663 meant that all filters are pushed down regardless, so I wondered whether that was redundant. It's also still a bit confusing: although it says the filters are "pushed", there is no guarantee that the underlying source will do anything with them at all.
Contributor
Ah, sorry. I think I understand the change now. |
||
| } | ||
|
|
||
| relation.relation match { | ||
|
|
||
`HandledFilters` here means filters that will be applied to every row inside the data source, right? Is there a better name?

How about `FilteredAtSource`?

How about we just delete `PUSHED_FILTERS`, since it is not used? I think `HandledFilters` is a better name.

sgtm
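
To make the handled-versus-pushed distinction from this thread concrete, here is a minimal sketch that runs without Spark. It is not the code in `DataSourceStrategy`: `Pred`, `HandledFiltersSketch`, and the column names are hypothetical stand-ins for Catalyst expressions and attribute sets, and the `"HandledFilters"` key string is assumed from the name used in the comments above. It mirrors the logic in the diff: a predicate counts as handled when the source did not report it back as unhandled, columns referenced only by handled predicates can be dropped from the requested columns, and the explain metadata entry is emitted only when the handled list is non-empty.

```scala
// Hypothetical stand-in for a Catalyst predicate: its text plus the columns it references.
final case class Pred(sql: String, references: Set[String])

object HandledFiltersSketch {
  // Assumed metadata key, named after the constant discussed in the review.
  val HandledFilters = "HandledFilters"

  /** Predicates the source evaluates fully: everything it did not report as unhandled. */
  def handled(all: Seq[Pred], unhandled: Seq[Pred]): Seq[Pred] =
    all.filterNot(unhandled.contains)

  /** Columns referenced only by handled predicates; these need not be requested from the scan. */
  def handledOnlyColumns(all: Seq[Pred], unhandled: Seq[Pred], projected: Set[String]): Set[String] = {
    val unhandledRefs = unhandled.flatMap(_.references).toSet
    handled(all, unhandled).flatMap(_.references).toSet -- (projected ++ unhandledRefs)
  }

  /** Metadata for explain output; the entry is omitted when nothing is handled. */
  def metadata(all: Seq[Pred], unhandled: Seq[Pred]): Map[String, String] = {
    val hs = handled(all, unhandled)
    if (hs.nonEmpty) Map(HandledFilters -> hs.map(_.sql).mkString("[", ", ", "]"))
    else Map.empty
  }

  def main(args: Array[String]): Unit = {
    val a = Pred("a > 1", Set("a"))
    val b = Pred("b = 2", Set("b"))
    val c = Pred("c LIKE 'x%'", Set("c"))
    // Assumption for the example: the source cannot fully evaluate `c`.
    println(metadata(Seq(a, b, c), Seq(c)))
    println(handledOnlyColumns(Seq(a, b, c), Seq(c), projected = Set("b", "c")))
  }
}
```

In this sketch a filter such as `c` may still be pushed to the source, but because the source reports it as unhandled it is left out of the metadata, which matches the concern raised above that "pushed" alone guarantees nothing about what the source does with a filter.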