[SPARK-32986][SQL] Add bucketed scan info in query plan of data source v1 #33698
c21 wants to merge 2 commits into apache:master from
@cloud-fan and @maropu, could you help take a look when you have time? Thanks!
Kubernetes integration test starting
Kubernetes integration test status success
withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "0") {
  checkKeywordsExistsInExplain(
    df1.join(df2, df1("i") === df2("i")),
    "Bucketed: true" :: Nil: _*)
nit: "Bucketed: true" :: Nil: _* -> "Bucketed: true"?
yeah, updated for this and other places.
withSQLConf(SQLConf.BUCKETING_ENABLED.key -> "false") {
  checkKeywordsExistsInExplain(
    df1.join(df2, df1("i") === df2("i")),
    "Bucketed: false (disabled by configuration)" :: Nil: _*)
}

checkKeywordsExistsInExplain(df1, "Bucketed: false (disabled by query planner)" :: Nil: _*)

checkKeywordsExistsInExplain(
  df1.select("j"),
  "Bucketed: false (bucket column(s) not read)" :: Nil: _*)
Test build #142306 has finished for PR 33698 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Test build #142348 has finished for PR 33698 at commit
Merged to master for Apache Spark 3.3.0 according to the issue type,
Thank you, @c21 and @cloud-fan.
Thank you @dongjoon-hyun and @cloud-fan for review!
What changes were proposed in this pull request?
As a followup to the discussion in #29804 (comment): currently the query plan for the data source v1 scan operator, FileSourceScanExec, has no information to indicate whether the table is read as a bucketed table and, if it is not, what the reason is. This PR adds that information to the FileSourceScanExec physical query plan output, which helps users and developers understand the query plan without spending a lot of time debugging why a table is not read as a bucketed table.
Why are the changes needed?
To help users and developers debug query plans for bucketed tables.
Does this PR introduce any user-facing change?
The physical query plan now includes the added Bucketed information when reading a bucketed table. Note that when reading a non-bucketed table, the query plan stays the same and nothing is changed.
Example:
How was this patch tested?
Added unit test in ExplainSuite.scala.
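For readers skimming the review snippets, the four Bucketed strings checked by the tests can be summarized as a small decision function. This is only a sketch in plain Scala, not the code in FileSourceScanExec: the names BucketedInfoSketch, ScanState, and bucketedInfo are hypothetical, and the order in which the conditions are checked is an assumption. Only the four message strings, and the fact that non-bucketed tables get no Bucketed entry, come from this PR.

```scala
// Hypothetical model of the "Bucketed" explain entry described in this PR.
// Not Spark's implementation: names and check order are assumptions.
object BucketedInfoSketch {

  // Simplified view of the facts that appear to decide the message.
  final case class ScanState(
      hasBucketSpec: Boolean,     // the table was written with buckets
      bucketingEnabled: Boolean,  // SQLConf.BUCKETING_ENABLED is true
      disabledByPlanner: Boolean, // the planner chose not to use a bucketed scan
      bucketColumnsRead: Boolean) // the query reads the bucket column(s)

  // Returns the text to show in the plan, or None for a non-bucketed table
  // (the PR states that the plan is unchanged in that case).
  def bucketedInfo(s: ScanState): Option[String] =
    if (!s.hasBucketSpec) None
    else if (!s.bucketingEnabled) Some("Bucketed: false (disabled by configuration)")
    else if (s.disabledByPlanner) Some("Bucketed: false (disabled by query planner)")
    else if (!s.bucketColumnsRead) Some("Bucketed: false (bucket column(s) not read)")
    else Some("Bucketed: true")

  def main(args: Array[String]): Unit = {
    // Mirrors the df1.select("j") case from the tests: bucket column "i" is not read.
    val state = ScanState(
      hasBucketSpec = true,
      bucketingEnabled = true,
      disabledByPlanner = false,
      bucketColumnsRead = false)
    println(bucketedInfo(state).getOrElse("(no Bucketed entry)"))
  }
}
```

Returning an Option keeps the non-bucketed case explicit: the caller adds nothing to the plan output when the result is None.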