[SPARK-32808][SQL] Pass all test of sql/core module in Scala 2.13 #29711
Changes from all commits
ecfc789
ff5fdcc
8a0bb43
b66ca44
10d4953
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -10,15 +10,15 @@ TakeOrderedAndProject (34) | |
| : :- * Project (17) | ||
| : : +- * BroadcastHashJoin Inner BuildRight (16) | ||
| : : :- * Project (10) | ||
| : : : +- * BroadcastHashJoin Inner BuildLeft (9) | ||
| : : : :- BroadcastExchange (5) | ||
| : : : : +- * Project (4) | ||
| : : : : +- * Filter (3) | ||
| : : : : +- * ColumnarToRow (2) | ||
| : : : : +- Scan parquet default.date_dim (1) | ||
| : : : +- * Filter (8) | ||
| : : : +- * ColumnarToRow (7) | ||
| : : : +- Scan parquet default.store_sales (6) | ||
| : : : +- * BroadcastHashJoin Inner BuildRight (9) | ||
|
Member
I'm skimming the changes, and this seems non-trivial, but then again I'm not sure I can read these plans expertly enough to evaluate them. The plan below starts by scanning a different table, for example. We may just have to accept changes like this, if they're equivalent, to make sure the plans do not depend on implementation details of hash maps. But @gatorsmile @cloud-fan et al., do these look like plausible equivalent plans, for example? |
||
| : : : :- * Filter (3) | ||
| : : : : +- * ColumnarToRow (2) | ||
| : : : : +- Scan parquet default.store_sales (1) | ||
| : : : +- BroadcastExchange (8) | ||
| : : : +- * Project (7) | ||
| : : : +- * Filter (6) | ||
| : : : +- * ColumnarToRow (5) | ||
| : : : +- Scan parquet default.date_dim (4) | ||
| : : +- BroadcastExchange (15) | ||
| : : +- * Project (14) | ||
| : : +- * Filter (13) | ||
|
|
@@ -35,50 +35,50 @@ TakeOrderedAndProject (34) | |
| +- Scan parquet default.item (25) | ||
|
|
||
|
|
||
| (1) Scan parquet default.date_dim | ||
| Output [2]: [d_date_sk#1, d_year#2] | ||
| (1) Scan parquet default.store_sales | ||
| Output [8]: [ss_sold_date_sk#1, ss_item_sk#2, ss_cdemo_sk#3, ss_promo_sk#4, ss_quantity#5, ss_list_price#6, ss_sales_price#7, ss_coupon_amt#8] | ||
| Batched: true | ||
| Location [not included in comparison]/{warehouse_dir}/date_dim] | ||
| PushedFilters: [IsNotNull(d_year), EqualTo(d_year,1998), GreaterThanOrEqual(d_date_sk,2450815), LessThanOrEqual(d_date_sk,2451179), IsNotNull(d_date_sk)] | ||
| ReadSchema: struct<d_date_sk:int,d_year:int> | ||
| Location [not included in comparison]/{warehouse_dir}/store_sales] | ||
| PushedFilters: [IsNotNull(ss_sold_date_sk), GreaterThanOrEqual(ss_sold_date_sk,2450815), LessThanOrEqual(ss_sold_date_sk,2451179), IsNotNull(ss_cdemo_sk), IsNotNull(ss_item_sk), IsNotNull(ss_promo_sk)] | ||
| ReadSchema: struct<ss_sold_date_sk:int,ss_item_sk:int,ss_cdemo_sk:int,ss_promo_sk:int,ss_quantity:int,ss_list_price:decimal(7,2),ss_sales_price:decimal(7,2),ss_coupon_amt:decimal(7,2)> | ||
|
|
||
| (2) ColumnarToRow [codegen id : 1] | ||
| Input [2]: [d_date_sk#1, d_year#2] | ||
| (2) ColumnarToRow [codegen id : 5] | ||
| Input [8]: [ss_sold_date_sk#1, ss_item_sk#2, ss_cdemo_sk#3, ss_promo_sk#4, ss_quantity#5, ss_list_price#6, ss_sales_price#7, ss_coupon_amt#8] | ||
|
|
||
| (3) Filter [codegen id : 1] | ||
| Input [2]: [d_date_sk#1, d_year#2] | ||
| Condition : ((((isnotnull(d_year#2) AND (d_year#2 = 1998)) AND (d_date_sk#1 >= 2450815)) AND (d_date_sk#1 <= 2451179)) AND isnotnull(d_date_sk#1)) | ||
| (3) Filter [codegen id : 5] | ||
| Input [8]: [ss_sold_date_sk#1, ss_item_sk#2, ss_cdemo_sk#3, ss_promo_sk#4, ss_quantity#5, ss_list_price#6, ss_sales_price#7, ss_coupon_amt#8] | ||
| Condition : (((((isnotnull(ss_sold_date_sk#1) AND (ss_sold_date_sk#1 >= 2450815)) AND (ss_sold_date_sk#1 <= 2451179)) AND isnotnull(ss_cdemo_sk#3)) AND isnotnull(ss_item_sk#2)) AND isnotnull(ss_promo_sk#4)) | ||
|
|
||
| (4) Project [codegen id : 1] | ||
| Output [1]: [d_date_sk#1] | ||
| Input [2]: [d_date_sk#1, d_year#2] | ||
| (4) Scan parquet default.date_dim | ||
| Output [2]: [d_date_sk#9, d_year#10] | ||
| Batched: true | ||
| Location [not included in comparison]/{warehouse_dir}/date_dim] | ||
| PushedFilters: [IsNotNull(d_year), EqualTo(d_year,1998), GreaterThanOrEqual(d_date_sk,2450815), LessThanOrEqual(d_date_sk,2451179), IsNotNull(d_date_sk)] | ||
| ReadSchema: struct<d_date_sk:int,d_year:int> | ||
|
|
||
| (5) BroadcastExchange | ||
| Input [1]: [d_date_sk#1] | ||
| Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#3] | ||
| (5) ColumnarToRow [codegen id : 1] | ||
| Input [2]: [d_date_sk#9, d_year#10] | ||
|
|
||
| (6) Scan parquet default.store_sales | ||
| Output [8]: [ss_sold_date_sk#4, ss_item_sk#5, ss_cdemo_sk#6, ss_promo_sk#7, ss_quantity#8, ss_list_price#9, ss_sales_price#10, ss_coupon_amt#11] | ||
| Batched: true | ||
| Location [not included in comparison]/{warehouse_dir}/store_sales] | ||
| PushedFilters: [IsNotNull(ss_sold_date_sk), GreaterThanOrEqual(ss_sold_date_sk,2450815), LessThanOrEqual(ss_sold_date_sk,2451179), IsNotNull(ss_cdemo_sk), IsNotNull(ss_item_sk), IsNotNull(ss_promo_sk)] | ||
| ReadSchema: struct<ss_sold_date_sk:int,ss_item_sk:int,ss_cdemo_sk:int,ss_promo_sk:int,ss_quantity:int,ss_list_price:decimal(7,2),ss_sales_price:decimal(7,2),ss_coupon_amt:decimal(7,2)> | ||
| (6) Filter [codegen id : 1] | ||
| Input [2]: [d_date_sk#9, d_year#10] | ||
| Condition : ((((isnotnull(d_year#10) AND (d_year#10 = 1998)) AND (d_date_sk#9 >= 2450815)) AND (d_date_sk#9 <= 2451179)) AND isnotnull(d_date_sk#9)) | ||
|
|
||
| (7) ColumnarToRow | ||
| Input [8]: [ss_sold_date_sk#4, ss_item_sk#5, ss_cdemo_sk#6, ss_promo_sk#7, ss_quantity#8, ss_list_price#9, ss_sales_price#10, ss_coupon_amt#11] | ||
| (7) Project [codegen id : 1] | ||
| Output [1]: [d_date_sk#9] | ||
| Input [2]: [d_date_sk#9, d_year#10] | ||
|
|
||
| (8) Filter | ||
| Input [8]: [ss_sold_date_sk#4, ss_item_sk#5, ss_cdemo_sk#6, ss_promo_sk#7, ss_quantity#8, ss_list_price#9, ss_sales_price#10, ss_coupon_amt#11] | ||
| Condition : (((((isnotnull(ss_sold_date_sk#4) AND (ss_sold_date_sk#4 >= 2450815)) AND (ss_sold_date_sk#4 <= 2451179)) AND isnotnull(ss_cdemo_sk#6)) AND isnotnull(ss_item_sk#5)) AND isnotnull(ss_promo_sk#7)) | ||
| (8) BroadcastExchange | ||
| Input [1]: [d_date_sk#9] | ||
| Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#11] | ||
|
|
||
| (9) BroadcastHashJoin [codegen id : 5] | ||
| Left keys [1]: [d_date_sk#1] | ||
| Right keys [1]: [ss_sold_date_sk#4] | ||
| Left keys [1]: [ss_sold_date_sk#1] | ||
| Right keys [1]: [d_date_sk#9] | ||
| Join condition: None | ||
|
|
||
| (10) Project [codegen id : 5] | ||
| Output [7]: [ss_item_sk#5, ss_cdemo_sk#6, ss_promo_sk#7, ss_quantity#8, ss_list_price#9, ss_sales_price#10, ss_coupon_amt#11] | ||
| Input [9]: [d_date_sk#1, ss_sold_date_sk#4, ss_item_sk#5, ss_cdemo_sk#6, ss_promo_sk#7, ss_quantity#8, ss_list_price#9, ss_sales_price#10, ss_coupon_amt#11] | ||
| Output [7]: [ss_item_sk#2, ss_cdemo_sk#3, ss_promo_sk#4, ss_quantity#5, ss_list_price#6, ss_sales_price#7, ss_coupon_amt#8] | ||
| Input [9]: [ss_sold_date_sk#1, ss_item_sk#2, ss_cdemo_sk#3, ss_promo_sk#4, ss_quantity#5, ss_list_price#6, ss_sales_price#7, ss_coupon_amt#8, d_date_sk#9] | ||
|
|
||
| (11) Scan parquet default.promotion | ||
| Output [3]: [p_promo_sk#12, p_channel_email#13, p_channel_event#14] | ||
|
|
@@ -103,13 +103,13 @@ Input [1]: [p_promo_sk#12] | |
| Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#15] | ||
|
|
||
| (16) BroadcastHashJoin [codegen id : 5] | ||
| Left keys [1]: [ss_promo_sk#7] | ||
| Left keys [1]: [ss_promo_sk#4] | ||
| Right keys [1]: [p_promo_sk#12] | ||
| Join condition: None | ||
|
|
||
| (17) Project [codegen id : 5] | ||
| Output [6]: [ss_item_sk#5, ss_cdemo_sk#6, ss_quantity#8, ss_list_price#9, ss_sales_price#10, ss_coupon_amt#11] | ||
| Input [8]: [ss_item_sk#5, ss_cdemo_sk#6, ss_promo_sk#7, ss_quantity#8, ss_list_price#9, ss_sales_price#10, ss_coupon_amt#11, p_promo_sk#12] | ||
| Output [6]: [ss_item_sk#2, ss_cdemo_sk#3, ss_quantity#5, ss_list_price#6, ss_sales_price#7, ss_coupon_amt#8] | ||
| Input [8]: [ss_item_sk#2, ss_cdemo_sk#3, ss_promo_sk#4, ss_quantity#5, ss_list_price#6, ss_sales_price#7, ss_coupon_amt#8, p_promo_sk#12] | ||
|
|
||
| (18) Scan parquet default.customer_demographics | ||
| Output [4]: [cd_demo_sk#16, cd_gender#17, cd_marital_status#18, cd_education_status#19] | ||
|
|
@@ -134,13 +134,13 @@ Input [1]: [cd_demo_sk#16] | |
| Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint)),false), [id=#20] | ||
|
|
||
| (23) BroadcastHashJoin [codegen id : 5] | ||
| Left keys [1]: [ss_cdemo_sk#6] | ||
| Left keys [1]: [ss_cdemo_sk#3] | ||
| Right keys [1]: [cd_demo_sk#16] | ||
| Join condition: None | ||
|
|
||
| (24) Project [codegen id : 5] | ||
| Output [5]: [ss_item_sk#5, ss_quantity#8, ss_list_price#9, ss_sales_price#10, ss_coupon_amt#11] | ||
| Input [7]: [ss_item_sk#5, ss_cdemo_sk#6, ss_quantity#8, ss_list_price#9, ss_sales_price#10, ss_coupon_amt#11, cd_demo_sk#16] | ||
| Output [5]: [ss_item_sk#2, ss_quantity#5, ss_list_price#6, ss_sales_price#7, ss_coupon_amt#8] | ||
| Input [7]: [ss_item_sk#2, ss_cdemo_sk#3, ss_quantity#5, ss_list_price#6, ss_sales_price#7, ss_coupon_amt#8, cd_demo_sk#16] | ||
|
|
||
| (25) Scan parquet default.item | ||
| Output [2]: [i_item_sk#21, i_item_id#22] | ||
|
|
@@ -161,18 +161,18 @@ Input [2]: [i_item_sk#21, i_item_id#22] | |
| Arguments: HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), [id=#23] | ||
|
|
||
| (29) BroadcastHashJoin [codegen id : 5] | ||
| Left keys [1]: [ss_item_sk#5] | ||
| Left keys [1]: [ss_item_sk#2] | ||
| Right keys [1]: [i_item_sk#21] | ||
| Join condition: None | ||
|
|
||
| (30) Project [codegen id : 5] | ||
| Output [5]: [ss_quantity#8, ss_list_price#9, ss_sales_price#10, ss_coupon_amt#11, i_item_id#22] | ||
| Input [7]: [ss_item_sk#5, ss_quantity#8, ss_list_price#9, ss_sales_price#10, ss_coupon_amt#11, i_item_sk#21, i_item_id#22] | ||
| Output [5]: [ss_quantity#5, ss_list_price#6, ss_sales_price#7, ss_coupon_amt#8, i_item_id#22] | ||
| Input [7]: [ss_item_sk#2, ss_quantity#5, ss_list_price#6, ss_sales_price#7, ss_coupon_amt#8, i_item_sk#21, i_item_id#22] | ||
|
|
||
| (31) HashAggregate [codegen id : 5] | ||
| Input [5]: [ss_quantity#8, ss_list_price#9, ss_sales_price#10, ss_coupon_amt#11, i_item_id#22] | ||
| Input [5]: [ss_quantity#5, ss_list_price#6, ss_sales_price#7, ss_coupon_amt#8, i_item_id#22] | ||
| Keys [1]: [i_item_id#22] | ||
| Functions [4]: [partial_avg(cast(ss_quantity#8 as bigint)), partial_avg(UnscaledValue(ss_list_price#9)), partial_avg(UnscaledValue(ss_coupon_amt#11)), partial_avg(UnscaledValue(ss_sales_price#10))] | ||
| Functions [4]: [partial_avg(cast(ss_quantity#5 as bigint)), partial_avg(UnscaledValue(ss_list_price#6)), partial_avg(UnscaledValue(ss_coupon_amt#8)), partial_avg(UnscaledValue(ss_sales_price#7))] | ||
| Aggregate Attributes [8]: [sum#24, count#25, sum#26, count#27, sum#28, count#29, sum#30, count#31] | ||
| Results [9]: [i_item_id#22, sum#32, count#33, sum#34, count#35, sum#36, count#37, sum#38, count#39] | ||
|
|
||
|
|
@@ -183,9 +183,9 @@ Arguments: hashpartitioning(i_item_id#22, 5), true, [id=#40] | |
| (33) HashAggregate [codegen id : 6] | ||
| Input [9]: [i_item_id#22, sum#32, count#33, sum#34, count#35, sum#36, count#37, sum#38, count#39] | ||
| Keys [1]: [i_item_id#22] | ||
| Functions [4]: [avg(cast(ss_quantity#8 as bigint)), avg(UnscaledValue(ss_list_price#9)), avg(UnscaledValue(ss_coupon_amt#11)), avg(UnscaledValue(ss_sales_price#10))] | ||
| Aggregate Attributes [4]: [avg(cast(ss_quantity#8 as bigint))#41, avg(UnscaledValue(ss_list_price#9))#42, avg(UnscaledValue(ss_coupon_amt#11))#43, avg(UnscaledValue(ss_sales_price#10))#44] | ||
| Results [5]: [i_item_id#22, avg(cast(ss_quantity#8 as bigint))#41 AS agg1#45, cast((avg(UnscaledValue(ss_list_price#9))#42 / 100.0) as decimal(11,6)) AS agg2#46, cast((avg(UnscaledValue(ss_coupon_amt#11))#43 / 100.0) as decimal(11,6)) AS agg3#47, cast((avg(UnscaledValue(ss_sales_price#10))#44 / 100.0) as decimal(11,6)) AS agg4#48] | ||
| Functions [4]: [avg(cast(ss_quantity#5 as bigint)), avg(UnscaledValue(ss_list_price#6)), avg(UnscaledValue(ss_coupon_amt#8)), avg(UnscaledValue(ss_sales_price#7))] | ||
| Aggregate Attributes [4]: [avg(cast(ss_quantity#5 as bigint))#41, avg(UnscaledValue(ss_list_price#6))#42, avg(UnscaledValue(ss_coupon_amt#8))#43, avg(UnscaledValue(ss_sales_price#7))#44] | ||
| Results [5]: [i_item_id#22, avg(cast(ss_quantity#5 as bigint))#41 AS agg1#45, cast((avg(UnscaledValue(ss_list_price#6))#42 / 100.0) as decimal(11,6)) AS agg2#46, cast((avg(UnscaledValue(ss_coupon_amt#8))#43 / 100.0) as decimal(11,6)) AS agg3#47, cast((avg(UnscaledValue(ss_sales_price#7))#44 / 100.0) as decimal(11,6)) AS agg4#48] | ||
|
|
||
| (34) TakeOrderedAndProject | ||
| Input [5]: [i_item_id#22, agg1#45, agg2#46, agg3#47, agg4#48] | ||
|
|
||
I think this is OK, though it's not strictly necessary to make the type def this specific
Can we make it a separate PR? The plan changes need more review.
Do you mean to use `Map[Set[Int], JoinPlan]`, or not to define the `JoinPlanMap` type?
@gatorsmile Use a separate JIRA number?
Same JIRA I think, just a separate PR. I'm not sure it matters that much though; it's a tiny part of the change, really. We just need more eyes on the plan changes.
It seems there is no way to split this into two PRs, because if we change only `CostBasedJoinReorder`, the test cases in the `sql/core` module will fail.
So all expected plan changes are due to the use of `LinkedHashMap` instead of `Map`
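To illustrate the point about iteration order, here is a minimal sketch. It assumes the relevant difference is that `mutable.LinkedHashMap` iterates in insertion order, while a plain `Map` makes no ordering promise. `Plan`, `MapOrderSketch`, and `insertionOrder` are hypothetical names for illustration only; they are not Spark's `JoinPlan` or the code in `CostBasedJoinReorder`.

```scala
import scala.collection.mutable

object MapOrderSketch {
  // Hypothetical stand-in for a join plan: only the set of item ids it covers.
  final case class Plan(items: Set[Int])

  // Insert single-item plans in the given order and return the keys
  // in the order the map iterates over them.
  def insertionOrder(ids: Seq[Int]): Seq[Set[Int]] = {
    val plans = mutable.LinkedHashMap.empty[Set[Int], Plan]
    ids.foreach(id => plans(Set(id)) = Plan(Set(id)))
    plans.keysIterator.toList
  }

  def main(args: Array[String]): Unit = {
    val ids = Seq(5, 3, 9, 1, 7, 2)

    // LinkedHashMap: iteration follows insertion order.
    println(insertionOrder(ids))

    // Plain immutable Map: same entries, but the iteration order is an
    // implementation detail and can differ between Scala versions.
    val unordered = ids.map(id => Set(id) -> Plan(Set(id))).toMap
    println(unordered.keysIterator.toList)
  }
}
```

If plan enumeration walks such a map and keeps the first of several equally cheap candidates, a stable iteration order would make the chosen plan reproducible across Scala versions, which is presumably why the golden plan files had to be regenerated.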