Fix join reordering for non-trivial equi join conditions #20413

Merged
ajaygeorge merged 1 commit into prestodb:master from aaneja:fixBadEquiJoinFilter
Sep 18, 2023

Conversation

@aaneja aaneja commented Jul 28, 2023

Description

Enhance join-reordering to work with non-simple equi-join predicates

Motivation and Context

Join predicates like left.key = right0.key1 + right1.key2 can shrink the space of join orders that reordering considers, because they appear in the join graph as Project nodes or as Join nodes with no equi-join clauses. This commit fixes this behavior by removing any intermediate Projects from the join graph and creating them only on the fly while choosing the join order.
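
As a rough illustration of why this matters, the sketch below models the query from this PR as a toy join graph. It is not Presto's ReorderJoins code: the `PREDICATES` representation and the `components` helper are assumptions made up for this example, and merging every table an expression predicate touches is a simplification. It only shows that when expression predicates are ignored as join edges the graph splits into two disconnected pieces (which forces a cross join), while counting them as edges keeps the graph connected.

```python
# Predicates from the example query, as (tables on the left of "=", tables on the right).
# Hypothetical representation for illustration only.
PREDICATES = [
    ({"s"}, {"ps"}),        # s.suppkey = ps.suppkey
    ({"c"}, {"o"}),         # c.custkey = o.custkey
    ({"s", "ps"}, {"c"}),   # s.nationkey + ps.partkey = c.nationkey
]
TABLES = {"c", "ps", "o", "s"}


def components(tables, predicates, allow_expressions):
    """Count connected components of the join graph with a small union-find."""
    parent = {t: t for t in tables}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for left, right in predicates:
        simple = len(left) == 1 and len(right) == 1
        if not simple and not allow_expressions:
            continue  # expression predicate is not usable as a join edge
        members = sorted(left | right)
        for a, b in zip(members, members[1:]):
            parent[find(a)] = find(b)

    return len({find(t) for t in tables})


simple_only = components(TABLES, PREDICATES, allow_expressions=False)
with_expressions = components(TABLES, PREDICATES, allow_expressions=True)
print(simple_only, with_expressions)  # prints "2 1"
```

With only the simple column-to-column predicates, {s, ps} and {c, o} are separate components; the expression predicate is the only edge that links them.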

Impact

Before this change, queries with fully connected join graphs could end up with CrossJoin nodes in their plans, depending on the order in which the tables are listed in the FROM clause. For example:

presto:tiny> explain select count(*) from customer c, partsupp ps, orders o, supplier s where s.suppkey = ps.suppkey and c.custkey = o.custkey and s.nationkey + ps.partkey = c.nationkey;
                                                                                                                               Query Plan                                                                                               >
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------->
 - Output[_col0] => [count:bigint]                                                                                                                                                                                                      >
         _col0 := count (1:16)                                                                                                                                                                                                          >
     - Aggregate(FINAL) => [count:bigint]                                                                                                                                                                                               >
             count := "presto.default.count"((count_16)) (1:16)                                                                                                                                                                         >
         - LocalExchange[SINGLE] () => [count_16:bigint]                                                                                                                                                                                >
             - RemoteStreamingExchange[GATHER] => [count_16:bigint]                                                                                                                                                                     >
                 - Aggregate(PARTIAL) => [count_16:bigint]                                                                                                                                                                              >
                         count_16 := "presto.default.count"(*) (1:16)                                                                                                                                                                   >
                     - InnerJoin[("suppkey" = "suppkey_5") AND (nationkey) = ((nationkey_8) + (partkey))][$hashvalue_20, $hashvalue_21] => []                                                                                           >
                             Estimates: {source: CostBasedSourceInfo, rows: 2400 (21.09kB), cpu: 11881391400.00, memory: 178200.00, network: 178200.00}                                                                                 >
                             Distribution: REPLICATED                                                                                                                                                                                   >
                         - Project[projectLocality = LOCAL] => [partkey:bigint, suppkey:bigint, nationkey:bigint, $hashvalue_20:bigint]                                                                                                 >
                                 Estimates: {source: CostBasedSourceInfo, rows: 120000000 (1.01GB), cpu: 7561381500.00, memory: 175500.00, network: 175500.00}                                                                          >
                                 $hashvalue_20 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(suppkey), BIGINT'0')) (1:43)                                                                                                     >
                             - CrossJoin => [partkey:bigint, suppkey:bigint, nationkey:bigint]                                                                                                                                          >
                                     Estimates: {source: CostBasedSourceInfo, rows: 120000000 (1.01GB), cpu: 3241381500.00, memory: 175500.00, network: 175500.00}                                                                      >
                                     Distribution: REPLICATED                                                                                                                                                                           >
                                 - TableScan[TableHandle {connectorId='tpch', connectorHandle='partsupp:sf0.01', layout='Optional[partsupp:sf0.01]'}] => [partkey:bigint, suppkey:bigint]                                               >
                                         Estimates: {source: CostBasedSourceInfo, rows: 8000 (70.31kB), cpu: 144000.00, memory: 0.00, network: 0.00}                                                                                    >
                                         partkey := tpch:partkey (1:42)                                                                                                                                                                 >
                                         suppkey := tpch:suppkey (1:42)                                                                                                                                                                 >
                                 - LocalExchange[SINGLE] () => [nationkey:bigint]                                                                                                                                                       >
                                         Estimates: {source: CostBasedSourceInfo, rows: 15000 (131.84kB), cpu: 958500.00, memory: 40500.00, network: 175500.00}                                                                         >
                                     - RemoteStreamingExchange[REPLICATE] => [nationkey:bigint]                                                                                                                                         >
                                             Estimates: {source: CostBasedSourceInfo, rows: 15000 (131.84kB), cpu: 958500.00, memory: 40500.00, network: 175500.00}                                                                     >
                                         - InnerJoin[("custkey_2" = "custkey")][$hashvalue, $hashvalue_17] => [nationkey:bigint]                                                                                                        >
                                                 Estimates: {source: CostBasedSourceInfo, rows: 15000 (131.84kB), cpu: 958500.00, memory: 40500.00, network: 40500.00}                                                                  >
                                                 Distribution: REPLICATED                                                                                                                                                               >
                                             - ScanProject[table = TableHandle {connectorId='tpch', connectorHandle='orders:sf0.01', layout='Optional[orders:sf0.01]'}, projectLocality = LOCAL] => [custkey_2:bigint, $hashvalue:bigint>
                                                     Estimates: {source: CostBasedSourceInfo, rows: 15000 (131.84kB), cpu: 135000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 15000 (131.84kB), cpu: 405000.00,>
                                                     $hashvalue := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey_2), BIGINT'0')) (1:56)                                                                                  >
                                                     custkey_2 := tpch:custkey (1:55)                                                                                                                                                   >
                                                     tpch:orderstatus                                                                                                                                                                   >
                                                         :: [["F"], ["O"], ["P"]]                                                                                                                                                       >
                                             - LocalExchange[HASH][$hashvalue_17] (custkey) => [custkey:bigint, nationkey:bigint, $hashvalue_17:bigint]                                                                                 >
                                                     Estimates: {source: CostBasedSourceInfo, rows: 1500 (13.18kB), cpu: 108000.00, memory: 0.00, network: 40500.00}                                                                    >
                                                 - RemoteStreamingExchange[REPLICATE] => [custkey:bigint, nationkey:bigint, $hashvalue_18:bigint]                                                                                       >
                                                         Estimates: {source: CostBasedSourceInfo, rows: 1500 (13.18kB), cpu: 67500.00, memory: 0.00, network: 40500.00}                                                                 >
                                                     - ScanProject[table = TableHandle {connectorId='tpch', connectorHandle='customer:sf0.01', layout='Optional[customer:sf0.01]'}, projectLocality = LOCAL] => [custkey:bigint, nationk>
                                                             Estimates: {source: CostBasedSourceInfo, rows: 1500 (13.18kB), cpu: 27000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 1500 (13.18kB), cpu: 67500.0>
                                                             $hashvalue_19 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey), BIGINT'0')) (1:30)                                                                         >
                                                             nationkey := tpch:nationkey (1:30)                                                                                                                                         >
                                                             custkey := tpch:custkey (1:30)                                                                                                                                             >
                         - LocalExchange[HASH][$hashvalue_21] (suppkey_5) => [suppkey_5:bigint, nationkey_8:bigint, $hashvalue_21:bigint]                                                                                               >
                                 Estimates: {source: CostBasedSourceInfo, rows: 100 (900B), cpu: 7200.00, memory: 0.00, network: 2700.00}                                                                                               >
                             - RemoteStreamingExchange[REPLICATE] => [suppkey_5:bigint, nationkey_8:bigint, $hashvalue_22:bigint]                                                                                                       >
                                     Estimates: {source: CostBasedSourceInfo, rows: 100 (900B), cpu: 4500.00, memory: 0.00, network: 2700.00}                                                                                           >
                                 - ScanProject[table = TableHandle {connectorId='tpch', connectorHandle='supplier:sf0.01', layout='Optional[supplier:sf0.01]'}, projectLocality = LOCAL] => [suppkey_5:bigint, nationkey_8:bigint, $hash>
                                         Estimates: {source: CostBasedSourceInfo, rows: 100 (900B), cpu: 1800.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 100 (900B), cpu: 4500.00, memory: 0.00, network: 0.00>
                                         $hashvalue_23 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(suppkey_5), BIGINT'0')) (1:65)                                                                                           >
                                         suppkey_5 := tpch:suppkey (1:65)                                                                                                                                                               >
                                         nationkey_8 := tpch:nationkey (1:65)                                                                                                                                                           >
                                                                                                                                                                                                                                        >
(1 row)

With this PR, the optimizer chooses a cheaper plan that uses only inner joins:

presto:tiny> explain select count(*) from customer c, partsupp ps, orders o, supplier s where s.suppkey = ps.suppkey and c.custkey = o.custkey and s.nationkey + ps.partkey = c.nationkey;
                                                                                                                                     Query Plan                                                                                         >
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------->
 - Output[_col0] => [count:bigint]                                                                                                                                                                                                      >
         _col0 := count (1:16)                                                                                                                                                                                                          >
     - Aggregate(FINAL) => [count:bigint]                                                                                                                                                                                               >
             count := "presto.default.count"((count_18)) (1:16)                                                                                                                                                                         >
         - LocalExchange[SINGLE] () => [count_18:bigint]                                                                                                                                                                                >
             - RemoteStreamingExchange[GATHER] => [count_18:bigint]                                                                                                                                                                     >
                 - Aggregate(PARTIAL) => [count_18:bigint]                                                                                                                                                                              >
                         count_18 := "presto.default.count"(*) (1:16)                                                                                                                                                                   >
                     - InnerJoin[("custkey_2" = "custkey")][$hashvalue, $hashvalue_19] => []                                                                                                                                            >
                             Estimates: {source: CostBasedSourceInfo, rows: 59289 (521.09kB), cpu: 2286917.79, memory: 149919.37, network: 149919.37}                                                                                   >
                             Distribution: REPLICATED                                                                                                                                                                                   >
                         - ScanProject[table = TableHandle {connectorId='tpch', connectorHandle='orders:sf0.01', layout='Optional[orders:sf0.01]'}, projectLocality = LOCAL] => [custkey_2:bigint, $hashvalue:bigint]                   >
                                 Estimates: {source: CostBasedSourceInfo, rows: 15000 (131.84kB), cpu: 135000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 15000 (131.84kB), cpu: 405000.00, memory: 0.00, netwo>
                                 $hashvalue := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey_2), BIGINT'0')) (1:56)                                                                                                      >
                                 custkey_2 := tpch:custkey (1:55)                                                                                                                                                                       >
                                 tpch:orderstatus                                                                                                                                                                                       >
                                     :: [["F"], ["O"], ["P"]]                                                                                                                                                                           >
                         - LocalExchange[HASH][$hashvalue_19] (custkey) => [custkey:bigint, $hashvalue_19:bigint]                                                                                                                       >
                                 Estimates: {source: CostBasedSourceInfo, rows: 5929 (52.11kB), cpu: 1505198.42, memory: 43200.00, network: 149919.37}                                                                                  >
                             - RemoteStreamingExchange[REPLICATE] => [custkey:bigint, $hashvalue_20:bigint]                                                                                                                             >
                                     Estimates: {source: CostBasedSourceInfo, rows: 5929 (52.11kB), cpu: 1398479.05, memory: 43200.00, network: 149919.37}                                                                              >
                                 - Project[projectLocality = LOCAL] => [custkey:bigint, $hashvalue_29:bigint]                                                                                                                           >
                                         Estimates: {source: CostBasedSourceInfo, rows: 5929 (52.11kB), cpu: 1398479.05, memory: 43200.00, network: 43200.00}                                                                           >
                                         $hashvalue_29 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey), BIGINT'0')) (1:30)                                                                                             >
                                     - InnerJoin[("add_17" = "nationkey")][$hashvalue_25, $hashvalue_26] => [custkey:bigint]                                                                                                            >
                                             Estimates: {source: CostBasedSourceInfo, rows: 5929 (52.11kB), cpu: 1291759.68, memory: 43200.00, network: 43200.00}                                                                       >
                                             Distribution: REPLICATED                                                                                                                                                                   >
                                         - Project[projectLocality = LOCAL] => [add_17:bigint, $hashvalue_25:bigint]                                                                                                                    >
                                                 Estimates: {source: CostBasedSourceInfo, rows: 8000 (70.31kB), cpu: 945900.00, memory: 2700.00, network: 2700.00}                                                                      >
                                                 $hashvalue_25 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(add_17), BIGINT'0')) (1:66)                                                                                      >
                                             - Project[projectLocality = LOCAL] => [add_17:bigint]                                                                                                                                      >
                                                     Estimates: {source: CostBasedSourceInfo, rows: 8000 (70.31kB), cpu: 801900.00, memory: 2700.00, network: 2700.00}                                                                  >
                                                     add_17 := (nationkey_8) + (partkey) (1:66)                                                                                                                                         >
                                                 - InnerJoin[("suppkey" = "suppkey_5")][$hashvalue_21, $hashvalue_22] => [partkey:bigint, nationkey_8:bigint]                                                                           >
                                                         Estimates: {source: CostBasedSourceInfo, rows: 8000 (70.31kB), cpu: 729900.00, memory: 2700.00, network: 2700.00}                                                              >
                                                         Distribution: REPLICATED                                                                                                                                                       >
                                                     - ScanProject[table = TableHandle {connectorId='tpch', connectorHandle='partsupp:sf0.01', layout='Optional[partsupp:sf0.01]'}, projectLocality = LOCAL] => [partkey:bigint, suppkey>
                                                             Estimates: {source: CostBasedSourceInfo, rows: 8000 (70.31kB), cpu: 144000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 8000 (70.31kB), cpu: 360000>
                                                             $hashvalue_21 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(suppkey), BIGINT'0')) (1:43)                                                                         >
                                                             partkey := tpch:partkey (1:42)                                                                                                                                             >
                                                             suppkey := tpch:suppkey (1:42)                                                                                                                                             >
                                                     - LocalExchange[HASH][$hashvalue_22] (suppkey_5) => [suppkey_5:bigint, nationkey_8:bigint, $hashvalue_22:bigint]                                                                   >
                                                             Estimates: {source: CostBasedSourceInfo, rows: 100 (900B), cpu: 7200.00, memory: 0.00, network: 2700.00}                                                                   >
                                                         - RemoteStreamingExchange[REPLICATE] => [suppkey_5:bigint, nationkey_8:bigint, $hashvalue_23:bigint]                                                                           >
                                                                 Estimates: {source: CostBasedSourceInfo, rows: 100 (900B), cpu: 4500.00, memory: 0.00, network: 2700.00}                                                               >
                                                             - ScanProject[table = TableHandle {connectorId='tpch', connectorHandle='supplier:sf0.01', layout='Optional[supplier:sf0.01]'}, projectLocality = LOCAL] => [suppkey_5:bigin>
                                                                     Estimates: {source: CostBasedSourceInfo, rows: 100 (900B), cpu: 1800.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 100 (900B), cpu: 4500.00,>
                                                                     $hashvalue_24 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(suppkey_5), BIGINT'0')) (1:65)                                                               >
                                                                     suppkey_5 := tpch:suppkey (1:65)                                                                                                                                   >
                                                                     nationkey_8 := tpch:nationkey (1:65)                                                                                                                               >
                                         - LocalExchange[HASH][$hashvalue_26] (nationkey) => [custkey:bigint, nationkey:bigint, $hashvalue_26:bigint]                                                                                   >
                                                 Estimates: {source: CostBasedSourceInfo, rows: 1500 (13.18kB), cpu: 108000.00, memory: 0.00, network: 40500.00}                                                                        >
                                             - RemoteStreamingExchange[REPLICATE] => [custkey:bigint, nationkey:bigint, $hashvalue_27:bigint]                                                                                           >
                                                     Estimates: {source: CostBasedSourceInfo, rows: 1500 (13.18kB), cpu: 67500.00, memory: 0.00, network: 40500.00}                                                                     >
                                                 - ScanProject[table = TableHandle {connectorId='tpch', connectorHandle='customer:sf0.01', layout='Optional[customer:sf0.01]'}, projectLocality = LOCAL] => [custkey:bigint, nationkey:b>
                                                         Estimates: {source: CostBasedSourceInfo, rows: 1500 (13.18kB), cpu: 27000.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 1500 (13.18kB), cpu: 67500.00, m>
                                                         $hashvalue_28 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(nationkey), BIGINT'0')) (1:30)                                                                           >
                                                         custkey := tpch:custkey (1:30)                                                                                                                                                 >
                                                         nationkey := tpch:nationkey (1:30)                                                                                                                                             >
                                                                                                                                                                                                                                        >
(1 row)
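
To put the two plans side by side, the short calculation below uses only figures copied from the EXPLAIN outputs above (the variable names are my own). It checks that the CrossJoin row estimate is the product of its two inputs and compares the CPU estimates of the top-level InnerJoin in each plan. These are cost-model estimates, not measured runtimes.

```python
# Row estimates from the first plan (the one containing the CrossJoin).
partsupp_rows = 8_000             # TableScan on partsupp
customer_orders_rows = 15_000     # InnerJoin of orders and customer
cross_join_rows = partsupp_rows * customer_orders_rows

# CPU estimates of the top-level InnerJoin in each plan.
cpu_before = 11_881_391_400.00    # plan with the CrossJoin
cpu_after = 2_286_917.79          # plan chosen with this PR
cpu_ratio = cpu_before / cpu_after

print(cross_join_rows)            # 120000000, matching the CrossJoin estimate
print(f"{cpu_ratio:.0f}x")        # on the order of 5,000x
```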

Test Plan

New unit tests added

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

== RELEASE NOTES ==

General Changes
* Enhance join-reordering to work with non-simple equi-join predicates

@aaneja aaneja force-pushed the fixBadEquiJoinFilter branch from 0d9dd72 to 46c8624 Compare August 1, 2023 07:32
@aaneja aaneja force-pushed the fixBadEquiJoinFilter branch 5 times, most recently from d077299 to 98b73d9 Compare August 22, 2023 14:59
@aaneja aaneja changed the title [DRAFT] Fix join reordering for non-trivial equi join conditions Fix join reordering for non-trivial equi join conditions Aug 22, 2023
@aaneja aaneja requested a review from vivek-bharathan August 22, 2023 15:43
@aaneja aaneja marked this pull request as ready for review August 22, 2023 15:43
@aaneja aaneja requested a review from a team as a code owner August 22, 2023 15:43
@aaneja aaneja requested a review from presto-oss August 22, 2023 15:43
@aaneja aaneja force-pushed the fixBadEquiJoinFilter branch from 98b73d9 to fe72d1f Compare August 25, 2023 05:06

@rschlussel rschlussel left a comment

I'm not sure I'll get a chance to do a full review, but one request would be to add tests for joins with complex predicates that can't be optimized (e.g. variables from both sides of the join appear on both sides of the equality, or the predicate uses variables from only one side of the join). Also add tests for nested joins that have multiple such predicates. I see you have some example tests like this in TestJoinEnumerator, but it would be good to have some full SQL tests to ensure correct handling for both plan and result correctness.
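
The cases described in this comment can be sketched as a small classifier. This is a toy model under my own assumptions, not code from the PR or from Presto: `classify` and its arguments are hypothetical names, and variables are plain strings. An equality is treated as usable as an equi-join clause only when one side of the `=` references variables from the left join source alone and the other side references variables from the right join source alone.

```python
def classify(lhs_vars, rhs_vars, left_source, right_source):
    """Return 'equi-join' if the equality can key a join, else 'filter'."""

    def side(variables):
        if not variables:
            return None          # constant expression, no variables at all
        if variables <= left_source:
            return "left"
        if variables <= right_source:
            return "right"
        return "both"            # mixes variables from both join sources

    if {side(lhs_vars), side(rhs_vars)} == {"left", "right"}:
        return "equi-join"
    return "filter"


left = {"s.nationkey", "ps.partkey"}      # output of the (s join ps) subtree
right = {"c.nationkey", "c.custkey"}      # output of customer

# s.nationkey + ps.partkey = c.nationkey: each side touches one source only.
optimizable = classify({"s.nationkey", "ps.partkey"}, {"c.nationkey"}, left, right)

# Variables from both sources on both sides of the equality.
mixed = classify({"s.nationkey", "c.custkey"}, {"ps.partkey", "c.nationkey"}, left, right)

# Variables from only one source in the whole predicate.
one_sided = classify({"s.nationkey"}, {"ps.partkey"}, left, right)

print(optimizable, mixed, one_sided)  # prints "equi-join filter filter"
```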

@aaneja aaneja force-pushed the fixBadEquiJoinFilter branch from fe72d1f to 69d60cc Compare September 5, 2023 15:54

aaneja commented Sep 6, 2023

I'm not sure I'll get a chance to do a full review, but one request would be to add tests for joins with complex predicates that can't be optimized (e.g. variables from both sides of the join appear on both sides of the equality, or the predicate uses variables from only one side of the join). Also add tests for nested joins that have multiple such predicates. I see you have some example tests like this in TestJoinEnumerator, but it would be good to have some full SQL tests to ensure correct handling for both plan and result correctness.

@rschlussel I've added the tests for non-optimizable predicates and a nested query with a mix of optimizable and non-optimizable predicates now in TestReorderJoins.

I've added one full SQL test as well. While I could clone the plan tests here, I think the plan equivalence tests give us good coverage. (A secondary reason is that H2-based result verification for large joins is prohibitively slow.)

@kaikalur
Contributor

kaikalur commented Sep 8, 2023

Do we have a session param to control this optimization? I don't see it. I say we have to do that if not already there before we merge this PR.

@aaneja
Contributor Author

aaneja commented Sep 8, 2023

> Do we have a session param to control this optimization? I don't see it. I say we have to do that if not already there before we merge this PR.

@kaikalur I chose not to add a session property since IMO we have a bug here: we get different query plans for a fully connected join graph depending on how the query is written. This shouldn't happen if join re-ordering is turned on. Can you elaborate on why you think we need a session flag?

@kaikalur
Contributor

kaikalur commented Sep 8, 2023

Is the plan incorrect or just (potentially) inefficient? If it's the latter, I say we still need to add the flag

@aaneja
Contributor Author

aaneja commented Sep 11, 2023

The plan is incorrect since we expect join-reordering to work for fully connected graphs, but it doesn't.
If you look at the query in the PR description -

explain select count(*) from customer c, partsupp ps, orders o, supplier s where s.suppkey = ps.suppkey and c.custkey = o.custkey and s.nationkey + ps.partkey = c.nationkey;

---------------------------------------------------
...
- InnerJoin[("suppkey" = "suppkey_5") AND (nationkey) = ((nationkey_8) + (partkey))][$hashvalue_20, $hashvalue_21] => []                                                                                           >
                             Estimates: {source: CostBasedSourceInfo, rows: 2400 (21.09kB), cpu: 11881391400.00, memory: 178200.00, network: 178200.00}                                                                                 >
                             Distribution: REPLICATED                                                                                                                                                                                   >
                         - Project[projectLocality = LOCAL] => [partkey:bigint, suppkey:bigint, nationkey:bigint, $hashvalue_20:bigint]                                                                                                 >
                                 Estimates: {source: CostBasedSourceInfo, rows: 120000000 (1.01GB), cpu: 7561381500.00, memory: 175500.00, network: 175500.00}                                                                          >
                                 $hashvalue_20 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(suppkey), BIGINT'0')) (1:43)                                                                                                     >
                             - CrossJoin => [partkey:bigint, suppkey:bigint, nationkey:bigint]                                                                                                                                          >
                                     Estimates: {source: CostBasedSourceInfo, rows: 120000000 (1.01GB), cpu: 3241381500.00, memory: 175500.00, network: 175500.00}                                                                      >
                                     Distribution: REPLICATED                                                                                                                                                                           >
                                 - TableScan[TableHandle {connectorId='tpch', connectorHandle='partsupp:sf0.01', layout='Optional[partsupp:sf0.01]'}] => [partkey:bigint, suppkey:bigint]                                               >
                                         Estimates: {source: CostBasedSourceInfo, rows: 8000 (70.31kB), cpu: 144000.00, memory: 0.00, network: 0.00}                                                                                    >
                                         partkey := tpch:partkey (1:42)                                                                                                                                                                 >
                                         suppkey := tpch:suppkey (1:42)    
...

With join reordering turned on, we shouldn't get a CrossJoin node added to the plan because the join graph is fully connected. But the join space of the query gets broken up because of the current implementation of the ReorderJoins rule, which IMO is wrong (not just inefficient).
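To make the connectivity argument concrete, here is a minimal sketch. It does not model the ReorderJoins rule; `connected_components` is a hypothetical helper, and each predicate is reduced to the set of table aliases it references in the query above. If the complex predicate counts as an edge, all four tables fall into one component and no cross join is needed. If only the two simple equalities count, two components remain, and those can only be combined with a cross join.

```python
def connected_components(tables, predicates):
    """Group tables linked (directly or transitively) by some predicate.

    Hypothetical helper for illustration: `predicates` is a list of sets,
    each holding the table aliases one predicate references.
    """
    parent = {t: t for t in tables}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for referenced in predicates:
        referenced = list(referenced)
        for other in referenced[1:]:
            parent[find(other)] = find(referenced[0])

    groups = {}
    for t in tables:
        groups.setdefault(find(t), set()).add(t)
    return sorted(sorted(g) for g in groups.values())


tables = ["c", "ps", "o", "s"]
simple_predicates = [
    {"s", "ps"},  # s.suppkey = ps.suppkey
    {"c", "o"},   # c.custkey = o.custkey
]
complex_predicate = {"s", "ps", "c"}  # s.nationkey + ps.partkey = c.nationkey

without_complex = connected_components(tables, simple_predicates)
with_complex = connected_components(tables, simple_predicates + [complex_predicate])
```

Here `without_complex` contains two groups and `with_complex` contains a single group with all four tables.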

Are you concerned that there are users relying on this broken behavior who would be surprised by this change?

@kaikalur
Contributor

> Are you concerned that there are users relying on this broken behavior who would be surprised by this change?

Yes :) someone's bug is someone else's feature :) For large tables, breaking up the plan can help sometimes.

@aaneja aaneja force-pushed the fixBadEquiJoinFilter branch from 84fd0a4 to ee95aec Compare September 12, 2023 07:52
@aaneja
Contributor Author

aaneja commented Sep 12, 2023

@kaikalur I've added a session property, `handle_complex_equi_joins`, as requested. It defaults to true, since I think we should handle join re-ordering correctly by default.

@aaneja
Contributor Author

aaneja commented Sep 12, 2023

Test failure is unrelated. See #20840 for details

Join predicates like `left.key = right1.key1 + right2.key2` can reduce
the join space by appearing as Project nodes or Join nodes with no
equi-join clauses in the join graph. This commit fixes this behavior
by removing any intermediate Projects in the join graph and only
creating them on-the-fly while choosing the join order
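
The "create Projects on-the-fly" idea in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `Expr`, `build_join`, and the synthetic `expr_N` names are hypothetical. Given the two sides of a candidate join, an equality whose sides split cleanly across them becomes an equi-join clause, and any side that is not a bare variable gets a projection on the input that can compute it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    text: str             # expression as written, e.g. "s.nationkey + ps.partkey"
    variables: frozenset  # variables the expression references


def build_join(equality, left_vars, right_vars):
    """Return (clause, projections) for one candidate join, or None.

    Hypothetical sketch: `equality` is a pair of Expr; left_vars/right_vars
    are the variables available from each side of the candidate join.
    """
    a, b = equality
    if not a.variables or not b.variables:
        return None  # constants are not join keys
    if a.variables <= left_vars and b.variables <= right_vars:
        left_expr, right_expr = a, b
    elif a.variables <= right_vars and b.variables <= left_vars:
        left_expr, right_expr = b, a
    else:
        return None  # does not split cleanly; not usable as an equi-join clause here

    projections = {"left": {}, "right": {}}

    def as_variable(expr, side):
        if expr.variables == {expr.text}:
            return expr.text  # already a bare variable, no Project needed
        name = f"expr_{len(projections['left']) + len(projections['right'])}"
        projections[side][name] = expr.text  # Project created only for this join
        return name

    clause = (as_variable(left_expr, "left"), as_variable(right_expr, "right"))
    return clause, projections


equality = (
    Expr("c.nationkey", frozenset({"c.nationkey"})),
    Expr("s.nationkey + ps.partkey", frozenset({"s.nationkey", "ps.partkey"})),
)
customer_vars = {"c.custkey", "c.nationkey"}
supp_partsupp_vars = {"s.suppkey", "s.nationkey", "ps.suppkey", "ps.partkey"}

joined = build_join(equality, customer_vars, supp_partsupp_vars)
not_joinable = build_join(
    equality, {"c.nationkey", "s.nationkey"}, {"ps.partkey", "ps.suppkey"})
```

Because the projection is built per candidate join rather than stored in the join graph, the same equality can be reused when the enumerator tries a different split of the tables.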
@aaneja aaneja force-pushed the fixBadEquiJoinFilter branch from ee95aec to df705ef Compare September 13, 2023 14:07
@aaneja
Contributor Author

aaneja commented Sep 13, 2023

@kaikalur / @rschlussel Can you please merge this PR? The last remaining test failure is unrelated.

I've created #20850 for this; we would need to figure out a proper fix for it

@aaneja aaneja requested review from presto-oss and removed request for presto-oss and rschlussel September 14, 2023 07:59
Contributor

@ajaygeorge ajaygeorge left a comment

Stamping since this is already reviewed by multiple folks.

@ajaygeorge ajaygeorge merged commit 009c06b into prestodb:master Sep 18, 2023
@feilong-liu
Contributor

@aaneja This PR has a bug. Try running the query `select * from orders join lineitem using (orderkey) join customer using (custkey);` and it will throw an exception and fail the query. Can you fix it?

@aaneja
Contributor Author

aaneja commented Sep 22, 2023

@feilong-liu Working on a fix

@aaneja
Contributor Author

aaneja commented Sep 22, 2023

@feilong-liu @ajaygeorge I think we should revert while I make a fix; I created #20943 for that.
I want to add more tests for the case where an intermediate Project only renames variables.
