Skip to content

Enable logical property propagation by default#22013

Merged
vivek-bharathan merged 1 commit intoprestodb:masterfrom
vivek-bharathan:enablelogicalprop
Mar 7, 2024
Merged

Enable logical property propagation by default#22013
vivek-bharathan merged 1 commit intoprestodb:masterfrom
vivek-bharathan:enablelogicalprop

Conversation

@vivek-bharathan
Copy link
Contributor

@vivek-bharathan vivek-bharathan commented Feb 26, 2024

Fixes #21968

== RELEASE NOTES ==

General Changes
* Enable propagation of logical properties by default

@vivek-bharathan vivek-bharathan force-pushed the enablelogicalprop branch 2 times, most recently from d195678 to eb95168 Compare February 26, 2024 23:57
@steveburnett
Copy link
Contributor

The only place in the docs that exploit_constraints or optimizer.exploit-constraints are mentioned is in the 0.274 release notes. Could docs for these two properties be added as part of this PR?

@vivek-bharathan vivek-bharathan marked this pull request as ready for review March 2, 2024 01:38
@vivek-bharathan vivek-bharathan requested a review from a team as a code owner March 2, 2024 01:38
@github-actions
Copy link

github-actions bot commented Mar 2, 2024

Codenotify: Notifying subscribers in CODENOTIFY files for diff 9dfc461...de6f59e.

Notify File(s)
@steveburnett presto-docs/src/main/sphinx/admin/properties.rst
presto-docs/src/main/sphinx/optimizer.rst
presto-docs/src/main/sphinx/optimizer/logical-properties.rst

aaneja
aaneja previously approved these changes Mar 4, 2024
@kaikalur
Copy link
Contributor

kaikalur commented Mar 4, 2024

I'm not sure we want to do this by default (unless the optimizations that use the property are off by default). We are not sure how they would affect our prod workloads. In any case, an installation can always turn it on cluster-wide using feature config. So I say keep it off for now.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! A few nits on formatting and punctuation.

You must add optimizer/logical-properties to https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/optimizer.rst for the new page to be part of the left navbar.

Consider adding entries for exploit_constraints and optimizer.exploit-constraints to the Properties Reference doc, maybe in the Optimizer Properties topic.

@tdcmeehan
Copy link
Contributor

I'm not sure we want to do this by default (unless the optimizations that use the property are off by default). We are not sure how they would affect our prod workloads. In any case, an installation can always turn it on cluster-wide using feature config. So I say keep it off for now.

If you're worried about that, can't you just disable it for your deployment and figure out a way (if any) to enable it gradually?

@kaikalur
Copy link
Contributor

kaikalur commented Mar 4, 2024

I'm not sure we want to do this by default (unless the optimizations that use the property are off by default). We are not sure how they would affect our prod workloads. In any case, an installation can always turn it on cluster-wide using feature config. So I say keep it off for now.

If you're worried about that, can't you just disable it for your deployment and figure out a way (if any) to enable it gradually?

Yeah - we were just talking about it. We will shut it off on our side.

ZacBlanco
ZacBlanco previously approved these changes Mar 4, 2024
Copy link
Contributor

@ZacBlanco ZacBlanco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit, otherwise lgtm

@vivek-bharathan vivek-bharathan dismissed stale reviews from ZacBlanco and aaneja via e1a7d6f March 5, 2024 17:33
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! A couple of nits, nothing big.

@tdcmeehan tdcmeehan self-assigned this Mar 5, 2024
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, new local build.

@vivek-bharathan vivek-bharathan merged commit 4b453a5 into prestodb:master Mar 7, 2024
@vivek-bharathan vivek-bharathan deleted the enablelogicalprop branch March 7, 2024 18:46
@feilong-liu
Copy link
Contributor

@ClarenceThreepwood @tdcmeehan

We found a correctness bug for the logical property propagation framework, a reproduce query is as follows:

Returns 0 rows when the framework is on:

presto:tpch> set session exploit_constraints=true;
SET SESSION
presto:tpch> with t as (select orderkey, count(1) cnt from (select * from (select * from orders where 1=0) left join (select partkey, suppkey from lineitem where 1=0) on partkey=10 where suppkey is not null) group by rollup(orderkey)) select t1.orderkey, t1.cnt from t t1 cross join t t2;
 orderkey | cnt 
----------+-----
(0 rows)

Returns 1 row when the framework is off:

presto:tpch> set session exploit_constraints=false;
SET SESSION
presto:tpch> with t as (select orderkey, count(1) cnt from (select * from (select * from orders where 1=0) left join (select partkey, suppkey from lineitem where 1=0) on partkey=10 where suppkey is not null) group by rollup(orderkey)) select t1.orderkey, t1.cnt from t t1 cross join t t2;
 orderkey | cnt 
----------+-----
 NULL     |   0 
(1 row)

Looking into the query plan, looks like that the framework incorrectly oversimplify the plan
When framework is on:

presto:tpch> set session exploit_constraints=true;
SET SESSION
presto:tpch> explain (type distributed) with t as (select orderkey, count(1) cnt from (select * from (select * from orders where 1=0) left join (select partkey, suppkey from lineitem where 1=0) on partkey=10 where suppkey is not null) group by rollup(orderkey)) select t1.orderkey, t1.cnt from t t1 cross join t t2;
                                                                                                   Query Plan                                                                                                   
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Fragment 0 [SINGLE]                                                                                                                                                                                            
     Output layout: [orderkey$gid, count]                                                                                                                                                                       
     Output partitioning: SINGLE []                                                                                                                                                                             
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                                                                              
     - Output[PlanNodeId 63][orderkey, cnt] => [orderkey$gid:bigint, count:bigint]                                                                                                                              
             orderkey := orderkey$gid (1:257)                                                                                                                                                                   
             cnt := count (1:270)                                                                                                                                                                               
         - CrossJoin[PlanNodeId 370] => [orderkey$gid:bigint, count:bigint]                                                                                                                                     
                 Distribution: REPLICATED                                                                                                                                                                       
             - Project[PlanNodeId 24][projectLocality = LOCAL] => [orderkey$gid:bigint, count:bigint]                                                                                                           
                 - Aggregate(FINAL)[orderkey$gid, groupid][$hashvalue][PlanNodeId 23] => [orderkey$gid:bigint, groupid:bigint, $hashvalue:bigint, count:bigint]                                                 
                         count := "presto.default.count"((count_246)) (1:56)                                                                                                                                    
                     - LocalExchange[PlanNodeId 1597][HASH][$hashvalue] (orderkey$gid, groupid) => [orderkey$gid:bigint, groupid:bigint, count_246:bigint, $hashvalue:bigint]                                   
                         - Aggregate(PARTIAL)[orderkey$gid, groupid][$hashvalue_247][PlanNodeId 1595] => [orderkey$gid:bigint, groupid:bigint, $hashvalue_247:bigint, count_246:bigint]                         
                                 count_246 := "presto.default.count"((expr_93)) (1:56)                                                                                                                          
                             - Project[PlanNodeId 1624][projectLocality = LOCAL] => [orderkey$gid:bigint, expr_93:integer, groupid:bigint, $hashvalue_247:bigint]                                               
                                     Estimates: {source: CostBasedSourceInfo, rows: 0 (0B), cpu: 0.00, memory: 0.00, network: 0.00}                                                                             
                                     $hashvalue_247 := combine_hash(combine_hash(BIGINT'0', COALESCE($operator$hash_code(orderkey$gid), BIGINT'0')), COALESCE($operator$hash_code(groupid), BIGINT'0')) (1:239) 
                                 - Values[PlanNodeId 1474] => [orderkey$gid:bigint, expr_93:integer, groupid:bigint]                                                                                            
                                         Estimates: {source: CostBasedSourceInfo, rows: 0 (0B), cpu: 0.00, memory: 0.00, network: 0.00}                                                                         
             - Values[PlanNodeId 1478] => []                                                                                                                                                                    
                     Estimates: {source: CostBasedSourceInfo, rows: 0 (0B), cpu: 0.00, memory: 0.00, network: 0.00}                                                                                             
                                                                                                                                                                                                                
                                                                                                                                                                                                                
(1 row)

When framework is off:

presto:tpch> set session exploit_constraints=false;
SET SESSION
presto:tpch> explain (type distributed) with t as (select orderkey, count(1) cnt from (select * from (select * from orders where 1=0) left join (select partkey, suppkey from lineitem where 1=0) on partkey=10 where suppkey is not null) group by rollup(orderkey)) select t1.orderkey, t1.cnt from t t1 cross join t t2;
                                                                                                         Query Plan                                                                                                >
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------->
 Fragment 0 [SINGLE]                                                                                                                                                                                               >
     Output layout: [orderkey$gid, count]                                                                                                                                                                          >
     Output partitioning: SINGLE []                                                                                                                                                                                >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                                                                                 >
     - Output[PlanNodeId 63][orderkey, cnt] => [orderkey$gid:bigint, count:bigint]                                                                                                                                 >
             orderkey := orderkey$gid (1:257)                                                                                                                                                                      >
             cnt := count (1:270)                                                                                                                                                                                  >
         - CrossJoin[PlanNodeId 370] => [orderkey$gid:bigint, count:bigint]                                                                                                                                        >
                 Distribution: REPLICATED                                                                                                                                                                          >
             - Project[PlanNodeId 24][projectLocality = LOCAL] => [orderkey$gid:bigint, count:bigint]                                                                                                              >
                 - Aggregate(FINAL)[orderkey$gid, groupid][$hashvalue][PlanNodeId 23] => [orderkey$gid:bigint, groupid:bigint, $hashvalue:bigint, count:bigint]                                                    >
                         count := "presto.default.count"((count_246)) (1:56)                                                                                                                                       >
                     - LocalExchange[PlanNodeId 1619][HASH][$hashvalue] (orderkey$gid, groupid) => [orderkey$gid:bigint, groupid:bigint, count_246:bigint, $hashvalue:bigint]                                      >
                         - Aggregate(PARTIAL)[orderkey$gid, groupid][$hashvalue_247][PlanNodeId 1617] => [orderkey$gid:bigint, groupid:bigint, $hashvalue_247:bigint, count_246:bigint]                            >
                                 count_246 := "presto.default.count"((expr_93)) (1:56)                                                                                                                             >
                             - Project[PlanNodeId 1673][projectLocality = LOCAL] => [orderkey$gid:bigint, expr_93:integer, groupid:bigint, $hashvalue_247:bigint]                                                  >
                                     Estimates: {source: CostBasedSourceInfo, rows: 0 (0B), cpu: 0.00, memory: 0.00, network: 0.00}                                                                                >
                                     $hashvalue_247 := combine_hash(combine_hash(BIGINT'0', COALESCE($operator$hash_code(orderkey$gid), BIGINT'0')), COALESCE($operator$hash_code(groupid), BIGINT'0')) (1:239)    >
                                 - Values[PlanNodeId 1474] => [orderkey$gid:bigint, expr_93:integer, groupid:bigint]                                                                                               >
                                         Estimates: {source: CostBasedSourceInfo, rows: 0 (0B), cpu: 0.00, memory: 0.00, network: 0.00}                                                                            >
             - LocalExchange[PlanNodeId 1602][SINGLE] () => []                                                                                                                                                     >
                 - Project[PlanNodeId 53][projectLocality = LOCAL] => []                                                                                                                                           >
                     - Aggregate(FINAL)[orderkey$gid_225, groupid_226][$hashvalue_248][PlanNodeId 52] => [orderkey$gid_225:bigint, groupid_226:bigint, $hashvalue_248:bigint]                                      >
                         - LocalExchange[PlanNodeId 1629][HASH][$hashvalue_248] (orderkey$gid_225, groupid_226) => [orderkey$gid_225:bigint, groupid_226:bigint, $hashvalue_248:bigint]                            >
                             - Aggregate(PARTIAL)[orderkey$gid_225, groupid_226][$hashvalue_249][PlanNodeId 1627] => [orderkey$gid_225:bigint, groupid_226:bigint, $hashvalue_249:bigint]                          >
                                 - Project[PlanNodeId 1674][projectLocality = LOCAL] => [orderkey$gid_225:bigint, groupid_226:bigint, $hashvalue_249:bigint]                                                       >
                                         Estimates: {source: CostBasedSourceInfo, rows: 0 (0B), cpu: 0.00, memory: 0.00, network: 0.00}                                                                            >
                                         $hashvalue_249 := combine_hash(combine_hash(BIGINT'0', COALESCE($operator$hash_code(orderkey$gid_225), BIGINT'0')), COALESCE($operator$hash_code(groupid_226), BIGINT'0'))>
                                     - Values[PlanNodeId 1478] => [orderkey$gid_225:bigint, groupid_226:bigint]                                                                                                    >
                                             Estimates: {source: CostBasedSourceInfo, rows: 0 (0B), cpu: 0.00, memory: 0.00, network: 0.00}                                                                        >
                                                                                                                                                                                                                   >
                                                                                                                                                                                                                   >
(1 row)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable logical property framework by default

7 participants