Avoid computation of output size if underlying stats are not confident by jaystarshot · Pull Request #21280 · prestodb/presto

jaystarshot · 2023-10-31T06:38:52Z

Summary: The current stats framework expects all tables to have stats. That is not always true, and when there are no stats we should not calculate an arbitrary output size estimate that is then used by cost-based rules such as DetermineJoinDistributionType and ReorderJoins.
This change propagates the confidence downstream as is; we can later change individual rules to tailor how confidence is propagated.

Test Plan:
Existing unit tests +
Shadow test

Release Notes

== RELEASE NOTES ==

General Changes
* Fixes output size estimation for plans with empty table statistics

jaystarshot · 2023-10-31T06:40:54Z

presto-main/src/main/java/com/facebook/presto/cost/PlanNodeStatsEstimate.java

Main change
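
As a rough illustration of the behaviour described in the summary, the sketch below returns an unknown (NaN) output size whenever the estimate is not confident, instead of deriving a size from guessed inputs. `SketchStatsEstimate` and its fields are hypothetical stand-ins for illustration only; this is not the actual `PlanNodeStatsEstimate` code from the PR.

```java
// Hypothetical, simplified stand-in for a plan node stats estimate.
// It is NOT the real PlanNodeStatsEstimate class.
final class SketchStatsEstimate {
    private final double rowCount;     // NaN means "unknown"
    private final double bytesPerRow;  // assumed average row width
    private final boolean confident;   // false when underlying stats are missing

    SketchStatsEstimate(double rowCount, double bytesPerRow, boolean confident) {
        this.rowCount = rowCount;
        this.bytesPerRow = bytesPerRow;
        this.confident = confident;
    }

    boolean isConfident() {
        return confident;
    }

    // Returns NaN ("unknown") rather than a guess when the stats are not confident.
    double getOutputSizeInBytes() {
        if (!confident || Double.isNaN(rowCount)) {
            return Double.NaN;
        }
        return rowCount * bytesPerRow;
    }
}
```

With this shape, a caller that receives NaN can treat the size as unknown rather than act on an arbitrary number.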

jaystarshot · 2023-10-31T06:42:39Z

presto-main/src/main/java/com/facebook/presto/cost/SimpleStatsRule.java

Propagated up

...t/java/com/facebook/presto/sql/planner/iterative/rule/TestDetermineJoinDistributionType.java

Summary: Open source stats framework expects all tables to have stats but we in uber don't for now. Test Plan: Existing unit tests + Shadow test - https://querybuilder.uberinternal.com/r/Bif6wGlLL/run/lj0NW5XEl Reviewers: #ldap_presto-core, hitarth Reviewed By: #ldap_presto-core, hitarth Subscribers: hitarth, O4263 subscribe to presto changes JIRA Issues: PRESTO-5669 Differential Revision: https://code.uberinternal.com/D11559869

jaystarshot · 2023-10-31T10:31:16Z

presto-main/src/main/java/com/facebook/presto/cost/TableScanStatsRule.java

        Constraint<ColumnHandle> constraint = new Constraint<>(node.getCurrentConstraint());

        TableStatistics tableStatistics = metadata.getTableStatistics(session, node.getTable(), ImmutableList.copyOf(node.getAssignments().values()), constraint);
+        if (tableStatistics.getRowCount().isUnknown()) {


If there are no variable statistics, we could also set the confidence to false, but I am not sure that would be effective.

jaystarshot · 2023-10-31T10:31:35Z

presto-main/src/test/java/com/facebook/presto/util/TestGraphvizPrinter.java

                "subgraph cluster_1 {\n" +
                "label = \"SOURCE\"\n" +
-                "plannode_1[label=\"{TableScan | [TableHandle \\{connectorId='connector_id', connectorHandle='com.facebook.presto.testing.TestingMetadata$TestingTableHandle@1af56f7', layout='Optional.empty'\\}]|Estimates: \\{rows: ? (0B), cpu: ?, memory: ?, network: ?\\}\n" +
+                "plannode_1[label=\"{TableScan | [TableHandle \\{connectorId='connector_id', connectorHandle='com.facebook.presto.testing.TestingMetadata$TestingTableHandle@1af56f7', layout='Optional.empty'\\}]|Estimates: \\{rows: ? (?), cpu: ?, memory: ?, network: ?\\}\n" +


Good change

feilong-liu

My understanding is that this PR addresses the case where the input table statistics are unknown, in which case we should not output any valid output size estimate. However, when the input statistics are unknown, is it the case that all downstream plan nodes will have unknown estimates as well in current production?

The high-level question I have here is: in which cases will we have a valid output size estimate given that the input table statistics are unknown?

feilong-liu · 2023-11-01T17:43:46Z

presto-main/src/main/java/com/facebook/presto/cost/HistoricalPlanStatisticsUtil.java

+        if (inputTableStatistics.stream().anyMatch(stat -> stat.getRowCount().isUnknown())) {
+            // return most recent run stats if input table stats were not found
+            return lastRunsStatistics.get(lastRunsStatistics.size() - 1).getPlanStatistics();
+        }


In the current logic, when the input statistics for some table are unknown, a run only matches history that also has unknown statistics for the same table (and similar statistics for the other tables); otherwise it does not match. This change would make HBO always return the latest run, even when there is history with a closer match, i.e. unknown for the same table and similar statistics for the other tables.

I see, that makes sense. I can remove this.
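
The matching behaviour described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: `HistoricalRunSketch`, `HistoryMatcherSketch`, the relative-difference threshold, and the oldest-to-newest ordering of the run list are all hypothetical and are not taken from `HistoricalPlanStatisticsUtil`.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical record of one historical run; not a Presto class.
final class HistoricalRunSketch {
    final double[] inputRowCounts; // one entry per input table, NaN = unknown
    final double outputRowCount;

    HistoricalRunSketch(double[] inputRowCounts, double outputRowCount) {
        this.inputRowCounts = inputRowCounts;
        this.outputRowCount = outputRowCount;
    }
}

final class HistoryMatcherSketch {
    private HistoryMatcherSketch() {}

    // Unknown only matches unknown; known values match within a relative threshold.
    static boolean similar(double current, double historical, double threshold) {
        if (Double.isNaN(current) || Double.isNaN(historical)) {
            return Double.isNaN(current) && Double.isNaN(historical);
        }
        double larger = Math.max(Math.abs(current), Math.abs(historical));
        if (larger == 0) {
            return true;
        }
        return Math.abs(current - historical) / larger <= threshold;
    }

    // Scans runs from newest to oldest (list assumed ordered oldest to newest)
    // and returns the first run whose input stats all match the current ones.
    static Optional<HistoricalRunSketch> findMatch(
            double[] current, List<HistoricalRunSketch> runs, double threshold) {
        for (int i = runs.size() - 1; i >= 0; i--) {
            HistoricalRunSketch run = runs.get(i);
            if (run.inputRowCounts.length != current.length) {
                continue;
            }
            boolean allSimilar = true;
            for (int t = 0; t < current.length; t++) {
                if (!similar(current[t], run.inputRowCounts[t], threshold)) {
                    allSimilar = false;
                    break;
                }
            }
            if (allSimilar) {
                return Optional.of(run);
            }
        }
        return Optional.empty();
    }
}
```

In this sketch, a run with an unknown first table and a second table of about 100 rows is preferred over a more recent run whose second table had 10000 rows, which is the "closer match" case the comment is concerned about.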

feilong-liu · 2023-11-01T18:01:00Z

presto-main/src/main/java/com/facebook/presto/cost/SimpleStatsRule.java

+            return planNodeStatsEstimate;
+        }
+        boolean confident = sourceStats.getStats(node.getSources().get(0)).isConfident();
+        for (PlanNode source : node.getSources()) {


The confidence level should not depend only on the source inputs. For example, an EnforceSingleRowNode should always be confident that its output is a single row. We need to exclude such rules from this check.

I see. Maybe it is better to add this as an abstract method and override it in those rules.
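
One possible reading of this suggestion is sketched below: the base rule derives confidence from its sources, and rules whose output does not depend on source stats override that decision. The class and method names (`SketchStatsRule`, `isConfident`, and the two subclasses) are hypothetical, and the sketch uses an overridable default method rather than a strictly abstract one; the real `SimpleStatsRule` API is not reproduced here.

```java
import java.util.List;

// Hypothetical base class for stats rules; not the real SimpleStatsRule.
abstract class SketchStatsRule {
    // Default: the output is confident only if every source is confident.
    boolean isConfident(List<Boolean> sourceConfidence) {
        for (boolean confident : sourceConfidence) {
            if (!confident) {
                return false;
            }
        }
        return true;
    }
}

// A rule that simply inherits the source-based default.
final class SketchFilterStatsRule extends SketchStatsRule {
}

// A rule whose output is known regardless of its sources:
// enforcing a single row always yields exactly one row.
final class SketchEnforceSingleRowStatsRule extends SketchStatsRule {
    @Override
    boolean isConfident(List<Boolean> sourceConfidence) {
        return true;
    }
}
```

For example, `new SketchFilterStatsRule().isConfident(Arrays.asList(true, false))` is false, while the same call on `SketchEnforceSingleRowStatsRule` is true.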

feilong-liu · 2023-11-01T18:06:49Z

presto-main/src/test/java/com/facebook/presto/util/TestGraphvizPrinter.java

                "subgraph cluster_1 {\n" +
                "label = \"SOURCE\"\n" +
-                "plannode_1[label=\"{TableScan | [TableHandle \\{connectorId='connector_id', connectorHandle='com.facebook.presto.testing.TestingMetadata$TestingTableHandle@1af56f7', layout='Optional.empty'\\}]|Estimates: \\{rows: ? (0B), cpu: ?, memory: ?, network: ?\\}\n" +
+                "plannode_1[label=\"{TableScan | [TableHandle \\{connectorId='connector_id', connectorHandle='com.facebook.presto.testing.TestingMetadata$TestingTableHandle@1af56f7', layout='Optional.empty'\\}]|Estimates: \\{rows: ? (?), cpu: ?, memory: ?, network: ?\\}\n" +


In this case, the size is 0B because the number of output variables is 0, i.e. there is no output at all, and outputting 0B sounds better here.

jaystarshot · 2023-11-02T04:11:50Z

> However, when the input statistics are unknown, is it the case that all downstream plan nodes will have unknown estimates as well in current production?

Yes, according to the current implementation, unless some downstream plan node has historical stats. (StatsProvider will provide these, and they are always confident.)

> In which cases will we have a valid output size estimate given that the input table statistics are unknown?

I think only in cases where intermediate plan nodes have historical statistics.

feilong-liu · 2023-11-02T04:33:42Z

If the output size is from historical statistics, then downstream plan nodes can use these statistics to estimate their output size, and we do not need the change here?

jaystarshot · 2023-11-02T04:59:09Z

Indeed, that's true. However, this solution addresses situations in which historical statistics are accessible for one side of the upstream plan, while even table statistics are unavailable for the other side. Currently, we resort to random estimations in such cases. The purpose of this pull request or discussion is to rectify this issue.

feilong-liu · 2023-11-02T06:19:48Z

Can you give an example of this? I just do not understand why this will happen and giving an example will be very helpful.

jaystarshot · 2023-11-02T06:29:21Z

Sure

Project1
    Join2
        Join1
            table1
            table2
        table3

Let's say we have historical stats for Join1 (or table stats for table1 and table2),
and we don't have any stats for table3 or Join2.
Now, when computing the output size of Project1 or Join2, we will produce some estimate for Join2 that shouldn't be confident, and hence its output size should be NaN.
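
The example plan above can be modelled with a small sketch of how confidence would flow up the tree. Everything here is hypothetical (`SketchPlanNode`, the `hasOwnStats` flag standing for "table stats or historical stats are available for this node"); it only illustrates the intended outcome, not the PR's implementation.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical plan node used only to illustrate confidence propagation.
final class SketchPlanNode {
    final String name;
    final boolean hasOwnStats; // table stats or historical stats exist for this node
    final List<SketchPlanNode> sources;

    SketchPlanNode(String name, boolean hasOwnStats, SketchPlanNode... sources) {
        this.name = name;
        this.hasOwnStats = hasOwnStats;
        this.sources = Arrays.asList(sources);
    }

    // Confident if the node has its own stats, or if it has sources
    // and every one of them is confident.
    boolean isConfident() {
        if (hasOwnStats) {
            return true;
        }
        if (sources.isEmpty()) {
            return false;
        }
        for (SketchPlanNode source : sources) {
            if (!source.isConfident()) {
                return false;
            }
        }
        return true;
    }

    // Reports the given size estimate only when it can be trusted, NaN otherwise.
    double outputSizeInBytes(double estimatedBytes) {
        return isConfident() ? estimatedBytes : Double.NaN;
    }
}
```

Building the tree with historical stats on Join1 and no stats on table3 makes Join1 confident, while Join2 and Project1 are not, so their output size comes back as NaN.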

feilong-liu · 2023-11-02T20:19:48Z

Can you give a working example that reproduces what you described above?
I tried a query that is similar but not exactly the same. Here, in Fragment 1, the probe side has valid estimates and the build side is unknown, and the estimate for the join is unknown.

presto:tpch> explain (type distributed) select * from lineitem l join orders o on l.orderkey = o.orderkey join (select * from customer cross join unnest(array[1, 2, 3]) t(idx)) t1 on o.custkey=t1.custkey;
                                                                                                                                                                                                   >
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------->
 Fragment 0 [SINGLE]                                                                                                                                                                               >
     Output layout: [orderkey, partkey, suppkey, linenumber, quantity, extendedprice, discount, tax, returnflag, linestatus, shipdate, commitdate, receiptdate, shipinstruct, shipmode, comment, or>
     Output partitioning: SINGLE []                                                                                                                                                                >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                                                                 >
     - Output[PlanNodeId 19][orderkey, partkey, suppkey, linenumber, quantity, extendedprice, discount, tax, returnflag, linestatus, shipdate, commitdate, receiptdate, shipinstruct, shipmode, com>
             comment := comment_1 (1:28)                                                                                                                                                           >
             comment := comment_7 (1:28)                                                                                                                                                           >
             idx := field (1:28)                                                                                                                                                                   >
         - RemoteSource[1] => [orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extendedprice:double, discount:double, tax:double, returnflag:varchar(1), line>
                                                                                                                                                                                                   >
 Fragment 1 [HASH]                                                                                                                                                                                 >
     Output layout: [orderkey, partkey, suppkey, linenumber, quantity, extendedprice, discount, tax, returnflag, linestatus, shipdate, commitdate, receiptdate, shipinstruct, shipmode, comment, cu>
     Output partitioning: SINGLE []                                                                                                                                                                >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                                                                 >
     - InnerJoin[PlanNodeId 14][("custkey" = "custkey_6")][$hashvalue, $hashvalue_192] => [orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extendedprice:doub>
             Distribution: PARTITIONED                                                                                                                                                             >
         - RemoteSource[2] => [orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extendedprice:double, discount:double, tax:double, returnflag:varchar(1), line>
         - LocalExchange[PlanNodeId 571][HASH][$hashvalue_192] (custkey_6) => [custkey_6:bigint, name:varchar(25), address:varchar(40), nationkey:bigint, phone:varchar(15), acctbal:double, mktseg>
             - RemoteSource[5] => [custkey_6:bigint, name:varchar(25), address:varchar(40), nationkey:bigint, phone:varchar(15), acctbal:double, mktsegment:varchar(10), comment_7:varchar(117), fi>
                                                                                                                                                                                                   >
 Fragment 2 [HASH]                                                                                                                                                                                 >
     Output layout: [orderkey, partkey, suppkey, linenumber, quantity, extendedprice, discount, tax, returnflag, linestatus, shipdate, commitdate, receiptdate, shipinstruct, shipmode, comment, cu>
     Output partitioning: HASH [custkey][$hashvalue_191]                                                                                                                                           >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                                                                 >
     - Project[PlanNodeId 630][projectLocality = LOCAL] => [orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extendedprice:double, discount:double, tax:double>
             Estimates: {source: CostBasedSourceInfo, rows: 58490 (15.77MB), cpu: 81249792.98, memory: 2083552.00, network: 11823037.00}                                                           >
             $hashvalue_191 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey), BIGINT'0')) (1:58)                                                                                   >
         - InnerJoin[PlanNodeId 458][("orderkey" = "orderkey_0")][$hashvalue_186, $hashvalue_188] => [orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extende>
                 Estimates: {source: CostBasedSourceInfo, rows: 58490 (15.77MB), cpu: 64711251.85, memory: 2083552.00, network: 11823037.00}                                                       >
                 Distribution: PARTITIONED                                                                                                                                                         >
             - RemoteSource[3] => [orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extendedprice:double, discount:double, tax:double, returnflag:varchar(1), >
             - LocalExchange[PlanNodeId 570][HASH][$hashvalue_188] (orderkey_0) => [orderkey_0:bigint, custkey:bigint, orderstatus:varchar(1), totalprice:double, orderdate:date, orderpriority:var>
                     Estimates: {source: CostBasedSourceInfo, rows: 15000 (6.98MB), cpu: 8199208.00, memory: 0.00, network: 2083552.00}                                                            >
                 - RemoteSource[4] => [orderkey_0:bigint, custkey:bigint, orderstatus:varchar(1), totalprice:double, orderdate:date, orderpriority:varchar(15), clerk:varchar(15), shippriority:int>
                                                                                                                                                                                                   >
 Fragment 3 [SOURCE]                                                                                                                                                                               >
     Output layout: [orderkey, partkey, suppkey, linenumber, quantity, extendedprice, discount, tax, returnflag, linestatus, shipdate, commitdate, receiptdate, shipinstruct, shipmode, comment, $h>
     Output partitioning: HASH [orderkey][$hashvalue_187]                                                                                                                                          >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                                                                 >
     - ScanProject[PlanNodeId 0,628][table = TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch, tableName=lineitem, analyzePartitionValues=Optional.empty}', layout>
             Estimates: {source: CostBasedSourceInfo, rows: 60175 (9.29MB), cpu: 9197910.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 60175 (9.29MB), cpu: 18937395.00, mem>
             $hashvalue_187 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(orderkey), BIGINT'0')) (1:42)                                                                                  >
             LAYOUT: tpch.lineitem{}                                                                                                                                                               >
             linenumber := linenumber:int:3:REGULAR (1:42)                                                                                                                                         >
             partkey := partkey:bigint:1:REGULAR (1:42)                                                                                                                                            >
             shipdate := shipdate:date:10:REGULAR (1:42)                                                                                                                                           >
             quantity := quantity:double:4:REGULAR (1:42)                                                                                                                                          >
             receiptdate := receiptdate:date:12:REGULAR (1:42)                                                                                                                                     >
             orderkey := orderkey:bigint:0:REGULAR (1:42)                                                                                                                                          >
             shipinstruct := shipinstruct:varchar(25):13:REGULAR (1:42)                                                                                                                            >
             returnflag := returnflag:varchar(1):8:REGULAR (1:42)                                                                                                                                  >
             commitdate := commitdate:date:11:REGULAR (1:42)                                                                                                                                       >
             discount := discount:double:6:REGULAR (1:42)                                                                                                                                          >
             shipmode := shipmode:varchar(10):14:REGULAR (1:42)                                                                                                                                    >
             suppkey := suppkey:bigint:2:REGULAR (1:42)                                                                                                                                            >
             tax := tax:double:7:REGULAR (1:42)                                                                                                                                                    >
             extendedprice := extendedprice:double:5:REGULAR (1:42)                                                                                                                                >
             comment := comment:varchar(44):15:REGULAR (1:42)                                                                                                                                      >
             linestatus := linestatus:varchar(1):9:REGULAR (1:42)                                                                                                                                  >
                                                                                                                                                                                                   >
 Fragment 4 [SOURCE]                                                                                                                                                                               >
     Output layout: [orderkey_0, custkey, orderstatus, totalprice, orderdate, orderpriority, clerk, shippriority, comment_1, $hashvalue_190]                                                       >
     Output partitioning: HASH [orderkey_0][$hashvalue_190]                                                                                                                                        >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                                                                 >
     - ScanProject[PlanNodeId 1,629][table = TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch, tableName=orders, analyzePartitionValues=Optional.empty}', layout='>
             Estimates: {source: CostBasedSourceInfo, rows: 15000 (1.99MB), cpu: 1948552.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 15000 (1.99MB), cpu: 4032104.00, memo>
             $hashvalue_190 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(orderkey_0), BIGINT'0')) (1:58)                                                                                >
             LAYOUT: tpch.orders{}                                                                                                                                                                 >
             orderpriority := orderpriority:varchar(15):5:REGULAR (1:58)                                                                                                                           >
             orderstatus := orderstatus:varchar(1):2:REGULAR (1:58)                                                                                                                                >
             shippriority := shippriority:int:7:REGULAR (1:58)                                                                                                                                     >
             totalprice := totalprice:double:3:REGULAR (1:58)                                                                                                                                      >
             orderkey_0 := orderkey:bigint:0:REGULAR (1:58)                                                                                                                                        >
             custkey := custkey:bigint:1:REGULAR (1:58)                                                                                                                                            >
             comment_1 := comment:varchar(79):8:REGULAR (1:58)                                                                                                                                     >
             clerk := clerk:varchar(15):6:REGULAR (1:58)                                                                                                                                           >
             orderdate := orderdate:date:4:REGULAR (1:58)                                                                                                                                          >
                                                                                                                                                                                                   >
 Fragment 5 [SOURCE]                                                                                                                                                                               >
     Output layout: [custkey_6, name, address, nationkey, phone, acctbal, mktsegment, comment_7, field, $hashvalue_194]                                                                            >
     Output partitioning: HASH [custkey_6][$hashvalue_194]                                                                                                                                         >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                                                                 >
     - Unnest[PlanNodeId 7][replicate=custkey_6:bigint, name:varchar(25), address:varchar(40), nationkey:bigint, phone:varchar(15), acctbal:double, mktsegment:varchar(10), comment_7:varchar(117),>
         - ScanProject[PlanNodeId 5,6][table = TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch, tableName=customer, analyzePartitionValues=Optional.empty}', layo>
                 Estimates: {source: CostBasedSourceInfo, rows: 1500 (301.62kB), cpu: 287855.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 1500 (301.62kB), cpu: 665710.00, >
                 expr_11 := [Block: position count: 3; size: 68 bytes]                                                                                                                             >
                 $hashvalue_194 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey_6), BIGINT'0')) (1:114)                                                                            >
                 LAYOUT: tpch.customer{}                                                                                                                                                           >
                 nationkey := nationkey:bigint:3:REGULAR (1:114)                                                                                                                                   >
                 name := name:varchar(25):1:REGULAR (1:114)                                                                                                                                        >
                 custkey_6 := custkey:bigint:0:REGULAR (1:114)                                                                                                                                     >
                 comment_7 := comment:varchar(117):7:REGULAR (1:114)                                                                                                                               >
                 acctbal := acctbal:double:5:REGULAR (1:114)                                                                                                                                       >
                 phone := phone:varchar(15):4:REGULAR (1:114)                                                                                                                                      >
                 mktsegment := mktsegment:varchar(10):6:REGULAR (1:114)                                                                                                                            >
                 address := address:varchar(40):2:REGULAR (1:114)                                                                                                                                  >
                                                                                                                                                                                                   >
                                                                                                                                                                                                   >
(1 row)

jaystarshot · 2023-11-03T02:38:02Z

I think the output estimate of the second join will be unknown in my previous example.
But the sides can be reversed, or it can become a replicated join, if the right side of the join has unknown stats. We have observed this in production. I will try to add a shareable test case.
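To make the scenario above concrete, here is a minimal, self-contained sketch of the idea. It is not Presto's actual code: `ConfidentStatsSketch`, `Stats`, `joinOutput`, and `chooseDistribution` are hypothetical names, the size arithmetic is a placeholder, and treating NaN as "unknown" is an assumption made for illustration. It only shows that once one side of a join has no confident stats, the join's output size stays unknown, and a size-based rule has nothing trustworthy on which to base a replicated join.

```java
// Hypothetical illustration only; none of these types exist in Presto.
final class ConfidentStatsSketch {
    enum Distribution { PARTITIONED, REPLICATED }

    /** A toy stats estimate: a size in bytes plus a flag saying whether it can be trusted. */
    static final class Stats {
        final double outputSizeInBytes; // NaN means "unknown" (assumption for this sketch)
        final boolean confident;

        Stats(double outputSizeInBytes, boolean confident) {
            this.outputSizeInBytes = outputSizeInBytes;
            this.confident = confident;
        }

        static Stats unknown() {
            return new Stats(Double.NaN, false);
        }
    }

    /**
     * Join output estimate. If either input is not confident, return unknown
     * instead of inventing a number. The sum below is placeholder arithmetic,
     * not a real join-size formula.
     */
    static Stats joinOutput(Stats left, Stats right) {
        if (!left.confident || !right.confident) {
            return Stats.unknown();
        }
        return new Stats(left.outputSizeInBytes + right.outputSizeInBytes, true);
    }

    /**
     * Size-based distribution choice. With unknown build-side stats this sketch
     * falls back to PARTITIONED rather than broadcasting a side of unknown size.
     */
    static Distribution chooseDistribution(Stats buildSide, double maxBroadcastBytes) {
        if (!buildSide.confident || Double.isNaN(buildSide.outputSizeInBytes)) {
            return Distribution.PARTITIONED;
        }
        return buildSide.outputSizeInBytes <= maxBroadcastBytes
                ? Distribution.REPLICATED
                : Distribution.PARTITIONED;
    }
}
```

In this sketch, joining a table with known stats against one without yields an unknown estimate, so a second join stacked on top of it also sees an unknown input and is left partitioned.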

jaystarshot · 2024-05-21T02:05:46Z

Closing this since @feilong-liu has a planned improvement over this PR in #22791

jaystarshot requested a review from a team as a code owner October 31, 2023 06:38

jaystarshot requested a review from presto-oss October 31, 2023 06:38

jaystarshot force-pushed the presto-confident-stats branch from 5d568aa to 9c9745c Compare October 31, 2023 06:44

jaystarshot changed the title ~~[WIP] Avoid computation of output size if underlying stats are not confident~~ Avoid computation of output size if underlying stats are not confident Oct 31, 2023

jaystarshot force-pushed the presto-confident-stats branch from 9c9745c to c687df9 Compare October 31, 2023 06:49

jaystarshot changed the title ~~Avoid computation of output size if underlying stats are not confident~~ [DO NOT REVIEW] Avoid computation of output size if underlying stats are not confident Oct 31, 2023

jaystarshot marked this pull request as draft October 31, 2023 06:53

jaystarshot force-pushed the presto-confident-stats branch 3 times, most recently from 05d509d to ec7eeb5 Compare October 31, 2023 07:01

jaystarshot changed the title ~~[DO NOT REVIEW] Avoid computation of output size if underlying stats are not confident~~ Avoid computation of output size if underlying stats are not confident Oct 31, 2023

jaystarshot force-pushed the presto-confident-stats branch from ec7eeb5 to 6d30fa4 Compare October 31, 2023 07:55

jaystarshot marked this pull request as ready for review October 31, 2023 09:19

jaystarshot requested review from mlyublena and pranjalssh October 31, 2023 09:19

mlyublena requested a review from feilong-liu October 31, 2023 19:13

feilong-liu reviewed Nov 1, 2023

jaystarshot mentioned this pull request May 7, 2024: Broadcast join if build estimation is small and from HBO #22681 (Merged)

jaystarshot closed this May 21, 2024
