
Avoid computation of output size if underlying stats are not confident#21280

Closed
jaystarshot wants to merge 1 commit intoprestodb:masterfrom
jaystarshot:presto-confident-stats

Conversation


@jaystarshot jaystarshot commented Oct 31, 2023

Summary: The current stats framework assumes every table has statistics. That is not always true, and when a table has no stats we should not fabricate an output size estimate, because that estimate feeds cost-based rules such as DetermineJoinDistributionType and ReorderJoins.
This change propagates the confidence flag downstream as-is; individual rules can later be changed to tailor how confidence is propagated.
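The propagation described above can be sketched as follows; this is a minimal illustration with hypothetical class and field names, not Presto's actual stats classes:

```java
// Hedged sketch of confidence propagation. Class and field names are
// illustrative only and do not match Presto's real PlanNodeStatsEstimate.
public class StatsConfidenceSketch {
    public static final class Estimate {
        public final double rowCount;   // NaN means unknown
        public final boolean confident; // false when underlying table stats are missing

        public Estimate(double rowCount, boolean confident) {
            this.rowCount = rowCount;
            this.confident = confident;
        }
    }

    // A derived estimate (e.g. a join) is only as confident as its least
    // confident input: confidence is propagated as-is, ANDed across sources.
    public static Estimate join(Estimate left, Estimate right, double selectivity) {
        return new Estimate(left.rowCount * right.rowCount * selectivity,
                left.confident && right.confident);
    }
}
```

With this shape, a cost-based rule can check the `confident` flag before trusting the row count.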

Test Plan:
Existing unit tests +
Shadow test

Description

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Fix output size estimation for plans with empty table statistics


@jaystarshot jaystarshot requested a review from a team as a code owner October 31, 2023 06:38
@jaystarshot (Member, Author):

Main change

@jaystarshot jaystarshot Oct 31, 2023

Propagated up

@jaystarshot jaystarshot force-pushed the presto-confident-stats branch from 5d568aa to 9c9745c Compare October 31, 2023 06:44
@jaystarshot jaystarshot changed the title [WIP] Avoid computation of output size if underlying stats are not confident Avoid computation of output size if underlying stats are not confident Oct 31, 2023
@jaystarshot jaystarshot force-pushed the presto-confident-stats branch from 9c9745c to c687df9 Compare October 31, 2023 06:49
@jaystarshot jaystarshot changed the title Avoid computation of output size if underlying stats are not confident [DO NOT REVIEW] Avoid computation of output size if underlying stats are not confident Oct 31, 2023
@jaystarshot jaystarshot marked this pull request as draft October 31, 2023 06:53
@jaystarshot jaystarshot force-pushed the presto-confident-stats branch 3 times, most recently from 05d509d to ec7eeb5 Compare October 31, 2023 07:01
@jaystarshot jaystarshot changed the title [DO NOT REVIEW] Avoid computation of output size if underlying stats are not confident Avoid computation of output size if underlying stats are not confident Oct 31, 2023
Summary: The open source stats framework expects all tables to have stats, but at Uber we don't have them for all tables yet.

Test Plan:
Existing unit tests +
Shadow test - https://querybuilder.uberinternal.com/r/Bif6wGlLL/run/lj0NW5XEl

Reviewers: #ldap_presto-core, hitarth

Reviewed By: #ldap_presto-core, hitarth

Subscribers: hitarth, O4263 subscribe to presto changes

JIRA Issues: PRESTO-5669

Differential Revision: https://code.uberinternal.com/D11559869
@jaystarshot jaystarshot force-pushed the presto-confident-stats branch from ec7eeb5 to 6d30fa4 Compare October 31, 2023 07:55
@jaystarshot jaystarshot marked this pull request as ready for review October 31, 2023 09:19
Constraint<ColumnHandle> constraint = new Constraint<>(node.getCurrentConstraint());

TableStatistics tableStatistics = metadata.getTableStatistics(session, node.getTable(), ImmutableList.copyOf(node.getAssignments().values()), constraint);
if (tableStatistics.getRowCount().isUnknown()) {
@jaystarshot (Member, Author):

If no variable statistics, we could set the confidence to be false but not sure that would be effective
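The unknown-row-count branch above could be sketched like this; names are simplified placeholders, not Presto's real TableStatistics/PlanNodeStatsEstimate API:

```java
// Illustrative sketch of the TableScan branch: when the connector reports
// no row count, return NaN and mark the estimate unconfident instead of
// fabricating an output size. Simplified, not Presto's actual classes.
public class TableScanStatsSketch {
    public static double estimateRowCount(Double connectorRowCount) {
        // null models TableStatistics.getRowCount().isUnknown()
        return connectorRowCount == null ? Double.NaN : connectorRowCount;
    }

    public static boolean isConfident(double rowCount) {
        return !Double.isNaN(rowCount);
    }
}
```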

"subgraph cluster_1 {\n" +
"label = \"SOURCE\"\n" +
"plannode_1[label=\"{TableScan | [TableHandle \\{connectorId='connector_id', connectorHandle='com.facebook.presto.testing.TestingMetadata$TestingTableHandle@1af56f7', layout='Optional.empty'\\}]|Estimates: \\{rows: ? (0B), cpu: ?, memory: ?, network: ?\\}\n" +
"plannode_1[label=\"{TableScan | [TableHandle \\{connectorId='connector_id', connectorHandle='com.facebook.presto.testing.TestingMetadata$TestingTableHandle@1af56f7', layout='Optional.empty'\\}]|Estimates: \\{rows: ? (?), cpu: ?, memory: ?, network: ?\\}\n" +
@jaystarshot (Member, Author):

Good change

A contributor commented:

In this case, the size is 0B because the number of output variables is 0, i.e. there is no output at all, and outputting 0B sounds better here.

@mlyublena mlyublena requested a review from feilong-liu October 31, 2023 19:13
@feilong-liu feilong-liu (Contributor) left a comment:

My understanding is that this PR addresses the case where the input table statistics are unknown, so we should not emit a valid output size estimate. However, when the input statistics are unknown, is it the case that all downstream plan nodes also have unknown estimates in current production?

The high-level question I have: in which cases would we have a valid output size estimate when the input table statistics are unknown?

Comment on lines +43 to +46
if (inputTableStatistics.stream().anyMatch(stat -> stat.getRowCount().isUnknown())) {
// return most recent run stats if input table stats were not found
return lastRunsStatistics.get(lastRunsStatistics.size() - 1).getPlanStatistics();
}
A contributor commented:

In the current logic, when the input statistics for some table are unknown, it will only match a history entry that also has unknown statistics for the same table (and similar statistics for the other tables); otherwise it will not match. This makes HBO always return the latest run, even when there is a history entry with a closer match, i.e. one that is unknown for the same table and has similar statistics for the other tables.
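The closer-match idea could be sketched as scoring history entries instead of taking the latest run; this is purely illustrative and not the actual HBO matching code:

```java
// Sketch: score each historical run by how close its known input row
// counts are to the current ones. Unknown (NaN) entries only match other
// unknowns. The caller would pick the run with the smallest distance.
public class HistoryMatchSketch {
    public static double distance(double[] current, double[] historical) {
        double total = 0;
        for (int i = 0; i < current.length; i++) {
            boolean curUnknown = Double.isNaN(current[i]);
            boolean histUnknown = Double.isNaN(historical[i]);
            if (curUnknown != histUnknown) {
                return Double.POSITIVE_INFINITY; // unknowns must line up
            }
            if (!curUnknown) {
                total += Math.abs(current[i] - historical[i]);
            }
        }
        return total;
    }
}
```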

@jaystarshot (Member, Author):

I see, that makes sense. I can remove this.

return planNodeStatsEstimate;
}
boolean confident = sourceStats.getStats(node.getSources().get(0)).isConfident();
for (PlanNode source : node.getSources()) {
A contributor commented:

Confidence should not depend only on the source inputs. For example, EnforceSingleRowNode should always be confident that its output is a single row. We need to exclude such rules from this check.
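The per-node exception could be sketched as an overridable method, roughly along the lines later suggested in this thread; names are illustrative, not Presto's actual rule classes:

```java
// Sketch: most nodes inherit confidence from their sources, but a rule for
// a node like EnforceSingleRow knows the output is exactly one row and can
// stay confident regardless. Illustrative only.
public class NodeStatsRuleSketch {
    // Default: confident only if every source is confident.
    public boolean outputConfidence(boolean[] sourceConfidence) {
        for (boolean c : sourceConfidence) {
            if (!c) {
                return false;
            }
        }
        return true;
    }

    public static final class EnforceSingleRowRule extends NodeStatsRuleSketch {
        @Override
        public boolean outputConfidence(boolean[] sourceConfidence) {
            return true; // output is exactly one row by definition
        }
    }
}
```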

@jaystarshot (Member, Author):

I see; it may be better to add this as an abstract method and override it in those rules.

"subgraph cluster_1 {\n" +
"label = \"SOURCE\"\n" +
"plannode_1[label=\"{TableScan | [TableHandle \\{connectorId='connector_id', connectorHandle='com.facebook.presto.testing.TestingMetadata$TestingTableHandle@1af56f7', layout='Optional.empty'\\}]|Estimates: \\{rows: ? (0B), cpu: ?, memory: ?, network: ?\\}\n" +
"plannode_1[label=\"{TableScan | [TableHandle \\{connectorId='connector_id', connectorHandle='com.facebook.presto.testing.TestingMetadata$TestingTableHandle@1af56f7', layout='Optional.empty'\\}]|Estimates: \\{rows: ? (?), cpu: ?, memory: ?, network: ?\\}\n" +
A contributor commented:

In this case, the size is 0B because the number of output variables is 0, i.e. there is no output at all, and outputting 0B sounds better here.

@jaystarshot

jaystarshot commented Nov 2, 2023

However, when we have input statistics to be unknown, is it that the case that all downstream plan nodes will have unknown estimates as well in current production?

Yes according to the current implementation. Unless some downstream plan has historical stats. (StatsProvider will provide these which are always confident -here)

in which cases, we will have valid output size estimate given the input table statistics is unknown?

I think only in cases where intermediate plan nodes have historical statistics

@feilong-liu

If the output size is from historical statistics, then downstream plan nodes can use these statistics to estimate their output size, and we do not need the change here?

@jaystarshot

jaystarshot commented Nov 2, 2023


Indeed, that's true. However, this solution addresses situations in which historical statistics are accessible for one side of the upstream plan, while even table statistics are unavailable for the other side. Currently, we resort to random estimations in such cases. The purpose of this pull request or discussion is to rectify this issue.

@feilong-liu


Can you give an example of this? I just do not understand why this will happen and giving an example will be very helpful.

@jaystarshot

jaystarshot commented Nov 2, 2023


Sure

Project1
  Join2
    Join1
      table1
      table2
    table3

Let's say we have historical stats for Join1 (or table stats for table1 and table2), and we have no stats at all for table3.
Now, when computing the output size of Join2 or Project1, we would produce some estimate for Join2 that shouldn't be confident and hence should be NaN.
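Tracing the example bottom-up, NaN naturally poisons every estimate above table3; a toy sketch (hypothetical, not Presto code):

```java
// Toy bottom-up trace of the plan above. Multiplying by NaN yields NaN, so
// once table3 (no stats) joins in, Join2 and Project1 end up with NaN
// estimates rather than fabricated sizes. Illustrative only.
public class PlanTraceSketch {
    public static double outputRows(double leftRows, double rightRows) {
        return leftRows * rightRows; // selectivity omitted for brevity
    }
}
```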

@feilong-liu

Can you give a working example to reproduce what you described above?
I tried a similar (though not identical) query; below, in Fragment 1, the probe side has valid estimates, the build side is unknown, and the estimate for the join is unknown.

presto:tpch> explain (type distributed) select * from lineitem l join orders o on l.orderkey = o.orderkey join (select * from customer cross join unnest(array[1, 2, 3]) t(idx)) t1 on o.custkey=t1.custkey;
                                                                                                                                                                                                   >
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------->
 Fragment 0 [SINGLE]                                                                                                                                                                               >
     Output layout: [orderkey, partkey, suppkey, linenumber, quantity, extendedprice, discount, tax, returnflag, linestatus, shipdate, commitdate, receiptdate, shipinstruct, shipmode, comment, or>
     Output partitioning: SINGLE []                                                                                                                                                                >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                                                                 >
     - Output[PlanNodeId 19][orderkey, partkey, suppkey, linenumber, quantity, extendedprice, discount, tax, returnflag, linestatus, shipdate, commitdate, receiptdate, shipinstruct, shipmode, com>
             comment := comment_1 (1:28)                                                                                                                                                           >
             comment := comment_7 (1:28)                                                                                                                                                           >
             idx := field (1:28)                                                                                                                                                                   >
         - RemoteSource[1] => [orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extendedprice:double, discount:double, tax:double, returnflag:varchar(1), line>
                                                                                                                                                                                                   >
 Fragment 1 [HASH]                                                                                                                                                                                 >
     Output layout: [orderkey, partkey, suppkey, linenumber, quantity, extendedprice, discount, tax, returnflag, linestatus, shipdate, commitdate, receiptdate, shipinstruct, shipmode, comment, cu>
     Output partitioning: SINGLE []                                                                                                                                                                >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                                                                 >
     - InnerJoin[PlanNodeId 14][("custkey" = "custkey_6")][$hashvalue, $hashvalue_192] => [orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extendedprice:doub>
             Distribution: PARTITIONED                                                                                                                                                             >
         - RemoteSource[2] => [orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extendedprice:double, discount:double, tax:double, returnflag:varchar(1), line>
         - LocalExchange[PlanNodeId 571][HASH][$hashvalue_192] (custkey_6) => [custkey_6:bigint, name:varchar(25), address:varchar(40), nationkey:bigint, phone:varchar(15), acctbal:double, mktseg>
             - RemoteSource[5] => [custkey_6:bigint, name:varchar(25), address:varchar(40), nationkey:bigint, phone:varchar(15), acctbal:double, mktsegment:varchar(10), comment_7:varchar(117), fi>
                                                                                                                                                                                                   >
 Fragment 2 [HASH]                                                                                                                                                                                 >
     Output layout: [orderkey, partkey, suppkey, linenumber, quantity, extendedprice, discount, tax, returnflag, linestatus, shipdate, commitdate, receiptdate, shipinstruct, shipmode, comment, cu>
     Output partitioning: HASH [custkey][$hashvalue_191]                                                                                                                                           >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                                                                 >
     - Project[PlanNodeId 630][projectLocality = LOCAL] => [orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extendedprice:double, discount:double, tax:double>
             Estimates: {source: CostBasedSourceInfo, rows: 58490 (15.77MB), cpu: 81249792.98, memory: 2083552.00, network: 11823037.00}                                                           >
             $hashvalue_191 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey), BIGINT'0')) (1:58)                                                                                   >
         - InnerJoin[PlanNodeId 458][("orderkey" = "orderkey_0")][$hashvalue_186, $hashvalue_188] => [orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extende>
                 Estimates: {source: CostBasedSourceInfo, rows: 58490 (15.77MB), cpu: 64711251.85, memory: 2083552.00, network: 11823037.00}                                                       >
                 Distribution: PARTITIONED                                                                                                                                                         >
             - RemoteSource[3] => [orderkey:bigint, partkey:bigint, suppkey:bigint, linenumber:integer, quantity:double, extendedprice:double, discount:double, tax:double, returnflag:varchar(1), >
             - LocalExchange[PlanNodeId 570][HASH][$hashvalue_188] (orderkey_0) => [orderkey_0:bigint, custkey:bigint, orderstatus:varchar(1), totalprice:double, orderdate:date, orderpriority:var>
                     Estimates: {source: CostBasedSourceInfo, rows: 15000 (6.98MB), cpu: 8199208.00, memory: 0.00, network: 2083552.00}                                                            >
                 - RemoteSource[4] => [orderkey_0:bigint, custkey:bigint, orderstatus:varchar(1), totalprice:double, orderdate:date, orderpriority:varchar(15), clerk:varchar(15), shippriority:int>
                                                                                                                                                                                                   >
 Fragment 3 [SOURCE]                                                                                                                                                                               >
     Output layout: [orderkey, partkey, suppkey, linenumber, quantity, extendedprice, discount, tax, returnflag, linestatus, shipdate, commitdate, receiptdate, shipinstruct, shipmode, comment, $h>
     Output partitioning: HASH [orderkey][$hashvalue_187]                                                                                                                                          >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                                                                 >
     - ScanProject[PlanNodeId 0,628][table = TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch, tableName=lineitem, analyzePartitionValues=Optional.empty}', layout>
             Estimates: {source: CostBasedSourceInfo, rows: 60175 (9.29MB), cpu: 9197910.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 60175 (9.29MB), cpu: 18937395.00, mem>
             $hashvalue_187 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(orderkey), BIGINT'0')) (1:42)                                                                                  >
             LAYOUT: tpch.lineitem{}                                                                                                                                                               >
             linenumber := linenumber:int:3:REGULAR (1:42)                                                                                                                                         >
             partkey := partkey:bigint:1:REGULAR (1:42)                                                                                                                                            >
             shipdate := shipdate:date:10:REGULAR (1:42)                                                                                                                                           >
             quantity := quantity:double:4:REGULAR (1:42)                                                                                                                                          >
             receiptdate := receiptdate:date:12:REGULAR (1:42)                                                                                                                                     >
             orderkey := orderkey:bigint:0:REGULAR (1:42)                                                                                                                                          >
             shipinstruct := shipinstruct:varchar(25):13:REGULAR (1:42)                                                                                                                            >
             returnflag := returnflag:varchar(1):8:REGULAR (1:42)                                                                                                                                  >
             commitdate := commitdate:date:11:REGULAR (1:42)                                                                                                                                       >
             discount := discount:double:6:REGULAR (1:42)                                                                                                                                          >
             shipmode := shipmode:varchar(10):14:REGULAR (1:42)                                                                                                                                    >
             suppkey := suppkey:bigint:2:REGULAR (1:42)                                                                                                                                            >
             tax := tax:double:7:REGULAR (1:42)                                                                                                                                                    >
             extendedprice := extendedprice:double:5:REGULAR (1:42)                                                                                                                                >
             comment := comment:varchar(44):15:REGULAR (1:42)                                                                                                                                      >
             linestatus := linestatus:varchar(1):9:REGULAR (1:42)                                                                                                                                  >
                                                                                                                                                                                                   >
 Fragment 4 [SOURCE]                                                                                                                                                                               >
     Output layout: [orderkey_0, custkey, orderstatus, totalprice, orderdate, orderpriority, clerk, shippriority, comment_1, $hashvalue_190]                                                       >
     Output partitioning: HASH [orderkey_0][$hashvalue_190]                                                                                                                                        >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                                                                 >
     - ScanProject[PlanNodeId 1,629][table = TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch, tableName=orders, analyzePartitionValues=Optional.empty}', layout='>
             Estimates: {source: CostBasedSourceInfo, rows: 15000 (1.99MB), cpu: 1948552.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 15000 (1.99MB), cpu: 4032104.00, memo>
             $hashvalue_190 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(orderkey_0), BIGINT'0')) (1:58)                                                                                >
             LAYOUT: tpch.orders{}                                                                                                                                                                 >
             orderpriority := orderpriority:varchar(15):5:REGULAR (1:58)                                                                                                                           >
             orderstatus := orderstatus:varchar(1):2:REGULAR (1:58)                                                                                                                                >
             shippriority := shippriority:int:7:REGULAR (1:58)                                                                                                                                     >
             totalprice := totalprice:double:3:REGULAR (1:58)                                                                                                                                      >
             orderkey_0 := orderkey:bigint:0:REGULAR (1:58)                                                                                                                                        >
             custkey := custkey:bigint:1:REGULAR (1:58)                                                                                                                                            >
             comment_1 := comment:varchar(79):8:REGULAR (1:58)                                                                                                                                     >
             clerk := clerk:varchar(15):6:REGULAR (1:58)                                                                                                                                           >
             orderdate := orderdate:date:4:REGULAR (1:58)                                                                                                                                          >
                                                                                                                                                                                                   >
 Fragment 5 [SOURCE]                                                                                                                                                                               >
     Output layout: [custkey_6, name, address, nationkey, phone, acctbal, mktsegment, comment_7, field, $hashvalue_194]                                                                            >
     Output partitioning: HASH [custkey_6][$hashvalue_194]                                                                                                                                         >
     Stage Execution Strategy: UNGROUPED_EXECUTION                                                                                                                                                 >
     - Unnest[PlanNodeId 7][replicate=custkey_6:bigint, name:varchar(25), address:varchar(40), nationkey:bigint, phone:varchar(15), acctbal:double, mktsegment:varchar(10), comment_7:varchar(117),>
         - ScanProject[PlanNodeId 5,6][table = TableHandle {connectorId='hive', connectorHandle='HiveTableHandle{schemaName=tpch, tableName=customer, analyzePartitionValues=Optional.empty}', layo>
                 Estimates: {source: CostBasedSourceInfo, rows: 1500 (301.62kB), cpu: 287855.00, memory: 0.00, network: 0.00}/{source: CostBasedSourceInfo, rows: 1500 (301.62kB), cpu: 665710.00, >
                 expr_11 := [Block: position count: 3; size: 68 bytes]                                                                                                                             >
                 $hashvalue_194 := combine_hash(BIGINT'0', COALESCE($operator$hash_code(custkey_6), BIGINT'0')) (1:114)                                                                            >
                 LAYOUT: tpch.customer{}                                                                                                                                                           >
                 nationkey := nationkey:bigint:3:REGULAR (1:114)                                                                                                                                   >
                 name := name:varchar(25):1:REGULAR (1:114)                                                                                                                                        >
                 custkey_6 := custkey:bigint:0:REGULAR (1:114)                                                                                                                                     >
                 comment_7 := comment:varchar(117):7:REGULAR (1:114)                                                                                                                               >
                 acctbal := acctbal:double:5:REGULAR (1:114)                                                                                                                                       >
                 phone := phone:varchar(15):4:REGULAR (1:114)                                                                                                                                      >
                 mktsegment := mktsegment:varchar(10):6:REGULAR (1:114)                                                                                                                            >
                 address := address:varchar(40):2:REGULAR (1:114)                                                                                                                                  >
                                                                                                                                                                                                   >
                                                                                                                                                                                                   >
(1 row)

@jaystarshot
Member Author

I think the output estimation of the second join will be unknown in my previous example.
But the sides can be reversed, or the join can become a replicated join, if the right side of the join has unknown stats. We observed this in production. I will try to add a shareable test case.
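To illustrate the idea behind this PR, here is a minimal, hypothetical sketch (these are not Presto's actual classes or method signatures): a confidence flag is carried alongside each stats estimate, confidence is propagated downstream through joins, and a cost-based rule like `DetermineJoinDistributionType` falls back to a safe default instead of acting on a fabricated output-size estimate.

```java
// Hypothetical sketch only -- stand-ins for PlanNodeStatsEstimate and
// DetermineJoinDistributionType, not the real Presto classes.
public class StatsConfidenceSketch
{
    // Minimal stand-in for a stats estimate with a confidence flag.
    record StatsEstimate(double outputSizeInBytes, boolean confident)
    {
        static StatsEstimate unknown()
        {
            return new StatsEstimate(Double.NaN, false);
        }
    }

    // Combine stats of two join inputs: the result is confident only
    // if both inputs are confident (confidence propagates downstream as-is).
    static StatsEstimate joinOutput(StatsEstimate left, StatsEstimate right)
    {
        if (!left.confident() || !right.confident()) {
            return StatsEstimate.unknown();
        }
        return new StatsEstimate(left.outputSizeInBytes() + right.outputSizeInBytes(), true);
    }

    // Cost-based decision: pick REPLICATED only when the build-side
    // estimate is trustworthy; otherwise use the conservative default,
    // avoiding a broadcast of a table whose size is really unknown.
    static String chooseDistribution(StatsEstimate buildSide, double broadcastLimitBytes)
    {
        if (!buildSide.confident()) {
            return "PARTITIONED";
        }
        return buildSide.outputSizeInBytes() <= broadcastLimitBytes ? "REPLICATED" : "PARTITIONED";
    }

    public static void main(String[] args)
    {
        StatsEstimate customer = new StatsEstimate(301_620, true); // has stats
        StatsEstimate noStats = StatsEstimate.unknown();           // no table stats

        System.out.println(chooseDistribution(customer, 1_000_000));
        System.out.println(chooseDistribution(joinOutput(customer, noStats), 1_000_000));
    }
}
```

With confident stats the small build side is broadcast; once an unknown-stats input is joined in, the combined estimate is no longer confident and the rule keeps the partitioned default.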

@jaystarshot
Member Author

Closing this, since @feilong-liu has planned an improvement over this PR in #22791.
