feat(optimizer): Add exchange on table scan when number of files to be scanned is small (#26941)
## Reviewer's Guide

Adds file-count-aware table statistics and new optimizer/session configuration to optionally insert a shuffle (round-robin exchange) above table scans when tables have few files, plus corresponding tests and plumbing across the stats, cost, and planning code.

### Sequence diagram for planning a table scan shuffle based on file count

```mermaid
sequenceDiagram
actor Planner
participant Session
participant SystemSessionProperties
participant AddExchanges
participant Metadata
participant HiveStatsProvider as MetastoreHiveStatisticsProvider
participant TableStatistics
Planner->>AddExchanges: planTableScan(tableScanNode, predicate)
AddExchanges->>Session: getProperty(TABLE_SCAN_SHUFFLE_STRATEGY)
Session-->>AddExchanges: shuffleStrategy
AddExchanges->>SystemSessionProperties: getTableScanShuffleStrategy(session)
SystemSessionProperties-->>AddExchanges: shuffleStrategy
alt shuffleStrategy == DISABLED
AddExchanges-->>Planner: plan without shuffle (existing behavior)
else shuffleStrategy == ALWAYS_ENABLED
AddExchanges->>AddExchanges: roundRobinExchange(REMOTE_STREAMING, tableScanPlan)
AddExchanges-->>Planner: plan with shuffle
else shuffleStrategy == FILE_COUNT_BASED
AddExchanges->>SystemSessionProperties: getTableScanShuffleFileCountThreshold(session)
SystemSessionProperties-->>AddExchanges: fileCountThreshold
AddExchanges->>Metadata: getTableStatistics(session, tableHandle, assignmentsColumns, constraint)
Metadata->>HiveStatsProvider: computeTableStatistics(partitions, statistics)
HiveStatsProvider->>HiveStatsProvider: calculateAverageFileCountPerPartition(statistics)
HiveStatsProvider->>TableStatistics: TableStatistics.Builder.setFileCount(Estimate.of(totalFileCount))
HiveStatsProvider-->>Metadata: TableStatistics
Metadata-->>AddExchanges: TableStatistics
AddExchanges->>TableStatistics: getFileCount()
TableStatistics-->>AddExchanges: fileCountEstimate
alt fileCount known and < fileCountThreshold
AddExchanges->>AddExchanges: roundRobinExchange(REMOTE_STREAMING, tableScanPlan)
else fileCount unknown or >= threshold
AddExchanges->>AddExchanges: keep original table scan plan
end
AddExchanges-->>Planner: final plan
end
```
### Updated class diagram for table statistics, features config, and planner shuffle logic

```mermaid
classDiagram
class TableStatistics {
- Estimate rowCount
- Estimate totalSize
- Estimate fileCount
- Map~ColumnHandle, ColumnStatistics~ columnStatistics
- ConfidenceLevel confidenceLevel
+ static TableStatistics empty()
+ Estimate getRowCount()
+ Estimate getTotalSize()
+ Estimate getFileCount()
+ ConfidenceLevel getConfidence()
+ Map~ColumnHandle, ColumnStatistics~ getColumnStatistics()
+ static TableStatistics.Builder buildFrom(TableStatistics tableStatistics)
}
class TableStatistics_Builder {
- Estimate rowCount
- Estimate totalSize
- Estimate fileCount
- Map~ColumnHandle, ColumnStatistics~ columnStatisticsMap
- ConfidenceLevel confidenceLevel
+ TableStatistics_Builder setRowCount(Estimate rowCount)
+ TableStatistics_Builder setTotalSize(Estimate totalSize)
+ TableStatistics_Builder setFileCount(Estimate fileCount)
+ TableStatistics_Builder setColumnStatistics(ColumnHandle columnHandle, ColumnStatistics columnStatistics)
+ TableStatistics_Builder setColumnStatistics(Map~ColumnHandle, ColumnStatistics~ columnStatistics)
+ TableStatistics_Builder setConfidenceLevel(ConfidenceLevel confidenceLevel)
+ Map~ColumnHandle, ColumnStatistics~ getColumnStatistics()
+ TableStatistics build()
}
TableStatistics *-- TableStatistics_Builder : built_via
class FeaturesConfig {
- boolean pushdownSubfieldForMapFunctions
- long maxSerializableObjectSize
- boolean utilizeUniquePropertyInQueryPlanning
- int tableScanShuffleFileCountThreshold
- ShuffleForTableScanStrategy tableScanShuffleStrategy
+ long getMaxSerializableObjectSize()
+ int getTableScanShuffleFileCountThreshold()
+ FeaturesConfig setTableScanShuffleFileCountThreshold(int tableScanShuffleFileCountThreshold)
+ ShuffleForTableScanStrategy getTableScanShuffleStrategy()
+ FeaturesConfig setTableScanShuffleStrategy(ShuffleForTableScanStrategy tableScanShuffleStrategy)
}
class ShuffleForTableScanStrategy {
<<enumeration>>
DISABLED
ALWAYS_ENABLED
FILE_COUNT_BASED
}
FeaturesConfig --> ShuffleForTableScanStrategy : uses
class SystemSessionProperties {
<<utility>>
+ static String TABLE_SCAN_SHUFFLE_FILE_COUNT_THRESHOLD
+ static String TABLE_SCAN_SHUFFLE_STRATEGY
+ static int getTableScanShuffleFileCountThreshold(Session session)
+ static ShuffleForTableScanStrategy getTableScanShuffleStrategy(Session session)
}
SystemSessionProperties --> ShuffleForTableScanStrategy : returns
class AddExchanges {
- PlanNodeIdAllocator idAllocator
+ PlanWithProperties planTableScan(TableScanNode node, RowExpression predicate, boolean nativeExecution)
}
AddExchanges --> SystemSessionProperties : reads_session_properties
AddExchanges --> TableStatistics : reads_fileCount
class MetastoreHiveStatisticsProvider {
+ static TableStatistics getTableStatistics(Table table, List~Partition~ partitions, Map~Partition, PartitionStatistics~ statistics)
+ static OptionalDouble calculateAverageSizePerPartition(Collection~PartitionStatistics~ statistics)
+ static OptionalDouble calculateAverageFileCountPerPartition(Collection~PartitionStatistics~ statistics)
}
MetastoreHiveStatisticsProvider --> TableStatistics_Builder : builds
class ConnectorFilterStatsCalculatorService {
+ TableStatistics filterTableStatistics(TableStatistics tableStatistics, RowExpression filter)
}
ConnectorFilterStatsCalculatorService --> TableStatistics_Builder : setFileCount
ConnectorFilterStatsCalculatorService --> TableStatistics : consumes
class Estimate {
+ static Estimate unknown()
+ static Estimate of(double value)
+ boolean isUnknown()
+ double getValue()
}
TableStatistics --> Estimate : uses
TableStatistics_Builder --> Estimate : uses
MetastoreHiveStatisticsProvider --> Estimate : uses
ConnectorFilterStatsCalculatorService --> Estimate : uses
```
### Flow diagram for shuffle insertion above the table scan

```mermaid
flowchart TD
A[Start planning table scan] --> B{nativeExecution and containsSystemTableScan?}
B -->|Yes| C[Add gatheringExchange REMOTE_STREAMING]
C --> Z[Return plan]
B -->|No| D{Shuffle strategy from session}
D -->|DISABLED| Z
D -->|ALWAYS_ENABLED| E[Add roundRobinExchange REMOTE_STREAMING]
E --> Z
D -->|FILE_COUNT_BASED| F[Read tableScanShuffleFileCountThreshold]
F --> G[Fetch TableStatistics for table scan]
G --> H{tableStatistics.fileCount is known?}
H -->|No| Z
H -->|Yes| I{fileCount < threshold?}
I -->|No| Z
I -->|Yes| J[Add roundRobinExchange REMOTE_STREAMING]
J --> Z[Return plan]
```
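The strategy check above reduces to a small predicate. The following is a minimal, self-contained sketch of that decision; the class and method names are illustrative, not the actual `AddExchanges` code:

```java
import java.util.OptionalDouble;

public class TableScanShuffleDecision
{
    enum ShuffleForTableScanStrategy { DISABLED, ALWAYS_ENABLED, FILE_COUNT_BASED }

    // Decide whether to insert a round-robin REMOTE_STREAMING exchange above the scan.
    static boolean shouldAddShuffle(ShuffleForTableScanStrategy strategy, OptionalDouble fileCount, int fileCountThreshold)
    {
        switch (strategy) {
            case ALWAYS_ENABLED:
                return true;
            case FILE_COUNT_BASED:
                // Unknown file count or count at/above the threshold: keep the original plan.
                return fileCount.isPresent() && fileCount.getAsDouble() < fileCountThreshold;
            case DISABLED:
            default:
                return false;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(shouldAddShuffle(ShuffleForTableScanStrategy.FILE_COUNT_BASED, OptionalDouble.of(3), 10));
        System.out.println(shouldAddShuffle(ShuffleForTableScanStrategy.FILE_COUNT_BASED, OptionalDouble.empty(), 10));
    }
}
```

Note that unknown statistics fall through to "no shuffle", so the optimization is conservative when file counts are unavailable.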
Hey - I've found 3 issues, and left some high-level feedback:
- In `ConnectorFilterStatsCalculatorService`, `fileCount` is currently propagated unchanged after applying a filter, which may be misleading when filters drastically reduce row counts; consider scaling `fileCount` by the same selectivity factor as `totalSize`, or explicitly documenting the intended semantics.
- The `FILE_COUNT_BASED` branch in `AddExchanges` unconditionally calls `metadata.getTableStatistics` for each table scan when the strategy is enabled; if this proves expensive, consider short-circuiting when the threshold is non-positive, or when existing statistics are already available higher up in planning, rather than always re-fetching them.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In ConnectorFilterStatsCalculatorService, `fileCount` is currently propagated unchanged after applying a filter, which may be misleading when filters drastically reduce row counts; consider scaling `fileCount` by the same selectivity factor as `totalSize` or explicitly documenting the intended semantics.
- The `FILE_COUNT_BASED` branch in AddExchanges unconditionally calls `metadata.getTableStatistics` for each table scan when the strategy is enabled; if this proves expensive, consider short‑circuiting when the threshold is non‑positive or when existing statistics are already available higher up in planning rather than always re-fetching them.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/FeaturesConfig.java:3303-3306` </location>
<code_context>
+
+ @Config("optimizer.table-scan-shuffle-file-count-threshold")
+ @ConfigDescription("File count threshold for adding a shuffle above table scan. When the table has fewer files than this threshold and TABLE_SCAN_SHUFFLE_STRATEGY is FILE_COUNT_BASED, a round-robin shuffle exchange is added above the table scan to redistribute data.")
+ public FeaturesConfig setTableScanShuffleFileCountThreshold(int tableScanShuffleFileCountThreshold)
+ {
+ this.tableScanShuffleFileCountThreshold = tableScanShuffleFileCountThreshold;
+ return this;
+ }
+
</code_context>
<issue_to_address>
**issue (bug_risk):** No validation on table-scan shuffle file-count threshold allows negative values
The setter for `optimizer.table-scan-shuffle-file-count-threshold` accepts a raw `int`, so negative values are allowed. Since a negative threshold will effectively disable the shuffle (almost all file counts will be `>=` a negative number), this is likely unintended. Please add validation (e.g., `@Min(0)`/`@Min(1)` or an explicit check) to reject negative values.
</issue_to_address>
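A minimal sketch of the explicit-check option mentioned above. The class name and default value here are hypothetical; in Presto's `FeaturesConfig`, the idiomatic fix would more likely be a bean-validation annotation such as `@Min(0)` on the getter:

```java
public class ThresholdValidation
{
    private int tableScanShuffleFileCountThreshold = 100; // hypothetical default

    // Explicit range check in the setter, rejecting negative thresholds up front.
    public ThresholdValidation setTableScanShuffleFileCountThreshold(int threshold)
    {
        if (threshold < 0) {
            throw new IllegalArgumentException(
                    "optimizer.table-scan-shuffle-file-count-threshold must be >= 0, got " + threshold);
        }
        this.tableScanShuffleFileCountThreshold = threshold;
        return this;
    }

    public int getTableScanShuffleFileCountThreshold()
    {
        return tableScanShuffleFileCountThreshold;
    }
}
```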
### Comment 2
<location> `presto-main-base/src/test/java/com/facebook/presto/sql/planner/optimizations/TestAddExchangesPlans.java:527-525` </location>
<code_context>
+ assertEquals(calculateAverageFileCountPerPartition(ImmutableList.of(fileCount(0), fileCount(10))), OptionalDouble.of(5));
+ }
+
@Test
public void testCalculateDistinctPartitionKeys()
{
@@ -692,6 +707,102 @@ public void testGetTableStatistics()
expected);
}
+ @Test
+ public void testGetTableStatisticsWithFileCount()
+ {
+ String partitionName1 = "p1=string1/p2=1234";
+ String partitionName2 = "p1=string2/p2=5678";
+ PartitionStatistics statistics1 = PartitionStatistics.builder()
+ .setBasicStatistics(new HiveBasicStatistics(OptionalLong.of(10), OptionalLong.of(1000), OptionalLong.of(5000), OptionalLong.empty()))
+ .build();
</code_context>
<issue_to_address>
**suggestion (testing):** Add planner tests for FILE_COUNT_BASED shuffle strategy to verify threshold and unknown-file-count behavior
The current tests only exercise the DISABLED and ALWAYS_ENABLED strategies; the FILE_COUNT_BASED branch in `AddExchanges.planTableScan` is still untested despite being core to this feature and depending on `TableStatistics.getFileCount()` and `getTableScanShuffleFileCountThreshold(session)`. Please add tests that cover:
- FILE_COUNT_BASED with file count below the threshold → round-robin exchange added above the table scan
- FILE_COUNT_BASED with file count at/above the threshold → no additional round-robin exchange
- FILE_COUNT_BASED with unknown file count (no fileCount in stats) → no additional round-robin exchange
These should assert on the distributed plan shape, similar to the existing planner tests, using controllable table statistics or a mocked metadata/stats provider to set specific fileCount values.
</issue_to_address>
### Comment 3
<location> `presto-hive/src/test/java/com/facebook/presto/hive/statistics/TestMetastoreHiveStatisticsProvider.java:289-287` </location>
<code_context>
+ assertEquals(calculateAverageFileCountPerPartition(ImmutableList.of(fileCount(0), fileCount(10))), OptionalDouble.of(5));
+ }
+
@Test
public void testCalculateDistinctPartitionKeys()
{
@@ -692,6 +707,102 @@ public void testGetTableStatistics()
expected);
}
+ @Test
+ public void testGetTableStatisticsWithFileCount()
+ {
+ String partitionName1 = "p1=string1/p2=1234";
+ String partitionName2 = "p1=string2/p2=5678";
+ PartitionStatistics statistics1 = PartitionStatistics.builder()
+ .setBasicStatistics(new HiveBasicStatistics(OptionalLong.of(10), OptionalLong.of(1000), OptionalLong.of(5000), OptionalLong.empty()))
+ .build();
</code_context>
<issue_to_address>
**suggestion (testing):** Add a test for getTableStatistics when only fileCount is present but row/size stats are missing
The implementation now returns a `TableStatistics` with only `fileCount` set when `calculateAverageRowsPerPartition` is empty (early return with the builder initialized from fileCount). To cover this path, please add a test where:
- Only fileCount is present in `PartitionStatistics` (row count and size unknown for all partitions), and
- `getTableStatistics` is invoked on those partitions.
The test should assert that:
- `tableStatistics.getFileCount()` is a concrete `Estimate` with the expected total (average * partition count), and
- `tableStatistics.getRowCount()` and `tableStatistics.getTotalSize()` are `Estimate.unknown()`.
This ensures fileCount-only statistics are correctly propagated for optimizer use.
Suggested implementation:
```java
// fileCount-only statistics: row count and size unknown
PartitionStatistics statistics1 = PartitionStatistics.builder()
.setBasicStatistics(new HiveBasicStatistics(
OptionalLong.of(10), // fileCount
OptionalLong.empty(), // rowCount unknown
OptionalLong.empty(), // inMemoryDataSize unknown
OptionalLong.empty())) // onDiskDataSize unknown
.build();
PartitionStatistics statistics2 = PartitionStatistics.builder()
.setBasicStatistics(new HiveBasicStatistics(
OptionalLong.of(20), // fileCount
OptionalLong.empty(), // rowCount unknown
OptionalLong.empty(), // inMemoryDataSize unknown
OptionalLong.empty())) // onDiskDataSize unknown
.build();
```
After constructing `statisticsProvider` and `session` in `testGetTableStatisticsWithFileCount()`, add code to:
1. Invoke `getTableStatistics` on the table/partitions under test to get `TableStatistics tableStatistics`.
2. Assert:
- `assertEquals(tableStatistics.getFileCount(), Estimate.of(30.0));`
(since the average fileCount per partition is (10 + 20) / 2 = 15 and there are 2 partitions, total = 30)
- `assertEquals(tableStatistics.getRowCount(), Estimate.unknown());`
- `assertEquals(tableStatistics.getTotalSize(), Estimate.unknown());`
Use the same table handle / partitioning setup and assertion style as the other `testGetTableStatistics*` methods in this file to keep consistency with the existing tests.
</issue_to_address>
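The arithmetic the suggested assertions rely on (total file count = average file count per partition multiplied by the partition count, with unknown counts skipped from the average) can be sketched as a standalone helper. The names below are illustrative, not the real `MetastoreHiveStatisticsProvider` signatures:

```java
import java.util.Collection;
import java.util.OptionalDouble;
import java.util.OptionalLong;

public class FileCountEstimate
{
    // Average over partitions that report a file count; empty if none do.
    static OptionalDouble calculateAverageFileCountPerPartition(Collection<OptionalLong> fileCounts)
    {
        return fileCounts.stream()
                .filter(OptionalLong::isPresent)
                .mapToDouble(OptionalLong::getAsLong)
                .average();
    }

    // Total estimate = average per partition * number of queried partitions.
    static OptionalDouble totalFileCount(Collection<OptionalLong> fileCounts, int partitionCount)
    {
        OptionalDouble average = calculateAverageFileCountPerPartition(fileCounts);
        return average.isPresent()
                ? OptionalDouble.of(average.getAsDouble() * partitionCount)
                : OptionalDouble.empty();
    }
}
```

With file counts of 10 and 20 over two partitions, the average is 15 and the total estimate is 30, matching the suggested assertions.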
Could we just decrease the Hive split size, which would force the table scan task to run on more nodes?
I think there are some limits on that, but the idea is also to do nothing except shuffle right after the scan.
kaikalur left a comment:

Can we trust the stats?
@feilong-liu thanks for this change. There might be an issue with this approach when it comes to UPDATE/DELETE. In the current UPDATE/DELETE implementation, an […]. So, it seems that we need to ensure that an […]
I've tried this approach: even when I tune the split size to have way more splits than the number of workers, it only slightly increases the number of tasks and still does not fully utilize all available workers. For some context, one of the most recent queries we fixed saw latency decrease from 2 minutes to 12 seconds after manually adding a shuffle above the table scan (and we have seen more like it before).
Thanks for checking; I just updated the code to exclude UPDATE and DELETE queries from this change.
For the queries you shared yesterday, it gives the right file count. We can revisit it if we later find gaps in the stats.
A file is a concept in storage. The Presto runtime doesn't understand files; it understands and schedules splits. Under the hood, connectors reason about files and translate them into splits. Instead of introducing a new concept of files into Presto, I'm wondering if the connector could instead internally reason about how those files translate into splits (since it also owns the code to do that translation), and instead we pass in […]
@feilong-liu can you help to explain why this is? I'm just wondering if the proper fix is to investigate and fix that.
We have used this approach super effectively at Meta for over 3 years. Time to make it a session property and expose it to users. Also, I think most of these are opportunistic optimizations, so it's fine to have them available so support teams can use them when needed, and to keep them off by default. At some point we can make them cost-based. Splits are not the issue: if scan/filter/project do a ton of other things, it will still be slow, because that parallelism is independent of the number of splits. We should definitely look at split scheduling separately.
@kaikalur I agree that this is likely an effective optimization; my primary feedback is to use a different metric for reporting from the connector that doesn't rely on introducing a storage-related concept of files (I believe we could rely on […]).
For my understanding, what could be an example where this issue would surface in a manner independent of the parallelism of the splits (for example, a very wide table with a lot of predicates)? Basically, because SOURCE-scheduled stages can have a max parallelism of the entire cluster, vs. the hash partition count, which is typically smaller than the cluster, I'm just confused how the exchange improves things independently of the scheduling fix proposed above. I'm just asking for better understanding.
That has been a mystery to us as well :( Maybe @arhimondr knows better. The symptom we see is too few files -> too few tasks. There are also situations where the number of rows is small, like LLM queries over several thousand rows, so even if you set the split size to be tiny, it will produce too few splits and few tasks. One more situation we see is heavily compressed tables (like 50-200x) with a ton of string fields producing too few splits, because the table appears deceptively small in Hive.
@tdcmeehan Also want to clarify that this is just a latency issue, not a compute-efficiency one.
@tdcmeehan I tried to reduce the […]
This may not be trivial. The split generation logic is complex; just looking at SystemSessionProperties, there are many session properties related to split generation, e.g. […]. I also want to add that this optimization of adding an exchange does not only help with the few-tasks-due-to-few-files problem, but also helps with skew problems. What we observed in some queries is that even when all workers are used, sometimes a few scan workers get much more work than others, and a random shuffle immediately after the scan helps accelerate query execution a lot.
For this part, it's more about which statistics to use to identify problematic table scans. The metric to be used should be 1) accurate enough, 2) easy to calculate, and 3) available all the time.
Please note that the system session properties you mention here all govern split scheduling, not split creation. Split creation is handled entirely inside the connector, which is why I made the earlier point about files: we don't want to begin leaking the concept into core Presto. One way to avoid leaking the concept of files is to abstract this into something more general: basically, how much parallelism do we expect from this table? I chose splits because splits are the abstraction that represents the parallelism. Note that we don't need to get the estimate perfect, or even close; it just needs to be high-signal. Perhaps for your internal connector it's enough to use a very basic heuristic where the estimated split count is just the number of files. Other connectors that have more accurate statistics could improve on this. And to give your users a way to just force it, you could provide a connector session property which forces this metric to indicate low parallelism, regardless of what stats you have available.
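The basic heuristic described above can be sketched as follows, assuming a hypothetical connector-side method name and that an unknown estimate defaults to full cluster parallelism:

```java
import java.util.OptionalLong;

public class ScanParallelismEstimate
{
    // Hypothetical connector-side heuristic: treat the estimated split count
    // (e.g. simply the file count) as the expected scan parallelism,
    // capped by the number of workers that could run leaf tasks.
    static long estimatedScanParallelism(OptionalLong estimatedSplits, long workerCount)
    {
        if (!estimatedSplits.isPresent()) {
            // Unknown: assume the scan can use the whole cluster, so no shuffle is forced.
            return workerCount;
        }
        return Math.max(1, Math.min(estimatedSplits.getAsLong(), workerCount));
    }
}
```

A connector session property forcing a low return value here would give users the "just force it" escape hatch mentioned above.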
@feilong-liu maybe we simply make it cost-based, just like the other optimizations; the cost is something we can provide using the number of files in our connectors, and others can do whatever is appropriate for their connectors.
@tdcmeehan Thanks for the suggestions.
@kaikalur @feilong-liu: Thanks for this optimization. I'm curious what "table-scan-shuffle-parallelism-threshold" value you are using and how you calibrated it.
tdcmeehan left a comment:

Thanks for the updates @feilong-liu. The parallelism-factor abstraction makes a lot of sense. I just have some nits and a question.
It also seems like this will be useful for single-connection connectors like JDBC.
(Several inline review comments on `presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/AddExchanges.java` and `presto-main-base/src/main/java/com/facebook/presto/SystemSessionProperties.java` were marked outdated and resolved.)
```java
private boolean utilizeUniquePropertyInQueryPlanning = true;
private String expressionOptimizerUsedInRowExpressionRewrite = "";
private double tableScanShuffleParallelismThreshold = 0.1;
private ShuffleForTableScanStrategy tableScanShuffleStrategy = ShuffleForTableScanStrategy.DISABLED;
```
To avoid leaving this disabled by default, I'm wondering if you considered always making it cost-based, but setting the default estimate (if not provided) to 1, which would effectively leave the optimization disabled until connectors modify their code to provide a better estimate?
Please also add documentation for these session properties and config properties.
I prefer to keep it more explicit; this is also consistent with other existing optimization configs.
Added documentation in the corresponding rst files.
This scan parallelism is hard to estimate and also connector-specific; for our internal connector, our plan is to use file count as an approximation.
steveburnett left a comment:

Thanks for the doc! It looks great. Just two requests.
steveburnett left a comment:

LGTM! (docs)
Pulled the updated branch and did a new local doc build. Looks good, thanks!
…hange (#26943)

## Description

This is targeting queries like the following:

Scan -> Remote Project -> exchange -> ... -> output

A problem we see is that, when the number of files for the scan is small, the number of tasks for the leaf fragment is small (even after tuning the split size, I still see this problem). What we observe is that a remote project which calls a remote service is affected more by the few-tasks problem. We have observed queries whose latency decreased from 2 minutes to 12 seconds after we manually added a shuffle above the table scan.

To solve the problem, I first added #26941, which adds a shuffle above the table scan. However, this is not enough, as the remote project will still be pushed below the exchange node by the `PushProjectionThroughExchange` rule. So I added options to not push projections down below the exchange in certain cases. There are four options:

* ALWAYS_PUSHDOWN: current behavior
* SKIP_IF_TABLESCAN: if the plan fragment below the exchange is a simple scan fragment, i.e. only has project and filter in addition to the scan node, we may not see much benefit in projection pushdown, hence skip
* SKIP_IF_REMOTE_PROJECTION: do not push down a remote projection
* SKIP_IF_REMOTE_PROJECTION_ON_TABLESCAN: do not push down only when the above two conditions are both satisfied

We may only need `SKIP_IF_REMOTE_PROJECTION_ON_TABLESCAN` for #26941, but I want more flexibility here, hence the other two options.

## Motivation and Context

As in description

## Impact

We have observed queries whose latency decreased from 2 minutes to 12 seconds after we manually added a shuffle above the table scan.

## Test Plan

Unit tests. Also did a local end-to-end test with production queries.
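The four pushdown options amount to a small predicate over two properties of the exchange: whether the projection is remote and whether the fragment below is a simple scan fragment. A hedged sketch of that decision, with illustrative names rather than the actual rule code:

```java
public class ProjectionPushdownPolicy
{
    enum Strategy
    {
        ALWAYS_PUSHDOWN,
        SKIP_IF_TABLESCAN,
        SKIP_IF_REMOTE_PROJECTION,
        SKIP_IF_REMOTE_PROJECTION_ON_TABLESCAN
    }

    // Whether the PushProjectionThroughExchange-style rewrite should fire, per the four options.
    static boolean shouldPushdown(Strategy strategy, boolean remoteProjection, boolean simpleScanFragmentBelow)
    {
        switch (strategy) {
            case SKIP_IF_TABLESCAN:
                return !simpleScanFragmentBelow;
            case SKIP_IF_REMOTE_PROJECTION:
                return !remoteProjection;
            case SKIP_IF_REMOTE_PROJECTION_ON_TABLESCAN:
                return !(remoteProjection && simpleScanFragmentBelow);
            case ALWAYS_PUSHDOWN:
            default:
                return true;
        }
    }
}
```

Under this reading, `SKIP_IF_REMOTE_PROJECTION_ON_TABLESCAN` is the narrowest option: it only blocks pushdown when both conditions hold, which is the case targeted by the shuffle added in #26941.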
## Contributor checklist

- [ ] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with their default values), SQL syntax, functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.
- [ ] If adding new dependencies, verified they have an [OpenSSF Scorecard](https://securityscorecards.dev/#the-checks) score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

## Release Notes

Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below.

```
== RELEASE NOTES ==

General Changes
* Add options to skip projection pushdown through exchange rule
```
## Description

The number of tasks of a leaf fragment is proportional to the number of files in the table. When the number of files in a table is small, we want to add an exchange node on top of the table scan to increase parallelism.

## Motivation and Context

In description

## Impact

We have observed queries whose latency decreased from 2 minutes to 12 seconds after we manually added a shuffle above the table scan.

## Test Plan

Unit tests

## Contributor checklist

## Release Notes

Please follow release notes guidelines and fill in the release notes below.