Increase dynamic filter limits for fault tolerant execution #16875
arhimondr wants to merge 1 commit into trinodb:master
In fault-tolerant execution, dynamic filters are collected before the shuffle, which results in a higher number of distinct values per driver / operator. Increasing the limits is safe because the memory used by dynamic filters is tracked.
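To illustrate why a per-driver limit matters here, below is a minimal sketch, assuming a collector that abandons the filter once it has seen more distinct values than allowed. `BoundedDistinctCollector` and its methods are hypothetical names used only for illustration; this is not Trino's actual operator code.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration of a per-driver distinct-values limit.
// Not Trino's implementation; names and behavior are assumptions.
final class BoundedDistinctCollector
{
    private final int maxDistinctValues;
    // Set to null once the limit is exceeded, meaning "no usable filter".
    private Set<Long> values = new HashSet<>();

    BoundedDistinctCollector(int maxDistinctValues)
    {
        this.maxDistinctValues = maxDistinctValues;
    }

    void add(long value)
    {
        if (values == null) {
            return; // limit already exceeded, keep ignoring input
        }
        values.add(value);
        if (values.size() > maxDistinctValues) {
            values = null; // too many distinct values: give up on the filter
        }
    }

    // Empty when the limit was exceeded, so the probe side cannot be filtered.
    Optional<Set<Long>> result()
    {
        return Optional.ofNullable(values);
    }
}
```

A driver that sees only one partition's keys may stay under the limit, while a driver collecting before the shuffle sees keys from every partition and is more likely to exceed the same limit and produce no filter.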
Do we have benchmark results showing an improvement?
@raunaqmorarka This problem was discovered when running TPC-DS benchmarks on a 10TB partitioned schema. I noticed that CPU usage is much higher with FTE than with streaming (+20-25%). When I looked into it further, I realized that dynamic filters are very often not available in FTE. After increasing the limits, CPU usage went down to a level close to streaming. Here's a detailed comparison:
    public void applyFaultTolerantExecutionDefaults()
    {
        smallPartitionedMaxDistinctValuesPerDriver = 100_000;
        smallPartitionedMaxSizePerDriver = DataSize.of(100, KILOBYTE);
Why not increase *RangeRowLimitPerDriver as well?
That limit could be kept at 2x the distinct values limit.
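The 2x suggestion could look roughly like the sketch below. This is only an illustration: `DynamicFilterLimitsSketch` is a hypothetical stand-in for the config class in the diff, and the field name `smallPartitionedRangeRowLimitPerDriver` is an assumption derived from the `*RangeRowLimitPerDriver` pattern mentioned above. Only the `100_000` value comes from the diff.

```java
// Hypothetical, simplified stand-in for the config class touched by this PR.
final class DynamicFilterLimitsSketch
{
    int smallPartitionedMaxDistinctValuesPerDriver;
    // Assumed field name, following the *RangeRowLimitPerDriver pattern.
    int smallPartitionedRangeRowLimitPerDriver;

    void applyFaultTolerantExecutionDefaults()
    {
        // Value taken from the diff under review.
        smallPartitionedMaxDistinctValuesPerDriver = 100_000;
        // Reviewer's suggestion: keep the range row limit at 2x the distinct values limit.
        smallPartitionedRangeRowLimitPerDriver = 2 * smallPartitionedMaxDistinctValuesPerDriver;
    }
}
```

Deriving the range row limit from the distinct values limit keeps the two in step if the distinct values limit is tuned again later.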
The current limits were specifically tuned to get the best results at the 1TB partitioned scale. The streaming mode would probably also improve at the 10TB scale if we drastically raised the limits.
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua
@arhimondr @raunaqmorarka can we close this one given #17130?
We need to re-run the FTE sf10k benchmark to find out if the increased limits are sufficient.
@raunaqmorarka Working on it
@raunaqmorarka I reran TPC-DS 10000 and I still see queries that would benefit from higher limits. Opened a new PR: #17831
Additional context and related issues
#16104
#16110
Release notes
(X) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: