[Iceberg] Enable affinity scheduling on file sections #24598
Branch force-pushed from ac6fce7 to 6dfb1f2.
Branch force-pushed from bc804bc to 484408b.
steveburnett left a comment:
LGTM! (docs)
Pull branch, local doc build, looks good. Thanks!
yingsu00 left a comment:
Mostly looks good. One minor correction to "splits not being scheduled to enough nodes": the splits were not necessarily scheduled to too few nodes, but in general the load had more skew than with Hive, even when the splits were scheduled to the same number of nodes. Scheduling to fewer nodes happened non-deterministically when I ran the queries multiple times. More than half the time the splits were scheduled to all nodes, but even then the load was not as balanced as with Hive.
...rg/src/main/java/com/facebook/presto/iceberg/equalitydeletes/EqualityDeletesSplitSource.java
Branch force-pushed from 484408b to fa2b10e.
Thanks for the feedback @yingsu00. I updated the PR description to be a bit more accurate.
Branch force-pushed from fa2b10e to 7cf8789.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergSplitSource.java
presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergSplitManager.java
Branch force-pushed from 7cf8789 to d4ae7ad.
hantangwangd left a comment:
Thanks for the change, lgtm. A couple of little nits.
Set to 0 to use the value in each Iceberg table's
``read.split.target-size`` property.
``iceberg.affinity_scheduling_file_section_size``    When the ``node_selection_strategy`` or
``hive.node-selection-strategy`` property is set to ``SOFT_AFFINITY``,
Should the property's name be iceberg.node-selection-strategy?
The way we register the config, I believe it is still hive.node-selection-strategy. The config comes from HiveCommonClientConfig.java, which is bound in HiveCommonModule.java. The injector doesn't register a prefix with the config, so the property keeps the same name as declared in the *Config class, which is hive.node-selection-strategy.
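The effect of binding with or without a prefix can be illustrated with a small, self-contained sketch. It is not airlift's binder: `ConfigName` is a hypothetical stand-in for a config annotation such as `@Config`, `CommonConfig` is a hypothetical stand-in for `HiveCommonClientConfig`, and `propertyNames` models, as an assumption, a binder that joins an optional prefix to the declared name with a dot. Without a prefix, the registered property name is exactly the string declared on the setter.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class ConfigPrefixSketch
{
    // Hypothetical stand-in for a config-binding annotation (not airlift's @Config).
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface ConfigName
    {
        String value();
    }

    // Hypothetical stand-in for a shared config class such as HiveCommonClientConfig.
    static class CommonConfig
    {
        private String nodeSelectionStrategy;

        @ConfigName("hive.node-selection-strategy")
        public CommonConfig setNodeSelectionStrategy(String strategy)
        {
            this.nodeSelectionStrategy = strategy;
            return this;
        }
    }

    // Property names a binder would register; a null prefix leaves the declared name untouched.
    // Joining prefix and name with "." is an assumption of this sketch.
    static List<String> propertyNames(Class<?> configClass, String prefix)
    {
        List<String> names = new ArrayList<>();
        for (Method method : configClass.getDeclaredMethods()) {
            ConfigName annotation = method.getAnnotation(ConfigName.class);
            if (annotation != null) {
                names.add(prefix == null ? annotation.value() : prefix + "." + annotation.value());
            }
        }
        return names;
    }

    public static void main(String[] args)
    {
        System.out.println(propertyNames(CommonConfig.class, null));      // [hive.node-selection-strategy]
        System.out.println(propertyNames(CommonConfig.class, "iceberg")); // [iceberg.hive.node-selection-strategy]
    }
}
```

Under this model, a connector-specific prefix changes the registered name only if the connector's module actually supplies one when binding the shared config class.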
Oh yes, you are right. Perhaps in the future we should consider binding separate prefixes to the configs in presto-hive-common in each lakehouse connector's own Module.
presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergSplit.java
Branch force-pushed from d4ae7ad to 973a860.
Branch force-pushed from 973a860 to 5f8f14e.
Description
This change moves the affinity scheduling file section size configuration from HiveClientConfig and HiveSessionProperties to HiveCommonClientConfig and HiveCommonSessionProperties so that the Iceberg connector can benefit from this scheduling strategy when tables have a small number of files but a large number of splits.
Motivation and Context
On tables with a small number of large files, queries may perform poorly because split scheduling is skewed across nodes. This is more likely when only a few distinct values are hashed to determine the preferred nodes, for example when every split of a file is identified by the same file path. By deriving the identifier used to select the preferred nodes from the file section a split belongs to, rather than from the file alone, we increase the number of distinct identifiers and therefore the probability that splits are scheduled evenly across the cluster.
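A minimal, self-contained sketch of why the identifier matters follows; it is not the Presto implementation. It assumes, hypothetically, that a split's affinity key is either the file path alone or `path + "#" + (splitStart / sectionSize)`, and that the preferred node is chosen by hashing the key modulo the node count (a simplification of whatever node-selection hashing Presto actually performs). The file path, sizes, and node count are made-up example values.

```java
import java.util.Map;
import java.util.TreeMap;

class AffinityKeySketch
{
    // Hypothetical node choice: non-negative hash of the key modulo the node count.
    // Presto's real node selection may hash differently; this only models the idea.
    static int preferredNode(String key, int nodeCount)
    {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Hypothetical affinity key: the file path alone, or the path plus the index of
    // the file section that the split starts in.
    static String affinityKey(String path, long splitStart, long sectionSize, boolean useSections)
    {
        return useSections ? path + "#" + (splitStart / sectionSize) : path;
    }

    // Walks the splits of one file and counts how many land on each node.
    static Map<Integer, Integer> distribution(String path, long fileSize, long splitSize,
            long sectionSize, int nodeCount, boolean useSections)
    {
        Map<Integer, Integer> splitsPerNode = new TreeMap<>();
        for (long start = 0; start < fileSize; start += splitSize) {
            String key = affinityKey(path, start, sectionSize, useSections);
            splitsPerNode.merge(preferredNode(key, nodeCount), 1, Integer::sum);
        }
        return splitsPerNode;
    }

    public static void main(String[] args)
    {
        long mb = 1024L * 1024L;
        String path = "s3://bucket/table/data/file-0.parquet"; // hypothetical file
        // One 4096 MB file, 64 MB splits (64 splits), 256 MB sections (16 sections), 8 nodes.
        System.out.println("path only:     " + distribution(path, 4096 * mb, 64 * mb, 256 * mb, 8, false));
        System.out.println("file sections: " + distribution(path, 4096 * mb, 64 * mb, 256 * mb, 8, true));
    }
}
```

With the path-only key, all 64 splits of the single file map to one node; with section keys, the 16 distinct keys spread the same splits over several nodes.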
Impact
Test Plan
Added a unit test to verify that the number of unique identifiers changes as the file section size is scaled up.
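The unit test itself is not reproduced here. As a rough, hypothetical illustration of the property it checks, the sketch below assumes the identifier is the file path plus `start / sectionSize` and counts the distinct identifiers for one 1 GB file split into 64 MB pieces as the section size grows. The class and method names are made up for this example.

```java
import java.util.HashSet;
import java.util.Set;

class FileSectionIdentifierCount
{
    // Hypothetical identifier: file path plus the index of the file section the split starts in.
    static Set<String> identifiers(String path, long fileSize, long splitSize, long sectionSize)
    {
        Set<String> ids = new HashSet<>();
        for (long start = 0; start < fileSize; start += splitSize) {
            ids.add(path + "#" + (start / sectionSize));
        }
        return ids;
    }

    public static void main(String[] args)
    {
        long mb = 1024L * 1024L;
        long fileSize = 1024 * mb; // one 1 GB file
        long splitSize = 64 * mb;  // 16 splits
        for (long sectionSize : new long[] {64 * mb, 128 * mb, 256 * mb, 1024 * mb}) {
            int count = identifiers("file.parquet", fileSize, splitSize, sectionSize).size();
            // Prints 16, 8, 4, and 1 identifiers respectively.
            System.out.println((sectionSize / mb) + " MB sections -> " + count + " identifiers");
        }
    }
}
```

Larger sections group more splits under one identifier, so the count of distinct identifiers, and with it the number of distinct hash inputs for node selection, shrinks as the section size grows.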
Contributor checklist
Release Notes