Allow different scheduling affinity for different sections of a file#22563

Merged
arhimondr merged 1 commit into prestodb:master from arhimondr:preferred-node-block-selection
Apr 23, 2024
@arhimondr
Member

@arhimondr arhimondr commented Apr 18, 2024

Description

Increase the granularity of affinity node selection from per file to per file section.

Motivation and Context

Currently, node affinity is assigned to a file as a whole. When files are large, this can create significant scheduling skew.

Impact

Splitting a file into sections and assigning affinity to each section independently should help mitigate the skew.

Test Plan

Unit test

== RELEASE NOTES ==

Hive Connector Changes
* Improve affinity scheduling granularity from a file to a section of a file by adding a `hive.affinity-scheduling-file-section-size` configuration property and `affinity_scheduling_file_section_size` session property. The default section size is 256MB. :pr:`22563`

@arhimondr arhimondr requested a review from a team as a code owner April 18, 2024 22:34
@arhimondr arhimondr force-pushed the preferred-node-block-selection branch from 8a693cd to 47cb31d Compare April 19, 2024 15:49
Contributor

@elharo elharo left a comment

Is there an integration test that verifies this all works when scheduling by section?

Duration::toString),
dataSizeSessionProperty(
AFFINITY_SCHEDULING_FILE_SECTION_SIZE,
"Size of file section for affinity scheduling. Each section of a give size may be assigned an independent affinity preference.",
Contributor

of a give size --> of a given size
Also, is that true? If there are five different sizes there can be five different affinity preferences? How does that work?

Member Author

It does sound confusing. Maybe you can help me with a concise and more intuitive description.

Let me try to elaborate more on what is going on here.

Currently, preferred nodes for affinity scheduling are assigned based on the file path. However, it is hard to guarantee that files are similar in size and relatively small. Hence we are running into skew.

Here's a real example from one of our production systems:

One of the input tables consists of multiple partitions. Each partition consists of a single file (~30GB in size). Queries usually read a single partition at a time, and multiple queries hitting the same partition are submitted at the same time.

Since there's only a single file per partition, the partition is always read by a single "preferred" node, creating a major bottleneck.

The idea is to increase granularity by assigning different preferred nodes to different sections of a file (256MB by default). So a single file can be cached and served by more than a single node.
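As a rough illustration of the idea (a sketch only, not the code in this PR): the class name, the helper names, the `path#sectionIndex` key format, and the hash-based node choice below are all hypothetical. The point it shows is that once the key used for affinity includes the section index, splits from the same file can map to different preferred nodes.

```java
import java.util.List;

// Hypothetical sketch; names and key format are assumptions, not Presto's API.
class SectionAffinitySketch {
    // Number of sections a file of the given size is divided into (rounded up).
    static long sectionCount(long fileSizeBytes, long sectionSizeBytes) {
        return (fileSizeBytes + sectionSizeBytes - 1) / sectionSizeBytes;
    }

    // Key the affinity is derived from: the file path plus the index of the
    // section that contains the split's starting offset.
    static String affinityKey(String path, long splitStart, long sectionSizeBytes) {
        long sectionIndex = splitStart / sectionSizeBytes;
        return path + "#" + sectionIndex;
    }

    // Assumed node choice for illustration: hash the key onto a fixed node list.
    static String preferredNode(String key, List<String> nodes) {
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }

    public static void main(String[] args) {
        long sectionSize = 256L * 1024 * 1024;       // 256MB default
        long fileSize = 30L * 1024 * 1024 * 1024;    // ~30GB file from the example
        List<String> nodes = List.of("node-1", "node-2", "node-3", "node-4");

        System.out.println("sections: " + sectionCount(fileSize, sectionSize));
        // Print the preferred node for the first few sections of one file.
        for (long start = 0; start < 4 * sectionSize; start += sectionSize) {
            String key = affinityKey("/warehouse/t/part-0", start, sectionSize);
            System.out.println(key + " -> " + preferredNode(key, nodes));
        }
    }
}
```

With the 256MB default, the ~30GB file from the example above splits into 120 sections, each of which can be assigned its own preferred node.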

Contributor

Thanks. I suspect right here you should simply delete "Each section of a give size may be assigned an independent affinity preference." "Size of file section for affinity scheduling" is all that's needed to describe this property. A detailed description of affinity scheduling can be added elsewhere in the prose docs. It's perhaps worthy of an entire page of its own.

NikhilCollooru previously approved these changes Apr 21, 2024
Contributor

@NikhilCollooru NikhilCollooru left a comment

LGTM

@NikhilCollooru
Contributor

Can we also add the new config and session property to the release note?

@steveburnett
Contributor

Can we also add the new config and session property to the release note?

If this PR adds a new config or session property, please add documentation for the new property.

@github-actions

github-actions bot commented Apr 22, 2024

Codenotify: Notifying subscribers in CODENOTIFY files for diff 656d95a...0f300d9.

Notify File(s)
@steveburnett presto-docs/src/main/sphinx/cache/local.rst

@arhimondr
Member Author

Updated documentation and release notes.

@steveburnett, @NikhilCollooru could you please take another look?

Contributor

@steveburnett steveburnett left a comment

Thanks for the documentation! I made a couple of rephrasing suggestions, let me know what you think.

I noticed that the release note draft mentions both a hive.affinity-scheduling-file-section-size configuration property and an affinity_scheduling_file_section_size session property. This doc addition addresses the config property but there's no mention of the session property. Is that sufficient?

@steveburnett
Contributor

Following the Release Notes Guidelines Order of Changes and Phrasing, suggest the following revision for consideration for the release note entry:

== RELEASE NOTES ==

Hive Connector Changes
* Improve affinity scheduling granularity from a file to a section of a file by adding a `hive.affinity-scheduling-file-section-size` configuration property and `affinity_scheduling_file_section_size` session property. The default section size is 256MB. :pr:`22563`

or

== RELEASE NOTES ==

Hive Connector Changes
* Add `hive.affinity-scheduling-file-section-size` configuration property and `affinity_scheduling_file_section_size` session property to improve affinity scheduling granularity from a file to a section of a file. The default section size is 256MB. :pr:`22563`

depending on the relative importance of the feature to the property, as mentioned in Phrasing.

@arhimondr
Member Author

@steveburnett

This doc addition addresses the config property but there's no mention of the session property. Is that sufficient?

I would not expect people to change this value much for a given session. But yeah, I think it might be worth adding for completeness. Updated.

@arhimondr arhimondr force-pushed the preferred-node-block-selection branch from 0046776 to 0f300d9 Compare April 23, 2024 00:20
@arhimondr
Member Author

Thanks for the review, @steveburnett. All great suggestions. Updated.

Contributor

@steveburnett steveburnett left a comment

LGTM! (docs)
Pulled the updated branch and did a new local build; everything looks good. Thanks!

@arhimondr arhimondr merged commit ff5282e into prestodb:master Apr 23, 2024
@arhimondr arhimondr deleted the preferred-node-block-selection branch April 23, 2024 16:45
@wanglinsong wanglinsong mentioned this pull request Jun 25, 2024
36 tasks