
Support limiting the number of unacknowledged source splits per task#15761

Merged
shixuan-fan merged 3 commits into prestodb:master from pettyjamesm:limit-unacknowledged-task-splits
Mar 12, 2021

Conversation

@pettyjamesm
Contributor

@pettyjamesm pettyjamesm commented Mar 1, 2021

Adds support for tracking (and optionally limiting) the number of splits that are not yet acknowledged for a given task. Previously, the scheduler supported only two limits: the total number of splits across all tasks on a worker node, and the number of splits queued for a given task, which included:

  • splits received by the worker but not yet running as of the last task status update
  • splits queued in the coordinator local RemoteTask but not yet sent to the task
  • splits assigned in the current scheduling batch but not yet added to the coordinator-local RemoteTask

The new max_unacknowledged_splits_per_task session property enables setting a limit on the number of splits that are either:

  • queued on the coordinator-local RemoteTask but not yet sent to the worker, or sent but not yet confirmed to have been received
  • assigned to the task in the current scheduling batch but not yet added to the coordinator-local RemoteTask

This limit enforcement takes precedence over both of the other existing split limit configurations and is designed to prevent large task update requests that might cause a query to fail.
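As a sketch of the precedence described above, the check might look like the following. The class and method names here are illustrative assumptions, not the actual Presto scheduler code:

```java
// Hypothetical sketch: the unacknowledged-split limit is consulted first, so a
// task over that limit receives no new splits regardless of the other limits.
final class SplitAssignmentLimiter
{
    private final int maxSplitsPerNode;
    private final int maxPendingSplitsPerTask;
    private final int maxUnacknowledgedSplitsPerTask;

    SplitAssignmentLimiter(int maxSplitsPerNode, int maxPendingSplitsPerTask, int maxUnacknowledgedSplitsPerTask)
    {
        this.maxSplitsPerNode = maxSplitsPerNode;
        this.maxPendingSplitsPerTask = maxPendingSplitsPerTask;
        this.maxUnacknowledgedSplitsPerTask = maxUnacknowledgedSplitsPerTask;
    }

    boolean canAssignSplit(int nodeTotalSplits, int taskQueuedSplits, int taskUnacknowledgedSplits)
    {
        // Unacknowledged limit takes precedence over the other two limits
        if (taskUnacknowledgedSplits >= maxUnacknowledgedSplitsPerTask) {
            return false;
        }
        // Otherwise, a split may be assigned if either the per-node total or
        // the per-task queued-split limit still has headroom
        return nodeTotalSplits < maxSplitsPerNode || taskQueuedSplits < maxPendingSplitsPerTask;
    }
}
```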

== RELEASE NOTES ==

General Changes
* Adds support for configuring the maximum number of unacknowledged source splits per task. This can be enabled by setting the ``max_unacknowledged_splits_per_task`` session property or ``node-scheduler.max-unacknowledged-splits-per-task`` configuration property.
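For example, using the property names from the release note above (500 is the default discussed later in this thread), the limit could be set cluster-wide via `node-scheduler.max-unacknowledged-splits-per-task` in the coordinator's configuration, or overridden per session:

```sql
-- Per-session override of the cluster-wide
-- node-scheduler.max-unacknowledged-splits-per-task setting
SET SESSION max_unacknowledged_splits_per_task = 500;
```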

Contributor

@shixuan-fan shixuan-fan left a comment


"Refactor NodeAssignmentStats to simplify the tracked representation" LGTM

Contributor


It feels a bit weird to have getQueuedSplitCount return queuedSplitCount + assignedSplits, but I don't have a better name in mind.

Contributor Author


Yeah, the structure here is a little quirky (but is fundamentally the same as before, just with 1 fewer HashMap). Any split assigned within the current batch is "effectively queued" even though it hasn't made its way to the task yet. I couldn't think of a name that worked any better at making that piece clear.
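The "effectively queued" accounting being discussed could be sketched roughly as below. Field and method names are assumptions for illustration, not the exact NodeAssignmentStats implementation:

```java
// Illustrative sketch: splits assigned within the current scheduling batch
// count toward the queued total even before they reach the task.
final class PendingSplitInfo
{
    // Queued-split count reported by the last task status update
    private final int queuedSplitCount;
    // Splits assigned during the current scheduling batch, not yet delivered
    private int assignedSplits;

    PendingSplitInfo(int queuedSplitCount)
    {
        this.queuedSplitCount = queuedSplitCount;
    }

    void addAssignedSplit()
    {
        assignedSplits++;
    }

    // "Effectively queued": splits the task already knows about, plus splits
    // this scheduling batch has assigned but not yet sent to the task
    int getQueuedSplitCount()
    {
        return queuedSplitCount + assignedSplits;
    }
}
```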

Contributor

@shixuan-fan shixuan-fan left a comment


"Avoid unnecessary node list sorting in SimpleNodeSelector"

Contributor

@shixuan-fan shixuan-fan left a comment


"Support limiting the number of unacknowledged splits per task"

LGTM, mostly questions for my own understanding

Contributor


Just curious, what is the rationale behind using 500 as the default value? 🤔

Contributor Author


In an experimental setup where I was running with different (very large) split queue depth configurations and small files (i.e., cheap splits that are processed very quickly once delivered), 500 seemed to be about the point of diminishing returns when raising this value further.

In the original iteration, I had this default to Integer.MAX_VALUE, since I don't aim to affect any of the default scheduling behavior with this change, but rather just want to add a safety net against deeper split queues blowing up task update sizes. Dain convinced me to pick a more "reasonable" value that was still above the ceiling of having any effect on the current default configs.

Contributor


It might be a dumb question but I'm curious to understand why we need to check this before listeners are registered?

Contributor Author


It's possible for a task's initial splits to already exceed the pending split limit inside the constructor, so setting the "no space" flag appropriately at initialization time had some effect, but... now that I'm looking, I can't see what that effect was. It might have had some relationship to the MockRemoteTask implementation in tests and maybe isn't strictly required here.

@jainxrohit jainxrohit self-requested a review March 11, 2021 19:54
@pettyjamesm pettyjamesm force-pushed the limit-unacknowledged-task-splits branch from 88a5b11 to 545d8b9 on March 11, 2021 19:55
@pettyjamesm pettyjamesm force-pushed the limit-unacknowledged-task-splits branch from 545d8b9 to 88049f2 on March 11, 2021 20:40
Contributor

@shixuan-fan shixuan-fan left a comment


LGTM

@shixuan-fan
Contributor

cc @NikhilCollooru This might touch some of the code that you are currently working on. I don't think there is a conflict, but I want to give you a heads-up.

@shixuan-fan shixuan-fan merged commit 9a83668 into prestodb:master Mar 12, 2021
@pettyjamesm pettyjamesm deleted the limit-unacknowledged-task-splits branch March 12, 2021 20:11
@varungajjala varungajjala mentioned this pull request Mar 23, 2021