Use NodeAssignmentStats::getQueuedSplitsWeightForStage in task-based split scheduling#26124
Merged
spershin merged 1 commit intoprestodb:masterfrom Sep 23, 2025
Merged
Conversation
Contributor
Reviewer's guide (collapsed on small PRs)Reviewer's GuideCorrects split weight calculations in task-based split scheduling by switching to NodeAssignmentStats.getQueuedSplitsWeightForStage to include scheduled but unacknowledged splits and cleans up an unused test variable. Sequence diagram for split weight calculation in task-based split schedulingsequenceDiagram
participant "SimpleNodeSelector"
participant "NodeAssignmentStats"
participant "RemoteTask"
participant "TaskStatus"
participant "InternalNode"
"SimpleNodeSelector" ->> "InternalNode": getNodeIdentifier()
"SimpleNodeSelector" ->> "RemoteTask": get task for node
alt No RemoteTask for node
"SimpleNodeSelector" ->> "NodeAssignmentStats": getQueuedSplitsWeightForStage(node)
else RemoteTask exists
"SimpleNodeSelector" ->> "RemoteTask": getTaskStatus()
"SimpleNodeSelector" ->> "TaskStatus": getRunningPartitionedSplitsWeight()
"SimpleNodeSelector" ->> "NodeAssignmentStats": getQueuedSplitsWeightForStage(node)
end
Class diagram for NodeAssignmentStats and related split weight methodsclassDiagram
class NodeAssignmentStats {
+getQueuedSplitsWeightForStage(node)
+getAssignedSplitsWeightForStage(node)
}
class TaskStatus {
+getQueuedPartitionedSplitsWeight()
+getRunningPartitionedSplitsWeight()
}
class RemoteTask {
+getTaskStatus()
}
class InternalNode {
+getNodeIdentifier()
}
NodeAssignmentStats --> InternalNode
RemoteTask --> TaskStatus
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Contributor
There was a problem hiding this comment.
Hey there - I've reviewed your changes - here's some feedback:
- Consider renaming getQueuedSplitsWeightForStage (and related methods) to something more descriptive—e.g. reflecting that it includes scheduled and unacknowledged splits—to reduce confusion in the scheduling logic.
- Add a unit test that explicitly verifies the node load provider now sums running and queued split weights correctly, so this subtle behavior is protected from future regressions.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider renaming getQueuedSplitsWeightForStage (and related methods) to something more descriptive—e.g. reflecting that it includes scheduled and unacknowledged splits—to reduce confusion in the scheduling logic.
- Add a unit test that explicitly verifies the node load provider now sums running and queued split weights correctly, so this subtle behavior is protected from future regressions.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
arhimondr
approved these changes
Sep 23, 2025
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
In the initial PR which introduces "task-based split scheduling" in the last update I have replaced
assignmentStats.getQueuedSplitsWeightForStage(node)calls byassignmentStats.getAssignedSplitsWeightForStage(node)thinking that we get queued splits from theTaskStatusand don't need extra queued from theNodeAssignmentStats.But as it happened, while working on the issue of HttpRemoteTaskWithEventLoop, I noticed that the
getQueuedSplitsWeightForStageis a misnomer and actually returns not queued, but scheduled and unacknowled splits plus the splits what we assigned in the current split scheduling run.While
getAssignedSplitsWeightForStageonly returns splits what we assigned in the current split scheduling run.This PR fixes it.
Actually, it would be good to follow up this with a PR doing some renaming in the NodeAssignmentStats and HttpRemoteTask classes as we have the same things called different names and different things called the same names and also some methods returning A + B not reflecting this in their names.
Hard to follow and reason about the code.