dispatcher: Always send task updates to agent when state <= ASSIGNED #1295
The use of TasksEqualStable in the dispatcher to avoid sending unnecessary task updates is problematic for global services. These tasks only move to the ASSIGNED state once the scheduler confirms resources are available. Since state changes are not considered relevant changes and do not trigger assignment set updates, the dispatcher may never send this update, and tasks can become stuck in the ASSIGNED state.
If a task is in the state ASSIGNED or below, always count a state change as a modification, even if nothing else has changed.
Signed-off-by: Aaron Lehmann <[email protected]>
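The rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Task` and `TaskState` types, the `Spec` field, `tasksEqualStable`, and `isModification` are simplified, hypothetical stand-ins for the real API types and for `equality.TasksEqualStable`, and the state list is only a subset.

```go
package main

import "fmt"

// TaskState is a hypothetical, simplified stand-in for the real task
// state type. Only the relative ordering of the states matters here.
type TaskState int

const (
	TaskStateNew TaskState = iota
	TaskStateAllocated
	TaskStatePending
	TaskStateAssigned
	TaskStateAccepted
	TaskStateRunning
)

// Task is a hypothetical, reduced task object. Spec stands in for all
// of the "stable" fields that the real comparison looks at.
type Task struct {
	ID     string
	NodeID string
	Spec   string
	State  TaskState
}

// tasksEqualStable is an assumed simplification of the stable-equality
// check: it compares two tasks while ignoring their state.
func tasksEqualStable(a, b *Task) bool {
	return a.ID == b.ID && a.NodeID == b.NodeID && a.Spec == b.Spec
}

// isModification reports whether an update should count as a
// modification that is sent to the agent.
func isModification(oldTask, newTask *Task) bool {
	if !tasksEqualStable(oldTask, newTask) {
		return true
	}
	// States ASSIGNED and below are set by the orchestrator/scheduler,
	// so a state change in that range must reach the agent even if
	// nothing else has changed.
	return newTask.State <= TaskStateAssigned && oldTask.State != newTask.State
}

func main() {
	pending := &Task{ID: "t1", NodeID: "n1", Spec: "v1", State: TaskStatePending}
	assigned := &Task{ID: "t1", NodeID: "n1", Spec: "v1", State: TaskStateAssigned}
	accepted := &Task{ID: "t1", NodeID: "n1", Spec: "v1", State: TaskStateAccepted}
	running := &Task{ID: "t1", NodeID: "n1", Spec: "v1", State: TaskStateRunning}

	fmt.Println(isModification(pending, assigned)) // true
	fmt.Println(isModification(accepted, running)) // false
}
```

With this rule, a transition into ASSIGNED is always forwarded, while state changes above ASSIGNED (which are reported by the agent itself) are still filtered out when nothing else differs.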
Current coverage is 55.08% (diff: 100%)
@@             master    #1295    diff @@
==========================================
  Files            78       78
  Lines         12467    12467
  Methods           0        0
  Messages          0        0
  Branches          0        0
==========================================
+ Hits           6844     6867     +23
+ Misses         4674     4658     -16
+ Partials        949      942      -7
LGTM
LGTM
Soooo ... It made sense to create an equality package because those two comparisons will always be the same, right? :P
@@ -595,7 +595,10 @@ func (d *Dispatcher) Tasks(r *api.TasksRequest, stream api.Dispatcher_TasksServe
        modificationCnt++
    case state.EventUpdateTask:
        if oldTask, exists := tasksMap[v.Task.ID]; exists {
            if equality.TasksEqualStable(oldTask, v.Task) {
            // States ASSIGNED and below are set by the orchestrator/scheduler,
If the task is not ASSIGNED, how can we send it to an agent?
> If the task is not ASSIGNED, how can we send it to an agent?
This is a fair point. Filtering out these tasks would be a behavioral change, though. The aim of this PR is just to make sure the agent is notified when a task transitions into ASSIGNED.
That is not what I mean. If a task is not assigned, we cannot send it to a node, since it doesn't have an assignment.
I mean, this should be impossible.
task.NodeID is set before the task reaches the ASSIGNED state for global services.
> task.NodeID is set before the task reaches the ASSIGNED state for global services.
So, if the NodeID is set, the task should be assigned.
We can't break these kinds of invariants, even if it means having a second field indicating that a task was created for assignment to a specific node. Breaking them makes it hard to make logical decisions.
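To make the disputed invariant concrete, here is a small sketch of a check for it. Everything in it is assumed for illustration: the reduced `Task` and `TaskState` types and the helper name `hasNodeBeforeAssignment` are hypothetical, and PENDING is used only as an example of a state below ASSIGNED.

```go
package main

import "fmt"

// TaskState is a hypothetical, simplified stand-in for the real task
// state type. Only the ordering relative to ASSIGNED matters here.
type TaskState int

const (
	TaskStateNew TaskState = iota
	TaskStateAllocated
	TaskStatePending
	TaskStateAssigned
)

// Task is a hypothetical, reduced task object.
type Task struct {
	NodeID string
	State  TaskState
}

// hasNodeBeforeAssignment reports whether a task already names a node
// while its state is still below ASSIGNED. Under the invariant
// "NodeID set implies assigned", this would never return true.
func hasNodeBeforeAssignment(t Task) bool {
	return t.NodeID != "" && t.State < TaskStateAssigned
}

func main() {
	// A global-service task, as described in the thread: the node is
	// already chosen, but the scheduler has not yet confirmed resources.
	globalTask := Task{NodeID: "node-1", State: TaskStatePending}
	// A task that has a node and has reached ASSIGNED.
	assignedTask := Task{NodeID: "node-1", State: TaskStateAssigned}

	fmt.Println(hasNodeBeforeAssignment(globalTask))   // true
	fmt.Println(hasNodeBeforeAssignment(assignedTask)) // false
}
```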
Fixes #1291
I know this isn't the cleanest solution, but it seems like the simplest and lowest risk approach for 1.12.1. We can try to find something better for 1.13.
cc @dongluochen @aluzzardi