Ensure task status is reported before cleanup #705
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,6 +26,8 @@ import ( | |
|
||
const ( | ||
steadyStateTaskVerifyInterval = 10 * time.Minute | ||
stoppedSentWaitInterval = 30 * time.Second | ||
maxStoppedWaitTimes = 72 * time.Hour / stoppedSentWaitInterval | ||
) | ||
|
||
type acsTaskUpdate struct { | ||
|
@@ -474,6 +476,9 @@ func (mtask *managedTask) time() ttime.Time { | |
return mtask._time | ||
} | ||
|
||
var _stoppedSentWaitInterval = stoppedSentWaitInterval | ||
var _maxStoppedWaitTimes = int(maxStoppedWaitTimes) | ||
|
||
func (mtask *managedTask) cleanupTask(taskStoppedDuration time.Duration) { | ||
cleanupTimeDuration := mtask.GetKnownStatusTime().Add(taskStoppedDuration).Sub(ttime.Now()) | ||
// There is a potential deadlock here if cleanupTime is negative. Ignore the computed | ||
|
@@ -489,8 +494,27 @@ func (mtask *managedTask) cleanupTask(taskStoppedDuration time.Duration) { | |
cleanupTimeBool <- true | ||
close(cleanupTimeBool) | ||
}() | ||
// wait for the cleanup time to elapse, signalled by cleanupTimeBool | ||
for !mtask.waitEvent(cleanupTimeBool) { | ||
} | ||
stoppedSentBool := make(chan bool) | ||
go func() { | ||
For the newly written goroutine, shall we start enforcing the rule of 'always pass "Context" to it', so that it can provide simple cancellation?
There's no I don't think that always passing
Without a context, there is no way to stop this potentially long-running new goroutine (which could run for up to 72 hours). If there is another kind of state mismatch between the agent and the backend, the backend may think this instance is able to launch new tasks while the agent is still holding those long-running cleanup goroutines. Is it possible that the agent could run out of memory?
I meant that the backend service keeps starting new tasks, and these tasks get stuck in the "cleanup" state for 72 hours. Will the agent eventually run out of memory?
This is the desired behavior. A successful submission of task state will result in this goroutine exiting. Unsuccessful submissions will delay until success or the timeout of 72 hours, whichever is sooner. There is no use case to stop the goroutine other than this.
||
for i := 0; i < _maxStoppedWaitTimes; i++ { | ||
// ensure that we block until api.TaskStopped is actually sent | ||
sentStatus := mtask.GetSentStatus() | ||
if sentStatus >= api.TaskStopped { | ||
stoppedSentBool <- true | ||
close(stoppedSentBool) | ||
return | ||
} | ||
seelog.Warnf("Blocking cleanup for task %v until the task has been reported stopped. SentStatus: %v (%d/%d)", mtask, sentStatus, i, _maxStoppedWaitTimes) | ||
mtask._time.Sleep(_stoppedSentWaitInterval) | ||
} | ||
}() | ||
// wait for api.TaskStopped to be sent | ||
for !mtask.waitEvent(stoppedSentBool) { | ||
} | ||
|
||
log.Info("Cleaning up task's containers and data", "task", mtask.Task) | ||
|
||
// For the duration of this, simply discard any task events; this ensures the | ||
|
Can this be broken out into a named method?
Done