Fix task service shutdown, errors, and task handling #736
The existing code that detected long-running tasks (at 2 times the runner maximum duration) would simply update the task in the database to `pending` and available. This inadvertently caused the same task to be run again while the long-running task was still running, which yielded a number of errors. The code was updated to cancel the context of a long-running task instead, so that the task actually stops cleanly.

When the current task queue implementation was signaled to shut down, it canceled the common context shared by the worker routines and the manager routine. The manager routine is responsible for taking finished tasks and storing them in the database. On shutdown, however, the manager routine often exited before one or more of the worker routines, so those tasks were never updated in the database and were left permanently in the `running` state (this is why the "unstick tasks" code was written). The code was updated to use two separate cancel functions and two wait groups (one for the workers, one for the manager) so that the manager fully handles all of the worker tasks, saving them to the database, before it shuts down itself. Unless there is a crash, there should no longer be tasks stuck in `running` forever.

The task manager did not protect critical database operations (such as saving the final state of a canceled task) from a context cancel. The code was updated to protect those critical database operations from cancellation.
Also, some smaller changes to the task-related code.
NOTE: This is part of the overall Dexcom work, which will be collected into a single final PR for approval. However, since the work covers a number of different issues, I broke the development up into multiple smaller, more focused PRs. Once all of the smaller PRs are approved, I'll create a final overall PR for approval that will eventually be tested and deployed. (The smaller PRs will be neither tested nor deployed.)