Add timeout capability to TaskCollection #1244
Co-authored-by: Philipp Grete <pgrete@hs.uni-hamburg.de>
Yes, it is ready for review.
Yurlungur left a comment:
Given that the default timeout is longer than any code will ever run, I think it's probably worth documenting this mechanism.
Also, should the default timeout be something shorter, like a week or a day? Short enough that it might actually trigger, but long enough that nobody will hit it accidentally?
Approving now, though, as these are minor nitpicks.
```cpp
task_collection_timeout_in_seconds(pin->GetOrAddInteger(
    "parthenon/mesh", "task_collection_timeout_in_seconds", 60 * 60 * 24 * 365)),
```
Ha, yeah, I wanted a timeout that no one would hit, and I thought a year was funny.
We should probably have a discussion about what people think is a good default choice. I will make an issue.
We should just pick a timeout so this can get merged, and we can change it later.
pgrete left a comment:
I just fixed the param doc and two typos.
After a quick clarification of the question, I think we can merge (assuming the tests now pass).
```cpp
region[0].AddTask(TaskID(0), []() { return TaskStatus::incomplete; });
const std::size_t timeout_in_seconds = 4;
tc.Execute(timeout_in_seconds);
REQUIRE(true);
```
I missed this yesterday.
How does this unit test check the timeout capability?
I'd have imagined something like a task waiting for longer than the timeout, and then checking for the timeout return.
Given the subtle issues we have had with tasking/threading in the past, I think it's important to have a test for this new feature.
If the task list timeout didn't work, tc.Execute() would run forever, since the task always reports that it is incomplete and would be called repeatedly (similar to what would happen for a TryReceive call that never gets data from a neighbor). I changed the test to get the status of the TaskCollection and make sure the task collection failed.
Remember that when we compile with SerialPool, a task that runs for a long time without returning will not trigger the timeout. The only way to get that behavior, I think, is to spin off another thread (similar to what happens when we compile with ThreadPool, or in the Athena++ code you linked to the other day).
Ah, right, thanks for clarifying.