Occasional deadlock issue in tests that delete tasks #1027
Did some more digging on this one. Because I have the async console/utility task enabled for OS_printf, that should protect against any issues with the OS_printf calls themselves. In this case the issue is a little deeper:
So in the end, this particular sequence of events is really a product of the deferred thread cancellation that exists on POSIX, and the fact that the … It also wouldn't happen if OS_CONFIG_DEBUG_PRINTF were disabled, and it is also related to the fact that OS_TaskGetId doesn't work once deletion is pending. Basically, if any one of these is changed, the issue doesn't happen.
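The OS_TaskGetId condition can be illustrated with a minimal self-ID lookup over a task table. This is a sketch under stated assumptions: `task_record_t`, `task_table`, `delete_pending`, and `get_self_task_id` are hypothetical names invented for illustration, not OSAL internals. The lookup matches only on the calling thread and deliberately does not reject records marked for deletion, because a task whose cancellation is pending is still running and may still need its own ID (for example, inside a debug print).

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task record; field names are illustrative, not OSAL's. */
typedef struct
{
    bool      in_use;
    bool      delete_pending; /* set when deletion has been requested */
    pthread_t thread;
    uint32_t  task_id;
} task_record_t;

#define MAX_TASKS 8
static task_record_t task_table[MAX_TASKS];

/* Returns the calling thread's task ID, or 0 if it is not in the table.
 * delete_pending is intentionally ignored: the task is still running,
 * so it should still be able to identify itself. */
uint32_t get_self_task_id(void)
{
    pthread_t self = pthread_self();

    for (size_t i = 0; i < MAX_TASKS; i++)
    {
        if (task_table[i].in_use && pthread_equal(task_table[i].thread, self))
        {
            return task_table[i].task_id;
        }
    }
    return 0;
}
```

A lookup that also required `!delete_pending` would return 0 for a still-running task, which is the kind of bogus result that can trip further debug checks.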
Fix #1027, defer cancellation when BSP locked
Resolves two related issues:
- OS_TaskGetId does not return a valid value for tasks that have a cancellation pending but are still running. This in turn is likely to trigger other (bogus) debug checks, which invoke OS_DEBUG and in turn do console writes.
- The console write itself is a cancellation point, and it is now done while holding a BSP mutex. If the task is canceled there, the mutex is not released.

Solution is in two parts:
- OS_TaskGetId should return the task ID it knows about, regardless of whether the task is pending cancellation.
- Defer cancellation of the task while the BSP is locked, ensure it reaches the unlock, then restore the previous cancel state.
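The second part of the fix (defer cancellation while the BSP is locked, then restore the previous cancel state) can be sketched with standard pthreads calls. This is a minimal sketch, not the OSAL code: `console_lock` is a hypothetical stand-in for the BSP mutex, and `console_lock_enter`/`console_lock_exit` are invented helper names. A cancel that arrives while the lock is held stays pending and only takes effect after the unlock.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Hypothetical stand-in for the BSP console mutex (not an OSAL symbol). */
static pthread_mutex_t console_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_taken;  /* printer -> main: lock is held */
static sem_t cancel_sent; /* main -> printer: cancel has been requested */

/* Disable cancellation, then take the lock; saves the caller's cancel state. */
static void console_lock_enter(int *old_state)
{
    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, old_state);
    pthread_mutex_lock(&console_lock);
}

/* Release the lock, then restore the caller's previous cancel state. */
static void console_lock_exit(int old_state)
{
    int ignored;
    pthread_mutex_unlock(&console_lock);
    pthread_setcancelstate(old_state, &ignored);
}

static void *printer_task(void *arg)
{
    int old_state;
    (void)arg;
    console_lock_enter(&old_state);
    sem_post(&lock_taken);
    /* sem_wait() is a cancellation point, but cancellation is disabled,
     * so the pending cancel is not acted on while the lock is held. */
    sem_wait(&cancel_sent);
    console_lock_exit(old_state);
    pthread_testcancel(); /* the pending cancel takes effect here, after unlock */
    return NULL;
}

/* Returns 0 if the thread was canceled and the lock was left free. */
int demo_deferred_cancel(void)
{
    pthread_t tid;
    void *result = NULL;
    int rc;

    sem_init(&lock_taken, 0, 0);
    sem_init(&cancel_sent, 0, 0);
    pthread_create(&tid, NULL, printer_task, NULL);
    sem_wait(&lock_taken);  /* printer now holds console_lock */
    pthread_cancel(tid);    /* roughly what a task delete does on POSIX */
    sem_post(&cancel_sent); /* let the printer reach its unlock */
    pthread_join(tid, &result);

    rc = pthread_mutex_trylock(&console_lock);
    if (rc == 0)
    {
        pthread_mutex_unlock(&console_lock);
    }
    sem_destroy(&lock_taken);
    sem_destroy(&cancel_sent);
    return (result == PTHREAD_CANCELED && rc == 0) ? 0 : -1;
}
```

Saving and restoring the old state, rather than unconditionally re-enabling cancellation, keeps the helpers correct if a caller had already disabled cancellation for its own reasons.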
cFE Integration Candidate: 2020-11-24
Describe the bug
When the unit tests are run repeatedly, some tests occasionally deadlock. The affected tests are ones that delete a sub-task while it may still be running.
If the sub-task is in the middle of an OS_printf() call when OS_TaskDelete is invoked, the underlying BSP lock is never released.
Observed in mutex-test, but other tests may follow similar patterns.
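The failure mode can be reproduced deterministically outside OSAL with plain pthreads. This sketch assumes a hypothetical `console_lock` standing in for the BSP lock and uses `pause()` as the cancellation point in place of the console write. Canceling the thread (roughly what OS_TaskDelete does on POSIX) while it holds the lock leaves the mutex locked, so a later `pthread_mutex_trylock` returns `EBUSY`.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the BSP console mutex (not an OSAL symbol). */
static pthread_mutex_t console_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_taken; /* printer -> main: lock is held */

static void *printer_task(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&console_lock);   /* like the BSP lock taken during OS_printf */
    sem_post(&lock_taken);
    pause();                             /* cancellation point, standing in for write() */
    pthread_mutex_unlock(&console_lock); /* never reached once the thread is canceled */
    return NULL;
}

/* Returns the trylock result after canceling the printer; EBUSY means the lock leaked. */
int demo_leaked_lock(void)
{
    pthread_t tid;
    int rc;

    sem_init(&lock_taken, 0, 0);
    pthread_create(&tid, NULL, printer_task, NULL);
    sem_wait(&lock_taken); /* printer now holds console_lock */
    pthread_cancel(tid);   /* deferred cancel is acted on inside pause() */
    pthread_join(tid, NULL);

    rc = pthread_mutex_trylock(&console_lock);
    sem_destroy(&lock_taken);
    return rc;
}
```

Using a blocking `pthread_mutex_lock` in place of the trylock would hang forever, which is the deadlock seen in the tests.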
To Reproduce
Run mutex-test repeatedly; some runs may deadlock. (This is a race condition, so it is not 100% reproducible.)
Expected behavior
Should run consistently.
System observed on:
Ubuntu
Additional context
This is really just a symptom of a generic, known issue with OS_TaskDelete: other resources held by the task are not necessarily tracked or freed, depending on what the task was doing when it was deleted.
Linux/pthreads does have a workaround, but the issue is likely to exist on all OSes.
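The report does not say which Linux/pthreads workaround it means. One standard pthreads mechanism for releasing resources on cancellation is a cleanup handler registered with `pthread_cleanup_push`, sketched here with a hypothetical `console_lock` and invented function names. If the thread is canceled inside `pause()`, the handler runs as part of cancellation and unlocks the mutex. It only covers resources the task explicitly registers, so it does not make task deletion track resources in general.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the BSP console mutex (not an OSAL symbol). */
static pthread_mutex_t console_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t lock_taken; /* printer -> main: lock is held */

/* Cleanup handler: runs in the canceled thread, which still owns the mutex. */
static void release_console_lock(void *arg)
{
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *printer_task(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&console_lock);
    pthread_cleanup_push(release_console_lock, &console_lock);
    sem_post(&lock_taken);
    pause();                /* cancellation point, standing in for write() */
    pthread_cleanup_pop(1); /* normal path: pop the handler and run it (unlock) */
    return NULL;
}

/* Returns 0 if the thread was canceled and the handler released the lock. */
int demo_cleanup_handler(void)
{
    pthread_t tid;
    void *result = NULL;
    int rc;

    sem_init(&lock_taken, 0, 0);
    pthread_create(&tid, NULL, printer_task, NULL);
    sem_wait(&lock_taken); /* printer now holds console_lock */
    pthread_cancel(tid);
    pthread_join(tid, &result);

    rc = pthread_mutex_trylock(&console_lock);
    if (rc == 0)
    {
        pthread_mutex_unlock(&console_lock);
    }
    sem_destroy(&lock_taken);
    return (result == PTHREAD_CANCELED && rc == 0) ? 0 : -1;
}
```

`pthread_cleanup_push` and `pthread_cleanup_pop` must appear as a pair in the same lexical scope, because they may be implemented as macros that open and close a block.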
Reporter Info
Joseph Hickey, Vantage Systems, Inc.