CFE_ES_CleanupTaskResources attempts to delete child task twice #684

Closed
johnphamngc opened this issue May 6, 2020 · 0 comments · Fixed by #685 or #712

@johnphamngc

Describe the bug
CFE_ES_CleanupTaskResources appears to attempt to delete a child task twice: first via CFE_ES_CleanupObjectCallback, and then again via a direct call to OS_TaskDelete.

To Reproduce
On Linux, call CFE_ES_RESTART_APP on an app that has a child task, such as CI, FM, or HS.
On VxWorks, the problem can also be triggered by inducing an exception that causes an application restart.

Expected behavior
The app should exit and restart cleanly.

Code snips
See CFE_ES_CleanupTaskResources

System observed on:

  • PC, SP0-s
  • OS: Linux, VxWorks
  • Versions [e.g. cFE 6.7.12, OSAL 5.0.11, PSP 1.4.8, CI, FM, HS]

Additional context
My colleague Alan Wang attempted the following:

I purposely caused a program exception in each of the following tasks (built from cFE Version 6.7.6.0) to see whether CFE can restart them or not.
 
SCH,   CI,   TO,  HS,   HK,  SC,   DS,   LC,  FM,  MD,  MM,   and CS.
 
CFE successfully (at least on the surface) started all of them except CI, HS, and FM.
 
-> 
program
Exception current instruction address: 0x05266828
Machine Status Register: 0x02029230
Condition Register: 0x24000882
Exception Syndrome Register: 0x08000000
Task: 0x53a6888 "CI"
0x53a6888 (CI): task 0x53a6888 has had a failure and has been stopped.
0x53a6888 (CI): The task has been terminated because it triggered an exception that raised the signal 4.
1980-012-14:16:59.59263 CFE_ES_RestartApp: Restart Application CI Initiated
1980-012-14:17:04.43385 CFE_ES_CleanUpApp: CleanUpTaskResources for Task ID:10 returned Error: 0xC4000026
EVS Port1 66/1/CFE_ES 41: Restart Application CI Failed: CleanUpApp Error 0xC4000023.
 
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
 
-> 
program
Exception current instruction address: 0x054f12cc
Machine Status Register: 0x02029230
Condition Register: 0x24000888
Exception Syndrome Register: 0x08000000
Task: 0x53a7110 "HS"
0x53a7110 (HS): task 0x53a7110 has had a failure and has been stopped.
0x53a7110 (HS): The task has been terminated because it triggered an exception that raised the signal 4.
1980-012-14:11:48.58489 CFE_ES_RestartApp: Restart Application HS Initiated
1980-012-14:11:54.84338 CFE_ES_CleanUpApp: CleanUpTaskResources for Task ID:15 returned Error: 0xC4000026
EVS Port1 66/1/CFE_ES 41: Restart Application HS Failed: CleanUpApp Error 0xC4000023.
 
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
 
-> 
program
Exception current instruction address: 0x0553d814
Machine Status Register: 0x02029230
Condition Register: 0x24000888
Exception Syndrome Register: 0x08000000
Task: 0x53a8220 "FM"
0x53a8220 (FM): task 0x53a8220 has had a failure and has been stopped.
0x53a8220 (FM): The task has been terminated because it triggered an exception that raised the signal 4.
1980-012-14:09:03.19939 CFE_ES_RestartApp: Restart Application FM Initiated
1980-012-14:09:08.63372 CFE_ES_CleanUpApp: CleanUpTaskResources for Task ID:22 returned Error: 0xC4000026
EVS Port1 66/1/CFE_ES 41: Restart Application FM Failed: CleanUpApp Error 0xC4000023

Another colleague John Hueber reported the following:

CI doesn’t restart when commanded because it calls CFE_ES_ExitApp with the wrong status (running).
If I put a 5 second task delay in CI_AppMain before calling CFE_ES_ExitApp, the task restarts fine.
It looks like whenever CFE_ES_ExitApp gets called before the task is deleted then the restart is unsuccessful.
If the task is deleted before it gets to CFE_ES_ExitApp then the restart is successful.
If the task has child tasks then it takes longer to get to deleting the main task because the child task is in the list of resources that have to be deleted.
There is a bug in this part: deleting the resources of the task also deletes its child tasks, and after the resources are deleted CFE_ES_CleanUpApp tries to delete the child tasks again. That second attempt fails, and the failure prevents the restart.

I put CI in apps/hs/fsw/tables/hs_xct.c (HS_XCT_TYPE_APP_MAIN) and apps/hs/fsw/tables/hs_amt.c (HS_AMT_ACT_APP_RESTART)
then caused an exception in CI no-op processing by clearing an instruction. With the 5 second delay in CI_AppMain the restart was successful.
Without the delay the restart is unsuccessful.
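
To make the double delete described above concrete, here is a minimal sketch in C. It is a toy model, not cFE or OSAL source: the `ToyTask` table, `toy_task_delete`, `toy_sweep_created_by`, and `toy_buggy_child_cleanup` are hypothetical names invented for illustration. It assumes only what the report states, namely that a child task is reachable once through its creator and once by its own ID.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not cFE or OSAL APIs. */
enum { TOY_OK = 0, TOY_ERR_INVALID_ID = -1 };
enum { TOY_MAX_TASKS = 4 };

typedef struct {
    bool in_use;   /* true while the task still exists */
    int  creator;  /* ID of the task that created this one */
} ToyTask;

/* Deleting a task that no longer exists is reported as an error. */
static int toy_task_delete(ToyTask *tasks, int id)
{
    if (id < 0 || id >= TOY_MAX_TASKS || !tasks[id].in_use)
        return TOY_ERR_INVALID_ID;
    tasks[id].in_use = false;
    return TOY_OK;
}

/* First path: sweep every live task that was created by `creator`. */
static int toy_sweep_created_by(ToyTask *tasks, int creator)
{
    int status = TOY_OK;
    for (int id = 0; id < TOY_MAX_TASKS; ++id) {
        if (id != creator && tasks[id].in_use && tasks[id].creator == creator) {
            if (toy_task_delete(tasks, id) != TOY_OK)
                status = TOY_ERR_INVALID_ID;
        }
    }
    return status;
}

/* Buggy sequence: the sweep removes the child, then a direct delete of
 * the same child ID fails because the task is already gone. */
static int toy_buggy_child_cleanup(ToyTask *tasks, int main_id, int child_id)
{
    int status = toy_sweep_created_by(tasks, main_id); /* child deleted here */
    if (toy_task_delete(tasks, child_id) != TOY_OK)    /* second delete fails */
        status = TOY_ERR_INVALID_ID;
    return status;
}
```

In this model, a table holding a main task 0 and a child task 1 created by task 0 makes `toy_buggy_child_cleanup(tasks, 0, 1)` return `TOY_ERR_INVALID_ID` even though the child was in fact removed. This mirrors the reported symptom, where the cleanup error is what blocks the restart.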

Reporter Info
John N Pham, Northrop Grumman

@skliper skliper added the bug label May 6, 2020
@jphickey jphickey self-assigned this May 6, 2020
@skliper skliper added this to the 6.8.0 milestone May 7, 2020
jphickey added a commit to jphickey/cFE that referenced this issue May 7, 2020
When cleaning up a task the child task resources should be
cleaned first, followed by the main task resources.

This is because child tasks are also associated with the original
creator within OSAL and will be found through OSAL ForEachObject,
and also via links within the ES task table.

By cleaning child tasks first, this avoids attempting to delete
the child task twice.
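
The ordering argument in this commit message can be sketched with a small C model. This is an illustration under stated assumptions, not the actual change: `DemoTask`, `demo_cleanup_task`, and `demo_cleanup_app` are hypothetical names, and the model assumes that cleaning a task also removes every task it created, and that the app-level cleanup walks an explicit list of child tasks.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not the cFE or OSAL API. */
enum { DEMO_MAX_TASKS = 4 };

typedef struct {
    bool alive;
    int  parent;  /* creator's ID; a main task is its own parent */
} DemoTask;

/* Clean one task: delete anything it created, then the task itself.
 * Returns 0 on success, -1 if the task was already gone. */
static int demo_cleanup_task(DemoTask *t, int id)
{
    if (!t[id].alive)
        return -1;
    for (int i = 0; i < DEMO_MAX_TASKS; ++i) {
        if (i != id && t[i].alive && t[i].parent == id)
            t[i].alive = false;  /* found through the creator link */
    }
    t[id].alive = false;
    return 0;
}

/* Clean an app: visit the child list and the main task in the chosen
 * order. Returns the number of cleanup calls that failed. */
static int demo_cleanup_app(DemoTask *t, int main_id, const int *children,
                            int n_children, bool children_first)
{
    int failures = 0;
    if (!children_first)
        failures += (demo_cleanup_task(t, main_id) != 0);
    for (int i = 0; i < n_children; ++i)
        failures += (demo_cleanup_task(t, children[i]) != 0);
    if (children_first)
        failures += (demo_cleanup_task(t, main_id) != 0);
    return failures;
}
```

With one main task and two children, calling `demo_cleanup_app` with `children_first` set to `false` reports two failures, because the main task's cleanup has already removed both children through the creator link. Setting `children_first` to `true` reports none, since by the time the main task is cleaned there are no children left to find.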
astrogeco added a commit that referenced this issue May 19, 2020