ES - Invalid Memory Handle When Restarting/Deleting an Application with Tables (GSFC DCR 14483) #121

Closed
skliper opened this issue Sep 30, 2019 · 6 comments · Fixed by #622

Contributor

skliper commented Sep 30, 2019

When you delete an application that uses tables (e.g. HK) and then restart another task (e.g. SC) for a second time, the ES task writes to the system log that there are invalid memory handles.
 
The errors appear to occur when tables are unregistered. The message says it got a bad pointer for the table; it is not clear whether the app itself is in a bad state. This problem is not isolated to RestartApp; it occurs in DeleteApp as well. It looks like the linked list is not cleaned up properly when an app is deleted or restarted.
 
Further investigation in the CFS Lab narrows the problem down to the RemoveAccessLink function in cfe_tbl_internal.c. The errors are generated on table handles that belonged to the deleted app. The buffer being returned to the pool is NULL because it has already been returned. The table handles that were "cleaned up" still contain the AppID of the deleted app. When another app is subsequently restarted, it is assigned the AppID of the deleted app and inherits that app's stale table handles.

For example, the HK app has 2 tables and the SC app has 73 tables. When HK is deleted, its 2 tables are removed, but the entries still contain HK's AppID. When SC is restarted, it takes over the AppID that HK had. The errors occur only on the second restart because on the first restart SC had a unique AppID; on the second restart it has inherited HK's original AppID. In this case, you will see 2 sets of errors when SC is restarted.

The SC application did not show any adverse behavior because of these errors. The only effect is that the PutPoolBuf function reports an error when asked to return a NULL buffer to the pool.
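The mechanism described above can be illustrated with a small stand-alone model. This is only a sketch, not the cFE source: `Handle`, `TblModel`, `PutBuf`, `Register`, `CleanUpApp`, and `MAX_HANDLES` are hypothetical names invented for the illustration, and the model places tables in caller-chosen slots rather than reproducing how cFE allocates handles. It assumes only what the report states: cleanup matches handles by AppID, the AppID is left in place after cleanup, and returning a NULL buffer is reported as an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HANDLES 8 /* hypothetical size, far smaller than a real build */

/* Hypothetical, simplified stand-in for one table handle entry. */
typedef struct {
    unsigned AppId;    /* owner; NOT cleared when the handle is released */
    bool     UsedFlag; /* false once the handle has been released */
    void    *BufPtr;   /* NULL once the buffer has gone back to the pool */
} Handle;

typedef struct {
    Handle Handles[MAX_HANDLES];
    int    PoolErrors; /* counts attempts to return a NULL buffer */
} TblModel;

static int Storage[MAX_HANDLES]; /* dummy table buffers */

/* Stand-in for returning a buffer to the pool: NULL is reported as an error. */
static void PutBuf(TblModel *m, void *buf)
{
    if (buf == NULL) {
        m->PoolErrors++;
    }
}

/* Give slot `slot` to app `appId` (nonzero, since unused slots hold 0). */
void Register(TblModel *m, int slot, unsigned appId)
{
    assert(slot >= 0 && slot < MAX_HANDLES);
    m->Handles[slot] = (Handle){ appId, true, &Storage[slot] };
}

/* Cleanup as described in the report: matches on AppId only. */
void CleanUpApp(TblModel *m, unsigned appId)
{
    for (int i = 0; i < MAX_HANDLES; i++) {
        if (m->Handles[i].AppId == appId) {
            PutBuf(m, m->Handles[i].BufPtr);
            m->Handles[i].BufPtr   = NULL;
            m->Handles[i].UsedFlag = false;
            /* AppId is left untouched, so the slot keeps a stale owner. */
        }
    }
}
```

In this model, registering two slots under one AppID and calling `CleanUpApp` once produces no errors. Calling `CleanUpApp` again with the same AppID, as would happen when a different app inherits that AppID and is then restarted, visits the two stale slots and counts two errors, while nothing else is disturbed.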

@skliper skliper added this to the 6.7.1 milestone Sep 30, 2019
@skliper skliper self-assigned this Sep 30, 2019
Contributor Author

skliper commented Sep 30, 2019

Imported from trac issue 90. Created by sstrege on 2015-08-27T13:59:18, last modified: 2019-07-03T12:48:08

Contributor Author

skliper commented Sep 30, 2019

Trac comment by glimes on 2016-10-18 15:00:28:

redispatching these tickets to clear my name from the "owner" field,
so people willing to work on tickets will not think that I am already
working these issues ;)

@skliper skliper added the bug label Sep 30, 2019
Contributor Author

skliper commented Sep 30, 2019

Trac comment by glimes on 2016-11-08 14:17:08:

Current crop of cfe-next are all going into CFE 6.6

Contributor Author

skliper commented Sep 30, 2019

Trac comment by gdecaruf on 2018-02-08 15:27:04:

Ran into exact same issue.

The culprit is that UsedFlag isn't checked when looping over the handles. This causes the call to CFE_TBL_RemoveAccessLink in cfe_tbl_internal.c:1450 (CFE_TBL_CleanUpApp) to fail, because the restarted app was assigned the AppID previously owned by the other app that was shut down. When cleanup loops over the table handles, it incorrectly tries to remove the access link to a table registry entry that has already been cleared.

The fix is to add a check for UsedFlag == TRUE:

{{{
/* Check to see if the Handle belongs to the Application being deleted */
if (CFE_TBL_TaskData.Handles[i].AppId == AppId &&
    CFE_TBL_TaskData.Handles[i].UsedFlag == TRUE)
...
}}}
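To see the effect of the extra condition in isolation, here is a minimal sketch that counts how many handles a cleanup pass would act on, with and without the UsedFlag check. `HandleEntry` and `CountHandlesToRemove` are hypothetical names made up for this example and do not exist in cFE; only the two-part condition mirrors the snippet above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced handle entry: just the two fields the check reads. */
typedef struct {
    unsigned AppId;    /* owner recorded in the handle (may be stale) */
    bool     UsedFlag; /* true only while the handle is actually in use */
} HandleEntry;

/*
 * Returns how many handles a cleanup pass for `appId` would try to remove.
 * With checkUsedFlag == false this models the original loop (AppId match
 * only); with checkUsedFlag == true it models the proposed fix.
 */
int CountHandlesToRemove(const HandleEntry *handles, size_t n,
                         unsigned appId, bool checkUsedFlag)
{
    assert(handles != NULL);
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (handles[i].AppId == appId &&
            (!checkUsedFlag || handles[i].UsedFlag)) {
            count++;
        }
    }
    return count;
}
```

For a table with two released handles that still carry AppID 3, one live handle owned by AppID 3, and one live handle owned by another app, the unguarded count for AppID 3 is three and the guarded count is one. The two stale entries are skipped, which is what prevents the repeated attempt to return an already-returned buffer.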

Contributor Author

skliper commented Sep 30, 2019

Trac comment by gdecaruf on 2018-02-08 15:31:59:

What are the repercussions in a current build that doesn't have this fix? Is it dangerous? It seems to only report the memory handle error and then keep going.

Contributor Author

skliper commented Sep 30, 2019

Trac comment by jhageman on 2019-07-03 12:48:08:

Moved unfinished 6.6.1 issues to next minor release

@skliper skliper removed their assignment Sep 30, 2019
@skliper skliper modified the milestones: 6.7.1, 6.7.0, 6.8.0 Sep 30, 2019
@skliper skliper removed this from the 6.8.0 milestone Nov 4, 2019
@skliper skliper added this to the 6.8.0 milestone Mar 2, 2020
@dmknutsen dmknutsen self-assigned this Apr 9, 2020