Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor CFE_ES_AppCreate and CFE_ES_LoadLibrary #173

Closed
skliper opened this issue Sep 30, 2019 · 21 comments · Fixed by #960 or #975
Closed

Refactor CFE_ES_AppCreate and CFE_ES_LoadLibrary #173

skliper opened this issue Sep 30, 2019 · 21 comments · Fixed by #960 or #975
Assignees
Milestone

Comments

@skliper
Copy link
Contributor

skliper commented Sep 30, 2019

These functions actually perform 3 major operations:

  • Calling OS_ModuleLoad to load the module

  • Calling OS_SymbolLookup to find the entry point / init function and calling that entry point / init function. (For apps this also involves OS_TaskCreate here whereas a library init function is called directly)

  • Manipulate the internal CFE ES Global tables to store the data related to the application/library

These major functional items should be broken into separate functions. This will make CFE_ES_AppCreate and CFE_ES_LoadLibrary much cleaner looking and remove the overloads needed to support static loading as in ticket #115.

@skliper skliper self-assigned this Sep 30, 2019
@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Imported from trac issue 142. Created by jphickey on 2016-03-08T14:10:41, last modified: 2019-05-15T12:16:36

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by jphickey on 2016-03-15 13:28:53:

Note - this should also consider the functionality provided by the object table for the core applications. This proposed enhancement should ideally unify the code for all 3 startup methodologies (Statically linked Core Task, Static linked App, Dynamically linked App)

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by jphickey on 2016-03-22 17:45:04:

Also note -- if the CFE_ES_ObjectTable remains for starting up the core apps, then this table should be qualified as const. I was unable to make this change as part of #170, since this broke the unit test. However that unit test is arguably invalid and I would rather see that test removed rather than keeping the startup table as a writeable object (having it correctly qualified as read-only has more benefits than preserving a dubious unit test).

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by glimes on 2016-11-08 14:17:08:

Current crop of cfe-next are all going into CFE 6.6

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by sstrege on 2017-06-27 13:55:21:

As per the 6-27-17 CCB meeting the following cppcheck errors/warnings will be addressed under this ticket:

  • cfe/fsw/cfe-core/src/es/cfe_es_apps.c:457: error: Array 'CFE_ES_Global.AppTable[32]' accessed at index 32, which is out of bounds.
  • cfe/fsw/cfe-core/src/es/cfe_es_apps.c:478: error: Array 'CFE_ES_Global.AppTable[32]' accessed at index 32, which is out of bounds.
  • cfe/fsw/cfe-core/src/es/cfe_es_apps.c:489: error: Array 'CFE_ES_Global.AppTable[32]' accessed at index 32, which is out of bounds.
  • cfe/fsw/cfe-core/src/es/cfe_es_apps.c:767: error: Array 'CFE_ES_Global.LibTable[10]' accessed at index 10, which is out of bounds.
  • cfe/fsw/cfe-core/src/es/cfe_es_apps.c:788: error: Array 'CFE_ES_Global.LibTable[10]' accessed at index 10, which is out of bounds.
  • cfe/fsw/cfe-core/src/es/cfe_es_apps.c:799: error: Array 'CFE_ES_Global.LibTable[10]' accessed at index 10, which is out of bounds.
  • cfe/fsw/cfe-core/src/es/cfe_es_apps.c:409: warning: Either the condition 'FileName!=0' is redundant or there is possible null pointer dereference: FileName.
  • cfe/fsw/cfe-core/src/es/cfe_es_apps.c:574: warning: Either the condition 'FileName!=0' is redundant or there is possible null pointer dereference: FileName.

the CCB discussed two possible solutions:

  1. Add comments to the code that include keywords to allow cppcheck suppression. Comments need to warn about the cppcheck error/waring and state how code is harmless as is.
  2. Clean up the code to remove the errors and warnings. This solution is preferred for the error messages.

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by jphickey on 2017-10-19 16:10:37:

As CFE 6.6 is about to be frozen, there is not enough time to get in a significant rework like this. Since the startup code is not //horribly// broken (just ugly) I recommend pushing this to the next pass.

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by jphickey on 2018-06-05 11:02:26:

Per email from Preston Faiks on 2018-06-04, there is an actual observed race condition issue with CFE_ES_AppCreate() out in the field:

When ES is loading and starting apps, one app might fail initialization and call CFE_ES_ExitApp() If that occurs, its app state will be set to CFE_ES_APP_STATE_STOPPED.

When apps are scanned, it will be removed from the app table and that table entry set to not in use.

As ES continues to load apps, it will make use on the now unused app table entry. It will not change the app state in the entry until it has successfully loaded the app into memory.

The process of loading an app into memory can cause the task to pend on file system (or network file system) and allow other tasks to run.

As that app continues to be loaded, another app scan can occur and detect the app entry as both in use and stopped, and will unload it.

When ES finishes loading the app, it will spawn a task at an entry point which was just unloaded by the scanning task, causing it to execute from unloaded memory and crash.

I have reviewed this code again and the race condition risk described is definitely still present in the current development branch, but this isn't the only example. There are other similar race conditions that are possible regarding the use if the RecordUsed boolean field.

Having an observed failure should escalate this in priority now.

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by sseeger on 2018-06-25 22:41:21:

EDIT: Previous comment was wrong. However, looks like POSIX OSAL is using detached threads. This problem could be solved by a thread-join mechanism in OSAL.

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by sseeger on 2018-06-25 22:48:45:

CFS_ES_AppCreate is an if-pyramid

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by sseeger on 2018-06-26 04:00:14:

I took a stab at correcting this in changeset:54c96326. I'm not really sure how to test it, however, without banging on it. Is this something Preston Falks can test with his setup? If need be, I can work out a way to test it. I certainly support a group review. The changes are small.

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by sseeger on 2018-08-07 14:05:18:

I refactored this again in changeset:e2c4acc20f7d7d339536e2a18af97323e96cd8e8. This new effort creates CFE_ES_AppCreate_Threaded() which is called by everyone who uses CFE_ES_AppCreate() that isn't in the command pipe processing thread. The user in that thread calls CFE_ES_AppCreate_Unsync(). By funneling all calls of CFE_ES_AppCreate_Threaded through the pipe processing thread, the scanning of the app table never runs while this command is in progress.

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by sseeger on 2018-08-14 19:32:57:

Seems like CFE_ES_LoadLibrary uses the global LibTable data and that is the only place that uses it. I don't think there are any issues here.

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by sseeger on 2018-08-17 16:58:12:

I have a proposal for a cleaner fix for this.

The way to fix this would appear to have CFE_ES_CleanUpApp set the app state to some neutral value as it sets RecordUsed = FALSE. Or, we can ensure once a record is marked as used, that the state is set to some sort of LOADING state.

Also, we should add another entry to the AppTable to indicate a LOADING state for the record itself. This is because CFE_ES_AppCreate() still needs to use the AppTable entry after calling OS_TaskCreate(). If a scan is performed and a record is LOADING, then the scan will skip that record and try again next time. This would prevent the race condition without the need for extra locking (or funky locking, like my first attempt at a fix.)

One idea for a redesign is to change the behavior of the app table scanning. It appears to serve two issues: clean up any app that wants to stop immediately, and manage a per-app timer to perform some function on the app. This functionality can probably be removed and let the app handle its own cleanup, while populating an entry in another table that a scan function can run to handle a reload/restart. In this way, the app table scanning would never actually touch the app table.

Please consider this a request for comments. Or we can discuss at the next CCB.

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by jphickey on 2018-09-18 17:11:34:

Per discussion at 2018-09-18 CCB, it is agreed that the bug fix for the race condition really should be tracked separately from the general cleanup of CFE_ES_AppCreate().

Ticket #279 has been entered specifically to track the issue related to the race condition and internal table management using the RecordUsed boolean and/or state flags.

The original goal of this ticket was to do other simplification / refactoring of the app loading in order to provide more flexibility and better error handling in the future. This is still a pending work item, but of lower priority than fixing the race condition.

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by jhageman on 2019-03-04 09:32:00:

Owner - please reassess this ticket in the context of #279 (work complete, hopefully merged soon) addressing the race condition. Is further information (CCB review) needed on this ticket or is the simplification/refactoring defined well enough to work it?

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by sseeger on 2019-04-24 09:44:33:

Actually this ticket is rather confusing to me at this point. We should either discuss it or have a couple of folks who have been more heavily involved with CFS at this point write a proposal and let me implement it. I know from experience that my way of thinking is not necessarily compatible with the CCB and so I have done work that isn't looked heavily upon and I'd like to avoid that.

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by jhageman on 2019-04-24 09:55:18:

Marked for review by CCB to clarify path forward.

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by jhageman on 2019-05-15 12:16:36:

CCB 5/15/19 - Deferred to later release, Joe taking ownership

Likely related to implementing CPU affinity, etc.

@skliper
Copy link
Contributor Author

skliper commented Jan 2, 2020

Refactor should also support common checksumming functionality (or authentication/integrity as it evolves). Related to #28

@jphickey
Copy link
Contributor

Probably the time has come to fix this one - Need to address this as part of the the general effort to improve consistency in Apps and Libs as described in #28. There are also potential issues WRT the way the app state machine handles locking during clean up.

@skliper skliper added this to the 7.0.0 milestone Oct 15, 2020
@skliper
Copy link
Contributor Author

skliper commented Oct 15, 2020

Concur, added target milestone.

jphickey added a commit to jphickey/cFE that referenced this issue Oct 19, 2020
Reorganize the global data structures for apps and
libraries into components that can be shared between
the two concepts.

Break up the monolithic AppCreate and LoadLibrary
functions and have these call subroutines that
operate on the common components.
jphickey added a commit to jphickey/cFE that referenced this issue Oct 21, 2020
Reorganize the global data structures for apps and
libraries into components that can be shared between
the two concepts.

Break up the monolithic AppCreate and LoadLibrary
functions and have these call subroutines that
operate on the common components.
@jphickey jphickey linked a pull request Oct 21, 2020 that will close this issue
jphickey added a commit to jphickey/cFE that referenced this issue Oct 26, 2020
Reorganize the global data structures for apps and
libraries into components that can be shared between
the two concepts.

Break up the monolithic AppCreate and LoadLibrary
functions and have these call subroutines that
operate on the common components.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants