Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Requirement cES1008 Failure #952

Closed
dmknutsen opened this issue Oct 9, 2020 · 6 comments · Fixed by #1038 or #1047
Closed

Requirement cES1008 Failure #952

dmknutsen opened this issue Oct 9, 2020 · 6 comments · Fixed by #1038 or #1047
Assignees
Milestone

Comments

@dmknutsen
Copy link
Contributor

Describe the bug
Requirement cES1008 specifies: Upon receipt of a Command the cFE shall Reload the Command specified cFE Application from the Command specified cFE Application file.

This is not consistent with how the FSW works. If a reload command is received, the FSW will restart the Application with the previously loaded version as long as the command specified cFE Application file exists. The same result is seen if an App is started/stopped - then loaded/started with a new version. The system/event log entries will indicate that the loading/starting of the new version of the App is successful...however the original version of the App will be the one that will actually be executed.

To Reproduce
Steps to reproduce the behavior:

  1. Start an App
  2. Send the reload command with a filename that contains a different version of the App (App name must remain constant)
  3. Verify that the original version of the App was actually the one started.

Alternatively:

  1. Start an App
  2. Stop the App
  3. Start a different version of the same App (App name must remain constant)
  4. Verify that the original version of the App was actually the one started.

Expected behavior
Able to restart a task of the the same name.

Code snips
If applicable, add references to the software.

System observed on:
OS: ubuntu-19.10
Versions: cfe: v6.7.0+dev295; osal: v5.0.0+dev247; psp: v1.4.14.0

Reporter Info
Dan Knutsen

@skliper skliper transferred this issue from nasa/cFS Oct 16, 2020
@skliper skliper added this to the 7.0.0 milestone Oct 29, 2020
@skliper skliper added the bug label Oct 29, 2020
@jphickey
Copy link
Contributor

jphickey commented Nov 2, 2020

Dug into this one today --- I believe this is an artifact of using RTLD_GLOBAL flag when loading libraries.

I confirmed that there is no problem with the way CFE is invoking OSAL here - it is indeed calling OS_ModuleUnload() on the old module - which is calling dlclose(), and calling OS_ModuleLoad() on the new module, with the correct filename, and calling dlopen(), and everything is correct. But the subsequent dlsym() call still returns the same entry point from the original module.

This likely is related to the way dlclose() actually works on GNU/Linux. Specifically - this function is more of a request. See official documentation here: https://pubs.opengroup.org/onlinepubs/009695399/functions/dlclose.html

Specifically this says:

The use of dlclose() reflects a statement of intent on the part of the process, but does not create any requirement upon the implementation, such as removal of the code or symbols referenced by handle.

Clearly, observations strongly suggest that the original module must have still has some references/relocations pointing to it, such that the runtime loader never unloads the original module, despite it being "closed".

BUT - by using RTLD_LOCAL instead of RTLD_GLOBAL flag in dlopen(), this makes it so symbols in the module are not able to be referenced by other entities. I was able to confirm that this does make it so the module is consistently unloaded with dlclose() as expected.

So using this flag solves the reload issue for apps but creates another problem on libraries - the libraries DO need to be loaded with RTLD_GLOBAL because the symbols in libs need to be available globally (that's the point). So we need a way to use RTLD_LOCAL flag on apps and RTLD_GLOBAL on libs (which are never reloaded, so its not an issue).

This may necessitate a new API variant of OS_ModuleLoad that can provide a hint to OSAL whether it is being loaded as a library (where things may reference it, and not likely to be unloaded) vs. an app (things should not reference it, thereby ensure it is possible to unload).

@jphickey jphickey added the CCB:Ready Ready for discussion at the Configuration Control Board (CCB) label Nov 2, 2020
@jphickey
Copy link
Contributor

jphickey commented Nov 2, 2020

My proposal is that we add something like:

#define OS_MODULE_FLAG_LOCAL   0x01
#define OS_MODULE_FLAG_GLOBAL  0x00

int32 OS_ModuleLoadWithFlags(osal_id_t *module_id, const char *module_name, const char *filename, uint32 flags);`

My proposal is to make GLOBAL mode the default (unlike POSIX) because that is the only thing that RTEMS and VxWorks do, and it is the current behavior.

CFE can then invoke OS_ModuleLoadWithFlags, using OS_MODULE_FLAG_LOCAL for apps, and GLOBAL for libraries.
The existing OS_ModuleLoad can become an alias for flags=GLOBAL.

@skliper
Copy link
Contributor

skliper commented Nov 3, 2020

Sounds good to me. @acudmore?

@jphickey
Copy link
Contributor

jphickey commented Nov 3, 2020

Maybe the terms LOCAL/GLOBAL aren't the best - we could also use a more CFE-ish term like "LIBRARY" or "APPLICATION"?

This also may necessitate a variant of OS_SymbolLookup() that accepts a module ID to find the symbol in. This would be the only way to look up the entry point in a LOCAL/APPLICATION style module, as the symbols won't be found in the global symbol table (by definition).

@jphickey
Copy link
Contributor

jphickey commented Nov 3, 2020

Also worth noting - pretty much every other resource type already has a flags option on its respective "create" call ... except for modules, for some reason....?

Perhaps just to make things consistent and simpler we should just add a flags option to OS_ModuleLoad() - which would be a breaking change of course, but it'll make it consistent, and avoid the need for an ugly wrapper/compatibility API.

I'll make a PR that does this, and we can discuss.

@astrogeco astrogeco added CCB-20201104 and removed CCB:Ready Ready for discussion at the Configuration Control Board (CCB) labels Nov 4, 2020
@astrogeco
Copy link
Contributor

CCB 2020-11-04

  • This is probably limited glibc implementation in POSIX

jphickey added a commit to jphickey/cFE that referenced this issue Dec 7, 2020
Set the newly-added flags parameter on the OS_ModuleLoad
properly to allow an app to be properly unloaded, which
in turn allows the reload command to work as expected.
jphickey added a commit to jphickey/cFE that referenced this issue Dec 7, 2020
Set the newly-added flags parameter on the OS_ModuleLoad
properly to allow an app to be properly unloaded, which
in turn allows the reload command to work as expected.
astrogeco added a commit that referenced this issue Dec 18, 2020
Fix #952, OSAL module flags to permit app reload
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants