Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[v1.9.x] stop closing opened libs #20523

Merged
merged 3 commits into from
Aug 18, 2021
Merged

Conversation

samskalicky
Copy link
Contributor

@samskalicky samskalicky commented Aug 12, 2021

Description

stop closing opened libs. fixes #20411 using portions already merged into master/2.0 in #19016.

According to [1]

The dlclose() function shall inform the system that the object referenced by a handle returned from a previous dlopen() invocation is no longer needed by the application.

The use of dlclose() reflects a statement of intent on the part of the process, but does not create any requirement upon the implementation, such as removal of the code or symbols referenced by handle.

So calling dlclose is just a suggestion to the OS that it could close the library. But the OS may not do anything (ie. and wait until the process exits to clean up any opened handles). Im thinking we might as well just let it be cleaned up anyway rather than calling dlclose at all.

As suggested in [2]

If you only ever open one library, use it throughout your program, then calling dlclose just before you exit is probably not essential, but if you open a lot of libraries (e.g. using some sort of plugin in a long-running program that can/will use many different plugins, the program may run out of virtual address space if you don't call dlclose.

(All shared libraries are closed on exit anyway, so leaving it open at exit should not be an issue)

So if a user was loading/unloading a bunch of libraries at runtime they might want to close some they dont need anymore.

However, since MXNet registers components from the library (ie. operators, partitioners, graph passes, etc) and has NO capability to unregister them the references to the loaded library will live at least as long as MXNet (ie. libmxnet.so) does. MXNet already has quite a few components (or its dependencies like TVM/NNVM) that live forever and require the OS to clean them up when the process exits. So its not possible to close libmxnet.so and assume everything is cleaned up. We might as well do the same thing for custom libraries.

This PR removes the code to close opened libraries in the destructor of the LibraryInitializer class with the caveat that MXNet will still have pointers to that library, so closing would not be recommended until process exit anyway.

@mxnet-bot
Copy link

Hey @samskalicky , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [windows-cpu, unix-gpu, centos-cpu, edge, windows-gpu, unix-cpu, clang, miscellaneous, centos-gpu, website, sanity]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@samskalicky samskalicky changed the base branch from master to v1.9.x August 12, 2021 23:34
@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress and removed pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress labels Aug 12, 2021
@mseth10 mseth10 added the pr-awaiting-review PR is waiting for code review label Aug 13, 2021
@samskalicky
Copy link
Contributor Author

Manual build on Mac runs the gemm_lib example in extensions without segfault

* \return 0 when success, -1 when failure happens.
*/
MXNET_DLL int MXLoadLib(const char *path, unsigned verbose);
MXNET_DLL int MXLoadLib(const char *path, unsigned verbose, void** lib);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a C API change. Is it required for the fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @szha! No this change isnt required, but without it we cannot return a handle to the opened library. Its not required per-se since opening the same library using ctypes or some other API will return the same handle anyway (with linux doing the mapping). So we can remove this change if that is important, either way is fine to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C API is part of the API to maintain semantic versioning for. Let's not include the API change then.

@mseth10
Copy link
Contributor

mseth10 commented Aug 17, 2021

Thanks for the PR. The destructor change looks good.

What's the benefit of returning handle to opened library, if it's not recommended that users close them? Is there any other use case?

@mseth10 mseth10 added pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-awaiting-review PR is waiting for code review labels Aug 18, 2021
@samskalicky
Copy link
Contributor Author

samskalicky commented Aug 18, 2021

Thanks @szha @mseth10 ive made the changes to not return the handle. Now this PR is effectively just not closing opened libraries on exit (and some misc comment updates)

@mseth10 mseth10 added pr-work-in-progress PR is still work in progress pr-awaiting-testing PR is reviewed and waiting CI build and test and removed pr-awaiting-testing PR is reviewed and waiting CI build and test pr-work-in-progress PR is still work in progress labels Aug 18, 2021
Copy link
Contributor

@mseth10 mseth10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mseth10 mseth10 added pr-awaiting-merge Review and CI is complete. Ready to Merge and removed pr-awaiting-testing PR is reviewed and waiting CI build and test labels Aug 18, 2021
@mseth10 mseth10 merged commit b5e9ae8 into apache:v1.9.x Aug 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-merge Review and CI is complete. Ready to Merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1.7.0.post2: consistent segfault in ~LibraryInitializer() with loaded extensions on OSX Mojave
4 participants