Fix a segmentation fault due to unloaded libcufile #158
Do not unload the cufile (GDS) library, because `libcufile` registers a cleanup function with `atexit()`, and unloading the library causes a segfault at exit when that cleanup function, which no longer exists, is called.

It turns out that CUDA 11.5 bundles the GDS library (`libcufile`), and it is available in GPUCI's build/test container image (such as `gpuci/rapidsai:21.12-cuda11.5-devel-ubuntu18.04-py3.7`). cuCIM dynamically loads the `libcufile.so` shared library and unloads it when a global static variable in cuCIM is destroyed. Since libcufile's cleanup function (through an atexit_thread handler) is registered after libcufile is loaded, a segmentation fault occurs at exit if libcufile is explicitly unloaded through `dlclose()` (see #153).
Related discussions can be found by searching for the keywords `atexit in dynamically loaded shared library` (the actual root cause is the use of a `thread_local` variable in `libcufile.so`). Using the `destructor` attribute might fix the issue on the GDS (cufile) side.
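To illustrate why a `thread_local` variable ties a library's code to thread exit, here is a minimal, self-contained C++ sketch. It is not libcufile code: `Tracker`, `g_destroyed`, and `run_worker_and_count` are hypothetical names. A `thread_local` object with a non-trivial destructor has that destructor registered per thread and run when the thread exits, so the code implementing the destructor must still be available at that point. This is, per the update below, what makes unloading `libcufile.so` unsafe.

```cpp
#include <atomic>
#include <thread>

// Hypothetical counter: how many per-thread Tracker destructors have run.
std::atomic<int> g_destroyed{0};

// Hypothetical type standing in for any thread_local object with a
// non-trivial destructor (not a libcufile type).
struct Tracker {
    ~Tracker() { g_destroyed.fetch_add(1); }  // runs when the owning thread exits
    void touch() {}                           // first use constructs this thread's instance
};

// One instance per thread, created on first use in that thread.
thread_local Tracker t_tracker;

// Starts a thread that touches its own Tracker, waits for it to finish,
// and returns how many Tracker destructors have run so far.
int run_worker_and_count() {
    std::thread worker([] { t_tracker.touch(); });
    worker.join();  // the worker's thread_local destructor has run by now
    return g_destroyed.load();
}
```

The main thread never touches `t_tracker`, so only the worker's instance is constructed and destroyed; calling `run_worker_and_count()` once returns 1.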
This patch leaves the libcufile library loaded, without calling `::dlclose(library_handle)` to unload `libcufile.so`.

Update (2021-11-20): I couldn't find `atexit()` used in libcufile (though I can see an `atexit()` call in an executable file [fio]), but it seems that a method is registered and called at exit time, so we have no choice but to leave the dynamically loaded library loaded.

Update (2021-11-23): `libcufile.so` has used a `thread_local` variable since v1.1, which prevents the shared library from being safely unloaded. For this reason, this patch is the correct fix until libcufile is updated to make unloading possible.
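The approach described above can be sketched as a small C++ holder whose destructor intentionally skips `::dlclose()`. This is a minimal sketch under assumptions, not cuCIM's actual code: the `DynamicLibrary` class, its methods, and the choice of `RTLD_NOW | RTLD_LOCAL` flags are illustrative. Only `dlopen()` and `dlsym()` from `<dlfcn.h>` are used.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical holder for a dynamically loaded library handle, modeled on
// the idea of a global static that owns the libcufile handle.
class DynamicLibrary {
public:
    explicit DynamicLibrary(const std::string& name)
        : handle_(::dlopen(name.c_str(), RTLD_NOW | RTLD_LOCAL)) {}

    // Deliberately does NOT call ::dlclose(handle_): the library may still
    // have per-thread or exit-time cleanup that runs after this destructor,
    // so its code must stay mapped until the process exits.
    ~DynamicLibrary() = default;

    DynamicLibrary(const DynamicLibrary&) = delete;
    DynamicLibrary& operator=(const DynamicLibrary&) = delete;

    // True if dlopen() succeeded.
    bool is_loaded() const { return handle_ != nullptr; }

    // Looks up a symbol; returns nullptr if the library or symbol is missing.
    void* symbol(const char* name) const {
        return handle_ ? ::dlsym(handle_, name) : nullptr;
    }

private:
    void* handle_;  // intentionally never released; freed when the process exits
};
```

A process-wide instance such as `static DynamicLibrary cufile("libcufile.so");` would keep the library mapped for the lifetime of the process. The trade-off is a handle that is never explicitly released; on older glibc versions, linking may also require `-ldl`.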
Related information: