Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cublas] Fix race condition in cublas handle deletion #136

Merged
merged 3 commits into from
Nov 17, 2021

Conversation

npmiller
Copy link
Contributor

This was causing segmentation faults at the end of the program when
running tests.

Cublas handles need to be created per thread and per context, therefore
they're currently being stored in a static thread_local map, mapping
contexts to cublas handles.

These handles are deleted in two places, the map destructor called when
the thread is deleted, or the context destructor, where a callback is
registered on the context to delete the cublas handle.

Since there is two places where a handle can be deleted and that these
two will likely run in separate threads, the handle was placed in an
C++ atomic to avoid race conditions, and allocated on the heap so that
it can stay alive when one of either the thread or the context is
deleted.

But there was two issues with this, the main one is that the callback
registered on the context was not given the pointer to the atomic
handle, rather the pointer to the slot in the thread_local map where
the atomic pointer was stored. Which meant that whenever the thread
would be deleted before the context, the pointer available to the
callback would be invalid as the map was deleted. The solution for that
is to simply pass directly the pointer to the atomic handle to the
callback.

The second issue is that both of these places were deleting the atomic
in all cases, which means one of the deletion was invalid. The solution
for this is to only do the deletion in the last one being called, that
is to say the one which will get a nullptr when atomically accessing
the handle pointer.

With these two patches the tests are running fine.

Here's a log file running the cublas tests:
cublas.log

@mmeterel
Copy link
Contributor

@npmiller have you applied clang-format?

@mmeterel mmeterel self-requested a review November 15, 2021 20:50
@npmiller
Copy link
Contributor Author

@npmiller have you applied clang-format?

I thought I did but maybe it missed the specific repo configuration, I believe it should be fixed now.

This was causing segmentation faults at the end of the program when
running tests.

Cublas handles need to be created per thread and per context, therefore
they're currently being stored in a `static thread_local` map, mapping
contexts to cublas handles.

These handles are deleted in two places, the map destructor called when
the thread is deleted, or the context destructor, where a callback is
registered on the context to delete the cublas handle.

Since there is two places where a handle can be deleted and that these
two will likely run in separate threads, the handle was placed in an
C++ atomic to avoid race conditions, and allocated on the heap so that
it can stay alive when one of either the thread or the context is
deleted.

But there was two issues with this, the main one is that the callback
registered on the context was not given the pointer to the atomic
handle, rather the pointer to the slot in the `thread_local` map where
the atomic pointer was stored. Which meant that whenever the thread
would be deleted before the context, the pointer available to the
callback would be invalid as the map was deleted. The solution for that
is to simply pass directly the pointer to the atomic handle to the
callback.

The second issue is that both of these places were deleting the atomic
in all cases, which means one of the deletion was invalid. The solution
for this is to only do the deletion in the last one being called, that
is to say the one which will get a `nullptr` when atomically accessing
the handle pointer.

With these two patches the tests are running fine.
`half` has been removed from the global scope in DPC++, see
intel/llvm#4818
@npmiller
Copy link
Contributor Author

I also had to rebase this since the hipSYCL changes moved some of the code changed in here, I've attached an updated log file:

And I also had to extend the hipSYCL workaround for half as the type has also been removed from the global namespace in DPC++ (see: intel/llvm#4818)

@mmeterel mmeterel requested a review from mkrainiuk November 16, 2021 17:36
Copy link
Contributor

@mmeterel mmeterel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@npmiller Thanks a lot for the PR and cleaning the random segfaults I was seeing. It looks good to me.

Copy link
Contributor

@mkrainiuk mkrainiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you!

@mkrainiuk mkrainiuk merged commit a157c56 into uxlfoundation:develop Nov 17, 2021
sbalint98 pushed a commit to sbalint98/oneMKL that referenced this pull request Dec 6, 2021
)

* [cublas] Fix race condition in cublas handle deletion

This was causing segmentation faults at the end of the program when
running tests.

Cublas handles need to be created per thread and per context, therefore
they're currently being stored in a `static thread_local` map, mapping
contexts to cublas handles.

These handles are deleted in two places, the map destructor called when
the thread is deleted, or the context destructor, where a callback is
registered on the context to delete the cublas handle.

Since there is two places where a handle can be deleted and that these
two will likely run in separate threads, the handle was placed in an
C++ atomic to avoid race conditions, and allocated on the heap so that
it can stay alive when one of either the thread or the context is
deleted.

But there was two issues with this, the main one is that the callback
registered on the context was not given the pointer to the atomic
handle, rather the pointer to the slot in the `thread_local` map where
the atomic pointer was stored. Which meant that whenever the thread
would be deleted before the context, the pointer available to the
callback would be invalid as the map was deleted. The solution for that
is to simply pass directly the pointer to the atomic handle to the
callback.

The second issue is that both of these places were deleting the atomic
in all cases, which means one of the deletion was invalid. The solution
for this is to only do the deletion in the last one being called, that
is to say the one which will get a `nullptr` when atomically accessing
the handle pointer.

With these two patches the tests are running fine.

* [cublas] Fix formatting of cublas handle patch

* Fix half types for DPC++

`half` has been removed from the global scope in DPC++, see
intel/llvm#4818
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants