-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[cublas] Fix race condition in cublas handle deletion #136
Conversation
@npmiller have you applied clang-format? |
I thought I did but maybe it missed the specific repo configuration, I believe it should be fixed now. |
This was causing segmentation faults at the end of the program when running tests. Cublas handles need to be created per thread and per context, therefore they're currently being stored in a `static thread_local` map, mapping contexts to cublas handles. These handles are deleted in two places, the map destructor called when the thread is deleted, or the context destructor, where a callback is registered on the context to delete the cublas handle. Since there is two places where a handle can be deleted and that these two will likely run in separate threads, the handle was placed in an C++ atomic to avoid race conditions, and allocated on the heap so that it can stay alive when one of either the thread or the context is deleted. But there was two issues with this, the main one is that the callback registered on the context was not given the pointer to the atomic handle, rather the pointer to the slot in the `thread_local` map where the atomic pointer was stored. Which meant that whenever the thread would be deleted before the context, the pointer available to the callback would be invalid as the map was deleted. The solution for that is to simply pass directly the pointer to the atomic handle to the callback. The second issue is that both of these places were deleting the atomic in all cases, which means one of the deletion was invalid. The solution for this is to only do the deletion in the last one being called, that is to say the one which will get a `nullptr` when atomically accessing the handle pointer. With these two patches the tests are running fine.
865943c
to
3c1da73
Compare
`half` has been removed from the global scope in DPC++, see intel/llvm#4818
I also had to rebase this since the hipSYCL changes moved some of the code changed in here, I've attached an updated log file: And I also had to extend the hipSYCL workaround for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@npmiller Thanks a lot for the PR and cleaning the random segfaults I was seeing. It looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thank you!
) * [cublas] Fix race condition in cublas handle deletion This was causing segmentation faults at the end of the program when running tests. Cublas handles need to be created per thread and per context, therefore they're currently being stored in a `static thread_local` map, mapping contexts to cublas handles. These handles are deleted in two places, the map destructor called when the thread is deleted, or the context destructor, where a callback is registered on the context to delete the cublas handle. Since there is two places where a handle can be deleted and that these two will likely run in separate threads, the handle was placed in an C++ atomic to avoid race conditions, and allocated on the heap so that it can stay alive when one of either the thread or the context is deleted. But there was two issues with this, the main one is that the callback registered on the context was not given the pointer to the atomic handle, rather the pointer to the slot in the `thread_local` map where the atomic pointer was stored. Which meant that whenever the thread would be deleted before the context, the pointer available to the callback would be invalid as the map was deleted. The solution for that is to simply pass directly the pointer to the atomic handle to the callback. The second issue is that both of these places were deleting the atomic in all cases, which means one of the deletion was invalid. The solution for this is to only do the deletion in the last one being called, that is to say the one which will get a `nullptr` when atomically accessing the handle pointer. With these two patches the tests are running fine. * [cublas] Fix formatting of cublas handle patch * Fix half types for DPC++ `half` has been removed from the global scope in DPC++, see intel/llvm#4818
This was causing segmentation faults at the end of the program when
running tests.
Cublas handles need to be created per thread and per context, therefore
they're currently being stored in a
static thread_local
map, mappingcontexts to cublas handles.
These handles are deleted in two places, the map destructor called when
the thread is deleted, or the context destructor, where a callback is
registered on the context to delete the cublas handle.
Since there is two places where a handle can be deleted and that these
two will likely run in separate threads, the handle was placed in an
C++ atomic to avoid race conditions, and allocated on the heap so that
it can stay alive when one of either the thread or the context is
deleted.
But there was two issues with this, the main one is that the callback
registered on the context was not given the pointer to the atomic
handle, rather the pointer to the slot in the
thread_local
map wherethe atomic pointer was stored. Which meant that whenever the thread
would be deleted before the context, the pointer available to the
callback would be invalid as the map was deleted. The solution for that
is to simply pass directly the pointer to the atomic handle to the
callback.
The second issue is that both of these places were deleting the atomic
in all cases, which means one of the deletion was invalid. The solution
for this is to only do the deletion in the last one being called, that
is to say the one which will get a
nullptr
when atomically accessingthe handle pointer.
With these two patches the tests are running fine.
Here's a log file running the cublas tests:
cublas.log