Skip to content

Conversation

@rafbiels
Copy link
Contributor

@rafbiels rafbiels commented Aug 15, 2024

Do not increment NumComputeStreams / NumTransferStreams before cuStreamCreateWithPriority returns. Too early increment caused other threads to read the incremented count before a CUDA stream was created and try to use an invalid stream handle, causing crashes.

The construction:

if (condition) {
  lock_this_scope
  if (condition) {
    create_object
    update_condition
  }
}
use_object

is only thread-safe if update_condition happens after create_object is completed. This PR ensures the ordering.

intel/llvm PR: intel/llvm#15100

@rafbiels
Copy link
Contributor Author

Reproducer for the crashes:

#include <future>
#include <sycl/sycl.hpp>

int main() {
  sycl::queue q{};

  constexpr static unsigned int numThreads{128};
  std::vector<std::future<void>> futures;
  std::array<unsigned int, numThreads> hostData{};
  std::array<unsigned int*, numThreads> devicePointers{};
  futures.reserve(numThreads);

  for (unsigned int i{0}; i<numThreads; ++i)  {
    devicePointers[i] = sycl::malloc_device<unsigned int>(1, q);
  }

  for (unsigned int i{0}; i<numThreads; ++i)  {
    futures.push_back(std::async([&q, &devicePointers, &hostData, i](){
      q.copy(hostData.data()+i, devicePointers[i], 1);
    }));
  }

  for (unsigned int i{0}; i<numThreads; ++i)  {
    futures[i].wait();
  }
  futures.clear();

  q.wait_and_throw();

  for (unsigned int i{0}; i<numThreads; ++i)  {
    sycl::free(devicePointers[i], q);
  }

}

This crashes for me in ~50% of runs before this PR with:

UR CUDA ERROR:
        Value:           400
        Name:            CUDA_ERROR_INVALID_HANDLE
        Description:     invalid resource handle
        Function:        urEnqueueUSMMemcpy

Copy link
Contributor

@hdelan hdelan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Tricky one

@kbenzie kbenzie added the ready to merge Added to PR's which are ready to merge label Aug 16, 2024
Do not increment NumComputeStreams / NumTransferStreams
before cuStreamCreateWithPriority returns. Too early increment
caused other threads to read the incremented count before a CUDA
stream was created and try to use an invalid stream handle,
causing crashes.

The construction:
```
if (condition) {
  lock_this_scope
  if (condition) {
    create_object
    update_condition
  }
}
use_object
```
is only thread-safe if update_condition happens after create_object
is completed.
@omarahmed1111 omarahmed1111 force-pushed the rafbiels/cuda-stream-race-cond branch from 24b6ff7 to 15bca3b Compare August 19, 2024 13:14
@github-actions github-actions bot added the cuda CUDA adapter specific issues label Aug 19, 2024
@omarahmed1111 omarahmed1111 merged commit cabf128 into oneapi-src:main Aug 19, 2024
@npmiller
Copy link
Contributor

npmiller commented Aug 19, 2024

Should we add this to v0.10.0?

steffenlarsen pushed a commit to intel/llvm that referenced this pull request Aug 20, 2024
Fix race condition in CUDA stream creation in the UR CUDA adapter

See oneapi-src/unified-runtime#1984
@kbenzie kbenzie added the v0.10.x Include in the v0.10.x release label Aug 20, 2024
kbenzie pushed a commit that referenced this pull request Aug 20, 2024
Fix race condition in CUDA stream creation
@kbenzie kbenzie mentioned this pull request Aug 20, 2024
53 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda CUDA adapter specific issues ready to merge Added to PR's which are ready to merge v0.10.x Include in the v0.10.x release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants