
Make proxy tests with LocalCUDACluster asynchronous #1084

Merged: 5 commits merged Jan 16, 2023


@pentschev (Member) commented Jan 16, 2023

After dask/distributed#7429 was merged, some of these tests started hanging, and I confirmed that two threads were concurrently attempting to take the UCX spinlock and the GIL, which led to a deadlock. UCX-Py is currently not thread-safe and can indeed cause problems like this when two or more threads call communication routines that require the UCX spinlock. My theory is that the synchronous cluster performs communication on the main thread (in this case, the pytest thread) when shutting down, instead of only within the Distributed communication thread, and that this is likely why the tests hang.

Asynchronous Distributed clusters do not seem to cause any communication from the main thread, only from the communication thread as expected, so making the tests asynchronous should suffice to resolve the issue. In practice, it's unlikely that people will use synchronous Distributed clusters from the same process (as pytest does), so this is improbable in real use cases.
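The reasoning above depends on keeping every call into a non-thread-safe communication library on a single thread. Below is a minimal, stdlib-only sketch of that confinement pattern. It assumes nothing about how Distributed or UCX-Py implement it internally: `NotThreadSafeComm` and `CommThread` are hypothetical names, and other threads (including the main thread) hand work to the comm thread's event loop via `asyncio.run_coroutine_threadsafe` instead of calling the library directly.

```python
import asyncio
import threading


class NotThreadSafeComm:
    """Hypothetical stand-in for a non-thread-safe comm library.

    It records the identity of every thread that calls into it, so we can
    check afterwards that all calls stayed on one thread.
    """

    def __init__(self):
        self.calling_threads = set()

    def send(self, msg):
        self.calling_threads.add(threading.get_ident())
        return f"sent:{msg}"


class CommThread:
    """Hypothetical helper: own an event loop on a dedicated thread and
    route every comm call through that loop."""

    def __init__(self):
        self.comm = NotThreadSafeComm()
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.thread.start()

    async def _send(self, msg):
        # Runs on the loop thread, so this is the only place comm is touched.
        return self.comm.send(msg)

    def send_from_any_thread(self, msg):
        # Schedule the coroutine on the comm thread's loop and block until
        # it finishes; the calling thread never touches self.comm itself.
        future = asyncio.run_coroutine_threadsafe(self._send(msg), self.loop)
        return future.result()

    def close(self):
        # Stop the loop from outside its thread, then clean up.
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()
```

Calling `send_from_any_thread` from the main thread and from several worker threads still leaves `comm.calling_threads` containing only the loop thread's identifier. A caller that invoked `comm.send` directly from the main thread would break that invariant, which is the kind of cross-thread access the description above suspects during synchronous cluster shutdown.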

@github-actions (bot) added the python label Jan 16, 2023
@pentschev added the bug, 2 - In Progress, and non-breaking labels Jan 16, 2023
@codecov-commenter commented Jan 16, 2023

Codecov Report

Base: 0.00% // Head: 87.04% // Increases project coverage by +87.04% 🎉

Coverage data is based on head (d71a5b9) compared to base (8c87288).
Patch coverage: 93.58% of modified lines in pull request are covered.

❗ Current head d71a5b9 differs from pull request most recent head 4e29b37. Consider uploading reports for the commit 4e29b37 to get more accurate results.

Additional details and impacted files
@@                Coverage Diff                @@
##           branch-23.02    #1084       +/-   ##
=================================================
+ Coverage          0.00%   87.04%   +87.04%     
=================================================
  Files                26       18        -8     
  Lines              3439     2300     -1139     
=================================================
+ Hits                  0     2002     +2002     
+ Misses             3439      298     -3141     
Impacted Files Coverage Δ
dask_cuda/utils.py 83.43% <82.35%> (+83.43%) ⬆️
dask_cuda/explicit_comms/dataframe/shuffle.py 98.65% <95.91%> (+98.65%) ⬆️
dask_cuda/cuda_worker.py 77.92% <100.00%> (+77.92%) ⬆️
dask_cuda/explicit_comms/comms.py 99.05% <100.00%> (+99.05%) ⬆️
dask_cuda/get_device_memory_objects.py 92.85% <100.00%> (+92.85%) ⬆️
dask_cuda/local_cuda_cluster.py 89.01% <100.00%> (+89.01%) ⬆️
dask_cuda/proxify_host_file.py 93.71% <100.00%> (+93.71%) ⬆️
dask_cuda/benchmarks/common.py
dask_cuda/benchmarks/local_cudf_groupby.py
dask_cuda/benchmarks/local_cudf_shuffle.py
... and 22 more


@pentschev changed the title from "Make proxy UCX tests async" to "Make proxy tests with LocalCUDACluster asynchronous" Jan 16, 2023
@pentschev added the 3 - Ready for Review label and removed the 2 - In Progress label Jan 16, 2023
@pentschev marked this pull request as ready for review January 16, 2023 15:43
@pentschev requested a review from a team as a code owner January 16, 2023 15:43
@madsbk (Member) left a comment


Awesome, thanks @pentschev !

@pentschev (Member Author)

/merge

@rapids-bot (bot) merged commit 2eee5eb into rapidsai:branch-23.02 Jan 16, 2023

@pentschev (Member Author)

Thanks @madsbk for the review/approval!

@pentschev deleted the make-proxy-tests-async branch February 28, 2023 08:35
Labels: 3 - Ready for Review (Ready for review by team), bug (Something isn't working), non-breaking (Non-breaking change), python (python code needed)