Handle UCXNotConnected error #5449

Merged: 1 commit, Oct 21, 2021

pentschev
Member

This is an issue that occurs non-deterministically, generally during
`client._reconnect` at shutdown, when peers have already begun closing.
It therefore needs to be caught and re-raised as `CommClosedError` to
prevent unhandled errors.

Given the non-deterministic nature of the issue, testing it is very challenging and unfortunately there's no known test that can be added currently.

The error handling depends on rapidsai/ucx-py#799, but the change is backwards-compatible.
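
As a rough illustration of the pattern described above (not the actual diff in this PR), the sketch below catches a not-connected error and re-raises it as a closed-comm error, and shows one way such handling could stay backwards-compatible when the exception class is missing. Everything here is a stand-in: the `CommClosedError` and `UCXNotConnected` classes are local placeholders rather than imports from `distributed` or UCX-Py, and `new_exceptions`, `old_exceptions`, `closed_errors`, `write`, `peer_gone`, and `ok` are hypothetical names invented for the example.

```python
import asyncio
from types import SimpleNamespace


class CommClosedError(IOError):
    """Local stand-in for the closed-comm error that callers already handle."""


class UCXNotConnected(Exception):
    """Local stand-in for the UCX-Py exception named in this PR."""


# Hypothetical exception namespaces: one mimics a UCX-Py release that
# defines UCXNotConnected, the other an older release that does not.
new_exceptions = SimpleNamespace(UCXNotConnected=UCXNotConnected)
old_exceptions = SimpleNamespace()


def closed_errors(exceptions_module):
    """Return the exceptions to translate, or an empty tuple if unavailable."""
    exc = getattr(exceptions_module, "UCXNotConnected", None)
    return (exc,) if exc is not None else ()


async def write(send, exceptions_module):
    """Await ``send`` and translate a not-connected error into CommClosedError."""
    try:
        return await send()
    except closed_errors(exceptions_module) as e:
        # An empty tuple in ``except`` matches nothing, so with the older
        # namespace this handler is simply inactive.
        raise CommClosedError("Connection closed by peer") from e


async def peer_gone():
    # Simulates a peer that has already begun closing.
    raise UCXNotConnected("endpoint is not connected")


async def ok():
    return 42


def demo():
    """Return the name of the original error wrapped by CommClosedError."""
    try:
        asyncio.run(write(peer_gone, new_exceptions))
    except CommClosedError as e:
        return type(e.__cause__).__name__
    return None
```

Looking the class up with `getattr` is just one way to tolerate a release that lacks it; the real change may achieve backwards compatibility differently.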

@pentschev
Member Author

Thanks for the review, @quasiben. I don't think the errors are related: they're all on Windows runs, and this PR exclusively modifies UCX code, which isn't even supported on Windows.

@quasiben quasiben merged commit 918e3fb into dask:main Oct 21, 2021
@pentschev pentschev deleted the handle-ucx-not-connected branch October 27, 2021 11:57
zanieb pushed a commit to zanieb/distributed that referenced this pull request Oct 28, 2021
Handle UCXNotConnected error