Deprecation warning for ucx_net_devices='auto' on UCX 1.11+ #681

Merged

Conversation

pentschev
Member

Add a deprecation warning for ucx_net_devices='auto' on UCX 1.11+, and fall back to UCX's default when that value is requested.

Using ucx_net_devices='auto' can cause hangs because the option would need to be updated to include both the IPoIB and MLX interfaces in UCX_NET_DEVICES. Given that UCX can now handle automatic CUDA<->IB mapping on its own, including on DGX A100 (which this option could not handle), there is no reason for us to keep supporting it, and deprecating it is the best approach at this time.
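
As a rough illustration of the behavior described above, here is a minimal sketch of a warn-and-fall-back check. It is not the code from this PR: the helper name `resolve_ucx_net_devices`, the version tuple argument, the warning text, and the use of `FutureWarning` are all assumptions made for the example.

```python
import warnings


def resolve_ucx_net_devices(ucx_net_devices, ucx_version):
    """Hypothetical helper, not the actual dask-cuda implementation.

    ucx_version is assumed to be a (major, minor, patch) tuple.
    Returns None when UCX_NET_DEVICES should be left unset so that
    UCX applies its own default device selection.
    """
    if ucx_net_devices == "auto" and ucx_version >= (1, 11, 0):
        # Warn the caller, then ignore 'auto' and defer to UCX.
        warnings.warn(
            "ucx_net_devices='auto' is deprecated on UCX 1.11+; "
            "falling back to UCX's default device selection.",
            FutureWarning,
            stacklevel=2,
        )
        return None
    # Older UCX versions and explicit device names pass through unchanged.
    return ucx_net_devices
```

In this sketch, a caller would only export UCX_NET_DEVICES when the helper returns a value other than None; how the running UCX version is obtained is left out on purpose.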

@pentschev pentschev requested a review from a team as a code owner July 22, 2021 18:37
@github-actions github-actions bot added the python (python code needed) label Jul 22, 2021
@github-actions github-actions bot added the doc (Documentation) label Jul 22, 2021
@pentschev pentschev added the 3 - Ready for Review (Ready for review by team) and non-breaking (Non-breaking change) labels Jul 22, 2021
@pentschev
Member Author

cc @charlesbluca this should prevent UCX from hanging even when specifying --ucx-net-devices auto on UCX 1.11 and newer.

@pentschev
Member Author

@ayushdg @beckernick @VibhuJawa FYI

@quasiben
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 14b8d63 into rapidsai:branch-21.08 Jul 23, 2021
@pentschev pentschev deleted the ucx-net-devices-auto-warning branch August 12, 2021 16:04
Labels: 3 - Ready for Review (Ready for review by team), doc (Documentation), non-breaking (Non-breaking change), python (python code needed)