[bugfix]put nccl window register/deregister behind cuda platform by Amir-19 · Pull Request #25608 · vllm-project/vllm

Amir-19 · 2025-09-24T20:59:47Z

Purpose

Put the NCCL window register/deregister functions behind a CUDA platform check, so they are only loaded when running on CUDA.

Test Plan

Test Result

Signed-off-by: Amir Samani <asamani@nvidia.com>

gemini-code-assist

Code Review

This pull request aims to conditionally load NCCL window register/deregister functions only on the CUDA platform. The implementation correctly separates the CUDA-specific functions but introduces a critical caching bug where the function cache is not platform-aware. This can lead to AttributeErrors if the cache is populated on a non-CUDA platform first. Additionally, there is a minor typo in a new variable name. I've provided suggestions to fix both issues.
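The caching concern raised in this review can be illustrated with a small sketch. It assumes the loaded function table is cached per library path, and shows one possible way to make that cache platform-aware by including the platform flag in the key. The names `load_functions`, `resolve`, and `_func_cache` are hypothetical and are not taken from `pynccl_wrapper.py`; the symbol names in the usage note are illustrative strings only.

```python
from typing import Any, Callable, Dict, Iterable, Tuple

# Hypothetical cache: keyed by (library path, is_cuda) instead of by path alone,
# so a table built without the CUDA-only entries is never reused on CUDA.
_func_cache: Dict[Tuple[str, bool], Dict[str, Callable[..., Any]]] = {}


def load_functions(
    so_file: str,
    is_cuda: bool,
    base_names: Iterable[str],
    cuda_only_names: Iterable[str],
    resolve: Callable[[str], Callable[..., Any]],
) -> Dict[str, Callable[..., Any]]:
    """Return the function table for this library and platform, building it once."""
    key = (so_file, is_cuda)
    if key not in _func_cache:
        names = list(base_names)
        if is_cuda:
            # Only request the window register/deregister entries on CUDA.
            names += list(cuda_only_names)
        _func_cache[key] = {name: resolve(name) for name in names}
    return _func_cache[key]
```

With a stand-in `resolve` callable, `load_functions("libnccl.so", False, ...)` and `load_functions("libnccl.so", True, ...)` return separate tables, and only the second contains the CUDA-only entries.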

vllm/distributed/device_communicators/pynccl_wrapper.py

gshtras · 2025-09-24T21:27:13Z

vllm/distributed/device_communicators/pynccl_wrapper.py

            raise e
-
+        function_specs = list(NCCLLibrary.exported_functions)
+        if current_platform.is_cuda():


These functions should exist on both platforms in NCCL >= 2.27.03.
The original issue was that on ROCm, trying to import a non-existent symbol causes a crash, while on CUDA (apparently) it does not.
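An alternative to gating on the platform, in line with the comment above, is to tolerate missing symbols at lookup time. The following is a minimal sketch under the assumption that the library object behaves like a `ctypes.CDLL`, where accessing an undefined symbol as an attribute raises `AttributeError`. The helper `bind_symbols` and its parameters are hypothetical and do not reflect how #25605 actually resolved the issue.

```python
from typing import Any, Callable, Dict, Iterable


def bind_symbols(
    lib: Any,
    required: Iterable[str],
    optional: Iterable[str],
) -> Dict[str, Callable[..., Any]]:
    """Look up symbols on a ctypes-like library object.

    Required symbols must exist; a missing one raises AttributeError.
    Optional symbols are skipped when the library does not export them.
    """
    bound: Dict[str, Callable[..., Any]] = {}
    for name in required:
        # A missing required symbol is a real error, so let it propagate.
        bound[name] = getattr(lib, name)
    for name in optional:
        # getattr with a default swallows the AttributeError for absent symbols.
        fn = getattr(lib, name, None)
        if fn is not None:
            bound[name] = fn
    return bound
```

Callers would then check whether an optional entry is present in the returned table before using the corresponding feature, regardless of platform.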

Amir-19 · 2025-09-25T19:42:25Z

closed in favor of #25605

put nccl window register/deregister behind cuda platform

29fc66a

Signed-off-by: Amir Samani <asamani@nvidia.com>

Amir-19 changed the title ~~put nccl window register/deregister behind cuda platform~~ [bugfix]put nccl window register/deregister behind cuda platform Sep 24, 2025

gemini-code-assist bot reviewed Sep 24, 2025

vllm/distributed/device_communicators/pynccl_wrapper.py (outdated)

Amir-19 mentioned this pull request Sep 24, 2025

[Bugfix][ROCm] Fixing trying to import non-existent symbols from libnccl.so #25605

Merged

typo

5f7499c

Signed-off-by: Amir Samani <asamani@nvidia.com>

gshtras reviewed Sep 24, 2025


Amir-19 closed this Sep 25, 2025

Amir-19 wants to merge 2 commits into vllm-project:main from Amir-19:pynccl_fix
