[Platform] Add current_platform.num_compute_units interface#35042
vllm-bot merged 10 commits into vllm-project:main from ...
Conversation
Code Review
The pull request introduces a new get_num_sm interface to abstract the retrieval of SM (Streaming Multiprocessor) counts across different platforms (CUDA, ROCm, XPU). This change unifies the way SM counts are obtained, making the codebase cleaner and more extensible for non-CUDA hardware. The existing torch.cuda.get_device_properties().multi_processor_count calls are replaced with the new platform-agnostic interface. This is a good step towards improving platform independence and maintainability.
Thanks @jikunshang! LGTM, but I wonder if we should name it something more generic, since SM is a CUDA term AFAIK. Maybe ...
Hi @jikunshang, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.
@njhill seems ...
njhill left a comment
I prefer to rename to get_num_compute_units, thoughts?
How about just compute_unit_count, since there is already device_count?
Or if not, then I think just num_compute_units would be better.
```diff
 if allspark_supported:
-    properties = torch.cuda.get_device_properties(b.device.index)
-    sm_count = properties.multi_processor_count
+    sm_count = current_platform.get_num_compute_units(b.device.index)
```
Probably doesn't make sense to change it in places like this, which are CUDA-specific anyway.
I think it's OK to always use the current_platform API here. We should avoid/reduce using torch.cuda APIs, since we are proposing this RFC (#30679) (though non-CUDA platforms will never fall into the current code path).

My understanding is that the property is used later to check CUDA capability. I feel we could also refactor that part into something like current_platform.supported_arch.
I just think that in places like this we are already using torch.cuda.get_device_properties (the line above), so it's better to keep the code consistent (like you say, it can always be refactored for portability in the future if/when appropriate).

Similarly, if it's in CUDA-, ROCm-, or XPU-specific files/code, then I don't think there's a need to use the current_platform version; in fact, it may be better not to, since it implies the code could be cross-platform, which could be misleading.
Got it, reverted :)
I feel `count` is for something countable where you can choose whether to use it, while we always want to use all compute units. So let's use ...
Also remove ...
Nice catch, done!
Did this PR break Distributed Tests (2 GPUs)?
torch.cuda.current_device() returns an int directly, not a device object with an .index attribute. This was introduced in vllm-project#35042.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Oh yes, sorry for that, and thanks @LucasWilkinson for fixing!
…ject#35042) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
…ect#35042 (vllm-project#37764) Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Purpose

There are some torch.cuda.get_device_properties().multi_processor_count calls across the vLLM code base. We can unify them into a current_platform.num_compute_units interface to make the code clean and extensible for non-CUDA hardware like XPU and NPU.

Test Plan
Test Result