[XPU] Fix the bug of LoRA logits on the XPU platform#24081
jikunshang merged 1 commit into vllm-project:main
Conversation
Code Review
This pull request introduces support for multi-LoRA on the XPU platform. The changes are well targeted and align with existing implementations for other backends such as TPU. Key changes include updating the LoRA layers to handle XPU-specific logic, modifying the Punica wrapper for XPU to correctly handle sampler indices, and disabling torch.compile for LoRA on XPU as a safeguard. The PR also fixes a correctness issue in add_lora_logits, ensuring the sampler indices have the correct size. Overall, the changes look good and are a solid step towards enabling multi-LoRA on XPU.
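The torch.compile safeguard mentioned above can be sketched as follows. This is a hedged, standalone illustration: the function name and boolean flags are stand-ins for vLLM's platform and LoRA-config checks, not its actual API.

```python
def should_use_torch_compile(lora_enabled: bool, is_xpu: bool) -> bool:
    """Return False when compiling would be unsafe: LoRA active on XPU.

    Illustrative stand-in for the guard described in the review; in vLLM
    the equivalent decision involves the platform abstraction and the
    LoRA config rather than two plain booleans.
    """
    if lora_enabled and is_xpu:
        return False  # safeguard: disable torch.compile for LoRA on XPU
    return True

print(should_use_torch_compile(True, True))   # False: LoRA on XPU
print(should_use_torch_compile(True, False))  # True: LoRA on another backend
```

Keeping the guard a pure predicate like this makes the policy easy to test independently of any compilation machinery.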
The change from self.sampler_indices to a sliced version of self._sampler_indices is a crucial correctness fix. The bgmv_shrink and bgmv_expand kernels expect the lora_indices_tensor to match the number of tokens in the input tensor x. The original implementation using self.sampler_indices could lead to a size mismatch, as its length is based on the number of sequence groups, not tokens. This change ensures the indices tensor has the correct size, making the implementation more robust and correct. It also aligns this logic with the TPU backend implementation.
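The size contract the reviewer describes can be illustrated with a small standalone sketch. Plain Python lists stand in for tensors here, and the buffer contents and token count are hypothetical values, not the wrapper's real state:

```python
def slice_sampler_indices(sampler_indices_padded, num_tokens):
    """One LoRA index per input token, as bgmv_shrink/bgmv_expand expect.

    The buggy path handed the kernels an indices tensor sized by the
    number of sequence groups; slicing the padded buffer down to
    `num_tokens` restores the size contract with the input tensor x.
    """
    if len(sampler_indices_padded) < num_tokens:
        raise ValueError("index buffer shorter than the token count")
    return sampler_indices_padded[:num_tokens]

# Padded scratch buffer (-1 marks unused slots); 4 tokens in the batch.
_sampler_indices = [0, 0, 1, 1, -1, -1, -1, -1]
lora_ids = slice_sampler_indices(_sampler_indices, 4)
print(lora_ids)  # [0, 0, 1, 1]
```

The padding values never reach the kernel because the slice length is driven by the token count, which is the same invariant the TPU backend relies on.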
jeejeelee left a comment:
The changes are relatively localized. I assume you have already verified it on XPU.
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Purpose
Fix the bug of LoRA logits on the XPU platform
Test Plan
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 VLLM_WORKER_MULTIPROC_METHOD=spawn python3 examples/offline_inference/multilora_inference.py