[LoRA] Support dual CUDA streams - Linear Layer #35721
DarkLight1337 merged 36 commits into vllm-project:main
Conversation
WIP. Triggering CI to test for any uncovered cases.
Code Review
This pull request introduces support for dual CUDA streams to enable overlapping base layer and LoRA computations, which is a great performance optimization. The implementation correctly sets up a custom PyTorch operation and an auxiliary stream. However, I've found a critical issue in the asynchronous implementation that serializes the computations, defeating the purpose of using dual streams. My review includes a comment with a suggested fix for this issue. The other changes related to plumbing for this feature seem correct.
```python
# LoRA stream waits for base layer output before reading.
self._lora_stream.wait_stream(current_stream())
```
The comment on line 212 is incorrect, and the wait_stream call on line 213 introduces a serialization point that prevents the intended overlap between the base layer and LoRA computations. The LoRA computation depends on the input x, not the output of the base layer. The wait on the current stream here forces the LoRA computation to wait until the base layer computation is complete, defeating the purpose of using a separate stream. Removing this wait will allow the two computations to run in parallel as intended.
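The corrected synchronization pattern the review describes can be sketched as follows. This is a hypothetical standalone helper, not vLLM's actual implementation: the names `dual_stream_lora_linear`, `base_weight`, `lora_a`, and `lora_b` are illustrative. The key point is that the LoRA stream waits only until the shared input `x` is ready on the main stream, so the base GEMM and the LoRA shrink/expand can overlap, and the main stream joins the LoRA stream once before the final add. A serial fallback keeps the math identical when CUDA is unavailable.

```python
import torch


def dual_stream_lora_linear(x, base_weight, lora_a, lora_b, lora_stream=None):
    """Sketch of overlapping the base GEMM and the LoRA branch on two streams.

    Computes x @ base_weight.T + (x @ lora_a.T) @ lora_b.T.
    Hypothetical helper for illustration only.
    """
    if lora_stream is None or not torch.cuda.is_available():
        # Serial fallback (CPU or single stream): same math, no overlap.
        base_out = x @ base_weight.t()
        lora_out = (x @ lora_a.t()) @ lora_b.t()
        return base_out + lora_out

    main = torch.cuda.current_stream()
    # The LoRA branch depends only on x, so wait for x to be produced on
    # the main stream -- not for the base layer output.
    lora_stream.wait_stream(main)
    with torch.cuda.stream(lora_stream):
        lora_out = (x @ lora_a.t()) @ lora_b.t()
    # Base GEMM runs concurrently on the main stream.
    base_out = x @ base_weight.t()
    # Join once before combining the two results.
    main.wait_stream(lora_stream)
    return base_out + lora_out
```

With the erroneous `wait_stream` on the base-layer output removed, the two matmuls are free to execute concurrently; the only required ordering is input-ready before the LoRA branch starts, and both branches done before the add.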
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
This pull request has merge conflicts that must be resolved before it can be merged.
Hi @jeejeelee, the pre-commit checks have failed. Please run:

```shell
uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files
```

Then, commit the changes and push to your branch.
Hi @jeejeelee, will you extend this to MoE LoRA layers in the future? Thanks!
varun-sundar-rabindranath
left a comment
Thanks @jeejeelee. Left some comments; generally looks good to me. Will take another look tomorrow.
@jhaotingc Yeah, I will
```python
VLLM_MEMORY_PROFILER_ESTIMATE_CUDAGRAPHS: bool = False
VLLM_NIXL_EP_MAX_NUM_RANKS: int = 32
VLLM_XPU_ENABLE_XPU_GRAPH: bool = False
VLLM_LORA_ENABLE_DUAL_STREAM: bool = False
```
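The new `VLLM_LORA_ENABLE_DUAL_STREAM` flag defaults to `False`, so the dual-stream path is opt-in. A minimal sketch of how such a boolean environment flag might be read (a hypothetical helper for illustration; not vLLM's actual env parsing code):

```python
import os


def lora_dual_stream_enabled() -> bool:
    """Return True if the dual-stream LoRA path is opted into via env var.

    Hypothetical helper: mirrors the common convention that unset or "0"
    means disabled, while "1"/"true" (case-insensitive) means enabled.
    """
    value = os.environ.get("VLLM_LORA_ENABLE_DUAL_STREAM", "0")
    return value.strip().lower() in ("1", "true")
```

Keeping the feature behind a default-off flag lets the dual-stream code path ship without changing behavior for existing deployments.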
A lot of the changes in this file appear to be linting-only. Is it maybe a linter version mismatch? Can you check, please? Thanks.
varun-sundar-rabindranath
left a comment
LGTM ! Thanks @jeejeelee .
Left a comment on linting in envs.py. PTAL. Thanks 🙌
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Purpose
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.