[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend #14238
mgoin merged 151 commits into vllm-project:main
Conversation
NickLucche left a comment:
My suggestion is still to get this merged, but to wait until TPU performance is decent before enabling it.
A number of the changes here, in both the logic and the interfaces of the LoRA-specific code, don't touch the TPU area directly; those have been reviewed and still provide value in setting up the landscape.
Follow-up PRs can focus on changes to TpuModelRunner and to the Pallas/Punica kernels.
```python
if vllm_config.lora_config is not None:
    raise NotImplementedError(
        """The V0 TPU backend doesn't support LoRA serving, please try \
V1 by setting VLLM_USE_V1=1""")
```
I think this is misleading if we decide not to enable it just yet.
Yep, I'll undo that for now.
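For illustration only (the actual replacement isn't shown in this thread), a more neutral wording would report the limitation without advertising a V1 path that isn't enabled yet:

```python
# Hypothetical neutral message; `vllm_config` as in the snippet above.
if vllm_config.lora_config is not None:
    raise NotImplementedError(
        "LoRA serving is not supported by the TPU backend.")
```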
Regarding the CI errors I see here:
Thanks @NickLucche, I think that's fixed the TPU tests. I've merged from main, but the GPU-side errors are still there. What's really fun is that they're errors in the Triton kernels, which shouldn't be affected by this change at all.
```python
@classmethod
def get_infinity_values(cls, dtype: torch.dtype) -> Tuple[float, float]:
    """
    Return the platform specific values for (-inf, inf).
    """
    return float("-inf"), float("inf")
```
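This hook lets a platform substitute values where literal infinities are problematic. As a sketch of how a TPU backend might override it (the class name is an illustrative stand-in; the PR's actual override isn't quoted in this thread), one option is to return the dtype's finite extremes, since XLA-compiled graphs can misbehave when true infinities flow through masking code:

```python
from typing import Tuple

import torch


class TpuPlatform:  # illustrative stand-in for the real platform class
    @classmethod
    def get_infinity_values(cls, dtype: torch.dtype) -> Tuple[float, float]:
        # Use the dtype's finite extremes instead of (-inf, inf), which
        # compiled TPU graphs may not handle gracefully.
        return torch.finfo(dtype).min, torch.finfo(dtype).max
```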
yaochengji left a comment:
LGTM, thanks for your great contribution!
The TPU CI test failure seems unrelated to this PR, since it also happens on 621ca2c.
This PR adds a Multi-LoRA implementation for the TPU backend, extending the work done in #11100 and superseding #12623. It includes a functional but unoptimised Pallas implementation of the bgmv kernel.
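For context, bgmv (batched gather matrix-vector multiplication, the LoRA primitive from Punica) multiplies each token's activation by the LoRA weight slice selected by that token's adapter index. Below is a minimal loop-based reference of those semantics in plain PyTorch, with illustrative names; the PR's Pallas kernel implements the same operation, vectorised for TPU:

```python
import torch


def bgmv_reference(
    outputs: torch.Tensor,       # (num_tokens, out_features), accumulated into
    inputs: torch.Tensor,        # (num_tokens, in_features)
    lora_weights: torch.Tensor,  # (num_loras, out_features, in_features)
    lora_indices: torch.Tensor,  # (num_tokens,), adapter id per token; -1 = none
    scale: float = 1.0,
) -> None:
    """Loop-based reference semantics for the bgmv LoRA kernel."""
    for i in range(inputs.shape[0]):
        idx = int(lora_indices[i])
        if idx < 0:
            continue  # token has no LoRA adapter applied
        # Apply the adapter selected for this token.
        outputs[i] += scale * (lora_weights[idx] @ inputs[i])
```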