[Hardware][TPU][V1] Multi-LoRA Optimisations for the V1 TPU backend #15655
yaochengji merged 268 commits into vllm-project:main from
Conversation
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
…because xla doesn't allow partial updates
This reverts commit b78b088.
yaochengji
left a comment
LGTM, thanks for the contribution!
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
NickLucche
left a comment
There was a problem hiding this comment.
Nice work optimizing LoRA here! Just had some minor notes; please take a look when you find the time. Otherwise we can address them in a separate PR if need be.
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Head branch was pushed to by a user without write access
NickLucche
left a comment
LGTM, sorry for delaying the merge a bit! Let's get this landed today.
No worries! Yep, I'm hoping once these tests pass we can merge it in. Would you mind re-enabling auto-merge?
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
…llm-project#15655)
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Signed-off-by: xihajun <junfan@krai.ai>
Signed-off-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk>
Signed-off-by: Jorge de Freitas <jorge@krai.ai>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: xihajun <junfan@krai.ai>
Co-authored-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk>
Co-authored-by: Jorge de Freitas <jorge@krai.ai>
Signed-off-by: amit <amit.man@gmail.com>
I have a few questions about the data published in this PR:
I also have a few questions about the "Hot Swapping" and "Compare Multi-LoRAs" tabs in this link: https://insights.krai.ai/benchmarking-multi-lora
Hi @amanocha, thanks for your interest.
As for the questions about the website:





Summary
This PR optimises the Multi-LoRA implementation from #14238, and should be merged in after it.
This includes several kernel optimisations:
And a few general ones:
expand op a82f3fe

Things left/RFC
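For readers unfamiliar with the Multi-LoRA computation being optimised here, the core per-token shrink/expand ("bgmv") pattern can be sketched as follows. This is a minimal NumPy illustration with hypothetical shapes and function names, not this PR's actual TPU kernels: each token carries an adapter index, and that adapter's low-rank A/B matrices are gathered and applied.

```python
import numpy as np

def bgmv_shrink(x, lora_a, token_lora_indices):
    """Project each token down to the LoRA rank using its own adapter.

    x:                 [num_tokens, hidden]
    lora_a:            [max_loras, rank, hidden] (stacked A matrices)
    token_lora_indices:[num_tokens] adapter index per token
    returns:           [num_tokens, rank]
    """
    selected = lora_a[token_lora_indices]        # [num_tokens, rank, hidden]
    return np.einsum("td,trd->tr", x, selected)  # contract over hidden dim

def bgmv_expand(y, lora_b, token_lora_indices):
    """Project each token's rank-sized activation back up to the output dim.

    y:      [num_tokens, rank]
    lora_b: [max_loras, hidden_out, rank] (stacked B matrices)
    returns:[num_tokens, hidden_out]
    """
    selected = lora_b[token_lora_indices]        # [num_tokens, hidden_out, rank]
    return np.einsum("tr,thr->th", y, selected)  # contract over rank dim
```

The gather-then-einsum form is equivalent to looping over tokens and applying `B[i] @ (A[i] @ x_t)` per token, but expresses the whole batch as two dense contractions, which is the shape of computation TPU kernels prefer.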
- LogitsProcessorWithLoRA introduces a long (~1.5 second) stall when it's enabled, but not much activity seems to happen on the CPU or TPU during this time. I've disabled this for now.
- LogitsProcessorWithLoRA is always created even if there's no LoRA adapter that needs it; is there a reason for this?
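On the second point, one option would be to gate the wrapper's construction on whether any loaded adapter actually touches the logits path. The sketch below is purely illustrative: `targets_lm_head` and the `wrap` factory are hypothetical names, not vLLM's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class AdapterInfo:
    """Hypothetical per-adapter metadata."""
    name: str
    targets_lm_head: bool  # assumed flag: does this adapter modify the LM head?

def maybe_wrap_logits_processor(base, adapters: Sequence[AdapterInfo],
                                wrap: Callable):
    """Wrap the base logits processor only when some adapter needs it,
    avoiding the (hypothesised) fixed setup cost of the LoRA wrapper."""
    if any(a.targets_lm_head for a in adapters):
        return wrap(base)
    return base
```

With a guard like this, deployments whose adapters never patch the LM head would skip the wrapper (and any stall it introduces) entirely.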