[RFC]: Optimizing the Expand and Shrink Algorithms for LoRA Inference in Dense Models #28190

### Motivation.

During multi-LoRA performance testing on NVIDIA H800 GPUs, we identified performance bottlenecks in the original LoRA expand and shrink operators during dense model inference.

### Proposed Change.

By optimizing the execution branches of the expand and shrink algorithms, we aim to increase GPU execution concurrency and improve overall efficiency. We have identified an optimization strategy and plan to implement it within one week.
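
For context, here is a minimal reference sketch of what the two operators compute in a multi-LoRA batch: shrink projects each token from the hidden size down to the adapter rank (`x @ A`), and expand projects it back up and adds it to the base layer output (`(x @ A) @ B + base`). This is plain Python for illustration only. It is not the vLLM kernel code and does not show the proposed optimization. The function names, the dict-of-matrices layout, and the "negative id means no adapter" convention are assumptions made for this example.

```python
from typing import Dict, List

Matrix = List[List[float]]


def matvec(vec: List[float], mat: Matrix) -> List[float]:
    """Multiply a row vector (length k) by a k x n matrix."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def lora_shrink(x: Matrix, lora_a: Dict[int, Matrix], adapter_ids: List[int]) -> Matrix:
    """Shrink: hidden -> rank, using each token's own adapter A matrix."""
    rank = len(next(iter(lora_a.values()))[0])
    out = []
    for token, aid in zip(x, adapter_ids):
        if aid < 0:  # assumed convention: this token uses no adapter
            out.append([0.0] * rank)
        else:
            out.append(matvec(token, lora_a[aid]))
    return out


def lora_expand(low: Matrix, lora_b: Dict[int, Matrix],
                adapter_ids: List[int], base_out: Matrix) -> Matrix:
    """Expand: rank -> output size, accumulated onto the base layer output."""
    result = []
    for vec, aid, base in zip(low, adapter_ids, base_out):
        if aid < 0:  # no adapter: base output passes through unchanged
            result.append(list(base))
        else:
            delta = matvec(vec, lora_b[aid])
            result.append([b + d for b, d in zip(base, delta)])
    return result


# Three tokens, hidden size 2, rank 1, output size 2; two adapters in one batch.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
adapter_ids = [0, 1, -1]
lora_a = {0: [[1.0], [1.0]], 1: [[2.0], [0.0]]}    # hidden x rank
lora_b = {0: [[1.0, -1.0]], 1: [[0.5, 0.5]]}       # rank x output
base_out = [[10.0, 10.0], [20.0, 20.0], [30.0, 30.0]]

shrunk = lora_shrink(x, lora_a, adapter_ids)
result = lora_expand(shrunk, lora_b, adapter_ids, base_out)
print(shrunk)  # [[3.0], [6.0], [0.0]]
print(result)  # [[13.0, 7.0], [23.0, 23.0], [30.0, 30.0]]
```

The per-token adapter lookup in both loops is the kind of branching a batched GPU kernel has to handle across many adapters at once, which is where the execution branches mentioned above come into play.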

### Feedback Period.

_No response_

### CC List.

_No response_

### Any Other Things.

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
