[Feature] Support dynamic loading and unloading of Lora adapters #2891
Conversation
Hi. I'm looking forward to this feature getting added. Any updates on progress? Thanks.
@mitchklusty Thanks for noticing. We are currently adding support for other LoRA features such as unified paging and tensor parallelism, which will cause huge changes to the LoRA code. I'm afraid the dynamic loading/unloading feature has to wait for those features, so that mass conflicts can be avoided. Really sorry for that.
@Fridge003 That's ok, I completely understand. Any idea on a rough timeline for when it might get implemented, or is it still too early to say?
We have added this feature to the half-year plan, so it should be implemented before the end of June. If everything goes smoothly, it could ideally be done before the end of April.
Ok, great! Thanks for adding this feature.
excited to see this |
This pull request has been automatically closed due to inactivity. Please feel free to reopen it if needed. |
hi! so what's the progress? we're really waiting for this feature |
Sorry to keep you waiting... We are short of developers, and I'm also really busy with other tasks.
Thanks! Just wanted to clarify - this merge request will be closed and we need to wait for another one from @lifuhuang, right? Maybe we can help somehow to speed up the process? It seems like the main changes in the code have already been made.
Hi @kdduha, I discussed with @Fridge003 offline. From what I learned, the change in this PR was branched off main in Jan, so it has become somewhat outdated due to the changes introduced over the past months, and indeed we would need a separate PR. I plan to start working on this feature roughly in a week, after wrapping up a small task I have, and should be able to finish in June. But if you are interested in collaborating or taking a stab at it yourself, let me know; you can find me on Slack (Lifu).
@mitchklusty @binarycrayon @kdduha |
@mitchklusty @binarycrayon @kdduha Currently #7446 is still in review, but I have added usage descriptions. Please feel free to check out that branch and test it out, and let me and @Fridge003 know if you have any feedback or questions. Please also be aware of the few usability limitations of this first version; I am working on addressing them at the moment and will send separate PRs after the current one is merged. Stay tuned 🥂
Motivation
This PR aims to implement the dynamic LoRA feature mentioned in #2686.
This PR is still under development; please comment if the code can be improved.
Modifications
Current implementation of LoRA modules
Current LoRA features are implemented under the folder `python/sglang/srt/lora`, which contains three files: `lora.py`, `lora_manager.py`, and `lora_config.py`. See #1307 for the initial support.

In the `__init__` function of `ModelRunner`, a `LoraManager` is created if a valid `lora_path` is passed in `server_args`. The initialization of `LoraManager` has two parts: first `init_loras` is called to load the Hugging Face LoRA weights to CPU and replace the targeted layers with `BaseLayerWithLoRA` instances, then `init_lora_memory_pool` is called to preallocate the memory pool for S-LoRA. The LoRA modules defined in `lora.py` are based on the vLLM implementation.
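A minimal sketch of this initialization flow, under assumed constructor arguments; only `init_loras`, `init_lora_memory_pool`, and `BaseLayerWithLoRA` come from the description above, everything else is illustrative:

```python
# Simplified sketch of the LoraManager initialization described above.
# Constructor arguments and attribute names are assumptions for illustration.
class LoraManager:
    def __init__(self, base_model, lora_paths, max_loras_per_batch, device):
        self.base_model = base_model
        self.lora_paths = lora_paths          # e.g. {"adapter_a": "/path/to/a"}
        self.max_loras_per_batch = max_loras_per_batch
        self.device = device

        self.init_loras()             # load HF LoRA weights to CPU, patch target layers
        self.init_lora_memory_pool()  # preallocate the S-LoRA style memory pool

    def init_loras(self):
        # For each adapter: read its config, load its weights onto CPU, and
        # replace the targeted layers with BaseLayerWithLoRA wrappers.
        ...

    def init_lora_memory_pool(self):
        # Allocate a fixed-size GPU buffer that can hold up to
        # max_loras_per_batch adapters at a time.
        ...
```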
Before forwarding a batch, `LoraManager` calls its `prepare_lora_batch` method to load the active LoRA adapters from the memory pool. During loading, LoRA weights not used in the current batch can be evicted from the buffer if necessary.
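For illustration, the per-batch flow could be wired up roughly like this; only `prepare_lora_batch` is taken from the description, the surrounding names are assumptions:

```python
# Illustrative only: LoRA preparation hooked into the forward pass.
def forward_batch(model_runner, batch):
    if model_runner.lora_manager is not None:
        # Bring the adapters referenced by this batch into the GPU buffer,
        # evicting adapters that the current batch does not use if needed.
        model_runner.lora_manager.prepare_lora_batch(batch)
    return model_runner.model.forward(batch)
```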
Unit tests are located in `test/srt/models/test_lora.py`. The test for inference passes, but the test for serving is skipped, so the LoRA serving feature might need further checking. The benchmark code can be found in `benchmark/lora/lora_bench.py`.
Implementation of dynamic serving LoRA
Dynamic serving of LoRA means that LoRA adapters can be loaded and unloaded on the user's command during server runtime. This feature is already supported in vLLM (vllm lora doc). As mentioned in #1433, the current implementation supports multi-LoRA serving, but loading and unloading of LoRA modules can only be done at server initialization.
The design of loading and unloading LoRA at the API side can be similar to the `update_weights_from_disk` API, since both change the weights the server is currently running on. In this design, the two APIs are named `load_lora_adapter` and `unload_lora_adapter`, as in vLLM.
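For illustration, client-side usage of such endpoints might look like the sketch below; the endpoint names follow the proposal above, while the payload fields are assumptions rather than a finalized API:

```python
# Hypothetical client calls; the lora_name / lora_path payload fields are assumed.
import requests

base_url = "http://localhost:30000"

# Load a new adapter at runtime.
requests.post(
    f"{base_url}/load_lora_adapter",
    json={"lora_name": "my_adapter", "lora_path": "/path/to/my_adapter"},
)

# Unload it when it is no longer needed.
requests.post(
    f"{base_url}/unload_lora_adapter",
    json={"lora_name": "my_adapter"},
)
```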
After the user sends a `LoadLoraAdapterReq`/`UnloadLoraAdapterReq` request to the server, the server grabs a write lock and waits for the requests currently in progress to finish. The request is then transmitted to `ModelRunner` through several passes and handled by the `LoraManager` owned by `ModelRunner`.
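A rough sketch of the server-side handling, assuming an async request loop; the lock and forwarding helper below are placeholders, and only the lock-then-forward behavior comes from the description above:

```python
class LoadLoraAdapterReq:
    """Placeholder request object carrying the adapter name and path."""
    def __init__(self, lora_name: str, lora_path: str):
        self.lora_name = lora_name
        self.lora_path = lora_path

async def handle_load_lora_adapter(server, req: LoadLoraAdapterReq):
    # Grab the write lock so new inference requests are blocked and the
    # requests already in progress finish before the adapter set changes.
    async with server.model_update_lock:  # placeholder lock name
        # Forwarded through several passes (scheduler / TP workers) down to
        # ModelRunner, whose LoraManager performs the actual load.
        return await server.send_to_model_runner(req)  # placeholder helper
```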
On the `LoraManager` side, loading a new LoRA adapter follows the same process as initialization: collect the new target modules, initialize the new LoRA weights on CPU, and open new space in the memory buffer if needed. The implementation of unloading and the testing scripts are still to be done.
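A rough sketch of what the loading path on `LoraManager` could look like; the helper names are hypothetical, and only the three steps mirror the plan above:

```python
class LoraManager:
    # Sketch only: helper methods (load_lora_config, load_lora_weights_to_cpu,
    # memory_pool_fits, expand_memory_pool) are hypothetical names.
    def load_lora_adapter(self, lora_name: str, lora_path: str) -> None:
        # 1. Collect any new target modules introduced by this adapter.
        config = self.load_lora_config(lora_path)
        self.target_modules |= set(config.target_modules)

        # 2. Initialize the new LoRA weights on CPU.
        self.loras[lora_name] = self.load_lora_weights_to_cpu(lora_path, config)

        # 3. Open new space in the memory buffer if the preallocated pool
        #    cannot accommodate the new adapter (e.g. a larger rank).
        if not self.memory_pool_fits(config):
            self.expand_memory_pool(config)
```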
Checklist