[BugFix] fix loading new adapter with added_tokens #17794
glenliu21 wants to merge 7 commits into sgl-project:main
Conversation
Code Review
This pull request addresses a bug where lora_added_tokens_size was not updated when a new LoRA adapter with added tokens was loaded after server initialization. The changes correctly introduce a mechanism to update this size and re-initialize the memory pool accordingly. The addition of a new test case to verify this fix is also a good improvement.
I've identified a critical issue in the implementation that could lead to a server crash during startup if a LoRA adapter with added tokens is provided initially. I've also pointed out a potential logic issue in how different lora_added_tokens_size values from multiple adapters are handled, which could lead to silent failures. Please see my detailed comments.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
```python
# Some LoRA adapters are loaded before the memory pool is initialized
if hasattr(self, "memory_pool"):
    self.init_memory_pool()
```
Will this operation delete the earlier adapters in GPU memory? If so, I feel it's risky.
I think that should theoretically be fine, because adapters will just be reloaded on the next forward pass.
Extra tokens only affect the embedding memory pool. Can we reinitialize only the embedding part?
The test I added, where adapters are loaded on server start, fails even on our existing implementation. Unless it is the case that we don't want users to load adapters with different
Do we have an ETA for this fix? #18046 was supposed to be temporary, iirc.
Fixed by #17905 |
Motivation

Fixes #17096.

Modifications

Update `LoRAMemoryPool`'s `lora_added_tokens_size` after loading in a new adapter that adds tokens to the vocabulary.

Accuracy Tests

Added to `test_lora_update.py`.

Checklist

Review Process

/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci