[Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules #34984

Merged: jeejeelee merged 10 commits into vllm-project:main from bhoomit:lora-target-modules on Mar 17, 2026
Conversation

bhoomit (Contributor) commented Feb 20, 2026

Purpose

Add deployment-time control over which model modules have LoRA applied via a new --lora-target-modules CLI parameter and LoRAConfig.target_modules field.

It accepts module suffixes (e.g., o_proj, qkv_proj) and restricts LoRA application to only those modules, which is useful for performance tuning. When not specified, all supported LoRA modules are used (existing behavior).

Usage

vllm serve model --enable-lora --lora-target-modules o_proj qkv_proj

Changes

  • vllm/config/lora.py: Add target_modules field to LoRAConfig
  • vllm/engine/arg_utils.py: Add --lora-target-modules CLI argument
  • vllm/lora/model_manager.py: Filter modules in _match_target_modules
  • docs/features/lora.md: Document the new parameter
  • Tests: CLI arg parsing and LoRAModelManager unit tests
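
Below is a minimal standalone sketch of the filtering behavior described above (illustrative only, not the actual vLLM code; the function name is made up):

    # A module keeps LoRA only if its name equals a configured target or ends
    # with ".<target>"; an empty/unset list means no restriction (existing behavior).
    def matches_target_modules(module_name: str, target_modules: list[str] | None) -> bool:
        if not target_modules:
            return True
        return any(
            module_name == target or module_name.endswith(f".{target}")
            for target in target_modules
        )

    assert matches_target_modules("model.layers.0.self_attn.o_proj", ["o_proj"])
    assert matches_target_modules("model.layers.0.self_attn.qkv_proj", None)
    assert not matches_target_modules("model.layers.0.mlp.down_proj", ["o_proj"])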

Benchmark: --lora-target-modules Latency Impact

Configuration

| Parameter     | Value                                       |
|---------------|---------------------------------------------|
| Model         | Qwen/Qwen3-32B (bf16)                       |
| GPU           | NVIDIA H200 (143 GB) × 1                    |
| LoRA rank     | 16                                          |
| vLLM version  | 0.16.0rc2.dev258                            |
| Torch version | 2.10.0+cu129                                |
| LoRA adapter  | Random weights (PEFT format, all 64 layers) |

Serving config: input_len=256, output_len=128, num_prompts=32, request_rate=2 req/s

Baseline = the adapter contains weights only for the subset's target modules; no --lora-target-modules flag.
With TM = the full adapter (all 4 module types) plus --lora-target-modules restricting at the engine level.
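
For reference, a random-weight PEFT adapter restricted to one module type (as in the Baseline runs) can be produced roughly like this. This is a hedged sketch using the standard peft API; the output path is a placeholder and this script is not part of the PR:

    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-32B")
    cfg = LoraConfig(r=16, target_modules=["o_proj"])  # rank 16, one module type
    get_peft_model(base, cfg).save_pretrained("qwen3-lora-o-proj")  # placeholder path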

Results with CUDA graphs + torch.compile (production mode)

TTFT (ms)

| Subset                 | Baseline | With TM | Δ      |
|------------------------|----------|---------|--------|
| all                    | 94.6     | 92.1    | −2.6%  |
| qkv_proj               | 94.2     | 74.3    | −21.1% |
| o_proj                 | 92.2     | 74.2    | −19.5% |
| gate_up_proj+down_proj | 95.8     | 86.6    | −9.6%  |

TPOT (ms)

| Subset                 | Baseline | With TM | Δ      |
|------------------------|----------|---------|--------|
| all                    | 23.4     | 23.3    | −0.1%  |
| qkv_proj               | 23.4     | 20.6    | −11.9% |
| o_proj                 | 23.4     | 20.2    | −13.6% |
| gate_up_proj+down_proj | 23.4     | 21.8    | −6.8%  |

Results with enforce_eager

TTFT (ms)

| Subset                 | Baseline | With TM | Δ      |
|------------------------|----------|---------|--------|
| all                    | 206.2    | 216.3   | +4.9%  |
| qkv_proj               | 214.3    | 123.1   | −42.6% |
| o_proj                 | 205.1    | 122.9   | −40.1% |
| gate_up_proj+down_proj | 216.9    | 154.8   | −28.6% |

TPOT (ms)

| Subset                 | Baseline | With TM | Δ      |
|------------------------|----------|---------|--------|
| all                    | 71.4     | 73.4    | +2.7%  |
| qkv_proj               | 72.3     | 38.1    | −47.3% |
| o_proj                 | 70.4     | 38.5    | −45.3% |
| gate_up_proj+down_proj | 74.8     | 50.1    | −33.0% |

Key takeaways

  • No overhead when all modules are active (differences are within noise)
  • CUDA graph mode: up to ~14% TPOT and ~21% TTFT reduction with a single-module restriction
  • Eager mode: up to ~47% TPOT reduction for single-module configs
  • Adapter-level restriction is ineffective: vLLM wraps all supported modules regardless of which modules the adapter contains, whereas --lora-target-modules skips wrapping entirely
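
For example, the qkv_proj "With TM" rows correspond to serving the full four-module adapter while restricting at the engine level. The adapter name and path below are placeholders; the flag shape follows the usage example above:

    vllm serve Qwen/Qwen3-32B --enable-lora \
        --lora-modules my-adapter=/path/to/full-adapter \
        --lora-target-modules qkv_proj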

mergify Bot commented Feb 20, 2026

Documentation preview: https://vllm--34984.org.readthedocs.build/en/34984/

@mergify mergify Bot added the documentation label Feb 20, 2026
@bhoomit bhoomit force-pushed the lora-target-modules branch from 1e56a85 to 3aa9721 Compare February 20, 2026 19:50
gemini-code-assist Bot left a comment

Code Review

This pull request introduces the --lora-target-modules CLI parameter and LoRAConfig.target_modules field, allowing users to restrict LoRA application to specific model modules at deployment time. This is a valuable feature for performance tuning. The implementation correctly integrates the new configuration into the engine and model manager. However, I have identified a critical logic bug in the vocab size validation for the logits processor and a performance/flexibility issue in the module matching logic that should be addressed.

I am having trouble creating individual review comments; my feedback is below.

vllm/lora/layers/logits_processor.py (91-94)

high

The validation logic here uses a comparison chain 32000 < self.base_layer.vocab_size > 258048 which is logically equivalent to self.base_layer.vocab_size > 258048 (since 258048 > 32000). Furthermore, the error message 32000 >= vocab_size <= 258048 is mathematically confusing and likely incorrect, as it implies vocab_size must be less than or equal to 32000. If the intent is to enforce an upper bound of 258048, the logic should be simplified and the message clarified.

        if self.base_layer.vocab_size > 258048:
            raise ValueError(
                f"When using LoRA, vocab size must be <= 258048, "
                f"but found {self.base_layer.vocab_size}"
            )
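
The root cause is Python's comparison chaining; a standalone illustration (not vLLM code):

    # `a < x > b` parses as `(a < x) and (x > b)`, so with a = 32000 and
    # b = 258048 the lower bound is redundant: any x > 258048 is already > 32000.
    x = 300_000
    assert (32000 < x > 258048) == ((32000 < x) and (x > 258048))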

vllm/lora/model_manager.py (571-572)

high

The current implementation of target_modules matching is too restrictive and inefficient.

  1. Restrictiveness: By only checking the last component of the module name (split(".")[-1]), users cannot target specific layers or sub-paths (e.g., layers.0.self_attn.o_proj). This is inconsistent with how supported_lora_modules are matched and how PEFT's target_modules usually work.
  2. Performance: Creating a set() from self.lora_config.target_modules inside this method is inefficient because _match_target_modules is called in a loop for every module in the model during initialization and warmup.

I suggest using a matching logic consistent with the is_supported check above, which also avoids the redundant set creation.

        return any(
            module_name.endswith(f".{target}") or module_name == target
            for target in self.lora_config.target_modules
        )

@bhoomit bhoomit force-pushed the lora-target-modules branch from 3aa9721 to 7a70487 Compare February 20, 2026 19:53
mergify Bot commented Feb 20, 2026

Hi @bhoomit, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy or markdownlint failing?
mypy and markdownlint are run differently in CI. If the failure is related to either of these checks, please use the following commands to run them locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10
# For markdownlint
pre-commit run --hook-stage manual markdownlint

dcmaddix (Contributor) commented

Thanks @bhoomit. Looks good to me. Do we want similar logic for MoE-LoRA models?

bhoomit (Contributor, Author) commented Feb 20, 2026

> Thanks @bhoomit. Looks good to me. Do we want similar logic for MoE-LoRA models?

It should already work, as long as the TM identifier is the last part of the layer identifier, e.g. x.y.o_proj -> o_proj.
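
A quick illustration of that last-component match (standalone Python, not the vLLM source):

    # Only the final dotted component is compared against the target list:
    assert "model.layers.3.self_attn.o_proj".split(".")[-1] == "o_proj"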

@bhoomit bhoomit force-pushed the lora-target-modules branch from 7a70487 to 530f1e7 Compare February 20, 2026 21:11
mergify Bot commented Feb 20, 2026

Hi @bhoomit, the pre-commit checks have failed again. (Same instructions as above.)

@bhoomit bhoomit force-pushed the lora-target-modules branch from 530f1e7 to 340cee9 Compare February 20, 2026 21:17
dcmaddix (Contributor) commented

> > Thanks @bhoomit. Looks good to me. Do we want similar logic for MoE-LoRA models?
>
> It should already work, as long as the TM identifier is the last part of the layer identifier, e.g. x.y.o_proj -> o_proj.

Yes, the naming is the same, but MoE has other target parameters, e.g., gate_up_proj and down_proj.

bhoomit (Contributor, Author) commented Feb 20, 2026

> Yes, the naming is the same, but MoE has other target parameters, e.g., gate_up_proj and down_proj.

@dcmaddix Yes, it's tested for those two as well. It will work as expected.

cjackal (Contributor) commented Feb 21, 2026

Related PR and some previous discussion points are at #31452, FYI. As mentioned there, I think it would be ideal to accept layer indices, not only module types.

bhoomit (Contributor, Author) commented Feb 22, 2026

> Related PR and some previous discussion points are at #31452, FYI. As mentioned there, I think it would be ideal to accept layer indices, not only module types.

Thanks @cjackal for taking a look and adding reference to related PR.

I went through the discussion.

  1. While I agree that adding layer indices (and regex support) would add more flexibility, it would also make the DX/UX a bit more complicated for simple use cases. I believe --lora-target-modules should focus on the basic use case, which should cover a large portion of users. For regex and layer indices, we could consider a --lora-target-parameters or --lora-targets-with-pattern option (see the sketch after this list). WDYT?

  2. I see that this implementation is missing a warning when an adapter with "unsupported" targets is passed, as described in [LoRA] Add --lora-target-modules to selectively apply LoRA layers #31452 (comment). I will send an update.
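
Purely as a hypothetical illustration of item 1 (plain Python regex matching; not a vLLM API or flag), layer-index targeting could look like:

    import re

    # Hypothetical pattern: o_proj only in layers 0-1.
    pattern = re.compile(r"layers\.(0|1)\..*\.o_proj$")
    assert pattern.search("model.layers.0.self_attn.o_proj")
    assert not pattern.search("model.layers.5.self_attn.o_proj")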

cjackal (Contributor) commented Feb 22, 2026

> 1. While I agree that adding layer indices (and regex support) would add more flexibility, it would also make the DX/UX a bit more complicated for simple use cases. I believe --lora-target-modules should focus on the basic use case, which should cover a large portion of users. For regex and layer indices, we could consider a --lora-target-parameters or --lora-targets-with-pattern option. WDYT?

I also like the agile approach; we can cover the basic use case now and extend later. I mentioned that PR mostly because this feature is largely a matter of UX design (implementation-wise, all the building blocks for selective LoRA targets are already there, and the diff is pretty small, so it is not tough to do at any time) and @jeejeelee seems to have some opinions on the design.

mergify Bot commented Feb 23, 2026

Hi @bhoomit, the pre-commit checks have failed again. (Same instructions as above.)

@github-project-automation github-project-automation Bot moved this to In review in NVIDIA Mar 2, 2026
@mergify mergify Bot added the cpu label Mar 2, 2026
@github-project-automation github-project-automation Bot moved this to Todo in AMD Mar 2, 2026
@bhoomit bhoomit force-pushed the lora-target-modules branch from 8b06127 to 7e0bfcb Compare March 2, 2026 23:22
bhoomit and others added 6 commits March 2, 2026 15:23
…odules

Add deployment-time control over which model modules have LoRA applied
via a new --lora-target-modules CLI parameter and LoRAConfig.target_modules
field. This accepts module suffixes (e.g., o_proj, qkv_proj) and restricts
LoRA application to only those modules, useful for performance tuning.

When not specified, all supported LoRA modules are used (existing behavior).

Changes:
- vllm/config/lora.py: Add target_modules field to LoRAConfig
- vllm/engine/arg_utils.py: Add --lora-target-modules CLI argument
- vllm/lora/model_manager.py: Filter modules in _match_target_modules
- docs/features/lora.md: Document the new parameter
- tests: CLI arg parsing and LoRAModelManager unit tests

Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
…dules

Add a warning_once in _load_adapter when a LoRA adapter contains
modules not in the model's supported LoRA target modules. These
parameters would be silently ignored, which may cause unexpected
model behavior. The warning helps users identify misconfigured
adapters early.

Also adds a unit test that verifies the warning is emitted.

Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
…rning

Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
…educe test duplication

Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
@bhoomit bhoomit force-pushed the lora-target-modules branch from 7e0bfcb to c183eec Compare March 2, 2026 23:25
Comment thread on vllm/lora/worker_manager.py (Outdated)

    if not any(
        module_name.endswith(f".{suffix}")
        for suffix in supported_lora_modules
    ):
Contributor commented:

It looks like the if statement above should use the matching logic from _match_target_modules for consistency? Is that true, or am I missing something?

bhoomit (Contributor, Author) replied:

_match_target_modules checks more than the new lora-target-modules feature:

  1. Check whether the TM is in the model's supported_lora_modules
  2. Check whether the TM is in lora-target-modules (the new feature)

I see the inconsistency; working on making it better.

bhoomit (Contributor, Author) replied:

Addressed the concern in the latest commit.

…s and add target_modules warning

- Replace endswith() check with split('.')[-1] suffix matching in the
  unsupported module warning, consistent with _match_target_modules
- Add a second warning when an adapter module is excluded by the
  deployment-time target_modules restriction (previously silent)
- Add test_load_adapter_warns_on_target_modules_restriction to cover
  the new warning path
- Refactor _test_target_modules helper to accept expected_lora and
  expected_no_lora assertion lists, moving asserts into the helper

Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
Comment thread (diff context):

        module_name,
        lora_request.lora_path,
        ", ".join(sorted(target_modules)),
    )
Contributor commented:

Thanks for the refactor @bhoomit. Can we introduce a utility to do this check? Something like:

    # in a utils file
    def is_module_supported(module_name, supported_lora_modules, target_modules) -> bool:
        ...

    # model_manager.py
    def _match_target_modules(self, module_name: str) -> bool:
        return is_module_supported(
            module_name, self.supported_lora_modules, self.lora_config.target_modules
        )

    # worker_manager.py (here)
    if not is_module_supported(...):
        logger.warning_once("...")

This doesn't let us differentiate between what is unsupported and what is ignored, but I think that is fine. WDYT?

bhoomit (Contributor, Author) replied:

I can do that.

If we want to have two different warnings, we will need two utility functions, and they will be used by both of these files. Will update with that change.

Thanks

…s utils

Extract shared module-matching logic into two utility functions in
vllm/lora/utils.py so both model_manager.py and worker_manager.py
reuse the same checks:

- is_supported_lora_module: regex check against model-defined modules
- is_in_target_modules: suffix check against deployment-time filter

Add unit tests in tests/lora/test_lora_utils.py.

Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
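
A hedged sketch of what the two helpers described in this commit could look like (the names come from the commit message; signatures and bodies are assumptions, not copied from vllm/lora/utils.py):

    import re

    def is_supported_lora_module(module_name: str, supported_lora_modules: list[str]) -> bool:
        # The commit message describes a regex check against model-defined modules.
        return any(
            re.fullmatch(rf"(.*\.)?{pattern}", module_name) is not None
            for pattern in supported_lora_modules
        )

    def is_in_target_modules(module_name: str, target_modules: list[str]) -> bool:
        # The commit message describes a suffix check against the deployment-time filter.
        return any(
            module_name == target or module_name.endswith(f".{target}")
            for target in target_modules
        )

    name = "model.layers.0.self_attn.qkv_proj"
    assert is_supported_lora_module(name, ["qkv_proj", "o_proj"])
    assert not is_in_target_modules(name, ["o_proj"])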
varun-sundar-rabindranath (Contributor) left a comment

LGTM. There is another PR with wider filter support #31452 but there are some design questions that need answering there.

I am good with landing this PR for the immediate benefits if we think #31452 needs more thought.

Thanks @bhoomit

jeejeelee (Collaborator) commented:

@varun-sundar-rabindranath thank you, just added my stamp.
