[Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules#34984
Conversation
Documentation preview: https://vllm--34984.org.readthedocs.build/en/34984/
Force-pushed from 1e56a85 to 3aa9721.
Code Review
This pull request introduces the --lora-target-modules CLI parameter and LoRAConfig.target_modules field, allowing users to restrict LoRA application to specific model modules at deployment time. This is a valuable feature for performance tuning. The implementation correctly integrates the new configuration into the engine and model manager. However, I have identified a critical logic bug in the vocab size validation for the logits processor and a performance/flexibility issue in the module matching logic that should be addressed.
I am having trouble creating individual review comments, so my feedback is included below.
vllm/lora/layers/logits_processor.py (91-94)
The validation logic here uses a comparison chain 32000 < self.base_layer.vocab_size > 258048 which is logically equivalent to self.base_layer.vocab_size > 258048 (since 258048 > 32000). Furthermore, the error message 32000 >= vocab_size <= 258048 is mathematically confusing and likely incorrect, as it implies vocab_size must be less than or equal to 32000. If the intent is to enforce an upper bound of 258048, the logic should be simplified and the message clarified.
if self.base_layer.vocab_size > 258048:
    raise ValueError(
        f"When using LoRA, vocab size must be <= 258048, "
        f"but found {self.base_layer.vocab_size}"
    )

vllm/lora/model_manager.py (571-572)
The current implementation of target_modules matching is too restrictive and inefficient.
- Restrictiveness: by only checking the last component of the module name (split(".")[-1]), users cannot target specific layers or sub-paths (e.g., layers.0.self_attn.o_proj). This is inconsistent with how supported_lora_modules are matched and with how PEFT's target_modules usually work.
- Performance: creating a set() from self.lora_config.target_modules inside this method is inefficient because _match_target_modules is called in a loop for every module in the model during initialization and warmup.
I suggest using matching logic consistent with the is_supported check above, which also avoids the redundant set creation.
return any(
    module_name.endswith(f".{target}") or module_name == target
    for target in self.lora_config.target_modules
)
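For reference, here is a minimal standalone sketch of how the suggested suffix matching behaves; the module paths and target lists below are illustrative examples, not taken from the PR.

```python
def matches(module_name: str, target_modules: list[str]) -> bool:
    # Same rule as suggested above: exact-name match or dotted-suffix match.
    return any(
        module_name.endswith(f".{target}") or module_name == target
        for target in target_modules
    )

# A bare suffix matches every layer containing that module type.
assert matches("model.layers.0.self_attn.o_proj", ["o_proj"])
# A longer sub-path can pin the match to a specific layer.
assert matches("model.layers.0.self_attn.o_proj", ["layers.0.self_attn.o_proj"])
# Modules outside the target list are excluded.
assert not matches("model.layers.0.mlp.gate_up_proj", ["o_proj", "qkv_proj"])
```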
Force-pushed from 3aa9721 to 7a70487.
Hi @bhoomit, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Thanks @bhoomit. Looks good to me. Do we want similar logic for MoE-LoRA models?
It should already work, as long as the TM identifier is the last part of the layer identifier, e.g.
Force-pushed from 7a70487 to 530f1e7.
Hi @bhoomit, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Force-pushed from 530f1e7 to 340cee9.
Yes, that naming is the same, but MoE has other target parameters, e.g.,
@dcmaddix Yes, it's tested for those two as well. It will work as expected.
Related PR and some previous discussion points are at #31452, FYI. As mentioned there, it would be ideal to accept layer indices, not only the module types, I think.
Thanks @cjackal for taking a look and adding a reference to the related PR. I went through the discussion.
I also love the agile way; we may go ahead and cover the basic use case, then extend further later. I'd mentioned the PR mostly because this feature is largely a matter of UX design (implementation-wise, all the building blocks to support selective LoRA targets are already there, and the diff is also pretty small, so it would not be tough to do at any time), and @jeejeelee seems to have some opinion on the design.
Hi @bhoomit, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
Force-pushed from 8b06127 to 7e0bfcb.
…odules Add deployment-time control over which model modules have LoRA applied via a new --lora-target-modules CLI parameter and LoRAConfig.target_modules field. This accepts module suffixes (e.g., o_proj, qkv_proj) and restricts LoRA application to only those modules, useful for performance tuning. When not specified, all supported LoRA modules are used (existing behavior).
Changes:
- vllm/config/lora.py: Add target_modules field to LoRAConfig
- vllm/engine/arg_utils.py: Add --lora-target-modules CLI argument
- vllm/lora/model_manager.py: Filter modules in _match_target_modules
- docs/features/lora.md: Document the new parameter
- tests: CLI arg parsing and LoRAModelManager unit tests
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
…dules Add a warning_once in _load_adapter when a LoRA adapter contains modules not in the model's supported LoRA target modules. These parameters would be silently ignored, which may cause unexpected model behavior. The warning helps users identify misconfigured adapters early. Also adds a unit test that verifies the warning is emitted. Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
…rning Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
…educe test duplication Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
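A rough standalone sketch of the unsupported-module warning described in the second commit above; the function and variable names here are assumptions for illustration, not the actual _load_adapter code.

```python
import logging

logger = logging.getLogger(__name__)

def warn_on_unsupported_modules(
    adapter_module_names: list[str],
    supported_lora_modules: list[str],
    lora_path: str,
) -> None:
    """Warn if the adapter carries modules the model cannot apply LoRA to."""
    unsupported = sorted(
        name for name in adapter_module_names
        if not any(name.endswith(f".{m}") or name == m for m in supported_lora_modules)
    )
    if unsupported:
        # vLLM's logger exposes warning_once; this standalone sketch uses the
        # stdlib logger, which only has warning.
        logger.warning(
            "LoRA adapter %s contains modules not supported by the model; "
            "they will be ignored: %s", lora_path, ", ".join(unsupported))
```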
Force-pushed from 7e0bfcb to c183eec.
if not any(
    module_name.endswith(f".{suffix}")
    for suffix in supported_lora_modules
):
It looks like the if statement above should use the matching logic from _match_target_modules for consistency?
Is that true, or am I missing something?
_match_target_modules checks more than just the new "lora-target-modules" feature:
- Check if the TM is in supported_lora_modules of the model
- Check if the TM is in "lora-target-modules" (new feature)
I see the inconsistency, working on making it better.
Addressed the concern in the latest commit.
…s and add target_modules warning
- Replace endswith() check with split('.')[-1] suffix matching in the
unsupported module warning, consistent with _match_target_modules
- Add a second warning when an adapter module is excluded by the
deployment-time target_modules restriction (previously silent)
- Add test_load_adapter_warns_on_target_modules_restriction to cover
the new warning path
- Refactor _test_target_modules helper to accept expected_lora and
expected_no_lora assertion lists, moving asserts into the helper
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
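To illustrate the helper refactor described in this commit, here is a sketch under assumed names; `manager` stands for any object exposing `_match_target_modules`, and the signature mirrors the commit message rather than the actual test file.

```python
def _test_target_modules(manager, expected_lora: list[str], expected_no_lora: list[str]) -> None:
    # Modules that should receive LoRA under the current target_modules config.
    for module_name in expected_lora:
        assert manager._match_target_modules(module_name), f"{module_name} should get LoRA"
    # Modules that should be skipped.
    for module_name in expected_no_lora:
        assert not manager._match_target_modules(module_name), f"{module_name} should be skipped"
```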
    module_name,
    lora_request.lora_path,
    ", ".join(sorted(target_modules)),
)
Thanks for the refactor @bhoomit. Can we introduce a utility to do this check? Something like:
# in a utils file.
def is_module_supported(module_name, supported_lora_modules, target_modules) -> bool:
    ...

# model_manager.py
def _match_target_modules(self, module_name: str) -> bool:
    return is_module_supported(module_name, self.supported_lora_modules, self.lora_config.target_modules)

# worker_manager.py (here)
if not is_module_supported():
    logger.warning_once("...")
This doesn't let us differentiate between what is not supported and what is ignored, but I think that is fine. WDYT?
I can do that.
If we want to have two different warnings, we will need two utility functions, and they will be used by both files. Will update with that change.
Thanks
…s utils Extract shared module-matching logic into two utility functions in vllm/lora/utils.py so both model_manager.py and worker_manager.py reuse the same checks:
- is_supported_lora_module: regex check against model-defined modules
- is_in_target_modules: suffix check against deployment-time filter
Add unit tests in tests/lora/test_lora_utils.py.
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
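A minimal sketch of what the two helpers might look like, assuming the signatures implied by the commit message; the regex and suffix details are illustrative, not the actual vllm/lora/utils.py code.

```python
from __future__ import annotations

import re

def is_supported_lora_module(module_name: str, supported_lora_modules: list[str]) -> bool:
    # Regex-style check: the module name must equal, or end with, one of the
    # model-defined supported module identifiers.
    return any(
        re.search(rf"(^|\.){re.escape(supported)}$", module_name) is not None
        for supported in supported_lora_modules
    )

def is_in_target_modules(module_name: str, target_modules: list[str] | None) -> bool:
    # Suffix check against the deployment-time --lora-target-modules filter;
    # an unset filter means every supported module is targeted.
    if not target_modules:
        return True
    return any(
        module_name == target or module_name.endswith(f".{target}")
        for target in target_modules
    )
```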
@varun-sundar-rabindranath thank you, just added my stamp.
Purpose
Add deployment-time control over which model modules have LoRA applied via a new --lora-target-modules CLI parameter and LoRAConfig.target_modules field. This accepts module suffixes (e.g., o_proj, qkv_proj) and restricts LoRA application to only those modules, useful for performance tuning. When not specified, all supported LoRA modules are used (existing behavior).
Usage
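Illustrative usage only: this sketch assumes the CLI flag maps to an engine argument named lora_target_modules (name inferred from the flag, not confirmed here), and the model and module names are placeholders.

```python
from vllm import LLM

# Hypothetical: assumes --lora-target-modules is exposed as the engine
# argument lora_target_modules; model and module names are placeholders.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    enable_lora=True,
    lora_target_modules=["o_proj", "qkv_proj"],  # apply LoRA only to these modules
)
```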
Changes
- vllm/config/lora.py: Add target_modules field to LoRAConfig
- vllm/engine/arg_utils.py: Add --lora-target-modules CLI argument
- vllm/lora/model_manager.py: Filter modules in _match_target_modules
- docs/features/lora.md: Document the new parameter
- tests: CLI arg parsing and LoRAModelManager unit tests
Benchmark: --lora-target-modules Latency Impact
Configuration
Serving config: input_len=256, output_len=128, num_prompts=32, request_rate=2 req/s
Baseline = adapter only contains weights for the target modules, no --lora-target-modules flag.
With TM = full adapter (all 4 modules) + --lora-target-modules restricts at engine level.
Results with CUDA graphs + torch.compile (production mode)
(TTFT (ms) and TPOT (ms) comparison table)
Results with enforce_eager
(TTFT (ms) and TPOT (ms) comparison table)
Key takeaways
--lora-target-modules skips wrapping entirely.