[Core] Initialize LoRA support for tower and connector in multi-modal models#26674
vllm-bot merged 91 commits into vllm-project:main
Conversation
Signed-off-by: bk-201 <joy25810@foxmail.com>
Code Review
This pull request introduces LoRA support for tower modules in multi-modal models by creating distinct punica_wrapper instances for different model components, such as the vision tower and the language model, and dispatching LoRA operations accordingly. The changes are well-structured, primarily modifying vllm/lora/models.py for the core logic and updating other files for necessary plumbing. My review identifies a few critical and high-severity issues in the core logic, including hardcoded values that should be dynamic, potentially incorrect use of configuration parameters between encoder and decoder, and unsafe list access that could lead to runtime errors. Addressing these points will improve the robustness and correctness of the implementation.
vllm/lora/models.py
Outdated
    self.mm_punica_wrapper_mapping[
        self.mm_mapping.tower_model[0]
    ].update_metadata(
        mapping,
        self.lora_index_to_id,
        self.lora_slots + 1,
        self.vocab_size,
        self.lora_config.lora_extra_vocab_size,
    )
else:
    self.mm_punica_wrapper_mapping[
        self.mm_mapping.language_model[0]
    ].update_metadata(
        mapping,
        self.lora_index_to_id,
        self.lora_slots + 1,
        self.vocab_size,
        self.lora_config.lora_extra_vocab_size,
    )
The code accesses self.mm_mapping.tower_model[0] (line 513) and self.mm_mapping.language_model[0] (line 523) without checking if the lists are empty. If get_mm_mapping() returns a MultiModelKeys object with an empty tower_model or language_model list, this will raise an IndexError, causing a crash. A similar issue exists in __init__ on line 414. It's safer to add checks to ensure these lists are not empty before accessing their elements.
For example:
if not self.mm_mapping.tower_model:
    raise ValueError("No tower model found in mm_mapping for LoRA.")
tower_model_name = self.mm_mapping.tower_model[0]
self.mm_punica_wrapper_mapping[tower_model_name].update_metadata(...)
vllm/lora/models.py
Outdated
self.mm_config = model_config.multimodal_config
# limit_per_prompt: int = max(
#     self.info.get_allowed_mm_limits().values())
limit_per_prompt = 5  # TODO
The limit_per_prompt is hardcoded to 5. This could lead to unexpected behavior or limitations if not aligned with the actual model's capabilities or user's expectations. It would be better to derive this value from the model configuration, as suggested by the commented-out code, to ensure it's dynamically and correctly set. Using max(values or [0]) would also prevent a ValueError if get_allowed_mm_limits() returns an empty mapping.
- limit_per_prompt = 5  # TODO
+ limit_per_prompt: int = max(self.info.get_allowed_mm_limits().values() or [0])
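To see why the `or [0]` fallback matters, here is a standalone illustration (plain Python, independent of the vLLM codebase; `safe_limit` is an invented name):

```python
def safe_limit(allowed_mm_limits: dict) -> int:
    # max() raises ValueError on an empty sequence; `values() or [0]`
    # falls back to [0] when the mapping is empty, yielding 0 instead.
    return max(allowed_mm_limits.values() or [0])

print(safe_limit({"image": 5, "video": 2}))  # 5
print(safe_limit({}))                        # 0, no ValueError
```

An equivalent idiom is `max(allowed_mm_limits.values(), default=0)`, which avoids building the fallback list.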
vllm/lora/models.py
Outdated
self.mm_punica_wrapper_mapping = {
    name: get_punica_wrapper(
        self.info.get_num_mm_encoder_tokens(max_num_batched_tokens),
        max_batches=self.max_num_seqs * limit_per_prompt,
        device=self.device,
        max_loras=self.lora_config.max_loras,
    )
    for name in self.mm_mapping.tower_model
}
The max_num_batched_tokens parameter, which is typically configured for the decoder, is being used to determine the number of tokens for the multi-modal encoder via self.info.get_num_mm_encoder_tokens(max_num_batched_tokens). This could be incorrect as the encoder might have a different token budget. The commented-out line # max_num_batched_tokens = encoder_budget suggests that a separate encoder_budget should be used. Using the decoder's token budget for the encoder could lead to misconfiguration and potential runtime errors.
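To make the concern concrete, here is a hypothetical sketch (function name and the specific patch/merge numbers are invented for illustration, not taken from vLLM) of why a vision tower's input length is a function of image geometry, not of the decoder's batched-token budget:

```python
def vit_token_counts(image_h: int, image_w: int,
                     patch: int = 14, merge: int = 2) -> tuple[int, int]:
    """Return (encoder_tokens, lm_side_tokens) for a ViT-style tower.

    The tower processes one token per patch; a merge x merge spatial
    merge in the connector shrinks the sequence the language model sees,
    so the encoder's length and the decoder's length diverge.
    """
    encoder_tokens = (image_h // patch) * (image_w // patch)
    lm_side_tokens = encoder_tokens // (merge * merge)
    return encoder_tokens, lm_side_tokens

enc, lm = vit_token_counts(448, 448)
print(enc, lm)  # 1024 256
```

Sizing the tower's Punica wrapper from the decoder's max_num_batched_tokens therefore only works if a conversion like this is applied, which is what a separate encoder budget would capture.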
💡 Codex Review
Here are some automated review suggestions for this pull request.
vllm/lora/model_manager.py
Outdated
    be filtered out.
    """
-   if self.supports_mm:
-       module_mapping: MultiModelKeys = self.model.get_mm_mapping()
-       prefix_lst = module_mapping.connector + module_mapping.tower_model
-       return any([module_name.startswith(prefix) for prefix in prefix_lst])
+   prefix_lst = self.mm_mapping.connector + self.mm_mapping.tower_model
+   if self.supports_mm_lora:
+       return self._get_mm_punica_wrapper(module_name) is None
+   else:
+       return any([module_name.startswith(prefix) for prefix in prefix_lst])
Access undefined multimodal mapping when LoRA tower unsupported
When LoRA is enabled for a multimodal model that does not yet implement tower LoRA (i.e. supports_mm is true but supports_mm_lora is false because the processing info lacks get_num_mm_encoder_tokens), _filter_unsupported_mm_module still dereferences self.mm_mapping to build the prefix list. However, self.mm_mapping is only created inside __init__ when supports_mm_lora is true, so any such model will raise an AttributeError during adapter registration, breaking existing multimodal models that previously supported only language-model LoRA. The mapping should be fetched regardless of supports_mm_lora, or lazily inside the filter.
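The lazy-fetch option can be sketched in plain Python (class and attribute names here are stand-ins, not the actual vLLM classes):

```python
class ModelManagerSketch:
    """Stand-in for the LoRA model manager: fetch mm_mapping lazily so
    the filter works whether or not tower LoRA is supported."""

    def __init__(self, model, supports_mm_lora: bool):
        self.model = model
        self.supports_mm_lora = supports_mm_lora
        self._mm_mapping = None  # populated on first use, not in __init__

    @property
    def mm_mapping(self):
        if self._mm_mapping is None:
            self._mm_mapping = self.model.get_mm_mapping()
        return self._mm_mapping

    def filter_unsupported(self, module_name: str) -> bool:
        # Safe even when supports_mm_lora is False: the mapping is
        # resolved on demand rather than assumed to exist.
        prefixes = self.mm_mapping.connector + self.mm_mapping.tower_model
        return any(module_name.startswith(p) for p in prefixes)
```

With this shape, a model whose processing info lacks the new hooks still gets a valid prefix list instead of an AttributeError.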
add connector support
|
This pull request has merge conflicts that must be resolved before it can be merged.
Sure, feel free to disable the cache when enabling mm lora! We can address it in follow-up PRs.
… models (vllm-project#26674) Signed-off-by: bk-201 <joy25810@foxmail.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com> Co-authored-by: bk-201 <joy25810@foxmail.com> Co-authored-by: prashanth058 <prashanth.dannamaneni@uipath.com> Co-authored-by: Anexdeus <5142168@mail.ru> Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Purpose
FIX #26422
FIX #27916
FIX #29635
Summary
This PR enables LoRA support for the tower modules and connector in multi-modal models. Previously, a single global Punica wrapper was used, which was insufficient because the input structures for the tower/connector differ from those of the language model.
Key Changes
To address this, the following changes were implemented:
ProcessingInfo Methods: Added get_num_mm_encoder_tokens and get_num_mm_connector_tokens to the ProcessingInfo interface. These are used to calculate the actual input token counts for the vision tower and connector.
Why add get_num_mm_*_tokens? The number of multi-modal tokens represented in the language model does not necessarily match the input length required by the linear layers in the vision tower or connector. Since the lora_mapping requires the precise input token length prior to activation, these helper functions are necessary to bridge the discrepancy and calculate the correct lengths.
Implementation Details
Currently, this PR implements the logic for the Qwen2/2.5/3 VL and Idefics3. Below is an example of the implementation:
Note: To support LoRA for the tower and connector of other multi-modal models, these two functions must be implemented in their respective classes.
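As an illustration only (the method names come from this PR, but the class name and the merge arithmetic are invented, not copied from the Qwen2-VL or Idefics3 implementations), such a pair of methods might look like:

```python
class ExampleProcessingInfo:
    """Hypothetical ProcessingInfo for a model whose connector merges
    2x2 patch groups into one language-model token."""

    merge_size = 2  # assumed spatial merge factor, for illustration

    def get_num_mm_encoder_tokens(self, max_num_batched_tokens: int) -> int:
        # Each LM-side multi-modal token corresponds to merge_size**2
        # patches entering the vision tower, so the tower's input is
        # longer than the LM-side budget.
        return max_num_batched_tokens * self.merge_size ** 2

    def get_num_mm_connector_tokens(self, max_num_batched_tokens: int) -> int:
        # In this sketch the connector's output length equals the
        # LM-side multi-modal token count.
        return max_num_batched_tokens
```

The real implementations must of course use each model's actual patch and merge geometry; the point is only that both lengths are derived from the LM-side token budget.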
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.