[Fix] Enable mm_processor_cache with vision LoRA#31927
DarkLight1337 merged 1 commit into vllm-project:main
Conversation
Code Review
This pull request enables the multi-modal processor cache for vision LoRA, which was previously unsupported. The changes introduce a mm_hash field to MultiModalFeatureSpec to allow sharing the processor cache across different LoRAs by using a base hash for the multi-modal data, independent of the LoRA. The encoder cache remains LoRA-specific by using a prefixed identifier. Additionally, the LoRA prefix for the identifier is updated for robustness.
The overall approach is sound and the new test case correctly validates the cache sharing logic. However, I've identified a critical issue with the new method of generating the LoRA prefix for cache identifiers. The use of Python's built-in hash() function is not deterministic across processes, which can lead to cache inconsistencies in a distributed environment. I've provided a suggestion to use a deterministic encoding method instead.
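To illustrate the review concern: Python's built-in `hash()` on strings is salted per process (`PYTHONHASHSEED`), so two workers can derive different cache prefixes for the same LoRA. A minimal sketch of the problem and a deterministic alternative, using hypothetical helper names (not the actual vLLM code):

```python
import hashlib


def lora_prefix_nondeterministic(lora_name: str) -> str:
    # Built-in hash() of a str is randomized per interpreter process,
    # so distributed workers may disagree on the prefix for the same LoRA.
    return f"lora:{hash(lora_name)}"


def lora_prefix_deterministic(lora_name: str) -> str:
    # sha256 is stable across processes and platforms; a truncated
    # hex digest keeps the identifier short while staying deterministic.
    digest = hashlib.sha256(lora_name.encode("utf-8")).hexdigest()[:16]
    return f"lora:{digest}"
```

With the deterministic variant, every process computes the same prefix for a given LoRA name, so encoder cache identifiers stay consistent across a distributed deployment.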
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Purpose
Enables the multi-modal processor cache when `enable_tower_connector_lora` is active. #26674 prefixes `identifier` with the LoRA hash to avoid incorrect encoder cache hits (#26674 (comment)), but the processor cache should be shared across LoRAs.

Solution:
- Add an `mm_hash` field to `MultiModalFeatureSpec` to store the base hash (without the LoRA prefix)
- Use `mm_hash` for processor cache lookups (shared across LoRAs)
- Keep `identifier` (LoRA-prefixed) for the encoder cache (separate per LoRA)
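The two-key split above can be sketched as follows. This is a simplified stand-in, not the actual `MultiModalFeatureSpec` definition; the class, field, and helper names here are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureSpec:
    """Hypothetical stand-in for MultiModalFeatureSpec."""
    identifier: str  # LoRA-prefixed key, used by the encoder cache
    mm_hash: str     # base hash of the mm data, shared across LoRAs


def make_spec(base_hash: str, lora_prefix: Optional[str]) -> FeatureSpec:
    # The encoder cache key carries the LoRA prefix; the processor
    # cache key (mm_hash) deliberately does not.
    ident = f"{lora_prefix}:{base_hash}" if lora_prefix else base_hash
    return FeatureSpec(identifier=ident, mm_hash=base_hash)


# Two requests with the same image but different LoRAs:
a = make_spec("img123", "lora-A")
b = make_spec("img123", "lora-B")
assert a.mm_hash == b.mm_hash        # processor cache entry is shared
assert a.identifier != b.identifier  # encoder cache entries stay separate
```

The design rationale: preprocessing (tokenizing/resizing the image) is LoRA-independent, so its cache can be keyed on the data alone, while encoder outputs depend on the vision LoRA weights and must remain keyed per LoRA.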