[Core][Multimodal] Allow passing multi_modal_uuids as multimodal identifiers.#23394
[Core][Multimodal] Allow passing multi_modal_uuids as multimodal identifiers.#23394ywang96 merged 43 commits intovllm-project:mainfrom
multi_modal_uuids as multimodal identifiers.#23394Conversation
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.io>
multi_modal_ids and uuid has custom multimodal identifiersmulti_modal_ids as multimodal identifiers.
Signed-off-by: Roger Wang <hey@rogerw.io>
vllm/multimodal/hasher.py
Outdated
| logger = init_logger(__name__) | ||
|
|
||
| MultiModalHashDict = Mapping[str, list[str]] | ||
| MultiModalHashDict = dict[str, list[Optional[str]]] |
There was a problem hiding this comment.
Based on my other comment on hashing uuid, we might need to have
MultiModalUUIDDict = dict[str, list[Optional[str]]]
MultiModalHashDict = dict[str, list[Optional[str]]]
separately so they represent raw user uuids and hash values. Even though they are the same string types, we should better separate their semantics.
vllm/v1/engine/processor.py
Outdated
| are allowed and will be auto-hashed downstream. | ||
| """ | ||
|
|
||
| def _validate_single(single_prompt: Union[dict, str]) -> None: |
There was a problem hiding this comment.
We should also validate that if an entry in multi_modal_data[modality] is None, its corresponding uuid must not be None.
There was a problem hiding this comment.
Today we don't allow users to pass in a None multi_modal_data item.
I see the point is about passing a uuid for an existing cached object so no data transmission for the object itself is required, but I think that's outside the scope of this PR and will defer to a later one.
|
After #23018 is merged, we should validate on P0 that if the user passes in UUID, the corresponding item should exist in the cache, and reject the request otherwise, in order to avoid crashing the engine in P1. |
multi_modal_ids as multimodal identifiers.multi_modal_uuids as multimodal identifiers.
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Roger Wang <hey@rogerw.io>
…entifiers. (vllm-project#23394) Signed-off-by: Roger Wang <hey@rogerw.io>
…entifiers. (vllm-project#23394) Signed-off-by: Roger Wang <hey@rogerw.io>
Purpose
This PR is a follow-up to #23308 and #22711. Since multimodal hashes will now be required for the upcoming reworked multimodal encoder cache, we want to allow users to be able to pass in their own multimodal identifiers in case hashing tensors results in non-negligible overhead.
This PR allows
multi_modal_uuidsto be passed toAsyncLLM. The structure ofmulti_modal_uuidsmatchesmulti_modal_data. An entry is either a string orNone(in this case, we fall back to the default hashing logic to computemm_hashfor the item)Follow-up: Allow passing
uuidfromChatMessageswhich will be supported by #23449.Partially resolves: #22044
Usage
Test Plan
pytest v1/engine/test_processor_multi_modal_uuids.pyTest Result
(Optional) Documentation Update
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.