Improve multimodal hasher performance for re-used Image prompts #22825
Conversation
When processing batches of questions with multiple images each, MultiModalHasher re-computes the hash for every image in each prompt. This is quite expensive and it blocks the ingress thread.

This change allows applying a UUID tag to any Image. If one is detected, the hasher uses the UUID bytes rather than hashing the actual image.

To avoid conflicts, only real UUID objects are supported, though that could theoretically be opened up to arbitrary inputs as well. The ImageID EXIF tag used here could theoretically serve this purpose even then, since it should uniquely identify the image.

This PR does not modify any Image loading code to use this, but adds a test case to validate the functionality.

Signed-off-by: Staszek Pasko <[email protected]>
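For illustration, a minimal sketch of the intended usage (not code from this PR; it assumes Pillow's ExifTags.Base.ImageID enum and an in-memory PIL Image supplied directly to the engine, as in the unit test):

import uuid

from PIL import ExifTags, Image

img = Image.new("RGB", (512, 512))          # stand-in for a real, re-used image
exif = img.getexif()                        # Pillow's in-memory EXIF mapping
exif[ExifTags.Base.ImageID] = uuid.uuid4()  # a real UUID object, not a string

# On the hashing side, the tag can be read back and its 16 raw bytes used
# as the content key instead of hashing every pixel:
tag = img.getexif().get(ExifTags.Base.ImageID)
if isinstance(tag, uuid.UUID):
    key_material = tag.bytes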
Code Review
This pull request introduces a performance optimization for hashing multimodal image prompts by using a UUID from the image's EXIF data as a cache key, avoiding expensive re-computation of image hashes. The implementation is sound and the included tests validate the new functionality.
My main feedback is a critical issue regarding the handling of images that do not support EXIF data. The current implementation can cause a crash, which is a regression. I've provided a code suggestion to handle this case gracefully by catching the potential AttributeError.
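The suggested fallback is roughly of the following shape; this is only a sketch of the idea (the exact suggestion lives in the inline review), where obj is the image object being hashed:

# Some image-like objects do not support EXIF data and may not implement
# getexif(); fall back to an empty mapping so the full image content is
# hashed as before instead of raising AttributeError.
try:
    exif = obj.getexif()
except AttributeError:
    exif = {}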
Thanks for adding this support! cc @huachenheli Let's take some time to think about whether this would introduce any security vulnerabilities, since our processor cache, embedding cache and prefix cache all use this hash, and your PR would allow users to submit their own hash.
Also cc @russellb regarding this
In the current state these identifiers can only be provided as part of the Image, but cannot be loaded from storage (loading would render the tag as str/bytes and not UUID). So it should be quite contained from a security perspective.
I guess it is possible to have incorrect outputs if two different users use the same UUID for their image by manually editing the metadata. But it's arguable that this is the user's own fault for messing with the metadata like that, since that field is supposed to be unique when auto-generated.
The code checks whether the EXIF tags contain a UUID object, not just metadata that looks like one. The only way this works is with an engine-level integration where the user supplies the Image objects (see how this works in the unit test). I've added this integration to https://github.com/p88h/fake-vqa as well, which shows how this behaves with real batched multimodal scenarios (multiple prompts referring to the same set of Images).

In the API / interactive chat world, any images need to be fetched using the media connector, and even if you use the same image over time it gets re-downloaded, so hashing is not really important there. Hence the check for a UUID object: it's not possible to just edit metadata on the source file, since loading it will not produce one.
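To illustrate that last point (hypothetical file name, purely to show the type check):

import uuid

from PIL import ExifTags, Image

# When an image is loaded from storage, any ImageID metadata comes back as
# str/bytes, never as a uuid.UUID instance, so editing the file's EXIF
# cannot satisfy the isinstance(..., uuid.UUID) check used by the hasher.
with Image.open("edited_exif.jpg") as img:   # hypothetical file
    tag = img.getexif().get(ExifTags.Base.ImageID)
    print(isinstance(tag, uuid.UUID))        # expected: False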
DarkLight1337 left a comment
Yeah I think this should be fine then, thanks for contributing!
if Image.ExifTags.Base.ImageID in exif and isinstance(
        exif[Image.ExifTags.Base.ImageID], uuid.UUID):
    # If the image has exif ImageID tag, use that
    return exif[Image.ExifTags.Base.ImageID].bytes
nit: Explicit if-else separation for better readability:

if Image.ExifTags.Base.ImageID in exif and isinstance(
        exif[Image.ExifTags.Base.ImageID], uuid.UUID):
    # If the image has exif ImageID tag, use that
    return exif[Image.ExifTags.Base.ImageID].bytes
else:
    return cls.item_to_bytes(
        "image", np.asarray(convert_image_mode(obj, "RGBA")))
It's possible for different users to reuse the same UUID, and that can be useful in the case of repeated media across different requests, so supporting it makes sense. I think your concern about security is a good point! In the case of multi-tenant APIs this can potentially open vulnerabilities (i.e. one user spoofing a UUID to read another user's image). Maybe we should disable this feature by default and have it controlled by a flag?
Can you merge from main to fix the CI?
FYI, we have discussed this offline and determined that this is not an issue, because a potential attacker would already need access to the image in the first place in order to determine the hash to use.
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.

Purpose
Speed up image hashing when re-using the same Images across multiple prompts.
Test Plan
Tested with https://github.com/p88h/fake-vqa to check multi-prompt processing performance.
Test Result
In local testing with 64 images per prompt, the time to just add the requests to the engine was about 2 seconds per prompt. After this change, it drops to about 0.1 seconds per prompt, roughly a 20x improvement.
(Optional) Documentation Update