Conversation

@shenoyvvarun (Contributor)

Purpose

This PR fixes issue #21344, where large image requests hang in vLLM waiting for the request to time out.
Bug: MultiModalProfiler counts only patch tokens, but the input also contains bookkeeping tokens such as the tile separator, image_start, and image_end tokens. This makes the computed encoder_budget slightly lower than the actual budget. Whenever an image that uses all tiles is sent, vLLM accepts the request, but the scheduler can never schedule it because there is not enough encoder budget. The failure is silent: no error is produced.

Fix: During profiling, use the length field of PlaceholderRange, which reflects the real token budget, instead of counting only the embedding tokens.
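To make the mismatch concrete, here is a minimal sketch (illustrative only, not the actual vLLM diff; the field names follow vLLM's PlaceholderRange, while the encoder_budget helper is hypothetical):

from dataclasses import dataclass
from typing import Optional

import torch

@dataclass
class PlaceholderRange:
    offset: int   # index of the first placeholder token in the prompt
    length: int   # total placeholder tokens, including tile separators etc.
    is_embed: Optional[torch.Tensor] = None  # bool mask: which tokens get embeddings

def encoder_budget(placeholders: list[PlaceholderRange]) -> int:
    # Buggy behavior: counting only embedding tokens makes the profiled
    # budget slightly smaller than what the scheduler must later fit,
    # so a max-tile image is accepted but never scheduled.
    # return sum(int(p.is_embed.sum()) if p.is_embed is not None
    #            else p.length
    #            for p in placeholders)

    # Fixed behavior: use the full placeholder length during profiling.
    return sum(p.length for p in placeholders)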

Test Plan

  1. Large image (4k x 4k) - Llama Guard 4
  2. Full context length text + large image - Llama Guard 4
  3. Sanity test with a 7k x 4k image

Test Result

  1. Large image (4k x 4k)
{
    "id": "chatcmpl-464e45d8a7834c41aa96b93c388f5be6",
    "object": "chat.completion",
    "created": 1753798054,
    "model": "vllm-model",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\n\nsafe",
                "refusal": null,
                "annotations": null,
                "audio": null,
                "function_call": null,
                "tool_calls": [],
                "reasoning_content": null
            },
            "logprobs": null,
            "finish_reason": "stop",
            "stop_reason": null
        }
    ],
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "prompt_tokens": 2669,
        "total_tokens": 2672,
        "completion_tokens": 3,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null,
    "kv_transfer_params": null
}
  2. Full context length text + large image (encoder budget is 2467)
{
    "id": "chatcmpl-9406bb1facd54ea194a270fed2577ca8",
    "object": "chat.completion",
    "created": 1753798730,
    "model": "vllm-model",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "0",
                "refusal": null,
                "annotations": null,
                "audio": null,
                "function_call": null,
                "tool_calls": [],
                "reasoning_content": null
            },
            "logprobs": null,
            "finish_reason": "stop",
            "stop_reason": 200001
        }
    ],
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "prompt_tokens": 131069,
        "total_tokens": 131071,
        "completion_tokens": 2,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null,
    "kv_transfer_params": null
}
  3. Sanity test
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/8/85/Portas_da_Cidade%2C_Ponta_Delgada%2C_isla_de_San_Miguel%2C_Azores%2C_Portugal%2C_2020-07-29%2C_DD_123-125_HDR.jpg"
                    }
                },
                {
                    "type": "text",
                    "text": "Describe this image in detail and also specify where this can be found."
                }
            ]
        }
    ],
    "model": "vllm"
}'
{
    "id": "chatcmpl-86908cb63b644decbcfcbc63d26fbb36",
    "object": "chat.completion",
    "created": 1753799599,
    "model": "vllm",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "The image depicts a large, ornate archway in the center of a city square at dusk. The archway is made of stone and features two large arches with a decorative top section. It is illuminated by purple lights on either side.\n\nHere are the main points describing the image:\n\n* **Archway**\n\t+ Made of stone\n\t+ Features two large arches\n\t+ Decorative top section with a crown-like design\n\t+ Illuminated by purple lights on either side\n* **City Square**\n\t+ Located in front of the archway\n\t+ Features a statue in the center\n\t+ Surrounded by buildings on all sides\n\t+ Streetlights and cars visible\n* **Buildings**\n\t+ White with brown trim\n\t+ Feature arched windows and doors\n\t+ Have balconies on the second floor\n\t+ Appear to be old and historic\n* **Statue**\n\t+ Located in the center of the square\n\t+ Depicts a person standing on a pedestal\n\t+ Not clearly visible due to distance\n* **Streetlights and Cars**\n\t+ Streetlights line the square and surrounding streets\n\t+ Cars are parked along the streets and driving through the area\n\t+ Traffic lights visible in the distance\n* **Sky**\n\t+ Dark blue and cloudy\n\t+ Indicates that it is dusk or evening\n\nIn summary, the image shows a beautiful and historic archway in a city square, surrounded by old buildings and a statue. The archway is illuminated by purple lights, and the square is bustling with streetlights and cars. The dark blue sky suggests that it is dusk or evening. \n\nThis archway can be found in Horta, Faial, Azores, Portugal. The archway is known as the Portão da Cidade (City Gate) and is a iconic landmark in Horta.",
                "refusal": null,
                "annotations": null,
                "audio": null,
                "function_call": null,
                "tool_calls": [],
                "reasoning_content": null
            },
            "logprobs": null,
            "finish_reason": "stop",
            "stop_reason": null
        }
    ],
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "prompt_tokens": 2346,
        "total_tokens": 2732,
        "completion_tokens": 386,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null,
    "kv_transfer_params": null
}

(Optional) Documentation Update

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist bot left a comment

Code Review

This pull request primarily addresses a bug in MultiModalProfiler that caused large image requests to hang due to an incorrect encoder budget calculation. The fix correctly uses the full placeholder length, which is a sound approach. Additionally, the PR refactors the Llama 3/4 tool parser. While reviewing this refactoring, I found a critical issue where the new implementation unsafely splits JSON strings by semicolons, which can fail if a semicolon is part of a string value within the JSON. I've provided a more robust solution using json.JSONDecoder to correctly parse multiple tool calls.

Comment on lines +81 to +99
            json_str = match.group(0)
            # Split by semicolon and strip whitespace
            json_objects = [obj.strip() for obj in json_str.split(';')]

            tool_calls: list[ToolCall] = []
            for json_obj in json_objects:
                if not json_obj:  # Skip empty strings
                    continue
                obj = json.loads(json_obj)
                tool_calls.append(
                    ToolCall(
                        type="function",
                        function=FunctionCall(
                            name=obj["name"],
                            # function call args are JSON but as a string
                            arguments=json.dumps(obj["arguments"] \
                                    if "arguments" in obj \
                                    else obj["parameters"])))
                )

critical

The use of json_str.split(';') to separate multiple JSON objects is unsafe. A semicolon can legally appear within a string value inside a JSON object, which would cause this split to break the JSON and lead to parsing errors. This could result in failed or incorrect tool calls.

A more robust approach is to iteratively parse JSON objects from the string using json.JSONDecoder().raw_decode(), which correctly handles such cases.

            json_str = match.group(0)
            tool_calls: list[ToolCall] = []
            decoder = json.JSONDecoder()
            idx = 0
            while idx < len(json_str):
                # Skip whitespace and semicolons before the next JSON object
                while idx < len(json_str) and (json_str[idx].isspace() or json_str[idx] == ';'):
                    idx += 1
                if idx >= len(json_str):
                    break

                try:
                    obj, end = decoder.raw_decode(json_str, idx)
                    idx = end
                except json.JSONDecodeError:
                    # This can happen if there's trailing text after the JSONs
                    # that the regex matched. We can ignore it.
                    break

                tool_calls.append(
                    ToolCall(
                        type="function",
                        function=FunctionCall(
                            name=obj["name"],
                            # function call args are JSON but as a string
                            arguments=json.dumps(obj["arguments"] \
                                    if "arguments" in obj \
                                    else obj["parameters"])))
                )
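To see why this matters, here is a small standalone check (illustrative only, not part of the PR; the payload and tool names are made up). raw_decode recovers both tool calls even though the first one carries a semicolon inside a string value, while a naive split(';') cuts that object in half:

import json

# Hypothetical payload with a semicolon embedded in a JSON string value.
payload = '{"name": "search", "arguments": {"q": "a;b"}}; {"name": "open", "arguments": {}}'

decoder = json.JSONDecoder()
objs, idx = [], 0
while idx < len(payload):
    # Skip whitespace and semicolons between objects, as in the fix above.
    while idx < len(payload) and (payload[idx].isspace() or payload[idx] == ';'):
        idx += 1
    if idx >= len(payload):
        break
    obj, idx = decoder.raw_decode(payload, idx)
    objs.append(obj)

print([o["name"] for o in objs])  # ['search', 'open'] -- both calls parsed
print(payload.split(';')[0])      # '{"name": "search", "arguments": {"q": "a' -- broken JSON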
