[Bugfix] Fixing bug inside MultiModalProfiler #3
Conversation
Add dev container Dockerfile and devcontainer.json
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add the ready label to the PR. 🚀
Code Review
This pull request primarily addresses a bug in MultiModalProfiler that caused large image requests to hang due to an incorrect encoder budget calculation. The fix correctly uses the full placeholder length, which is a sound approach. Additionally, the PR refactors the Llama 3/4 tool parser. While reviewing this refactoring, I found a critical issue where the new implementation unsafely splits JSON strings by semicolons, which can fail if a semicolon is part of a string value within the JSON. I've provided a more robust solution using json.JSONDecoder to correctly parse multiple tool calls.
```python
json_str = match.group(0)
# Split by semicolon and strip whitespace
json_objects = [obj.strip() for obj in json_str.split(';')]

tool_calls: list[ToolCall] = []
for json_obj in json_objects:
    if not json_obj:  # Skip empty strings
        continue
    obj = json.loads(json_obj)
    tool_calls.append(
        ToolCall(
            type="function",
            function=FunctionCall(
                name=obj["name"],
                # function call args are JSON but as a string
                arguments=json.dumps(obj["arguments"] \
                    if "arguments" in obj \
                    else obj["parameters"])))
    )
```
The use of json_str.split(';') to separate multiple JSON objects is unsafe. A semicolon can legally appear within a string value inside a JSON object, which would cause this split to break the JSON and lead to parsing errors. This could result in failed or incorrect tool calls.
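To illustrate the failure mode on a hypothetical input (the tool name and arguments below are made up for the example, not taken from the PR):

```python
import json

# A single tool call whose string argument legally contains a semicolon.
json_str = '{"name": "run_shell", "arguments": {"cmd": "echo a; echo b"}}'

# Naive splitting cuts the object apart, producing two invalid fragments.
fragments = [frag.strip() for frag in json_str.split(';')]
print(fragments)
# ['{"name": "run_shell", "arguments": {"cmd": "echo a', 'echo b"}}']

try:
    json.loads(fragments[0])
except json.JSONDecodeError as exc:
    print(f"parse failed: {exc}")
```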
A more robust approach is to iteratively parse JSON objects from the string using json.JSONDecoder().raw_decode(), which correctly handles such cases.
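For context, here is a self-contained sketch of this pattern on the same hypothetical input, independent of the vLLM-specific ToolCall/FunctionCall types used in the full suggestion below:

```python
import json

decoder = json.JSONDecoder()
text = ('{"name": "run_shell", "arguments": {"cmd": "echo a; echo b"}}; '
        '{"name": "get_weather", "arguments": {"city": "Berlin"}}')

objs, idx = [], 0
while idx < len(text):
    # Skip whitespace and semicolons between objects.
    while idx < len(text) and (text[idx].isspace() or text[idx] == ';'):
        idx += 1
    if idx >= len(text):
        break
    # raw_decode parses one complete JSON value and returns where it ended,
    # so semicolons inside string values are handled correctly.
    obj, idx = decoder.raw_decode(text, idx)
    objs.append(obj)

print([o["name"] for o in objs])  # ['run_shell', 'get_weather']
```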
```python
json_str = match.group(0)
tool_calls: list[ToolCall] = []
decoder = json.JSONDecoder()
idx = 0
while idx < len(json_str):
    # Skip whitespace and semicolons before the next JSON object
    while idx < len(json_str) and (json_str[idx].isspace() or json_str[idx] == ';'):
        idx += 1
    if idx >= len(json_str):
        break
    try:
        obj, end = decoder.raw_decode(json_str, idx)
        idx = end
    except json.JSONDecodeError:
        # This can happen if there's trailing text after the JSONs
        # that the regex matched. We can ignore it.
        break
    tool_calls.append(
        ToolCall(
            type="function",
            function=FunctionCall(
                name=obj["name"],
                # function call args are JSON but as a string
                arguments=json.dumps(obj["arguments"] \
                    if "arguments" in obj \
                    else obj["parameters"])))
    )
```

…res-based-on-v0.9.1

⚠️ This commit contains unresolved merge conflicts in files:
* benchmarks/kernels/benchmark_moe.py
* tests/v1/core/test_scheduler.py
* vllm/entrypoints/openai/api_server.py
* vllm/entrypoints/openai/tool_parsers/llama_tool_parser.py
* vllm/v1/attention/backends/flash_attn.py

Commits being cherry-picked:
* c9bd9f0 pick opc-request-id middleware and handle_pydantic_validation_error
* 236db58 [Bugfix] Improve JSON extraction in LlamaToolParser (#2)
* 4dd7951 Add dev container
* 1ddce9a Add devcontainer.json
* f085ef2 Merge pull request #3 from moirai-internal/dev_container

📝 Edit files to remove conflict markers
Add dev container Dockerfile and devcontainer.json
…sed-on-v0.10.0

⚠️ This commit contains unresolved merge conflicts in files:
* requirements/common.txt
* tests/test_utils.py
* vllm/attention/backends/torch_sdpa.py

Commits being cherry-picked:
* aefc22d Merge pull request #3 from moirai-internal/dev_container
* 50dadb7 pick [Bugfix] Improve JSON extraction in LlamaToolParser
* 4373a86 pick opc-request-id middleware
* 881ad1a cherry pick the fix for non english token in logprobs (#7)

📝 Edit files to remove conflict markers
…0.10.0 (#8)

* Revert "[V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412)". This reverts commit e202dd2.
* Merge pull request #3 from moirai-internal/dev_container: Add dev container Dockerfile and devcontainer.json
* pick [Bugfix] Improve JSON extraction in LlamaToolParser
* pick opc-request-id middleware
* cherry pick the fix for non english token in logprobs (#7)
* [Bugfix]: Fix messy code when using logprobs (#19209). Signed-off-by: chaunceyjiang <[email protected]>
* [Bugfix]: Fix messy code when using logprobs (#20910). Signed-off-by: chaunceyjiang <[email protected]>
* fix transformers compatible issue vllm-project/vllm-ascend#2046

---------
Signed-off-by: chaunceyjiang <[email protected]>
Co-authored-by: Chauncey <[email protected]>

* Merge pull request #3 from moirai-internal/dev_container: Add dev container Dockerfile and devcontainer.json
* pick [Bugfix] Improve JSON extraction in LlamaToolParser
* pick opc-request-id middleware
* ✅ Resolved: Cherry-pick from features-based-on-v0.9.2 to features-based-on-v0.10.0
* Revert "Revert "[V0 deprecation] Remove V0 CPU/XPU/TPU backends (#20412)"". This reverts commit a5dd03c.
* remove fstring
* Delete torch_sdpa.py

---------
Signed-off-by: chaunceyjiang <[email protected]>
Co-authored-by: simon-mo <[email protected]>
Co-authored-by: Keyang Ru <[email protected]>
Co-authored-by: Chao Yang <[email protected]>
Co-authored-by: Chauncey <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Varun Shenoy <[email protected]>
Co-authored-by: Tejesh Anand <[email protected]>
…v0.10.1 (#10)

* Cherry-pick conflicts: features-based-on-v0.9.2 → features-based-on-v0.10.0 (#8)
* Resolve merge conflict: Cherry-pick from features-based-on-v0.10.0 to features-based-on-v0.10.1
* Update llama_tool_parser.py

---------
Signed-off-by: chaunceyjiang <[email protected]>
Co-authored-by: simon-mo <[email protected]>
Co-authored-by: Keyang Ru <[email protected]>
Co-authored-by: Chao Yang <[email protected]>
Co-authored-by: Chauncey <[email protected]>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Varun Shenoy <[email protected]>
Co-authored-by: Tejesh Anand <[email protected]>
Co-authored-by: paxiaatucsdedu <[email protected]>
Purpose
This PR fixes issue #21344, where large image requests hang in vLLM, waiting for the request to time out.
Bug: MultiModalProfiler counts only `patch` tokens, but there are other bookkeeping tokens, such as `tile_separator`, `image_start`, and `image_end`, in the input. This causes the `encoder_budget` to be slightly lower than the actual budget. Whenever an image that uses all tiles is sent, vLLM accepts the request, but the scheduler can never schedule it because there is not enough encoder budget. The issue is silent: no error is produced.

Fix: Use the `length` parameter on `PlaceholderRange`, which is the real token budget during profiling, instead of the embedding count. A rough sketch of the idea is shown below.
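As a sketch of the intended behavior only (not the actual vLLM code; the `num_embeds` field and the `dummy_encoder_budget` helper below are simplified assumptions for illustration):

```python
from dataclasses import dataclass


@dataclass
class PlaceholderRange:
    offset: int
    length: int      # full placeholder span, including bookkeeping tokens
                     # such as tile separators and image start/end markers
    num_embeds: int  # embedding (patch) tokens only (hypothetical field)


def dummy_encoder_budget(placeholders: list[PlaceholderRange]) -> int:
    # Buggy behavior: budgeting only for the embedding/patch tokens
    # undercounts what the scheduler must reserve, so a request that
    # uses every tile can never be scheduled.
    # return sum(p.num_embeds for p in placeholders)

    # Fixed behavior: budget for the full placeholder length.
    return sum(p.length for p in placeholders)
```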
Test Plan
Test Result
(Optional) Documentation Update