[Misc] toy_proxy_server handle min_tokens#39706

Merged
DarkLight1337 merged 4 commits into vllm-project:main from NickLucche:proxy-min-tokens on Apr 16, 2026

@NickLucche
Collaborator

Sending a request with min_tokens through toy_proxy_server.py currently fails on P: P receives the request with max_tokens=1 and min_tokens>1, which fails validation.
This small patch stops the proxy from sending min_tokens to P and forwards it to D only.
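
A minimal sketch of the idea described above, not the code from this PR: the helper `build_prefill_request` and the validator `toy_validate` are hypothetical names, and the validator only stands in for whatever check the server performs. It assumes the proxy caps P at `max_tokens=1` and that the client request is a plain dict.

```python
import copy

# Keys that conflict with max_tokens=1 on the prefill side (per the PR description).
STRIPPED_FOR_PREFILL = ("min_tokens", "min_completion_tokens")


def build_prefill_request(req_data: dict) -> dict:
    """Return a modified copy of req_data for the prefill instance (P)."""
    prefill_req = copy.deepcopy(req_data)  # never mutate the client's request
    prefill_req["max_tokens"] = 1  # P only needs to produce a single token
    for key in STRIPPED_FOR_PREFILL:
        prefill_req.pop(key, None)  # default avoids KeyError when absent
    return prefill_req


def toy_validate(req: dict) -> None:
    """Hypothetical stand-in for server-side validation (not vLLM code)."""
    min_tokens = req.get("min_tokens") or 0
    max_tokens = req.get("max_tokens")
    if max_tokens is not None and min_tokens > max_tokens:
        raise ValueError(
            f"min_tokens={min_tokens} exceeds max_tokens={max_tokens}"
        )


client_req = {"prompt": "Hello", "max_tokens": 64, "min_tokens": 16}

prefill_req = build_prefill_request(client_req)
toy_validate(prefill_req)  # passes: min_tokens was stripped for P
toy_validate(client_req)   # passes: D still sees min_tokens=16, max_tokens=64
```

Without the stripping step, the request sent to P would carry `max_tokens=1` together with `min_tokens=16`, and `toy_validate` would raise.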

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request modifies the toy proxy server to strip 'min_tokens' and 'min_completion_tokens' from requests sent to the prefiller service. The review feedback identifies that the manual re-insertion of these keys is redundant because the request data is already a local copy, and warns that explicitly adding them back as null could cause validation issues in the decoder.

Comment on lines +177 to +178
min_tokens = req_data.pop("min_tokens", None)
min_completion_tokens = req_data.pop("min_completion_tokens", None)

high

Since req_data is a local copy (created at line 161), popping these keys only affects the request sent to the prefiller (P). The original req_data object in the caller (_handle_completions) remains untouched and will still contain these keys when later sent to the decoder (D). You can simply pop them without assigning to variables to avoid unused variable warnings and unnecessary state tracking.

Suggested change
- min_tokens = req_data.pop("min_tokens", None)
- min_completion_tokens = req_data.pop("min_completion_tokens", None)
+ req_data.pop("min_tokens", None)
+ req_data.pop("min_completion_tokens", None)
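
The review comment above rests on `req_data` being a copy. The following sketch illustrates only that Python behaviour with made-up dicts; whether the proxy really works on a copy at that point is the reviewer's claim, not something verified here.

```python
original = {"prompt": "Hello", "max_tokens": 64, "min_tokens": 16}

# Case 1: a (shallow) copy. Popping top-level keys leaves the original alone.
local = original.copy()
local.pop("min_tokens", None)
local.pop("min_completion_tokens", None)  # absent key: the default avoids KeyError
copy_left_original_intact = "min_tokens" in original

# Case 2: an alias, not a copy. Popping now mutates the caller's dict too.
caller_req = {"prompt": "Hello", "max_tokens": 64, "min_tokens": 16}
alias = caller_req
alias.pop("min_tokens", None)
alias_mutated_caller = "min_tokens" not in caller_req
```

If the function received an alias rather than a copy, popping without saving the values would drop `min_tokens` from the request later sent to D, so the suggestion is only safe in the first case.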

Comment on lines +193 to +196
# Add back the min_tokens and min_completion_tokens so D can use them
req_data["min_tokens"] = min_tokens
req_data["min_completion_tokens"] = min_completion_tokens

high

These lines are redundant and potentially problematic.

  1. Redundancy: Modifying the local req_data copy here has no effect on the caller, and the caller's original req_data already contains these values.
  2. Logic Error: If the keys were originally absent, this code explicitly adds them with a value of null (None), which may cause validation issues in the decoder if it expects the keys to be missing rather than present with a null value.

Since the goal is to forward the original parameters to the decoder, the existing copy mechanism at line 161 already ensures this behavior without needing to "add back" anything.
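
To make the second point concrete, here is a small sketch of how an unconditional "add back" turns an absent key into an explicit null, together with one possible guarded alternative. The helper `pop_present` is a hypothetical name, and whether the decoder actually rejects null values is not established in this thread.

```python
import json

KEYS = ("min_tokens", "min_completion_tokens")

# Unconditional restore: the key was never sent, yet it ends up present as null.
req = {"prompt": "Hi"}
saved_min_tokens = req.pop("min_tokens", None)
req["min_tokens"] = saved_min_tokens
naive_payload = json.dumps(req)  # '{"prompt": "Hi", "min_tokens": null}'


def pop_present(req_data: dict, keys: tuple) -> dict:
    """Remove only the keys that exist and return them with their values."""
    return {key: req_data.pop(key) for key in keys if key in req_data}


# Guarded restore: only keys the client actually sent are put back.
req = {"prompt": "Hi"}
saved = pop_present(req, KEYS)
req.update(saved)
guarded_payload = json.dumps(req)  # '{"prompt": "Hi"}'
```

With the guarded variant, a request that carried `min_tokens` gets it back unchanged, and a request that did not carry it stays free of the key.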

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) April 13, 2026 17:40
@github-actions github-actions Bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Apr 13, 2026
@markmc
Member

markmc commented Apr 14, 2026

 FAILED v1/kv_connector/nixl_integration/test_multi_connector_edge_cases.py::test_full_decode_gpu_cache_hit_metrics - AssertionError: expected local_cache_hit=128, got 127.0
[2026-04-13T17:57:20Z] assert 127.0 == 128

Fixed by #39709

Signed-off-by: NickLucche <nlucches@redhat.com>
@markmc markmc force-pushed the proxy-min-tokens branch from f677f77 to 3ff2cd8 Compare April 14, 2026 10:04
@DarkLight1337 DarkLight1337 merged commit 3daca38 into vllm-project:main Apr 16, 2026
20 checks passed
vllm-agent pushed a commit to vllm-agent/vllm that referenced this pull request Apr 17, 2026
bnellnm pushed a commit to neuralmagic/vllm that referenced this pull request Apr 20, 2026
Signed-off-by: NickLucche <nlucches@redhat.com>
baonudesifeizhai pushed a commit to baonudesifeizhai/vllm that referenced this pull request Apr 23, 2026
Signed-off-by: NickLucche <nlucches@redhat.com>
whk-lab pushed a commit to whk-lab/vllm that referenced this pull request Apr 23, 2026
Signed-off-by: NickLucche <nlucches@redhat.com>
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>

Labels

kv-connector, ready (ONLY add when PR is ready to merge/full CI is needed), v1

3 participants