[Misc] toy_proxy_server handle min_tokens#39706
[Misc] toy_proxy_server handle min_tokens#39706DarkLight1337 merged 4 commits intovllm-project:mainfrom
toy_proxy_server handle min_tokens#39706Conversation
There was a problem hiding this comment.
Code Review
This pull request modifies the toy proxy server to strip 'min_tokens' and 'min_completion_tokens' from requests sent to the prefiller service. The review feedback identifies that the manual re-insertion of these keys is redundant because the request data is already a local copy, and warns that explicitly adding them back as null could cause validation issues in the decoder.
| min_tokens = req_data.pop("min_tokens", None) | ||
| min_completion_tokens = req_data.pop("min_completion_tokens", None) |
There was a problem hiding this comment.
Since req_data is a local copy (created at line 161), popping these keys only affects the request sent to the prefiller (P). The original req_data object in the caller (_handle_completions) remains untouched and will still contain these keys when later sent to the decoder (D). You can simply pop them without assigning to variables to avoid unused variable warnings and unnecessary state tracking.
| min_tokens = req_data.pop("min_tokens", None) | |
| min_completion_tokens = req_data.pop("min_completion_tokens", None) | |
| req_data.pop("min_tokens", None) | |
| req_data.pop("min_completion_tokens", None) |
| # Add back the min_tokens and min_completion_tokens so D can use them | ||
| req_data["min_tokens"] = min_tokens | ||
| req_data["min_completion_tokens"] = min_completion_tokens | ||
|
|
There was a problem hiding this comment.
These lines are redundant and potentially problematic.
- Redundancy: Modifying the local
req_datacopy here has no effect on the caller, and the caller's originalreq_dataalready contains these values. - Logic Error: If the keys were originally absent, this code explicitly adds them with a value of
null(None), which may cause validation issues in the decoder if it expects the keys to be missing rather than present with a null value.
Since the goal is to forward the original parameters to the decoder, the existing copy mechanism at line 161 already ensures this behavior without needing to "add back" anything.
Fixed by #39709 |
Signed-off-by: NickLucche <nlucches@redhat.com>
f677f77 to
3ff2cd8
Compare
…)" This reverts commit 3daca38.
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
Sending a request with min_tokens to the toy_proxy_server.py currently fails on P as it receives a request with max_tokens=1 and min_tokens>1, crashing on validation.
This small patch allows to skip min_tokens sending to P, while forwarding that to D only.