[Bugfix] Fix incorrect error message when len(prompt) + max_tokens > max_model_len #33425
sducouedic wants to merge 4 commits into vllm-project:main
Conversation
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Code Review
This pull request provides a helpful bugfix to improve the error message when a user's request exceeds the model's maximum context length due to a long prompt and a large max_tokens value. The previous error message was misleading, and the new one correctly identifies the model's maximum context length and points to both the prompt length and max_tokens as the cause of the issue. The change is well-implemented and significantly improves user experience when encountering this validation error. The implementation is correct for the intended use case, and I have no concerns.
DarkLight1337 left a comment
Will be addressed as part of #32863
This pull request has merge conflicts that must be resolved before it can be merged.

Closing as superseded by #32863, thanks for your efforts though!
Closes #33418
Fixes the error message, which was displaying the wrong max_model_len. Reproducible with, for example:

    vllm serve ./llama-194m --max-model-len 2048

cc: @yannicks1
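For illustration, the validation this PR improves can be sketched as below. This is a minimal standalone sketch, not vLLM's actual implementation: the function name, signature, and exact message wording are assumptions; only the check itself (prompt length plus max_tokens against the model's context window) and the style of message the review describes are taken from the PR.

```python
def validate_request_length(prompt_len: int, max_tokens: int,
                            max_model_len: int) -> None:
    """Reject requests whose prompt plus generation budget exceeds the
    model's context window.

    Hypothetical helper for illustration; vLLM's real check lives
    elsewhere and may differ in names and wording.
    """
    total = prompt_len + max_tokens
    if total > max_model_len:
        # Report the model's maximum context length and attribute the
        # overflow to both the prompt length and max_tokens, as the
        # improved error message does.
        raise ValueError(
            f"This model's maximum context length is {max_model_len} "
            f"tokens, but the request has {total} tokens "
            f"({prompt_len} in the prompt, {max_tokens} reserved for "
            f"generation). Please shorten the prompt or reduce "
            f"max_tokens."
        )
```

With `--max-model-len 2048`, a 2000-token prompt and `max_tokens=100` would be rejected with a message naming 2048 as the limit, rather than a misleading figure.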