8 changes: 5 additions & 3 deletions vllm/entrypoints/renderer.py
@@ -396,9 +396,11 @@ def _create_tokens_prompt(
"""Create validated TokensPrompt."""
if max_length is not None and len(token_ids) > max_length:
raise VLLMValidationError(
f"This model's maximum context length is {max_length} tokens. "
f"However, your request has {len(token_ids)} input tokens. "
"Please reduce the length of the input messages.",
f"The token count of your prompt ({len(token_ids)})"
f"plus request's max_tokens cannot exceed the"
f"model's context length of {self.model_config.max_model_len}. "
f"Maximum allowed input is {max_length} tokens. "
"Please reduce the input length or decrease max_tokens.",
Comment on lines +399 to +403
Contributor

Severity: high

The new error message is much more informative, but it has a formatting issue. The separate f-strings on lines 399 and 400 will be concatenated without a space, resulting in a malformed message containing ...prompt (X)plus... and ...exceed themodel's.... To ensure the message is readable, spaces should be added at the end of these lines.

Suggested change
-                f"The token count of your prompt ({len(token_ids)})"
-                f"plus request's max_tokens cannot exceed the"
+                f"The token count of your prompt ({len(token_ids)}) "
+                f"plus request's max_tokens cannot exceed the "
                 f"model's context length of {self.model_config.max_model_len}. "
                 f"Maximum allowed input is {max_length} tokens. "
                 "Please reduce the input length or decrease max_tokens.",

parameter="input_tokens",
value=len(token_ids),
)