
[Bugfix] Fix correct error message when len(prompt) + max_tokens > max_model_len#33425

Closed

sducouedic wants to merge 4 commits into vllm-project:main from sducouedic:fix_max_model_len


Conversation

Contributor

@sducouedic sducouedic commented Jan 30, 2026

Closes #33418

Fixes the error message, which displayed the wrong max_model_len when len(prompt) + max_tokens > max_model_len.

  • For vllm instance: vllm serve ./llama-194m --max-model-len 2048
  • Request:
curl -X 'POST' 'http://localhost:8000/v1/completions' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{ "model": "./llama-194m", "prompt": "What is the capital of Paris?", "max_tokens": 2045 }'
    
  • Previously:
    {"error":{"message":"This model's maximum context length is 3 tokens. However, your request has 8 input tokens. Please reduce the length of the input messages. (parameter=input_tokens, value=8)","type":"BadRequestError","param":"input_tokens","code":400}}
    
  • Now:
    {"error":{"message":"This model's maximum context length is 2048 tokens. However, your request has 8 input tokens plus 2045 'max_tokens'. Please reduce one or the other. (parameter=input_tokens, max_tokens, value=(8, 2045))","type":"BadRequestError","param":"input_tokens, max_tokens","code":400}}
    

cc: @yannicks1

Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
@mergify mergify bot added the frontend and bug (Something isn't working) labels Jan 30, 2026
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request fixes a misleading error message shown when a request exceeds the model's maximum context length due to a long prompt combined with a large max_tokens value. The previous message reported an incorrect maximum context length; the new one reports the correct limit and identifies both the prompt length and max_tokens as the cause. The change is well implemented and improves the user experience for this validation error. I have no concerns.

Member

@DarkLight1337 DarkLight1337 left a comment


Will be addressed as part of #32863

@mergify

mergify bot commented Jan 31, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @sducouedic.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jan 31, 2026
@DarkLight1337
Member

Closing as superseded by #32863, thanks for your efforts though!

@sducouedic sducouedic deleted the fix_max_model_len branch February 19, 2026 22:09

Labels

bug (Something isn't working), frontend, needs-rebase


Development

Successfully merging this pull request may close these issues.

[Bug]: wrong error reported when len(prompt) + requested tokens > max_context_len

2 participants