[Frontend] Add max-completion-token option to transcription/translation endpoints#30769
Conversation
Code Review
This pull request adds a max_completion_tokens option to the transcription and translation endpoints. The implementation in vllm/entrypoints/openai/speech_to_text.py has a critical bug that will cause a TypeError when max_completion_tokens is not provided, and an AttributeError for translation requests. I've provided a suggestion to fix this. To fully support this for translations, max_completion_tokens should also be added to the TranslationRequest protocol. Additionally, a test case for the translation endpoint with this new option would be beneficial.
Hi @NickLucche, the pre-commit checks have failed. Please run:

uv pip install pre-commit
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.
Signed-off-by: NickLucche <nlucches@redhat.com>
…tion endpoints (vllm-project#30769) Signed-off-by: NickLucche <nlucches@redhat.com>
Straightforward PR to let users cap the number of tokens generated by the STT (speech-to-text) endpoints, using an additional non-OpenAI argument (`max_completion_tokens`).