-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[hot-fix] Handle [DONE] signal from TGI + remove logic for "non-TGI servers" #2410
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
@@ -860,10 +860,10 @@ def chat_completion( | |||
stream=stream, | |||
) | |||
except HTTPError as e: | |||
if e.response.status_code in (400, 404, 500): | |||
if e.response.status_code in (400, 500): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should really be only 1 error code here.
Can you check on a real deployment or in TGI source code what happens on non existing template models ?
I think it should be a 4xx since it's not really a server error. No idea for which one is more appropriate.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status#client_error_responses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check on a real deployment or in TGI source code what happens on non existing template models ?
It returns a
huggingface_hub.utils._errors.HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://api-inference.huggingface.co/models/gpt2/v1/chat/completions (Request ID: 3Y0mKxT7AmdMSZsfNjcQA)
Template error: template not found
But in fact this is not even the problem. Since client-side rendering has been completely removed (in #2258), there is no point treating TGI-served and other models differently. I've updated the PR to simplify the logic which will finally avoid hiding errors to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some nits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR and swift fix @Wauplin
Sorry for the mess, I realized I should have done 2 PRs to fix these separately. @Narsil 's comment #2410 (comment) made me realize that we don't even need 2 different behaviors anymore in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't dive into the changes given the time sensitivity but from a quick look it looks good to me.
…ervers" (#2410) * Handle [DONE] signal from TGI * fix text_generation as well * Handle error 404 correctly * consistency + stop treating transformers-backed models differently * fix test * fix broken test on main * fix test * cleaner
What's in this PR?:
Related to fix: append DONE message to chat stream text-generation-inference#2221. TGI now returns a
b"data: [DONE]"
stop signal when iterating through generated tokens (stream=True). This PR adds support for this stop signal. Once this PR is merged, I'll make a hot-fix release sinceInferenceClient
is currently broken on newest versions of TGI fortext_generation
andchat_completion
.I also removed all the "non-tgi" logic in
chat_completion
since every model is served by TGI now, even when it's a transformers-only model (e.g."microsoft/DialoGPT-small"
. This simplifies a lot the logic and will avoid hiding relevant errors to the users. In case the model is transformers-served and is not compatible with chat completion, a422 unprocessable entity: template not found
is returned.I also took the opportunity to rename some private helpers for consistency.
Better to hide whitespace changes to review this PR.