Fix broken AsyncInferenceClient on [DONE] signal #2458
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* fix trailing newlines
* fix cassettes
* quality
The hot-fix release has been shipped: https://github.com/huggingface/huggingface_hub/releases/tag/v0.24.6.
```diff
@@ -355,7 +355,7 @@ def _format_chat_completion_stream_output(


 async def _async_yield_from(client: "ClientSession", response: "ClientResponse") -> AsyncIterable[bytes]:
     async for byte_payload in response.content:
-        yield byte_payload
+        yield byte_payload.strip()
```
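The one-line fix works because the async client yields raw byte chunks with their trailing newline attached, so a downstream comparison against the exact `[DONE]` sentinel fails. A minimal, self-contained sketch of the patched helper (the fake stream and `main` wrapper are illustrative, not part of the library):

```python
import asyncio
from typing import AsyncIterable


async def _fake_stream() -> AsyncIterable[bytes]:
    # Simulated SSE chunks as an aiohttp response.content would yield them:
    # each line keeps its trailing newline.
    for chunk in (b"data: {...}\n", b"data: [DONE]\n"):
        yield chunk


async def _async_yield_from(stream: AsyncIterable[bytes]) -> AsyncIterable[bytes]:
    # Mirrors the patched helper: strip surrounding whitespace so downstream
    # parsers can compare against the exact b"data: [DONE]" sentinel.
    async for byte_payload in stream:
        yield byte_payload.strip()


async def main() -> list:
    return [payload async for payload in _async_yield_from(_fake_stream())]


payloads = asyncio.run(main())
print(payloads)  # [b'data: {...}', b'data: [DONE]']
```

With the `.strip()` removed, the last payload would be `b"data: [DONE]\n"` and the sentinel check would never match.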
@Wauplin Btw, the two other `byte_payload.rstrip()` calls (above) may not be necessary: `_format_text_generation_stream_output` and `_format_chat_completion_stream_output` receive `"[DONE]"` without a trailing `"\n"` from the regular client (which uses `.iter_lines` and strips newlines naturally), while the async client's `byte_payload` passes through `_async_yield_from` here first.
Maybe just a micro optimization, in case performance ever becomes an issue here.
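The asymmetry described above can be shown with plain bytes (a hedged illustration; the variable names are hypothetical): an `iter_lines`-style iteration splits on and drops the newline, while raw chunk iteration keeps it.

```python
raw = b"data: {...}\ndata: [DONE]\n"

# Sync path: iter_lines-like splitting strips the line terminator.
sync_lines = raw.splitlines()

# Async path: chunks arrive with their newline still attached,
# hence the extra .strip() in _async_yield_from.
async_chunks = [b"data: {...}\n", b"data: [DONE]\n"]
stripped = [chunk.strip() for chunk in async_chunks]

assert sync_lines == stripped  # both paths now agree on the payloads
```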
Yes, I saw that, but better to be safe. Each method, taken individually, should be able to work with unstripped data and return stripped data. Inference time is usually much higher than the parsing overhead anyway.
Fixed #2455 reported by @cordawyn.
Let's strip bytes messages to avoid newlines. Thanks @cordawyn for noticing and reporting it. I've been able to reproduce it and updated the docs accordingly.
Also a good reminder that cassettes are not ideal for inference tests: here the bug came from an untested TGI update. Surely something to improve in our CI, but that will be done in a future PR.