Fix broken AsyncInferenceClient on [DONE] signal #2458
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* fix trailing newlines
* fix cassettes
* quality
The hot-fix release has been shipped: https://github.com/huggingface/huggingface_hub/releases/tag/v0.24.6.
```diff
@@ -355,7 +355,7 @@ def _format_chat_completion_stream_output(


 async def _async_yield_from(client: "ClientSession", response: "ClientResponse") -> AsyncIterable[bytes]:
     async for byte_payload in response.content:
-        yield byte_payload
+        yield byte_payload.strip()
```
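The one-line fix works because the async client yields raw byte chunks with their trailing newline attached, so a downstream comparison against the exact `[DONE]` sentinel fails. A minimal, self-contained sketch of the patched helper (the fake stream and `main` wrapper are illustrative, not part of the library):

```python
import asyncio
from typing import AsyncIterable


async def _fake_stream() -> AsyncIterable[bytes]:
    # Simulated SSE chunks as an aiohttp response.content would yield them:
    # each line keeps its trailing newline.
    for chunk in (b"data: {...}\n", b"data: [DONE]\n"):
        yield chunk


async def _async_yield_from(stream: AsyncIterable[bytes]) -> AsyncIterable[bytes]:
    # Mirrors the patched helper: strip surrounding whitespace so downstream
    # parsers can compare against the exact b"data: [DONE]" sentinel.
    async for byte_payload in stream:
        yield byte_payload.strip()


async def main() -> list:
    return [payload async for payload in _async_yield_from(_fake_stream())]


payloads = asyncio.run(main())
print(payloads)  # [b'data: {...}', b'data: [DONE]']
```

With the `.strip()` removed, the last payload would be `b"data: [DONE]\n"` and the sentinel check would never match.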
@Wauplin Btw, the two other `byte_payload.rstrip()` calls (above) may not be necessary: `_format_text_generation_stream_output` and `_format_chat_completion_stream_output` receive `"[DONE]"` without a trailing `"\n"` from the regular client (which uses `.iter_lines` and strips newlines naturally), while the async client's `byte_payload` passes through `_async_yield_from` here first.
Maybe just a micro optimization, in case performance ever becomes an issue here.
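The asymmetry described above can be shown with plain bytes (a hedged illustration; the variable names are hypothetical): an `iter_lines`-style iteration splits on and drops the newline, while raw chunk iteration keeps it.

```python
raw = b"data: {...}\ndata: [DONE]\n"

# Sync path: iter_lines-like splitting strips the line terminator.
sync_lines = raw.splitlines()

# Async path: chunks arrive with their newline still attached,
# hence the extra .strip() in _async_yield_from.
async_chunks = [b"data: {...}\n", b"data: [DONE]\n"]
stripped = [chunk.strip() for chunk in async_chunks]

assert sync_lines == stripped  # both paths now agree on the payloads
```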
Yes, I saw that, but better to be safe. Each method, taken individually, should be able to work with unstripped data and return stripped data. Inference time is usually much higher than the parsing overhead anyway.
Fixed #2455 reported by @cordawyn.
Let's strip bytes messages to avoid newlines. Thanks @cordawyn for noticing and reporting it. I've been able to reproduce it and updated the docs accordingly.
Also a good reminder that cassettes are not ideal for inference tests: here the bug came from an untested TGI update. Surely something to improve in our CI, but that will be done in a future PR.