
Fix broken AsyncInferenceClient on [DONE] signal #2458

Merged (3 commits into main)
Aug 19, 2024

Conversation

Wauplin
Contributor

@Wauplin Wauplin commented Aug 19, 2024

Fixed #2455 reported by @cordawyn.

Let's strip byte messages to avoid trailing newlines. Thanks @cordawyn for noticing and reporting it. I've been able to reproduce it and have updated the docs accordingly.

Also a good reminder that cassettes are not ideal for inference tests. Here the bug came from an untested TGI update. Surely something to improve in our CI, but that will be done in a future PR.

@Wauplin Wauplin requested a review from LysandreJik August 19, 2024 13:05
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Wauplin Wauplin mentioned this pull request Aug 19, 2024
@Wauplin Wauplin merged commit 46e26fc into main Aug 19, 2024
15 of 17 checks passed
@Wauplin Wauplin deleted the 2455-fix-done-signal-with-newline branch August 19, 2024 15:13
Wauplin added a commit that referenced this pull request Aug 19, 2024
* fix trailing newlines

* fix cassettes

* quality
@Wauplin
Contributor Author

Wauplin commented Aug 19, 2024

@@ -355,7 +355,7 @@ def _format_chat_completion_stream_output(

 async def _async_yield_from(client: "ClientSession", response: "ClientResponse") -> AsyncIterable[bytes]:
     async for byte_payload in response.content:
-        yield byte_payload
+        yield byte_payload.strip()

@cordawyn cordawyn Aug 19, 2024


@Wauplin Btw, the two other byte_payload.rstrip() (above) may not be necessary, as _format_text_generation_stream_output and _format_chat_completion_stream_output receive "[DONE]" w/o "\n" from the regular client (which uses .iter_lines and strips newlines naturally), while the byte_payload for the async client passes through _async_yield_from here first.

Maybe just a micro optimization, in case performance ever becomes an issue here.

Contributor Author


Yes, I saw that, but better to be safe. Each method, taken individually, should be able to work with unstripped data and return stripped data. In any case, inference time is usually much higher than the parsing overhead.
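For context, the failure mode can be reproduced with a minimal sketch (illustrative only; `DONE_SIGNAL` and `looks_done` are hypothetical names, not the actual huggingface_hub helpers). aiohttp's `response.content` yields raw lines including the trailing `b"\n"`, so an exact comparison against the SSE terminator never matches unless the payload is stripped first:

```python
# Hypothetical sketch of the bug: the async iterator yields lines with
# their trailing newline, which defeats an exact [DONE] comparison.

DONE_SIGNAL = b"data: [DONE]"

def looks_done(payload: bytes) -> bool:
    # Exact-match check, as a stream-output formatter might perform.
    return payload == DONE_SIGNAL

raw = b"data: [DONE]\n"          # as yielded by aiohttp's response.content
assert not looks_done(raw)       # trailing newline breaks the detection
assert looks_done(raw.strip())   # stripping first, as in this fix, works
```

Stripping inside `_async_yield_from` mirrors what the sync client gets for free from `.iter_lines()`, so both code paths hand already-stripped payloads to the formatters.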


Successfully merging this pull request may close these issues.

AsyncInferenceClient.chat_completion streaming is broken with TGI 2.2.0
3 participants