fix: handle UTF-8 boundary issues in streaming responses for vLLM models #8788
Conversation
- Add UTF8StreamDecoder utility to properly handle multi-byte UTF-8 characters split across chunk boundaries
- Integrate decoder into OpenAI and BaseOpenAiCompatibleProvider to fix garbled output with large files
- Add comprehensive tests for UTF-8 boundary handling
- Fixes issue #8787 where vLLM outputs showed garbled characters with large file outputs
Review Summary

I've reviewed this PR, which fixes UTF-8 boundary issues in streaming responses for vLLM models. The implementation is well-structured and correctly handles the core use case of buffering incomplete UTF-8 sequences from byte chunks.

Issues Found

One issue was flagged; see the inline comment below.

Overall Assessment

The core UTF-8 decoding logic is sound and the comprehensive test suite validates the main use cases. The flagged issue is a minor robustness concern that should be addressed.
The inline comment below refers to this excerpt:

```ts
const decoded = this.decoder.decode(validBytes, { stream: true })
// …
// Buffer the incomplete portion for the next chunk
this.buffer = bytes.slice(lastValid)
```
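For context, `lastValid` marks the index just past the last complete UTF-8 sequence in the chunk. A boundary scan along these lines would produce it; this is a minimal sketch, assuming the function name and details rather than the PR's exact code:

```ts
// Sketch: return the index just past the last complete UTF-8 sequence in
// `bytes`. Names and details are assumptions, not the PR's exact code.
function findLastValidBoundary(bytes: Uint8Array): number {
	let i = bytes.length - 1
	// Walk back over trailing continuation bytes (10xxxxxx), at most 3 of
	// them: a UTF-8 sequence is at most 4 bytes long.
	let back = 0
	while (i >= 0 && back < 3 && (bytes[i] & 0xc0) === 0x80) {
		i--
		back++
	}
	if (i < 0) return 0 // chunk is nothing but continuation bytes: buffer it all
	const b = bytes[i]
	// Ends in ASCII: decode everything now (any stray continuation bytes after
	// it are malformed and will surface as U+FFFD).
	if (b < 0x80) return bytes.length
	// Lead byte: its top bits declare the sequence length.
	const len = b >= 0xf0 ? 4 : b >= 0xe0 ? 3 : b >= 0xc0 ? 2 : 1
	// Complete if the declared sequence fits inside the chunk; otherwise the
	// incomplete tail starts at the lead byte.
	return i + len <= bytes.length ? bytes.length : i
}
```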
The buffer is overwritten here without combining with existing content from previous chunks. If a `Uint8Array` chunk leaves partial bytes in the buffer and the next chunk is a string, those buffered bytes will be lost. While mixing chunk types is unlikely in practice (the OpenAI SDK typically returns consistent types), this should use `this.buffer = this.combineBuffers(this.buffer, bytes.slice(lastValid))` for correctness.
Suggested change:

```diff
- this.buffer = bytes.slice(lastValid)
+ // Buffer the incomplete portion for the next chunk
+ this.buffer = this.combineBuffers(this.buffer, bytes.slice(lastValid))
```
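For reference, the `combineBuffers` helper the suggestion relies on would amount to a simple concatenation. A minimal sketch (hypothetical; the helper's actual shape is not shown in this diff):

```ts
// Hypothetical helper: concatenate previously buffered bytes with the new tail.
function combineBuffers(a: Uint8Array, b: Uint8Array): Uint8Array {
	if (a.length === 0) return b
	const combined = new Uint8Array(a.length + b.length)
	combined.set(a, 0)
	combined.set(b, a.length)
	return combined
}
```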
Review complete. One minor robustness issue has been flagged regarding buffer handling in `handleStringChunk`.
Description
This PR fixes issue #8787 where garbled characters appear when analyzing large files (1000+ lines) with vLLM models through the OpenAI-compatible API.
Problem
When streaming large outputs from vLLM models, UTF-8 multi-byte characters can be split across chunk boundaries, resulting in garbled output. This particularly affects users working with non-ASCII content or large codebases.
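To see the failure mode, note that decoding each chunk independently turns a character that straddles a boundary into replacement characters, while a streaming decode does not. A minimal TypeScript reproduction:

```ts
// "中文" encodes to six bytes: E4 B8 AD (中) followed by E6 96 87 (文).
const bytes = new TextEncoder().encode("中文")
const chunk1 = bytes.slice(0, 4) // ends one byte into 文
const chunk2 = bytes.slice(4)

// Naive per-chunk decoding: the split character becomes U+FFFD garbage.
const naive = new TextDecoder().decode(chunk1) + new TextDecoder().decode(chunk2)
console.log(naive) // "中���"

// Streaming decode: the decoder buffers the incomplete tail between calls.
const decoder = new TextDecoder()
const streamed = decoder.decode(chunk1, { stream: true }) + decoder.decode(chunk2)
console.log(streamed) // "中文"
```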
Solution
Implemented a UTF8StreamDecoder utility that:

- Buffers incomplete multi-byte UTF-8 sequences at chunk boundaries instead of emitting them immediately
- Prepends the buffered bytes to the next chunk and decodes only complete characters
- Flushes any remaining bytes at the end of the stream

A sketch of this approach follows the list.
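Putting the pieces together, a minimal sketch of such a decoder, reusing the `findLastValidBoundary` and `combineBuffers` sketches above (the PR's actual class in `utf8-stream-decoder.ts` may differ in naming and details):

```ts
// Minimal sketch of a streaming UTF-8 decoder; not the PR's exact code.
class UTF8StreamDecoder {
	private decoder = new TextDecoder("utf-8")
	private buffer = new Uint8Array(0)

	decode(chunk: Uint8Array): string {
		// Prepend any bytes buffered from the previous chunk.
		const bytes = combineBuffers(this.buffer, chunk)
		const lastValid = findLastValidBoundary(bytes)
		// Decode only the complete portion; keep the incomplete tail for later.
		this.buffer = bytes.slice(lastValid)
		return this.decoder.decode(bytes.slice(0, lastValid), { stream: true })
	}

	flush(): string {
		// Drain whatever remains at end of stream (malformed tails become U+FFFD).
		const remaining = this.decoder.decode(this.buffer)
		this.buffer = new Uint8Array(0)
		return remaining
	}
}
```

Callers would invoke `flush()` once the stream ends so a truncated final character still surfaces rather than being silently dropped.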
Changes
- Added `UTF8StreamDecoder` utility class for proper UTF-8 boundary handling
- Integrated the decoder into `OpenAiHandler` and `BaseOpenAiCompatibleProvider`

Testing

- Added `utf8-stream-decoder.test.ts` with 16 test cases covering UTF-8 edge cases such as multi-byte character splitting and emoji handling
Impact
This fix ensures reliable output when using vLLM or other OpenAI-compatible models with large file analysis, improving the experience for users working with:

- Non-ASCII content, such as CJK text and emoji
- Large files and codebases (1000+ lines)
Fixes #8787
Important
Fixes UTF-8 boundary issues in vLLM model streaming by introducing `UTF8StreamDecoder` for proper character decoding.

- Adds `UTF8StreamDecoder` to handle UTF-8 boundary issues in streaming responses.
- Integrates `UTF8StreamDecoder` into `BaseOpenAiCompatibleProvider` and `OpenAiHandler` to decode UTF-8 content properly.
- Adds `utf8-stream-decoder.test.ts` with 16 test cases for UTF-8 edge cases, including multi-byte character splitting and emoji handling.
- `base-openai-compatible-provider.ts`: Integrates `UTF8StreamDecoder` for streaming responses.
- `openai.ts`: Integrates `UTF8StreamDecoder` for handling large outputs.
- `utf8-stream-decoder.ts`: Implements the `UTF8StreamDecoder` class.