fix: Use streaming detokenizer for UTF-8-safe incremental decode #109
Merged
waybarrios merged 3 commits into waybarrios:main on Mar 31, 2026
Replace per-token tokenizer.decode([token]) with a streaming detokenizer that buffers partial UTF-8 byte sequences. This fixes corrupted multi-byte characters (e.g. Czech 'ď' → '��') during SSE streaming, caused by byte-level tokens being decoded individually instead of accumulated until a complete UTF-8 character boundary.
Uses mlx_lm's NaiveStreamingDetokenizer (or the optimized BPEStreamingDetokenizer when available via tokenizer.detokenizer) with a per-request pool that is cleaned up on request completion. Both the LLM scheduler and the MLLM scheduler are fixed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Owner
Pushed a cleanup commit to fix detokenizer pool leaks in all exit paths:
scheduler.py:
mllm_scheduler.py:
Also closed #195 as duplicate since this PR covers both schedulers.
waybarrios approved these changes on Mar 31, 2026
janhilgard added a commit to janhilgard/vllm-mlx that referenced this pull request on Apr 1, 2026
Brings in: prompt_tokens fix (waybarrios#236), ArraysCache batching (waybarrios#160), platform rename (waybarrios#185), mlx-lm 0.31 compat (waybarrios#183, waybarrios#227), base64 hash fix (waybarrios#206), streaming UTF-8 detokenizer (waybarrios#109), and cleanup commits.
Conflicts resolved:
- scheduler.py: keep make_logits_processors import (fork feature)
- mllm_scheduler.py: take upstream stop-token skip in detokenizer
- models/mllm.py: keep SHA256 hash (fork fix for collision)
- utils/tokenizer.py: merge upstream error message with fork elif chain
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sysit pushed a commit to sysit/vllm-mlx that referenced this pull request on Apr 1, 2026
…detokenizer fix: Use streaming detokenizer for UTF-8-safe incremental decode
Summary
- Replace per-token tokenizer.decode([token]) with NaiveStreamingDetokenizer (or BPEStreamingDetokenizer when available) for UTF-8-safe incremental decoding during SSE streaming
- Fix corrupted multi-byte characters (ď → ��, emoji → ���) in streaming responses
Problem
When a tokenizer produces byte-level tokens, multi-byte UTF-8 characters are split across several tokens (ď = 0xC4 0x8F becomes two). Decoding each byte-token individually via tokenizer.decode([single_token]) sees only an incomplete byte sequence, which is rendered as the replacement character U+FFFD and then sent to clients in SSE chunks. The word "ďábelské" appears as "��ábelské" in the client.
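The failure mode can be reproduced without any model. The sketch below is an illustration only: it assumes one "token" per byte (a real tokenizer splits only some characters this way) and uses Python's standard-library incremental decoder as a stand-in for a streaming detokenizer, not mlx_lm itself.

```python
import codecs

text = "ďábelské"
data = text.encode("utf-8")  # 'ď' becomes the two bytes 0xC4 0x8F

# Assumption for illustration: every byte is its own token.
byte_tokens = [bytes([b]) for b in data]

# Per-token decode: each lone byte of a multi-byte character is an
# invalid sequence on its own, so it turns into U+FFFD.
naive_chunks = [tok.decode("utf-8", errors="replace") for tok in byte_tokens]
naive = "".join(naive_chunks)

# Buffered decode: the incremental decoder holds back incomplete
# sequences and emits a character only once all of its bytes arrived.
decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
buffered_chunks = [decoder.decode(tok) for tok in byte_tokens]
buffered_chunks.append(decoder.decode(b"", final=True))  # flush at end
buffered = "".join(buffered_chunks)

print(repr(naive))
print(repr(buffered))
```

Because every multi-byte character is split in this stand-in, the per-token output corrupts á and é as well as ď; the buffered output matches the input exactly.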
Fix
Use mlx_lm's streaming detokenizer, which buffers byte-tokens until a complete UTF-8 character boundary is reached.
A per-request detokenizer pool is maintained and cleaned up when requests finish.
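A minimal sketch of the per-request pool idea follows. It is not the code from this PR: ByteStreamingDetokenizer, DetokenizerPool, and stream() are hypothetical names, the detokenizer is a standard-library stand-in modeled on the add_token / last_segment / finalize / text interface so the example runs without mlx_lm, and the bodies of _get_detokenizer() and _cleanup_detokenizer() are assumptions about what such helpers might do.

```python
import codecs
from typing import Dict, Iterable, List


class ByteStreamingDetokenizer:
    """Hypothetical stand-in for a streaming detokenizer (not mlx_lm)."""

    def __init__(self) -> None:
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
        self.text = ""          # everything decoded so far
        self.last_segment = ""  # text produced by the latest call

    def add_token(self, token: int) -> None:
        # Assumption: a "token" is one byte value (0-255). Incomplete
        # sequences stay buffered and yield an empty segment.
        self.last_segment = self._decoder.decode(bytes([token]))
        self.text += self.last_segment

    def finalize(self) -> None:
        # Flush anything still buffered at the end of generation.
        self.last_segment = self._decoder.decode(b"", final=True)
        self.text += self.last_segment


class DetokenizerPool:
    """Hypothetical holder for one detokenizer per in-flight request."""

    def __init__(self) -> None:
        self._detokenizer_pool: Dict[str, ByteStreamingDetokenizer] = {}

    def _get_detokenizer(self, request_id: str) -> ByteStreamingDetokenizer:
        # Create the detokenizer lazily on the first token of a request.
        if request_id not in self._detokenizer_pool:
            self._detokenizer_pool[request_id] = ByteStreamingDetokenizer()
        return self._detokenizer_pool[request_id]

    def _cleanup_detokenizer(self, request_id: str) -> None:
        # pop() with a default is safe to call on every exit path.
        self._detokenizer_pool.pop(request_id, None)

    def stream(self, request_id: str, tokens: Iterable[int]) -> List[str]:
        detok = self._get_detokenizer(request_id)
        chunks: List[str] = []
        try:
            for token in tokens:
                detok.add_token(token)
                if detok.last_segment:  # skip empty, still-buffering steps
                    chunks.append(detok.last_segment)
            detok.finalize()
            if detok.last_segment:
                chunks.append(detok.last_segment)
        finally:
            # Release the entry even if decoding raised.
            self._cleanup_detokenizer(request_id)
        return chunks


pool = DetokenizerPool()
sentence = "Příliš žluťoučký kůň úpěl ďábelské ódy"
chunks = pool.stream("req-1", list(sentence.encode("utf-8")))
print(chunks)
```

Wrapping the loop in try/finally is one way to guarantee the pool entry is released on every exit path; a real scheduler has several completion and abort paths and would call the cleanup helper from each of them.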
Changes
- vllm_mlx/scheduler.py: _detokenizer_pool, _get_detokenizer(), _cleanup_detokenizer(); use streaming detokenizer in _process_batch_responses()
- vllm_mlx/mllm_scheduler.py
Test plan
- uvx black --check vllm_mlx/ passes
- tests/test_streaming_detokenizer.py validates the detokenizer logic
- Stream a Czech sentence (Příliš žluťoučký kůň úpěl ďábelské ódy) and verify no � in SSE chunks
- detok.finalize() + detok.text for full output
🤖 Generated with Claude Code