
fix: streaming detokenizer for UTF-8-safe incremental decode#195

Closed
Thump604 wants to merge 1 commit into waybarrios:main from Thump604:fix/streaming-detokenizer-unicode

Conversation

@Thump604
Collaborator

Summary

Replace raw tokenizer.decode([token]) with NaiveStreamingDetokenizer in the BatchedEngine scheduler. The raw decode splits multi-byte codepoints (emoji, CJK characters) into surrogate pairs (\ud83d\udc4b instead of actual emoji) because individual tokens may represent incomplete UTF-8 byte sequences.

NaiveStreamingDetokenizer (from mlx_lm.tokenizer_utils) buffers incomplete byte sequences and only emits valid UTF-8 segments, matching how mlx-lm's own server handles streaming output.
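The buffering principle can be shown with the standard library alone (no mlx_lm needed): Python's incremental UTF-8 decoder holds partial byte sequences until a full codepoint arrives, which is the same behavior the streaming detokenizer relies on. This is an illustrative sketch of the failure mode, not the PR's actual code path:

```python
import codecs

# UTF-8 bytes of a waving-hand emoji, split mid-codepoint to mimic
# tokens that carry incomplete byte sequences.
raw = "👋".encode("utf-8")          # 4 bytes: f0 9f 91 8b
chunks = [raw[:2], raw[2:]]

# Naive per-chunk decode: each partial sequence becomes replacement chars.
naive = "".join(c.decode("utf-8", errors="replace") for c in chunks)

# Incremental decode: bytes are buffered until a full codepoint arrives.
dec = codecs.getincrementaldecoder("utf-8")()
incremental = "".join(dec.decode(c) for c in chunks) + dec.decode(b"", final=True)

print(naive)        # contains U+FFFD replacement characters
print(incremental)  # 👋
```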

Changes

  • Import NaiveStreamingDetokenizer from mlx_lm.tokenizer_utils
  • Add _detokenizer_pool dict to Scheduler.__init__() for per-request detokenizers
  • Replace self._decode_tokens([response.token]) with streaming detokenizer in _process_batch_responses()
  • On request finish, call detok.finalize() and use detok.text for full output
  • Cleanup in _do_abort_request(), _cleanup_finished(), and reset()
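The pool-plus-lifecycle pattern above can be sketched roughly as follows. This is not the PR's code: a stdlib incremental UTF-8 decoder stands in for `NaiveStreamingDetokenizer`, requests are keyed by an assumed `request_id`, and the method names (`feed`, `finish`, `abort`) are illustrative rather than the scheduler's actual API:

```python
import codecs


class DetokenizerPool:
    """Per-request incremental decoders, keyed by request id (sketch)."""

    def __init__(self):
        self._pool = {}

    def feed(self, request_id, raw_bytes):
        # Lazily create a decoder on the first token for this request.
        dec = self._pool.setdefault(
            request_id, codecs.getincrementaldecoder("utf-8")()
        )
        # Only complete codepoints are returned; partial bytes stay buffered.
        return dec.decode(raw_bytes)

    def finish(self, request_id):
        # Finalize path: flush trailing bytes and drop the decoder.
        dec = self._pool.pop(request_id, None)
        return dec.decode(b"", final=True) if dec else ""

    def abort(self, request_id):
        # Abort/reset path: discard state so decoders do not leak.
        self._pool.pop(request_id, None)
```

The key design point is that every terminal path (finish, abort, reset) removes the entry, so the pool cannot grow unboundedly across requests.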

Test plan

  • Send a chat message that elicits emoji response with --continuous-batching
  • Verify emoji render correctly (no surrogate pairs like \ud83d\udc4b)
  • Verify CJK characters render correctly
  • Verify normal ASCII text is unaffected
  • Verify aborted requests clean up their detokenizer
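The surrogate-pair check in the plan above can be automated with a small helper; this is a hypothetical test utility, not part of the PR, and `has_surrogates` is an assumed name:

```python
def has_surrogates(text: str) -> bool:
    """True if any character is a lone UTF-16 surrogate (U+D800..U+DFFF),
    i.e. a sign that a multi-byte codepoint was split during decode."""
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)


print(has_surrogates("👋 hello"))      # False: properly decoded emoji
print(has_surrogates("\ud83d\udc4b"))  # True: leaked surrogate pair
```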

Fixes #130

Replace raw tokenizer.decode([token]) with NaiveStreamingDetokenizer
in the BatchedEngine scheduler. The raw decode splits multi-byte
codepoints (emoji, CJK characters) into surrogate pairs because
individual tokens may represent incomplete UTF-8 byte sequences.

NaiveStreamingDetokenizer buffers incomplete sequences and only emits
valid UTF-8 segments, matching how mlx-lm's own server handles
streaming output.

Cleanup in abort, finish, and reset paths prevents detokenizer leaks.

Fixes waybarrios#130
@Thump604
Collaborator Author

CI green. Fixes #130 — streaming detokenizer now handles multi-byte UTF-8 correctly by switching to incremental decode with a byte buffer.

@Thump604
Collaborator Author

Evidence from M2 Ultra 128GB, Qwen3.5-122B-A10B, BatchedEngine streaming:

| Test | Result |
| --- | --- |
| Emoji streaming | PASS -- emoji characters stream correctly without garbling |
| Mixed multi-byte (CJK + emoji + Latin) | PASS -- all character types preserved |
| CJK-only | FAIL (model compliance) -- model returned an empty response to the constrained prompt; not a streaming bug |

The core fix (NaiveStreamingDetokenizer for UTF-8 safe incremental decode) works. The CJK failure is prompt compliance on the 122B, not detokenizer behavior -- the emoji test proves multi-byte streaming is correct.

This is the only PR addressing Unicode streaming in BatchedEngine. Without it, multi-byte characters can be split across SSE chunks, producing replacement characters on the client side.

@waybarrios
Owner

Closing in favor of #109 which covers both scheduler.py and mllm_scheduler.py. The cleanup patterns from this PR (abort, reset, cleanup_finished) have been added to #109 as well.



Development

Successfully merging this pull request may close these issues.

Emoji encoded as surrogate pairs in BatchedEngine (--continuous-batching)

2 participants