fix: Use streaming detokenizer for UTF-8-safe incremental decode #109

Merged
waybarrios merged 3 commits into waybarrios:main from janhilgard:fix/streaming-utf8-detokenizer
Mar 31, 2026

Conversation

@janhilgard
Collaborator

Summary

  • Replace per-token tokenizer.decode([token]) with NaiveStreamingDetokenizer (or BPEStreamingDetokenizer when available) for UTF-8-safe incremental decoding during SSE streaming
  • Fix corrupted multi-byte characters (e.g. Czech ď → ��, emoji → ���) in streaming responses
  • Both LLM scheduler and MLLM scheduler are fixed

Problem

When a tokenizer produces byte-level tokens, a multi-byte UTF-8 character (like ď = 0xC4 0x8F) is split across two tokens. Decoding each byte-token individually via tokenizer.decode([single_token]) yields the Unicode replacement character U+FFFD instead of valid text, which then gets sent to clients in SSE chunks:

chunk 1: {"content":" \ufffd"}     ← 0xC4 decoded alone → invalid
chunk 2: {"content":"\ufffdáb"}    ← 0x8F decoded alone → invalid

The word "ďábelské" appears as "��ábelské" in the client.
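
The corruption is easy to reproduce in plain Python (a standalone illustration, not code from this repo): decoding each byte of a multi-byte character on its own yields U+FFFD, while decoding the complete sequence does not.

raw = "ď".encode("utf-8")                         # b'\xc4\x8f': two bytes, one character
print(raw[:1].decode("utf-8", errors="replace"))  # '�' (0xC4 alone is an incomplete sequence)
print(raw[1:].decode("utf-8", errors="replace"))  # '�' (0x8F alone is an invalid start byte)
print(raw.decode("utf-8"))                        # 'ď' (the full sequence decodes cleanly)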

Fix

Use mlx_lm's streaming detokenizer which buffers byte-tokens until a complete UTF-8 character boundary is reached:

# Before (broken):
new_text = tokenizer.decode([response.token])

# After (UTF-8 safe):
detok = self._get_detokenizer(request_id)
detok.add_token(response.token)
new_text = detok.last_segment

A per-request detokenizer pool is maintained and cleaned up when requests finish.
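
As a rough sketch of that pool (assuming mlx_lm's NaiveStreamingDetokenizer; the actual helpers live in the schedulers and may differ in detail):

from mlx_lm.tokenizer_utils import NaiveStreamingDetokenizer

class SchedulerDetokenizerPool:
    """Hypothetical condensed sketch of the helpers added to the schedulers."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
        self._detokenizer_pool = {}  # request_id -> per-request streaming detokenizer

    def _get_detokenizer(self, request_id):
        # One detokenizer per request, so partial UTF-8 bytes buffer independently
        if request_id not in self._detokenizer_pool:
            self._detokenizer_pool[request_id] = NaiveStreamingDetokenizer(self.tokenizer)
        return self._detokenizer_pool[request_id]

    def _cleanup_detokenizer(self, request_id):
        # Drop the entry when a request finishes so the pool does not leak
        self._detokenizer_pool.pop(request_id, None)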

Changes

File                         Change
vllm_mlx/scheduler.py        Add _detokenizer_pool, _get_detokenizer(), _cleanup_detokenizer(); use the streaming detokenizer in _process_batch_responses()
vllm_mlx/mllm_scheduler.py   Same fix for the MLLM code path

Test plan

  • uvx black --check vllm_mlx/ passes
  • Existing tests/test_streaming_detokenizer.py validates the detokenizer logic
  • Stream a response containing multi-byte characters (e.g. Příliš žluťoučký kůň úpěl ďábelské ódy) and verify no � replacement characters appear in the SSE chunks
  • Non-streaming responses still work correctly (uses detok.finalize() + detok.text for the full output; see the sketch below)
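
For reference, a hedged sketch of that non-streaming path (generated_tokens is a placeholder name, not an identifier from the repo):

detok = self._get_detokenizer(request_id)
for tok in generated_tokens:
    detok.add_token(tok)
detok.finalize()        # flush any buffered partial UTF-8 bytes
full_text = detok.text  # the complete decoded output
self._cleanup_detokenizer(request_id)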

🤖 Generated with Claude Code

janhilgard and others added 2 commits February 24, 2026 13:47
Replace per-token tokenizer.decode([token]) with a streaming
detokenizer that buffers partial UTF-8 byte sequences. This fixes
corrupted multi-byte characters (e.g. Czech 'ď' → '��') during
SSE streaming, caused by byte-level tokens being decoded individually
instead of accumulated until a complete UTF-8 character boundary.

Uses mlx_lm's NaiveStreamingDetokenizer (or the optimized
BPEStreamingDetokenizer when available via tokenizer.detokenizer)
with a per-request pool that is cleaned up on request completion.

Both LLM scheduler and MLLM scheduler are fixed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@waybarrios
Owner

Pushed a cleanup commit to fix detokenizer pool leaks in all exit paths:

scheduler.py:

  • _do_abort_request now calls _cleanup_detokenizer
  • _recover_from_generation_error clears the pool after aborting all requests
  • reset() clears the pool

mllm_scheduler.py:

  • abort_request now pops from _detokenizer_pool
  • reset() clears the pool

Also closed #195 as a duplicate, since this PR covers both schedulers.
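
A minimal sketch of these exit-path cleanups (method names taken from the lists above; the surrounding logic is elided and the real code may differ):

def _do_abort_request(self, request_id):
    # ... existing abort logic ...
    self._cleanup_detokenizer(request_id)  # release the per-request detokenizer

def _recover_from_generation_error(self):
    # ... abort all in-flight requests ...
    self._detokenizer_pool.clear()  # no pooled detokenizer remains valid

def reset(self):
    # ... existing reset logic ...
    self._detokenizer_pool.clear()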

@waybarrios waybarrios merged commit 4ede902 into waybarrios:main Mar 31, 2026
7 checks passed
janhilgard added a commit to janhilgard/vllm-mlx that referenced this pull request Apr 1, 2026
Brings in: prompt_tokens fix (waybarrios#236), ArraysCache batching (waybarrios#160),
platform rename (waybarrios#185), mlx-lm 0.31 compat (waybarrios#183, waybarrios#227),
base64 hash fix (waybarrios#206), streaming UTF-8 detokenizer (waybarrios#109),
and cleanup commits.

Conflicts resolved:
- scheduler.py: keep make_logits_processors import (fork feature)
- mllm_scheduler.py: take upstream stop-token skip in detokenizer
- models/mllm.py: keep SHA256 hash (fork fix for collision)
- utils/tokenizer.py: merge upstream error message with fork elif chain

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
sysit pushed a commit to sysit/vllm-mlx that referenced this pull request Apr 1, 2026

fix: Use streaming detokenizer for UTF-8-safe incremental decode