Fix successful MLX tokenizer loads by krystophny · Pull Request #2 · computor-org/vllm-mlx

krystophny · 2026-03-24T07:40:36Z

Summary

fix load_model_with_fallback() so successful mlx_lm.load() calls return the (model, tokenizer) tuple instead of falling through as None
add regression coverage for the successful load path
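
To make the failure mode concrete, here is a minimal sketch of how a loader can fall through and return None on the success path, next to a corrected version. This is not the actual vllm_mlx code: the names load_buggy and load_fixed, the injected primary/fallback callables, and the choice of ValueError as the fallback trigger are all assumptions made for illustration.

```python
from typing import Any, Callable, Optional, Tuple

Loaded = Tuple[Any, Any]  # (model, tokenizer)


def load_buggy(
    name: str,
    primary: Callable[[str], Loaded],
    fallback: Callable[[str], Loaded],
) -> Optional[Loaded]:
    """Hypothetical loader showing the bug: the success path never returns."""
    try:
        model, tokenizer = primary(name)
        # No `return model, tokenizer` here, so control falls off the end
        # of the function and the caller receives None.
    except ValueError:
        # Assumed fallback trigger; the real condition may differ.
        return fallback(name)


def load_fixed(
    name: str,
    primary: Callable[[str], Loaded],
    fallback: Callable[[str], Loaded],
) -> Loaded:
    """Same shape, but the successful load is returned to the caller."""
    try:
        return primary(name)
    except ValueError:
        return fallback(name)
```

With a stub primary that returns a tuple, load_buggy yields None while load_fixed yields the tuple. Both take the fallback branch identically when the primary raises, which is why a bug like this only shows up on loads that succeed.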

Why this is independently deployable

pure loader bugfix
no API, CLI, batching, or protocol behavior change
useful regardless of whether the Responses work is merged

Related context

This bug sits beneath several other Apple Silicon model-serving paths.

Relevant surrounding work in waybarrios/vllm-mlx:

#127 merged broader Qwen3.5 text support and streaming fixes: Add Qwen3.5 model support (text-only) and fix reasoning+tool streaming waybarrios/vllm-mlx#127
open hybrid/cache/runtime work continues in #144, #160, #165, #183, and #194:
waybarrios/vllm-mlx#144: fix: handle 3D KV tensors in prefix cache for Qwen3.5 models
waybarrios/vllm-mlx#160: fix: pass size to ArraysCache in BatchMambaCache for Qwen3.5 hybrid models
waybarrios/vllm-mlx#165: fix: MLLM continuous batching for hybrid models
waybarrios/vllm-mlx#183: fix: compatibility with mlx-lm 0.31.x (prompt_checkpoints tuple)
waybarrios/vllm-mlx#194: fix: unpack prompt_checkpoints in _chunked_next for mlx-lm >= 0.31.0
our fork’s separate hybrid-cache follow-up lives in krystophny/vllm-mlx#6: Fix Qwen3.5 hybrid paged cache reconstruction #6

This PR deliberately stays below all of that and only fixes the successful return path.

Validation

PYTHONPATH=/Users/ert/code/vllm-mlx /Users/ert/code/.venv/bin/python -m pytest tests/test_tokenizer_utils.py -q
python3 -m compileall vllm_mlx
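
A regression test for the successful load path can run without MLX installed by stubbing the mlx_lm module. The sketch below assumes a hypothetical load_via_mlx_lm wrapper standing in for the loader under test; the contents of tests/test_tokenizer_utils.py are not shown in this PR description, so this is only one plausible shape for such a test.

```python
import sys
import types
from unittest import mock


def load_via_mlx_lm(model_name):
    """Hypothetical stand-in for the loader under test."""
    import mlx_lm  # resolved through sys.modules, so the stub below is used

    return mlx_lm.load(model_name)


def test_successful_load_returns_model_and_tokenizer():
    # Build a fake mlx_lm module whose load() succeeds immediately.
    fake = types.ModuleType("mlx_lm")
    fake.load = mock.Mock(return_value=("fake-model", "fake-tokenizer"))

    # patch.dict restores sys.modules when the block exits.
    with mock.patch.dict(sys.modules, {"mlx_lm": fake}):
        result = load_via_mlx_lm("some/model")

    # The regression being guarded against: a successful load returning None.
    assert result is not None
    model, tokenizer = result
    assert (model, tokenizer) == ("fake-model", "fake-tokenizer")
    fake.load.assert_called_once_with("some/model")
```

Stubbing at the module level keeps the test independent of Apple Silicon hardware and model downloads, at the cost of not exercising real tokenizer behavior.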

What could still improve

broader loader-path coverage for strict/strict-false fallbacks and hybrid model families
explicit end-to-end smoke tests for each benchmark model alias used by FortBench

fix: return successful mlx-lm loads

3ed3a69

krystophny changed the title from "Return successful mlx-lm loads" to "Fix successful MLX tokenizer loads" on Mar 24, 2026

krystophny mentioned this pull request Mar 24, 2026

Fix Qwen3.5 hybrid paged cache reconstruction #6

Merged

krystophny merged commit fc0608b into main Mar 24, 2026

krystophny merged 1 commit into main from feature/fix-tokenizer-load-return
