
fix: add missing return in load_model_with_fallback#243

Closed
wayne-o wants to merge 1 commit into waybarrios:main from wayne-o:fix/load-model-missing-return

Conversation


@wayne-o wayne-o commented Apr 1, 2026

Summary

  • `load_model_with_fallback()` in `vllm_mlx/utils/tokenizer.py` is missing a return statement after the successful `mlx_lm.load()` call
  • When `load()` succeeds without raising `ValueError`, the function falls through the try/except and implicitly returns `None`
  • The caller (`MLXLanguageModel.load()`) then crashes with `TypeError: cannot unpack non-iterable NoneType object`
  • This affects all models that load successfully on the first try (i.e., most models that don't need the Nemotron/vision fallback paths)

Fix

One line: `return model, tokenizer` after the successful `load()` call.
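
In context, the change looks roughly like this (a sketch; the fallback branches are abbreviated):

    try:
        model, tokenizer = load(model_name, tokenizer_config=tokenizer_config)
    except ValueError:
        ...  # Nemotron/vision fallback paths
    return model, tokenizer  # the previously missing line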

Test plan

  • Verified `vllm-mlx serve mlx-community/Mistral-Small-3.2-24B-Instruct-2506-4bit` starts successfully
  • Verified `vllm-mlx serve mlx-community/Qwen3.5-9B-4bit` starts successfully
  • Both models previously crashed with the `NoneType` error on v0.2.7

🤖 Generated with Claude Code

When `mlx_lm.load()` succeeds without raising a `ValueError`,
`load_model_with_fallback()` falls through the try/except block
without returning the (model, tokenizer) tuple, causing the caller
to receive `None` and crash with:

    TypeError: cannot unpack non-iterable NoneType object

This affects all models that load successfully on the first try
(i.e., most models that don't need the Nemotron/vision fallback paths).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator

@Thump604 Thump604 left a comment


Confirmed this bug on current main (4ede902). Line 54 is an assignment, not a return; the success path falls through and returns `None`. The caller in `batched.py:257` crashes with `TypeError`.

Three independent PRs (#230, #235, #237) and two issue reports (#211, #212) all identified the same bug. This fix is correct.

@perry2of5
Contributor

I confirmed that this allows `vllm-mlx serve mlx-community/Qwen3.5-35B-A3B-4bit --port 7979 --continuous-batching` to succeed on top of v0.2.7.

@waybarrios added the `bug` (Something isn't working) label Apr 1, 2026
@waybarrios
Owner

Code review for PR #243

The fix is correct. The function `load_model_with_fallback()` in `vllm_mlx/utils/tokenizer.py` was doing this:

    try:
        model, tokenizer = load(model_name, tokenizer_config=tokenizer_config)
    except ValueError as e:
        ...

Without the `return model, tokenizer`, when `load()` succeeds the function returns `None`, and callers like `llm.py:89` and `batched.py:257` that do `self.model, self.tokenizer = load_model_with_fallback(...)` crash with `TypeError: cannot unpack non-iterable NoneType object`.

The bug was introduced in commit c70b80b when `return load(...)` was changed to `model, tokenizer = load(...)` to support `_try_inject_mtp_post_load()`, but the return was never added back.
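
Roughly, the before/after of that change (the exact call shape of `_try_inject_mtp_post_load()` is an assumption here):

    # Before c70b80b: the result was returned directly
    return load(model_name, tokenizer_config=tokenizer_config)

    # After c70b80b: the result is captured so MTP injection can run on the model,
    # but the function now ends without returning it
    model, tokenizer = load(model_name, tokenizer_config=tokenizer_config)
    _try_inject_mtp_post_load(model)  # assumed call shape
    # missing: return model, tokenizer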

Checked git blame, previous PRs (#215, #230, #235, #237), and code comments in the file. No additional bugs introduced by this change.

Could you also add a regression test for this? PR #215 included one that reviewers found useful. Something that verifies `load_model_with_fallback()` actually returns a `(model, tokenizer)` tuple on the happy path instead of `None`. That would prevent this from regressing in future refactors.
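
Something along these lines would be enough (a sketch; the exact signature of `load_model_with_fallback()` and the patch target are assumptions, and the fakes are illustrative):

    import vllm_mlx.utils.tokenizer as tok_utils

    def test_load_model_with_fallback_returns_tuple(monkeypatch):
        # Stand-ins for a real model/tokenizer; object identity is all we check.
        fake_model, fake_tokenizer = object(), object()
        # Assumes load() is looked up through the vllm_mlx.utils.tokenizer namespace.
        monkeypatch.setattr(
            tok_utils, "load", lambda name, **kwargs: (fake_model, fake_tokenizer)
        )
        result = tok_utils.load_model_with_fallback("any-model")
        # The happy path must return the (model, tokenizer) tuple, not None.
        assert result == (fake_model, fake_tokenizer)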

@waybarrios
Owner

@wayne-o let me know if the tests are feasible to add on this PR.


npomfret commented Apr 7, 2026

I patched my local install and it works for me

Collaborator

Thump604 commented Apr 7, 2026

This PR closes the bug described in issues #211, #212, #249, and #252, all of which are open and describe the same root cause: `load_model_with_fallback` in `vllm_mlx/utils/tokenizer.py` falls through after a successful `mlx_lm.load()` call and returns `None`, causing the caller to crash with `TypeError: cannot unpack non-iterable NoneType object`. Your one-line fix at line 55 resolves all four.

@wayne-o, would you mind adding `Closes #211, closes #212, closes #249, closes #252` to the PR description so all four issues auto-close on merge? (GitHub needs a closing keyword before each issue number.) Saves @waybarrios the manual cleanup pass.


jtatum commented Apr 9, 2026

Is this a dupe of #215?

Collaborator

Thump604 commented Apr 9, 2026

Yes and no. #243 is a minimal one-line fix for the same root cause as #215 (krystophny), but #215 is strictly broader: the same one-line return in `vllm_mlx/utils/tokenizer.py`, plus a 50-line regression test in `tests/test_tokenizer_utils.py`, plus an Apple Silicon CI entry that runs the tokenizer regression.

#215's PR body explicitly says "This overlaps with #243; this PR keeps the regression test with the fix." Both PRs point at the same bug, so the practical decision is simple:

Either order closes the bug. Both touch the same single line.

The underlying bug is also tracked in issues #211, #212, #249, and #252, all of which will close once either PR lands.

@perry2of5
Contributor

I could validate a local model with #243 if it helps. Logically it has the same outcome, but seeing is believing.

Collaborator

Thump604 commented Apr 9, 2026

Second validation is welcome. You already proved #243 works end-to-end with Qwen3.5-35B-A3B-4bit continuous batching on top of v0.2.7, so a second model widens the empirical coverage of the one-line fix, especially on a non-continuous-batching model class where the fallback path is exercised differently.

If #215 is also easy to pull locally, validating both branches would be the strongest signal for @waybarrios — same fix, but #215 ships with the tokenizer regression test that would prevent the bug from reappearing on a future refactor. Either PR closes the root cause in #211 / #212 / #249 / #252.

@perry2of5
Contributor

Unsurprisingly, #243 also works, and the test completes successfully when merged onto v0.2.7.

@waybarrios
Owner

Hey Wayne, good eye on this one. That missing return was biting everyone. We ended up merging #268 which includes the same fix along with the Gemma 4 work, so this is covered now. Appreciate you taking the time to track it down.

@waybarrios waybarrios closed this Apr 10, 2026