
batch generator crashes with prompt_progress_callback on mlx-lm 0.31.x #294

Merged

waybarrios merged 2 commits into main from fix-prompt-progress-callback on Apr 12, 2026

Conversation

Owner

waybarrios commented on Apr 12, 2026

Problem

Reported in #293 — after the backport in f61d34e, running vllm-mlx bench or vllm-mlx serve with batching enabled crashes immediately on mlx-lm 0.31.x (the latest release).

This affects all users on v0.2.7 installed via pip or uv.

Error 1: prompt_progress_callback not a constructor parameter

  File ".../vllm_mlx/scheduler.py", line 1252, in _create_batch_generator
    bg = BatchGenerator(
         ^^^^^^^^^^^^^^^
TypeError: BatchGenerator.__init__() got an unexpected keyword argument 'prompt_progress_callback'

Error 2: _process_prompts removed in mlx-lm 0.31.x

AttributeError: 'BatchGenerator' object has no attribute '_process_prompts'

The _install_chunked_prefill monkey-patch relies on internal BatchGenerator APIs (_process_prompts, active_batch, _step) that were refactored in mlx-lm 0.31.x.

Error 3: next() return format changed

AttributeError: 'list' object has no attribute 'uid'

BatchGenerator.next() now returns a (prompt_responses, generation_responses) tuple instead of a flat list. The scheduler was iterating over the tuple and trying to access .uid on a list.

Error 4: active_batch no longer exists

AttributeError: 'BatchGenerator' object has no attribute 'active_batch'

The periodic cache eval in step() references self.batch_generator.active_batch without checking if it exists.

Root cause

The backport commit f61d34e was written against a version of mlx-lm with different internal APIs than the released 0.31.x series. No released version of mlx-lm (up to 0.31.2) has these internal methods/attributes.
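For anyone reproducing this, a quick diagnostic (a sketch, not part of the patch) can confirm which internals an installed mlx-lm still exposes; bg is assumed to be a BatchGenerator instance constructed the same way scheduler.py constructs one:

# Diagnostic sketch: print the installed mlx-lm version and whether a
# constructed BatchGenerator still exposes the internals the backport relied on.
from importlib.metadata import version

def report_batch_generator_compat(bg):
    print("mlx-lm version:", version("mlx-lm"))
    for name in ("_process_prompts", "active_batch", "_step"):
        print(f"  has {name}: {hasattr(bg, name)}")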

Fix

Four changes in scheduler.py:

1. Set prompt_progress_callback as an instance attribute instead of constructor argument

The callback is only consumed by the _install_chunked_prefill monkey-patch (lines 343, 566), not by BatchGenerator itself.

         bg = BatchGenerator(
             ...
-            prompt_progress_callback=_prefill_progress,
         )
+        bg.prompt_progress_callback = _prefill_progress
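For reference, a version-agnostic variant could inspect the constructor signature and only pass the callback when the installed mlx-lm accepts it. This is a sketch with illustrative names, not part of the patch; the patch takes the simpler attribute-only route shown above:

import inspect

def construct_batch_generator(batch_generator_cls, callback, **kwargs):
    # Pass prompt_progress_callback through the constructor only when the
    # installed BatchGenerator accepts it; otherwise attach it afterwards.
    params = inspect.signature(batch_generator_cls.__init__).parameters
    if "prompt_progress_callback" in params:
        return batch_generator_cls(**kwargs, prompt_progress_callback=callback)
    bg = batch_generator_cls(**kwargs)
    bg.prompt_progress_callback = callback
    return bg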

2. Guard _install_chunked_prefill with a compatibility check

Skip the monkey-patch gracefully when the required BatchGenerator internals are absent, and log a warning:

chunked_compatible = hasattr(bg, "_process_prompts") and hasattr(bg, "active_batch")

if need_chunked and chunked_compatible:
    _install_chunked_prefill(...)
elif need_chunked and not chunked_compatible:
    logger.warning("Chunked prefill disabled: mlx-lm BatchGenerator lacks ...")

3. Handle next() return format change

-                    responses = self.batch_generator.next()
+                    result = self.batch_generator.next()
+                    # mlx-lm >=0.31.x returns (prompt_responses, generation_responses)
+                    if isinstance(result, tuple):
+                        responses = result[1]
+                    else:
+                        responses = result
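Equivalently, the normalization could live in a small helper (a sketch; the helper name is illustrative and not part of the patch):

def _generation_responses(result):
    # Return only the generation responses from BatchGenerator.next(),
    # whichever return format the installed mlx-lm uses.
    if isinstance(result, tuple):
        # mlx-lm >= 0.31.x: (prompt_responses, generation_responses)
        return result[1]
    # Older releases return a flat list of responses.
    return result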

4. Add hasattr guard for active_batch

             if (
                 self.batch_generator is not None
+                and hasattr(self.batch_generator, "active_batch")
                 and self.batch_generator.active_batch is not None

Test results

tests/test_batching.py         — 22 passed
tests/test_memory_stability.py — 15 passed
======================= 37 passed, 2 deselected in 3.29s =======================

Smoke test: vllm-mlx bench (Llama-3.2-1B-Instruct-4bit)

Fresh conda environment with mlx-lm==0.31.2:

$ vllm-mlx bench mlx-community/Llama-3.2-1B-Instruct-4bit --max-tokens 100 --num-prompts 10

Chunked prefill disabled: mlx-lm BatchGenerator lacks required internals
(_process_prompts, active_batch). Upgrade mlx-lm or check compatibility.

Loading model: mlx-community/Llama-3.2-1B-Instruct-4bit

Running benchmark with 10 prompts, max_tokens=100
--------------------------------------------------

Results:
  Total time: 2.38s
  Prompts: 10
  Prompts/second: 4.19
  Total prompt tokens: 80
  Total completion tokens: 960
  Total tokens: 1040
  Tokens/second: 402.52
  Throughput: 436.06 tok/s

Test plan

  • pytest tests/test_batching.py — all chunked prefill tests pass
  • pytest tests/test_memory_stability.py — all BatchGenerator lifecycle tests pass
  • Verified BatchGenerator.__init__ signature in fresh mlx-lm==0.31.2 install (conda env)
  • vllm-mlx bench mlx-community/Llama-3.2-1B-Instruct-4bit — 10 prompts, 960 tokens, 0 errors

Notes

  • On mlx-lm 0.31.x, chunked prefill now degrades to a logged warning instead of crashing. If a future mlx-lm release restores the internals the guard checks for, chunked prefill re-enables automatically; supporting a new public chunked prefill API would need a follow-up change.
  • MTP (_install_mtp) also references removed internals (_step) but is disabled by default (enable_mtp: bool = False), so it's not part of this fix.

Closes #293

The backport in f61d34e assumed internal BatchGenerator APIs that were
refactored in mlx-lm 0.31.x. This breaks bench and serve for all
users on v0.2.7.

Changes:
- Set prompt_progress_callback as instance attribute instead of
  passing it to BatchGenerator constructor (not a valid parameter)
- Guard _install_chunked_prefill with hasattr check and log warning
  when skipped (relies on removed _process_prompts, active_batch)
- Handle next() returning (prompt_responses, generation_responses)
  tuple instead of flat list
- Add hasattr guard for active_batch in periodic cache eval

Benchmark (Llama-3.2-1B-Instruct-4bit, mlx-lm 0.31.2):

  Total time: 2.38s
  Prompts: 10
  Prompts/second: 4.19
  Total prompt tokens: 80
  Total completion tokens: 960
  Total tokens: 1040
  Tokens/second: 402.52
  Throughput: 436.06 tok/s

Closes #293
waybarrios force-pushed the fix-prompt-progress-callback branch from ef2e904 to 980d092 on April 12, 2026 at 14:48
waybarrios added the bug label on Apr 12, 2026
Collaborator

Thump604 left a comment


I reviewed this against the current mlx-lm 0.31.x crash surface and I do not see a blocking issue. The constructor-argument fix, tuple-return handling, and active_batch guard address the reported failures, and degrading chunked prefill to a warning is a reasonable compatibility fallback for now.

waybarrios merged commit d2e7f88 into main on Apr 12, 2026
7 checks passed
Collaborator

janhilgard left a comment


Review

Overall a good fix — minimal, defensive, and backwards-compatible. A few notes:

Backwards compatibility verification

Checked against mlx-lm==0.31.1 (our production):

  • BatchGenerator.__init__ in 0.31.1 still accepts prompt_progress_callback and stores it as self.prompt_progress_callback, so setting the attribute after construction simply overwrites whatever the constructor already set. Works, but worth noting.
  • _process_prompts, active_batch, _next() — all exist in 0.31.1 → chunked_compatible = True → no behavior change.
  • next() in 0.31.1 returns a flat list → isinstance(result, tuple) = False → no behavior change.

Conclusion: backwards-compatible with 0.31.1.

Question about prompt_responses

if isinstance(result, tuple):
    responses = result[1]  # generation_responses only

If result[0] contains prompt_responses — are they safe to ignore? In the original flat-list format, prompt responses and generation responses were mixed together and _process_batch_responses() processed them uniformly (response.uid + response.token).

If 0.31.2 returns prompt_responses in result[0] and those contain data that previously went through _process_batch_responses(), silently dropping them could cause missing tokens or incomplete requests.

The smoke test (10 prompts, 960 tokens) works — but it would be worth confirming that prompt_responses in 0.31.2 don't carry generated tokens, only metadata/progress info.
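If they can, a defensive variant (hypothetical, not in this PR; it assumes prompt responses expose the same uid/token interface as generation responses) could fold them in instead of dropping them:

result = self.batch_generator.next()
if isinstance(result, tuple):
    prompt_responses, generation_responses = result
    responses = list(generation_responses)
    # Hypothetical: only fold in prompt responses that look token-bearing,
    # so pure progress/metadata entries are still ignored.
    responses += [
        r for r in prompt_responses
        if hasattr(r, "uid") and hasattr(r, "token")
    ]
else:
    responses = result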

MTP note

The PR correctly notes that MTP (_install_mtp) has the same issue with _step but is disabled by default. That's fine for this fix, but will need to be addressed separately for --enable-mtp users.

Verdict

LGTM for merge. Defensive guards (hasattr, isinstance) are the right approach for cross-version mlx-lm compatibility. Graceful degradation of chunked prefill with a warning is better than a crash.
