
fix: unpack prompt_checkpoints in _chunked_next for mlx-lm >= 0.31.0 #194

Closed
Thump604 wants to merge 1 commit into waybarrios:main from Thump604:fix/chunked-prefill-tuple-unpack

Conversation

@Thump604
Collaborator

Summary

mlx-lm 0.31.0 added prompt_checkpoints as the 7th element in the BatchGenerator.insert() tuple. _chunked_next only unpacked 6 elements, causing ValueError: too many values to unpack (expected 6) when prefix cache triggers the chunked prefill path.

One-line fix: add _prompt_checkpoints to the zip(*batch_prompts) unpack.

Same class of bug as the _process_prompts tuple fix, but on a different code path: _chunked_next bypasses _process_prompts entirely when the prefix cache is active mid-prefill.

Reproduction

  1. Start server with --enable-prefix-cache and mlx-lm >= 0.31.0
  2. Send any request with a prompt long enough to trigger chunked prefill
  3. Server crashes with ValueError: too many values to unpack (expected 6)

Test plan

  • Start server with --enable-prefix-cache --continuous-batching on mlx-lm 0.31.x
  • Send a request with >8K token prompt
  • Verify no ValueError on chunked prefill path
  • Verify prefix cache hit on subsequent request with same prefix

Fixes #178

mlx-lm 0.31.0 added prompt_checkpoints as the 7th element in the
BatchGenerator.insert() tuple. _chunked_next only unpacked 6 elements,
causing ValueError when prefix cache triggers the chunked prefill path.

Fixes waybarrios#178
@Thump604
Collaborator Author

CI green. Small fix — unpack prompt_checkpoints tuple in _chunked_next for mlx-lm >= 0.31.0. Fixes #178.

@Thump604
Collaborator Author

Closing in favor of #221, which fixes the same tuple-unpack crash and also preserves prompt checkpoint semantics through the chunked prefill lifecycle. The fix here silently discards the checkpoint data.

Three other PRs (#183, #156, #221) address the same issue. #221 is the most complete.



Development

Successfully merging this pull request may close these issues.

prefix-cache + large prompts: mid_prefill_cache re-enables chunked_prefill causing crash
