fix: compatibility with mlx_lm 0.31.0 prompt checkpoints #126
Merged
jundot merged 1 commit into jundot:main on Mar 10, 2026
Conversation
mlx_lm 0.31.0 added prompt checkpoint support to BatchGenerator, changing insert() tuple arity from 6 to 7 fields and replacing hardcoded prefill boundaries with a variable prompt_checkpoint.

Changes to _BoundarySnapshotBatchGenerator._process_prompts:

- Accept 7th tuple field (prompt_checkpoints) from insert()
- Compute effective prompt_checkpoint matching upstream semantics
- Replace hardcoded prefill split (1) with prompt_checkpoint in both left-pad and right-pad paths (loop bounds, last_inputs slice, cache prepare lengths)
- Add prompt_checkpoint_callback support for upstream parity
- Process remaining prompt_checkpoint-1 tokens before _step when checkpoint > 1, with VLM embed slicing
- Defensive clamp in cache prepare to prevent negative lengths
- Materialize checkpoint callback cache extracts (tuple vs generator)

When prompt_checkpoints are not supplied (default), prompt_checkpoint computes to 1 and all code paths behave identically to pre-patch.

Tested with mlx_lm 0.31.0 on M3 Ultra 256GB (bolo):

- Chat completions: working
- Benchmarks: working, ~15% gen speed improvement (Python 3.14)
- Continuous batching: 1.4x at 2x batch, stable
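The arity change, the backward-compatible default, and the defensive clamp can be sketched roughly as below. The field names, tuple ordering, and the `process_prompts` helper are illustrative assumptions, not the actual mlx_lm or _BoundarySnapshotBatchGenerator layout:

```python
# Hypothetical sketch of handling both the old 6-field and the new
# 7-field insert() tuples. Field names/ordering are illustrative only.

def process_prompts(queue_entries):
    results = []
    for entry in queue_entries:
        if len(entry) == 7:
            # 0.31.0+: a prompt_checkpoints field was appended.
            (uid, prompt, max_tokens, sampler,
             logits_processors, cache, prompt_checkpoints) = entry
        else:
            # Pre-0.31.0: six fields, no checkpoints.
            (uid, prompt, max_tokens, sampler,
             logits_processors, cache) = entry
            prompt_checkpoints = None

        # Effective checkpoint defaults to 1, matching the old hardcoded
        # prefill split, so behavior is unchanged when none is supplied.
        prompt_checkpoint = prompt_checkpoints[0] if prompt_checkpoints else 1

        # Defensive clamp so cache-prepare lengths never go negative,
        # e.g. when a checkpoint lands at or past the prompt end.
        prepare_len = max(0, len(prompt) - prompt_checkpoint)
        results.append((uid, prompt_checkpoint, prepare_len))
    return results
```

With no checkpoints supplied, every entry falls through to `prompt_checkpoint = 1`, which is what makes the patch a no-op for existing callers.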
Owner

Thanks for this, really appreciate the contribution! I verified the diff against upstream's changes. Merged and working well on my end. Also pushed a follow-up commit (6a9b264) on top of this to finish the rest of the 0.31.1 migration (presence/frequency penalty integration, dependency refs update).
Closes #110
Tested on M3 Ultra 256GB with mlx_lm 0.31.0 — chat completions, benchmarks, continuous batching all working. Soaking since March 8.
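The "materialize checkpoint callback cache extracts (tuple vs generator)" point can be illustrated with a minimal sketch; `run_checkpoint_callback` is a hypothetical helper, not the actual mlx_lm API. The idea is that a cache class may hand back its per-layer state as either a concrete tuple or a lazy generator, and a generator is exhausted after one pass, so it must be materialized before the callback (or any later consumer) uses it:

```python
# Hypothetical sketch: normalize cache extracts before invoking a
# checkpoint callback, since generators are one-shot iterators.

def run_checkpoint_callback(callback, cache_extracts):
    # Tuples/lists are already concrete; generators are consumed into
    # a tuple so the callback sees a stable snapshot.
    if not isinstance(cache_extracts, (tuple, list)):
        cache_extracts = tuple(cache_extracts)
    callback(cache_extracts)
    return cache_extracts
```

Without the materialization step, a callback that iterates a generator would leave it empty for the caller, which is the kind of tuple-vs-generator mismatch the patch guards against.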