
fix: compatibility with mlx_lm 0.31.0 prompt checkpoints#126

Merged
jundot merged 1 commit into jundot:main from rsnow:mlx-lm-0.31.0-compat-pr
Mar 10, 2026

Conversation

Contributor

@rsnow rsnow commented Mar 10, 2026

Closes #110

mlx_lm 0.31.0 added prompt checkpoint support to BatchGenerator, changing the insert() tuple arity from 6 to 7 fields and replacing hardcoded prefill boundaries with a variable prompt_checkpoint.

Changes to _BoundarySnapshotBatchGenerator._process_prompts:

- Accept the 7th tuple field (prompt_checkpoints) from insert()
- Compute the effective prompt_checkpoint matching upstream semantics
- Replace the hardcoded prefill split (1) with prompt_checkpoint in both the left-pad and right-pad paths
- Add prompt_checkpoint_callback support for upstream parity
- Add a defensive clamp in cache prepare to prevent negative lengths

Backward compatible: when prompt_checkpoints are not supplied (the default), prompt_checkpoint computes to 1 and all code paths behave identically to pre-patch.

Tested on an M3 Ultra 256GB with mlx_lm 0.31.0: chat completions, benchmarks, and continuous batching all working. Soaking since March 8.
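For illustration, the backward-compatible default described above might look like the following sketch. The field position and the choice of the last checkpoint are assumptions, not mlx_lm's actual internals; only the 6-vs-7 tuple arity and the default-to-1 behavior come from this PR.

```python
def effective_prompt_checkpoint(entry):
    """Sketch: derive the effective prompt checkpoint from an insert() tuple.

    mlx_lm 0.31.0 grew the insert() tuple from 6 to 7 fields; the 7th is
    assumed here to hold the prompt_checkpoints. Taking the last checkpoint
    is a hypothetical choice for illustration.
    """
    prompt_checkpoints = entry[6] if len(entry) == 7 else None
    # Default matches pre-patch behavior: a checkpoint of 1 reproduces the
    # old hardcoded prefill split, so legacy 6-field callers are unchanged.
    if not prompt_checkpoints:
        return 1
    return prompt_checkpoints[-1]
```

A pre-0.31.0 6-field entry, or a 7-field entry with no checkpoints supplied, falls back to the old split of 1.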

mlx_lm 0.31.0 added prompt checkpoint support to BatchGenerator,
changing insert() tuple arity from 6 to 7 fields and replacing
hardcoded prefill boundaries with a variable prompt_checkpoint.

Changes to _BoundarySnapshotBatchGenerator._process_prompts:

- Accept 7th tuple field (prompt_checkpoints) from insert()
- Compute effective prompt_checkpoint matching upstream semantics
- Replace hardcoded prefill split (1) with prompt_checkpoint in both
  left-pad and right-pad paths (loop bounds, last_inputs slice,
  cache prepare lengths)
- Add prompt_checkpoint_callback support for upstream parity
- Process remaining prompt_checkpoint-1 tokens before _step when
  checkpoint > 1, with VLM embed slicing
- Defensive clamp in cache prepare to prevent negative lengths
- Materialize checkpoint callback cache extracts (tuple vs generator)

When prompt_checkpoints are not supplied (default), prompt_checkpoint
computes to 1 and all code paths behave identically to pre-patch.
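The defensive clamp mentioned in the list above is small but worth spelling out; this is a hedged sketch with a hypothetical function name and signature, not the patch's actual code:

```python
def cache_prepare_length(prompt_len, prompt_checkpoint):
    """Sketch of the defensive clamp in cache prepare.

    Ensures the length handed to cache preparation is never negative,
    e.g. if a supplied checkpoint exceeds the remaining prompt length.
    """
    return max(0, prompt_len - prompt_checkpoint)
```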

Tested with mlx_lm 0.31.0 on M3 Ultra 256GB (bolo):
- Chat completions: working
- Benchmarks: working, ~15% gen speed improvement (Python 3.14)
- Continuous batching: 1.4x at 2x batch, stable
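The "materialize checkpoint callback cache extracts (tuple vs generator)" item can be sketched as follows. The helper name is hypothetical; the tuple-vs-generator concern is taken from the commit message:

```python
def materialize_extracts(extracts):
    """Sketch: cache extracts may arrive as a one-shot generator rather
    than a tuple. Materializing them into a tuple lets the checkpoint
    callback hold and re-read them safely."""
    if isinstance(extracts, tuple):
        return extracts
    return tuple(extracts)
```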
jundot merged commit dfe3bda into jundot:main on Mar 10, 2026
Owner

jundot commented Mar 10, 2026

Thanks for this, really appreciate the contribution! I verified the diff against upstream's _process_prompts and everything lines up correctly. The defensive max(0, l - prompt_checkpoint) clamp and the VLM embedding handling during checkpoint processing are nice touches that upstream doesn't need but omlx definitely does.

Merged and working well on my end. Also pushed a follow-up commit (6a9b264) on top of this to finish the rest of the 0.31.1 migration (presence/frequency penalty integration, dependency refs update).



Development

Successfully merging this pull request may close these issues.

Support for mlx_lm 0.31
