Fix batch generation and adopt mlx-lm batch improvements by Blaizzy · Pull Request #911 · Blaizzy/mlx-vlm

Blaizzy · 2026-04-04T01:02:01Z

Summary

Per-sequence samplers & logits processors: BatchGenerator.insert() now accepts per-sequence samplers and logits_processors lists, enabling mixed-temperature/top-p serving in batch mode. It falls back to the shared sampler when these are not provided, so the change is fully backward compatible.
Token tracking: Batch now tracks the generated tokens for each sequence in a new tokens field.
Cache interface fix: Added the missing nbytes property and empty() method to SlidingWindowCache and StaticKVCache so they satisfy the _BaseCache abstract interface from mlx-lm, preventing breakage with future mlx-lm updates.
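The per-sequence override with a shared fallback can be sketched in plain Python. This is a minimal illustration, not mlx-vlm's code: it assumes a sampler is a callable from a list of logits to a token id and a logits processor is a callable from (generated tokens, logits) to new logits. The names `step`, `greedy`, `make_temperature_sampler`, and `ban_token_0` are hypothetical, and real code would operate on mx.array values rather than lists.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical signatures (real samplers/processors operate on mx.array, not lists).
Sampler = Callable[[List[float]], int]
LogitsProcessor = Callable[[List[int], List[float]], List[float]]


def greedy(logits: List[float]) -> int:
    """Shared default sampler: pick the highest-scoring token id."""
    return max(range(len(logits)), key=lambda i: logits[i])


def make_temperature_sampler(temp: float, seed: int = 0) -> Sampler:
    """Sample from softmax(logits / temp); temp <= 0 means greedy."""
    rng = random.Random(seed)

    def sample(logits: List[float]) -> int:
        if temp <= 0:
            return greedy(logits)
        scaled = [x / temp for x in logits]
        peak = max(scaled)
        weights = [math.exp(x - peak) for x in scaled]  # numerically stable softmax
        return rng.choices(range(len(logits)), weights=weights, k=1)[0]

    return sample


def step(
    batch_logits: Sequence[List[float]],
    batch_tokens: Sequence[List[int]],
    shared_sampler: Sampler,
    samplers: Optional[Sequence[Optional[Sampler]]] = None,
    logits_processors: Optional[Sequence[Sequence[LogitsProcessor]]] = None,
) -> List[int]:
    """Choose the next token for every sequence in the batch."""
    next_tokens = []
    for i, logits in enumerate(batch_logits):
        # Run this sequence's own logits processors, in order.
        if logits_processors is not None:
            for proc in logits_processors[i]:
                logits = proc(batch_tokens[i], logits)
        # Per-sequence sampler if one was given, otherwise the shared one.
        if samplers is not None and samplers[i] is not None:
            sampler = samplers[i]
        else:
            sampler = shared_sampler
        next_tokens.append(sampler(logits))
    return next_tokens


def ban_token_0(tokens: List[int], logits: List[float]) -> List[float]:
    """Example processor: make token 0 impossible."""
    return [float("-inf")] + logits[1:]


logits = [[2.0, 1.0, 0.5], [2.0, 1.0, 0.5]]
print(step(logits, [[], []], greedy))  # [0, 0]
print(step(logits, [[], []], greedy, logits_processors=[[ban_token_0], []]))  # [1, 0]
# Mixed temperatures: sequence 0 uses the shared greedy sampler, sequence 1 samples at 0.8.
print(step(logits, [[], []], greedy, samplers=[None, make_temperature_sampler(0.8)]))
```

Resolving the fallback per element, rather than requiring a complete list, lets a caller override only some sequences while the rest keep the shared sampler.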

Test plan

batch_generate() with Qwen2.5-VL-3B produces correct output
Single generate() unaffected
SlidingWindowCache and StaticKVCache pass empty() and nbytes checks
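Why a missing empty() or nbytes can break a cache class can be illustrated with Python's abc module. This sketch assumes the interface is enforced the way abc enforces abstract members (a subclass missing any of them cannot be instantiated); `BaseCacheSketch`, `SlidingWindowCacheSketch`, and `IncompleteCache` are hypothetical stand-ins, not mlx-lm's _BaseCache or mlx-vlm's SlidingWindowCache.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseCacheSketch(ABC):
    """Hypothetical stand-in for an abstract KV-cache interface."""

    @property
    @abstractmethod
    def nbytes(self) -> int:
        """Total bytes held by the cache."""

    @abstractmethod
    def empty(self) -> bool:
        """True if nothing has been cached yet."""


class IncompleteCache(BaseCacheSketch):
    """Implements neither member, so abc refuses to instantiate it."""


class SlidingWindowCacheSketch(BaseCacheSketch):
    """Toy sliding-window cache over fixed-size entries."""

    def __init__(self, max_size: int, bytes_per_entry: int = 4) -> None:
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self.max_size = max_size
        self.bytes_per_entry = bytes_per_entry
        self.entries: List[float] = []

    def update(self, values: List[float]) -> None:
        # Keep only the newest max_size entries.
        self.entries = (self.entries + values)[-self.max_size:]

    @property
    def nbytes(self) -> int:
        return len(self.entries) * self.bytes_per_entry

    def empty(self) -> bool:
        return len(self.entries) == 0


cache = SlidingWindowCacheSketch(max_size=3)
print(cache.empty(), cache.nbytes)  # True 0
cache.update([1.0, 2.0, 3.0, 4.0])
print(cache.entries, cache.nbytes, cache.empty())  # [2.0, 3.0, 4.0] 12 False

try:
    IncompleteCache()
except TypeError as err:
    print("IncompleteCache:", type(err).__name__)  # IncompleteCache: TypeError
```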

🤖 Generated with Claude Code

- Add `tokens`, `samplers`, and `logits_processors` fields to Batch class with proper filter/extend support
- BatchGenerator.insert() now accepts per-sequence samplers and logits_processors for fine-grained control (e.g. mixed temperature)
- _step() applies per-sequence logits processors and samplers during generation, falling back to shared sampler when not provided
- Add missing `nbytes` and `empty()` to SlidingWindowCache and StaticKVCache to satisfy _BaseCache interface from mlx-lm

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
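The filter/extend support amounts to keeping every per-sequence field aligned by batch index. A minimal sketch, assuming a batch is just parallel Python lists; `BatchSketch` and its `uids` field are hypothetical, and the real Batch also carries caches and mx.array state.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class BatchSketch:
    """Hypothetical batch stored as parallel lists (index i = sequence i)."""

    uids: List[int]
    tokens: List[List[int]] = field(default_factory=list)
    samplers: List[Optional[Callable]] = field(default_factory=list)
    logits_processors: List[List[Callable]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Default every per-sequence field to one entry per sequence.
        n = len(self.uids)
        if not self.tokens:
            self.tokens = [[] for _ in range(n)]
        if not self.samplers:
            self.samplers = [None] * n  # None = use the shared sampler
        if not self.logits_processors:
            self.logits_processors = [[] for _ in range(n)]
        assert len(self.tokens) == len(self.samplers) == len(self.logits_processors) == n

    def filter(self, keep: List[int]) -> None:
        """Keep only the sequences at the given batch indices (e.g. unfinished ones)."""
        self.uids = [self.uids[i] for i in keep]
        self.tokens = [self.tokens[i] for i in keep]
        self.samplers = [self.samplers[i] for i in keep]
        self.logits_processors = [self.logits_processors[i] for i in keep]

    def extend(self, other: "BatchSketch") -> None:
        """Append another batch's sequences, keeping every field aligned."""
        self.uids.extend(other.uids)
        self.tokens.extend(other.tokens)
        self.samplers.extend(other.samplers)
        self.logits_processors.extend(other.logits_processors)


batch = BatchSketch(uids=[10, 11, 12])
batch.tokens[0].append(5)
batch.tokens[2].append(7)
batch.filter([0, 2])                  # sequence 11 finished and is dropped
batch.extend(BatchSketch(uids=[13]))  # a newly inserted request joins
print(batch.uids, batch.tokens)       # [10, 12, 13] [[5], [7], []]
```

If any one field were filtered or extended without the others, sequence i would silently pick up another sequence's sampler or token history.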

…oken handling

- Introduced a new `_right_pad_prompts` function for right-padding prompts.
- Integrated `SequenceStateMachine` to manage stop-token detection, allowing multi-token stop sequences.
- Updated the `Batch` class to carry state machine states, ensuring they are handled correctly during filtering and merging.
- Modified `BatchGenerator` to use the state machine for improved stop detection.
- Kept backward compatibility with the legacy stopping criteria.
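Right padding and multi-token stop detection can be sketched as below. This is a hypothetical illustration, not mlx-lm's SequenceStateMachine: `right_pad` and `StopMatcher` are made-up names. The matcher tracks every partial match of every stop sequence at once, so overlapping candidates (a stop sequence restarting in the middle of another attempt) are not missed. Each sequence needs its own matcher state, which is why that state has to travel with the sequence when a batch is filtered or merged.

```python
from typing import List, Sequence, Tuple


def right_pad(prompts: Sequence[List[int]], pad_id: int) -> Tuple[List[List[int]], List[int]]:
    """Pad every prompt on the right to the longest length; also return true lengths."""
    max_len = max(len(p) for p in prompts)
    padded = [list(p) + [pad_id] * (max_len - len(p)) for p in prompts]
    return padded, [len(p) for p in prompts]


class StopMatcher:
    """Per-sequence state machine for multi-token stop sequences.

    State = list of partial matches (stop index, tokens matched so far).
    """

    def __init__(self, stop_sequences: Sequence[Sequence[int]]) -> None:
        self.stops = [list(s) for s in stop_sequences if len(s) > 0]
        self.states: List[Tuple[int, int]] = []

    def step(self, token: int) -> bool:
        """Consume one generated token; return True when a stop sequence completes."""
        # Existing partial matches plus a fresh attempt at every stop sequence.
        candidates = self.states + [(i, 0) for i in range(len(self.stops))]
        next_states = []
        for idx, pos in candidates:
            if self.stops[idx][pos] != token:
                continue  # this partial match is broken; drop it
            if pos + 1 == len(self.stops[idx]):
                self.states = []
                return True
            next_states.append((idx, pos + 1))
        self.states = next_states
        return False


padded, lengths = right_pad([[5, 6, 7], [8]], pad_id=0)
print(padded, lengths)  # [[5, 6, 7], [8, 0, 0]] [3, 1]

matcher = StopMatcher([[1, 2, 3], [9]])
print([matcher.step(t) for t in [1, 1, 2, 3]])  # [False, False, False, True]
```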

…offset

Fixes three bugs that caused garbage output when batch_size > 1:

1. The vision tower flattened all image tokens in the batch into [1, total, dim], losing the batch dimension. It now preserves [B, tokens_per_image, dim].
2. masked_scatter flattened all batches, causing cross-batch index contamination via modulo wrapping. It now processes each batch separately when B > 1.
3. BatchRotatingKVCache.offset is a mutable mx.array that update_and_fetch modifies in place. The attention code captured a reference to it before the update, but the array was then mutated, so queries got the wrong RoPE positions. Fixed by snapshotting the offset.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
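The per-batch scatter and the offset aliasing bug can both be illustrated in plain Python. This sketch assumes one scalar per position instead of an embedding vector, and a one-element list standing in for the mutable mx.array offset; `masked_scatter_per_batch`, `ToyCache`, and `advance` are hypothetical names, with `advance` playing the role of update_and_fetch.

```python
from typing import List


def masked_scatter_per_batch(
    embeds: List[List[float]], mask: List[List[bool]], features: List[List[float]]
) -> List[List[float]]:
    """Fill each row's masked positions from that row's own features, in order.

    Scalars stand in for embedding vectors: embeds and mask are [B][L],
    features is [B][tokens_per_image].
    """
    out = []
    for b, (row, row_mask) in enumerate(zip(embeds, mask)):
        if sum(row_mask) != len(features[b]):
            raise ValueError(f"row {b}: {sum(row_mask)} masked slots, {len(features[b])} features")
        feats = iter(features[b])  # never reads another row's features
        out.append([next(feats) if m else x for x, m in zip(row, row_mask)])
    return out


class ToyCache:
    """Stand-in for a cache whose offset is mutated in place."""

    def __init__(self) -> None:
        self.offset = [0]  # mutable container, like an array updated in place

    def advance(self, n_new: int) -> None:
        self.offset[0] += n_new  # every alias of self.offset sees this change


embeds = [[1.0, 0.0, 0.0, 2.0], [0.0, 0.0, 4.0, 5.0]]
mask = [[False, True, True, False], [True, True, False, False]]
features = [[10.0, 11.0], [20.0, 21.0]]
print(masked_scatter_per_batch(embeds, mask, features))
# [[1.0, 10.0, 11.0, 2.0], [20.0, 21.0, 4.0, 5.0]]

cache = ToyCache()
cache.advance(5)            # 5 prompt tokens already cached
aliased = cache.offset      # bug: keeps a reference to the mutable offset
snapshot = cache.offset[0]  # fix: copy the value before the update
cache.advance(1)            # store the new token's keys/values
print(aliased[0], snapshot)  # 6 5 (the new token's RoPE position should be 5)
```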

CI uses the released mlx-lm, which doesn't have SequenceStateMachine and dynamic_roll yet. Gracefully falls back to the legacy stopping_criteria when they are unavailable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
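The optional-import fallback follows the usual try/except ImportError pattern, sketched below. The import path `mlx_lm.generate` is an assumption (the PR text does not say where SequenceStateMachine lives), `legacy_should_stop` is a hypothetical legacy-style criterion, and `stopping_mode` is a made-up helper; the same pattern would apply to dynamic_roll.

```python
from typing import List

try:
    # Assumed import path, for illustration only; the real location may differ.
    from mlx_lm.generate import SequenceStateMachine
except ImportError:
    # A released mlx-lm (as used in CI) may not provide it yet.
    SequenceStateMachine = None

HAS_STATE_MACHINE = SequenceStateMachine is not None


def legacy_should_stop(tokens: List[int], stop_ids: List[int]) -> bool:
    """Hypothetical legacy criterion: stop when the newest token is a stop id."""
    return bool(tokens) and tokens[-1] in stop_ids


def stopping_mode(has_state_machine: bool = HAS_STATE_MACHINE) -> str:
    """Pick the stop-detection path; the legacy path needs no new mlx-lm features."""
    return "state_machine" if has_state_machine else "legacy"


print(stopping_mode())                  # legacy, unless a new enough mlx-lm is installed
print(legacy_should_stop([5, 2], [2]))  # True
```

`from module import name` raises ImportError both when the module is missing and when it exists but lacks the name, so one except clause covers an old mlx-lm as well as no mlx-lm at all.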

Blaizzy and others added 4 commits April 4, 2026 03:01

format

6db0c3b

Blaizzy changed the title ~~Add per-sequence samplers and fix cache interface~~ Fix Gemma 4 batch generation and add SequenceStateMachine support Apr 4, 2026

Blaizzy changed the title ~~Fix Gemma 4 batch generation and add SequenceStateMachine support~~ Fix batch generation and adopt mlx-lm batch improvements Apr 4, 2026

Make SequenceStateMachine and dynamic_roll imports optional

0b13be0



Blaizzy wants to merge 5 commits into main from pc/batch-improvements

Blaizzy commented Apr 4, 2026
