fix(mllm_scheduler): add adaptive periodic cache clearing#157
Merged
waybarrios merged 1 commit into waybarrios:main on Mar 20, 2026
Conversation
janhilgard added a commit to janhilgard/vllm-mlx that referenced this pull request on Mar 21, 2026
…sion

Cherry-pick upstream merged fixes + local improvements:
- PR waybarrios#92: Eager-eval batch.tokens after mx.concatenate() to release Metal AGXAllocation handles; adaptive cache-clear interval scales with concurrency
- PR waybarrios#154: Drain the self.requests dict in MLLM _cleanup_finished() to prevent linear memory growth; add mx.clear_cache() after cleanup
- PR waybarrios#157: Port adaptive periodic mx.clear_cache() from the LLM scheduler to the MLLM scheduler (interval scales inversely with active sequences)
- PR waybarrios#124: Forward tool definitions through MLLM chat/stream_chat paths in SimpleEngine and MLXMultimodalLM (get_chat_template)
- PR waybarrios#126: Hash the full base64 string with SHA-256 instead of MD5 on the first 1000 chars to prevent cross-request image-cache collisions

Additional fixes:
- batched.py: Disable thinking mode for coder models; promote MLLM stats
- mllm_batch_generator.py: Downgrade the prompt-size guard from error to warning
- qwen3_parser.py: Treat tagless output as reasoning (max_tokens truncation)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
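The image-cache fix cherry-picked from PR waybarrios#126 above can be sketched as follows. The helper name `image_cache_key` is an illustrative assumption, not the actual vllm-mlx API:

```python
import hashlib

def image_cache_key(image_b64: str) -> str:
    # Hypothetical helper illustrating the PR waybarrios#126 fix.
    # Old behavior: hashlib.md5(image_b64[:1000].encode()).hexdigest()
    # collides whenever two images share a 1000-char base64 prefix
    # (easy to hit, since image headers are near-identical).
    # New behavior: hash the *entire* payload with SHA-256.
    return hashlib.sha256(image_b64.encode("utf-8")).hexdigest()
```

Two images that share a long base64 prefix now map to distinct cache keys, whereas the old prefix-MD5 scheme returned the same key for both.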
raullenchai pushed a commit to raullenchai/Rapid-MLX that referenced this pull request on Mar 26, 2026
…ection, served-model-name

Merge 16 upstream commits (22dcbf8..d235c37) into our fork:
- feat: SpecPrefill — attention-based sparse prefill for TTFT reduction (waybarrios#180)
- feat: Native Qwen3-VL video pipeline with temporal 3D conv + M-RoPE (waybarrios#150)
- fix: Disable the MambaCache monkey-patch for hybrid models; add MTP auto-injection (waybarrios#97)
- feat: Add --served-model-name CLI parameter (waybarrios#125)
- feat: Add Qwen3.5 text-only loading and dynamic memory threshold (waybarrios#127)
- fix(mllm_scheduler): Add adaptive periodic cache clearing (waybarrios#157)
- fix: Metal resource leak under high concurrency (waybarrios#92)

Conflict-resolution strategy: keep all fork features (DeltaNet snapshots, fast SSE templates, tool injection, cloud routing, prompt cache, etc.) while incorporating upstream's new functionality.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Adds adaptive periodic mx.clear_cache() to MLLMScheduler.step(), with the step counter tracked alongside the SchedulerContext.
Follow-up to #154 and @waybarrios' suggestion.
MLLMScheduler currently clears the Metal buffer pool only when requests finish. That covers cleanup at completion time, but long multimodal generations can run for many scheduler steps without any completions, so freed memory accumulates in the MLX Metal buffer pool.

The LLM Scheduler already addresses this with adaptive periodic cache clearing. This change ports that interval logic to the MLLM path: increment a step counter and call mx.clear_cache() every N steps, with N decreasing as concurrency rises.

The existing clear-on-finish behavior remains in place; this change adds a second cleanup path for long-running generations that stay active across many scheduler steps.
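A minimal sketch of the interval logic described above, using MLX's real `mx.clear_cache()`. The class shape, attribute names, and constants (`BASE_CLEAR_INTERVAL`, `MIN_CLEAR_INTERVAL`) are illustrative assumptions, not the actual scheduler code:

```python
BASE_CLEAR_INTERVAL = 64  # steps between clears at concurrency 1 (assumed value)
MIN_CLEAR_INTERVAL = 8    # floor so we never clear on every step (assumed value)

def clear_interval(active_seqs: int,
                   base: int = BASE_CLEAR_INTERVAL,
                   floor: int = MIN_CLEAR_INTERVAL) -> int:
    """N shrinks as the number of active sequences rises, clamped to a floor."""
    return max(floor, base // max(1, active_seqs))

class MLLMScheduler:
    """Skeleton showing only the periodic-clear path; real scheduling elided."""

    def __init__(self):
        self.requests = {}   # active request id -> state (illustrative)
        self._step_count = 0

    def step(self):
        # ... batch scheduling / decode work would happen here ...
        self._step_count += 1
        if self._step_count % clear_interval(len(self.requests)) == 0:
            import mlx.core as mx  # imported lazily; assumes MLX is installed
            mx.clear_cache()       # release pooled Metal buffers
```

At one active sequence the pool is cleared every 64 steps; at eight it drops to every 8 steps and stays there, so high-concurrency workloads reclaim memory sooner without paying the clear cost on every step.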