
fix(mllm_scheduler): add adaptive periodic cache clearing#157

Merged
waybarrios merged 1 commit into waybarrios:main from kol22:feat/mllm-periodic-cache-clear
Mar 20, 2026
Conversation

Contributor

@kol22 kol22 commented Mar 13, 2026

Summary

  • Add adaptive, step-based mx.clear_cache() to MLLMScheduler.step()
  • Clear periodically even when no requests complete
  • Use the same concurrency-scaled interval strategy as the LLM Scheduler

Context

Follow-up to #154 and @waybarrios' suggestion.

MLLMScheduler currently clears the Metal buffer pool only when requests finish. That covers cleanup at completion time, but long multimodal generations can run for many scheduler steps without any completions, so freed memory can accumulate in the MLX Metal buffer pool.

The LLM Scheduler already addresses this with adaptive periodic cache clearing. This change ports that interval logic to the MLLM path: increment a step counter and call mx.clear_cache() every N steps, with N decreasing as concurrency rises.

The existing clear-on-finish behavior remains in place. This change adds a second cleanup path for long-running generations that stay active across many scheduler steps.
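As a rough illustration of the interval strategy described above, here is a minimal sketch. The constants, the `StepCacheClearer` class, and the exact interval formula are all illustrative assumptions, not the scheduler's actual identifiers; in the real code path the callback would be `mx.clear_cache()`.

```python
# Hypothetical sketch of step-based adaptive cache clearing.
# CACHE_CLEAR_BASE_INTERVAL, CACHE_CLEAR_MIN_INTERVAL, clear_interval,
# and StepCacheClearer are illustrative names, not the real ones.

CACHE_CLEAR_BASE_INTERVAL = 64  # clear every 64 steps at low concurrency (assumed)
CACHE_CLEAR_MIN_INTERVAL = 8    # floor so clearing never fires every step (assumed)


def clear_interval(num_active: int) -> int:
    """Interval N shrinks as the number of active sequences grows."""
    if num_active <= 0:
        return CACHE_CLEAR_BASE_INTERVAL
    return max(CACHE_CLEAR_MIN_INTERVAL,
               CACHE_CLEAR_BASE_INTERVAL // num_active)


class StepCacheClearer:
    """Counts scheduler steps and fires a clear callback every N steps,
    even when no request completes in between."""

    def __init__(self, clear_fn):
        self._clear_fn = clear_fn  # e.g. mx.clear_cache in the scheduler
        self._steps = 0

    def step(self, num_active: int) -> bool:
        """Called once per scheduler step; returns True when a clear fired."""
        self._steps += 1
        if self._steps >= clear_interval(num_active):
            self._steps = 0
            self._clear_fn()
            return True
        return False
```

With one active sequence this clears every 64 steps; at eight or more concurrent sequences the interval bottoms out at the 8-step floor, matching the "N decreasing as concurrency rises" behavior.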

@waybarrios waybarrios self-requested a review March 20, 2026 15:47
Owner

@waybarrios waybarrios left a comment


perfect!

Owner


clean!

@waybarrios waybarrios merged commit 80c6849 into waybarrios:main Mar 20, 2026
7 checks passed
@kol22 kol22 deleted the feat/mllm-periodic-cache-clear branch March 20, 2026 15:50
janhilgard added a commit to janhilgard/vllm-mlx that referenced this pull request Mar 21, 2026
…sion

Cherry-pick upstream merged fixes + local improvements:

- PR waybarrios#92: Eager eval batch.tokens after mx.concatenate() to release Metal
  AGXAllocation handles; adaptive cache clear interval scales with concurrency
- PR waybarrios#154: Drain self.requests dict in MLLM _cleanup_finished() to prevent
  linear memory growth; add mx.clear_cache() after cleanup
- PR waybarrios#157: Port adaptive periodic mx.clear_cache() from LLM scheduler to
  MLLM scheduler (interval scales inversely with active sequences)
- PR waybarrios#124: Forward tool definitions through MLLM chat/stream_chat paths
  in SimpleEngine and MLXMultimodalLM (get_chat_template)
- PR waybarrios#126: Hash full base64 string with SHA-256 instead of MD5 on first
  1000 chars to prevent cross-request image cache collisions

Additional fixes:
- batched.py: Disable thinking mode for coder models, promote MLLM stats
- mllm_batch_generator.py: Downgrade prompt size guard from error to warning
- qwen3_parser.py: Treat tagless output as reasoning (max_tokens truncation)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
raullenchai pushed a commit to raullenchai/Rapid-MLX that referenced this pull request Mar 26, 2026
…ection, served-model-name

Merge 16 upstream commits (22dcbf8..d235c37) into our fork:

- feat: SpecPrefill — attention-based sparse prefill for TTFT reduction (waybarrios#180)
- feat: native Qwen3-VL video pipeline with temporal 3D conv + M-RoPE (waybarrios#150)
- fix: Disable MambaCache monkey-patch for hybrid models, add MTP auto-injection (waybarrios#97)
- feat: Add --served-model-name CLI parameter (waybarrios#125)
- feat: Add Qwen3.5 text-only loading and dynamic memory threshold (waybarrios#127)
- fix(mllm_scheduler): add adaptive periodic cache clearing (waybarrios#157)
- fix: Metal resource leak under high concurrency (waybarrios#92)

Conflict resolution strategy: keep all fork features (DeltaNet snapshots,
fast SSE templates, tool injection, cloud routing, prompt cache, etc.)
while incorporating upstream's new functionality.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>