fix: run MLLM prefill in thread executor to keep event loop responsive #275

Closed
janhilgard wants to merge 1 commit into waybarrios:main from janhilgard:fix/mllm-executor-prefill

@janhilgard (Collaborator)

Summary

  • The MLLM scheduler's _process_loop() called self.step() directly on the asyncio event loop, so every HTTP handler (including /health) was blocked for the duration of a prefill step
  • This PR applies the hybrid executor pattern already used by engine_core: steps likely to involve prefill run via run_in_executor, while decode-only steps stay inline
  • As a result, /health and other endpoints remain responsive during long prefills
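The hybrid dispatch can be sketched roughly as follows. This is not the project's code: ToyScheduler, Request, remaining_prefill, _prefill_likely and the 512-token chunk size are hypothetical stand-ins. The point is the branch: a step that may prefill (a queued request, or a running request with prompt tokens left) is awaited through loop.run_in_executor, while a decode-only step runs inline.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List


@dataclass
class Request:
    # Hypothetical request record: number of prompt tokens not yet prefilled.
    request_id: str
    remaining_prefill: int


@dataclass
class ToyScheduler:
    """Hypothetical stand-in for the MLLM scheduler, not the project's real class."""
    waiting: List[Request] = field(default_factory=list)
    running: List[Request] = field(default_factory=list)
    steps_in_executor: int = 0
    steps_inline: int = 0

    def _prefill_likely(self) -> bool:
        # Prefill is likely if requests are queued or a running request is mid-prefill.
        return bool(self.waiting) or any(r.remaining_prefill > 0 for r in self.running)

    def step(self) -> None:
        # Synchronous, possibly slow step (stand-in for the real self.step()).
        if self.waiting:
            self.running.append(self.waiting.pop(0))
        for r in self.running:
            # Assumed chunked prefill of 512 tokens per step (arbitrary).
            r.remaining_prefill = max(0, r.remaining_prefill - 512)

    async def _process_loop(self, max_steps: int) -> None:
        loop = asyncio.get_running_loop()
        for _ in range(max_steps):
            if self._prefill_likely():
                # Potentially long prefill: run in the default thread pool
                # so the event loop can keep serving other handlers.
                await loop.run_in_executor(None, self.step)
                self.steps_in_executor += 1
            else:
                # Decode-only step: assumed cheap enough to run inline.
                self.step()
                self.steps_inline += 1
            # Yield control so other coroutines (e.g. a /health handler) can run.
            await asyncio.sleep(0)


if __name__ == "__main__":
    sched = ToyScheduler(waiting=[Request("a", remaining_prefill=1000)])
    asyncio.run(sched._process_loop(max_steps=4))
    print(sched.steps_in_executor, sched.steps_inline)  # 2 2
```

With a 1000-token prompt, the first two steps (admission plus the remaining 488 tokens) go to the executor and the next two are decode-only and run inline. Because _process_loop awaits each executor call before starting the next step, only one step() runs at a time in this toy; a real scheduler would still need to make sure HTTP handlers do not mutate scheduler state while a step is running in the worker thread.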

Problem

When a large prompt arrives, prefill can block the event loop for several seconds. During this time, /health returns no response, causing monitoring dashboards to report the server as unavailable.

Test plan

  • Start server with --mllm --continuous-batching
  • Send a long-prompt request and call /health concurrently: /health returned 200 in ~65-73 ms during active inference
  • Verify generation output is correct and streaming works
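The /health check in the test plan can be approximated without a model or server by the following illustrative sketch. blocking_prefill uses time.sleep as a hypothetical stand-in for prefill compute, and health_probe measures how late a periodic coroutine wakes up instead of issuing real HTTP requests. time.sleep releases the GIL; the sketch assumes the real prefill compute does too, otherwise offloading it to a thread would not free the event loop. The absolute numbers will not match the ~65-73 ms reported in the test plan.

```python
import asyncio
import time


def blocking_prefill(seconds: float) -> None:
    # Hypothetical stand-in for a long, synchronous prefill step.
    time.sleep(seconds)


async def health_probe(samples: int, interval: float) -> float:
    """Return the worst wake-up delay seen by a periodic /health-like task."""
    loop = asyncio.get_running_loop()
    worst = 0.0
    for _ in range(samples):
        start = loop.time()
        await asyncio.sleep(interval)
        # Lateness beyond the requested interval = time the loop was unavailable.
        worst = max(worst, loop.time() - start - interval)
    return worst


async def run(use_executor: bool, prefill_s: float = 0.5) -> float:
    loop = asyncio.get_running_loop()
    probe = asyncio.create_task(health_probe(samples=10, interval=0.05))
    await asyncio.sleep(0)  # let the probe start its first sleep
    if use_executor:
        # Offloaded: the event loop keeps running while the thread sleeps.
        await loop.run_in_executor(None, blocking_prefill, prefill_s)
    else:
        # Inline: blocks the event loop thread for the whole "prefill".
        blocking_prefill(prefill_s)
    return await probe


if __name__ == "__main__":
    inline = asyncio.run(run(use_executor=False))
    offloaded = asyncio.run(run(use_executor=True))
    print(f"worst probe delay inline: {inline * 1000:.0f} ms, "
          f"executor: {offloaded * 1000:.0f} ms")
```

In the inline case the probe's first wake-up is delayed by roughly the full prefill duration; with run_in_executor the probe keeps waking close to its 50 ms schedule.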

🤖 Generated with Claude Code

The MLLM scheduler's _process_loop() was calling self.step() directly
on the asyncio event loop. During prefill (which can block for seconds
on large prompts), this blocked all other HTTP handlers including
/health, causing monitoring dashboards to report the server as
unavailable.

Apply the same hybrid executor pattern already used by engine_core:
prefill-likely steps (waiting requests or partial prefill) run via
run_in_executor, while fast decode-only steps remain inline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@janhilgard (Collaborator, Author)

@Thump604 Superseded: the MLLM prefill thread executor (run_in_executor) is already in main via #278. Closing.

@janhilgard janhilgard closed this Apr 11, 2026