Add per-request status info for MLLM scheduler by janhilgard · Pull Request #257 · waybarrios/vllm-mlx

janhilgard · 2026-04-06T08:55:40Z

Summary

MLLM scheduler was missing get_running_requests_info(), so /v1/status always returned "requests": [] for MLLM models (all --mllm servers)
This meant monitoring dashboards could not see real-time per-request progress during streaming — throughput appeared as 0 tok/s until the request completed, then spiked
Added get_running_requests_info() to MLLMScheduler (mirrors the existing implementation in scheduler.py) and promoted scheduler stats to top-level in BatchedEngine.get_stats()

Changes

mllm_scheduler.py: Add first_token_time to MLLMRequest, record it on first generated token, add get_running_requests_info() returning per-request completion_tokens, tokens_per_second, progress, ttft_s, phase
engine/batched.py: Promote num_running, num_waiting, total_prompt_tokens, total_completion_tokens, requests (and other keys) from mllm_stats to top-level stats
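The two changes above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `SketchRequest`, `SketchScheduler`, `arrival_time`, `record_token`, and `promote_scheduler_stats` are hypothetical names, and the formulas for `tokens_per_second`, `progress`, and `phase` are assumptions. Only `first_token_time`, `get_running_requests_info()`, and the returned and promoted key names come from the description.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchRequest:
    """Hypothetical stand-in for MLLMRequest."""
    request_id: str
    max_tokens: int
    arrival_time: float = field(default_factory=time.monotonic)  # assumed field
    first_token_time: Optional[float] = None  # field named in the PR
    completion_tokens: int = 0


class SketchScheduler:
    """Hypothetical stand-in for MLLMScheduler, tracking running requests only."""

    def __init__(self) -> None:
        self.running: Dict[str, SketchRequest] = {}

    def add_request(self, req: SketchRequest) -> None:
        self.running[req.request_id] = req

    def record_token(self, request_id: str, now: Optional[float] = None) -> None:
        """Count one generated token and stamp first_token_time on the first one."""
        now = time.monotonic() if now is None else now
        req = self.running[request_id]
        if req.first_token_time is None:
            req.first_token_time = now
        req.completion_tokens += 1

    def get_running_requests_info(self, now: Optional[float] = None) -> List[Dict[str, Any]]:
        """Return one status dict per running request."""
        now = time.monotonic() if now is None else now
        infos: List[Dict[str, Any]] = []
        for req in self.running.values():
            ttft_s = None
            tokens_per_second = None
            if req.first_token_time is not None:
                ttft_s = req.first_token_time - req.arrival_time
                elapsed = now - req.first_token_time
                if elapsed > 0:
                    # Assumed definition: tokens generated per second since the first token.
                    tokens_per_second = req.completion_tokens / elapsed
            infos.append({
                "request_id": req.request_id,
                "completion_tokens": req.completion_tokens,
                "tokens_per_second": tokens_per_second,
                # Assumed definition: fraction of max_tokens generated so far.
                "progress": req.completion_tokens / req.max_tokens if req.max_tokens else None,
                "ttft_s": ttft_s,
                # Assumed phase names; the PR does not list the possible values.
                "phase": "prefill" if req.first_token_time is None else "generation",
            })
        return infos


# Keys named in the PR description; "(and other keys)" is left out here.
PROMOTED_KEYS = ("num_running", "num_waiting", "total_prompt_tokens",
                 "total_completion_tokens", "requests")


def promote_scheduler_stats(stats: Dict[str, Any], mllm_stats: Dict[str, Any]) -> Dict[str, Any]:
    """Copy selected scheduler keys to the top level of the engine stats."""
    out = dict(stats)
    for key in PROMOTED_KEYS:
        if key in mllm_stats:
            out[key] = mllm_stats[key]
    return out
```

Passing `now` explicitly keeps the sketch deterministic in tests; a real scheduler would presumably read the clock itself.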

Test plan

Start an MLLM server (--mllm flag)
Send a streaming request with high max_tokens
Poll /v1/status during generation — verify requests array contains per-request details with increasing completion_tokens and non-null tokens_per_second
Verify num_running and total_completion_tokens are visible at top-level in /v1/status
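The polling and verification steps could be scripted along these lines. This is a sketch under assumptions: the host and port in `STATUS_URL`, the `request_id` key inside each entry of `requests`, and the helper names `fetch_status`, `poll`, and `check_snapshots` are all hypothetical; only `/v1/status`, `requests`, `completion_tokens`, `tokens_per_second`, `num_running`, and `total_completion_tokens` come from the test plan.

```python
import json
import time
import urllib.request
from typing import Any, Dict, List

STATUS_URL = "http://localhost:8000/v1/status"  # assumed host and port


def fetch_status(url: str = STATUS_URL) -> Dict[str, Any]:
    """Fetch one /v1/status snapshot as a dict."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def poll(n: int = 10, interval_s: float = 0.5, url: str = STATUS_URL) -> List[Dict[str, Any]]:
    """Collect n snapshots, sleeping interval_s between them."""
    snapshots = []
    for _ in range(n):
        snapshots.append(fetch_status(url))
        time.sleep(interval_s)
    return snapshots


def check_snapshots(snapshots: List[Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means the test-plan checks passed."""
    problems: List[str] = []
    last_seen: Dict[str, int] = {}
    for i, snap in enumerate(snapshots):
        for key in ("num_running", "total_completion_tokens"):
            if key not in snap:
                problems.append(f"snapshot {i}: missing top-level {key}")
        for req in snap.get("requests", []):
            rid = str(req.get("request_id"))  # assumed identifier key
            tokens = req.get("completion_tokens", 0)
            if tokens < last_seen.get(rid, 0):
                problems.append(f"snapshot {i}: completion_tokens decreased for {rid}")
            last_seen[rid] = tokens
            if tokens > 0 and req.get("tokens_per_second") is None:
                problems.append(f"snapshot {i}: tokens_per_second is null for {rid}")
    if not any(snap.get("requests") for snap in snapshots):
        problems.append("requests array was empty in every snapshot")
    return problems
```

With a streaming request in flight, `check_snapshots(poll())` should return an empty list if the change behaves as described.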

🤖 Generated with Claude Code

The MLLM scheduler was missing get_running_requests_info(), so /v1/status always returned an empty requests array for MLLM models. This made it impossible to see real-time per-request progress (completion_tokens, tokens_per_second, TTFT) during streaming.

Changes:
- Add first_token_time field to MLLMRequest for TTFT tracking
- Add get_running_requests_info() to MLLMScheduler (mirrors scheduler.py)
- Include requests in get_stats() output
- Promote scheduler stats (num_running, num_waiting, total tokens, requests) to top-level in BatchedEngine.get_stats()

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

janhilgard · 2026-04-11T14:18:24Z

@Thump604 Superseded — per-request RequestStatus tracking is already in main via #278. Closing.

Thump604 mentioned this pull request Apr 8, 2026

MLLMScheduler total_prompt_tokens stays at 0 even after processing image requests #260

Closed

janhilgard closed this Apr 11, 2026

janhilgard wants to merge 1 commit into waybarrios:main from janhilgard:fix/mllm-running-requests-info
