
fix: populate tokens field in BatchedEngine.generate() #229

Merged
Thump604 merged 2 commits into waybarrios:main from mmcaulif:fix/batched-engine-tokens-field on Apr 10, 2026

Conversation

Contributor

mmcaulif commented Mar 28, 2026

Encountered this while first investigating vllm-mlx for RL training: I want to be able to see the tokens produced via BatchedEngine.generate(). This PR was written exclusively with Claude, so please let me know if you have any suggestions and I will implement them manually 😃. I will also try to implement and upstream any RL-related features that are missing.

Thank you for the project, it has been very useful!

The below is AI-generated:

Summary

  • BatchedEngine.generate() always returned tokens=[] despite AsyncEngineCore tracking output_token_ids per request
  • The fix passes output.output_token_ids through to GenerationOutput
  • Adds tests/test_batched_engine.py with three unit tests covering the tokens field and other output fields

Root cause

In engine/batched.py, the GenerationOutput was constructed without the tokens field:

# before
return GenerationOutput(
    text=text,
    prompt_tokens=output.prompt_tokens,
    completion_tokens=output.completion_tokens,
    finish_reason=output.finish_reason,
)

# after
return GenerationOutput(
    text=text,
    tokens=output.output_token_ids,  # the added line: forward the tracked token IDs
    prompt_tokens=output.prompt_tokens,
    completion_tokens=output.completion_tokens,
    finish_reason=output.finish_reason,
)

Test plan

  • test_tokens_field_is_populated — verifies token IDs are forwarded correctly (sketched below, after this list)
  • test_tokens_field_empty_when_no_tokens_generated — verifies empty list is handled
  • test_other_output_fields_still_populated — verifies no regression on existing fields
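
For illustration, a hedged sketch of what the first test might check. The stand-in dataclasses mirror the fields shown in the root-cause diff; the actual test file mocks AsyncEngineCore, so this is an approximation, not the committed code:

# Hedged sketch only; stand-ins mirror the fields in the root-cause diff above.
from dataclasses import dataclass, field

@dataclass
class GenerationOutput:  # stand-in for the project's dataclass
    text: str
    prompt_tokens: int
    completion_tokens: int
    finish_reason: str
    tokens: list[int] = field(default_factory=list)

@dataclass
class RequestOutput:  # stand-in for the engine core's per-request result
    output_token_ids: list[int]
    prompt_tokens: int
    completion_tokens: int
    finish_reason: str

def test_tokens_field_is_populated():
    output = RequestOutput([11, 22, 33], prompt_tokens=5,
                           completion_tokens=3, finish_reason="stop")
    result = GenerationOutput(
        text="abc",
        tokens=output.output_token_ids,  # the line this PR adds
        prompt_tokens=output.prompt_tokens,
        completion_tokens=output.completion_tokens,
        finish_reason=output.finish_reason,
    )
    assert result.tokens == [11, 22, 33]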

🤖 Generated with Claude Code

The output_token_ids from AsyncEngineCore were tracked internally but
never forwarded to GenerationOutput, leaving tokens always []. Also
adds tests for the generate() output fields.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Collaborator

Thump604 left a comment

Summary

This PR correctly fixes the missing tokens field in BatchedEngine.generate() for the LLM (text-only) code path. Good catch on the root cause.

What's Good

  1. Fix is minimal and correct: line 493 in batched.py now passes tokens=output.output_token_ids to GenerationOutput.
  2. Tests are well-structured: three cases covering normal output, empty tokens, and field regression.
  3. Test mocks are properly constructed to simulate AsyncEngineCore behavior.

Issues Found

1. Incomplete Fix: MLLM Path Still Missing Tokens

The PR only fixes the LLM path (line 493), but BatchedEngine.generate() also returns GenerationOutput on line 469 for MLLM output (when media is present):

if has_media or self._is_mllm:
    output = await self._mllm_scheduler.generate(...)  # line ~459
    return GenerationOutput(
        text=clean_output_text(output.output_text),
        # tokens field missing here too
        prompt_tokens=output.prompt_tokens,
        ...
    )

The output from MLLMScheduler.generate() also returns a RequestOutput with output_token_ids. This path should also include tokens=output.output_token_ids.
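
A hedged sketch of the corresponding change (fields abbreviated as in the snippet above; this shows the suggested shape, not a verbatim diff):

if has_media or self._is_mllm:
    output = await self._mllm_scheduler.generate(...)
    return GenerationOutput(
        text=clean_output_text(output.output_text),
        tokens=output.output_token_ids,  # the suggested addition
        prompt_tokens=output.prompt_tokens,
        ...
    )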

2. Tests Don't Cover MLLM Path

All three tests call engine.generate() with only text (no images/videos), so they only exercise the fixed LLM path. The MLLM path (with media) is completely untested.

3. Missing Test: Verify Tokens Field Default Value

The GenerationOutput dataclass has tokens: list[int] = field(default_factory=list). The tests don't verify that when omitted from the constructor, it defaults correctly. This is less critical since the fix adds it explicitly, but would be safer.
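
A minimal sketch of that check, assuming GenerationOutput is importable in the test module as the existing tests already require:

def test_tokens_defaults_to_empty_list():
    # tokens omitted on purpose; default_factory should supply an empty list
    out = GenerationOutput(
        text="hi",
        prompt_tokens=1,
        completion_tokens=1,
        finish_reason="stop",
    )
    assert out.tokens == []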

Recommendation

Approve with request to fix:

  1. Apply the same tokens=output.output_token_ids fix to line 469 (MLLM path)
  2. Add a test that exercises the MLLM path (with images/videos)
  3. Optional: add a test verifying the default value

The one-line fix is correct and low-risk. The incomplete coverage is the only concern.

Collaborator

Thump604 left a comment

Recommendation

Minor issue: apply the same tokens=output.output_token_ids fix to line 469 (MLLM path) and add a test for it. The one-line fix is correct and low-risk. Incomplete coverage is the only concern.

Collaborator

Thump604 commented Apr 7, 2026

@waybarrios, @mmcaulif: brief endorsement.

The use case is real. BatchedEngine.generate() not populating tokens is a real gap for downstream consumers (RL training, custom samplers, anything that needs the token IDs in addition to the decoded text). The diff is small enough to verify by reading: the token list was already being assembled in the streaming loop, this PR plumbs it through to the final GenerationOutput. Sound shape.

Mergeable on current main.

Collaborator

Thump604 left a comment

I'd been grepping for consumers of GenerationOutput.tokens in-tree as part of an unrelated SimpleEngine refactor and was about to conclude the field was unused and propose dropping it. Your PR is evidence I was wrong: grep doesn't see out-of-tree consumers like RL training pipelines. Populating the field while keeping the empty-list default_factory keeps the change drop-in compatible.

Question, not blocking: for RL training use, would per-token logprobs also be useful on GenerationOutput, or do you compute them separately? Happy to file a follow-up if there's a gap there, since the work of plumbing through token IDs is similar to plumbing through logprobs.
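
Purely illustrative, one possible shape (the logprobs field name and type here are assumptions, not the project's actual API):

from dataclasses import dataclass, field

@dataclass
class GenerationOutput:  # hypothetical extension, not the current definition
    text: str
    tokens: list[int] = field(default_factory=list)
    logprobs: list[float] = field(default_factory=list)  # logprobs[i] pairs with tokens[i]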

Approving as-is.

Thump604 added a commit to Thump604/vllm-mlx that referenced this pull request Apr 9, 2026
…l generate+stream_generate

Pre-existing regression from an earlier rebase that dropped bdf7dcc's
llm.py additions. The server.py request handlers still pass top_k,
min_p, presence_penalty, repetition_penalty through to SimpleEngine,
which forwards them via **kwargs to MLXLanguageModel.chat() (which
accepts **kwargs) which then calls self.generate(..., **kwargs). But
MLXLanguageModel.generate() and stream_generate() had been left with
only (temperature, top_p, repetition_penalty) in their signatures, so
any non-MLLM SimpleEngine request crashed with:

    TypeError: MLXLanguageModel.stream_generate() got an unexpected
    keyword argument 'top_k'

Observed as 0/6 on simple-base, simple-mtp, and simple-spec profiles in
the feature matrix regression sweep after the Session 87 cherry-picks
of PRs waybarrios#248, waybarrios#229, waybarrios#218, waybarrios#222 landed. The cherry-picks did not cause
this regression — they exposed it by finally running the LLM-path
tests that no one had exercised since the rebase happened. Confirmed
via stderr.log:

  TypeError: MLXLanguageModel.generate() got an unexpected keyword argument 'top_k'
  TypeError: MLXLanguageModel.stream_generate() got an unexpected keyword argument 'top_k'

Fix: restore the signatures and bodies of _create_sampler,
_create_logits_processors, generate, and stream_generate to match
bdf7dcc's original intent. Preserves PR waybarrios#248's prompt_cache parameter
and non-str prompt support on stream_generate. Adds **kwargs to both
generate and stream_generate so future param additions degrade
gracefully instead of crashing.
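
For illustration, a hedged sketch of the restored shape (parameter names come from the handlers and errors above; defaults and ordering are assumptions, not the verbatim code):

def generate(self, prompt, *, temperature=1.0, top_p=1.0, top_k=0,
             min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0,
             **kwargs):
    # **kwargs absorbs future sampling params so new server-side fields
    # degrade gracefully instead of raising TypeError
    ...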

This is a runtime-local fix. The equivalent upstream fix lives in
bdf7dcc which was never upstreamed (confirmed via
git merge-base --is-ancestor bdf7dcc upstream/main). A follow-up PR
to upstream could carry this forward.

Verification:
  bin/verify-patches: 33/33 clean
  Full feature matrix regression sweep pending re-run after this commit.

Related: runtime PR waybarrios#265 fixed the CompletionRequest schema side of
the same bdf7dcc drop; this commit fixes the engine-model side.
Contributor Author

mmcaulif commented

@Thump604

Populated the tokens in GenerationOutput for multimodal models. On reflection, I am not convinced at all by the AI-generated tests, so I will try to improve them. I think what would be best is some sort of dummy model that just returns the tokens [0, 1, 2, ..., i] when max_tokens=i, as sketched below.
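
Roughly this shape (names are placeholders, not what I would commit verbatim):

# Rough sketch: class and method names are placeholders.
class CountingDummyModel:
    """Ignores the prompt and emits token IDs [0, 1, ..., max_tokens - 1]."""

    def generate(self, prompt: str, max_tokens: int) -> list[int]:
        return list(range(max_tokens))

# The expected tokens are then fully deterministic:
assert CountingDummyModel().generate("anything", max_tokens=4) == [0, 1, 2, 3]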

And yes, returning per-token log_probs would be great!

Collaborator

Thump604 commented

Thanks — the missing MLLM path was the only real hole I was worried about, and that is closed now that tokens=output.output_token_ids is wired through both return sites in BatchedEngine.generate().

I still agree the current tests are weaker than ideal because they only exercise the text path, but for a fix this small I would treat the stronger dummy-model / MLLM-path test shape as a follow-up rather than a merge blocker. Same for per-token logprobs: useful, but separate scope.

From my side this now looks clean enough to merge.

Contributor Author

mmcaulif commented

Sounds good to me; feel free to merge, as it doesn't look like I am able to.

Thump604 merged commit b062186 into waybarrios:main on Apr 10, 2026