fix: populate tokens field in BatchedEngine.generate() #229
Thump604 merged 2 commits into waybarrios:main
Conversation
The output_token_ids from AsyncEngineCore were tracked internally but never forwarded to GenerationOutput, leaving tokens always []. Also adds tests for the generate() output fields. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thump604 left a comment
Summary
This PR correctly fixes the missing tokens field in BatchedEngine.generate() for the LLM (text-only) code path. Good catch on the root cause.
What's Good
- Fix is minimal and correct: line 493 in `batched.py` now passes `tokens=output.output_token_ids` to GenerationOutput.
- Tests are well-structured: three cases covering normal output, empty tokens, and field regression.
- Test mocks are properly constructed to simulate AsyncEngineCore behavior.
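For readers outside the PR, a sketch of the mock shape being described, using only the stdlib; the field names are taken from the snippets in this thread and the rest is assumption, not the actual test code:

```python
from unittest.mock import AsyncMock
from types import SimpleNamespace

# Fake AsyncEngineCore whose generate() resolves to a RequestOutput-like
# object carrying output_token_ids, which the engine should forward.
fake_core = AsyncMock()
fake_core.generate.return_value = SimpleNamespace(
    output_text="hello",
    output_token_ids=[101, 102, 103],
    prompt_tokens=5,
)
```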
Issues Found
1. Incomplete Fix: MLLM Path Still Missing Tokens
The PR only fixes the LLM path (line 493), but BatchedEngine.generate() also returns GenerationOutput on line 469 for MLLM output (when media is present):
```python
if has_media or self._is_mllm:
    output = await self._mllm_scheduler.generate(...)  # line ~459
    return GenerationOutput(
        text=clean_output_text(output.output_text),
        # tokens field missing here too
        prompt_tokens=output.prompt_tokens,
        ...
    )
```

The output from `MLLMScheduler.generate()` also returns a `RequestOutput` with `output_token_ids`. This path should also include `tokens=output.output_token_ids`.
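The change would be a one-line addition mirroring the LLM-path fix; a sketch against the snippet above, with the surrounding arguments elided as in the original:

```python
return GenerationOutput(
    text=clean_output_text(output.output_text),
    tokens=output.output_token_ids,  # added, mirroring the line 493 fix
    prompt_tokens=output.prompt_tokens,
    ...
)
```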
2. Tests Don't Cover MLLM Path
All three tests call engine.generate() with only text (no images/videos), so they only exercise the fixed LLM path. The MLLM path (with media) is completely untested.
3. Missing Test: Verify Tokens Field Default Value
The GenerationOutput dataclass has `tokens: list[int] = field(default_factory=list)`. The tests don't verify that when the field is omitted from the constructor, it defaults correctly. This is less critical since the fix sets the field explicitly, but an explicit check would be safer.
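A sketch of such a test, assuming `text` is the only required constructor field; the import path here is hypothetical and would need to match wherever GenerationOutput actually lives:

```python
from engine.outputs import GenerationOutput  # hypothetical import path

def test_tokens_defaults_to_empty_list():
    out = GenerationOutput(text="hi")  # tokens omitted entirely
    assert out.tokens == []
    # default_factory must give each instance its own list, not a shared one
    assert GenerationOutput(text="hi").tokens is not out.tokens
```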
Recommendation
Approve with request to fix:
- Apply the same `tokens=output.output_token_ids` fix to line 469 (MLLM path)
- Add a test that exercises the MLLM path (with images/videos)
- Optional: add a test verifying the default value
The one-line fix is correct and low-risk. The incomplete coverage is the only concern.
@waybarrios, @mmcaulif: brief endorsement. The use case is real. Mergeable on current main.
Thump604 left a comment
I'd been grepping for consumers of `GenerationOutput.tokens` in-tree as part of an unrelated SimpleEngine refactor and was about to conclude the field was unused and propose dropping it. Your PR is evidence I was wrong: grep doesn't see out-of-tree consumers like RL training pipelines. Populating the field while keeping the empty `default_factory` default also keeps it drop-in compatible.
Question, not blocking: for RL training use, would per-token logprobs also be useful on GenerationOutput, or do you compute them separately? Happy to file a follow-up if there's a gap there, since the work of plumbing through token IDs is similar to plumbing through logprobs.
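For concreteness, a hypothetical shape; the `logprobs` field name and its alignment with `tokens` are assumptions for discussion, not existing API:

```python
from dataclasses import dataclass, field

@dataclass
class GenerationOutput:
    # Existing fields as described in this thread (not the full class):
    text: str = ""
    tokens: list[int] = field(default_factory=list)
    # Hypothetical addition: logprobs[i] would be the logprob of tokens[i].
    logprobs: list[float] = field(default_factory=list)
```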
Approving as-is.
…l generate+stream_generate

Pre-existing regression from an earlier rebase that dropped bdf7dcc's llm.py additions. The server.py request handlers still pass top_k, min_p, presence_penalty, and repetition_penalty through to SimpleEngine, which forwards them via **kwargs to MLXLanguageModel.chat() (which accepts **kwargs), which then calls self.generate(..., **kwargs). But MLXLanguageModel.generate() and stream_generate() had been left with only (temperature, top_p, repetition_penalty) in their signatures, so any non-MLLM SimpleEngine request crashed with:

```
TypeError: MLXLanguageModel.stream_generate() got an unexpected keyword argument 'top_k'
```

Observed as 0/6 on the simple-base, simple-mtp, and simple-spec profiles in the feature-matrix regression sweep after the Session 87 cherry-picks of PRs waybarrios#248, waybarrios#229, waybarrios#218, and waybarrios#222 landed. The cherry-picks did not cause this regression; they exposed it by finally running the LLM-path tests that no one had exercised since the rebase. Confirmed via stderr.log:

```
TypeError: MLXLanguageModel.generate() got an unexpected keyword argument 'top_k'
TypeError: MLXLanguageModel.stream_generate() got an unexpected keyword argument 'top_k'
```

Fix: restore the signatures and bodies of _create_sampler, _create_logits_processors, generate, and stream_generate to match bdf7dcc's original intent. Preserves PR waybarrios#248's prompt_cache parameter and non-str prompt support on stream_generate. Adds **kwargs to both generate and stream_generate so future param additions degrade gracefully instead of crashing.

This is a runtime-local fix. The equivalent upstream fix lives in bdf7dcc, which was never upstreamed (confirmed via git merge-base --is-ancestor bdf7dcc upstream/main). A follow-up PR to upstream could carry this forward.

Verification: bin/verify-patches 33/33 clean. Full feature-matrix regression sweep pending re-run after this commit.

Related: runtime PR waybarrios#265 fixed the CompletionRequest schema side of the same bdf7dcc drop; this commit fixes the engine-model side.
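A rough illustration of the restored signature shape; the parameter names come from the commit message above, while the defaults and keyword-only placement are assumptions:

```python
def stream_generate(
    self,
    prompt,
    *,
    temperature: float = 1.0,
    top_p: float = 1.0,
    top_k: int = 0,
    min_p: float = 0.0,
    presence_penalty: float = 0.0,
    repetition_penalty: float = 1.0,
    prompt_cache=None,  # preserved from PR waybarrios#248
    **kwargs,  # unknown future params are absorbed instead of raising TypeError
):
    ...
```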
Populated the tokens in GenerationOutput for multimodal models. On reflection, I am not at all convinced by the AI-generated tests, so I will try to improve these; I think what would be best is more of a dummy model that just returns the tokens. And yes, returning per-token log_probs would be great!
Thanks — the missing MLLM path was the only real hole I was worried about, and that is closed now. I still agree the current tests are weaker than ideal because they only exercise the text path, but for a fix this small I would treat the stronger dummy-model / MLLM-path test shape as a follow-up rather than a merge blocker. Same for per-token logprobs: useful, but separate scope. From my side this now looks clean enough to merge.
Sounds good to me, feel free to merge as it doesn't look like I am able to.
Encountered this while first investigating vllm-mlx for RL training: I want to be able to see the tokens produced via BatchedEngine.generate(). This PR was written exclusively with Claude, so please let me know if you have any suggestions and I will implement them manually 😃. I will also try to implement and upstream any RL-related features that are missing.
Thank you for the project, it has been very useful!
The below is AI-generated:
Summary
- `BatchedEngine.generate()` always returned `tokens=[]` despite `AsyncEngineCore` tracking `output_token_ids` per request
- Forwards `output.output_token_ids` through to `GenerationOutput`
- Adds `tests/test_batched_engine.py` with three unit tests covering the `tokens` field and other output fields

Root cause
In `engine/batched.py`, the `GenerationOutput` was constructed without the `tokens` field:
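The snippet referenced here did not survive extraction; below is a fragment reconstructed from the reviewer's description, with the exact surrounding arguments assumed:

```python
# Reconstructed "before" shape (not runnable in isolation):
GenerationOutput(
    text=clean_output_text(output.output_text),
    # no tokens=... here, so the default_factory left tokens == []
    prompt_tokens=output.prompt_tokens,
)
```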
Test plan

- `test_tokens_field_is_populated` — verifies token IDs are forwarded correctly
- `test_tokens_field_empty_when_no_tokens_generated` — verifies an empty list is handled
- `test_other_output_fields_still_populated` — verifies no regression on existing fields

🤖 Generated with Claude Code