[qoc] Make concatenate_generator_outputs linear instead of O(K^2) #1535
Merged
CharlieFRuan merged 3 commits into main on Apr 20, 2026
The previous ``sum([go[key] for go in gens], [])`` pattern repeatedly rebuilds the running concatenation, making the flattening step O(K^2 * L̄) in K, the number of GeneratorOutputs. Replace with an explicit extend loop (O(N_total)). No behavior change, no signature change.

Benchmarked with 8 trajectories per GeneratorOutput, 64k-token response_ids / loss_masks / rollout_logprobs, six flatten calls per concat:

  K=128 (1024 trajectories): 0.6ms -> 0.1ms ( 7.6x)
  K=512 (4096 trajectories): 8.6ms -> 0.2ms (49.8x)

The cost of the old path grows quadratically with K, so the speedup widens as K increases. This matters when concat is called on per-trajectory chunks (e.g. the prefix-aware merging in #1532).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
c8d3259 to 4854b93
Summary
`concatenate_generator_outputs` used the `sum([go[key] for go in gens], [])` pattern to flatten each list-valued field, which is O(K² · L̄) — every `+` copies the running result. Replace with an explicit extend loop (O(N_total)). No signature change; no behavior change (existing test passes unmodified).
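To make the difference concrete, here is a minimal sketch of the two flattening patterns. It assumes a GeneratorOutput can be modeled as a plain dict of list-valued fields; the helper names `flatten_with_sum` and `flatten_with_extend` are hypothetical and are not the names used in the repository.

```python
from typing import Any, Dict, List

# Hypothetical stand-in: a GeneratorOutput modeled as a dict of list-valued fields.
GeneratorOutput = Dict[str, List[Any]]


def flatten_with_sum(gens: List[GeneratorOutput], key: str) -> List[Any]:
    # Old pattern: every `+` inside sum() builds a new list and copies
    # everything accumulated so far, so total work is quadratic in len(gens).
    return sum([go[key] for go in gens], [])


def flatten_with_extend(gens: List[GeneratorOutput], key: str) -> List[Any]:
    # New pattern: extend() appends in place, so total work is
    # (amortized) linear in the number of flattened elements.
    out: List[Any] = []
    for go in gens:
        out.extend(go[key])
    return out


# Three toy outputs with 2, 1, and 2 trajectories respectively.
gens = [
    {"response_ids": [[1, 2], [3]]},
    {"response_ids": [[4]]},
    {"response_ids": [[5, 6], [7, 8]]},
]
flat_old = flatten_with_sum(gens, "response_ids")
flat_new = flatten_with_extend(gens, "response_ids")
```

Both helpers return the same flattened list in the same order, which is consistent with the "no behavior change" claim; only the amount of copying differs.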
Also move related tests to `tests/train/generators/test_generator_output_utils.py`.
Benchmark
Config: 8 trajectories per GeneratorOutput, 64k-token `response_ids`/`loss_masks`/`rollout_logprobs`, six flatten calls per concat (as currently done).
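A rough way to reproduce this kind of measurement is sketched below. This is not the script used for the PR: the helper names, the small default token length, and the best-of-five timing are all assumptions made for illustration. Note that flattening a list of lists copies only references to the inner lists, so the token length should have little effect on flatten time in this sketch.

```python
import timeit
from typing import Callable, List

Field = List[List[int]]  # one field of one GeneratorOutput: a list of token lists


def make_fields(k: int, per_output: int = 8, tokens: int = 16) -> List[Field]:
    # Hypothetical data: k outputs, each holding `per_output` token lists.
    return [[[0] * tokens for _ in range(per_output)] for _ in range(k)]


def flatten_sum(fields: List[Field]) -> Field:
    # Quadratic pattern: rebuilds the running list on every `+`.
    return sum(fields, [])


def flatten_extend(fields: List[Field]) -> Field:
    # Linear pattern: appends each output's items in place.
    out: Field = []
    for field in fields:
        out.extend(field)
    return out


def time_flatten_ms(fn: Callable[[List[Field]], Field], fields: List[Field]) -> float:
    # Best of five single runs, reported in milliseconds.
    return min(timeit.repeat(lambda: fn(fields), number=1, repeat=5)) * 1e3


fields = make_fields(k=128)
ms_sum = time_flatten_ms(flatten_sum, fields)
ms_extend = time_flatten_ms(flatten_extend, fields)
```

Absolute timings will vary by machine and Python version; the comparison of interest is how `ms_sum` and `ms_extend` scale as `k` increases.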
The cost of the old path grows quadratically with K, so the speedup widens as K increases. This matters when concat is called on per-trajectory chunks: for example, the prefix-aware merging work in #1532 calls `concatenate_generator_outputs` with K = number of trajectories (~2560 in a typical SearchR1 run), which would extrapolate to ~200 ms under the old path and sub-millisecond under the new one.
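The ~200 ms figure for the old path can be sanity-checked with a quick quadratic extrapolation. The sketch below assumes the 8.6 ms measurement at K=512 reported in the commit message and pure K² scaling; real timings would also include fixed overheads that this ignores.

```python
# Measured old-path time at K=512, taken from the commit message benchmark.
measured_k = 512
measured_old_ms = 8.6

# Target K from the SearchR1 example (~2560 trajectories).
target_k = 2560

# Assumption: the old path scales purely quadratically in K.
scale = target_k / measured_k
extrapolated_old_ms = measured_old_ms * scale ** 2
```

With a 5x increase in K, the quadratic model multiplies the old-path time by 25, which lands in the same range as the ~200 ms estimate.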
Test plan
🤖 Generated with Claude Code