[harbor][step-wise] Make Harbor use step-wise training #1542
Merged
CharlieFRuan merged 8 commits into main on Apr 23, 2026
Code Review
This pull request implements step-wise training for Harbor integrations, transitioning from chat history extraction to per-turn rollout detail collection. It introduces a fully asynchronous training entry point, updates the HarborGenerator to handle step-wise outputs, and refines metric aggregation to preserve custom generator statistics. Review feedback highlights the need for more robust metric aggregation logic that avoids string-based heuristics and suggests handling multi-segment rollouts more flexibly to prevent potential crashes from hard assertions.
Before this PR, when training with Harbor, we relied on Harbor returning an `all_messages` field which contains a string chat history. We then re-tokenized it in SkyRL, computed loss masks, and fed it to the trainer via `GeneratorOutput`.

This causes re-tokenization issues and prevents us from doing fully async training (which requires `logprobs` for algorithmic correction, and `logprobs` will not match under re-tokenization drift).

This PR makes `HarborGenerator` perform step-wise training.

We rely on setting `collect_rollout_details: true` (already set in `harbor_trial_config/default.yaml`). Harbor then does per-turn bookkeeping: for each LLM invocation (i.e. turn), Harbor records `prompt_token_ids`, `completion_token_ids`, and `logprobs`.

Then, by setting the following configs in SkyRL, we can perform step-wise training while merging when possible:
We also add a fully async script: `examples/train_integrations/harbor/run_codecontest_fully_async.sh`

Curve comparison

https://wandb.ai/sky-posttraining-uc-berkeley/harbor/reports/PR1542-Harbor-step-wise-training-in-SkyRL--VmlldzoxNjYzNDU0Mg

Runs compared: one with `token_mean`; one with `seq_mean_token_sum_norm` (everything else the same); one with `train_batch_size=mini_batch_size=16`.

Besides, with `merge_stepwise_output`, we can shrink ~1000 sequences to ~300 sequences, improving training efficiency. For the sync run, this is 256 sequences (batch size 32 * 8) if everything can be merged; the number of unmerged sequences is roughly `256 * avg_num_turns`. The fully async run is similar, except it has 128 sequences (batch size 16 * 8).

Related PR: #1538. Note that this merging is much better than in PR 1538's test on Qwen2.5 for search-r1 (which suffered a lot from re-tokenization drift, something Qwen2.5 is not familiar with). For the cases where merging fails, here is a report: https://gist.github.com/CharlieFRuan/b91cecfe891f9458c455b6f5e2f6af1d
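
To make the merging idea concrete, here is a minimal sketch of how per-turn records could be merged into fewer training sequences. It is an illustration under assumptions, not SkyRL's actual `merge_stepwise_output` implementation: the `Turn` and `Sequence` classes and the `merge_turns` helper are hypothetical names, and the only fields taken from the description are `prompt_token_ids`, `completion_token_ids`, and `logprobs`. The assumed rule is that a turn is merged into the previous sequence only when its prompt token IDs exactly extend the tokens accumulated so far; otherwise a new sequence is started.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One LLM invocation with the three recorded fields (hypothetical container)."""
    prompt_token_ids: List[int]
    completion_token_ids: List[int]
    logprobs: List[float]


@dataclass
class Sequence:
    """Hypothetical training sequence built from one or more turns."""
    token_ids: List[int]
    loss_mask: List[int]    # 1 = model-generated token, 0 = prompt/observation token
    logprobs: List[float]   # 0.0 placeholder wherever loss_mask is 0


def turn_to_sequence(turn: Turn) -> Sequence:
    """Convert a single turn into a standalone sequence."""
    n_prompt = len(turn.prompt_token_ids)
    n_completion = len(turn.completion_token_ids)
    return Sequence(
        token_ids=turn.prompt_token_ids + turn.completion_token_ids,
        loss_mask=[0] * n_prompt + [1] * n_completion,
        logprobs=[0.0] * n_prompt + list(turn.logprobs),
    )


def merge_turns(turns: List[Turn]) -> List[Sequence]:
    """Merge consecutive turns whenever the next prompt extends the tokens so far."""
    sequences: List[Sequence] = []
    for turn in turns:
        if sequences:
            last = sequences[-1]
            n = len(last.token_ids)
            if turn.prompt_token_ids[:n] == last.token_ids:
                # Exact prefix match: append the new observation tokens (masked
                # out), then the new completion tokens (trained on).
                extra = turn.prompt_token_ids[n:]
                n_completion = len(turn.completion_token_ids)
                last.token_ids += extra + turn.completion_token_ids
                last.loss_mask += [0] * len(extra) + [1] * n_completion
                last.logprobs += [0.0] * len(extra) + list(turn.logprobs)
                continue
        # First turn, or the prefix no longer matches: start a new sequence.
        sequences.append(turn_to_sequence(turn))
    return sequences


turns = [
    Turn([1, 2], [3, 4], [-0.1, -0.2]),
    Turn([1, 2, 3, 4, 5], [6], [-0.3]),        # extends the previous tokens: merged
    Turn([1, 2, 9, 4, 5, 6, 7], [8], [-0.4]),  # prefix differs: new sequence
]
merged = merge_turns(turns)
print(len(turns), "turns ->", len(merged), "sequences")
```

In this toy rollout, three turns collapse into two sequences. If every turn of every trajectory merged, each trajectory would yield a single sequence, which matches the 256-sequence (32 * 8) best case described above.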