
[harbor][step-wise] Make Harbor use step-wise training#1542

Merged

CharlieFRuan merged 8 commits into main from harbor-stepwise on Apr 23, 2026

Conversation

@CharlieFRuan
Member

@CharlieFRuan CharlieFRuan commented Apr 21, 2026

Before this PR, when training with Harbor, we relied on Harbor returning an all_messages field containing the chat history as a string. We then re-tokenized it in SkyRL, computed loss masks, and fed the result to the trainer via GeneratorOutput.

This causes re-tokenization issues and prevents us from doing fully async training, which requires logprobs for algorithmic correction; the logprobs will not match once re-tokenization drift occurs.
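To make the drift concrete, here is a minimal toy sketch (a hypothetical greedy longest-match tokenizer, not Harbor's or SkyRL's actual one) showing how detokenizing and re-encoding can produce different token ids than the model originally generated:

```python
# Toy greedy tokenizer: decode(encode(text)) round-trips the *text*,
# but re-encoding a detokenized string need not reproduce the *token ids*.
VOCAB = {"ab": 0, "a": 1, "b": 2}
INV = {v: k for k, v in VOCAB.items()}

def encode(text):
    # Greedy longest-match tokenization.
    ids, i = [], 0
    while i < len(text):
        for piece in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(piece, i):
                ids.append(VOCAB[piece])
                i += len(piece)
                break
    return ids

def decode(ids):
    return "".join(INV[t] for t in ids)

generated = [1, 2]        # the model emitted "a" then "b" as separate tokens
text = decode(generated)  # "ab"
retok = encode(text)      # greedy match picks "ab" -> [0]
print(generated, retok)   # [1, 2] vs [0]: per-token logprobs no longer align
```

Since the logprobs were recorded per generated token id, any mismatch between `generated` and `retok` makes them unusable after re-tokenization.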

This PR makes HarborGenerator perform step-wise training.

We rely on setting collect_rollout_details: true (already done in harbor_trial_config/default.yaml). Harbor then does per-turn bookkeeping: for each LLM invocation (i.e. turn), it records prompt_token_ids, completion_token_ids, and logprobs.
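A rough sketch of the shape of those per-turn details (field names per the PR description; the container type and invariant check are assumptions for illustration, not Harbor's actual data model):

```python
# Hypothetical per-turn rollout record, assuming the three fields named in
# the PR description; the dataclass itself is illustrative.
from dataclasses import dataclass

@dataclass
class TurnRolloutDetails:
    prompt_token_ids: list[int]      # full prompt fed to the LLM this turn
    completion_token_ids: list[int]  # tokens the LLM generated this turn
    logprobs: list[float]            # one logprob per completion token

def check(turn: TurnRolloutDetails) -> None:
    # Each completion token should carry exactly one sampled logprob.
    assert len(turn.completion_token_ids) == len(turn.logprobs)

turn = TurnRolloutDetails([1, 2, 3], [4, 5], [-0.1, -0.7])
check(turn)
```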

Then, by setting the following configs in SkyRL, we can perform step-wise training while merging when possible:

  generator.step_wise_trajectories=true \
  generator.merge_stepwise_output=true \

We also add a fully async script: examples/train_integrations/harbor/run_codecontest_fully_async.sh

Curve comparison

https://wandb.ai/sky-posttraining-uc-berkeley/harbor/reports/PR1542-Harbor-step-wise-training-in-SkyRL--VmlldzoxNjYzNDU0Mg

  • Blue: this PR's sync training (with token_mean)
  • Pink: before this PR (with seq_mean_token_sum_norm, everything else the same)
  • Red: this PR fully async (with train_batch_size=mini_batch_size=16)

Besides, with merge_stepwise_output, we can shrink ~1000 sequences down to ~300, improving training efficiency. For the sync run, the fully merged count is 256 sequences (batch size 32 * 8); without merging, the count is roughly 256 * avg_num_turns. The fully async run behaves similarly, except with 128 sequences (batch size 16 * 8). Related PR: #1538. Note that merging works much better here than in PR #1538's test on Qwen2.5 for search-r1, which suffered heavily from retokenization drift on a format Qwen2.5 is not familiar with. For the cases where merging fails, see this report: https://gist.github.com/CharlieFRuan/b91cecfe891f9458c455b6f5e2f6af1d
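The merging idea above can be sketched as follows: fold a later turn into the previous one whenever its prompt extends the previous prompt + completion token-for-token (i.e. no drift). All names are illustrative, not SkyRL's actual API, and loss-mask bookkeeping is omitted for brevity:

```python
# Hedged sketch of merging consecutive step-wise turns into one training
# sequence when token ids are exactly consistent across turns.

def try_merge(turns):
    """turns: list of (prompt_token_ids, completion_token_ids) per turn."""
    merged = []
    for prompt, completion in turns:
        if merged:
            prev_prompt, prev_completion = merged[-1]
            prefix = prev_prompt + prev_completion
            if prompt[: len(prefix)] == prefix:
                # The later prompt contains the earlier sequence verbatim
                # (plus e.g. tool-output tokens), so keep a single sequence.
                merged[-1] = (prompt, completion)
                continue
        merged.append((prompt, completion))
    return merged

turns = [
    ([1, 2], [3, 4]),        # turn 1
    ([1, 2, 3, 4, 5], [6]),  # turn 2: prompt = turn-1 sequence + new tokens
    ([9, 9], [8]),           # a drifted turn that cannot be merged
]
print(len(try_merge(turns)))  # 3 turns collapse to 2 sequences
```

This is why the sequence count shrinks toward one sequence per trajectory when merging succeeds, and grows toward one sequence per turn when it fails.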


gemini-code-assist[bot], devin-ai-integration[bot]: earlier bot comments were marked as resolved.

@CharlieFRuan
Member Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request implements step-wise training for Harbor integrations, transitioning from chat history extraction to per-turn rollout detail collection. It introduces a fully asynchronous training entry point, updates the HarborGenerator to handle step-wise outputs, and refines metric aggregation to preserve custom generator statistics. Review feedback highlights the need for more robust metric aggregation logic that avoids string-based heuristics and suggests handling multi-segment rollouts more flexibly to prevent potential crashes from hard assertions.

Comment thread skyrl/train/generators/utils.py
Comment thread examples/train_integrations/harbor/harbor_generator.py
@CharlieFRuan CharlieFRuan merged commit a90cd1d into main Apr 23, 2026
5 of 6 checks passed
@CharlieFRuan CharlieFRuan deleted the harbor-stepwise branch April 23, 2026 01:07