[train] Add validation for step-wise GeneratorOutput #1281
CharlieFRuan merged 1 commit into NovaSky-AI:main
Conversation
Code Review
The pull request introduces crucial validation for step-wise generator outputs, which was previously missing. This addresses potential silent errors in training results due to malformed or non-contiguous trajectory data, particularly important for the advantage broadcast mechanism. The changes involve modifying validate_generator_output to handle step-wise specific checks and extracting these into a new helper function _validate_step_wise_fields. Comprehensive unit tests have been added to cover various validation scenarios, significantly improving the robustness of the training pipeline.
🟡 Step-wise avg_response_length divides by total steps instead of number of trajectories
At skyrl/train/trainer.py:679-683, the step-wise avg_response_length calculation sums response lengths only for steps where is_last_step=True, but divides by len(response_ids) (total number of steps across all trajectories). For example, with 2 trajectories having 3 and 2 steps respectively, and last-step response lengths of 10 and 8: numerator = 18, denominator = 5, result = 3.6 — instead of the correct 9.0. The denominator should be the number of trajectories (i.e., sum(1 for x in generator_output["is_last_step"] if x)). Compare with the non-step-wise version at line 685-687 which correctly uses len(response_ids) as denominator when summing all responses.
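The mismatch can be reproduced with a small standalone example using the numbers from the comment above (the `response_ids` contents are illustrative; only the lengths matter):

```python
# Two trajectories with 3 and 2 steps; only last steps carry the final response.
response_ids = [[0] * 4, [0] * 7, [0] * 10,   # trajectory A: 3 steps, last has length 10
                [0] * 5, [0] * 8]             # trajectory B: 2 steps, last has length 8
is_last_step = [False, False, True, False, True]

# Numerator: response lengths summed only over last steps (10 + 8 = 18).
last_lengths = [len(r) for r, last in zip(response_ids, is_last_step) if last]

buggy = sum(last_lengths) / len(response_ids)                  # divides by total steps (5)
fixed = sum(last_lengths) / sum(1 for x in is_last_step if x)  # divides by #trajectories (2)

print(buggy)  # 3.6
print(fixed)  # 9.0
```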
will fix in a separate PR
Fixes comment #1281 (comment). There was no good reason to only account for `is_last_step`, since response IDs are not cumulative in step-wise training (unlike input tokens, which are cumulative).
Alternatively, we could ignore step order entirely and avoid the `cumsum`
trick in the advantage broadcast.
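That order-independent alternative can be sketched as follows (hypothetical values, not the trainer's actual code): broadcast each trajectory's advantage to its steps by id lookup, which works even when steps are interleaved across trajectories.

```python
# Steps interleaved across trajectories "a" and "b" -- fine for this approach,
# since the mapping goes through trajectory ids rather than batch positions.
trajectory_ids = ["b", "a", "b", "a", "a"]
traj_advantages = {"a": 1.0, "b": -1.0}  # one advantage per trajectory (made-up values)

per_step_advantages = [traj_advantages[t] for t in trajectory_ids]
print(per_step_advantages)  # [-1.0, 1.0, -1.0, 1.0, 1.0]
```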
### Summary
Previously, `validate_generator_output()` was **skipped entirely** when
`step_wise_trajectories=True`:
```python
if not self.cfg.generator.step_wise_trajectories:
    validate_generator_output(len(input_batch["prompts"]), generator_output)
```
This meant step-wise generator outputs had no validation at all —
malformed `is_last_step`, missing `trajectory_ids`, or non-contiguous
trajectory ordering would silently produce wrong training results.
The non-contiguous case is particularly dangerous: the trainer's
advantage broadcast uses a `cumsum` trick that assumes all steps of the
same trajectory are adjacent in the batch. If steps are interleaved
across trajectories, advantages are silently mapped to the wrong steps
with no error.
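The failure mode can be sketched with a minimal version of the `cumsum` trick (an assumed shape, not the exact trainer code): a shifted cumulative sum of `is_last_step` yields a per-step trajectory index, but only when all steps of a trajectory are adjacent in the batch.

```python
from itertools import accumulate

def traj_index(is_last_step):
    """Shifted cumsum: the index increments on the step AFTER each last step."""
    sums = list(accumulate(int(x) for x in is_last_step))
    return [0] + sums[:-1]

# Contiguous batch [A, A, A, B, B]: steps map to the right trajectories.
contiguous = [False, False, True, False, True]
print(traj_index(contiguous))    # [0, 0, 0, 1, 1]

# Same steps interleaved as [A, B, A, B, A] (flags reordered with them):
# the first four steps all land in trajectory 0, and nothing raises an error.
interleaved = [False, False, False, True, True]
print(traj_index(interleaved))   # [0, 0, 0, 0, 1]
```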
### Changes
**`skyrl/train/utils/trainer_utils.py`**
- Added `step_wise: bool = False` parameter to
`validate_generator_output()` (backward compatible — existing callers
are unaffected)
- Extracted `_validate_step_wise_fields()` for step-wise specific
checks:
- `is_last_step` and `trajectory_ids` are present and correctly sized
- `is_last_step[-1]` is `True` (last sample must be a trajectory's final
step)
- **Contiguous ordering**: all steps of the same trajectory are adjacent
(catches the silent `cumsum` bug)
- **Boundary alignment**: `is_last_step[i]` is `True` wherever (and only
when) `trajectory_ids` changes between consecutive samples
- In step-wise mode, `num_prompts != num_responses` is allowed (step
expansion is expected)
**`skyrl/train/trainer.py`**
- Changed from skipping validation to calling with `step_wise=True`:
```python
validate_generator_output(
    len(input_batch["prompts"]),
    generator_output,
    step_wise=self.cfg.generator.step_wise_trajectories,
)
```
**`tests/train/test_trainer_utils.py`**
- 9 new tests covering all step-wise validation cases
### Test plan
- [x] `pytest tests/train/test_trainer_utils.py` — all 44 tests pass (35
existing + 9 new)
- [x] Existing non-step-wise validation tests unaffected (backward
compatible `step_wise=False` default)
- [x] New tests cover: valid output, single-step trajectories, missing
fields, length mismatches, non-contiguous ordering, boundary
misalignment, all-False `is_last_step`
### E2E test
Ran the multi-turn gsm8k example E2E and verified that it is indeed
multi-turn, since `generate/batch_num_seq` is ~6800 rather than 2560 (512 * 5).
```bash
# Run training (script defaults to 1 GPU, override for 8 GPU + step-wise multi-turn)
bash examples/train/turn_level_rewards/run_gsm8k_multi_turn.sh \
generator.step_wise_trajectories=true \
generator.use_conversation_multi_turn=true \
    generator.max_turns=5
```