[stepwise] Plumb through step-wise training for fully async#1536
CharlieFRuan merged 1 commit into NovaSky-AI:main from
Conversation
Code Review
This pull request introduces support for step-wise trajectories by adjusting group size calculations in the trainer and propagating a step_wise flag through the generator output concatenation and validation logic. Feedback indicates that the rollout_metrics calculation should also be updated to respect this flag, as it currently produces turn-level statistics instead of the expected trajectory-level metrics when in step-wise mode.
```diff
  num_prompts = len(result["prompt_token_ids"])
- validate_generator_output(num_prompts, result)
+ validate_generator_output(num_prompts, result, step_wise=step_wise)
```
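For context, a minimal sketch of how the `step_wise` flag might relax the validation, assuming the step-wise-specific constraint is simply that the number of entries may exceed `num_prompts` (one entry per turn rather than per trajectory); the repo's actual checks likely cover more fields:

```python
# Hypothetical sketch; not the real validate_generator_output. In
# step-wise mode each prompt can emit one entry per turn, so an exact
# length match against num_prompts no longer holds.
def validate_generator_output(num_prompts, result, step_wise=False):
    num_responses = len(result["response_ids"])
    if step_wise:
        # At least one step per prompt, but possibly many more.
        assert num_responses >= num_prompts, (
            f"expected at least {num_prompts} step-wise entries, got {num_responses}"
        )
    else:
        # One trajectory-level entry per prompt.
        assert num_responses == num_prompts, (
            f"expected {num_prompts} responses, got {num_responses}"
        )
```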
While the `step_wise` flag is now correctly passed to `validate_generator_output`, the `rollout_metrics` re-calculation performed just above (at line 276 in the source) does not account for step-wise trajectories. In step-wise mode, `result["response_ids"]` contains individual turns, meaning `get_rollout_metrics` will compute turn-level statistics (e.g., average tokens per turn) instead of trajectory-level statistics.
Consider leveraging the new `step_wise` flag to filter or aggregate metrics appropriately (e.g., by filtering for `is_last_step` or grouping by `trajectory_id`) to ensure that the reported `rollout_metrics` are consistent with trajectory-level expectations; a sketch follows below.
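One possible shape for the aggregation, as a hedged sketch only: the field names `trajectory_id`/`trajectory_ids` come from the suggestion above and may not match the actual `GeneratorOutput` schema.

```python
# Hypothetical sketch of trajectory-level aggregation for step-wise mode.
from collections import defaultdict

def trajectory_response_lengths(result, step_wise=False):
    if not step_wise:
        # Each entry is already a full trajectory; report its token count.
        return [len(resp) for resp in result["response_ids"]]
    # In step-wise mode each entry is a single turn, so sum the turn
    # lengths per trajectory before computing any statistics over them.
    totals = defaultdict(int)
    for traj_id, resp in zip(result["trajectory_ids"], result["response_ids"]):
        totals[traj_id] += len(resp)
    return list(totals.values())
```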
will leave as future TODO
Some minimal changes to enable step-wise training + fully async.
The only change needed is to make `uids.extend([cur_generated_output_group.uid] * group_size)`'s `group_size` per-group, since each group can have a variable number of generator output entries as the number of turns varies. A rough sketch of the per-group version is below.
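A minimal sketch of the per-group accumulation; `generated_output_groups` and the `.output["response_ids"]` layout are assumptions based on the description above, not the trainer's exact names:

```python
def build_uids(generated_output_groups):
    """Repeat each group's uid once per generator output entry in the group."""
    uids = []
    for cur_generated_output_group in generated_output_groups:
        # In step-wise mode the entry count varies per group (one per turn),
        # so derive group_size from the group rather than a fixed config value.
        group_size = len(cur_generated_output_group.output["response_ids"])
        uids.extend([cur_generated_output_group.uid] * group_size)
    return uids
```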
In addition, we add a `step_wise` flag to `concatenate_generator_outputs` so that the `validate_generator_output` call can validate `step_wise` constraints when applicable.

We ran search-r1 with fully async and step-wise, with the following commands on 1x8xH100s:
Retrieval server (launched first, on GPUs 0-3):
The grey curve is the run described above, compared against the curves we had in #1529. It grows more slowly because the grey run uses `train_batch_size=256` while the other ones have `512`: at any given step count it has seen half as many samples, so the learning is at a similar pace.