[stepwise] Plumb through step-wise training for fully async#1536

Merged
CharlieFRuan merged 1 commit intoNovaSky-AI:mainfrom
CharlieFRuan:async-stepwise
Apr 20, 2026
Conversation

Member

@CharlieFRuan CharlieFRuan commented Apr 20, 2026

Some minimal changes to enable step-wise training combined with fully async training.

The only change needed is to make the group_size in uids.extend([cur_generated_output_group.uid] * group_size) per-group, since each group can have a variable number of generator output entries as the number of turns varies.

In addition, we add a step_wise flag to concatenate_generator_outputs so that the validate_generator_output call can validate step-wise constraints when applicable.
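A minimal sketch of the per-group uid expansion described above; the class and field names here are illustrative stand-ins, not the repo's actual API:

```python
from dataclasses import dataclass


@dataclass
class GeneratedOutputGroup:
    # Hypothetical stand-in for a generator output group: one uid per
    # prompt group, with a variable number of entries (one per turn in
    # step-wise mode).
    uid: str
    outputs: list


def expand_uids(groups):
    """Repeat each group's uid once per generator output entry.

    With step-wise training the number of entries varies per group
    (it depends on how many turns the trajectory took), so a single
    fixed group_size cannot be used across groups.
    """
    uids = []
    for group in groups:
        uids.extend([group.uid] * len(group.outputs))
    return uids


groups = [
    GeneratedOutputGroup(uid="a", outputs=["turn1", "turn2", "turn3"]),  # 3 turns
    GeneratedOutputGroup(uid="b", outputs=["turn1"]),                    # 1 turn
]
print(expand_uids(groups))  # ['a', 'a', 'a', 'b']
```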

We ran search-r1 with fully async and step-wise training, using the following commands on a single node with 8xH100s.

```shell
STEP_WISE=true \
USE_CONVERSATION_MULTI_TURN=true \
bash examples/train/search/run_search_fully_async.sh
```

Retrieval server (launched first, on GPUs 0-3):

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 python examples/train/search/retriever/retrieval_server.py \
  --index_path /path/to/searchR1/e5_Flat.index \
  --corpus_path /path/to/searchR1/wiki-18.jsonl \
  --topk 3 --retriever_name e5 --retriever_model intfloat/e5-base-v2 --faiss_gpu
```

The grey curve is what we have above.

[image: training curve]

This is compared against the curves from #1529. The grey curve grows more slowly per step because it uses train_batch_size=256 while the others use 512, so learning progresses at a similar pace per sample.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces support for step-wise trajectories by adjusting group size calculations in the trainer and propagating a step_wise flag through the generator output concatenation and validation logic. Feedback indicates that the rollout_metrics calculation should also be updated to respect this flag, as it currently produces turn-level statistics instead of the expected trajectory-level metrics when in step-wise mode.


```diff
  num_prompts = len(result["prompt_token_ids"])
- validate_generator_output(num_prompts, result)
+ validate_generator_output(num_prompts, result, step_wise=step_wise)
```
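A hedged sketch of what the step-wise branch of this validation might look like; the repo's actual checks may differ, and the field names (`response_ids`, `is_last_step`) are taken from the review discussion in this PR rather than from the source:

```python
def validate_generator_output(num_prompts, result, step_wise=False):
    """Sketch of generator-output validation; real constraints may differ.

    `result` is assumed to be a dict of parallel lists keyed by
    "prompt_token_ids", "response_ids", and, in step-wise mode,
    "is_last_step".
    """
    assert len(result["response_ids"]) == num_prompts, (
        "expected one response entry per prompt entry"
    )
    if step_wise:
        # Illustrative step-wise-only invariant: entries are individual
        # turns, so at least one entry must be flagged as a
        # trajectory-final step.
        is_last = result["is_last_step"]
        assert len(is_last) == num_prompts
        assert any(is_last), "no trajectory-final step marked"


# Example: two turns of one trajectory; only the second is the last step.
result = {
    "prompt_token_ids": [[101], [101, 7]],
    "response_ids": [[5], [9]],
    "is_last_step": [False, True],
}
validate_generator_output(len(result["prompt_token_ids"]), result, step_wise=True)
```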
Contributor

Severity: medium

While the step_wise flag is now correctly passed to validate_generator_output, the rollout_metrics re-calculation performed just above (at line 276 in the source) does not account for step-wise trajectories. In step-wise mode, result["response_ids"] contains individual turns, meaning get_rollout_metrics will compute turn-level statistics (e.g., average tokens per turn) instead of trajectory-level statistics.

Consider leveraging the new step_wise flag to filter or aggregate metrics appropriately (e.g., by filtering for is_last_step or grouping by trajectory_id) to ensure that the reported rollout_metrics are consistent with trajectory-level expectations.
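A minimal sketch of the filtering approach suggested above, assuming `is_last_step` is a per-entry boolean list as the comment implies (the real `get_rollout_metrics` signature and data layout may differ):

```python
def trajectory_level_responses(response_ids, is_last_step):
    """Keep only the trajectory-final entry of each trajectory.

    In step-wise mode, response_ids holds individual turns; filtering on
    is_last_step yields one entry per trajectory, so downstream metrics
    (e.g. counts per trajectory) are trajectory-level rather than
    turn-level.
    """
    return [resp for resp, last in zip(response_ids, is_last_step) if last]


# Two trajectories: one with 3 turns, one with 1 turn.
response_ids = [[1], [2, 3], [4, 5, 6], [7]]
is_last_step = [False, False, True, True]
print(trajectory_level_responses(response_ids, is_last_step))
# [[4, 5, 6], [7]]
```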

Member Author

Will leave as a future TODO.

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

@CharlieFRuan CharlieFRuan merged commit 4c7d6d3 into NovaSky-AI:main Apr 20, 2026
5 of 6 checks passed
@CharlieFRuan CharlieFRuan deleted the async-stepwise branch April 20, 2026 18:11