Merged
2 changes: 2 additions & 0 deletions docs/content/docs/tutorials/step-wise-training.mdx
@@ -47,6 +47,8 @@ class GeneratorConfig(BaseConfig):

## GeneratorOutput Format

Normally, each element in `GeneratorOutput` (i.e. `response_ids[i]`, `prompt_token_ids[i]`, `rewards[i]`, etc.) represents a single trajectory. With step-wise training, each element instead represents a single **step** (one LLM turn within a trajectory). A trajectory with 3 turns produces 3 elements rather than 1.
Contributor

medium

This explanation is very helpful. To make it even more precise and avoid potential confusion, you could clarify that this per-step/per-trajectory structure applies specifically to the list-based fields in `GeneratorOutput`. The `GeneratorOutput` TypedDict also contains non-list fields like `rollout_metrics`, which are aggregated for the entire batch and don't follow this pattern. Specifying this distinction will make the documentation more robust.

Normally, for the list-based fields in `GeneratorOutput` (e.g., `response_ids`, `prompt_token_ids`, `rewards`), each element represents a single trajectory. With step-wise training, each element instead represents a single **step** (one LLM turn within a trajectory). A trajectory with 3 turns produces 3 elements rather than 1.
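
To illustrate the distinction, here is a minimal sketch of how the list-based fields grow under step-wise training while `rollout_metrics` stays a single batch-level value. The field names `prompt_token_ids`, `response_ids`, `rewards`, and `rollout_metrics` come from the text above; the helper `build_step_wise_output`, the token IDs, the scalar per-step reward, and the `num_steps` metric key are hypothetical and only serve the example. The real `GeneratorOutput` TypedDict may contain more fields and different value types.

```python
from typing import Any, Dict, List

# One hypothetical trajectory with 3 LLM turns (all token IDs are made up).
trajectory: List[Dict[str, Any]] = [
    {"prompt": [1, 2], "response": [3, 4], "reward": 0.0},
    {"prompt": [1, 2, 3, 4, 5], "response": [6], "reward": 0.0},
    {"prompt": [1, 2, 3, 4, 5, 6, 7], "response": [8, 9], "reward": 1.0},
]


def build_step_wise_output(trajectories: List[List[Dict[str, Any]]]) -> Dict[str, Any]:
    """Hypothetical helper: emit one list element per step (LLM turn)."""
    output: Dict[str, Any] = {
        "prompt_token_ids": [],
        "response_ids": [],
        "rewards": [],
        "rollout_metrics": {},
    }
    for turns in trajectories:
        for turn in turns:
            # List-based fields: one entry per step, not per trajectory.
            output["prompt_token_ids"].append(turn["prompt"])
            output["response_ids"].append(turn["response"])
            output["rewards"].append(turn["reward"])
    # Non-list field: a single dict aggregated over the whole batch.
    output["rollout_metrics"] = {"num_steps": len(output["response_ids"])}
    return output


output = build_step_wise_output([trajectory])
print(len(output["response_ids"]))  # number of steps in the batch
print(output["rollout_metrics"])
```

In this sketch the single 3-turn trajectory yields three entries in each list-based field, whereas `rollout_metrics` remains one dictionary regardless of how many steps the batch contains.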


The `GeneratorOutput` TypedDict is defined in [skyrl/train/generators/base.py](https://github.com/NovaSky-AI/SkyRL/blob/main/skyrl/train/generators/base.py):

```python