Merged
35 commits
e7cdce1
refactor(rollout): migrate to nested config structure without backwar…
szrlee Nov 2, 2025
7331f00
fix(config): add required blank line after rollout_correction
szrlee Nov 2, 2025
f763140
fix(rollout_corr): prevent silent failure when rollout_rs enabled wit…
szrlee Nov 2, 2025
0fe72c5
feat(rollout_corr): add bypass mode to skip old_log_prob computation
szrlee Nov 3, 2025
88a96e3
refactor(rollout_corr): migrate to typed config with preset methods
szrlee Nov 3, 2025
87ebeff
feat(rollout_corr): add χ² divergence and remove mismatch prefix
szrlee Nov 3, 2025
dbce569
docs(rollout_corr): add mathematical formulations for general off-pol…
szrlee Nov 3, 2025
dd47353
fix(rollout_corr): improve table precision and LaTeX consistency in m…
szrlee Nov 3, 2025
8d0bf4b
docs(rollout_corr): fix diagnostic metrics notation
szrlee Nov 3, 2025
5cacf20
docs(rollout_corr): remove bias/variance claims and improve precision
szrlee Nov 3, 2025
17b288e
docs(rollout_corr_math): update backend example from PyTorch to SGLang
szrlee Nov 6, 2025
8b12aaf
fix(config): add _target_ to rollout_correction example, set geo_mis …
szrlee Nov 4, 2025
ec84848
docs(rollout_corr): improve math doc - fix notation, add decoupled PP…
szrlee Nov 4, 2025
563c1ba
docs(rollout_corr): align usage guide terminology with math foundations
szrlee Nov 4, 2025
5fa5208
docs(rollout_corr): clarify decoupled PPO as batch size invariance so…
szrlee Nov 4, 2025
f1764e4
docs(rollout_corr): replace subjective recommendations with objective…
szrlee Nov 4, 2025
567fd71
docs(rollout_corr_math): fix notation - use prox and C_* constants
szrlee Nov 6, 2025
29a99ca
docs(rollout_corr): fix documentation issues and align with math form…
szrlee Nov 5, 2025
5289e57
docs(rollout_corr): fix links, terminology, and clarify PPO is not at…
szrlee Nov 5, 2025
81f6cb1
docs(rollout_corr): align usage guide with mathematical formulations
szrlee Nov 5, 2025
9b3468d
fix(rollout_corr): compute metrics before truncation and improve conf…
szrlee Nov 5, 2025
72d6f3f
config(ppo_megatron_trainer): align rollout_correction with ppo_trainer
szrlee Nov 5, 2025
5afac7a
feat(rollout_corr): implement use_pure_rollout_correction parameter
szrlee Nov 5, 2025
bc48668
docs(rollout_corr): add missing geometric mode in mode combinations t…
szrlee Nov 5, 2025
c7a8305
docs(rollout_corr): fix bypass mode example nesting
szrlee Nov 5, 2025
1d720d1
docs(rollout_corr): remove subjective 'Applicable scenarios' claims
szrlee Nov 5, 2025
106b4e6
docs(rollout_corr): remove subjective guidance from metrics documenta…
szrlee Nov 5, 2025
67b1375
feat: move algorithm.rollout_correction to a config snippet
tongyx361 Nov 6, 2025
1b873fc
chore: pre-commit
tongyx361 Nov 6, 2025
e869d29
Merge pull request #2 from tongyx361/tyx/feat/off-policy-config-snippet
szrlee Nov 6, 2025
b402bb7
refactor(rollout_corr): remove unnecessary config restoration in pure…
szrlee Nov 6, 2025
60be593
fix: rollout_correction
tongyx361 Nov 6, 2025
5426daa
feat: clean ray_trainer
tongyx361 Nov 6, 2025
f7b2993
refactor(rollout_corr): simplify geometric mode metrics and clarify m…
szrlee Nov 7, 2025
a264c85
docs(rollout_corr): emphasize general off-policy problems and cite mo…
szrlee Nov 7, 2025
6 changes: 3 additions & 3 deletions docs/advance/fully_async.md
@@ -166,10 +166,10 @@ https://github.com/ArronHZG/verl-community/blob/recipe/async_policy/docs/fully_a

During the training process, we observed that metrics and response lengths may become unstable in the later
stages of training. To mitigate this issue, we can use
- the [Rollout Importance Sampling](https://verl.readthedocs.io/en/latest/advance/rollout_is.html)
- technique for importance sampling. To utilize Rollout Importance Sampling, we need to compute log_prob using
+ the [Rollout Correction](https://verl.readthedocs.io/en/latest/advance/rollout_corr.html)
+ technique for importance sampling and rejection sampling. To utilize Rollout Correction, we need to compute log_prob using
the training engine, which requires enabling this switch.
- Additionally, when compute_prox_log_prob and Rollout Importance Sampling are enabled under mode d
+ Additionally, when compute_prox_log_prob and Rollout Correction are enabled under mode d
(async stream pipeline with partial rollout), our implementation approximates `Areal's Decoupled PPO`.
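
The hunk above mentions importance sampling and rejection sampling based on log-probabilities recomputed by the training engine. The idea can be sketched roughly as below. This is a minimal illustration and not verl's implementation: the function name, the per-token formulation, and the thresholds `is_clip`, `rs_low`, and `rs_high` are all hypothetical, and the ratio is assumed to be the training-engine probability divided by the rollout-engine probability.

```python
import math


def rollout_correction_weights(train_log_probs, rollout_log_probs,
                               is_clip=2.0, rs_low=0.5, rs_high=2.0):
    """Return (weights, keep) for one sequence of tokens.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not part of any library.
    """
    weights, keep = [], []
    for lp_train, lp_rollout in zip(train_log_probs, rollout_log_probs):
        # Importance ratio between the training-engine and rollout-engine
        # probabilities of the same sampled token.
        ratio = math.exp(lp_train - lp_rollout)
        # Importance sampling: truncate the ratio from above to bound variance.
        weights.append(min(ratio, is_clip))
        # Rejection sampling: keep a token only if its ratio is in range.
        keep.append(rs_low <= ratio <= rs_high)
    return weights, keep
```

In a training loop one would typically multiply the per-token loss by `weights` and zero out tokens where `keep` is false. Whether this is done per token or per sequence is a design choice the text above does not specify.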

### Supported Modes