verl-project · ISEEKYAN · Nov 7, 2025 · Nov 2, 2025
@@ -166,10 +166,10 @@ https://github.com/ArronHZG/verl-community/blob/recipe/async_policy/docs/fully_a
 
   During the training process, we observed that metrics and response lengths may become unstable in the later
   stages of training. To mitigate this issue, we can use
-  the [Rollout Importance Sampling](https://verl.readthedocs.io/en/latest/advance/rollout_is.html)
-  technique for importance sampling. To utilize Rollout Importance Sampling, we need to compute log_prob using
+  the [Rollout Correction](https://verl.readthedocs.io/en/latest/advance/rollout_corr.html)
+  technique for importance sampling and rejection sampling. To utilize Rollout Correction, we need to compute log_prob using
   the training engine, which requires enabling this switch.
-  Additionally, when compute_prox_log_prob and Rollout Importance Sampling are enabled under mode d
+  Additionally, when compute_prox_log_prob and Rollout Correction are enabled under mode d
   (async stream pipeline with partial rollout), our implementation approximates `Areal's Decoupled PPO`.
 
 ### Supported Modes