fix step batch rollout#1
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces step_scheduler and post_decode methods to vllm_omni_rollout_adapter.py to support trajectory collection (latents, log-probabilities, and timesteps) during step-by-step execution, ensuring consistency with the standard rollout behavior. It also updates prepare_encode to initialize SDE-related parameters and state variables. A review comment identifies that the default sde_window end index excludes the final denoising step, which would lead to an incomplete trajectory for RL training, and suggests extending the range to include all timesteps.
| ).item() | ||
| sde_window = (start, start + sde_window_size) | ||
| else: | ||
| sde_window = (0, len(timesteps) - 1) |
There was a problem hiding this comment.
The default sde_window end index is set to len(timesteps) - 1, which excludes the final denoising step from the collected trajectory. In the diffuse loop (and the new step_scheduler), the condition i < sde_window[1] is used to collect latents and log-probabilities. By ending at len(timesteps) - 1, the last latent (the final output) and the last step's log-probability are not captured in all_latents and all_log_probs, leading to an incomplete trajectory for RL training. This should be changed to len(timesteps) to cover the full range of steps. Note that this same logic exists in the forward method (line 772) and should ideally be updated there as well for consistency, although that line is outside the current diff hunks.
| sde_window = (0, len(timesteps) - 1) | |
| sde_window = (0, len(timesteps)) |
What does this PR do?
Checklist Before Starting
[{modules}] {type}: {description}(This will be checked by the CI){modules}includefsdp,vllm_omni,rollout,trainer,ci,training_utils,recipe,ray,worker,single_controller,misc,perf,model,algo,env,tool,ckpt,doc,data,cfg,reward,diffusion,omni,tests,docker,like[diffusion, doc]{type}is infeat,fix,refactor,chore,test[BREAKING]to the beginning of the title.[BREAKING][diffusion, fsdp] feat: new rollout schedulerTest
API and Usage Example
# Add code snippet or script demonstrating how to use thisDesign & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always