[doc] chore: supply documentation for flowgrpo training#2
Conversation
Code Review
This pull request adds support for Flow-GRPO, a technique for online reinforcement learning in flow matching models like Stable Diffusion 3 and FLUX. The contribution includes detailed documentation on the algorithm and configuration, a quickstart guide for OCR tasks using Qwen-Image, and example scripts for both synchronous and asynchronous reward evaluation. Reviewer feedback suggests updating the documentation to accurately reflect GPU requirements for the asynchronous variant and clarifying that throughput metrics in the performance table are reported on a per-GPU basis.
may add
Added, also added
2. **Denoising Reduction**: Training on all denoising steps is expensive. Flow-GRPO reduces the number of *training* steps while keeping the original number of *inference* steps, significantly improving sampling efficiency without sacrificing reward performance.
Empirically, RL-tuned SD3.5-M with Flow-GRPO raises GenEval accuracy from 63% to 95% and visual text rendering accuracy from 59% to 92%.
I drew an algorithm figure; I'll attach it later.
$HOME/models/Qwen/Qwen-Image
$HOME/models/Qwen/Qwen-Image/tokenizer
$HOME/models/Qwen/Qwen3-VL-8B-Instruct
Why do we have to save the models manually to the $HOME folder? Can we just load the pretrained weights from the HF cache folder? Also, in some Docker environments the home folder has limited space, so we should switch to $WORKSPACE.
Moved to $WORKSPACE.
On loading:
(1) Reward model (i.e. Qwen3-VL-8B-Instruct): can be loaded from the cache folder directly. The docs and scripts have been updated to reflect this.
(2) Qwen-Image model: a quick investigation indicated that the use_shm logic in an upstream verl function (copy_to_local in utils/fs.py) blocks loading from the cache. For now, the docs therefore instruct the user to pre-download the model.
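A minimal sketch of the pre-download step described in (2), assuming the `huggingface-cli` tool from `huggingface_hub` is installed. The helper name `qwen_image_download_cmd` is hypothetical, and the fallback of `WORKSPACE` to the current directory is an assumption, not something the docs specify. It only prints the command, so it can be reviewed before anything is downloaded.

```shell
# Hypothetical helper: prints the pre-download command instead of running it.
# Assumption: WORKSPACE falls back to the current directory when unset.
qwen_image_download_cmd() {
  workspace="${WORKSPACE:-$PWD}"
  printf 'huggingface-cli download Qwen/Qwen-Image --local-dir %s\n' \
    "$workspace/models/Qwen/Qwen-Image"
}

# Print the command for review; run the printed line manually to download.
qwen_image_download_cmd
```

Printing instead of executing keeps the step safe on machines with limited disk space, which was the original motivation for moving away from $HOME.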
But why can we simply set actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct, as mentioned in https://verl.readthedocs.io/en/latest/start/quickstart.html?
git clone https://github.com/verl-project/verl-omni.git
cd verl-omni
pip install -e .
I think we should install vllm and vllm-omni before this.
The correct installation order should be vllm -> vllm-omni -> verl -> verl-omni (after #3).
Updated, per discussion with @zhtmike. No separate verl installation is needed.
For CI: vllm -> vllm-omni -> verl (main) -> verl-omni (main)
For users: vllm -> vllm-omni -> verl-omni (a specific version, e.g. 0.8.0, which then installs verl==0.8.0 automatically as a dependency)
Also, I think the required vllm -> vllm-omni order is an annoying bug that should be fixed on the vllm-omni side.
Why not use pip install, i.e. pip install git+https://xxxx, @AndyZhou952?
Good catch, just made the update
Update 0423: Temporarily, the doc instructs the user to install in the following order: vllm (0.18) -> vllm-omni (0.18) -> verl (latest commit branch here) -> verl-omni (main).
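The install order from the 0423 update could be written down roughly as follows. This is a sketch under several assumptions: the `==0.18.*` pins and the `vllm-omni` package name are guesses at how "0.18" maps to pip requirements, `install_plan` is a hypothetical helper, and `VERL_REF` is a placeholder for the verl branch or commit linked in the thread, which is not reproduced here. The helper prints the plan and installs nothing.

```shell
# Hypothetical helper: prints the install plan in order; nothing is installed.
# VERL_REF is a placeholder for the verl branch or commit referenced in the PR.
install_plan() {
  verl_ref="${VERL_REF:-<verl-branch-or-commit>}"
  cat <<EOF
pip install "vllm==0.18.*"
pip install "vllm-omni==0.18.*"
pip install "git+https://github.com/verl-project/verl.git@${verl_ref}"
git clone https://github.com/verl-project/verl-omni.git
cd verl-omni && pip install -e .
EOF
}

install_plan
```

Keeping the order in one place makes it easier to update once the vllm -> vllm-omni ordering constraint is resolved upstream.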
reward_path=examples/flowgrpo_trainer/reward_fn.py
reward_model_name=$HOME/models/Qwen/Qwen3-VL-8B-Instruct
# Can also be an HF Hub model ID, e.g. "Qwen/Qwen3-VL-8B-Instruct"
reward_model_name=${REWARD_MODEL_PATH:-$WORKSPACE/models/Qwen/Qwen3-VL-8B-Instruct}
Suggested change:
- reward_model_name=${REWARD_MODEL_PATH:-$WORKSPACE/models/Qwen/Qwen3-VL-8B-Instruct}
+ reward_model_name=Qwen/Qwen3-VL-8B-Instruct
It's more common and cleaner. Users can change it to a custom folder if they want.
If we do it this way, I think the training model can also be loaded from the Hugging Face cache, just by setting use_shm=False?
By the training model, do you mean Qwen-Image? Based on the current logic, I think use_shm=True needs to be set; see https://github.com/verl-project/verl/blob/main/verl/utils/fs.py#L214 (from DiffusionModelConfig.__post_init__).
0423 update: set reward_model_name=Qwen/Qwen3-VL-8B-Instruct and updated the READMEs.
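For readers who still want a local override, the two variants discussed in this thread can be combined with shell default expansion. This is only an illustration, not what the merged script does: `resolve_reward_model` is a hypothetical helper, and `REWARD_MODEL_PATH` is the optional variable from the earlier revision of the script.

```shell
# Hypothetical helper: use REWARD_MODEL_PATH when it is set and non-empty,
# otherwise fall back to the HF Hub model ID adopted in the 0423 update.
resolve_reward_model() {
  printf '%s\n' "${REWARD_MODEL_PATH:-Qwen/Qwen3-VL-8B-Instruct}"
}

reward_model_name="$(resolve_reward_model)"
echo "reward_model_name=$reward_model_name"
```

With this form the default stays the plain Hub ID, and a custom folder is one exported variable away.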
/gemini review
Code Review
This pull request introduces comprehensive documentation and example scripts for FlowGRPO training within the verl-omni framework. Key additions include a full Sphinx-based documentation suite, a quickstart guide for OCR-style image generation tasks, and example scripts for both synchronous and asynchronous reward processing. The review feedback correctly identifies a module naming inconsistency in the asynchronous training script and outdated command-line arguments in the README that do not match the updated data preprocessing script.
REWARD_TP=1
python3 -m verl.trainer.main_flowgrpo \
The module name in the python3 -m command is incorrect. It should be verl_omni.trainer.main_flowgrpo to match the package structure defined in this repository and used in the synchronous example script (run_qwen_image_ocr_lora.sh).
Suggested change:
- python3 -m verl.trainer.main_flowgrpo \
+ python3 -m verl_omni.trainer.main_flowgrpo \
Updated, in the docs as well.
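A quick way to catch any remaining uses of the old module path is a fixed-string grep over the scripts and docs. The helper name `find_stale_entrypoints` is hypothetical, and the directory in the commented usage line is taken from the paths quoted in this PR.

```shell
# Hypothetical check: list lines that still invoke the old module path.
# -F treats the pattern as a fixed string; -e protects the leading "-m".
# "verl_omni.trainer.main_flowgrpo" does not match, because the text after
# "-m verl" is "_omni", not ".trainer".
find_stale_entrypoints() {
  grep -rnF -e '-m verl.trainer.main_flowgrpo' "$1"
}

# Usage (grep exits non-zero when nothing matches):
# find_stale_entrypoints examples/flowgrpo_trainer
```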
python3 examples/flowgrpo_trainer/data_process/qwenimage_ocr.py \
--local_dataset_path $HOME/data/ocr \
--local_save_dir $HOME/data/ocr
The example command for data preprocessing uses argument names (--local_dataset_path, --local_save_dir) that were renamed to --input_dir and --output_dir in the qwenimage_ocr.py script within this same PR. Using the old names will result in an error.
Suggested change:
- python3 examples/flowgrpo_trainer/data_process/qwenimage_ocr.py \
- --local_dataset_path $HOME/data/ocr \
- --local_save_dir $HOME/data/ocr
+ python3 examples/flowgrpo_trainer/data_process/qwenimage_ocr.py \
+ --input_dir $HOME/data/ocr \
+ --output_dir $HOME/data/ocr
Updated. Also switched from $HOME to $WORKSPACE in README.md.
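Putting the two fixes together (renamed flags and $WORKSPACE), the preprocessing call would look roughly like the line printed below. The helper `ocr_preprocess_cmd` is hypothetical, the `data/ocr` subdirectory simply mirrors the earlier $HOME example, and falling back to the current directory when `WORKSPACE` is unset is an assumption.

```shell
# Hypothetical helper: prints the preprocessing command with the renamed flags.
# Assumption: WORKSPACE falls back to the current directory when unset.
ocr_preprocess_cmd() {
  data_dir="${WORKSPACE:-$PWD}/data/ocr"
  printf 'python3 examples/flowgrpo_trainer/data_process/qwenimage_ocr.py --input_dir %s --output_dir %s\n' \
    "$data_dir" "$data_dir"
}

ocr_preprocess_cmd
```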
What does this PR do?
Checklist Before Starting
[{modules}] {type}: {description} (This will be checked by the CI)
{modules} include fsdp, vllm_omni, rollout, trainer, ci, training_utils, recipe, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, diffusion, omni, tests, docker, like [diffusion, doc]
{type} is in feat, fix, refactor, chore, test
For breaking changes, add [BREAKING] to the beginning of the title.
Example: [BREAKING][diffusion, fsdp] feat: new rollout scheduler
Test
NA
API and Usage Example
NA
# Add code snippet or script demonstrating how to use this
Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
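The title convention from the checklist above can be approximated locally before pushing. This is a sketch only: `check_pr_title` is a hypothetical helper, the regular expression is inferred from the template text and may differ from what CI actually enforces, and it does not check module names against the allowed list.

```shell
# Hypothetical local check; the regex is inferred from the PR template and
# may not match the real CI rule. Module names are not validated.
check_pr_title() {
  printf '%s\n' "$1" | grep -Eq \
    '^(\[BREAKING\])?\[[a-z_]+(, ?[a-z_]+)*\] (feat|fix|refactor|chore|test): .+'
}

check_pr_title "[doc] chore: supply documentation for flowgrpo training" && echo "title ok"
```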