[doc] chore: supply documentation for flowgrpo training by AndyZhou952 · Pull Request #2 · verl-project/verl-omni

AndyZhou952 · 2026-04-22T07:36:22Z

What does this PR do?

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

Add documentation and a quickstart guide for the diffusion FlowGRPO training.
Add async example training scripts and README for the Qwen-Image OCR task.

Checklist Before Starting

Search for similar PRs. Paste at least one query link here: ...
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, vllm_omni, rollout, trainer, ci, training_utils, recipe, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, diffusion, omni, tests, docker
- If this PR involves multiple modules, separate them with , like [diffusion, doc]
- {type} is in feat, fix, refactor, chore, test
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
- Example: [BREAKING][diffusion, fsdp] feat: new rollout scheduler

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

NA

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

NA

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...

gemini-code-assist

Code Review

This pull request adds support for Flow-GRPO, a technique for online reinforcement learning in flow matching models like Stable Diffusion 3 and FLUX. The contribution includes detailed documentation on the algorithm and configuration, a quickstart guide for OCR tasks using Qwen-Image, and example scripts for both synchronous and asynchronous reward evaluation. Reviewer feedback suggests updating the documentation to accurately reflect GPU requirements for the asynchronous variant and clarifying that throughput metrics in the performance table are reported on a per-GPU basis.

zhtmike · 2026-04-22T07:46:34Z

may add doc.yaml from verl

AndyZhou952 · 2026-04-22T08:08:53Z


Added. Also added .readthedocs.yaml and the other files needed for doc rendering. PTAL

SamitHuang · 2026-04-22T07:56:00Z

+
+2. **Denoising Reduction**: Training on all denoising steps is expensive. Flow-GRPO reduces the number of *training* steps while keeping the original number of *inference* steps, significantly improving sampling efficiency without sacrificing reward performance.
+
+Empirically, RL-tuned SD3.5-M with Flow-GRPO raises GenEval accuracy from 63% to 95% and visual text rendering accuracy from 59% to 92%.


I drew an algorithm figure; I'll attach it later.

SamitHuang · 2026-04-22T08:11:17Z

+   $HOME/models/Qwen/Qwen-Image
+   $HOME/models/Qwen/Qwen-Image/tokenizer
+   $HOME/models/Qwen/Qwen3-VL-8B-Instruct


Why do we have to save the models manually to the $HOME folder? Can we just load pretrained weights from the HF cache folder? BTW, in some Docker environments the home folder has limited space, so we should also change it to $WORKSPACE.

Moved to $WORKSPACE.

For the loading:
(1) Reward model (i.e., Qwen3-VL-8B-Instruct): can be loaded directly from the cache folder. The docs/scripts have been updated to reflect this.
(2) Qwen-Image model: a quick investigation indicated that the use_shm logic in an upstream verl function (utils/fs.py copy_to_local) blocks loading from the cache. For now, we therefore instruct the user to pre-download the model.

But why can we just set actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct, as mentioned in https://verl.readthedocs.io/en/latest/start/quickstart.html ?

SamitHuang · 2026-04-22T09:50:20Z

+
+   git clone https://github.com/verl-project/verl-omni.git
+   cd verl-omni
+   pip install -e .


I think we should install vllm and vllm-omni before this.
The correct installation order should be vllm -> vllm-omni -> verl -> verl-omni (after #3).

Updated, per discussion with @zhtmike. No separate verl installation is needed.

For CI: vllm -> vllm-omni -> verl (main) -> verl-omni (main)
For users: vllm -> vllm-omni -> verl-omni (a specific release, e.g., 0.8.0, which then installs verl==0.8.0 automatically as a dependency)

And I think having to install vllm before vllm-omni is an annoying bug that should be fixed on the vllm-omni side.

Why not pip install git+https://xxxx, @AndyZhou952?


Good catch, just made the update

Update 0423: For now, the doc instructs users to install in the following order: vllm (0.18) -> vllm-omni (0.18) -> verl (latest commit branch here) -> verl-omni (main).
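A minimal dry-run sketch of that install order, in the same shell style as the example scripts. The pins vllm==0.18.* and vllm-omni==0.18.* (including the vllm-omni package name) are assumptions standing in for "vllm (0.18)" and "vllm-omni (0.18)", and VERL_COMMIT is a hypothetical placeholder for the verl commit the doc links to; check the quickstart for the actual specs.

```bash
#!/usr/bin/env bash
# Dry-run sketch of the install order: vllm -> vllm-omni -> verl -> verl-omni.
# ASSUMPTIONS: the vllm / vllm-omni pins and the vllm-omni package name are
# placeholders for "0.18"; VERL_COMMIT stands in for the commit the doc links.
set -euo pipefail

VERL_COMMIT="${VERL_COMMIT:-<commit>}"  # hypothetical placeholder, set before a real run

install_steps=(
  "pip install 'vllm==0.18.*'"
  "pip install 'vllm-omni==0.18.*'"
  "pip install 'git+https://github.com/verl-project/verl.git@${VERL_COMMIT}'"
  "pip install 'git+https://github.com/verl-project/verl-omni.git@main'"
)

run_install() {
  local step
  for step in "${install_steps[@]}"; do
    if [[ "${DRY_RUN:-1}" == "1" ]]; then
      echo "$step"   # default: only print each command, in order
    else
      eval "$step"   # DRY_RUN=0: execute the steps in order, stopping on failure
    fi
  done
}
```

Keeping the steps in one ordered list makes the dependency order explicit; run DRY_RUN=0 run_install only after replacing the placeholders.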

SamitHuang · 2026-04-22T16:38:11Z

 reward_path=examples/flowgrpo_trainer/reward_fn.py
-reward_model_name=$HOME/models/Qwen/Qwen3-VL-8B-Instruct
+# Can also be an HF Hub model ID, e.g. "Qwen/Qwen3-VL-8B-Instruct"
+reward_model_name=${REWARD_MODEL_PATH:-$WORKSPACE/models/Qwen/Qwen3-VL-8B-Instruct}


Suggested change

reward_model_name=${REWARD_MODEL_PATH:-$WORKSPACE/models/Qwen/Qwen3-VL-8B-Instruct}

reward_model_name=Qwen/Qwen3-VL-8B-Instruct

It's more common and cleaner; users can change it to a custom folder if they want.
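If both behaviors are wanted, the two variants in this thread could be combined: default to the HF Hub ID (resolved through the HF cache) and let users point at a local folder via REWARD_MODEL_PATH. This is a hedged sketch, not what the merged script does (the 0423 update simply sets reward_model_name=Qwen/Qwen3-VL-8B-Instruct); resolve_reward_model is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: HF Hub ID by default, local folder if REWARD_MODEL_PATH is exported.

resolve_reward_model() {
  # ${VAR:-default} falls back when VAR is unset OR set to the empty string.
  echo "${REWARD_MODEL_PATH:-Qwen/Qwen3-VL-8B-Instruct}"
}

reward_model_name="$(resolve_reward_model)"
```

For example, REWARD_MODEL_PATH=$WORKSPACE/models/Qwen/Qwen3-VL-8B-Instruct bash run_qwen_image_ocr_lora.sh would use the pre-downloaded copy, assuming the script adopted this pattern.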

If we do it this way, I think the training model can also be loaded from the Hugging Face cache; just set use_shm=False?

By the training model, do you mean Qwen-Image? Based on the current logic, I think we need to set use_shm=True; see https://github.com/verl-project/verl/blob/main/verl/utils/fs.py#L214, called from DiffusionModelConfig.__post_init__.

Use use_shm=True, then?

0423 update: set reward_model_name=Qwen/Qwen3-VL-8B-Instruct and updated the READMEs.

SamitHuang · 2026-04-23T02:55:22Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces comprehensive documentation and example scripts for FlowGRPO training within the verl-omni framework. Key additions include a full Sphinx-based documentation suite, a quickstart guide for OCR-style image generation tasks, and example scripts for both synchronous and asynchronous reward processing. The review feedback correctly identifies a module naming inconsistency in the asynchronous training script and outdated command-line arguments in the README that do not match the updated data preprocessing script.

gemini-code-assist · 2026-04-23T02:57:13Z

+REWARD_TP=1
+
+
+python3 -m verl.trainer.main_flowgrpo \


The module name in the python3 -m command is incorrect. It should be verl_omni.trainer.main_flowgrpo to match the package structure defined in this repository and used in the synchronous example script (run_qwen_image_ocr_lora.sh).

Suggested change

python3 -m verl.trainer.main_flowgrpo \

python3 -m verl_omni.trainer.main_flowgrpo \

Updated here and in the docs.
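To catch a wrong entrypoint such as verl.trainer.main_flowgrpo before a long job starts, a launcher could first check that the dotted module path resolves. A minimal sketch using Python's importlib.util.find_spec; check_module is a hypothetical helper, not part of verl or VeRL-Omni.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed only if python3 can resolve the module.
check_module() {
  python3 - "$1" <<'PY' 2>/dev/null
import importlib.util
import sys

name = sys.argv[1]
try:
    found = importlib.util.find_spec(name) is not None
except ModuleNotFoundError:
    # A missing parent package raises instead of returning None.
    found = False
sys.exit(0 if found else 1)
PY
}
```

For example, check_module verl_omni.trainer.main_flowgrpo || exit 1 before python3 -m verl_omni.trainer.main_flowgrpo. Because find_spec imports the parent packages, a broken verl_omni install also makes the check fail.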

gemini-code-assist · 2026-04-23T02:57:13Z

+python3 examples/flowgrpo_trainer/data_process/qwenimage_ocr.py \
+  --local_dataset_path $HOME/data/ocr \
+  --local_save_dir $HOME/data/ocr


The example command for data preprocessing uses argument names (--local_dataset_path, --local_save_dir) that were renamed to --input_dir and --output_dir in the qwenimage_ocr.py script within this same PR. Using the old names will result in an error.

Suggested change

python3 examples/flowgrpo_trainer/data_process/qwenimage_ocr.py \
  --local_dataset_path $HOME/data/ocr \
  --local_save_dir $HOME/data/ocr

python3 examples/flowgrpo_trainer/data_process/qwenimage_ocr.py \
  --input_dir $HOME/data/ocr \
  --output_dir $HOME/data/ocr

Updated, and also changed $HOME to $WORKSPACE in the README.md.
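A small wrapper for the corrected preprocessing command, assuming the README layout of $WORKSPACE/data/ocr for both input and output. preprocess_ocr and its DRY_RUN switch are hypothetical conveniences of this sketch, not flags of qwenimage_ocr.py.

```bash
#!/usr/bin/env bash
# Sketch: run OCR preprocessing with the renamed flags under $WORKSPACE.
preprocess_ocr() {
  # Fail fast if WORKSPACE is unset, since $HOME may be small in some containers.
  local ws="${WORKSPACE:?WORKSPACE must be set}"
  local data_dir="${ws}/data/ocr"
  local -a cmd
  cmd=(python3 examples/flowgrpo_trainer/data_process/qwenimage_ocr.py
       --input_dir "$data_dir"
       --output_dir "$data_dir")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "${cmd[*]}"  # print the command instead of running it
  else
    "${cmd[@]}"
  fi
}
```

Building the command as an array keeps each flag and path a separate argument, so paths with spaces survive intact.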

[doc] feat: add FlowGRPO algorithm doc, quickstart, and example README

97fe76d

gemini-code-assist Bot reviewed Apr 22, 2026


Comment thread examples/flowgrpo_trainer/README.md

Comment thread examples/flowgrpo_trainer/README.md

zhtmike reviewed Apr 22, 2026


Comment thread docs/algo/flowgrpo.md

AndyZhou952 added 2 commits April 22, 2026 08:02

render doc, workflow

ae5dbb1

fix license

12fdc83

SamitHuang reviewed Apr 22, 2026


Comment thread examples/flowgrpo_trainer/README.md Outdated

AndyZhou952 added 2 commits April 22, 2026 09:35

address comments

ff4313f

linting

ffed006

SamitHuang reviewed Apr 22, 2026


Comment thread docs/start/flowgrpo_quickstart.rst Outdated

SamitHuang reviewed Apr 22, 2026


AndyZhou952 added 2 commits April 22, 2026 09:58

.md update install instruction, add OCR description

db470f5

fix linting

17a5354

zhtmike reviewed Apr 22, 2026


Comment thread examples/flowgrpo_trainer/run_qwen_image_ocr_lora_async_reward.sh

consistent script for async, update installation guideline

bdec6bb

zhtmike reviewed Apr 22, 2026


Comment thread examples/flowgrpo_trainer/run_qwen_image_ocr_lora.sh Outdated

Comment thread examples/flowgrpo_trainer/run_qwen_image_ocr_lora_async_reward.sh Outdated

zhtmike reviewed Apr 22, 2026


Comment thread examples/flowgrpo_trainer/run_qwen_image_ocr_lora_async_reward.sh Outdated

SamitHuang reviewed Apr 22, 2026


AndyZhou952 and others added 5 commits April 23, 2026 01:47

update scripts (ROLLOUT_TP, clear reward model)

72fd380

update the readme based on the updated installation guide and script

951fe91

Merge branch 'main' into init-doc

c2981da

MODEL_PATH, update docs

6cc7a2d

uniformly update the naming from verl-omni to VeRL-Omni

41053f8

gemini-code-assist Bot reviewed Apr 23, 2026


AndyZhou952 added 2 commits April 23, 2026 03:01

update readme to match the changes in scripts

264dc90

update docs to use verl_omni instead of verl

8163a5c

SamitHuang merged commit b40bb74 into main Apr 23, 2026
5 checks passed

zhtmike deleted the init-doc branch April 23, 2026 03:09

SamitHuang mentioned this pull request May 14, 2026

[rollout, omni] feat: enable batched B=N diffusion rollout in FlowGRPO #83

Closed


