
[doc] chore: supply documentation for flowgrpo training #2

Merged
SamitHuang merged 15 commits into main from init-doc
Apr 23, 2026

Conversation

Collaborator

@AndyZhou952 AndyZhou952 commented Apr 22, 2026

What does this PR do?

Add concise overview of what this PR aims to achieve or accomplish. Reference related GitHub issues and PRs that help with the review.

  • Add documentation and a quickstart guide for the diffusion FlowGRPO training.
  • Add async example training scripts and README for the Qwen-Image OCR task.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, vllm_omni, rollout, trainer, ci, training_utils, recipe, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, diffusion, omni, tests, docker
    • If this PR involves multiple modules, separate them with , like [diffusion, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][diffusion, fsdp] feat: new rollout scheduler
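
The title rule above can be approximated locally before pushing. The following is a minimal sketch, not the repository's actual CI check: the function name `check_pr_title` is hypothetical, and it only validates the overall shape (it does not compare module names against the allowed list).

```shell
#!/bin/sh
# Hypothetical helper, NOT the project's real CI check.
# Succeeds when the title looks like "[{modules}] {type}: {description}",
# optionally prefixed with "[BREAKING]".
check_pr_title() {
    title=$1
    types='feat|fix|refactor|chore|test'
    # Modules: lowercase words (underscores allowed), separated by commas.
    modules='[a-z_]+(, ?[a-z_]+)*'
    printf '%s\n' "$title" |
        grep -Eq "^(\[BREAKING\])?\[${modules}\] (${types}): .+"
}

check_pr_title '[doc] chore: supply documentation for flowgrpo training' && echo "title ok"
check_pr_title '[BREAKING][diffusion, fsdp] feat: new rollout scheduler' && echo "title ok"
check_pr_title 'doc: missing brackets' || echo "title rejected"
```

A stricter version could also reject module names that are not in the list above; that is left out here to keep the sketch short.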

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

NA

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

NA

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

  • Read the Contribute Guide.
  • Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
  • Add / Update the documentation.
  • Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request adds support for Flow-GRPO, a technique for online reinforcement learning in flow matching models like Stable Diffusion 3 and FLUX. The contribution includes detailed documentation on the algorithm and configuration, a quickstart guide for OCR tasks using Qwen-Image, and example scripts for both synchronous and asynchronous reward evaluation. Reviewer feedback suggests updating the documentation to accurately reflect GPU requirements for the asynchronous variant and clarifying that throughput metrics in the performance table are reported on a per-GPU basis.

Comment thread examples/flowgrpo_trainer/README.md
Comment thread examples/flowgrpo_trainer/README.md
Collaborator

zhtmike commented Apr 22, 2026

may add doc.yaml from verl

Comment thread docs/algo/flowgrpo.md
Collaborator Author

> may add doc.yaml from verl

Added. Also added .readthedocs.yaml and other files related to doc rendering. PTAL

Comment thread docs/algo/flowgrpo.md

2. **Denoising Reduction**: Training on all denoising steps is expensive. Flow-GRPO reduces the number of *training* steps while keeping the original number of *inference* steps, significantly improving sampling efficiency without sacrificing reward performance.

Empirically, RL-tuned SD3.5-M with Flow-GRPO raises GenEval accuracy from 63% to 95% and visual text rendering accuracy from 59% to 92%.
Collaborator

I drew an algorithm figure. I'll attach it later.

Comment thread docs/start/flowgrpo_quickstart.rst Outdated
Comment thread docs/start/flowgrpo_quickstart.rst Outdated
Comment thread docs/start/flowgrpo_quickstart.rst Outdated
Comment thread docs/start/flowgrpo_quickstart.rst Outdated
Comment thread docs/start/flowgrpo_quickstart.rst Outdated
Comment on lines +75 to +77
$HOME/models/Qwen/Qwen-Image
$HOME/models/Qwen/Qwen-Image/tokenizer
$HOME/models/Qwen/Qwen3-VL-8B-Instruct
Collaborator

@SamitHuang SamitHuang Apr 22, 2026

Why do we have to save the models manually to the $HOME folder? Can we just load pretrained weights from the HF cache folder? BTW, in some Docker environments the home folder has limited space, so we should also change to $WORKSPACE.

Collaborator Author

Moved to $WORKSPACE.

For the loading:
(1) Reward model (i.e., Qwen3-VL-8B-Instruct): can be loaded from the cache folder directly. The docs and scripts have been updated to reflect this.
(2) Qwen-Image model: a quick investigation indicated that the use_shm logic in an upstream verl function (copy_to_local in utils/fs.py) blocks loading from the cache. For now, we therefore instruct the user to pre-download the model.
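
Since Qwen-Image has to be present on disk for now, a launch script could fail early when the directory is missing. This is a minimal sketch under assumptions: the function name `require_local_model` is hypothetical, the directory layout follows the paths in this thread, and the printed `huggingface-cli download` line is only a hint, not something the scripts in this PR run.

```shell
#!/bin/sh
# Sketch only: the helper name is hypothetical; paths follow this thread.
WORKSPACE=${WORKSPACE:-$PWD}
qwen_image_dir=$WORKSPACE/models/Qwen/Qwen-Image

require_local_model() {
    # $1: local model directory, $2: Hugging Face Hub repo ID (used in the hint)
    if [ -d "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        echo "hint: huggingface-cli download $2 --local-dir $1" >&2
        return 1
    fi
}

require_local_model "$qwen_image_dir" Qwen/Qwen-Image ||
    echo "pre-download Qwen-Image before starting training"
```

The reward model does not need this check, because it can be resolved from the Hub cache by its model ID.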

Collaborator

But why can we just set actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct, as mentioned in https://verl.readthedocs.io/en/latest/start/quickstart.html ?

Comment thread docs/start/flowgrpo_quickstart.rst Outdated
Comment thread examples/flowgrpo_trainer/README.md Outdated
Comment thread docs/start/flowgrpo_quickstart.rst Outdated
Comment thread docs/start/install.rst Outdated

git clone https://github.com/verl-project/verl-omni.git
cd verl-omni
pip install -e .
Collaborator

I think we should install vllm and vllm-omni before this.
The correct installation order should be vllm -> vllm-omni -> verl -> verl-omni (after #3)

Collaborator Author

Updated, per communication with @zhtmike. No need for a separate verl installation.

Collaborator

@zhtmike zhtmike Apr 22, 2026

For CI: vllm -> vllm-omni -> verl (main) -> verl-omni (main)
For users: vllm -> vllm-omni -> verl-omni (a specific version, e.g., 0.8.0, which then installs verl==0.8.0 automatically as a dependency)
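
The two orders in this comment can be written down as a dry run. This is a minimal sketch, assuming a hypothetical helper named `install_order`. It only prints the steps and runs no installer, because the exact versions and sources are still being discussed in this thread.

```shell
#!/bin/sh
# Dry-run sketch: prints the install steps, does not install anything.
# The helper name is hypothetical; package order comes from the comment above.
install_order() {
    case $1 in
        ci)   echo "vllm vllm-omni verl verl-omni" ;;
        user) echo "vllm vllm-omni verl-omni" ;;  # verl comes in as a dependency
        *)    echo "usage: install_order ci|user" >&2; return 1 ;;
    esac
}

step=1
for pkg in $(install_order ci); do
    echo "step $step: install $pkg"
    step=$((step + 1))
done
```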

Collaborator

And I think the vllm -> vllm-omni ordering is an annoying bug that should be fixed on the vllm-omni side.

Collaborator

Why not pip install git+https://xxxx, @AndyZhou952?

Collaborator Author

> Why not pip install git+https://xxxx, @AndyZhou952?

Good catch, just made the update.

Collaborator Author

Update 0423: Temporarily, the doc instructs the user to install in the following order: vllm (0.18) -> vllm-omni (0.18) -> verl (latest commit branch here) -> verl-omni (main).

Comment thread examples/flowgrpo_trainer/run_qwen_image_ocr_lora_async_reward.sh
Comment thread examples/flowgrpo_trainer/run_qwen_image_ocr_lora.sh Outdated
Comment thread examples/flowgrpo_trainer/run_qwen_image_ocr_lora_async_reward.sh Outdated
Comment thread examples/flowgrpo_trainer/run_qwen_image_ocr_lora_async_reward.sh Outdated
reward_path=examples/flowgrpo_trainer/reward_fn.py
reward_model_name=$HOME/models/Qwen/Qwen3-VL-8B-Instruct
# Can also be an HF Hub model ID, e.g. "Qwen/Qwen3-VL-8B-Instruct"
reward_model_name=${REWARD_MODEL_PATH:-$WORKSPACE/models/Qwen/Qwen3-VL-8B-Instruct}
Collaborator

Suggested change
- reward_model_name=${REWARD_MODEL_PATH:-$WORKSPACE/models/Qwen/Qwen3-VL-8B-Instruct}
+ reward_model_name=Qwen/Qwen3-VL-8B-Instruct

Collaborator

It's more common and cleaner. Users can change it to a custom folder if they want.
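
For readers weighing the two variants, one possible middle ground is to default to the Hub model ID and still honor an override. This is a minimal sketch, not what was merged: `REWARD_MODEL_PATH` is the variable from the earlier revision, and the function name `resolve_reward_model` is hypothetical.

```shell
#!/bin/sh
# Sketch: default to the Hub ID, let REWARD_MODEL_PATH override it.
resolve_reward_model() {
    # ":-" falls back when the variable is unset OR empty.
    echo "${REWARD_MODEL_PATH:-Qwen/Qwen3-VL-8B-Instruct}"
}

reward_model_name=$(resolve_reward_model)
echo "reward_model_name=$reward_model_name"
```

With the plain assignment from the suggestion, users edit the script to point at a custom folder. With the fallback form, they set the environment variable instead.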

Collaborator

@zhtmike zhtmike Apr 23, 2026

If we do it this way, I think the training model can also be loaded from the Hugging Face cache. Just set use_shm=False?

Collaborator Author

@AndyZhou952 AndyZhou952 Apr 23, 2026

By the training model, do you mean Qwen-Image? I think that, based on the current logic, we need to set use_shm=True; see https://github.com/verl-project/verl/blob/main/verl/utils/fs.py#L214 from DiffusionModelConfig.__post_init__

Collaborator

use use_shm=True, then?

Collaborator Author

0423 update: set reward_model_name=Qwen/Qwen3-VL-8B-Instruct and updated the READMEs.

@SamitHuang
Collaborator

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces comprehensive documentation and example scripts for FlowGRPO training within the verl-omni framework. Key additions include a full Sphinx-based documentation suite, a quickstart guide for OCR-style image generation tasks, and example scripts for both synchronous and asynchronous reward processing. The review feedback correctly identifies a module naming inconsistency in the asynchronous training script and outdated command-line arguments in the README that do not match the updated data preprocessing script.

REWARD_TP=1


python3 -m verl.trainer.main_flowgrpo \
Contributor

high

The module name in the python3 -m command is incorrect. It should be verl_omni.trainer.main_flowgrpo to match the package structure defined in this repository and used in the synchronous example script (run_qwen_image_ocr_lora.sh).

Suggested change
- python3 -m verl.trainer.main_flowgrpo \
+ python3 -m verl_omni.trainer.main_flowgrpo \
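
Stale module paths like this one are easy to miss across several scripts and docs. The following is a minimal sketch of a grep-based check, assuming a hypothetical helper `find_stale_module_refs`. The demo files are created only for illustration and are not the files in this PR.

```shell
#!/bin/sh
# Sketch: list launch lines that still use the "verl.trainer" module path.
find_stale_module_refs() {
    # $1: directory to scan recursively; exits non-zero when nothing matches
    grep -rnE 'python3 -m verl\.trainer\.main_flowgrpo' "$1"
}

# Illustration with throwaway files (not the real example scripts).
demo=$(mktemp -d)
printf 'python3 -m verl.trainer.main_flowgrpo \\\n' > "$demo/async.sh"
printf 'python3 -m verl_omni.trainer.main_flowgrpo \\\n' > "$demo/sync.sh"
find_stale_module_refs "$demo"   # reports only the line in async.sh
rm -rf "$demo"
```

The escaped dot keeps `verl_omni.trainer` from matching, so only the outdated invocation is reported.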

Collaborator Author

Updated here and in the docs.

Comment thread examples/flowgrpo_trainer/README.md Outdated
Comment on lines +38 to +40
python3 examples/flowgrpo_trainer/data_process/qwenimage_ocr.py \
--local_dataset_path $HOME/data/ocr \
--local_save_dir $HOME/data/ocr
Contributor

medium

The example command for data preprocessing uses argument names (--local_dataset_path, --local_save_dir) that were renamed to --input_dir and --output_dir in the qwenimage_ocr.py script within this same PR. Using the old names will result in an error.

Suggested change
- python3 examples/flowgrpo_trainer/data_process/qwenimage_ocr.py \
- --local_dataset_path $HOME/data/ocr \
- --local_save_dir $HOME/data/ocr
+ python3 examples/flowgrpo_trainer/data_process/qwenimage_ocr.py \
+ --input_dir $HOME/data/ocr \
+ --output_dir $HOME/data/ocr

Collaborator Author

Updated. Also changed $HOME to $WORKSPACE in README.md.

@SamitHuang SamitHuang merged commit b40bb74 into main Apr 23, 2026
5 checks passed
3 participants