[diffusion, rollout, trainer] feat: add BAGEL FlowGRPO support by timzsu · Pull Request #66 · verl-project/verl-omni

timzsu · 2026-05-10T07:42:12Z

What does this PR do?

Ports BAGEL FlowGRPO support from verl-project/verl#5947 into verl-omni and aligns the integration with the existing Qwen image FlowGRPO path.

This PR adds BAGEL-specific pipeline adapters, rollout tests, example FlowGRPO training config/scripts, and shared diffusion rollout/training plumbing for models that do not use the Qwen image prompt-embedding path.

Checklist Before Starting

Search for similar PRs. Paste at least one query link here: ...
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, vllm_omni, rollout, trainer, ci, training_utils, recipe, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, diffusion, omni, tests, docker
- If this PR involves multiple modules, separate them with , like [diffusion, doc]
- {type} is in feat, fix, refactor, chore, test
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
- Example: [BREAKING][diffusion, fsdp] feat: new rollout scheduler

Test

GPU rollout tests:

python -m pytest \
  tests/workers/rollout/rollout_vllm/test_vllm_omni_bagel_generate.py \
  tests/workers/rollout/rollout_vllm/test_vllm_omni_generate.py::test_generate

Result:

5 passed, 25 warnings in 43.40s

API and Usage Example

Design & Code Changes

Add verl_omni.pipelines.bagel_flow_grpo with BAGEL model loading, rollout, and diffusers training adapters.
Register BAGEL pipeline exports alongside the existing Qwen image FlowGRPO pipeline.
Add BAGEL FlowGRPO example config, reward function, training script, and local smoke script under examples/flowgrpo_trainer.
Extend shared diffusion model/rollout paths so BAGEL can provide model-specific inputs and outputs without breaking Qwen image behavior.
Update FSDP diffusers engine input preparation to support optional prompt embeddings while preserving the newer Ulysses sequence-parallel padding path from current main.
Add rollout coverage for BAGEL generation, scheduler behavior, LoRA generation, and the existing Qwen generate path.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...

Some training diagram

gemini-code-assist

Code Review

This pull request introduces support for the BAGEL (Mixture-of-Thought) model within the FlowGRPO training pipeline. Key additions include the BagelForTraining module, corresponding training and rollout adapters, and example configurations and scripts for OCR-based reward training. The PR also integrates a global profiling system into the diffusion trainer and updates various components to handle multi-stage model configurations and renamed prompt parameters. Feedback focuses on improving the efficiency of the Bagel forward pass by avoiding per-sample loops, ensuring correct handling of multiple samples per prompt in trajectory metadata, avoiding hardcoded token IDs, and optimizing network session management in the reward function.

zhtmike

Thank you for your PR!
Looks good, with a few comments.

By the way, can you show the reward curve in your PR description?

zhtmike · 2026-05-10T12:11:12Z

@knlnguyen1802 Please take a look at the vllm-omni-related changes. Thanks!

* fsdp/diffusers_impl: extract registry-based custom model loading into ``_build_module_from_registry`` helper; ``_build_module`` now simply delegates to it and falls back to ``AutoModel`` when no custom loader is registered.
* vllm_rollout/utils: drop the ``VERL_OMNI_ENABLE_WORKER_DEATH_SIGNAL`` env gate and always call ``set_death_signal()`` (restores the original upstream behavior).

Signed-off-by: Wang, Zhipeng | RASIA <zhipeng.wang@rakuten.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
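The delegation described in this commit message can be sketched roughly as follows. This is a framework-free illustration only: the `_CUSTOM_LOADERS` dict, the `register_loader` decorator, the `"bagel"` key, and `default_loader` (standing in for the `AutoModel` fallback) are hypothetical names, not the actual verl-omni code. The warning log mirrors the one a later commit in this PR says it adds.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)

# Hypothetical registry: architecture name -> custom loader callable.
_CUSTOM_LOADERS: dict[str, Callable[[str], Any]] = {}


def register_loader(arch: str):
    """Decorator that registers a custom loader under an architecture name."""
    def decorator(fn: Callable[[str], Any]) -> Callable[[str], Any]:
        _CUSTOM_LOADERS[arch] = fn
        return fn
    return decorator


def build_module_from_registry(arch: str, path: str) -> Optional[Any]:
    """Return a custom-loaded module, or None when no loader is registered."""
    loader = _CUSTOM_LOADERS.get(arch)
    if loader is None:
        return None
    logger.warning("Custom loader used for %s; engine hooks may be inactive.", arch)
    return loader(path)


def default_loader(path: str) -> dict:
    """Stand-in for the generic fallback path (AutoModel in the PR)."""
    return {"loader": "default", "path": path}


def build_module(arch: str, path: str) -> Any:
    """Delegate to the registry first, then fall back to the default loader."""
    module = build_module_from_registry(arch, path)
    return module if module is not None else default_loader(path)


@register_loader("bagel")
def _load_bagel(path: str) -> dict:
    # Placeholder for a model-specific loading routine.
    return {"loader": "custom", "path": path}
```

Keeping the registry lookup in its own helper makes the fallback explicit: callers only ever use `build_module`, and the custom path is confined to one function that can be removed later.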

- Drop examples/flowgrpo_trainer/reward_fn.py: superseded by verl_omni/utils/reward_score/genrm_ocr.py.
- Drop examples/flowgrpo_trainer/test_bagel_train.py: private FSDP + CFG smoke test, not maintained as a recipe.
- run_bagel_flowgrpo.sh: point reward_path at the new genrm_ocr.py.
- examples/flowgrpo_trainer/README.md: add a "BAGEL recipe" section describing prerequisites, launch command and what differs from the Qwen-Image recipe.
- tests/.../test_vllm_omni_bagel_generate.py: collapse the three non-LoRA tests (test_generate / test_generate_with_logprobs / test_generate_concurrent) into one concurrent SDE+logprobs test. The LoRA test is kept separate since it exercises a distinct adapter-loading code path.
- workers/engine/fsdp/diffusers_impl.py::_build_module_from_registry: add a docstring warning that hooks (attention processors, gradient-checkpointing, LoRA, dtype upcast) may be partially effective or silently inactive on custom-loaded modules, plus a TODO to migrate registered architectures into a first-class training engine and drop this escape hatch. Emit a runtime warning log line when a custom loader is taken.

Signed-off-by: princepride <wangzhipeng628@gmail.com>

Conflicts:
- examples/flowgrpo_trainer/README.md: kept upstream's new Ulysses-SP and full-weight Qwen-Image variant blurbs together with our BAGEL recipe section.

Additional fix:
- verl_omni/pipelines/bagel_flow_grpo/diffusers_training_adapter.py: ``forward_and_sample_previous_step`` now returns the new 4-tuple ``(log_prob, prev_sample_mean, std_dev_t, sqrt_dt)`` to match the GRPO-Guard plumbing introduced upstream in verl-project#48 (BAGEL still trains with ``loss_mode=flow_grpo`` so ``sqrt_dt`` is unused, but the engine layer now unpacks 4-tuples unconditionally).

Co-authored-by: GitHub Copilot
Signed-off-by: princepride <wangzhipeng628@gmail.com>
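To illustrate why the adapter had to change even though ``sqrt_dt`` is unused for BAGEL, here is a toy sketch of unconditional 4-tuple unpacking. The function names `old_adapter`, `new_adapter`, and `engine_step` are hypothetical, and plain floats stand in for what would be tensors in the real code.

```python
def old_adapter():
    """Pre-merge return shape: (log_prob, prev_sample_mean, std_dev_t)."""
    return (-1.25, 0.1, 0.3)


def new_adapter():
    """Post-merge return shape: adds sqrt_dt as the fourth element."""
    return (-1.25, 0.1, 0.3, 0.5)


def engine_step(adapter_fn):
    """Toy engine layer that always unpacks four values, whatever the loss mode."""
    log_prob, prev_sample_mean, std_dev_t, sqrt_dt = adapter_fn()
    return {
        "log_prob": log_prob,
        "prev_sample_mean": prev_sample_mean,
        "std_dev_t": std_dev_t,
        "sqrt_dt": sqrt_dt,  # may be ignored by the loss, but must be present
    }
```

Passing `old_adapter` to `engine_step` raises `ValueError` at the unpacking line, which is the failure mode the 4-tuple return avoids.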

- examples/flowgrpo_trainer/run_bagel_flowgrpo_local.sh: removed. Personal workstation launcher (hard-coded /proj-tango-pvc paths, debug-only env vars). Kept locally via .git/info/exclude, the same way test_bagel_train.py is handled per reviewer feedback.
- .pre-commit-config.yaml: reverted; adding ``.venv`` to the check-naming-conventions grep is unrelated to BAGEL FlowGRPO.
- tests/workers/rollout/rollout_vllm/test_vllm_omni_generate.py: reverted; switching the Qwen-Image fixture to ``scope='module'`` is an unrelated test optimization.

Per AGENTS.md "No low-value busywork PRs", these mechanical/personal changes should land in their own PR if needed.

Co-authored-by: GitHub Copilot
Signed-off-by: princepride <wangzhipeng628@gmail.com>

princepride · 2026-05-13T12:22:09Z

Two coupled changes, both required to keep Qwen-Image working alongside BAGEL:

Rename prompt_ids → prompt_token_ids. vllm-omni 0.20+'s OmniCustomPrompt standardizes on prompt_token_ids (matching vLLM's TokensPrompt). The server-side patch writes that key now; without the matching rename here, custom_prompt.get("prompt_ids", ...) is None and forward() silently falls into the warmup/dummy branch, returning an empty DiffusionOutput — Qwen-Image rollout would degrade to empty batches without raising.

Move the [0] batch-dim squeeze from server to pipeline. BAGEL is multi-stage and its custom_output isn't shaped [1, ...], so the server can't blindly index [0]. New contract: each pipeline returns per-sample tensors; the server passes them through. Net shape for Qwen-Image consumers is unchanged.
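The silent failure described in the first point can be reproduced with a toy stand-in. This is only a sketch: `forward` and `forward_strict` below are simplified, hypothetical functions, not the real pipeline code, and the returned strings are placeholders for actual outputs.

```python
def forward(custom_prompt: dict) -> list:
    """Toy pipeline forward(): missing token ids fall into the warmup branch."""
    token_ids = custom_prompt.get("prompt_token_ids")
    if token_ids is None:
        # Warmup/dummy branch: returns an empty result and raises nothing.
        return []
    return [f"image for {len(token_ids)} tokens"]


def forward_strict(custom_prompt: dict) -> list:
    """Hypothetical guard that turns a stale key into a loud error."""
    if "prompt_ids" in custom_prompt and "prompt_token_ids" not in custom_prompt:
        raise KeyError("legacy key 'prompt_ids' found; expected 'prompt_token_ids'")
    return forward(custom_prompt)


# A producer still writing the old key yields an empty batch without any error.
stale = forward({"prompt_ids": [1, 2, 3]})
# After the rename on both sides, the real branch is taken.
fresh = forward({"prompt_token_ids": [1, 2, 3]})
```

Because `dict.get` returns `None` for a missing key, a one-sided rename looks identical to a warmup call, which is why both sides have to change together.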

princepride · 2026-05-13T12:26:24Z

@zhtmike @SamitHuang PTAL

zhtmike

Looks good, with a few small suggestions.
And one question about the modification to verl_omni/trainer/diffusion/ray_diffusion_trainer.py.

zhtmike · 2026-05-13T12:33:35Z


        extra_fields["raw_prompt"] = kwargs["raw_prompt"]

+        # ``return_attention_mask=True`` is required by token-aware adapters (e.g. BAGEL).


Suggested change

# ``return_attention_mask=True`` is required by token-aware adapters (e.g. BAGEL).

zhtmike · 2026-05-13T12:34:54Z

-    def _compute_reward_colocate(self, batch: DataProto) -> tuple[torch.Tensor, dict[str, Any]] | torch.Tensor:
-        """
-        compute reward use colocate reward model
+    def _compute_reward_colocate(self, batch: DataProto) -> DataProto:
+        """Compute per-sample diffusion reward via the colocated reward loop.
+
+        Bypasses ``RewardLoopManager.compute_rm_score`` (LLM-only: assumes
+        ``responses`` has a token axis and reads ``attention_mask``) and
+        assembles a ``[B, 1]`` ``rm_scores`` tensor directly.
        """
        assert self.reward_loop_manager is not None, "RewardLoopManager is None"
-        batch_reward = self.reward_loop_manager.compute_rm_score(batch)
-        return batch_reward
+        manager = self.reward_loop_manager
+
+        if manager.reward_model_manager is not None:
+            manager.reward_model_manager.wake_up()
+
+        chunks = batch.chunk(len(manager.reward_loop_workers))
+        outputs = ray.get(
+            [
+                worker.compute_score_batch.remote(chunk)
+                for worker, chunk in zip(manager.reward_loop_workers, chunks, strict=True)
+            ]
+        )
+        outputs_flat = [item for sublist in outputs for item in sublist]
+
+        scores = [item["reward_score"] for item in outputs_flat]
+        rm_scores = torch.tensor(scores, dtype=torch.float32).unsqueeze(-1)
+        reward_batch = TensorDict({"rm_scores": rm_scores}, batch_size=len(batch))
+
+        reward_extra_infos = [output.get("reward_extra_info", {}) for output in outputs_flat]
+        reward_extra_keys = list(reward_extra_infos[0].keys()) if reward_extra_infos else []
+        non_tensor_batch = {
+            key: np.array([info[key] for info in reward_extra_infos]) for key in reward_extra_keys
+        }
+
+        if manager.reward_model_manager is not None:
+            manager.reward_model_manager.sleep()
+
+        return DataProto(
+            batch=reward_batch,
+            non_tensor_batch=non_tensor_batch,
+            meta_info={"reward_extra_keys": reward_extra_keys},
+        )


what happens here?
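As a reading aid for the hunk above, the data flow can be sketched in plain Python, with lists in place of tensors and a local function in place of Ray workers. Everything here is illustrative: `chunk`, `compute_score_batch`, and `assemble_rewards` are hypothetical stand-ins, and the even-divisibility check on chunking is an assumption rather than something the diff shows.

```python
def chunk(items: list, n: int) -> list:
    """Split items into n contiguous, equally sized chunks."""
    size, remainder = divmod(len(items), n)
    if remainder:
        raise ValueError("batch size must be divisible by the number of workers")
    return [items[i * size:(i + 1) * size] for i in range(n)]


def compute_score_batch(prompts: list) -> list:
    """Hypothetical worker: returns one result dict per sample."""
    return [
        {"reward_score": float(len(p)), "reward_extra_info": {"prompt_len": len(p)}}
        for p in prompts
    ]


def assemble_rewards(prompts: list, num_workers: int):
    """Fan out per chunk, flatten in order, and build a [B, 1] score column."""
    outputs = [compute_score_batch(c) for c in chunk(prompts, num_workers)]
    flat = [item for sublist in outputs for item in sublist]

    # One score per sample, with a trailing axis of size 1 (shape [B, 1]).
    rm_scores = [[item["reward_score"]] for item in flat]

    # Extra-info keys are taken from the first sample only.
    infos = [item.get("reward_extra_info", {}) for item in flat]
    keys = list(infos[0].keys()) if infos else []
    extras = {key: [info[key] for info in infos] for key in keys}
    return rm_scores, extras


scores, extras = assemble_rewards(["a", "bb", "ccc", "dddd"], num_workers=2)
```

Two properties follow from this shape of code: sample order is preserved because chunks are contiguous and results are collected in worker order, and a sample whose extra info lacks a key present in the first sample would raise `KeyError` in the dictionary comprehension.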

zhtmike · 2026-05-13T12:35:59Z

+        (attention processors, gradient checkpointing, LoRA, dtype upcast)
+        may be silently inactive on the returned module.
+
+        TODO: drop this function once the model is integrated into a


Suggested change

TODO: drop this function once the model is integrated into a

# TODO (princepride): drop this function once the model is integrated into a

zhtmike · 2026-05-13T12:36:27Z

    def __new__(cls, **kwargs):
-        set_death_signal()
-
+        # Do NOT call verl's ``set_death_signal``: ``PR_SET_PDEATHSIG`` is


@knlnguyen1802 please take a look

This is fixed on the vllm-omni main branch.

zhtmike · 2026-05-13T12:37:25Z

+    def _preprocess_engine_kwargs(self, engine_kwargs: dict) -> None:
+        # No-op: ``deploy_config`` is a vllm-omni CLI flag and must reach the parser.
+        return


Suggested change

    def _preprocess_engine_kwargs(self, engine_kwargs: dict) -> None:
        # No-op: ``deploy_config`` is a vllm-omni CLI flag and must reach the parser.
        return

SamitHuang · 2026-05-13T14:11:39Z

It would be better to report the reference performance in the Performance Reference doc.

SamitHuang · 2026-05-13T14:14:44Z

+- Passes the deploy-config YAML to vllm-omni via
+  `+actor_rollout_ref.rollout.engine_kwargs.vllm_omni.deploy_config`. The
+  legacy `stage_configs_path` entrypoint is **not** supported: it routes
+  through vllm-omni 0.20's deprecated stage-args loader, which silently


Should we update the vllm-omni version pin to 0.20 in the installation doc?

Let us do it in a separate PR.

knlnguyen1802 · 2026-05-12T03:21:16Z

+def _to_token_list(token_ids: Any) -> list[int] | None:
+    if token_ids is None:
+        return None
+    if isinstance(token_ids, torch.Tensor):
+        token_ids = token_ids.detach().cpu().tolist()
+    if token_ids and isinstance(token_ids[0], list):
+        token_ids = token_ids[0]
+    return [int(token_id) for token_id in token_ids]
+
+
+def _extract_prompt_text(decoded: str) -> str:
+    if "<|im_start|>" in decoded:
+        user_chunks = []
+        for segment in decoded.split("<|im_start|>"):
+            if not segment.startswith("user"):
+                continue
+            content = segment[len("user") :].lstrip("\n")
+            content = content.split("<|im_end|>", 1)[0]
+            user_chunks.append(content)
+        if user_chunks:
+            decoded = user_chunks[-1]
+
+    for marker in _CHAT_MARKERS:
+        decoded = decoded.replace(marker, "")
+    return decoded.replace("<|im_start|>", "").replace("<|im_end|>", "").strip()
+
+
+def _to_cpu_tensor(v):
+    """Convert to a single CPU tensor, stacking a list of tensors if needed."""
+    if isinstance(v, torch.Tensor):
+        return v.detach().cpu()
+    if isinstance(v, list):
+        tensors = [x.detach().cpu() if isinstance(x, torch.Tensor) else torch.tensor(x) for x in v]
+        return torch.stack(tensors) if tensors else None
+    return v


Please move this into a utils.py file

knlnguyen1802 · 2026-05-15T10:07:02Z

    """

    def __new__(cls, **kwargs):
-        set_death_signal()


It is no longer necessary to remove this, since the bug is fixed in vllm-omni. If the removal is needed for stable runs with vllm-omni 0.20.0, please leave a TODO to add it back later.

Port BAGEL FlowGRPO rollout

e3f8dbf

timzsu requested review from SamitHuang and zhtmike as code owners May 10, 2026 07:42

gemini-code-assist Bot reviewed May 10, 2026

Comment thread verl_omni/pipelines/bagel_flow_grpo/bagel_model.py Outdated

Comment thread verl_omni/pipelines/qwen_image_flow_grpo/vllm_omni_rollout_adapter.py

Comment thread verl_omni/pipelines/bagel_flow_grpo/bagel_model.py

Comment thread examples/flowgrpo_trainer/reward_fn.py Outdated

zhtmike reviewed May 10, 2026

zhtmike requested a review from knlnguyen1802 May 10, 2026 12:11

princepride and others added 10 commits May 11, 2026 09:44

remove profiler related code

6e30553

Signed-off-by: Wang, Zhipeng | RASIA <zhipeng.wang@rakuten.com>

xxx

dfd504f

Signed-off-by: Wang, Zhipeng | RASIA <zhipeng.wang@rakuten.com>

remove duplicate blank line

72b7d23

Signed-off-by: Wang, Zhipeng | RASIA <zhipeng.wang@rakuten.com>

Merge branch 'main' into port-pr5947-bagel

4b04e63

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

add algorithm in decorate

e23ed42

Signed-off-by: Wang, Zhipeng | RASIA <zhipeng.wang@rakuten.com>

xxx

4e30fd0

Signed-off-by: Wang, Zhipeng | RASIA <zhipeng.wang@rakuten.com>

xxx

91811ee

Signed-off-by: Wang, Zhipeng | RASIA <zhipeng.wang@rakuten.com>

fix some bug

196f0f7

Signed-off-by: princepride <wangzhipeng628@gmail.com>

fix some bug

e02f22c

Signed-off-by: princepride <wangzhipeng628@gmail.com>

princepride force-pushed the port-pr5947-bagel branch from 65cd845 to e02f22c Compare May 12, 2026 02:44

zhtmike reviewed May 12, 2026

Comment thread verl_omni/workers/engine/fsdp/diffusers_impl.py

fix some bug

8418b8d

Signed-off-by: princepride <wangzhipeng628@gmail.com>

SamitHuang mentioned this pull request May 12, 2026

[RFC] v0.1 Release Tracker #47

Open

27 tasks

princepride added 5 commits May 13, 2026 06:51

fix some bug

3be3e8f

Signed-off-by: princepride <wangzhipeng628@gmail.com>

simplify func comments

8b9457d

Signed-off-by: princepride <wangzhipeng628@gmail.com>

fix some bug

d8fbb14

Signed-off-by: princepride <wangzhipeng628@gmail.com>

princepride reviewed May 13, 2026

zhtmike reviewed May 13, 2026

SamitHuang reviewed May 13, 2026

knlnguyen1802 reviewed May 13, 2026

knlnguyen1802 reviewed May 15, 2026

SamitHuang mentioned this pull request May 16, 2026

[Bugfix] Enable step-wise execution #81

Open

6 tasks

timzsu commented May 10, 2026 • edited by princepride

5 participants
