
feat: support qwen-omni grpo training recipe #2073

Open
yuekaizhang wants to merge 8 commits into NVIDIA-NeMo:main from yuekaizhang:qwen_omni

Conversation

@yuekaizhang

@yuekaizhang yuekaizhang commented Mar 6, 2026

Conditional PR: NVIDIA-NeMo/Megatron-Bridge#2634, NVIDIA-NeMo/Megatron-Bridge#2342

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features
    • Added audio support for multimodal environments and data processing pipelines.
    • Introduced AISHELL dataset for automatic speech recognition training.
    • Introduced AVQA dataset for audio question-answering fine-tuning.
    • Added example configurations for audio GRPO and audio language model training with Megatron backend.
    • Enhanced multimodal content handling to process audio alongside images and videos.

Signed-off-by: root <zhangyuekai@foxmail.com>
@yuekaizhang yuekaizhang requested review from a team as code owners March 6, 2026 04:41
@copy-pr-bot

copy-pr-bot bot commented Mar 6, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Mar 6, 2026

📝 Walkthrough

Walkthrough

Adds audio training support by introducing AISHELL and AVQA dataset wrappers with audio preprocessing, audio-enabled configuration files for GRPO and SFT training, and extends multimodal data handling across collation, processing, and generation pipelines to support audio modality alongside images and text.

Changes

Cohort / File(s) Summary
Audio Dataset Implementations
nemo_rl/data/datasets/response_datasets/aishell.py, nemo_rl/data/datasets/response_datasets/avqa.py, nemo_rl/data/datasets/response_datasets/__init__.py
Adds AishellDataset and AVQADataset classes with audio resampling, question parsing, and OpenAI-style message formatting. Registers both datasets in DATASET_REGISTRY and exports them via __all__.
Audio Training Configurations
examples/configs/audio_grpo_3B_megatron.yaml, examples/configs/sft_audio_lm_megatron.yaml, examples/configs/sft_openmathinstruct2.yaml
Introduces comprehensive GRPO and SFT configuration files for audio-based training with Megatron backend (Qwen2.5Omni and Qwen2-Audio), plus minor processor specification update to OpenMathInstruct config.
Audio Data Pipeline
nemo_rl/data/collate_fn.py, nemo_rl/data/processors.py, nemo_rl/data/multimodal_utils.py
Extends collation and processing logic to collect and forward vllm_audios; adds audio content handling in vlm_hf_data_processor alongside images/text; includes processor.model_input_names in multimodal key aggregation.
Audio in Generation & Rollouts
nemo_rl/experience/rollouts.py, nemo_rl/models/generation/vllm/utils.py
Propagates vllm_audios through rollout generation and generalizes vLLM multimodal data handling to support both images and audios in a unified multi_modal_data dictionary.
Infrastructure & Utilities
nemo_rl/environments/utils.py, nemo_rl/models/megatron/setup.py, nemo_rl/utils/logger.py, examples/prompts/avqa_cot.txt
Registers "avqa" environment in ENV_REGISTRY; adds VLM wrapper unwrapping for thinker module access in MoE router setup; improves numpy array serialization in JSONL logging; adds empty AVQA prompt template file.
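The dataset-wrapper behavior summarized above (audio resampling plus OpenAI-style message formatting) can be sketched roughly like this. The function names, the 16 kHz target rate, and the exact message schema are illustrative assumptions, not the PR's actual code, which would use a real resampler such as librosa or torchaudio:

```python
# Hypothetical sketch: resample an audio clip to a target rate, then wrap it
# in OpenAI-style chat messages with an audio content part. Illustrative only.

def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler (real code would use librosa/torchaudio)."""
    if src_rate == dst_rate:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        # Map output index i back onto the source sample positions.
        pos = i * (len(samples) - 1) / max(n_out - 1, 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

def format_audio_record(audio, src_rate, question, target_rate=16_000):
    """Build an OpenAI-style message list with an audio part followed by the question text."""
    waveform = resample_linear(audio, src_rate, target_rate)
    return [
        {
            "role": "user",
            "content": [
                {"type": "audio", "audio": waveform, "sampling_rate": target_rate},
                {"type": "text", "text": question},
            ],
        }
    ]
```

A record resampled from 8 kHz to 16 kHz roughly doubles its sample count, and downstream processors can dispatch on the `"type": "audio"` content part the same way they already do for images.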

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

Possibly related PRs

  • PR #2016 — Modifies the same multimodal data-loading and vLLM audio handling codepaths (processors, multimodal_utils, vLLM generation).
  • PR #1649 — Refactors dataset registry and loader interfaces in response_datasets, directly affected by new dataset registrations in this PR.
  • PR #1334 — Both modify vLLM integration code for multimodal handling (generation/vllm modules).

Suggested labels

CI:L1

Suggested reviewers

  • yuki-97
  • terrykong
  • cuichenx
🚥 Pre-merge checks | ✅ 2 passed | ❌ 2 failed

❌ Failed checks (2 warnings)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 57.14%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Results For Major Changes ⚠️ Warning — The PR contains major changes (~666 lines) with new datasets and audio processing, but lacks experiment results, logs, and documentation despite being marked WIP with incomplete TODOs. Resolution: complete comprehensive testing of the new datasets and training recipes, document test results in the PR description, attach experiment logs as planned, and fix identified bugs before merging.

✅ Passed checks (2)

  • Title check ✅ Passed — The title accurately reflects the main purpose of the pull request: adding support for the Qwen-Omni GRPO training recipe with new audio datasets, configurations, and processors.
  • Description check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (3)
examples/prompts/avqa_cot.txt (1)

1-1: Clarify the intent of the empty prompt template.

The file contains only {} which provides no prompt formatting. If this is intentional (e.g., AVQA dataset already contains formatted prompts), consider adding a comment explaining this. If it's a placeholder, the TODO in the PR checklist should track completing it.

📝 Proposed documentation
-{}
+{
+  // Empty template: AVQA dataset messages are pre-formatted.
+  // The user message content is passed through without additional prompt wrapping.
+}

Or if JSON comments aren't supported, create a companion README or use the prompt file itself:

-{}
+{question}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/prompts/avqa_cot.txt` at line 1, The file
examples/prompts/avqa_cot.txt currently contains only "{}", which is ambiguous;
update the file to clarify intent by either replacing "{}" with the intended
prompt template for AVQA chain-of-thought (or a clear placeholder template) or
add a top-line comment explaining that "{}" is intentional because prompts are
provided externally by the AVQA dataset and link to the dataset/source; if this
is a temporary placeholder, add a TODO with an issue/PR reference in the file
(or create a companion README) to indicate who will complete the template and
when.
nemo_rl/models/megatron/setup.py (1)

696-700: Consider adding thinker unwrapping to MoEFloat16Module.re_enable_float32_expert_bias() for consistency.

The freeze_moe_router function now unwraps models with a thinker attribute (line 696-697) before accessing language_model. However, MoEFloat16Module.re_enable_float32_expert_bias() (lines 1051-1054) only checks for language_model:

# Line 1051-1054
if hasattr(module, "language_model"):
    module = module.language_model

If this wrapper is used with Qwen2.5-Omni models, it may fail to properly access the decoder layers.

♻️ Proposed fix for consistency
 def re_enable_float32_expert_bias(self) -> None:
     ...
     module = self.module
+    # Handle VLM models where thinker wraps the language model
+    if hasattr(module, "thinker"):
+        module = module.thinker
     # Handle VLM models where language model is nested
     if hasattr(module, "language_model"):
         module = module.language_model
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_rl/models/megatron/setup.py` around lines 696 - 700, The method
MoEFloat16Module.re_enable_float32_expert_bias currently only unwraps modules
via the language_model attribute but freeze_moe_router also unwraps a thinker
wrapper first; update re_enable_float32_expert_bias to mirror that logic by
checking hasattr(module, "thinker") and setting module = module.thinker before
the existing hasattr(module, "language_model") unwrap so it reliably reaches
module.decoder.layers for wrapped models (e.g., Qwen2.5-Omni).
nemo_rl/data/datasets/response_datasets/avqa.py (1)

103-107: Verify that list rendering for choices is intentional.

_parse_question returns choices as a list (e.g., ["3", "One", "4", "2"]), and DEFAULT_TEMPLATE.format(choices=choices) will render it as "['3', 'One', '4', '2']" in the prompt. This might produce awkward prompts like:

"How many animals...? Please choose from: ['3', 'One', '4', '2']."

Consider formatting choices explicitly:

Suggested fix
+        choices_str = ", ".join(choices) if choices else ""
-        prompt_text = DEFAULT_TEMPLATE.format(question=question, choices=choices)
+        prompt_text = DEFAULT_TEMPLATE.format(question=question, choices=choices_str)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_rl/data/datasets/response_datasets/avqa.py` around lines 103 - 107, The
prompt currently inserts the raw list returned by _parse_question into
DEFAULT_TEMPLATE, producing Python-list style output (e.g., "['3','One',...]");
before formatting the template convert choices into a human-friendly string
(e.g., choices_str = ", ".join(choices) or another desired separator/labeling)
and use that string when building prompt_text (i.e., pass choices=choices_str to
DEFAULT_TEMPLATE.format), keeping the rest of the logic (question replacement
and prompt_text creation) unchanged.
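A minimal repro of the rendering difference the review points out; the template text here is illustrative, not the actual DEFAULT_TEMPLATE:

```python
# Formatting a raw Python list into a template vs. joining it first.
TEMPLATE = "How many animals are in the video? Please choose from: {choices}."

choices = ["3", "One", "4", "2"]

raw = TEMPLATE.format(choices=choices)                 # renders the list's repr
joined = TEMPLATE.format(choices=", ".join(choices))   # human-friendly string

print(raw)     # ...Please choose from: ['3', 'One', '4', '2'].
print(joined)  # ...Please choose from: 3, One, 4, 2.
```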
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/configs/audio_grpo_3B_megatron.yaml`:
- Around line 63-66: The config hardcodes a local path for policy.model_name
which is user-specific; update policy.model_name to a HuggingFace model
identifier or a clear placeholder (e.g., "qwen/qwen-2.5-omni" or
"<HF_MODEL_ID>") and ensure tokenizer.name references the same identifier
(tokenizer.name: ${policy.model_name}) so others can run the example without the
local filesystem path.
- Line 140: The YAML sets converter_type: Qwen2_5OmniForConditionalGeneration
which is unsupported by Megatron-Bridge; update the converter_type entry to a
supported converter (e.g., Qwen2, Qwen2.5, Qwen2.5-VL, or a Qwen3 variant) or
remove the converter_type line and wire in a custom bridge implementation if
Omni (audio/video/speech) support is required; look for the converter_type key
in the file and replace Qwen2_5OmniForConditionalGeneration with the appropriate
supported converter name or add a note to implement a custom Megatron-Bridge
converter for Omni models.

In `@examples/configs/sft_audio_lm_megatron.yaml`:
- Around line 24-26: The config's policy.model_name is set to a user-local path
(/workspace_yuekai/HF/Qwen2-Audio-7B); replace it with a reproducible
HuggingFace model identifier or a clear placeholder (e.g., "Qwen2-Audio-7B" or
"<HF_MODEL_ID>") so other users can run the example, and ensure the
corresponding tokenizer field under policy is set to a matching tokenizer ID or
placeholder as well.

In `@nemo_rl/data/datasets/response_datasets/aishell.py`:
- Line 42: The load_dataset invocation in the constructor incorrectly hardcodes
split="test" and passes the validated split as a positional arg, causing the
user-provided split to be ignored; update the load_dataset call referenced by
self.dataset to use the split variable (e.g., pass split as the keyword
split=split or as the single positional split) and remove the hardcoded
split="test" so the requested split parameter is honored.
- Line 33: vlm_hf_data_processor is missing a handler for task_name "aishell",
causing a ValueError; update the dispatcher in vlm_hf_data_processor (in
nemo_rl/data/processors.py) to add a branch for task_name == "aishell" that
mirrors the AVQA pass-through behavior (i.e., return the input examples/records
unchanged or call the same helper used by AVQA), referencing the task_name
"aishell" string and the vlm_hf_data_processor function name so the aishell
dataset in nemo_rl/data/datasets/response_datasets/aishell.py is processed
without error.

In `@nemo_rl/data/datasets/response_datasets/avqa.py`:
- Line 84: Replace the hardcoded path passed to load_dataset with a configurable
parameter: accept a data_path (or dataset_id) from the constructor kwargs or
config, default to a public HuggingFace dataset identifier if not provided, and
use that value when calling load_dataset to set self.dataset; update the
constructor signature and any callers to forward data_path and ensure the code
uses load_dataset(data_path_or_id, split=split) instead of the
developer-specific "/workspace_yuekai/HF/avqa-processed".


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b678d307-04c5-4bc1-8c8e-4d1cd5f8e056

📥 Commits

Reviewing files that changed from the base of the PR and between c4f8e1c and ad1c0b6.

📒 Files selected for processing (15)
  • examples/configs/audio_grpo_3B_megatron.yaml
  • examples/configs/sft_audio_lm_megatron.yaml
  • examples/configs/sft_openmathinstruct2.yaml
  • examples/prompts/avqa_cot.txt
  • nemo_rl/data/collate_fn.py
  • nemo_rl/data/datasets/response_datasets/__init__.py
  • nemo_rl/data/datasets/response_datasets/aishell.py
  • nemo_rl/data/datasets/response_datasets/avqa.py
  • nemo_rl/data/multimodal_utils.py
  • nemo_rl/data/processors.py
  • nemo_rl/environments/utils.py
  • nemo_rl/experience/rollouts.py
  • nemo_rl/models/generation/vllm/utils.py
  • nemo_rl/models/megatron/setup.py
  • nemo_rl/utils/logger.py

@yuekaizhang yuekaizhang changed the title from "[WIP] support qwen-omni grpo training recipe" to "feat: support qwen-omni grpo training recipe" on Mar 10, 2026
@yuekaizhang
Author

yuekaizhang commented Mar 10, 2026

@snowmanwwg Hi, I was wondering if you know someone who could help review this PR. Many thanks!

I have verified the PR with the below training results:

Model | MMAU (v05.15.25)
Qwen2.5-Omni-3B | 69.8
+ HF GRPO | 71.6
+ Nemo-RL GRPO (this PR) | 72.1

@yuekaizhang yuekaizhang requested a review from a team as a code owner March 24, 2026 07:21
@yuekaizhang yuekaizhang requested a review from a team as a code owner March 24, 2026 08:05
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Mar 24, 2026
@yuekaizhang
Author

Hi @snowmanwwg @sharonyu-115

I have added some details about training and evaluation. You can find them at https://github.com/yuekaizhang/RL/blob/qwen_omni/docs/guides/grpo-audio.md, which can serve as an entry point for code review.

Contributor

@yuki-97 yuki-97 left a comment


@yuekaizhang thanks for supporting this! I left some comments; could you also add some tests to guard the new features?

  1. some unit tests for the newly added datasets.
  2. a functional test (e2e run); you can refer to tests/functional/vlm_grpo.sh and tests/functional/eval.sh respectively, and add them to tests/functional/L1_Functional_Tests_GPU.sh.

task_data_processors=base_dataset.processor,
max_seq_length=data_config["max_input_seq_length"],
)
env = VLMEnvironment.options(
Contributor


wdyt about using create_env to create the env?
You can refer to

RL/nemo_rl/data/utils.py

Lines 81 to 87 in b0da493

env_name_list = extract_necessary_env_names(data_config)
envs = {}
for env_name in env_name_list:
    registered_env_name = "vlm" if is_vlm else env_name
    envs[env_name] = create_env(
        env_name=registered_env_name, env_config=env_configs[env_name]
    )
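For context, the create_env pattern referenced here boils down to a name-to-class registry lookup. This is a minimal standalone sketch with made-up environment classes and config values, not NeMo-RL's actual implementation:

```python
# Hypothetical registry-based env factory in the spirit of create_env.
# Class names, registry contents, and config are illustrative assumptions.

class MathEnvironment:
    def __init__(self, config):
        self.config = config

class VLMEnvironment:
    def __init__(self, config):
        self.config = config

ENV_REGISTRY = {
    "math": MathEnvironment,
    "vlm": VLMEnvironment,
}

def create_env(env_name, env_config):
    """Look up an environment class by name and instantiate it with its config."""
    if env_name not in ENV_REGISTRY:
        raise ValueError(f"Unknown environment: {env_name!r}")
    return ENV_REGISTRY[env_name](env_config)

# One codepath for both modalities: pick the registered name, no if/else fork
# between multimodal and text-only construction.
is_vlm = True
registered_env_name = "vlm" if is_vlm else "math"
env = create_env(registered_env_name, {"num_workers": 1})
```

The factory centralizes construction, so adding a new environment (e.g. an audio one) only requires a registry entry rather than another branch at every call site.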

)
else:
# Original text-only path
env = MathEnvironment.options(
Contributor


Use create_env here as well; after changing to create_env, is it possible to remove the if/else for multi-modal vs. text-only?



def setup_data(tokenizer: AutoTokenizer, data_config, env_configs):
def _is_multimodal_dataset(dataset_name):
Contributor


wdyt about moving this to nemo_rl/data/datasets/eval_datasets/__init__.py, and MULTIMODAL_DATASETS as well?

runtime_env={
"py_executable": get_actor_python_env(
"nemo_rl.environments.math_environment.MathEnvironment"
is_multimodal = isinstance(base_dataset, MMAUDataset)
Contributor


How about using _is_multimodal_dataset, so that this won't break when new datasets are added?

Suggested change
is_multimodal = isinstance(base_dataset, MMAUDataset)
is_multimodal = _is_multimodal_dataset(data_config["dataset_name"])

)

self.preprocessor = self.format_data
self.val_dataset = None
Contributor


self.val_dataset is only useful when using split_validation_size.
you can refer to

RL/docs/guides/grpo.md

Lines 125 to 132 in b0da493

We support using a single dataset for both train and validation by using `split_validation_size` to set the validation ratio.
[OpenAssistant](../../nemo_rl/data/datasets/response_datasets/oasst.py), [OpenMathInstruct-2](../../nemo_rl/data/datasets/response_datasets/openmathinstruct2.py), [ResponseDataset](../../nemo_rl/data/datasets/response_datasets/response_dataset.py), [Tulu3SftMixtureDataset](../../nemo_rl/data/datasets/response_datasets/tulu3.py) are supported for this feature.
If you want to support this feature for your custom datasets or other built-in datasets, you can simply add the code to the dataset like [ResponseDataset](../../nemo_rl/data/datasets/response_datasets/response_dataset.py).
```python
# `self.val_dataset` is used (not None) only when current dataset is used for both training and validation
self.val_dataset = None
self.split_train_validation(split_validation_size, seed)
```

If you want to support split_validation_size in this dataset, you can follow the guide to add something like the snippet below. Otherwise, we can just remove self.val_dataset = None.

# `self.val_dataset` is used (not None) only when current dataset is used for both training and validation
self.val_dataset = None
self.split_train_validation(split_validation_size, seed)

datum_dict = format_geometry3k_dataset(datum_dict)
elif datum_dict["task_name"] == "avqa":
pass # AVQA data is already formatted by AVQADataset.format_data
elif datum_dict["task_name"] == "aishell":
Contributor


I didn't see this dataset, is it not uploaded?

"vlm": {
"actor_class_fqn": "nemo_rl.environments.vlm_environment.VLMEnvironment",
},
"avqa": {
Contributor


If we change to create_env, can we just use "vlm" and avoid adding this one?

Comment on lines +358 to +361
content = [message["content"] for message in message_log]
content = "\n".join(content)
prompts.append(content)
prompts_for_display = prompts
Contributor


These lines are exactly the same as lines 351~354.

I think we can just remove lines 325~327 and lines 355~361, and change lines 332~333 to the below; then we'll have the same logic.

if is_multimodal and batch["vllm_content"][i] is not None:

Comment on lines +1005 to +1007
elif hasattr(value, "tolist"): # numpy arrays
sample[key] = value.tolist()
f.write(json.dumps({**sample, "idx": i}, default=str) + "\n")
Contributor


Can we use isinstance to narrow this down?
Also, just curious: what's default=str here for?
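For illustration, a minimal sketch of both points, using a stand-in class instead of numpy so it is self-contained; the class and helper names are hypothetical:

```python
import json

# 1) isinstance narrows more precisely than duck-typed hasattr(value, "tolist"),
#    which would also match e.g. torch tensors or any object defining tolist().
# 2) json.dumps(default=str) is a fallback: any value json can't serialize is
#    passed through str() instead of raising TypeError.

class FakeArray:  # stand-in for numpy.ndarray so the sketch has no numpy dependency
    def __init__(self, data):
        self.data = data

    def tolist(self):
        return list(self.data)

def to_jsonable(value):
    if isinstance(value, FakeArray):  # narrow by concrete type, not by hasattr
        return value.tolist()
    return value

sample = {"scores": to_jsonable(FakeArray([1, 2, 3])), "run": object()}
line = json.dumps(sample, default=str)  # object() falls back to its str() repr
```

Here the "scores" value is converted up front by the isinstance check, while the non-serializable object() is only rescued at dump time by default=str.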

@@ -0,0 +1,259 @@
grpo:
Contributor


Can you inherit this from the base config using defaults: "grpo_math_1B.yaml"? That way, if we add a feature later, we don't need to update this config.
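A sketch of what such inheritance could look like, assuming the defaults key behaves as in other NeMo-RL example configs; the override keys and values shown are illustrative, not the full audio config:

```yaml
# Inherit the shared GRPO settings and override only audio-specific fields.
defaults: grpo_math_1B.yaml

policy:
  model_name: Qwen/Qwen2.5-Omni-3B

data:
  dataset_name: avqa
```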

@yuki-97
Contributor

yuki-97 commented Mar 25, 2026

@yfw can you take a review as well?

@yuki-97 yuki-97 requested a review from yfw March 25, 2026 16:02

Labels

community-request, documentation (Improvements or additions to documentation)

3 participants