
feat: refactor common data utilities of dtensor policy v2#1710

Merged
yuki-97 merged 7 commits into main from hemil/automodel-data-refactor
Jan 28, 2026

Conversation

@hemildesai
Contributor

@hemildesai hemildesai commented Jan 5, 2026

Depends on #1709

Nightly runs:

Issues

#1589

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features

    • Enhanced data processing pipeline supporting sequence packing, flash attention optimization, and multimodal input handling for improved training efficiency.
    • Improved distributed training support with context-parallel and data-parallel enhancements.
  • Tests

    • Added comprehensive unit tests for data processing workflows.
  • Chores

    • Refactored internal data handling to streamline microbatch processing across training operations.


@hemildesai hemildesai mentioned this pull request Jan 5, 2026
3 tasks
@github-actions

github-actions bot commented Jan 5, 2026

⚠️ File Consistency Check

Check based on commit: 0dfbb1e (PR #1710 from hemil/automodel-data-refactor)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@hemildesai hemildesai force-pushed the hemil/automodel-init-refactor branch from 4f66b8f to 81174e5 Compare January 9, 2026 14:03
@hemildesai hemildesai force-pushed the hemil/automodel-data-refactor branch from 0dfbb1e to 2959be7 Compare January 9, 2026 22:25
@github-actions

github-actions bot commented Jan 9, 2026

⚠️ File Consistency Check (same warning as above, for commit 2959be7)

Base automatically changed from hemil/automodel-init-refactor to main January 20, 2026 06:56
@hemildesai hemildesai force-pushed the hemil/automodel-data-refactor branch from 2959be7 to be28e23 Compare January 20, 2026 20:10
@github-actions

⚠️ File Consistency Check (same warning as above, for commit be28e23)

@github-actions

⚠️ File Consistency Check (same warning as above, for commit a6c1a82)

@hemildesai hemildesai marked this pull request as ready for review January 20, 2026 21:02
@hemildesai hemildesai requested review from a team as code owners January 20, 2026 21:02
@hemildesai hemildesai added the CI:L0 Run doctests and unit tests label Jan 20, 2026
@coderabbitai
Contributor

coderabbitai bot commented Jan 20, 2026

📝 Walkthrough

Walkthrough

Introduces a new data processing module for automodel with dataclasses (ProcessedInputs, ProcessedMicrobatch) and utility functions to handle microbatch iteration, sequence packing, context parallelism, and distributed training. Refactors dtensor_policy_worker_v2.py to use the centralized processing pipeline. Adds comprehensive test coverage for the new data utilities.

Changes

  • New Data Processing Module — nemo_rl/models/automodel/data.py: Introduces ProcessedInputs and ProcessedMicrobatch dataclasses with properties for context parallelism, flash attention, and multimodal detection. Adds make_processed_microbatch_iterator(), get_microbatch_iterator(), process_microbatch(), and process_global_batch() functions to handle batching, sequence packing, distributed coordination via all_reduce, and flash attention configuration.
  • Policy Worker Refactoring — nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py: Replaces manual per-batch microbatching logic in train, get_logprobs, score, and get_topk_logits with centralized get_microbatch_iterator and make_processed_microbatch_iterator workflows. Integrates process_global_batch for distributed normalization factors. Extracts input preparation from processed microbatches instead of constructing inputs inline. Simplifies forward-path handling for packing and context parallel.
  • Test Coverage — tests/unit/models/automodel/test_automodel_data.py: Comprehensive unit tests for microbatch iterator creation, process_microbatch with packing/multimodal/context-parallel scenarios, data structure validation, process_global_batch logic, and end-to-end integration flows with mocked distributed operations.
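These utilities are internal to nemo_rl, but the overall shape of the pipeline the walkthrough describes can be sketched in a few lines. The sketch below is purely illustrative: only the class and function names come from this PR; all fields, signatures, and logic are simplifying assumptions, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class ProcessedInputs:
    # Illustrative subset of fields; the real dataclass also carries
    # context-parallel buffers, flash-attention kwargs, multimodal inputs, etc.
    input_ids: List[List[int]]
    attention_mask: List[List[int]]
    packed: bool = False

@dataclass
class ProcessedMicrobatch:
    raw_batch: Dict[str, Any]
    inputs: ProcessedInputs

def process_microbatch(mb: Dict[str, Any], enable_packing: bool = False) -> ProcessedMicrobatch:
    """Turn one raw microbatch dict into model-ready inputs (sketch)."""
    input_ids = mb["input_ids"]
    mask = [[1] * len(seq) for seq in input_ids]
    if enable_packing:
        # Sequence packing concatenates all sequences into a single row.
        input_ids = [[tok for seq in input_ids for tok in seq]]
        mask = [[1] * len(input_ids[0])]
    return ProcessedMicrobatch(raw_batch=mb, inputs=ProcessedInputs(input_ids, mask, enable_packing))

def make_processed_microbatch_iterator(raw_iter, enable_packing=False):
    """Lazily wrap each raw microbatch in a ProcessedMicrobatch."""
    for mb in raw_iter:
        yield process_microbatch(mb, enable_packing)
```

The point of the pattern is that the policy worker's train/logprob/score paths can all consume the same iterator of ProcessedMicrobatch objects instead of each re-implementing packing and input preparation inline.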

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Iterator as Raw Iterator
    participant MBIter as Microbatch Iterator
    participant Proc as process_microbatch()
    participant Tokenizer as Tokenizer
    participant DPMesh as DP Mesh (all_reduce)
    participant Inputs as ProcessedInputs

    Iterator->>MBIter: raw BatchedDataDict
    loop per item in iterator
        MBIter->>Proc: call with BatchedDataDict
        Proc->>Tokenizer: tokenize/prepare
        alt Sequence Packing Enabled
            Proc->>Proc: pack_sequences()
            Proc->>Proc: get_flash_attention_kwargs()
        end
        alt Context Parallel (cp_size > 1)
            Proc->>Proc: construct cp_buffers<br/>and seq_index
        end
        Proc->>Inputs: create ProcessedInputs
        MBIter->>MBIter: wrap in ProcessedMicrobatch
        MBIter->>Iterator: yield ProcessedMicrobatch
    end

    Iterator->>DPMesh: process_global_batch()<br/>extracts batch
    DPMesh->>DPMesh: all_reduce(global_valid_seqs,<br/>global_valid_toks)
```
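The final step in the diagram computes global normalization factors (valid-sequence and valid-token counts) across the data-parallel group. A stand-in for that reduction is sketched below; the real worker performs it with a torch.distributed all_reduce over the DP mesh, and here the collective is simulated as a plain sum over per-rank values, so the function name and data layout are assumptions for illustration only.

```python
def process_global_batch_counts(per_rank_batches):
    """Simulate the all_reduce of valid-sequence / valid-token counts.

    Each element of per_rank_batches is one DP rank's local batch:
    a list of (seq_len, num_valid_tokens) pairs. In the real worker this
    reduction is a torch.distributed.all_reduce over the DP mesh.
    """
    local_stats = []
    for batch in per_rank_batches:
        valid_seqs = sum(1 for _, valid in batch if valid > 0)
        valid_toks = sum(valid for _, valid in batch)
        local_stats.append((valid_seqs, valid_toks))
    # "all_reduce" with SUM: every rank ends up with the same global totals,
    # so all ranks normalize their losses by identical factors.
    global_valid_seqs = sum(s for s, _ in local_stats)
    global_valid_toks = sum(t for _, t in local_stats)
    return global_valid_seqs, global_valid_toks
```

Normalizing by globally reduced counts rather than per-rank counts keeps loss scaling consistent when ranks hold different numbers of valid tokens.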

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • NVIDIA-NeMo/RL#1709: Refactors automodel initialization and setup in the same dtensor_policy_worker_v2 module alongside this PR's data processing pipeline changes.

Suggested labels

CI:L1

Suggested reviewers

  • terrykong
  • yuki-97
  • yfw
🚥 Pre-merge checks: 2 passed, 2 failed

❌ Failed checks (2 warnings)

  • Docstring Coverage ⚠️ — Docstring coverage is 38.30%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Test Results For Major Changes ⚠️ — The PR makes major changes (new APIs, significant refactoring) but the description lacks test results, performance analysis, and convergence validation, and has an empty Usage section despite 1028 lines of tests added. Resolution: update the PR description to include unit test results, functional test validation, performance impact analysis with before/after numbers, and convergence validation demonstrating no numerical regression.

✅ Passed checks (2 passed)

  • Title check ✅ — The title accurately describes the main refactoring: extracting common data utilities from dtensor policy v2 into a dedicated module with structured data processing helpers.
  • Description check ✅ — Skipped because CodeRabbit's high-level summary is enabled.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py (1)

492-503: Avoid leaking metrics from dummy microbatches.
num_valid_samples is only set for real microbatches; dummy iterations can reuse the prior value and append duplicate metrics. Explicitly zero it (or guard the append) for dummy batches.

🐛 Suggested fix

```diff
-                        else:
-                            loss *= 0
+                        else:
+                            loss *= 0
+                            num_valid_samples = 0
```
🤖 Fix all issues with AI agents
In `@nemo_rl/models/automodel/data.py`:
- Around line 1-13: Update the file header year from 2025 to 2026 in the NVIDIA
copyright block at the top of nemo_rl/models/automodel/data.py; locate the
existing copyright comment starting with "# Copyright (c) 2025, NVIDIA
CORPORATION." and change "2025" to "2026" so the header reflects the current
year.

In `@nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py`:
- Around line 639-651: The unpacking of packed logprobs uses input_ids.shape[0]
(which is 1 when packed) and thus allocates/iterates the wrong batch size;
instead read the original batch size from the batch metadata stored in lp_batch
(e.g., lp_batch.original_batch_size or lp_batch['original_batch_size']) and use
that value when allocating unpacked_logprobs and for the unpacking loop/range in
the unpacking logic (the code that creates unpacked_logprobs and iterates over
batch indices). Apply the same change to the other unpacking site referenced
(the block around the unpacking logic in the later section).
- Around line 969-973: In get_topk_logits, skip dummy microbatches by checking
batch_idx against iterator_len inside the loop over processed_iterator—after
obtaining batch_idx and processed_mb (and before using
lp_batch/processed_inputs/input_lengths), add a guard like if batch_idx >=
iterator_len: break (or continue if appropriate) so dummy microbatches do not
produce extra outputs; update any related logic in get_topk_logits to rely on
iterator_len instead of processing all entries from processed_iterator.
- Around line 866-870: In the score() loop inside dtensor_policy_worker_v2.py,
skip any dummy microbatches appended by sequence packing by checking the batch
index against iterator_len; inside the for batch_idx, processed_mb in
enumerate(processed_iterator) loop (which updates step and collects
all_rm_scores), add a guard like if batch_idx >= iterator_len: break or continue
so you do not process or append scores for padded/dummy batches (use the
existing batch_idx, iterator_len, processed_mb, all_rm_scores, step symbols to
locate the change).

In `@tests/unit/models/automodel/test_automodel_data.py`:
- Around line 70-125: Tests contain unused variables/fixtures (e.g.,
mb_iterator, dummy_iterator, mock_tokenizer, *args/**kwargs) that trigger Ruff
ARG001/ARG002/RUF059; update the tests (including functions like
test_dynamic_batching and usages around get_microbatch_iterator,
BatchedDataDict, make_microbatch_iterator_with_dynamic_shapes,
get_microbatch_iterator_dynamic_shapes_len) by renaming intentionally unused
locals/fixtures with a leading underscore (e.g., _mb_iterator, _dummy_iterator,
_mock_tokenizer or _args/_kwargs) or, where truly intentional, annotate the
binding with a trailing "# noqa" to suppress the linter; then re-run Ruff to
confirm all reported warnings (including the other listed test regions) are
resolved.
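Two of the agent prompts above (for score and get_topk_logits) ask for the same guard: stop producing outputs once batch_idx reaches iterator_len, so that dummy microbatches appended for sequence packing do not contribute extra results. A minimal sketch of that pattern follows; the loop body and helper names other than batch_idx/iterator_len/processed_mb are hypothetical.

```python
def collect_outputs(processed_iterator, iterator_len, run_model):
    """Consume only the real microbatches, skipping trailing dummy ones."""
    outputs = []
    for batch_idx, processed_mb in enumerate(processed_iterator):
        if batch_idx >= iterator_len:
            # Dummy microbatches pad the iterator for sequence packing;
            # they must not append to the returned outputs.
            break
        outputs.append(run_model(processed_mb))
    return outputs
```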
🧹 Nitpick comments (1)
nemo_rl/models/automodel/data.py (1)

217-220: Track the sequence‑packing workaround.
The TODO indicates a known workaround for min_seq_len; consider linking it to an issue or follow‑up ticket so it doesn’t linger.

If you want, I can help draft a follow-up issue or propose a replacement implementation.

@hemildesai hemildesai added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Jan 20, 2026
@github-actions

⚠️ File Consistency Check (same warning as above, for commit ef3a135)

@github-actions

⚠️ File Consistency Check (same warning as above, for commit 0a6c4c6)

@hemildesai hemildesai added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Jan 22, 2026
Collaborator

@terrykong terrykong left a comment


Looks good to me, small comments. Like the first PR, can you run the dtensor v2 nightlies to make sure this change hasn't broken anything?

Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
@hemildesai hemildesai force-pushed the hemil/automodel-data-refactor branch from 1ff4e30 to aba0a9d Compare January 27, 2026 20:23
@github-actions

⚠️ File Consistency Check (same warning as above, for commit aba0a9d)

@hemildesai hemildesai added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Jan 27, 2026
Signed-off-by: Hemil Desai <hemild@nvidia.com>
@github-actions

⚠️ File Consistency Check (same warning as above, for commit c8a01c9)

@hemildesai hemildesai added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Jan 28, 2026
@yuki-97 yuki-97 enabled auto-merge (squash) January 28, 2026 06:18
@yuki-97 yuki-97 merged commit 4c30b9d into main Jan 28, 2026
42 of 43 checks passed
@yuki-97 yuki-97 deleted the hemil/automodel-data-refactor branch January 28, 2026 06:42
hijkzzz pushed a commit to hijkzzz/RL that referenced this pull request Jan 30, 2026
…o#1710)

Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: jianh <jianh@nvidia.com>
xavier-owkin pushed a commit to owkin/Owkin-NeMo-RL that referenced this pull request Feb 10, 2026
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 12, 2026
…o#1710)

Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
…o#1710)

Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
…o#1710)

Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: Hemil Desai <hemild@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: Hemil Desai <hemild@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 9, 2026
Signed-off-by: Hemil Desai <hemild@nvidia.com>

Labels

CI:L1 Run doctests, unit tests, and functional tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants