feat: refactor common data utilities of dtensor policy v2#1710
📝 Walkthrough

Introduces a new data processing module for automodel with dataclasses (ProcessedInputs, ProcessedMicrobatch) that encapsulate microbatch preparation for the dtensor policy worker v2.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Iterator as Raw Iterator
    participant MBIter as Microbatch Iterator
    participant Proc as process_microbatch()
    participant Tokenizer as Tokenizer
    participant DPMesh as DP Mesh (all_reduce)
    participant Inputs as ProcessedInputs
    Iterator->>MBIter: raw BatchedDataDict
    loop per item in iterator
        MBIter->>Proc: call with BatchedDataDict
        Proc->>Tokenizer: tokenize/prepare
        alt Sequence Packing Enabled
            Proc->>Proc: pack_sequences()
            Proc->>Proc: get_flash_attention_kwargs()
        end
        alt Context Parallel (cp_size > 1)
            Proc->>Proc: construct cp_buffers<br/>and seq_index
        end
        Proc->>Inputs: create ProcessedInputs
        MBIter->>MBIter: wrap in ProcessedMicrobatch
        MBIter->>Iterator: yield ProcessedMicrobatch
    end
    Iterator->>DPMesh: process_global_batch()<br/>extracts batch
    DPMesh->>DPMesh: all_reduce(global_valid_seqs,<br/>global_valid_toks)
```
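The flow above can be sketched as a generator. This is a minimal illustration, not the real module: the names ProcessedInputs, ProcessedMicrobatch, and process_microbatch come from the diagram, while the field choices and keyword arguments are assumptions for demonstration.

```python
from dataclasses import dataclass
from typing import Any, Iterator


@dataclass
class ProcessedInputs:
    # Tokenized tensors plus optional packing / context-parallel metadata.
    input_ids: list
    packed: bool = False
    cp_buffers: Any = None


@dataclass
class ProcessedMicrobatch:
    raw: dict
    processed_inputs: ProcessedInputs


def process_microbatch(batch: dict, pack_sequences: bool = False,
                       cp_size: int = 1) -> ProcessedInputs:
    # Stand-in for tokenize/prepare -> optional packing -> optional CP buffers.
    inputs = ProcessedInputs(input_ids=batch["input_ids"], packed=pack_sequences)
    if cp_size > 1:
        inputs.cp_buffers = []  # real code would build per-rank buffers + seq_index
    return inputs


def microbatch_iterator(raw_iter, **kwargs) -> Iterator[ProcessedMicrobatch]:
    # Lazily wraps each raw microbatch in a ProcessedMicrobatch.
    for batch in raw_iter:
        yield ProcessedMicrobatch(
            raw=batch,
            processed_inputs=process_microbatch(batch, **kwargs),
        )
```

The generator shape keeps processing lazy, so dynamic batching upstream can decide microbatch boundaries before any tokenization work happens.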
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks (✅ 2 | ❌ 2)
❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py (1)
492-503: Avoid leaking metrics from dummy microbatches.
num_valid_samples is only set for real microbatches; dummy iterations can reuse the prior value and append duplicate metrics. Explicitly zero it (or guard the append) for dummy batches.
🐛 Suggested fix
```diff
-        else:
-            loss *= 0
+        else:
+            loss *= 0
+            num_valid_samples = 0
```
🤖 Fix all issues with AI agents
In `@nemo_rl/models/automodel/data.py`:
- Around line 1-13: Update the file header year from 2025 to 2026 in the NVIDIA
copyright block at the top of nemo_rl/models/automodel/data.py; locate the
existing copyright comment starting with "# Copyright (c) 2025, NVIDIA
CORPORATION." and change "2025" to "2026" so the header reflects the current
year.
In `@nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py`:
- Around line 639-651: The unpacking of packed logprobs uses input_ids.shape[0]
(which is 1 when packed) and thus allocates/iterates the wrong batch size;
instead read the original batch size from the batch metadata stored in lp_batch
(e.g., lp_batch.original_batch_size or lp_batch['original_batch_size']) and use
that value when allocating unpacked_logprobs and for the unpacking loop/range in
the unpacking logic (the code that creates unpacked_logprobs and iterates over
batch indices). Apply the same change to the other unpacking site referenced
(the block around the unpacking logic in the later section).
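A minimal sketch of the suggested change. The metadata key original_batch_size is taken from the comment above; the function name, padding scheme, and use of plain lists are illustrative assumptions, not the worker's actual implementation.

```python
def unpack_logprobs(packed_logprobs, lp_batch, input_lengths, max_seq_len, pad=0.0):
    # When sequences are packed, input_ids.shape[0] == 1, so the original
    # (pre-packing) batch size must come from the batch metadata instead of
    # the packed tensor's leading dimension.
    batch_size = lp_batch["original_batch_size"]  # not input_ids.shape[0]
    unpacked, offset = [], 0
    for i in range(batch_size):
        seq_len = input_lengths[i]
        row = list(packed_logprobs[offset:offset + seq_len])
        unpacked.append(row + [pad] * (max_seq_len - seq_len))
        offset += seq_len
    return unpacked
```

The same substitution applies at both unpacking sites the comment references: allocate with the metadata batch size and iterate over that range.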
- Around line 969-973: In get_topk_logits, skip dummy microbatches by checking
batch_idx against iterator_len inside the loop over processed_iterator—after
obtaining batch_idx and processed_mb (and before using
lp_batch/processed_inputs/input_lengths), add a guard like if batch_idx >=
iterator_len: break (or continue if appropriate) so dummy microbatches do not
produce extra outputs; update any related logic in get_topk_logits to rely on
iterator_len instead of processing all entries from processed_iterator.
- Around line 866-870: In the score() loop inside dtensor_policy_worker_v2.py,
skip any dummy microbatches appended by sequence packing by checking the batch
index against iterator_len; inside the for batch_idx, processed_mb in
enumerate(processed_iterator) loop (which updates step and collects
all_rm_scores), add a guard like if batch_idx >= iterator_len: break or continue
so you do not process or append scores for padded/dummy batches (use the
existing batch_idx, iterator_len, processed_mb, all_rm_scores, step symbols to
locate the change).
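Both of the guards above reduce to the same pattern: stop collecting outputs once batch_idx reaches the count of real microbatches. The names iterator_len, batch_idx, and processed_mb come from the comments; the helper below is a hypothetical sketch of that pattern, not the worker's code.

```python
def collect_real_outputs(processed_iterator, iterator_len, score_fn):
    # Sequence packing may append dummy microbatches so every rank runs the
    # same number of steps; skip them when collecting per-batch outputs so
    # padded batches never contribute scores or logits.
    outputs = []
    for batch_idx, processed_mb in enumerate(processed_iterator):
        if batch_idx >= iterator_len:
            break  # remaining entries are padding, not real data
        outputs.append(score_fn(processed_mb))
    return outputs
```

Whether break or continue is correct depends on whether the dummy batches must still execute a forward pass for collective ops to stay in sync, as the comments note.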
In `@tests/unit/models/automodel/test_automodel_data.py`:
- Around line 70-125: Tests contain unused variables/fixtures (e.g.,
mb_iterator, dummy_iterator, mock_tokenizer, *args/**kwargs) that trigger Ruff
ARG001/ARG002/RUF059; update the tests (including functions like
test_dynamic_batching and usages around get_microbatch_iterator,
BatchedDataDict, make_microbatch_iterator_with_dynamic_shapes,
get_microbatch_iterator_dynamic_shapes_len) by renaming intentionally unused
locals/fixtures with a leading underscore (e.g., _mb_iterator, _dummy_iterator,
_mock_tokenizer or _args/_kwargs) or, where truly intentional, annotate the
binding with a trailing "# noqa" to suppress the linter; then re-run Ruff to
confirm all reported warnings (including the other listed test regions) are
resolved.
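The rename is mechanical; a hedged before/after sketch (the test body is illustrative, and only the leading-underscore convention is the point):

```python
# Before: Ruff RUF059 flags unused unpacking targets, e.g.
#   seq_lens, mb_iterator = make_microbatch_iterator_with_dynamic_shapes(batch)
# After: a leading underscore marks the binding as intentionally unused.
def test_dynamic_batching_sketch():
    batch = list(range(8))  # stand-in for a BatchedDataDict
    seq_lens, _mb_iterator = len(batch), iter(batch)
    assert seq_lens == 8  # only seq_lens is asserted on; _mb_iterator is ignored
```

The underscore prefix silences ARG001/ARG002/RUF059 without a noqa comment, which keeps the suppression visible at the binding site.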
🧹 Nitpick comments (1)
nemo_rl/models/automodel/data.py (1)
217-220: Track the sequence-packing workaround.
The TODO indicates a known workaround for min_seq_len; consider linking it to an issue or follow-up ticket so it doesn't linger. If you want, I can help draft a follow-up issue or propose a replacement implementation.
terrykong
left a comment
Looks good to me, small comments. Like the first PR, can you run the dtensor v2 nightlies to make sure this change hasn't broken anything?
…o#1710) Signed-off-by: Hemil Desai <hemild@nvidia.com> Signed-off-by: jianh <jianh@nvidia.com>
…o#1710) Signed-off-by: Hemil Desai <hemild@nvidia.com>
…o#1710) Signed-off-by: Hemil Desai <hemild@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Depends on #1709
Nightly runs:
Issues
#1589
Usage

# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit
New Features
Tests
Chores