feat: Add chunked linear ce loss function from hidden states #2036

yuki-97 merged 28 commits into NVIDIA-NeMo:main from
Conversation
Signed-off-by: pengdurice <pengduhit@gmail.com>
📝 Walkthrough

This PR introduces a linear cross-entropy fusion loss mechanism for efficient distributed training.
Sequence Diagram(s)

sequenceDiagram
participant Training as Training Loop
participant GPTModel as GPT Model<br/>(Patched Forward)
participant DistLogprobs as ChunkedDistributedHiddenStatesToLogprobs
participant LossFn as NLLLinearCEFusionLoss
participant Wrapper as SequencePackingNLLLinearCEFusionLossWrapper
Training->>GPTModel: forward(input_ids, labels,<br/>return_logprobs_for_linear_ce_fusion=True)
activate GPTModel
GPTModel->>DistLogprobs: from_parallel_hidden_states_to_logprobs<br/>(tensor_parallel_hidden_states, target, ...)
activate DistLogprobs
DistLogprobs->>DistLogprobs: Chunked distributed log-softmax<br/>with gather across TP ranks
DistLogprobs-->>GPTModel: token_logprobs
deactivate DistLogprobs
GPTModel-->>Training: logprobs (instead of loss)
deactivate GPTModel
Training->>Wrapper: forward(logprobs, labels, ...)
activate Wrapper
Wrapper->>Wrapper: Iterate over packed sequences,<br/>unpad per-sequence data
Wrapper->>LossFn: forward(unpadded_logprobs,<br/>unpadded_labels, ...)
activate LossFn
LossFn->>LossFn: Compute token-level NLL loss
LossFn-->>Wrapper: per-batch loss + metrics
deactivate LossFn
Wrapper->>Wrapper: Accumulate losses<br/>across sequences
Wrapper-->>Training: final_loss, metrics
deactivate Wrapper
Training->>Training: Backward pass with accumulated loss
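The chunked computation at the heart of this flow can be sketched in plain PyTorch. This is a single-GPU illustration only, not the PR's actual API: there is no tensor parallelism here, and the function and argument names are invented for the example. Each chunk of hidden states is projected to logits, reduced to the target token's log-probability, and the full `[tokens, vocab]` logit tensor is never materialized.

```python
import torch

def chunked_token_logprobs(hidden: torch.Tensor,
                           weight: torch.Tensor,
                           target: torch.Tensor,
                           chunk_size: int = 256) -> torch.Tensor:
    """Per-token log-probs without materializing [num_tokens, vocab] logits.

    hidden: [num_tokens, hidden_dim], weight: [vocab, hidden_dim],
    target: [num_tokens] int64 token ids.
    """
    num_tokens = hidden.shape[0]
    out = torch.empty(num_tokens, dtype=hidden.dtype)
    for start in range(0, num_tokens, chunk_size):
        end = min(start + chunk_size, num_tokens)
        # Only this chunk's logits ever exist in memory.
        logits = hidden[start:end] @ weight.t()            # [chunk, vocab]
        logz = torch.logsumexp(logits, dim=-1)             # log-partition per token
        tgt = logits.gather(1, target[start:end, None]).squeeze(1)
        out[start:end] = tgt - logz                        # log p(target token)
    return out
```

A tensor-parallel version would additionally reduce the max/sum terms of the log-softmax across vocab-parallel ranks before gathering the target entries, which is what the distributed autograd op in this PR handles.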
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 7
🧹 Nitpick comments (3)
nemo_rl/algorithms/sft.py (1)
213-213: Update `setup` return typing to include fusion loss. Line 213 can now return `NLLLinearCEFusionLoss`, but `setup(...)` is still annotated as returning `NLLLoss` only. Please update the return type to a union for type safety.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `nemo_rl/algorithms/sft.py` at line 213, the annotated return type of `setup` is too narrow: since `loss_fn` can be either `NLLLoss` or `NLLLinearCEFusionLoss` (see the assignment to `loss_fn` and the Megatron/linear-fusion flags), update the `setup(...)` signature to a union return type (e.g. `Union[NLLLoss, NLLLinearCEFusionLoss]`) or the common base class if one exists; also add the necessary typing import (`from typing import Union`) and adjust any downstream type hints or stubs that assumed only `NLLLoss`.

nemo_rl/models/megatron/train.py (1)
313-314: Add an explicit loss-type guard for the fusion wrapper path. Line 313 chooses the wrapper only from config. If a non-fusion-compatible loss is passed while the flag is on, the failure mode will be late and opaque. Consider a clear upfront `TypeError` guard.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `nemo_rl/models/megatron/train.py` around lines 313-314, the selection of `sequence_packing_loss_wrapper_type` uses only the cfg flag and can pick the fusion wrapper even when an incompatible loss object is passed; add an explicit upfront `TypeError` guard when `self.cfg["megatron_cfg"]["use_linear_ce_fusion_loss"]` is true that validates the provided loss is a fusion-compatible type (e.g., check isinstance against the appropriate fusion-compatible loss classes, or the presence of a fusion-compatible attribute/method) before constructing `SequencePackingNLLLinearCEFusionLossWrapper`, and raise a clear `TypeError` naming the expected loss types and the actual type if validation fails.

nemo_rl/distributed/model_utils.py (1)
1067-1069: Trim unused parameters from `from_parallel_hidden_states_to_logprobs` to keep the API truthful. `output_weight` and `runtime_gather_output` are passed (lines 1386-1390) but never consumed by the implementation, which makes the interface misleading.

Proposed refactor:

```diff
 def from_parallel_hidden_states_to_logprobs(
     tensor_parallel_hidden_states: torch.Tensor,
     output_weight_layer: torch.Tensor,
-    output_weight: torch.Tensor,
-    runtime_gather_output: bool,
     target: torch.Tensor,
@@
     logprobs = from_parallel_hidden_states_to_logprobs(
         hidden_states,
         output_weight_layer,
-        self.shared_embedding_or_output_weight()
-        if self.share_embeddings_and_output_weights
-        else self.output_layer.weight,
-        runtime_gather_output,
         labels,
```

Also applies to: 1386-1390
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemo_rl/distributed/model_utils.py` around lines 1067 - 1069, The function signature for from_parallel_hidden_states_to_logprobs currently declares unused parameters output_weight and runtime_gather_output; remove these parameters from the function definition and from any calls that pass them (the callers that invoke from_parallel_hidden_states_to_logprobs with output_weight and runtime_gather_output) so the API matches the implementation, then run tests/formatting to ensure no remaining references to those symbols remain; keep the function name from_parallel_hidden_states_to_logprobs as the identifier to locate and update both the definition and all call sites.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@nemo_rl/algorithms/loss_functions.py`:
- Line 1358: Remove the unused local assignments that cause Ruff F841 by
deleting the unused variables seq_index and seq_end where they are assigned;
locate the assignments (e.g., the line setting seq_index = data.get("seq_index",
None) and the block assigning seq_end) in the function(s) in loss_functions.py
and simply remove those assignment statements (or the unused unpacking) so the
variables are not created if they are not referenced elsewhere.
In `@nemo_rl/algorithms/sft.py`:
- Line 213: The loss selection currently uses hidden defaults via .get(...,
False); change it to read the config keys directly and rely on YAML-provided
defaults by using policy_config["megatron_cfg"]["enabled"] and
policy_config["megatron_cfg"]["use_linear_ce_fusion_loss"] (or validate their
presence earlier) when deciding between NLLLinearCEFusionLoss and NLLLoss
(symbols: loss_fn, NLLLinearCEFusionLoss, NLLLoss,
policy_config["megatron_cfg"]). Ensure you remove .get default values here and
either add an explicit config validation step before this line or let a KeyError
surface so missing values are fixed in configuration rather than silently
defaulted in code.
In `@nemo_rl/distributed/model_utils.py`:
- Around line 1065-1087: The function from_parallel_hidden_states_to_logprobs
forwards chunk_size directly to the chunked autograd op
ChunkedDistributedHiddenStatesToLogprobs.apply but the op expects a positive
integer; validate and normalize chunk_size in
from_parallel_hidden_states_to_logprobs before the apply call (e.g., if
chunk_size is None or <=0, set it to a safe positive default such as 1 or the
hidden dimension), then pass the validated integer to
ChunkedDistributedHiddenStatesToLogprobs.apply; apply the same validation to any
other call sites in this module that forward chunk_size to the chunked autograd
op.
- Around line 1065-1076: The function from_parallel_hidden_states_to_logprobs
currently accepts cp_group but never shards/gathers targets or logprobs by CP,
which can misalign tokens when CP is enabled; update this function to either (A)
implement CP-aware handling: shard targets and gathered logprobs across cp_group
(mirror the TP logic that uses tp_group so that tensor_parallel_hidden_states,
target, and final logprobs are correctly reduced/concatenated across the
model-parallel column group), or (B) add a fail-fast check at the top of
from_parallel_hidden_states_to_logprobs that raises a clear error if cp_group is
not None (or indicates unsupported CP config) so callers (e.g., the call site
passing cp_group=self.cp_group) cannot silently produce incorrect results;
reference the function name from_parallel_hidden_states_to_logprobs and the
parameters cp_group, tensor_parallel_hidden_states, target, tp_group,
runtime_gather_output, output_weight_layer, and output_weight when making the
change.
- Line 1204: Remove the unused local variable assignment
`all_grad_input_output_layer = []` (it is declared but never used) to avoid
leaving a dead local that can interfere with the backward path; locate the
occurrence of `all_grad_input_output_layer` in the function in model_utils.py
and delete the assignment line (or if the variable was intended to be used, wire
it into the computation where gradients are collected instead of leaving it
unused).
- Around line 1276-1281: The code embeds a hidden default chunk_size (256) in
patch_gpt_model_forward_for_linear_ce_fusion and in a getattr call; remove these
hard-coded defaults and read the required value from the canonical config
instead. Update patch_gpt_model_forward_for_linear_ce_fusion to not use a
default parameter (make chunk_size required or accept None and immediately load
policy_cfg['linear_ce_fusion_chunk_size']), remove the getattr(..., 256) usage
so it does not fall back silently, and set GPTModel._linear_ce_fusion_chunk_size
from the explicit config value; keep the existing attribute names
(GPTModel._linear_ce_fusion_chunk_size,
GPTModel._original_forward_for_linear_ce_fusion,
GPTModel._linear_ce_fusion_forward_patched) so the patch logic still finds and
sets the model attributes.
In `@nemo_rl/models/megatron/setup.py`:
- Around line 744-749: The code currently falls back to a hardcoded default
(256) for linear_ce_fusion_chunk_size inside the call to
patch_gpt_model_forward_for_linear_ce_fusion; remove that inline default and
instead require the value be supplied from configuration
(policy_cfg["megatron_cfg"]["linear_ce_fusion_chunk_size"]) or explicitly
validate/panic if missing. Update the conditional around
use_linear_ce_fusion_loss to read the chunk size from policy_cfg (no .get
default), pass that value into patch_gpt_model_forward_for_linear_ce_fusion, and
add a clear error/validation message if linear_ce_fusion_chunk_size is absent or
None so YAML remains the single source of truth.
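The `TypeError` guard suggested for `train.py` could look roughly like this. The loss classes here are empty stand-ins for the real ones in `loss_functions.py`, and the helper name is invented for illustration:

```python
class NLLLoss:
    """Stand-in for the standard NLL loss class."""

class NLLLinearCEFusionLoss:
    """Stand-in for the fusion-compatible loss class."""

def check_fusion_loss(loss, use_linear_ce_fusion_loss: bool):
    """Fail fast if the fusion flag is on but the loss is incompatible."""
    if use_linear_ce_fusion_loss and not isinstance(loss, NLLLinearCEFusionLoss):
        raise TypeError(
            "use_linear_ce_fusion_loss=True requires an NLLLinearCEFusionLoss, "
            f"got {type(loss).__name__}"
        )
    return loss
```

Raising before the wrapper is constructed turns a late, opaque shape or attribute error into an immediate, named configuration error.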
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)

- nemo_rl/algorithms/loss_functions.py
- nemo_rl/algorithms/sft.py
- nemo_rl/distributed/model_utils.py
- nemo_rl/models/megatron/setup.py
- nemo_rl/models/megatron/train.py
- nemo_rl/models/policy/workers/megatron_policy_worker.py
- tests/unit/distributed/test_model_utils.py
@yuki-97, hi, this is Peng. This PR adds a chunked linear CE fusion loss to avoid the OOM caused by the large logit tensor. Would you please help take a look? Thanks!
yuki-97
left a comment
hi @pengdurice , thanks for the contribution, great work! I left some comments.
And do you mind adding your experiment from the PR description as a nightly test? You can refer to https://github.com/NVIDIA-NeMo/RL/pull/1866/changes.

- add a config under `examples/configs/recipes/llm/`
- add a script under `tests/test_suites/llm/`
- add the test to `tests/test_suites/nightly.txt`
@terrykong, could you, or someone who's familiar with the mcore distributed part, take a review at
Co-authored-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: pengdurice <pengduhit@gmail.com>
@yuki-97, thank you so much for your review! I have fixed things according to your comments and added the nightly test config and sh files. @terrykong, would you please review model_utils.py when you get a chance? Thanks!
terrykong
left a comment
@yaoyu-33 @ananthsub can you review?
@yaoyu-33 @ananthsub can you help review please? thanks!
yuki-97
left a comment
hi @pengdurice, thanks for the update, and sorry for the wait.
I just took a review at model_utils.py and left some comments for that file and the previous updates. There also seem to be some conflicts with the current main, so you'll need to do a rebase.
@terrykong could you take a review as well?
tests/test_suites/llm/sft-qwen2.5-math7b-1n8g-megatron_chunked_linear_ce_loss.sh
/ok to test de97d32
hi @pengdurice, it looks like some unit tests failed. Can you help to fix them?
https://github.com/NVIDIA-NeMo/RL/actions/runs/23230488220/job/67540009384?pr=2036
https://github.com/NVIDIA-NeMo/RL/actions/runs/23230488220/job/67540009385?pr=2036
just a reminder that you may need to run the lint command again after fixing.
and thanks again for your contribution!
…_fusion_loss value Signed-off-by: pengdurice <pengduhit@gmail.com>
Thank you! It looks to me like it is a missing-config issue. Thank you for the fix ;-) I also changed another two places to be on the safe side (one in sft.py); let me know if that's not necessary, thanks!
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Good catch for these two! The one in SFT should always be in the config, so I directly updated it; let's run CI again.
/ok to test 74bdffb
Head branch was pushed to by a user without write access
@yuki-97, some tests failed (due to the wrong sh name in nightly.txt; that was skipped on my local machine because I called the sh file directly to test). Would you mind triggering the CI/CD again? Thank you!
/ok to test c87b344
What does this PR do?
Adds a chunked linear cross-entropy loss that avoids materializing the full logit tensor, preventing OOM.
Issues
None
Key changes
Usage
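Based on the config keys referenced in the review comments (`megatron_cfg.use_linear_ce_fusion_loss` and `megatron_cfg.linear_ce_fusion_chunk_size`), enabling the fusion path in a training YAML presumably looks like the fragment below; the exact nesting under `policy` is an assumption, not taken from the PR.

```yaml
policy:
  megatron_cfg:
    enabled: true
    # Compute CE from hidden states in chunks instead of materializing logits.
    use_linear_ce_fusion_loss: true
    # Tokens per chunk; the review asks that this come from YAML, not a code default.
    linear_ce_fusion_chunk_size: 256
```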
Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit
New Features
Tests