
feat: Add chunked linear ce loss function from hidden states#2036

Merged
yuki-97 merged 28 commits into NVIDIA-NeMo:main from pengdurice:peng-add-linear-ce-fusion-v1
Mar 19, 2026
Conversation

@pengdurice (Contributor) commented Feb 27, 2026

What does this PR do ?

Adds a chunked linear cross-entropy loss that avoids materializing the full logits tensor, preventing OOM at long sequence lengths.

Issues

None

Key changes

  1. Patched the forward function of the GPT model so that when linear CE fusion loss is enabled, it returns log-probabilities of shape [batch_size, seq_length] instead of logits of shape [batch_size, seq_length, vocab_size] (up to parallelism).
  2. Added an autograd function that takes the hidden states as input, chunks them along the sequence dimension, generates the logits on the fly, and returns the log-probabilities used in 1.
  3. Added SFT loss functions that take the log-probabilities as input directly.
  4. Implemented a sequence-packing version.
  5. Streamlined how the functionality is invoked.
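The chunked autograd function in step 2 can be sketched as follows. This is a simplified, single-GPU illustration with illustrative names; the actual implementation (ChunkedDistributedHiddenStatesToLogprobs) additionally shards the vocabulary across tensor-parallel ranks:

```python
import torch

class ChunkedHiddenStatesToLogprobs(torch.autograd.Function):
    """Chunk along the sequence dim so only a [b, chunk, vocab] logits
    tensor is ever materialized, instead of the full [b, seq, vocab]."""

    @staticmethod
    def forward(ctx, hidden, weight, target, chunk_size):
        ctx.save_for_backward(hidden, weight, target)
        ctx.chunk_size = chunk_size
        out = []
        for h, t in zip(hidden.split(chunk_size, dim=1),
                        target.split(chunk_size, dim=1)):
            logits = h @ weight.t()                        # [b, chunk, vocab]
            logprobs = torch.log_softmax(logits, dim=-1)
            out.append(logprobs.gather(-1, t.unsqueeze(-1)).squeeze(-1))
        return torch.cat(out, dim=1)                       # [b, seq]

    @staticmethod
    def backward(ctx, grad_out):
        hidden, weight, target = ctx.saved_tensors
        grad_h = torch.zeros_like(hidden)
        grad_w = torch.zeros_like(weight)
        start = 0
        for h, t, g in zip(hidden.split(ctx.chunk_size, dim=1),
                           target.split(ctx.chunk_size, dim=1),
                           grad_out.split(ctx.chunk_size, dim=1)):
            probs = torch.softmax(h @ weight.t(), dim=-1)
            # d logprob[target] / d logits = onehot(target) - softmax
            d_logits = -probs * g.unsqueeze(-1)
            d_logits.scatter_add_(-1, t.unsqueeze(-1), g.unsqueeze(-1))
            grad_h[:, start:start + h.shape[1]] = d_logits @ weight
            grad_w += torch.einsum("bsv,bsh->vh", d_logits, h)
            start += h.shape[1]
        return grad_h, grad_w, None, None
```

Because the backward pass recomputes each chunk's logits, this trades a little compute for a large reduction in peak memory.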

Usage

  • Enable megatron_cfg and add the two flags below:

megatron_cfg:
    enabled: true
    use_linear_ce_fusion_loss: true
    linear_ce_fusion_chunk_size: 256 # or another value
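To see why chunking matters, a back-of-the-envelope estimate of the logits memory (illustrative numbers assumed: batch 1, vocab 131072, bf16 at 2 bytes per element):

```python
def logits_gib(batch: int, seq: int, vocab: int, bytes_per_el: int = 2) -> float:
    """GiB needed for a [batch, seq, vocab] logits tensor."""
    return batch * seq * vocab * bytes_per_el / 2**30

full = logits_gib(1, 100_000, 131_072)   # full materialization: ~24.4 GiB
chunked = logits_gib(1, 256, 131_072)    # one 256-token chunk: 0.0625 GiB
```

With chunk_size=256, only one small logits chunk exists at a time, which is consistent with the reported jump from < 65K to > 100K tokens before OOM.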

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • Tests:
  • Unit test passed
  • Loss curves compared between baseline and experiment (with chunked hidden states → logprobs enabled)
  • The longest seq length without OOM extended from < 65K to > 100K.
(Screenshot: loss-curve comparison between baseline and experiment, Feb 27, 2026)

Summary by CodeRabbit

  • New Features

    • Introduced a new linear cross-entropy fusion loss function option configurable for training workflows
    • Supports distributed training and sequence-packing scenarios for enhanced flexibility
  • Tests

    • Added comprehensive distributed testing framework validating the new loss computation implementation across multiple GPU configurations

Signed-off-by: pengdurice <pengduhit@gmail.com>
@pengdurice pengdurice requested review from a team as code owners February 27, 2026 20:54
@coderabbitai bot (Contributor) commented Feb 27, 2026

📝 Walkthrough

This PR introduces a linear cross-entropy fusion loss mechanism for efficient distributed training. It adds NLLLinearCEFusionLoss and SequencePackingNLLLinearCEFusionLossWrapper loss classes, distributed hidden-state-to-logprobs computation functions with custom autograd support, and integrates these components into the Megatron-based SFT pipeline via configuration-driven patching.

Changes

  • Loss Functions (nemo_rl/algorithms/loss_functions.py): Added the NLLLinearCEFusionLoss class for token-level NLL computation with optional DPO mode, and SequencePackingNLLLinearCEFusionLossWrapper for handling sequence-packed inputs by iterating over and accumulating per-sequence losses.
  • Distributed Hidden State to Log Probabilities (nemo_rl/distributed/model_utils.py): Introduced the from_parallel_hidden_states_to_logprobs function for computing log probabilities from tensor-parallel sharded hidden states. Added the ChunkedDistributedHiddenStatesToLogprobs autograd function with forward/backward implementations for distributed log-softmax and probability gathering. Added patch_gpt_model_forward_for_linear_ce_fusion and _gpt_forward_with_linear_ce_fusion to enable the fused linear CE computation path in GPT model forward passes.
  • SFT Setup Integration (nemo_rl/algorithms/sft.py): Added an import for NLLLinearCEFusionLoss. Modified the loss-function selection logic in setup() to use NLLLinearCEFusionLoss when the Megatron config is enabled and the use_linear_ce_fusion_loss flag is true; otherwise it falls back to NLLLoss.
  • Megatron Model Configuration (nemo_rl/models/megatron/setup.py): Added import and conditional patching of the GPT model forward when use_linear_ce_fusion_loss is enabled. Added a runtime guard to disallow context parallelism when linear CE fusion is active. Injected patch_gpt_model_forward_for_linear_ce_fusion in composed_peft_hook and setup_model_and_optimizer with a configurable chunk_size.
  • Megatron Training Loop (nemo_rl/models/megatron/train.py): Added an import for SequencePackingNLLLinearCEFusionLossWrapper. Implemented a configuration-driven path that passes labels and sets the return_logprobs_for_linear_ce_fusion flag on the model when linear CE fusion is enabled. Updated the loss-wrapper selection to use SequencePackingNLLLinearCEFusionLossWrapper based on the use_linear_ce_fusion_loss flag.
  • Type Definitions (nemo_rl/models/policy/workers/megatron_policy_worker.py): Added TokenizerType as a TypeVar bound to PreTrainedTokenizerBase for generic tokenizer parameter typing.
  • Distributed Tests (tests/unit/distributed/test_model_utils.py): Added comprehensive Ray-based distributed tests for from_parallel_hidden_states_to_logprobs with HiddenStatesToLogprobsTestActor, including forward/backward consistency checks against a PyTorch baseline, inference-only path validation, and parameterized coverage over tensor-parallel sizes and chunk sizes.
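The core reduction performed by the new loss classes, turning per-token log-probabilities into a scalar NLL, can be sketched as below. The function name and mask argument are illustrative, not the exact NeMo-RL signatures:

```python
import torch

def nll_from_logprobs(token_logprobs: torch.Tensor,
                      token_mask: torch.Tensor) -> torch.Tensor:
    """Mean negative log-likelihood over loss-bearing tokens.

    token_logprobs: [batch, seq], as returned by the patched forward.
    token_mask: [batch, seq], 1.0 where a token contributes to the loss.
    """
    masked = -token_logprobs * token_mask
    # clamp avoids division by zero when a batch has no valid tokens
    return masked.sum() / token_mask.sum().clamp(min=1)
```

Because the model already returns [batch, seq] log-probabilities, the loss never needs to touch a vocab-sized tensor.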

Sequence Diagram(s)

sequenceDiagram
    participant Training as Training Loop
    participant GPTModel as GPT Model<br/>(Patched Forward)
    participant DistLogprobs as ChunkedDistributedHiddenStatesToLogprobs
    participant LossFn as NLLLinearCEFusionLoss
    participant Wrapper as SequencePackingNLLLinearCEFusionLossWrapper

    Training->>GPTModel: forward(input_ids, labels,<br/>return_logprobs_for_linear_ce_fusion=True)
    activate GPTModel
    GPTModel->>DistLogprobs: from_parallel_hidden_states_to_logprobs<br/>(tensor_parallel_hidden_states, target, ...)
    activate DistLogprobs
    DistLogprobs->>DistLogprobs: Chunked distributed log-softmax<br/>with gather across TP ranks
    DistLogprobs-->>GPTModel: token_logprobs
    deactivate DistLogprobs
    GPTModel-->>Training: logprobs (instead of loss)
    deactivate GPTModel

    Training->>Wrapper: forward(logprobs, labels, ...)
    activate Wrapper
    Wrapper->>Wrapper: Iterate over packed sequences,<br/>unpad per-sequence data
    Wrapper->>LossFn: forward(unpadded_logprobs,<br/>unpadded_labels, ...)
    activate LossFn
    LossFn->>LossFn: Compute token-level NLL loss
    LossFn-->>Wrapper: per-batch loss + metrics
    deactivate LossFn
    Wrapper->>Wrapper: Accumulate losses<br/>across sequences
    Wrapper-->>Training: final_loss, metrics
    deactivate Wrapper

    Training->>Training: Backward pass with accumulated loss
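The wrapper's iterate-and-accumulate step in the diagram can be sketched as follows; the boundary representation and names are assumptions for illustration, not the actual SequencePackingNLLLinearCEFusionLossWrapper API:

```python
import torch

def packed_nll(token_logprobs: torch.Tensor,
               token_mask: torch.Tensor,
               seq_boundaries: list[int]):
    """Iterate over packed sequences, compute a per-sequence NLL,
    and accumulate into a final loss plus per-sequence metrics.

    seq_boundaries: cumulative token offsets, e.g. [0, len1, len1+len2, ...].
    """
    total = torch.zeros(())
    per_seq = []
    for start, end in zip(seq_boundaries[:-1], seq_boundaries[1:]):
        lp = token_logprobs[:, start:end]
        m = token_mask[:, start:end]
        loss = -(lp * m).sum() / m.sum().clamp(min=1)
        per_seq.append(loss.item())
        total = total + loss
    return total / max(len(per_seq), 1), per_seq
```

The final averaging convention (per-sequence vs per-token) is a design choice; the sketch averages per sequence to mirror the "accumulate across sequences" step in the diagram.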

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

CI:L2, Run CICD

Suggested reviewers

  • terrykong
  • parthchadha
🚥 Pre-merge checks | ✅ 3 passed | ❌ 1 warning

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 40.91%, which is below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3)
  • Title check: The title accurately describes the main change: adding a chunked linear cross-entropy loss function that operates on hidden states instead of materialized logits, which is the core innovation in this PR.
  • Test Results For Major Changes: PR contains major changes with comprehensive test documentation, including 226+ lines of unit tests, a training-loss comparison graph, and performance metrics demonstrating no regression.
  • Description Check: Check skipped; CodeRabbit's high-level summary is enabled.


@pengdurice pengdurice changed the title Add chunked linear ce loss function from hidden states feat: Add chunked linear ce loss function from hidden states Feb 27, 2026
@coderabbitai bot (Contributor) left a comment:

Actionable comments posted: 7

🧹 Nitpick comments (3)
nemo_rl/algorithms/sft.py (1)

213-213: Update setup return typing to include fusion loss.

Line 213 can now return NLLLinearCEFusionLoss, but setup(...) is still annotated as returning NLLLoss only. Please update the return type to a union for type safety.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_rl/algorithms/sft.py` at line 213, the annotated return type of setup is
too narrow: since loss_fn can be either NLLLoss or NLLLinearCEFusionLoss (see
the assignment to loss_fn and the megatron/linear fusion flags), update the
setup(...) function signature to reflect a union return type (e.g.
Union[NLLLoss, NLLLinearCEFusionLoss]) or the common base class if one exists;
also add the necessary typing import (from typing import Union) and adjust any
downstream type hints or stubs that assumed only NLLLoss.
nemo_rl/models/megatron/train.py (1)

313-314: Add an explicit loss-type guard for the fusion wrapper path.

Line 313 chooses the wrapper only from config. If a non-fusion-compatible loss is passed while the flag is on, the failure mode will be late and opaque. Consider a clear upfront TypeError guard.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_rl/models/megatron/train.py` around lines 313 - 314, The selection of
sequence_packing_loss_wrapper_type uses only the cfg flag and can pick the
fusion wrapper even when an incompatible loss object is passed; add an explicit
upfront TypeError guard when
self.cfg["megatron_cfg"]["use_linear_ce_fusion_loss"] is true that validates the
provided loss is a fusion-compatible type (e.g., check isinstance(loss,
<appropriate fusion-compatible loss classes>) or the presence of a
fusion-compatible attribute/method) before constructing
SequencePackingNLLLinearCEFusionLossWrapper, and raise a clear TypeError naming
the expected loss types and the actual type if validation fails; update the
logic around sequence_packing_loss_wrapper_type/loss_fn to perform this check
prior to instantiation.
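The suggested guard could look roughly like the sketch below. The accepts_logprobs attribute is hypothetical, and the wrapper selection is returned as a string purely for illustration:

```python
def select_packing_wrapper(loss_fn, use_linear_ce_fusion_loss: bool) -> str:
    """Fail fast if the fusion flag is on but the loss cannot consume
    logprobs, instead of erroring deep inside the training loop."""
    if use_linear_ce_fusion_loss:
        if not getattr(loss_fn, "accepts_logprobs", False):
            raise TypeError(
                "use_linear_ce_fusion_loss=True requires a logprob-based loss "
                f"(e.g. NLLLinearCEFusionLoss), got {type(loss_fn).__name__}"
            )
        return "SequencePackingNLLLinearCEFusionLossWrapper"
    return "SequencePackingLossWrapper"
```

Raising at wrapper-selection time names both the expected and actual loss types, making the misconfiguration obvious.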
nemo_rl/distributed/model_utils.py (1)

1067-1069: Trim unused parameters from from_parallel_hidden_states_to_logprobs to keep the API truthful.

output_weight and runtime_gather_output are passed (Lines 1386-1390) but never consumed by the implementation, which makes the interface misleading.

Proposed refactor
 def from_parallel_hidden_states_to_logprobs(
     tensor_parallel_hidden_states: torch.Tensor,
     output_weight_layer: torch.Tensor,
-    output_weight: torch.Tensor,
-    runtime_gather_output: bool,
     target: torch.Tensor,
@@
     logprobs = from_parallel_hidden_states_to_logprobs(
         hidden_states,
         output_weight_layer,
-        self.shared_embedding_or_output_weight()
-        if self.share_embeddings_and_output_weights
-        else self.output_layer.weight,
-        runtime_gather_output,
         labels,

Also applies to: 1386-1390

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemo_rl/distributed/model_utils.py` around lines 1067 - 1069, The function
signature for from_parallel_hidden_states_to_logprobs currently declares unused
parameters output_weight and runtime_gather_output; remove these parameters from
the function definition and from any calls that pass them (the callers that
invoke from_parallel_hidden_states_to_logprobs with output_weight and
runtime_gather_output) so the API matches the implementation, then run
tests/formatting to ensure no remaining references to those symbols remain; keep
the function name from_parallel_hidden_states_to_logprobs as the identifier to
locate and update both the definition and all call sites.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemo_rl/algorithms/loss_functions.py`:
- Line 1358: Remove the unused local assignments that cause Ruff F841 by
deleting the unused variables seq_index and seq_end where they are assigned;
locate the assignments (e.g., the line setting seq_index = data.get("seq_index",
None) and the block assigning seq_end) in the function(s) in loss_functions.py
and simply remove those assignment statements (or the unused unpacking) so the
variables are not created if they are not referenced elsewhere.

In `@nemo_rl/algorithms/sft.py`:
- Line 213: The loss selection currently uses hidden defaults via .get(...,
False); change it to read the config keys directly and rely on YAML-provided
defaults by using policy_config["megatron_cfg"]["enabled"] and
policy_config["megatron_cfg"]["use_linear_ce_fusion_loss"] (or validate their
presence earlier) when deciding between NLLLinearCEFusionLoss and NLLLoss
(symbols: loss_fn, NLLLinearCEFusionLoss, NLLLoss,
policy_config["megatron_cfg"]). Ensure you remove .get default values here and
either add an explicit config validation step before this line or let a KeyError
surface so missing values are fixed in configuration rather than silently
defaulted in code.

In `@nemo_rl/distributed/model_utils.py`:
- Around line 1065-1087: The function from_parallel_hidden_states_to_logprobs
forwards chunk_size directly to the chunked autograd op
ChunkedDistributedHiddenStatesToLogprobs.apply but the op expects a positive
integer; validate and normalize chunk_size in
from_parallel_hidden_states_to_logprobs before the apply call (e.g., if
chunk_size is None or <=0, set it to a safe positive default such as 1 or the
hidden dimension), then pass the validated integer to
ChunkedDistributedHiddenStatesToLogprobs.apply; apply the same validation to any
other call sites in this module that forward chunk_size to the chunked autograd
op.
- Around line 1065-1076: The function from_parallel_hidden_states_to_logprobs
currently accepts cp_group but never shards/gathers targets or logprobs by CP,
which can misalign tokens when CP is enabled; update this function to either (A)
implement CP-aware handling: shard targets and gathered logprobs across cp_group
(mirror the TP logic that uses tp_group so that tensor_parallel_hidden_states,
target, and final logprobs are correctly reduced/concatenated across the
model-parallel column group), or (B) add a fail-fast check at the top of
from_parallel_hidden_states_to_logprobs that raises a clear error if cp_group is
not None (or indicates unsupported CP config) so callers (e.g., the call site
passing cp_group=self.cp_group) cannot silently produce incorrect results;
reference the function name from_parallel_hidden_states_to_logprobs and the
parameters cp_group, tensor_parallel_hidden_states, target, tp_group,
runtime_gather_output, output_weight_layer, and output_weight when making the
change.
- Line 1204: Remove the unused local variable assignment
`all_grad_input_output_layer = []` (it is declared but never used) to avoid
leaving a dead local that can interfere with the backward path; locate the
occurrence of `all_grad_input_output_layer` in the function in model_utils.py
and delete the assignment line (or if the variable was intended to be used, wire
it into the computation where gradients are collected instead of leaving it
unused).
- Around line 1276-1281: The code embeds a hidden default chunk_size (256) in
patch_gpt_model_forward_for_linear_ce_fusion and in a getattr call; remove these
hard-coded defaults and read the required value from the canonical config
instead. Update patch_gpt_model_forward_for_linear_ce_fusion to not use a
default parameter (make chunk_size required or accept None and immediately load
policy_cfg['linear_ce_fusion_chunk_size']), remove the getattr(..., 256) usage
so it does not fall back silently, and set GPTModel._linear_ce_fusion_chunk_size
from the explicit config value; keep the existing attribute names
(GPTModel._linear_ce_fusion_chunk_size,
GPTModel._original_forward_for_linear_ce_fusion,
GPTModel._linear_ce_fusion_forward_patched) so the patch logic still finds and
sets the model attributes.

In `@nemo_rl/models/megatron/setup.py`:
- Around line 744-749: The code currently falls back to a hardcoded default
(256) for linear_ce_fusion_chunk_size inside the call to
patch_gpt_model_forward_for_linear_ce_fusion; remove that inline default and
instead require the value be supplied from configuration
(policy_cfg["megatron_cfg"]["linear_ce_fusion_chunk_size"]) or explicitly
validate/panic if missing. Update the conditional around
use_linear_ce_fusion_loss to read the chunk size from policy_cfg (no .get
default), pass that value into patch_gpt_model_forward_for_linear_ce_fusion, and
add a clear error/validation message if linear_ce_fusion_chunk_size is absent or
None so YAML remains the single source of truth.


ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 618c582 and fbc7a7d.

📒 Files selected for processing (7)
  • nemo_rl/algorithms/loss_functions.py
  • nemo_rl/algorithms/sft.py
  • nemo_rl/distributed/model_utils.py
  • nemo_rl/models/megatron/setup.py
  • nemo_rl/models/megatron/train.py
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
  • tests/unit/distributed/test_model_utils.py

Signed-off-by: pengdurice <pengduhit@gmail.com>
@pengdurice (Contributor, Author) commented:

Hi @yuki-97, this is Peng. This PR adds a chunked linear CE fusion loss to avoid the OOM caused by large logit tensors. Would you please help take a look? Thanks!

@yuki-97 (Contributor) left a comment:

hi @pengdurice , thanks for the contribution, great work! I left some comments.

And would you mind adding your experiment from the PR description as a nightly test? You can refer to https://github.com/NVIDIA-NeMo/RL/pull/1866/changes.

  1. add a config under examples/configs/recipes/llm/
  2. add a script under tests/test_suites/llm/
  3. add the test to tests/test_suites/nightly.txt

@yuki-97 (Contributor) commented Mar 4, 2026

@terrykong could you, or someone familiar with the mcore distributed part, review nemo_rl/distributed/model_utils.py? I haven't reviewed this file in detail.

Signed-off-by: pengdurice <pengduhit@gmail.com>
copy-pr-bot bot commented Mar 5, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Co-authored-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
@pengdurice (Contributor, Author) commented:

@yuki-97, thank you so much for your review! I have made fixes according to your comments and added the nightly test config and sh files. @terrykong, would you please review model_utils.py when you get a chance? Thanks!

@terrykong (Collaborator) left a comment:

@yaoyu-33 @ananthsub can you review?

@terrykong terrykong requested review from ananthsub and yaoyu-33 March 5, 2026 07:29
@pengdurice (Contributor, Author) commented:

@yaoyu-33 @ananthsub can you help review please? thanks!

@yuki-97 (Contributor) left a comment:

hi @pengdurice, thanks for the update and sorry for the wait.
I just reviewed model_utils.py and left some comments on that file and the previous updates. There also seem to be some conflicts with current main, so you'll need to rebase.

@terrykong could you take a review as well?

@chtruong814 chtruong814 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:Lfast Runs a fast test suite and re-use nightly `main` container (but sync dependencies to PRs version) labels Mar 18, 2026
@chtruong814 (Contributor) commented:

/ok to test de97d32

@yuki-97 (Contributor) left a comment:

hi @pengdurice, it looks like some unit tests failed. Can you help fix them?
https://github.com/NVIDIA-NeMo/RL/actions/runs/23230488220/job/67540009384?pr=2036
https://github.com/NVIDIA-NeMo/RL/actions/runs/23230488220/job/67540009385?pr=2036
Just a reminder that you may need to run the lint command again after fixing.

and thanks again for your contribution!

pengdurice and others added 4 commits March 18, 2026 08:26
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
…_fusion_loss value

Signed-off-by: pengdurice <pengduhit@gmail.com>
@pengdurice (Contributor, Author) commented:

hi @pengdurice, it looks like some unit tests failed. Can you help fix them? https://github.com/NVIDIA-NeMo/RL/actions/runs/23230488220/job/67540009384?pr=2036 https://github.com/NVIDIA-NeMo/RL/actions/runs/23230488220/job/67540009385?pr=2036 Just a reminder that you may need to run the lint command again after fixing.

and thanks again for your contribution!

Thank you! It looks to me like it was a missing-config issue. Thank you for the fix ;-) I also changed two other places to be on the safe side (one in sft.py); LMK if that's not necessary. Thanks!

Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 (Contributor) commented Mar 18, 2026

good catch on these two! The one in SFT should always be in the config, so let's just use policy_config["megatron_cfg"]["use_linear_ce_fusion_loss"], but it's just a nit.

I updated it directly; let's run CI again.

@yuki-97 (Contributor) commented Mar 18, 2026

/ok to test 74bdffb

yuki-97 previously approved these changes Mar 18, 2026
@yuki-97 yuki-97 enabled auto-merge (squash) March 18, 2026 16:28
Signed-off-by: pengdurice <pengduhit@gmail.com>
auto-merge was automatically disabled March 18, 2026 22:15

Head branch was pushed to by a user without write access

@pengdurice (Contributor, Author) commented:

@yuki-97, some tests failed (due to a wrong sh name in nightly.txt; that was skipped locally because I called the sh file directly to test). Would you mind triggering the CI/CD again? Thank you!

@yuki-97 (Contributor) commented Mar 19, 2026

/ok to test c87b344

@yuki-97 yuki-97 merged commit 94fa37d into NVIDIA-NeMo:main Mar 19, 2026
57 of 60 checks passed

Labels

CI:L1 Run doctests, unit tests, and functional tests community-request documentation Improvements or additions to documentation


5 participants