chore: Enable LoRA Nightly Test by RayenTian · Pull Request #1634 · NVIDIA-NeMo/RL

RayenTian · 2025-12-14T09:44:45Z

Summary

This PR enables the LoRA (Low-Rank Adaptation) nightly test that was previously disabled due to dataset compatibility issues. Now that the Tulu3 SFT mixture dataset is properly supported, we can re-enable this important test coverage.

Changes

Enable LoRA Configuration
- Renamed sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml.disabled → sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml
- Updated dataset configuration from tulu3 to tulu3_sft_mixture to use the correct dataset implementation
Enable LoRA Test Script
- Renamed sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh.disabled → sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh
Update Nightly Test Suite
- Added tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh to nightly.txt
- Removed the comment explaining why the test was disabled
Minor Cleanup
- Removed redundant debug print statement in dataset loading code

Result

Summary by CodeRabbit

Configuration Updates
- Updated dataset reference in the Llama 3.1 8B SFT model configuration
Chores
- Removed diagnostic message from data loading process
- Enabled SFT model test in validation suite

_{✏️ Tip: You can customize this high-level summary in your review settings.}

github-actions · 2025-12-14T09:45:29Z

❌ Submodule Fast-Forward Check Failed

Check based on commit: 6a45aac (PR #1634 from ruit/enable_lora_nightly_test)

❌ Submodules that need attention:

Gym: ❌ PR branch is BEHIND main branch
TARGET (main branch): https://github.com/NVIDIA-NeMo/Gym/commits/39ee39e35bfedef86ccf8d1ada9835ab8f62d267/
CURRENT (PR #1634 from ruit/enable_lora_nightly_test): https://github.com/NVIDIA-NeMo/Gym/commits/035c91e4b5b74598c1a313a28980926e1c2b8439/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

coderabbitai · 2025-12-14T10:28:38Z

📝 Walkthrough

Walkthrough

These changes enable tulu3_sft_mixture dataset support for Llama 3.1 8B SFT training configuration. The configuration file references the updated dataset name, a diagnostic print statement is removed from dataset loading code, and the corresponding test is uncommented to validate the configuration.

Changes

Cohort / File(s)	Summary
Dataset configuration and support `examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml`, `nemo_rl/data/datasets/response_datasets/__init__.py`	Updated data.dataset_name from tulu3 to tulu3_sft_mixture in the configuration. Removed a runtime diagnostic print statement from the tulu3_sft_mixture dataset loading path.
Test enablement `tests/test_suites/nightly.txt`	Uncommented the lora test entry for sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh, enabling the test that was previously disabled due to Tulu3 dataset limitations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

Suggested reviewers

parthchadha
yfw

Pre-merge checks and finishing touches

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: enabling the LoRA nightly test that was previously disabled due to dataset compatibility issues.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Test Results For Major Changes	✅ Passed	PR contains only 4 minimal changes: dataset name update, debug print removal, test re-enablement, and test script re-enabling. No new features, breaking changes, algorithmic modifications, or performance impacts.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch ruit/enable_lora_nightly_test

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e3c0e2 and 0a8db04.

📒 Files selected for processing (3)

examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml (1 hunks)
nemo_rl/data/datasets/response_datasets/__init__.py (0 hunks)
tests/test_suites/nightly.txt (1 hunks)

💤 Files with no reviewable changes (1)

nemo_rl/data/datasets/response_datasets/init.py

🧰 Additional context used

📓 Path-based instructions (4)

tests/test_suites/nightly.txt

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When adding a nightly test for a new model, append the driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt

Files:

tests/test_suites/nightly.txt

!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

tests/test_suites/nightly.txt
examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml

examples/configs/recipes/**/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When adding support for a new model, create a recipe YAML under examples/configs/recipes/ in the appropriate domain subdirectory (llm, vlm, etc.)

Files:

examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml

examples/configs/recipes/llm/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Recipe YAML files should follow the naming pattern: --ng-[-modifiers][-long][.vN].yaml for LLM recipes

Files:

examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml

🧠 Learnings (5)

📚 Learning: 2025-11-24T17:24:41.976Z

Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/nightly.txt : When adding a nightly test for a new model, append the driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt

Applied to files:

tests/test_suites/nightly.txt

📚 Learning: 2025-10-12T14:46:57.171Z

Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.

Applied to files:

tests/test_suites/nightly.txt

📚 Learning: 2025-11-24T17:24:41.976Z

Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Applied to files:

tests/test_suites/nightly.txt

📚 Learning: 2025-09-19T07:28:29.887Z

Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.

Applied to files:

tests/test_suites/nightly.txt

📚 Learning: 2025-11-24T17:24:41.976Z

Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : Recipe YAML files should follow the naming pattern: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml for LLM recipes

Applied to files:

examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml

🔇 Additional comments (2)

tests/test_suites/nightly.txt (1)

74-74: Test entry correctly uncommented in nightly registry.

The path format is correct (relative to tests/test_suites/), the test script exists with all required infrastructure variables (NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS, NUM_MINUTES), and it properly invokes the training entrypoint with uv run.

examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml (1)

29-29: tulu3_sft_mixture dataset is properly registered and accessible.

The dataset name change in the recipe is correct. The Tulu3SftMixtureDataset class is properly registered in the dataset factory (nemo_rl/data/datasets/response_datasets/__init__.py) and the corresponding format_tulu3_sft_mixture formatting function is implemented in the codebase. The configuration change will work as intended.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Signed-off-by: ruit <ruit@nvidia.com>

Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

Signed-off-by: ruit <ruit@nvidia.com>

RayenTian force-pushed the ruit/enable_lora_nightly_test branch from 6a45aac to 0a8db04 Compare December 14, 2025 09:48

RayenTian self-assigned this Dec 14, 2025

RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Dec 14, 2025

RayenTian temporarily deployed to nemo-ci December 14, 2025 10:16 — with GitHub Actions Inactive

RayenTian marked this pull request as ready for review December 14, 2025 10:24

RayenTian requested review from a team as code owners December 14, 2025 10:24

RayenTian requested review from joyang-nv, terrykong and yuki-97 December 14, 2025 10:25

RayenTian temporarily deployed to nemo-ci December 14, 2025 10:53 — with GitHub Actions Inactive

terrykong previously approved these changes Dec 14, 2025

View reviewed changes

RayenTian added 2 commits December 14, 2025 18:28

enable lora nightly test

660f949

Signed-off-by: ruit <ruit@nvidia.com>

add execution mode

7cfb8cc

Signed-off-by: ruit <ruit@nvidia.com>

RayenTian dismissed terrykong’s stale review via 7cfb8cc December 15, 2025 02:28

RayenTian force-pushed the ruit/enable_lora_nightly_test branch from 0a8db04 to 7cfb8cc Compare December 15, 2025 02:28

RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Dec 15, 2025

RayenTian temporarily deployed to nemo-ci December 15, 2025 02:28 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci December 15, 2025 02:32 — with GitHub Actions Inactive

RayenTian temporarily deployed to nemo-ci December 15, 2025 04:20 — with GitHub Actions Inactive

terrykong enabled auto-merge (squash) December 15, 2025 05:27

terrykong approved these changes Dec 15, 2025

View reviewed changes

terrykong merged commit a010564 into main Dec 15, 2025
37 of 38 checks passed

terrykong deleted the ruit/enable_lora_nightly_test branch December 15, 2025 06:19

coderabbitai bot mentioned this pull request Dec 23, 2025

feat: Add Nemotron‑3 Nano 30B A3B BF16 SFT nightly tests (FSDP2, +LoRA) #1648

Merged

DeL-TaiseiOzaki pushed a commit to DeL-TaiseiOzaki/RL that referenced this pull request Jan 8, 2026

chore: Enable LoRA Nightly Test (NVIDIA-NeMo#1634)

7e8ba4f

Signed-off-by: ruit <ruit@nvidia.com>

yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 12, 2026

chore: Enable LoRA Nightly Test (NVIDIA-NeMo#1634)

2ad293b

Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026

chore: Enable LoRA Nightly Test (NVIDIA-NeMo#1634)

c26200a

Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026

chore: Enable LoRA Nightly Test (NVIDIA-NeMo#1634)

e837538

Signed-off-by: ruit <ruit@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

seonjinn pushed a commit that referenced this pull request Mar 8, 2026

chore: Enable LoRA Nightly Test (#1634)

c3cde03

Signed-off-by: ruit <ruit@nvidia.com>

seonjinn pushed a commit that referenced this pull request Mar 8, 2026

chore: Enable LoRA Nightly Test (#1634)

c939443

Signed-off-by: ruit <ruit@nvidia.com>

seonjinn pushed a commit that referenced this pull request Mar 9, 2026

chore: Enable LoRA Nightly Test (#1634)

40c2c55

Signed-off-by: ruit <ruit@nvidia.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: Enable LoRA Nightly Test#1634

chore: Enable LoRA Nightly Test#1634
terrykong merged 2 commits intomainfrom
ruit/enable_lora_nightly_test

RayenTian commented Dec 14, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

github-actions bot commented Dec 14, 2025

Uh oh!

coderabbitai bot commented Dec 14, 2025

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

RayenTian commented Dec 14, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Result

Summary by CodeRabbit

Uh oh!

github-actions bot commented Dec 14, 2025

❌ Submodule Fast-Forward Check Failed

❌ Submodules that need attention:

Uh oh!

coderabbitai bot commented Dec 14, 2025

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

RayenTian commented Dec 14, 2025 •

edited by coderabbitai bot

Loading