Skip to content

chore: Enable LoRA Nightly Test#1634

Merged
terrykong merged 2 commits intomainfrom
ruit/enable_lora_nightly_test
Dec 15, 2025
Merged

chore: Enable LoRA Nightly Test#1634
terrykong merged 2 commits intomainfrom
ruit/enable_lora_nightly_test

Conversation

@RayenTian
Copy link
Contributor

@RayenTian RayenTian commented Dec 14, 2025

Summary

This PR enables the LoRA (Low-Rank Adaptation) nightly test that was previously disabled due to dataset compatibility issues. Now that the Tulu3 SFT mixture dataset is properly supported, we can re-enable this important test coverage.

Changes

  1. Enable LoRA Configuration

    • Renamed sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml.disabled sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml
    • Updated dataset configuration from tulu3 to tulu3_sft_mixture to use the correct dataset implementation
  2. Enable LoRA Test Script

    • Renamed sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh.disabledsft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh
  3. Update Nightly Test Suite

    • Added tests/test_suites/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh to nightly.txt
    • Removed the comment explaining why the test was disabled
  4. Minor Cleanup

    • Removed redundant debug print statement in dataset loading code

Result

image image

Summary by CodeRabbit

  • Configuration Updates

    • Updated dataset reference in the Llama 3.1 8B SFT model configuration
  • Chores

    • Removed diagnostic message from data loading process
    • Enabled SFT model test in validation suite

✏️ Tip: You can customize this high-level summary in your review settings.

@github-actions
Copy link

❌ Submodule Fast-Forward Check Failed

Check based on commit: 6a45aac (PR #1634 from ruit/enable_lora_nightly_test)

❌ Submodules that need attention:

Gym: ❌ PR branch is BEHIND main branch
TARGET (main branch): https://github.com/NVIDIA-NeMo/Gym/commits/39ee39e35bfedef86ccf8d1ada9835ab8f62d267/
CURRENT (PR #1634 from ruit/enable_lora_nightly_test): https://github.com/NVIDIA-NeMo/Gym/commits/035c91e4b5b74598c1a313a28980926e1c2b8439/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

@RayenTian RayenTian force-pushed the ruit/enable_lora_nightly_test branch from 6a45aac to 0a8db04 Compare December 14, 2025 09:48
@RayenTian RayenTian self-assigned this Dec 14, 2025
@RayenTian RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Dec 14, 2025
@RayenTian RayenTian marked this pull request as ready for review December 14, 2025 10:24
@RayenTian RayenTian requested review from a team as code owners December 14, 2025 10:24
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 14, 2025

📝 Walkthrough

Walkthrough

These changes enable tulu3_sft_mixture dataset support for Llama 3.1 8B SFT training configuration. The configuration file references the updated dataset name, a diagnostic print statement is removed from dataset loading code, and the corresponding test is uncommented to validate the configuration.

Changes

Cohort / File(s) Summary
Dataset configuration and support
examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml, nemo_rl/data/datasets/response_datasets/__init__.py
Updated data.dataset_name from tulu3 to tulu3_sft_mixture in the configuration. Removed a runtime diagnostic print statement from the tulu3_sft_mixture dataset loading path.
Test enablement
tests/test_suites/nightly.txt
Uncommented the lora test entry for sft-llama3.1-8b-1n8g-fsdp2tp1-lora.sh, enabling the test that was previously disabled due to Tulu3 dataset limitations.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~5 minutes

Suggested reviewers

  • parthchadha
  • yfw

Pre-merge checks and finishing touches

✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: enabling the LoRA nightly test that was previously disabled due to dataset compatibility issues.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Test Results For Major Changes ✅ Passed PR contains only 4 minimal changes: dataset name update, debug print removal, test re-enablement, and test script re-enabling. No new features, breaking changes, algorithmic modifications, or performance impacts.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ruit/enable_lora_nightly_test

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5e3c0e2 and 0a8db04.

📒 Files selected for processing (3)
  • examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml (1 hunks)
  • nemo_rl/data/datasets/response_datasets/__init__.py (0 hunks)
  • tests/test_suites/nightly.txt (1 hunks)
💤 Files with no reviewable changes (1)
  • nemo_rl/data/datasets/response_datasets/init.py
🧰 Additional context used
📓 Path-based instructions (4)
tests/test_suites/nightly.txt

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When adding a nightly test for a new model, append the driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt

Files:

  • tests/test_suites/nightly.txt
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • tests/test_suites/nightly.txt
  • examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml
examples/configs/recipes/**/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When adding support for a new model, create a recipe YAML under examples/configs/recipes/ in the appropriate domain subdirectory (llm, vlm, etc.)

Files:

  • examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml
examples/configs/recipes/llm/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Recipe YAML files should follow the naming pattern: --ng-[-modifiers][-long][.vN].yaml for LLM recipes

Files:

  • examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml
🧠 Learnings (5)
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/nightly.txt : When adding a nightly test for a new model, append the driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt

Applied to files:

  • tests/test_suites/nightly.txt
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.

Applied to files:

  • tests/test_suites/nightly.txt
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Applied to files:

  • tests/test_suites/nightly.txt
📚 Learning: 2025-09-19T07:28:29.887Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.

Applied to files:

  • tests/test_suites/nightly.txt
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : Recipe YAML files should follow the naming pattern: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml for LLM recipes

Applied to files:

  • examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml
🔇 Additional comments (2)
tests/test_suites/nightly.txt (1)

74-74: Test entry correctly uncommented in nightly registry.

The path format is correct (relative to tests/test_suites/), the test script exists with all required infrastructure variables (NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS, NUM_MINUTES), and it properly invokes the training entrypoint with uv run.

examples/configs/recipes/llm/sft-llama3.1-8b-1n8g-fsdp2tp1-lora.yaml (1)

29-29: tulu3_sft_mixture dataset is properly registered and accessible.

The dataset name change in the recipe is correct. The Tulu3SftMixtureDataset class is properly registered in the dataset factory (nemo_rl/data/datasets/response_datasets/__init__.py) and the corresponding format_tulu3_sft_mixture formatting function is implemented in the codebase. The configuration change will work as intended.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

terrykong
terrykong previously approved these changes Dec 14, 2025
Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
@RayenTian RayenTian force-pushed the ruit/enable_lora_nightly_test branch from 0a8db04 to 7cfb8cc Compare December 15, 2025 02:28
@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Dec 15, 2025
@terrykong terrykong enabled auto-merge (squash) December 15, 2025 05:27
@terrykong terrykong merged commit a010564 into main Dec 15, 2025
37 of 38 checks passed
@terrykong terrykong deleted the ruit/enable_lora_nightly_test branch December 15, 2025 06:19
DeL-TaiseiOzaki pushed a commit to DeL-TaiseiOzaki/RL that referenced this pull request Jan 8, 2026
Signed-off-by: ruit <ruit@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 12, 2026
Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: ruit <ruit@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: ruit <ruit@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 9, 2026
Signed-off-by: ruit <ruit@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:L1 Run doctests, unit tests, and functional tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants