Skip to content

chore: add a research template project#1278

Merged
terrykong merged 6 commits intomainfrom
tk/research
Nov 23, 2025
Merged

chore: add a research template project#1278
terrykong merged 6 commits intomainfrom
tk/research

Conversation

@terrykong
Copy link
Collaborator

@terrykong terrykong commented Oct 5, 2025

This PR introduces a research/ directory and a sample template project that also serves as an exemplar of how to construct a loop using nemo-rl utilities. It also doesn't introduce any data module so it is as minimal as possible.

The project is a "uv workspace" meaning it shares the global uv.lock; this is necessary to ensure we can make sure each research project dependencies are aware of nemo-rl's dependencies. The dependency chain is the research project depends on nemo_rl as a direct dependency.

Summary by CodeRabbit

  • New Features
    • Introduced a research template project with example configs and a minimal train-and-generate workflow.
    • Added data utilities and Python version pinning for the template.
  • Documentation
    • Added comprehensive research guidelines and template project README.
  • Tests
    • Added unit and functional tests for the template project.
    • Integrated research tests into existing test suites.
  • Chores
    • Updated workspace and ownership settings.
    • Improved environment consistency by standardizing project-root execution for tooling.
    • Expanded copyright tooling to include research files and skip empty files.

@euronymous-aithal
Copy link
Contributor

@terrykong is there a general guidelines/requirements we can provide. are we expecting code review, tests and docs requirements are same for research contributions ?

@terrykong
Copy link
Collaborator Author

@euronymous-aithal let me add some requirements and some expectations

@github-actions github-actions bot added the CI Relating to CI label Oct 6, 2025
@terrykong
Copy link
Collaborator Author

@euronymous-aithal see cf0cbf2

Copy link
Contributor

@euronymous-aithal euronymous-aithal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to move "Expectations for Research Project Authors" this section under research ?
i would imagine "research" should have a readme with all the info/guidelines.
the template-project should be an example with those best practices. authors can just clone the template-project ?

@euronymous-aithal
Copy link
Contributor

looks good otherwise @terrykong

Copy link
Contributor

@odelalleau odelalleau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just a few comments

@terrykong
Copy link
Collaborator Author

@euronymous-aithal 4f0d5c8

@terrykong terrykong marked this pull request as ready for review October 10, 2025 07:14
@terrykong terrykong requested review from a team as code owners October 10, 2025 07:14
@terrykong terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Oct 10, 2025
@terrykong terrykong enabled auto-merge (squash) October 10, 2025 07:15
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 10, 2025

📝 Walkthrough

Walkthrough

Adds a research template project with configs, code, tests, and docs; wires it into the workspace and ownership; updates uv commands to pin project directory; extends CI scripts to discover and run research tests; and updates tooling to include research files in copyright checks.

Changes

Cohort / File(s) Summary
Ownership and workspace
.github/CODEOWNERS, pyproject.toml
Adds CODEOWNERS entry for /research/template_project; includes research/template_project in uv workspace members.
uv execution directory flags
nemo_rl/distributed/virtual_cluster.py, nemo_rl/utils/venvs.py
Appends --directory {git_root} to uv run/sync commands for BASE, VLLM, AUTOMODEL, MCORE, and venv sync.
Research docs
research/README.md
New contribution and testing guidelines for research folder.
Template project scaffolding
research/template_project/.python-version, research/template_project/README.md, research/template_project/pyproject.toml
Adds Python version pin, project README, and pyproject with metadata, deps, and pytest config.
Template configs
research/template_project/configs/grpo_math_1B.yaml, research/template_project/configs/recipes/llm/single_update_1n8g.yaml
Introduces GRPO and LLM recipe YAML configurations, including policy, generation, batching, and cluster settings.
Template code
research/template_project/single_update.py, research/template_project/template_project/data_utils.py
Adds minimal GRPO workflow script and a helper to build BatchedDataDict from sentences.
Template tests (functional/unit/suites)
research/template_project/tests/functional/single_update.sh, research/template_project/tests/test_suites/llm/common.env, research/template_project/tests/test_suites/llm/single_update_1n8g.sh, research/template_project/tests/unit/test_data_utils.py
Adds functional runner, shared env utilities, suite script, and unit tests for create_batch_from.
CI test discovery
tests/functional/L1_Functional_Tests_GPU.sh, tests/unit/L0_Unit_Tests_Other.sh
Extends scripts to auto-discover and run research functional and unit tests.
Tooling
tools/copyright.sh
Includes research/ in scan, skips empty files, and updates header filtering/output logic.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as CLI (single_update.py)
  participant CFG as Config Loader
  participant VC as VirtualCluster (Ray)
  participant GEN as Generation (vLLM)
  participant POL as Policy (LM)

  User->>CLI: Launch with config + overrides
  CLI->>CFG: Load MasterConfig
  CLI->>VC: Init cluster (gpus_per_node, nodes)
  CLI->>GEN: Init VLLM generation (cluster)
  CLI->>POL: Init policy (+ reference)
  note over GEN,POL: Prepare/Refit before training

  loop For N tiny iterations
    CLI->>GEN: Refit weights from POL
    CLI->>GEN: Generate outputs (prompts)
    GEN-->>CLI: Token ids / texts
    CLI->>POL: Prepare for training
    CLI->>POL: Train on tiny batch (NLL)
  end

  CLI->>GEN: Shutdown
  CLI->>POL: Shutdown
  CLI->>VC: Shutdown
  CLI-->>User: Done
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • parthchadha

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 78.57% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Test Results For Major Changes ⚠️ Warning This PR introduces a substantial new research template project along with associated configs, scripts, and documentation, which qualifies as a major change under the check criteria, yet the PR description does not mention any executed tests or results, so the requirement to document testing information is unmet. Please update the PR description to summarize the relevant test coverage, including the commands run and their outcomes (e.g., unit, functional, or other validation), so reviewers can confirm the new template was exercised.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title 'chore: add a research template project' accurately and concisely summarizes the main change of introducing a new research template project in the repository. It follows conventional commit style and avoids unnecessary detail while clearly conveying intent. A reviewer scanning the history will immediately understand the primary addition implemented by this PR.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch tk/research

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 10, 2025

📝 Walkthrough

Walkthrough

Adds a research template project with code, configs, tests, and docs; wires it into the workspace and CODEOWNERS; updates uv invocation to run in the repo root; includes new functional/unit test discovery for research projects; and extends copyright tooling.

Changes

Cohort / File(s) Summary
Governance & Workspace
.github/CODEOWNERS, pyproject.toml
Add code ownership for research/template_project and include research/template_project as a uv workspace member.
Core runtime invocation
nemo_rl/distributed/virtual_cluster.py, nemo_rl/utils/venvs.py
Prefix uv commands with --directory {git_root} for run/sync; update PY_EXECUTABLES constants accordingly.
Research docs
research/README.md, research/template_project/README.md
Add guidelines and template project README documentation.
Template project scaffolding
research/template_project/.python-version, research/template_project/pyproject.toml
Specify Python 3.12 and project pyproject with deps, test config, and workspace integration.
Template project configs
research/template_project/configs/grpo_math_1B.yaml, research/template_project/configs/recipes/llm/single_update_1n8g.yaml
Add GRPO configuration and a recipe setting 1 node with 8 GPUs.
Template project code
research/template_project/single_update.py, research/template_project/template_project/data_utils.py
Add minimal GRPO single-update script and data utility to build batched tensors from sentences.
Template project tests — functional
research/template_project/tests/functional/single_update.sh
Add functional test script running a single iteration and logging outputs.
Template project tests — suites
research/template_project/tests/test_suites/llm/common.env, research/template_project/tests/test_suites/llm/single_update_1n8g.sh
Add suite environment bootstrap and a single-update test that runs, logs, emits metrics, and validates success.
Template project tests — unit
research/template_project/tests/unit/test_data_utils.py
Add unit tests for create_batch_from, checking shapes, dtypes, and lengths.
Repo test harness updates
tests/functional/L1_Functional_Tests_GPU.sh, tests/unit/L0_Unit_Tests_Other.sh
Auto-discover and run research functional and unit tests via uv in each research project.
Tooling
tools/copyright.sh
Include research/ in scan, skip empty files, and print first non-comment line for each file.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant U as User/CI
    participant S as single_update.py
    participant C as Ray Cluster
    participant G as vLLM Generator
    participant P as LM Policy
    participant D as Data Utils

    U->>S: Launch with config (YAML + overrides)
    S->>C: Initialize cluster (Ray)
    S->>P: Initialize policy (weights, tokenizer)
    S->>G: Initialize generation backend (vLLM)

    rect rgb(245,248,255)
    note right of S: Iterative single-update cycle
    loop For i in 1..N
        S->>G: Refit with latest policy weights
        G-->>S: Ready

        S->>G: Generate outputs for prompts
        G-->>S: Completions

        S->>D: create_batch_from(sentences)
        D-->>S: BatchedDataDict

        S->>P: Train step (tiny NLL batch)
        P-->>S: Loss/metrics
    end
    end

    S-->>G: Shutdown
    S-->>C: Shutdown
    S-->>U: All done.
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

Suggested reviewers

  • chtruong814

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Test Results For Major Changes ⚠️ Warning The PR introduces a substantial new research template project, associated configurations, documentation, and updates to shared test runners, which constitutes a major change, yet the PR description retrieved via gh pr view contains no testing or performance results. Because significant additions lack documented test evidence, the custom check requirement is not met. Please update the PR description to summarize the tests run (e.g., unit, functional, integration) and their outcomes, or provide relevant performance or convergence data if applicable, so the change complies with the major-change testing requirement.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "chore: add a research template project" succinctly captures the primary change of introducing a new research template, aligning closely with the pull request’s main objective without extraneous details.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch tk/research

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (3)
research/template_project/README.md (1)

110-114: Add language specifier to code block.

The contact information block should specify a language identifier for proper rendering, even though it contains placeholder text.

Apply this diff:

-```
+```text
 Author: AUTHOR NAMES HERE
 Email: AUTHOR EMAILS HERE
 Organization: ORGANIZATION HERE (optional)
research/template_project/tests/test_suites/llm/single_update_1n8g.sh (1)

12-13: Remove or document unused variables.

NUM_RUNS and NUM_MINUTES are defined but never used in the script. If they're intended for external tooling (e.g., SLURM job configuration via tools/launch), add a comment explaining their purpose. Otherwise, remove them to reduce confusion.

If these are for external tooling, add a comment:

 NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))  # Round up
-NUM_MINUTES=30
+NUM_MINUTES=30  # Used by tools/launch for SLURM time allocation

Or remove them if truly unused:

-NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))  # Round up
-NUM_MINUTES=30
research/template_project/tests/unit/test_data_utils.py (1)

101-113: Consider simplifying the test logic.

The test recreates the tokenizer and re-tokenizes the sentences to verify input_lengths, which duplicates the implementation logic. While this works, consider using manually verified examples for more independent validation.

Example alternative approach:

def test_create_batch_from_input_lengths(tokenizer):
    """Test that input_lengths correctly represent non-padded token counts."""
    # Use sentences with known token counts for more explicit verification
    sentences = ["a b c", "x y"]
    
    batch = create_batch_from(tokenizer, sentences)
    
    # Verify the first sequence has more tokens than the second
    assert batch["input_lengths"][0] > batch["input_lengths"][1]
    
    # Optionally verify against known tokenization if deterministic
    # e.g., manually check that "a b c" tokenizes to N tokens with this tokenizer
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c574d0 and baa7cf9.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • .github/CODEOWNERS (1 hunks)
  • nemo_rl/distributed/virtual_cluster.py (1 hunks)
  • nemo_rl/utils/venvs.py (1 hunks)
  • pyproject.toml (1 hunks)
  • research/README.md (1 hunks)
  • research/template_project/.python-version (1 hunks)
  • research/template_project/README.md (1 hunks)
  • research/template_project/configs/grpo_math_1B.yaml (1 hunks)
  • research/template_project/configs/recipes/llm/single_update_1n8g.yaml (1 hunks)
  • research/template_project/pyproject.toml (1 hunks)
  • research/template_project/single_update.py (1 hunks)
  • research/template_project/template_project/data_utils.py (1 hunks)
  • research/template_project/tests/functional/single_update.sh (1 hunks)
  • research/template_project/tests/test_suites/llm/common.env (1 hunks)
  • research/template_project/tests/test_suites/llm/single_update_1n8g.sh (1 hunks)
  • research/template_project/tests/unit/test_data_utils.py (1 hunks)
  • tests/functional/L1_Functional_Tests_GPU.sh (1 hunks)
  • tests/unit/L0_Unit_Tests_Other.sh (1 hunks)
  • tools/copyright.sh (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Use uv run to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling python directly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts

Files:

  • tools/copyright.sh
  • research/template_project/tests/test_suites/llm/single_update_1n8g.sh
  • research/template_project/tests/functional/single_update.sh
  • tests/functional/L1_Functional_Tests_GPU.sh
  • tests/unit/L0_Unit_Tests_Other.sh
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • nemo_rl/utils/venvs.py
  • research/template_project/single_update.py
  • nemo_rl/distributed/virtual_cluster.py
  • research/template_project/tests/unit/test_data_utils.py
  • research/template_project/template_project/data_utils.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/utils/venvs.py
  • nemo_rl/distributed/virtual_cluster.py
🧬 Code graph analysis (4)
research/template_project/tests/test_suites/llm/single_update_1n8g.sh (2)
research/template_project/tests/test_suites/llm/common.env (1)
  • exit_if_max_steps_reached (15-23)
tests/test_suites/llm/common.env (1)
  • exit_if_max_steps_reached (12-20)
research/template_project/single_update.py (5)
research/template_project/template_project/data_utils.py (1)
  • create_batch_from (21-47)
nemo_rl/algorithms/grpo.py (1)
  • refit_policy_generation (498-566)
nemo_rl/distributed/virtual_cluster.py (1)
  • init_ray (84-170)
nemo_rl/models/generation/__init__.py (1)
  • configure_generation_config (24-45)
nemo_rl/models/policy/lm_policy.py (1)
  • Policy (59-806)
research/template_project/tests/unit/test_data_utils.py (1)
research/template_project/template_project/data_utils.py (1)
  • create_batch_from (21-47)
research/template_project/template_project/data_utils.py (2)
nemo_rl/distributed/batched_data_dict.py (1)
  • BatchedDataDict (75-860)
research/template_project/tests/unit/test_data_utils.py (1)
  • tokenizer (23-27)
🪛 markdownlint-cli2 (0.18.1)
research/template_project/README.md

110-110: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.13.3)
nemo_rl/utils/venvs.py

98-98: subprocess call: check for execution of untrusted input

(S603)


98-98: Starting a process with a partial executable path

(S607)

🪛 Shellcheck (0.11.0)
research/template_project/tests/test_suites/llm/single_update_1n8g.sh

[warning] 12-12: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 13-13: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 24-24: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 33-33: Double quote array expansions to avoid re-splitting elements.

(SC2068)

research/template_project/tests/functional/single_update.sh

[error] 27-27: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/functional/L1_Functional_Tests_GPU.sh

[warning] 36-36: Quote this to prevent word splitting.

(SC2046)

tests/unit/L0_Unit_Tests_Other.sh

[warning] 41-41: Quote this to prevent word splitting.

(SC2046)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (21)
research/template_project/.python-version (1)

1-1: LGTM!

The Python version specification aligns with the project's requires-python = ">=3.12" requirement in pyproject.toml.

research/template_project/configs/recipes/llm/single_update_1n8g.yaml (1)

1-6: LGTM!

The configuration correctly references a parent config and overrides the GPU allocation. The structure follows standard Hydra conventions.

.github/CODEOWNERS (1)

62-63: LGTM!

The CODEOWNERS entry correctly assigns ownership of the template project to @terrykong, following the established pattern in the file.

nemo_rl/utils/venvs.py (1)

98-98: LGTM!

The addition of --directory ensures UV operates from the repository root, which is necessary for the workspace configuration to function correctly with research projects sharing the global uv.lock.

The static analysis warnings (S603, S607) are false positives: git_root is computed from the module's file path, not user input, and uv is a well-known build tool.

pyproject.toml (1)

171-176: LGTM!

The workspace configuration correctly includes the research template project, and the detailed inline comments clearly explain the rationale for sharing the global uv.lock (avoiding numpy version mismatches that cause pickle issues with Ray).

research/template_project/pyproject.toml (1)

1-31: LGTM!

The template project configuration is well-structured:

  • Correctly declares nemo-rl as a workspace dependency
  • Includes appropriate test tooling
  • Follows the pattern established in the main project
research/README.md (1)

1-50: Comprehensive documentation for research projects.

This documentation provides clear guidance for research project authors, covering acceptance criteria, testing expectations, and code ownership. The structure is well-organized and addresses key concerns about maintainability and reproducibility.

research/template_project/tests/test_suites/llm/common.env (1)

1-40: LGTM!

The environment setup script is well-structured with clear separation of concerns. The exit_if_max_steps_reached function provides efficient early stopping for incremental test runs, and the path initialization properly handles the project root context.

research/template_project/template_project/data_utils.py (1)

36-38: Verify token_mask initialization aligns with the comment.

The comment states "use the attention mask as token_mask," but the implementation creates token_mask as ones for all positions. If the intent is to apply loss only to non-padded tokens, consider using the attention mask directly. If the current implementation is correct (e.g., because NLLLoss handles masking separately), please update the comment to clarify the behavior.

If token_mask should reflect padding, apply this diff:

-    # For simple NLL training, use the attention mask as token_mask
+    # For simple NLL training, token_mask marks all positions
     # (loss will be applied to positions 1..len-1 via NLLLoss)
     token_mask = torch.ones_like(input_ids)

Or if the attention mask should be used:

-    # For simple NLL training, use the attention mask as token_mask
-    # (loss will be applied to positions 1..len-1 via NLLLoss)
-    token_mask = torch.ones_like(input_ids)
+    # For simple NLL training, use the attention mask as token_mask
+    # to exclude padding tokens from loss computation
+    token_mask = attention_mask.to(torch.int64)
research/template_project/configs/grpo_math_1B.yaml (1)

1-254: LGTM!

The configuration file is comprehensive and well-documented, with clear sections for all major components (GRPO algorithm, loss function, checkpointing, policy, data, environment, logging, and cluster). The inline comments provide helpful context for key parameters like async GRPO and importance sampling corrections.

nemo_rl/distributed/virtual_cluster.py (1)

46-58: LGTM!

The addition of --directory {git_root} to all uv executable definitions ensures consistent execution context for workspace projects. This change properly supports the research template project structure introduced in this PR.

research/template_project/tests/unit/test_data_utils.py (5)

22-27: LGTM!

The tokenizer fixture is well-structured. Setting pad_token = eos_token is the standard approach for GPT-2, which doesn't have a dedicated padding token.


30-43: LGTM!

The test correctly verifies the presence of all required keys and their shapes for a single-sentence batch.


45-54: LGTM!

The test properly validates batch dimensions and shape consistency for multiple sentences.


69-78: LGTM!

The dtype validations correctly match the implementation in data_utils.py.


81-98: LGTM!

Both mask tests correctly verify that the masks are initialized to all ones with proper shapes.

research/template_project/single_update.py (5)

1-46: LGTM!

The copyright header, docstring, and imports are all properly structured. The OmegaConf "mul" resolver registration is a good practice for supporting config-based arithmetic.


49-94: LGTM!

The initialization sequence is well-structured:

  • VllmGeneration is initialized before Policy to ensure a clean GPU environment
  • The finish_generation() call pre-initializes workers to avoid later contention
  • The state_dict_info or {} fallback safely handles None returns

96-117: Verify train_sentences pattern matches expectations.

Lines 98-100 create train_sentences by repeating the 2-sentence pattern train_global_batch_size times. This means:

  • If train_global_batch_size=1: 2 sentences
  • If train_global_batch_size=2: 4 sentences

Ensure this aligns with your intended batch construction, as the multiplier doesn't directly correspond to the final batch size. The pattern is multiplied rather than used as the target count.


122-163: LGTM!

The training loop correctly implements the refit → generate → train cycle:

  • Refit ensures the generation engine has the latest policy weights
  • Generation runs with greedy=True for deterministic demonstration
  • Training properly extracts and logs the loss
  • All components are properly shut down at the end

166-201: LGTM!

The argument parsing and main entry point are well-designed:

  • parse_known_args() allows for flexible Hydra-style overrides
  • Default config path is set appropriately
  • Config resolution and pretty-printing aid debugging

@terrykong terrykong force-pushed the tk/research branch 2 times, most recently from d5c513e to 3c3ab37 Compare October 10, 2025 19:16
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

♻️ Duplicate comments (5)
tools/copyright.sh (1)

22-28: Replace echo -e with printf to avoid mangling headers.

Interpreting backslash escapes breaks the captured first line (e.g., \n, \t, \r, \c) and can hide real violations. Keep the output literal with printf.

Apply this diff:

-    echo -e "$path\t$first_line"
+    printf '%s\t%s\n' "$path" "$first_line"
tests/unit/L0_Unit_Tests_Other.sh (1)

39-45: Quote variables to prevent word splitting.

Lines 41-42 have unquoted command substitutions and variables that should be quoted to prevent word splitting if paths contain spaces. This issue was already identified in a previous review.

Apply this diff to fix the quoting issues:

-    project_dir=$(dirname $(dirname $i))
-    pushd $project_dir
+    project_dir=$(dirname "$(dirname "$i")")
+    pushd "$project_dir"
tests/functional/L1_Functional_Tests_GPU.sh (1)

34-40: Quote all variable expansions and command substitutions to prevent word splitting.

Lines 36-38 have multiple unquoted variables and command substitutions that could cause issues with paths containing spaces or special characters.

Apply this diff to add proper quoting:

 # Research functional tests (self-discovery)
 for test_script in research/*/tests/functional/*.sh; do
-    project_dir=$(dirname $(dirname $test_script))
-    pushd $project_dir
-    time uv run --no-sync bash $test_script
+    project_dir=$(dirname "$(dirname "$test_script")")
+    pushd "$project_dir"
+    time uv run --no-sync bash "$test_script"
     popd
 done
research/template_project/tests/functional/single_update.sh (1)

14-28: Quote variable expansions throughout the script to prevent word splitting.

Multiple lines use unquoted variables that could cause issues with paths containing spaces or special characters.

Apply this diff:

-rm -rf $EXP_DIR $LOG_DIR
-mkdir -p $EXP_DIR $LOG_DIR
+rm -rf "$EXP_DIR" "$LOG_DIR"
+mkdir -p "$EXP_DIR" "$LOG_DIR"
 
-cd $PROJECT_ROOT
+cd "$PROJECT_ROOT"
 
 # Run single_update.py for just 1 iteration
 SINGLE_UPDATE_ITERS=1 uv run coverage run -a --data-file=$PROJECT_ROOT/tests/.coverage --source=$PROJECT_ROOT/nemo_rl \
     single_update.py \
-    --config $PROJECT_ROOT/configs/grpo_math_1B.yaml \
+    --config "$PROJECT_ROOT/configs/grpo_math_1B.yaml" \
     cluster.gpus_per_node=1 \
     cluster.num_nodes=1 \
     policy.train_global_batch_size=1 \
     policy.train_micro_batch_size=1 \
-    $@ \
-    2>&1 | tee $RUN_LOG
+    "$@" \
+    2>&1 | tee "$RUN_LOG"

Note: Line 27 was already flagged in a previous review.

research/template_project/tests/test_suites/llm/single_update_1n8g.sh (1)

24-34: Add error handling for directory change and quote all variable expansions.

Line 24 uses cd without error handling, and lines 24, 30, 33-34 have unquoted variables that could cause word splitting issues.

Apply this diff:

 # Run the experiment
-cd $PROJECT_ROOT
+cd "$PROJECT_ROOT" || exit 1
 
 # Set the number of iterations via environment variable
 NRL_FORCE_REBUILD_VENVS=true \
 SINGLE_UPDATE_ITERS=$MAX_STEPS \
 uv run single_update.py \
     --config configs/grpo_math_1B.yaml \
     cluster.gpus_per_node=8 \
-    cluster.num_nodes=$NUM_NODES \
-    $@ \
-    2>&1 | tee $RUN_LOG
+    cluster.num_nodes="$NUM_NODES" \
+    "$@" \
+    2>&1 | tee "$RUN_LOG"

Note: Lines 24 and 33 were already flagged in previous reviews.

🧹 Nitpick comments (10)
research/template_project/README.md (1)

110-114: Add language identifier to fenced code block.

The author/citation information block should specify a language identifier for proper rendering. As per coding guidelines, this helps with syntax highlighting and documentation consistency.

Apply this diff:

-```
+```text
 Author: AUTHOR NAMES HERE
 Email: AUTHOR EMAILS HERE
 Organization: ORGANIZATION HERE (optional)

</blockquote></details>
<details>
<summary>research/README.md (1)</summary><blockquote>

`26-26`: **Clarify acceptance criteria language.**

The phrase "results outlined in this README" is confusing since this is the general research directory README, not a project-specific README. Each project has its own README where results would be documented.

Consider rephrasing for clarity:



```diff
-The acceptance criteria for merging your research project into the main repository are reproduction steps for the results outlined in this README. We want to make sure others can reproduce your great work! Please include sufficient documentation in the README.md that enables users to follow and reproduce your results step-by-step.
+The acceptance criteria for merging your research project into the main repository are reproduction steps for the results outlined in your project's README. We want to make sure others can reproduce your great work! Please include sufficient documentation in your project's README.md that enables users to follow and reproduce your results step-by-step.
research/template_project/tests/unit/test_data_utils.py (1)

108-110: Use the tokenizer fixture instead of creating a new tokenizer.

Lines 108-110 duplicate the tokenizer setup logic that's already available via the tokenizer fixture. This reduces maintainability.

Apply this diff to use the fixture:

     batch = create_batch_from(tokenizer, sentences)
 
-    # Compute expected lengths from attention mask
-    tok = AutoTokenizer.from_pretrained("gpt2")
-    tok.pad_token = tok.eos_token
-    enc = tok(sentences, add_special_tokens=False, return_tensors="pt", padding=True)
+    # Compute expected lengths from attention mask using the fixture
+    enc = tokenizer(sentences, add_special_tokens=False, return_tensors="pt", padding=True)
     expected_lengths = enc["attention_mask"].sum(dim=1).to(torch.int32)
 
     assert torch.all(batch["input_lengths"] == expected_lengths)
research/template_project/tests/test_suites/llm/single_update_1n8g.sh (1)

12-13: Remove unused variables or document their purpose.

NUM_RUNS and NUM_MINUTES are defined but never used in the script. Based on the static analysis hint and code inspection, these appear to be leftover from a template or intended for documentation purposes.

If these variables serve as documentation for the typical structure of test suites, consider adding a comment explaining this. Otherwise, remove them to reduce confusion:

-NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))  # Round up
-NUM_MINUTES=30
research/template_project/single_update.py (2)

49-164: Add error handling to ensure resource cleanup on failure.

The main function doesn't have try-except blocks around the training loop. If any step fails, the shutdown calls (lines 161-163) won't execute, potentially leaving Ray workers and GPU resources allocated.

For a more robust demonstration, consider adding error handling:

 def main(config: MasterConfig) -> None:
     # 0) Config
     policy_config = config["policy"]
     tokenizer = get_tokenizer(policy_config["tokenizer"])
     policy_config["generation"] = configure_generation_config(
         policy_config["generation"], tokenizer
     )
 
     # 1) Set up compute cluster (single GPU for demo)
     print("\n▶ Setting up compute cluster...")
     init_ray()
     cluster = RayVirtualCluster(
         name="single_update_cluster",
         bundle_ct_per_node_list=[config["cluster"]["gpus_per_node"]]
         * config["cluster"]["num_nodes"],
         use_gpus=True,
         num_gpus_per_node=config["cluster"]["gpus_per_node"],
         max_colocated_worker_groups=1
         if policy_config["generation"]["backend"] == "megatron"
         else 2,
     )
 
+    try:
         # 2) Initialize vLLM generation first for a clean GPU environment
         print("\n▶ Initializing vLLM generation...")
         # Initialize vLLM directly from config
         policy_config["generation"]["model_name"] = policy_config["model_name"]
         policy_generation = VllmGeneration(
             cluster=cluster, config=policy_config["generation"]
         )
         # Pre-initialize workers to avoid contention later
         policy_generation.finish_generation()
         print("  ✓ vLLM generation ready")
 
         # 3) Initialize policy (LM)
         print("\n▶ Initializing LM Policy...")
         policy = Policy(
             cluster=cluster,
             config=policy_config,
             tokenizer=tokenizer,
             init_reference_model=False,
         )
         print("  ✓ Policy created")
 
         # ... rest of the training loop ...
         
         print("\nAll done.")
+    finally:
         policy.shutdown()
         policy_generation.shutdown()
         cluster.shutdown()

195-196: Clarify or relocate the "Applied CLI overrides" message.

Line 196 prints "Applied CLI overrides" after calling to_container(), but the overrides were actually applied at line 193. This placement could be confusing.

Consider moving the print statement or revising the message for clarity:

     if overrides:
         print(f"Overrides: {overrides}")
         config = parse_hydra_overrides(config, overrides)
+        print("Applied CLI overrides")
 
     config: MasterConfig = OmegaConf.to_container(config, resolve=True)
-    print("Applied CLI overrides")
+    print("Resolved configuration:")
     from rich.pretty import pprint
 
     pprint(config)
research/template_project/tests/test_suites/llm/common.env (3)

15-23: Document the MAX_STEPS requirement and improve robustness.

The function references MAX_STEPS, which must be set by the calling script. Consider adding a comment documenting this requirement. Additionally, quote $JSON_METRICS to safely handle paths with spaces.

Apply this diff to improve the function:

 exit_if_max_steps_reached() {
-  # Early stopping to save compute if max step has been reached
+  # Early stopping to save compute if max step has been reached.
+  # Requires: MAX_STEPS environment variable must be set by caller.
-  STEPS_SO_FAR=$(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS || echo 0)
+  STEPS_SO_FAR=$(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' "$JSON_METRICS" || echo 0)
   if [[ $STEPS_SO_FAR -ge $MAX_STEPS ]]; then
       echo "[INFO] Target step $MAX_STEPS reached, skipping run"
       exit 0
   fi
   echo "[INFO] Steps so far: $STEPS_SO_FAR, running till $MAX_STEPS steps"
 }

32-32: Quote PROJECT_ROOT for safety.

To handle edge cases with spaces or special characters in the path, quote the variable.

Apply this diff:

-export PYTHONPATH=${PROJECT_ROOT}:${PYTHONPATH:-}
+export PYTHONPATH="${PROJECT_ROOT}:${PYTHONPATH:-}"

39-39: Quote directory variables for safety.

To handle edge cases with spaces or special characters in paths, quote the variables.

Apply this diff:

-mkdir -p $EXP_DIR $LOG_DIR $CKPT_DIR
+mkdir -p "$EXP_DIR" "$LOG_DIR" "$CKPT_DIR"
research/template_project/configs/grpo_math_1B.yaml (1)

28-30: Consider adding validation for async GRPO configuration.

The configuration correctly notes that importance sampling correction is required when async GRPO is enabled, but there's no safeguard to prevent users from enabling async mode without enabling importance sampling. Consider adding a validation check in the code that consumes this config, or using a more explicit comment that emphasizes this is a hard requirement.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f29fa2a and e7def21.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • .github/CODEOWNERS (1 hunks)
  • nemo_rl/distributed/virtual_cluster.py (1 hunks)
  • nemo_rl/utils/venvs.py (1 hunks)
  • pyproject.toml (1 hunks)
  • research/README.md (1 hunks)
  • research/template_project/.python-version (1 hunks)
  • research/template_project/README.md (1 hunks)
  • research/template_project/configs/grpo_math_1B.yaml (1 hunks)
  • research/template_project/configs/recipes/llm/single_update_1n8g.yaml (1 hunks)
  • research/template_project/pyproject.toml (1 hunks)
  • research/template_project/single_update.py (1 hunks)
  • research/template_project/template_project/data_utils.py (1 hunks)
  • research/template_project/tests/functional/single_update.sh (1 hunks)
  • research/template_project/tests/test_suites/llm/common.env (1 hunks)
  • research/template_project/tests/test_suites/llm/single_update_1n8g.sh (1 hunks)
  • research/template_project/tests/unit/test_data_utils.py (1 hunks)
  • tests/functional/L1_Functional_Tests_GPU.sh (1 hunks)
  • tests/unit/L0_Unit_Tests_Other.sh (1 hunks)
  • tools/copyright.sh (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Use uv run to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling python directly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts

Files:

  • tests/functional/L1_Functional_Tests_GPU.sh
  • research/template_project/tests/test_suites/llm/single_update_1n8g.sh
  • tests/unit/L0_Unit_Tests_Other.sh
  • tools/copyright.sh
  • research/template_project/tests/functional/single_update.sh
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • nemo_rl/distributed/virtual_cluster.py
  • research/template_project/template_project/data_utils.py
  • research/template_project/single_update.py
  • research/template_project/tests/unit/test_data_utils.py
  • nemo_rl/utils/venvs.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/distributed/virtual_cluster.py
  • nemo_rl/utils/venvs.py
🧠 Learnings (3)
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : LLM recipe YAML filenames must follow: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml

Applied to files:

  • research/template_project/configs/recipes/llm/single_update_1n8g.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to **/*.py : Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Applied to files:

  • research/template_project/tests/unit/test_data_utils.py
  • tools/copyright.sh
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to **/*.sh : Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts

Applied to files:

  • tools/copyright.sh
🧬 Code graph analysis (4)
research/template_project/tests/test_suites/llm/single_update_1n8g.sh (1)
research/template_project/tests/test_suites/llm/common.env (1)
  • exit_if_max_steps_reached (15-23)
research/template_project/template_project/data_utils.py (2)
nemo_rl/distributed/batched_data_dict.py (1)
  • BatchedDataDict (75-860)
research/template_project/tests/unit/test_data_utils.py (1)
  • tokenizer (23-27)
research/template_project/single_update.py (10)
research/template_project/template_project/data_utils.py (1)
  • create_batch_from (21-47)
nemo_rl/algorithms/grpo.py (1)
  • refit_policy_generation (498-566)
nemo_rl/algorithms/loss_functions.py (1)
  • NLLLoss (362-438)
nemo_rl/algorithms/utils.py (1)
  • get_tokenizer (157-288)
nemo_rl/distributed/batched_data_dict.py (1)
  • BatchedDataDict (75-860)
nemo_rl/distributed/virtual_cluster.py (3)
  • RayVirtualCluster (177-435)
  • init_ray (84-170)
  • shutdown (407-426)
nemo_rl/models/generation/__init__.py (1)
  • configure_generation_config (24-45)
nemo_rl/models/generation/vllm/vllm_generation.py (1)
  • VllmGeneration (47-838)
nemo_rl/models/policy/lm_policy.py (1)
  • Policy (59-806)
nemo_rl/utils/config.py (1)
  • parse_hydra_overrides (146-166)
research/template_project/tests/unit/test_data_utils.py (1)
research/template_project/template_project/data_utils.py (1)
  • create_batch_from (21-47)
🪛 markdownlint-cli2 (0.18.1)
research/template_project/README.md

110-110: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.13.3)
nemo_rl/utils/venvs.py

98-98: subprocess call: check for execution of untrusted input

(S603)


98-98: Starting a process with a partial executable path

(S607)

🪛 Shellcheck (0.11.0)
tests/functional/L1_Functional_Tests_GPU.sh

[warning] 36-36: Quote this to prevent word splitting.

(SC2046)

research/template_project/tests/test_suites/llm/single_update_1n8g.sh

[warning] 12-12: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 13-13: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 24-24: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 33-33: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/unit/L0_Unit_Tests_Other.sh

[warning] 41-41: Quote this to prevent word splitting.

(SC2046)

research/template_project/tests/functional/single_update.sh

[error] 27-27: Double quote array expansions to avoid re-splitting elements.

(SC2068)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (6)
nemo_rl/utils/venvs.py (1)

98-98: LGTM! Directory-scoped uv sync aligns with workspace pattern.

The addition of --directory git_root ensures the uv sync command operates from the repository root, which is consistent with the directory-scoped execution pattern introduced elsewhere in this PR (e.g., in virtual_cluster.py). The static analysis warnings about subprocess execution are false positives since git_root is computed from the module's file path, not from user input.

.github/CODEOWNERS (1)

62-63: LGTM! Ownership assignment is appropriate.

The new CODEOWNERS entry correctly assigns ownership of the template project to @terrykong, aligning with the PR context.

pyproject.toml (1)

176-181: LGTM! Workspace integration with clear rationale.

The addition of research/template_project to the workspace members enables shared dependency resolution and the global uv.lock. The detailed comment explaining the numpy version conflict is valuable context for future maintainers.

research/template_project/pyproject.toml (1)

1-31: LGTM! Well-structured project configuration.

The pyproject.toml follows standard patterns:

  • Uses hatchling for builds (consistent with project requirements)
  • Correctly declares nemo-rl as a workspace dependency
  • Includes appropriate test dependencies and pytest configuration
  • Specifies Python 3.12+ requirement matching the .python-version file

The configuration integrates properly with the workspace setup.

nemo_rl/distributed/virtual_cluster.py (1)

46-58: LGTM! Directory-scoped uv commands ensure consistent execution context.

The addition of --directory {git_root} to all PY_EXECUTABLES ensures that uv commands execute from the repository root, which is essential for:

  • Resolving the correct pyproject.toml and uv.lock
  • Ensuring workspace member visibility
  • Maintaining consistency with the multi-project workspace setup

This pattern aligns with the similar change in nemo_rl/utils/venvs.py and supports the new research project structure.

research/template_project/.python-version (1)

1-1: Python versions match; root and template .python-version both specify 3.12.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (8)
tests/unit/L0_Unit_Tests_Other.sh (1)

39-45: Quote command substitutions and variables to prevent word splitting.

Lines 41-42 should quote the nested dirname command substitutions and the project_dir variable to prevent potential word splitting issues if paths contain spaces.

Apply this diff:

-    project_dir=$(dirname $(dirname $i))
-    pushd $project_dir
+    project_dir=$(dirname "$(dirname "$i")")
+    pushd "$project_dir"
tools/copyright.sh (1)

27-28: Avoid echo -e; it interprets backslash escapes.

Using echo -e on line 28 interprets backslash escapes in the captured first line (e.g., \n, \t, \r, \c), which can cause misidentification of files with missing copyright headers.

Apply this diff to emit the content verbatim:

-    first_line=$(head -2 "$path" | grep -iv 'coding=' | head -1)
-    echo -e "$path\t$first_line"
+    first_line=$(head -2 "$path" | grep -iv 'coding=' | head -1)
+    printf '%s\t%s\n' "$path" "$first_line"
tests/functional/L1_Functional_Tests_GPU.sh (1)

34-40: Quote variable expansions to prevent word splitting.

Lines 36-38 contain unquoted command substitutions and variable references that can cause word splitting if paths contain spaces.

Apply this diff to fix the quoting issues:

 # Research functional tests (self-discovery)
 for test_script in research/*/tests/functional/*.sh; do
-    project_dir=$(dirname $(dirname $test_script))
-    pushd $project_dir
-    time uv run --no-sync bash $test_script
+    project_dir=$(dirname "$(dirname "$test_script")")
+    pushd "$project_dir"
+    time uv run --no-sync bash "$test_script"
     popd
 done
research/template_project/tests/unit/test_data_utils.py (2)

1-13: Remove copyright header from test files.

Per coding guidelines: test files should not include the copyright header. This is a test-only script, so the header should be removed.

Apply this diff:

-# Copyright (c) 2025, NVIDIA CORPORATION.  All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
 """Unit tests for template_project.data_utils."""

Based on coding guidelines.


57-66: Fix tautological assertion in padding test.

Line 64 compares batch["input_ids"].shape[1] to itself, which always passes. The comment states "All sequences should have the same length (padded to max)", but the assertion doesn't verify this.

Apply this diff to properly verify uniform padding:

 def test_create_batch_from_padding(tokenizer):
     """Test that create_batch_from correctly pads sequences."""
     sentences = ["short", "this is a much longer sentence"]
 
     batch = create_batch_from(tokenizer, sentences)
 
     # All sequences should have the same length (padded to max)
-    assert batch["input_ids"].shape[1] == batch["input_ids"].shape[1]
+    assert batch["input_ids"].shape[0] == 2  # Two sequences
+    # Both sequences should have the same padded length
+    max_len = batch["input_ids"].shape[1]
+    assert max_len > 0
     # Input lengths should reflect the actual (unpadded) lengths
     assert batch["input_lengths"][0] < batch["input_lengths"][1]
research/template_project/tests/test_suites/llm/single_update_1n8g.sh (2)

24-24: Handle cd failure and ensure PROJECT_ROOT is set (duplicate of prior comment).

Make the directory change robust; provide a fallback if PROJECT_ROOT isn’t defined by common.env.

-# Run the experiment
-cd $PROJECT_ROOT
+# Run the experiment
+# Default PROJECT_ROOT to the template project if not provided by common.env
+: "${PROJECT_ROOT:=${NEMO_RL_ROOT}/research/template_project}"
+cd "$PROJECT_ROOT" || { echo "ERROR: cd to PROJECT_ROOT failed: $PROJECT_ROOT"; exit 1; }

29-46: Quote forwarded args and file paths; avoid re-splitting (duplicate for "$@").

Prevents argument mangling and path splitting (SC2068); also quote logs/paths.

-NRL_FORCE_REBUILD_VENVS=true \
-SINGLE_UPDATE_ITERS=$MAX_STEPS \
-uv run single_update.py \
+NRL_FORCE_REBUILD_VENVS=true \
+SINGLE_UPDATE_ITERS="$MAX_STEPS" \
+uv run single_update.py \
     --config configs/grpo_math_1B.yaml \
-    cluster.gpus_per_node=8 \
-    cluster.num_nodes=$NUM_NODES \
-    $@ \
-    2>&1 | tee $RUN_LOG
+    cluster.gpus_per_node=8 \
+    cluster.num_nodes="$NUM_NODES" \
+    "$@" \
+    2>&1 | tee "$RUN_LOG"
@@
-if grep -q "All done." "$RUN_LOG"; then
-  echo '{"succeed": "yes"}' > $JSON_METRICS
+if grep -q "All done." "$RUN_LOG"; then
+  echo '{"succeed": "yes"}' > "$JSON_METRICS"
 else
-  echo '{"succeed": "no"}' > $JSON_METRICS
+  echo '{"succeed": "no"}' > "$JSON_METRICS"
 fi
@@
-uv run $NEMO_RL_ROOT/tests/check_metrics.py $JSON_METRICS \
+uv run "$NEMO_RL_ROOT/tests/check_metrics.py" "$JSON_METRICS" \
   'data["succeed"] == "yes"'
research/template_project/tests/functional/single_update.sh (1)

20-28: Quote forwarded args and file paths (duplicate for "$@").

Prevents re-splitting and path issues (SC2068).

-SINGLE_UPDATE_ITERS=1 uv run coverage run -a --data-file=$PROJECT_ROOT/tests/.coverage --source=$PROJECT_ROOT/nemo_rl \
+SINGLE_UPDATE_ITERS=1 uv run coverage run -a --data-file="$PROJECT_ROOT/tests/.coverage" --source="$PROJECT_ROOT/nemo_rl" \
     single_update.py \
-    --config $PROJECT_ROOT/configs/grpo_math_1B.yaml \
+    --config "$PROJECT_ROOT/configs/grpo_math_1B.yaml" \
     cluster.gpus_per_node=1 \
     cluster.num_nodes=1 \
     policy.train_global_batch_size=1 \
     policy.train_micro_batch_size=1 \
-    $@ \
-    2>&1 | tee $RUN_LOG
+    "$@" \
+    2>&1 | tee "$RUN_LOG"
🧹 Nitpick comments (8)
research/template_project/template_project/data_utils.py (1)

36-38: Clarify token_mask comment.

The comment states "use the attention mask as token_mask", but the code creates an all-ones mask instead. Consider clarifying that this creates a full mask (all tokens participate in loss) for simple NLL training, rather than using the attention_mask directly.

Apply this diff to improve clarity:

-    # For simple NLL training, use the attention mask as token_mask
+    # For simple NLL training, create a full token_mask (all tokens participate in loss)
     # (loss will be applied to positions 1..len-1 via NLLLoss)
     token_mask = torch.ones_like(input_ids)
research/template_project/tests/test_suites/llm/single_update_1n8g.sh (2)

1-1: Harden the script with strict mode.

Enable safer defaults to fail fast on errors and undefined vars.

As per coding guidelines

 #!/bin/bash
+set -euo pipefail

3-6: Quote paths and sourced file.

Prevents word-splitting when paths contain spaces.

As per coding guidelines

-NEMO_RL_ROOT=$(realpath $SCRIPT_DIR/../../../../..)
+NEMO_RL_ROOT=$(realpath "$SCRIPT_DIR/../../../../..")

-# Source common.env from local test suite
-source $SCRIPT_DIR/common.env
+# Source common.env from local test suite
+source "$SCRIPT_DIR/common.env"
research/template_project/tests/functional/single_update.sh (2)

3-5: Quote path computations.

As per coding guidelines

-SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd)
-PROJECT_ROOT=$(realpath $SCRIPT_DIR/../..)
+SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd)
+PROJECT_ROOT=$(realpath "$SCRIPT_DIR/../..")

12-17: Quote dirs and handle cd failure.

As per coding guidelines

-export PYTHONPATH=${PROJECT_ROOT}:${PYTHONPATH:-}
+export PYTHONPATH="${PROJECT_ROOT}:${PYTHONPATH:-}"

-rm -rf $EXP_DIR $LOG_DIR
-mkdir -p $EXP_DIR $LOG_DIR
+rm -rf "$EXP_DIR" "$LOG_DIR"
+mkdir -p "$EXP_DIR" "$LOG_DIR"

-cd $PROJECT_ROOT
+cd "$PROJECT_ROOT" || { echo "ERROR: cd to PROJECT_ROOT failed: $PROJECT_ROOT"; exit 1; }
research/template_project/single_update.py (3)

49-55: Add a docstring to main().

Document inputs/behavior for the public entry point.

As per coding guidelines

-def main(config: MasterConfig) -> None:
+def main(config: MasterConfig) -> None:
+    """Run a minimal GRPO single-update demo.
+
+    Args:
+        config: Fully-resolved MasterConfig dict (OmegaConf->container).
+    """

166-176: Add a docstring to parse_args().

Clarify return values.

As per coding guidelines

 def parse_args() -> tuple[argparse.Namespace, list[str]]:
-    """Parse command line arguments."""
+    """Parse CLI args and collect Hydra-like overrides.
+
+    Returns:
+        Tuple of (known args namespace, list of override strings).
+    """

46-46: Remove unused resolver registration (optional).

The "mul" resolver isn’t used; consider dropping to reduce globals side-effects.

-OmegaConf.register_new_resolver("mul", lambda a, b: a * b)
+# OmegaConf custom resolvers can be registered here if needed.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f29fa2a and e7def21.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (19)
  • .github/CODEOWNERS (1 hunks)
  • nemo_rl/distributed/virtual_cluster.py (1 hunks)
  • nemo_rl/utils/venvs.py (1 hunks)
  • pyproject.toml (1 hunks)
  • research/README.md (1 hunks)
  • research/template_project/.python-version (1 hunks)
  • research/template_project/README.md (1 hunks)
  • research/template_project/configs/grpo_math_1B.yaml (1 hunks)
  • research/template_project/configs/recipes/llm/single_update_1n8g.yaml (1 hunks)
  • research/template_project/pyproject.toml (1 hunks)
  • research/template_project/single_update.py (1 hunks)
  • research/template_project/template_project/data_utils.py (1 hunks)
  • research/template_project/tests/functional/single_update.sh (1 hunks)
  • research/template_project/tests/test_suites/llm/common.env (1 hunks)
  • research/template_project/tests/test_suites/llm/single_update_1n8g.sh (1 hunks)
  • research/template_project/tests/unit/test_data_utils.py (1 hunks)
  • tests/functional/L1_Functional_Tests_GPU.sh (1 hunks)
  • tests/unit/L0_Unit_Tests_Other.sh (1 hunks)
  • tools/copyright.sh (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • research/template_project/tests/unit/test_data_utils.py
  • nemo_rl/distributed/virtual_cluster.py
  • research/template_project/template_project/data_utils.py
  • nemo_rl/utils/venvs.py
  • research/template_project/single_update.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/distributed/virtual_cluster.py
  • nemo_rl/utils/venvs.py
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Use uv run to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling python directly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts

Files:

  • research/template_project/tests/test_suites/llm/single_update_1n8g.sh
  • research/template_project/tests/functional/single_update.sh
  • tests/unit/L0_Unit_Tests_Other.sh
  • tests/functional/L1_Functional_Tests_GPU.sh
  • tools/copyright.sh
🧠 Learnings (3)
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to **/*.py : Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Applied to files:

  • research/template_project/tests/unit/test_data_utils.py
  • tools/copyright.sh
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : LLM recipe YAML filenames must follow: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml

Applied to files:

  • research/template_project/configs/recipes/llm/single_update_1n8g.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to **/*.sh : Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts

Applied to files:

  • tools/copyright.sh
🧬 Code graph analysis (4)
research/template_project/tests/unit/test_data_utils.py (1)
research/template_project/template_project/data_utils.py (1)
  • create_batch_from (21-47)
research/template_project/tests/test_suites/llm/single_update_1n8g.sh (1)
research/template_project/tests/test_suites/llm/common.env (1)
  • exit_if_max_steps_reached (15-23)
research/template_project/template_project/data_utils.py (2)
nemo_rl/distributed/batched_data_dict.py (1)
  • BatchedDataDict (75-860)
research/template_project/tests/unit/test_data_utils.py (1)
  • tokenizer (23-27)
research/template_project/single_update.py (7)
research/template_project/template_project/data_utils.py (1)
  • create_batch_from (21-47)
nemo_rl/algorithms/grpo.py (1)
  • refit_policy_generation (498-566)
nemo_rl/algorithms/loss_functions.py (1)
  • NLLLoss (362-438)
nemo_rl/algorithms/utils.py (1)
  • get_tokenizer (157-288)
nemo_rl/distributed/virtual_cluster.py (3)
  • RayVirtualCluster (177-435)
  • init_ray (84-170)
  • shutdown (407-426)
nemo_rl/models/generation/__init__.py (1)
  • configure_generation_config (24-45)
nemo_rl/models/generation/vllm/vllm_generation.py (1)
  • VllmGeneration (47-838)
🪛 markdownlint-cli2 (0.18.1)
research/template_project/README.md

110-110: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.13.3)
nemo_rl/utils/venvs.py

98-98: subprocess call: check for execution of untrusted input

(S603)


98-98: Starting a process with a partial executable path

(S607)

🪛 Shellcheck (0.11.0)
research/template_project/tests/test_suites/llm/single_update_1n8g.sh

[warning] 12-12: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 13-13: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 24-24: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)


[error] 33-33: Double quote array expansions to avoid re-splitting elements.

(SC2068)

research/template_project/tests/functional/single_update.sh

[error] 27-27: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/unit/L0_Unit_Tests_Other.sh

[warning] 41-41: Quote this to prevent word splitting.

(SC2046)

tests/functional/L1_Functional_Tests_GPU.sh

[warning] 36-36: Quote this to prevent word splitting.

(SC2046)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (9)
research/template_project/.python-version (1)

1-1: LGTM!

The Python version specification is correct and follows the standard format for .python-version files.

.github/CODEOWNERS (1)

63-63: LGTM!

The code ownership entry is correctly formatted and appropriately assigns ownership of the template project.

research/template_project/pyproject.toml (1)

1-31: LGTM!

The project configuration is well-structured with appropriate build system, dependencies, and test setup. The workspace configuration correctly references nemo-rl as a workspace dependency.

research/template_project/README.md (1)

1-126: LGTM!

The README is comprehensive, well-organized, and provides clear guidance for using the template project. The documentation covers all essential aspects including setup, testing, dependencies, and citation information.

Note: The markdownlint warning about the fenced code block at line 110 is acceptable in this context, as it's displaying a contact information template rather than executable code.

pyproject.toml (1)

176-181: LGTM!

The workspace member addition is correctly configured, and the accompanying comments provide valuable context about the rationale for including research projects in the workspace to share the global uv.lock. This helps prevent dependency conflicts, particularly with numpy versioning.

tools/copyright.sh (2)

22-22: LGTM!

Adding the research/ directory to the copyright check scope is appropriate and ensures new research projects comply with copyright requirements.


23-26: Good defensive programming.

The empty file check prevents attempting to extract copyright headers from empty files, which is a sensible safeguard.

research/README.md (1)

1-50: LGTM!

The research directory README provides comprehensive guidance for contributors, with clear expectations for acceptance criteria, code reviews, ownership, and testing. The documentation strikes a good balance between encouraging contributions and maintaining quality standards.

research/template_project/tests/test_suites/llm/single_update_1n8g.sh (1)

12-13: Verify unused variables NUM_RUNS and NUM_MINUTES.

They appear unused (SC2034). Remove them or export if used by the harness.

As per coding guidelines

@terrykong terrykong added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Oct 15, 2025
@terrykong terrykong removed the r0.4.0 label Oct 28, 2025
@terrykong terrykong added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 18, 2025
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 18, 2025
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 19, 2025
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 22, 2025
@terrykong terrykong merged commit 4a59436 into main Nov 23, 2025
41 of 42 checks passed
@terrykong terrykong deleted the tk/research branch November 23, 2025 23:08
DeL-TaiseiOzaki pushed a commit to DeL-TaiseiOzaki/RL that referenced this pull request Jan 8, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terrycurtiskong@gmail.com>
Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terrycurtiskong@gmail.com>
Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terrycurtiskong@gmail.com>
Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
seonjinn pushed a commit that referenced this pull request Mar 8, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terrycurtiskong@gmail.com>
Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
seonjinn pushed a commit that referenced this pull request Mar 9, 2026
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terrycurtiskong@gmail.com>
Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:L1 Run doctests, unit tests, and functional tests CI Relating to CI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants