Conversation
@terrykong are there general guidelines/requirements we can provide? Are we expecting the same code review, test, and docs requirements for research contributions?
@euronymous-aithal let me add some requirements and some expectations |
**euronymous-aithal** left a comment:
Does it make sense to move the "Expectations for Research Project Authors" section under `research/`? I would imagine `research/` should have a README with all the info/guidelines. The template-project should be an example with those best practices; authors can then just clone the template-project.

Looks good otherwise @terrykong
**odelalleau** left a comment:
lgtm, just a few comments
Force-pushed `260c4be` to `baa7cf9`.
📝 Walkthrough

Adds a research template project with configs, code, tests, and docs; wires it into the workspace and ownership; updates uv commands to pin the project directory; extends CI scripts to discover and run research tests; and updates tooling to include research files in copyright checks.
Sequence Diagram

```mermaid
sequenceDiagram
  autonumber
  actor User
  participant CLI as CLI (single_update.py)
  participant CFG as Config Loader
  participant VC as VirtualCluster (Ray)
  participant GEN as Generation (vLLM)
  participant POL as Policy (LM)
  User->>CLI: Launch with config + overrides
  CLI->>CFG: Load MasterConfig
  CLI->>VC: Init cluster (gpus_per_node, nodes)
  CLI->>GEN: Init VLLM generation (cluster)
  CLI->>POL: Init policy (+ reference)
  note over GEN,POL: Prepare/Refit before training
  loop For N tiny iterations
    CLI->>GEN: Refit weights from POL
    CLI->>GEN: Generate outputs (prompts)
    GEN-->>CLI: Token ids / texts
    CLI->>POL: Prepare for training
    CLI->>POL: Train on tiny batch (NLL)
  end
  CLI->>GEN: Shutdown
  CLI->>POL: Shutdown
  CLI->>VC: Shutdown
  CLI-->>User: Done
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Pre-merge checks: ❌ 2 warnings, ✅ 2 passed
📝 Walkthrough

Adds a research template project with code, configs, tests, and docs; wires it into the workspace and CODEOWNERS; updates uv invocation to run in the repo root; includes new functional/unit test discovery for research projects; and extends copyright tooling.
Sequence Diagram

```mermaid
sequenceDiagram
  autonumber
  participant U as User/CI
  participant S as single_update.py
  participant C as Ray Cluster
  participant G as vLLM Generator
  participant P as LM Policy
  participant D as Data Utils
  U->>S: Launch with config (YAML + overrides)
  S->>C: Initialize cluster (Ray)
  S->>P: Initialize policy (weights, tokenizer)
  S->>G: Initialize generation backend (vLLM)
  rect rgb(245,248,255)
    note right of S: Iterative single-update cycle
    loop For i in 1..N
      S->>G: Refit with latest policy weights
      G-->>S: Ready
      S->>G: Generate outputs for prompts
      G-->>S: Completions
      S->>D: create_batch_from(sentences)
      D-->>S: BatchedDataDict
      S->>P: Train step (tiny NLL batch)
      P-->>S: Loss/metrics
    end
  end
  S-->>G: Shutdown
  S-->>C: Shutdown
  S-->>U: All done.
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~30 minutes

Pre-merge checks: ❌ 1 warning, ✅ 3 passed
Actionable comments posted: 8
🧹 Nitpick comments (3)

**research/template_project/README.md (1)**

`110-114`: **Add language specifier to code block.** The contact information block should specify a language identifier for proper rendering, even though it contains placeholder text.

Apply this diff:

````diff
-```
+```text
 Author: AUTHOR NAMES HERE
 Email: AUTHOR EMAILS HERE
 Organization: ORGANIZATION HERE (optional)
````

**research/template_project/tests/test_suites/llm/single_update_1n8g.sh (1)**
`12-13`: **Remove or document unused variables.** `NUM_RUNS` and `NUM_MINUTES` are defined but never used in the script. If they're intended for external tooling (e.g., SLURM job configuration via `tools/launch`), add a comment explaining their purpose. Otherwise, remove them to reduce confusion.

If these are for external tooling, add a comment:

```diff
 NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))  # Round up
-NUM_MINUTES=30
+NUM_MINUTES=30  # Used by tools/launch for SLURM time allocation
```

Or remove them if truly unused:

```diff
-NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))  # Round up
-NUM_MINUTES=30
```

**research/template_project/tests/unit/test_data_utils.py (1)**
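The `NUM_RUNS` arithmetic flagged above is the standard integer ceiling-division idiom; a standalone sketch with made-up values shows how it rounds up:

```shell
#!/usr/bin/env bash
# Illustrative values only; the real recipe defines these elsewhere.
MAX_STEPS=10
STEPS_PER_RUN=3

# (a + b - 1) / b performs ceiling division with integer arithmetic.
NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))
echo "$NUM_RUNS"   # 10 steps at 3 per run -> 4 runs
```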
`101-113`: **Consider simplifying the test logic.** The test recreates the tokenizer and re-tokenizes the sentences to verify `input_lengths`, which duplicates the implementation logic. While this works, consider using manually verified examples for more independent validation.

Example alternative approach:

```python
def test_create_batch_from_input_lengths(tokenizer):
    """Test that input_lengths correctly represent non-padded token counts."""
    # Use sentences with known token counts for more explicit verification
    sentences = ["a b c", "x y"]
    batch = create_batch_from(tokenizer, sentences)

    # Verify the first sequence has more tokens than the second
    assert batch["input_lengths"][0] > batch["input_lengths"][1]

    # Optionally verify against known tokenization if deterministic
    # e.g., manually check that "a b c" tokenizes to N tokens with this tokenizer
```
📜 Review details

Configuration used: Path: `.coderabbit.yaml`
Review profile: CHILL
Plan: Pro

⛔ Files ignored due to path filters (1)

- `uv.lock` is excluded by `!**/*.lock`

📒 Files selected for processing (19)

- `.github/CODEOWNERS` (1 hunks)
- `nemo_rl/distributed/virtual_cluster.py` (1 hunks)
- `nemo_rl/utils/venvs.py` (1 hunks)
- `pyproject.toml` (1 hunks)
- `research/README.md` (1 hunks)
- `research/template_project/.python-version` (1 hunks)
- `research/template_project/README.md` (1 hunks)
- `research/template_project/configs/grpo_math_1B.yaml` (1 hunks)
- `research/template_project/configs/recipes/llm/single_update_1n8g.yaml` (1 hunks)
- `research/template_project/pyproject.toml` (1 hunks)
- `research/template_project/single_update.py` (1 hunks)
- `research/template_project/template_project/data_utils.py` (1 hunks)
- `research/template_project/tests/functional/single_update.sh` (1 hunks)
- `research/template_project/tests/test_suites/llm/common.env` (1 hunks)
- `research/template_project/tests/test_suites/llm/single_update_1n8g.sh` (1 hunks)
- `research/template_project/tests/unit/test_data_utils.py` (1 hunks)
- `tests/functional/L1_Functional_Tests_GPU.sh` (1 hunks)
- `tests/unit/L0_Unit_Tests_Other.sh` (1 hunks)
- `tools/copyright.sh` (1 hunks)
🧰 Additional context used

📓 Path-based instructions (3)

`**/*.sh` (📄 CodeRabbit inference engine, CODING_GUIDELINES.md)

- Follow the Google Shell Style Guide for all shell scripts
- Use `uv run` to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling `python` directly
- Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts

Files:

- `tools/copyright.sh`
- `research/template_project/tests/test_suites/llm/single_update_1n8g.sh`
- `research/template_project/tests/functional/single_update.sh`
- `tests/functional/L1_Functional_Tests_GPU.sh`
- `tests/unit/L0_Unit_Tests_Other.sh`
`**/*.py` (📄 CodeRabbit inference engine, CODING_GUIDELINES.md)

- Follow the Google Python Style Guide for all Python code
- Target Python 3.12+ for all Python code in NeMo-RL
- Indent Python code with 4 spaces; do not use tabs
- Python filenames should be snake_case (e.g., some_file.py)
- Class names should be PascalCase
- Function and method names should be snake_case
- Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
- Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
- Constants should be UPPER_SNAKE_CASE
- Avoid shadowing variables declared in an outer scope
- Initialize all externally visible members of a class in the constructor
- For public interfaces used outside a file, prefer docstrings over comments
- Use comments mainly for code within a function or interfaces local to a file
- Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
- Use Google-style docstrings for classes and functions (Sphinx-parseable)
- Avoid using reflection when functionality can be easily achieved without it
- Limit except clauses to the smallest specific set of exceptions possible
- For duck-typing via try/except, keep the try body minimal and use else for main logic
- Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

- `nemo_rl/utils/venvs.py`
- `research/template_project/single_update.py`
- `nemo_rl/distributed/virtual_cluster.py`
- `research/template_project/tests/unit/test_data_utils.py`
- `research/template_project/template_project/data_utils.py`
`nemo_rl/**/*.py` (📄 CodeRabbit inference engine, CODING_GUIDELINES.md)

- Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
- Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
- Express configuration optionality via TypedDict using typing.NotRequired
- When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default in code
- For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

- `nemo_rl/utils/venvs.py`
- `nemo_rl/distributed/virtual_cluster.py`
🧬 Code graph analysis (4)

**research/template_project/tests/test_suites/llm/single_update_1n8g.sh (2)**

- research/template_project/tests/test_suites/llm/common.env (1): `exit_if_max_steps_reached` (15-23)
- tests/test_suites/llm/common.env (1): `exit_if_max_steps_reached` (12-20)

**research/template_project/single_update.py (5)**

- research/template_project/template_project/data_utils.py (1): `create_batch_from` (21-47)
- nemo_rl/algorithms/grpo.py (1): `refit_policy_generation` (498-566)
- nemo_rl/distributed/virtual_cluster.py (1): `init_ray` (84-170)
- nemo_rl/models/generation/__init__.py (1): `configure_generation_config` (24-45)
- nemo_rl/models/policy/lm_policy.py (1): `Policy` (59-806)

**research/template_project/tests/unit/test_data_utils.py (1)**

- research/template_project/template_project/data_utils.py (1): `create_batch_from` (21-47)

**research/template_project/template_project/data_utils.py (2)**

- nemo_rl/distributed/batched_data_dict.py (1): `BatchedDataDict` (75-860)
- research/template_project/tests/unit/test_data_utils.py (1): `tokenizer` (23-27)
🪛 markdownlint-cli2 (0.18.1)
research/template_project/README.md
110-110: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.13.3)
nemo_rl/utils/venvs.py
98-98: subprocess call: check for execution of untrusted input
(S603)
98-98: Starting a process with a partial executable path
(S607)
🪛 Shellcheck (0.11.0)
research/template_project/tests/test_suites/llm/single_update_1n8g.sh
[warning] 12-12: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 13-13: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 24-24: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 33-33: Double quote array expansions to avoid re-splitting elements.
(SC2068)
research/template_project/tests/functional/single_update.sh
[error] 27-27: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/functional/L1_Functional_Tests_GPU.sh
[warning] 36-36: Quote this to prevent word splitting.
(SC2046)
tests/unit/L0_Unit_Tests_Other.sh
[warning] 41-41: Quote this to prevent word splitting.
(SC2046)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (21)
**research/template_project/.python-version (1)**

`1-1`: **LGTM!** The Python version specification aligns with the project's `requires-python = ">=3.12"` requirement in pyproject.toml.

**research/template_project/configs/recipes/llm/single_update_1n8g.yaml (1)**

`1-6`: **LGTM!** The configuration correctly references a parent config and overrides the GPU allocation. The structure follows standard Hydra conventions.

**.github/CODEOWNERS (1)**

`62-63`: **LGTM!** The CODEOWNERS entry correctly assigns ownership of the template project to @terrykong, following the established pattern in the file.
**nemo_rl/utils/venvs.py (1)**

`98-98`: **LGTM!** The addition of `--directory` ensures UV operates from the repository root, which is necessary for the workspace configuration to function correctly with research projects sharing the global uv.lock.

The static analysis warnings (S603, S607) are false positives: `git_root` is computed from the module's file path, not user input, and `uv` is a well-known build tool.

**pyproject.toml (1)**

`171-176`: **LGTM!** The workspace configuration correctly includes the research template project, and the detailed inline comments clearly explain the rationale for sharing the global uv.lock (avoiding numpy version mismatches that cause pickle issues with Ray).
**research/template_project/pyproject.toml (1)**

`1-31`: **LGTM!** The template project configuration is well-structured:

- Correctly declares nemo-rl as a workspace dependency
- Includes appropriate test tooling
- Follows the pattern established in the main project
**research/README.md (1)**

`1-50`: **Comprehensive documentation for research projects.** This documentation provides clear guidance for research project authors, covering acceptance criteria, testing expectations, and code ownership. The structure is well-organized and addresses key concerns about maintainability and reproducibility.
**research/template_project/tests/test_suites/llm/common.env (1)**

`1-40`: **LGTM!** The environment setup script is well-structured with clear separation of concerns. The `exit_if_max_steps_reached` function provides efficient early stopping for incremental test runs, and the path initialization properly handles the project root context.

**research/template_project/template_project/data_utils.py (1)**
`36-38`: **Verify token_mask initialization aligns with the comment.** The comment states "use the attention mask as token_mask," but the implementation creates `token_mask` as ones for all positions. If the intent is to apply loss only to non-padded tokens, consider using the attention mask directly. If the current implementation is correct (e.g., because NLLLoss handles masking separately), please update the comment to clarify the behavior.

If token_mask should reflect padding, apply this diff:

```diff
-    # For simple NLL training, use the attention mask as token_mask
+    # For simple NLL training, token_mask marks all positions
     # (loss will be applied to positions 1..len-1 via NLLLoss)
     token_mask = torch.ones_like(input_ids)
```

Or if the attention mask should be used:

```diff
-    # For simple NLL training, use the attention mask as token_mask
-    # (loss will be applied to positions 1..len-1 via NLLLoss)
-    token_mask = torch.ones_like(input_ids)
+    # For simple NLL training, use the attention mask as token_mask
+    # to exclude padding tokens from loss computation
+    token_mask = attention_mask.to(torch.int64)
```
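The practical difference between the two diffs above can be seen without torch; a pure-Python sketch (pad id and sequences are made up) of how many positions each choice leaves eligible for loss:

```python
# Hypothetical right-padded mini-batch (0 is the pad id).
input_ids = [[5, 6, 7], [8, 9, 0]]
attention_mask = [[1, 1, 1], [1, 1, 0]]  # 1 = real token, 0 = padding

# Current behavior: all-ones token_mask, so pad positions count toward loss.
token_mask_ones = [[1] * len(row) for row in input_ids]

# Suggested alternative: reuse the attention mask, excluding pads from loss.
token_mask_attn = [row[:] for row in attention_mask]

positions_all = sum(v for row in token_mask_ones for v in row)
positions_non_pad = sum(v for row in token_mask_attn for v in row)
print(positions_all, positions_non_pad)  # 6 vs 5 maskable positions
```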
**research/template_project/configs/grpo_math_1B.yaml (1)**

`1-254`: **LGTM!** The configuration file is comprehensive and well-documented, with clear sections for all major components (GRPO algorithm, loss function, checkpointing, policy, data, environment, logging, and cluster). The inline comments provide helpful context for key parameters like async GRPO and importance sampling corrections.
**nemo_rl/distributed/virtual_cluster.py (1)**

`46-58`: **LGTM!** The addition of `--directory {git_root}` to all uv executable definitions ensures consistent execution context for workspace projects. This change properly supports the research template project structure introduced in this PR.

**research/template_project/tests/unit/test_data_utils.py (5)**
`22-27`: **LGTM!** The tokenizer fixture is well-structured. Setting `pad_token = eos_token` is the standard approach for GPT-2, which doesn't have a dedicated padding token.

`30-43`: **LGTM!** The test correctly verifies the presence of all required keys and their shapes for a single-sentence batch.

`45-54`: **LGTM!** The test properly validates batch dimensions and shape consistency for multiple sentences.

`69-78`: **LGTM!** The dtype validations correctly match the implementation in `data_utils.py`.

`81-98`: **LGTM!** Both mask tests correctly verify that the masks are initialized to all ones with proper shapes.
**research/template_project/single_update.py (5)**

`1-46`: **LGTM!** The copyright header, docstring, and imports are all properly structured. The OmegaConf "mul" resolver registration is a good practice for supporting config-based arithmetic.
`49-94`: **LGTM!** The initialization sequence is well-structured:

- VllmGeneration is initialized before Policy to ensure a clean GPU environment
- The `finish_generation()` call pre-initializes workers to avoid later contention
- The `state_dict_info or {}` fallback safely handles None returns
`96-117`: **Verify train_sentences pattern matches expectations.** Lines 98-100 create `train_sentences` by repeating the 2-sentence pattern `train_global_batch_size` times. This means:

- If `train_global_batch_size=1`: 2 sentences
- If `train_global_batch_size=2`: 4 sentences

Ensure this aligns with your intended batch construction, as the multiplier doesn't directly correspond to the final batch size. The pattern is multiplied rather than used as the target count.
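The multiplier behavior called out here is plain Python list repetition; a tiny sketch with placeholder sentences:

```python
# Placeholder prompts standing in for the script's two training sentences.
base_sentences = ["The capital of France is", "The capital of Germany is"]

for train_global_batch_size in (1, 2, 4):
    # list * n repeats the whole pattern, so the final length is 2 * n.
    train_sentences = base_sentences * train_global_batch_size
    print(train_global_batch_size, len(train_sentences))
```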
`122-163`: **LGTM!** The training loop correctly implements the refit → generate → train cycle:

- Refit ensures the generation engine has the latest policy weights
- Generation runs with `greedy=True` for deterministic demonstration
- Training properly extracts and logs the loss
- All components are properly shut down at the end
`166-201`: **LGTM!** The argument parsing and main entry point are well-designed:

- `parse_known_args()` allows for flexible Hydra-style overrides
- Default config path is set appropriately
- Config resolution and pretty-printing aid debugging
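The `parse_known_args()` pattern praised here separates declared flags from Hydra-style overrides; a minimal sketch (flag names and override strings are illustrative, not the script's exact interface):

```python
import argparse

parser = argparse.ArgumentParser(description="CLI with pass-through overrides.")
parser.add_argument("--config", default="configs/grpo_math_1B.yaml")

# parse_known_args returns (namespace, leftovers) instead of erroring on
# unrecognized tokens; the leftovers can be handed to an override parser.
args, overrides = parser.parse_known_args(
    ["--config", "my.yaml", "cluster.num_nodes=2", "policy.train_global_batch_size=1"]
)
print(args.config)   # my.yaml
print(overrides)     # ['cluster.num_nodes=2', 'policy.train_global_batch_size=1']
```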
Force-pushed `d5c513e` to `3c3ab37`.
Actionable comments posted: 7
♻️ Duplicate comments (5)
**tools/copyright.sh (1)**

`22-28`: **Replace `echo -e` with `printf` to avoid mangling headers.** Interpreting backslash escapes breaks the captured first line (e.g., `\n`, `\t`, `\r`, `\c`) and can hide real violations. Keep the output literal with `printf`.

Apply this diff:

```diff
-    echo -e "$path\t$first_line"
+    printf '%s\t%s\n' "$path" "$first_line"
```

**tests/unit/L0_Unit_Tests_Other.sh (1)**
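The `tools/copyright.sh` point above is easy to reproduce: bash's `echo -e` rewrites backslash escapes that may legitimately occur in a captured first line, while `printf '%s'` emits the data untouched (sample strings are made up):

```shell
#!/usr/bin/env bash
path="some/file.sh"
first_line='# header with a literal \t and \n in it'

# echo -e interprets the \t and \n inside the data, corrupting it.
mangled=$(echo -e "$path\t$first_line")

# printf only interprets escapes in the format string; %s stays literal.
literal=$(printf '%s\t%s\n' "$path" "$first_line")

printf 'echo -e: %s\n' "$mangled"
printf 'printf : %s\n' "$literal"
```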
`39-45`: **Quote variables to prevent word splitting.** Lines 41-42 have unquoted command substitutions and variables that should be quoted to prevent word splitting if paths contain spaces. This issue was already identified in a previous review.

Apply this diff to fix the quoting issues:

```diff
-    project_dir=$(dirname $(dirname $i))
-    pushd $project_dir
+    project_dir=$(dirname "$(dirname "$i")")
+    pushd "$project_dir"
```

**tests/functional/L1_Functional_Tests_GPU.sh (1)**
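The word-splitting hazard behind these quoting fixes is concrete: an unquoted expansion breaks a path with a space into two words. A self-contained sketch (the path is hypothetical):

```shell
#!/usr/bin/env bash
i="research/my project/tests/functional/run.sh"

# Quoted substitutions keep the path as one word despite the space.
project_dir=$(dirname "$(dirname "$i")")
echo "$project_dir"   # research/my project/tests

# Unquoted expansion word-splits on the space; this is what ShellCheck flags.
words=0
for w in $i; do words=$((words + 1)); done
echo "$words"   # 2 words instead of 1
```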
`34-40`: **Quote all variable expansions and command substitutions to prevent word splitting.** Lines 36-38 have multiple unquoted variables and command substitutions that could cause issues with paths containing spaces or special characters.

Apply this diff to add proper quoting:

```diff
 # Research functional tests (self-discovery)
 for test_script in research/*/tests/functional/*.sh; do
-    project_dir=$(dirname $(dirname $test_script))
-    pushd $project_dir
-    time uv run --no-sync bash $test_script
+    project_dir=$(dirname "$(dirname "$test_script")")
+    pushd "$project_dir"
+    time uv run --no-sync bash "$test_script"
     popd
 done
```

**research/template_project/tests/functional/single_update.sh (1)**
`14-28`: **Quote variable expansions throughout the script to prevent word splitting.** Multiple lines use unquoted variables that could cause issues with paths containing spaces or special characters.

Apply this diff:

```diff
-rm -rf $EXP_DIR $LOG_DIR
-mkdir -p $EXP_DIR $LOG_DIR
+rm -rf "$EXP_DIR" "$LOG_DIR"
+mkdir -p "$EXP_DIR" "$LOG_DIR"

-cd $PROJECT_ROOT
+cd "$PROJECT_ROOT"

 # Run single_update.py for just 1 iteration
 SINGLE_UPDATE_ITERS=1 uv run coverage run -a --data-file=$PROJECT_ROOT/tests/.coverage --source=$PROJECT_ROOT/nemo_rl \
     single_update.py \
-    --config $PROJECT_ROOT/configs/grpo_math_1B.yaml \
+    --config "$PROJECT_ROOT/configs/grpo_math_1B.yaml" \
     cluster.gpus_per_node=1 \
     cluster.num_nodes=1 \
     policy.train_global_batch_size=1 \
     policy.train_micro_batch_size=1 \
-    $@ \
-    2>&1 | tee $RUN_LOG
+    "$@" \
+    2>&1 | tee "$RUN_LOG"
```

Note: Line 27 was already flagged in a previous review.

**research/template_project/tests/test_suites/llm/single_update_1n8g.sh (1)**
research/template_project/tests/test_suites/llm/single_update_1n8g.sh (1)
24-34: Add error handling for directory change and quote all variable expansions.Line 24 uses
cdwithout error handling, and lines 24, 30, 33-34 have unquoted variables that could cause word splitting issues.Apply this diff:
# Run the experiment -cd $PROJECT_ROOT +cd "$PROJECT_ROOT" || exit 1 # Set the number of iterations via environment variable NRL_FORCE_REBUILD_VENVS=true \ SINGLE_UPDATE_ITERS=$MAX_STEPS \ uv run single_update.py \ --config configs/grpo_math_1B.yaml \ cluster.gpus_per_node=8 \ - cluster.num_nodes=$NUM_NODES \ - $@ \ - 2>&1 | tee $RUN_LOG + cluster.num_nodes="$NUM_NODES" \ + "$@" \ + 2>&1 | tee "$RUN_LOG"Note: Lines 24 and 33 were already flagged in previous reviews.
🧹 Nitpick comments (10)
**research/template_project/README.md (1)**

`110-114`: **Add language identifier to fenced code block.** The author/citation information block should specify a language identifier for proper rendering. As per coding guidelines, this helps with syntax highlighting and documentation consistency.

Apply this diff:

````diff
-```
+```text
 Author: AUTHOR NAMES HERE
 Email: AUTHOR EMAILS HERE
 Organization: ORGANIZATION HERE (optional)
````

**research/README.md (1)**

`26-26`: **Clarify acceptance criteria language.** The phrase "results outlined in this README" is confusing since this is the general research directory README, not a project-specific README. Each project has its own README where results would be documented.

Consider rephrasing for clarity:

```diff
-The acceptance criteria for merging your research project into the main repository are reproduction steps for the results outlined in this README. We want to make sure others can reproduce your great work! Please include sufficient documentation in the README.md that enables users to follow and reproduce your results step-by-step.
+The acceptance criteria for merging your research project into the main repository are reproduction steps for the results outlined in your project's README. We want to make sure others can reproduce your great work! Please include sufficient documentation in your project's README.md that enables users to follow and reproduce your results step-by-step.
```

**research/template_project/tests/unit/test_data_utils.py (1)**
`108-110`: **Use the tokenizer fixture instead of creating a new tokenizer.** Lines 108-110 duplicate the tokenizer setup logic that's already available via the `tokenizer` fixture. This reduces maintainability.

Apply this diff to use the fixture:

```diff
     batch = create_batch_from(tokenizer, sentences)

-    # Compute expected lengths from attention mask
-    tok = AutoTokenizer.from_pretrained("gpt2")
-    tok.pad_token = tok.eos_token
-    enc = tok(sentences, add_special_tokens=False, return_tensors="pt", padding=True)
+    # Compute expected lengths from attention mask using the fixture
+    enc = tokenizer(sentences, add_special_tokens=False, return_tensors="pt", padding=True)
     expected_lengths = enc["attention_mask"].sum(dim=1).to(torch.int32)
     assert torch.all(batch["input_lengths"] == expected_lengths)
```

**research/template_project/tests/test_suites/llm/single_update_1n8g.sh (1)**
`12-13`: **Remove unused variables or document their purpose.** `NUM_RUNS` and `NUM_MINUTES` are defined but never used in the script. Based on the static analysis hint and code inspection, these appear to be leftover from a template or intended for documentation purposes.

If these variables serve as documentation for the typical structure of test suites, consider adding a comment explaining this. Otherwise, remove them to reduce confusion:

```diff
-NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))  # Round up
-NUM_MINUTES=30
```

**research/template_project/single_update.py (2)**
`49-164`: **Add error handling to ensure resource cleanup on failure.** The main function doesn't have try-except blocks around the training loop. If any step fails, the shutdown calls (lines 161-163) won't execute, potentially leaving Ray workers and GPU resources allocated.

For a more robust demonstration, consider adding error handling:

```diff
 def main(config: MasterConfig) -> None:
     # 0) Config
     policy_config = config["policy"]
     tokenizer = get_tokenizer(policy_config["tokenizer"])
     policy_config["generation"] = configure_generation_config(
         policy_config["generation"], tokenizer
     )

     # 1) Set up compute cluster (single GPU for demo)
     print("\n▶ Setting up compute cluster...")
     init_ray()
     cluster = RayVirtualCluster(
         name="single_update_cluster",
         bundle_ct_per_node_list=[config["cluster"]["gpus_per_node"]]
         * config["cluster"]["num_nodes"],
         use_gpus=True,
         num_gpus_per_node=config["cluster"]["gpus_per_node"],
         max_colocated_worker_groups=1
         if policy_config["generation"]["backend"] == "megatron"
         else 2,
     )

+    try:
         # 2) Initialize vLLM generation first for a clean GPU environment
         print("\n▶ Initializing vLLM generation...")
         # Initialize vLLM directly from config
         policy_config["generation"]["model_name"] = policy_config["model_name"]
         policy_generation = VllmGeneration(
             cluster=cluster, config=policy_config["generation"]
         )
         # Pre-initialize workers to avoid contention later
         policy_generation.finish_generation()
         print("  ✓ vLLM generation ready")

         # 3) Initialize policy (LM)
         print("\n▶ Initializing LM Policy...")
         policy = Policy(
             cluster=cluster,
             config=policy_config,
             tokenizer=tokenizer,
             init_reference_model=False,
         )
         print("  ✓ Policy created")

         # ... rest of the training loop ...

         print("\nAll done.")
+    finally:
         policy.shutdown()
         policy_generation.shutdown()
         cluster.shutdown()
```
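The shape of the suggested fix is an ordinary try/finally; a toy sketch (the classes are stand-ins for the real Policy, VllmGeneration, and cluster objects) showing that shutdown still runs when a training step raises:

```python
shutdown_log = []

class FakeResource:
    """Stand-in for Policy / VllmGeneration / RayVirtualCluster."""

    def __init__(self, name):
        self.name = name

    def shutdown(self):
        shutdown_log.append(self.name)

def run_training(fail=False):
    policy = FakeResource("policy")
    generation = FakeResource("generation")
    cluster = FakeResource("cluster")
    try:
        if fail:
            raise RuntimeError("training step failed")
    finally:
        # Cleanup runs on both the success and failure paths.
        policy.shutdown()
        generation.shutdown()
        cluster.shutdown()

try:
    run_training(fail=True)
except RuntimeError:
    pass
print(shutdown_log)  # ['policy', 'generation', 'cluster']
```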
`195-196`: **Clarify or relocate the "Applied CLI overrides" message.** Line 196 prints "Applied CLI overrides" after calling `to_container()`, but the overrides were actually applied at line 193. This placement could be confusing.

Consider moving the print statement or revising the message for clarity:

```diff
     if overrides:
         print(f"Overrides: {overrides}")
         config = parse_hydra_overrides(config, overrides)
+        print("Applied CLI overrides")

     config: MasterConfig = OmegaConf.to_container(config, resolve=True)
-    print("Applied CLI overrides")
+    print("Resolved configuration:")
     from rich.pretty import pprint

     pprint(config)
```

**research/template_project/tests/test_suites/llm/common.env (3)**
`15-23`: **Document the MAX_STEPS requirement and improve robustness.** The function references `MAX_STEPS`, which must be set by the calling script. Consider adding a comment documenting this requirement. Additionally, quote `$JSON_METRICS` to safely handle paths with spaces.

Apply this diff to improve the function:

```diff
 exit_if_max_steps_reached() {
-    # Early stopping to save compute if max step has been reached
+    # Early stopping to save compute if max step has been reached.
+    # Requires: MAX_STEPS environment variable must be set by caller.
-    STEPS_SO_FAR=$(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS || echo 0)
+    STEPS_SO_FAR=$(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' "$JSON_METRICS" || echo 0)
     if [[ $STEPS_SO_FAR -ge $MAX_STEPS ]]; then
         echo "[INFO] Target step $MAX_STEPS reached, skipping run"
         exit 0
     fi
     echo "[INFO] Steps so far: $STEPS_SO_FAR, running till $MAX_STEPS steps"
 }
```
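For readers unfamiliar with jq, the pipeline above amounts to: pick the `train/loss` entry, convert its step keys to numbers, take the max, and fall back to 0. The same logic in Python, against a sample metrics payload in the shape the query implies (values are made up):

```python
import json

# Sample metrics shaped as the jq query implies:
# metric name -> {step-number-as-string: value}.
metrics_json = json.dumps({
    "train/loss": {"1": 2.31, "2": 2.05, "3": 1.98},
    "train/lr": {"1": 1e-5},
})

def steps_so_far(raw, metric="train/loss"):
    steps = json.loads(raw).get(metric, {})
    # Mirrors jq's `keys | map(tonumber) | max` plus the `|| echo 0` fallback.
    return max((int(k) for k in steps), default=0)

print(steps_so_far(metrics_json))  # 3
```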
`32-32`: **Quote `PROJECT_ROOT` for safety.** To handle edge cases with spaces or special characters in the path, quote the variable.

Apply this diff:

```diff
-export PYTHONPATH=${PROJECT_ROOT}:${PYTHONPATH:-}
+export PYTHONPATH="${PROJECT_ROOT}:${PYTHONPATH:-}"
```
`39-39`: **Quote directory variables for safety.** To handle edge cases with spaces or special characters in paths, quote the variables.

Apply this diff:

```diff
-mkdir -p $EXP_DIR $LOG_DIR $CKPT_DIR
+mkdir -p "$EXP_DIR" "$LOG_DIR" "$CKPT_DIR"
```

**research/template_project/configs/grpo_math_1B.yaml (1)**
`28-30`: **Consider adding validation for async GRPO configuration.** The configuration correctly notes that importance sampling correction is required when async GRPO is enabled, but there is no safeguard preventing users from enabling async mode without enabling importance sampling. Consider adding a validation check in the code that consumes this config, or a more explicit comment emphasizing that this is a hard requirement.
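One way to realize the suggested safeguard is a fail-fast check after config loading; a sketch with hypothetical key names that would need adapting to the actual YAML schema:

```python
def validate_async_grpo(grpo_cfg: dict) -> None:
    """Reject async GRPO without importance-sampling correction.

    Key names here are illustrative, not the real schema.
    """
    async_enabled = grpo_cfg.get("async_grpo", {}).get("enabled", False)
    if async_enabled and not grpo_cfg.get("use_importance_sampling_correction", False):
        raise ValueError(
            "async GRPO requires use_importance_sampling_correction=true"
        )

validate_async_grpo({"async_grpo": {"enabled": False}})  # fine

try:
    validate_async_grpo({"async_grpo": {"enabled": True}})
except ValueError as err:
    print(err)
```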
📜 Review details

Configuration used: Path: `.coderabbit.yaml`
Review profile: CHILL
Plan: Pro
🧰 Additional context used
📓 Path-based instructions (3)
**/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Useuv runto execute Python scripts in shell/driver scripts instead of activating virtualenvs and callingpythondirectly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts
Files:
tests/functional/L1_Functional_Tests_GPU.shresearch/template_project/tests/test_suites/llm/single_update_1n8g.shtests/unit/L0_Unit_Tests_Other.shtools/copyright.shresearch/template_project/tests/functional/single_update.sh
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Files:
- nemo_rl/distributed/virtual_cluster.py
- research/template_project/template_project/data_utils.py
- research/template_project/single_update.py
- research/template_project/tests/unit/test_data_utils.py
- nemo_rl/utils/venvs.py
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)
Files:
- nemo_rl/distributed/virtual_cluster.py
- nemo_rl/utils/venvs.py
🧠 Learnings (3)
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : LLM recipe YAML filenames must follow: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml
Applied to files:
research/template_project/configs/recipes/llm/single_update_1n8g.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to **/*.py : Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Applied to files:
- research/template_project/tests/unit/test_data_utils.py
- tools/copyright.sh
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to **/*.sh : Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts
Applied to files:
tools/copyright.sh
🧬 Code graph analysis (4)
research/template_project/tests/test_suites/llm/single_update_1n8g.sh (1)
research/template_project/tests/test_suites/llm/common.env (1)
`exit_if_max_steps_reached` (15-23)
research/template_project/template_project/data_utils.py (2)
nemo_rl/distributed/batched_data_dict.py (1)
`BatchedDataDict` (75-860)
research/template_project/tests/unit/test_data_utils.py (1)
`tokenizer` (23-27)
research/template_project/single_update.py (10)
- research/template_project/template_project/data_utils.py (1): `create_batch_from` (21-47)
- nemo_rl/algorithms/grpo.py (1): `refit_policy_generation` (498-566)
- nemo_rl/algorithms/loss_functions.py (1): `NLLLoss` (362-438)
- nemo_rl/algorithms/utils.py (1): `get_tokenizer` (157-288)
- nemo_rl/distributed/batched_data_dict.py (1): `BatchedDataDict` (75-860)
- nemo_rl/distributed/virtual_cluster.py (3): `RayVirtualCluster` (177-435), `init_ray` (84-170), `shutdown` (407-426)
- nemo_rl/models/generation/__init__.py (1): `configure_generation_config` (24-45)
- nemo_rl/models/generation/vllm/vllm_generation.py (1): `VllmGeneration` (47-838)
- nemo_rl/models/policy/lm_policy.py (1): `Policy` (59-806)
- nemo_rl/utils/config.py (1): `parse_hydra_overrides` (146-166)
research/template_project/tests/unit/test_data_utils.py (1)
research/template_project/template_project/data_utils.py (1)
`create_batch_from` (21-47)
🪛 markdownlint-cli2 (0.18.1)
research/template_project/README.md
110-110: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.13.3)
nemo_rl/utils/venvs.py
98-98: subprocess call: check for execution of untrusted input
(S603)
98-98: Starting a process with a partial executable path
(S607)
🪛 Shellcheck (0.11.0)
tests/functional/L1_Functional_Tests_GPU.sh
[warning] 36-36: Quote this to prevent word splitting.
(SC2046)
research/template_project/tests/test_suites/llm/single_update_1n8g.sh
[warning] 12-12: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 13-13: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 24-24: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 33-33: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/unit/L0_Unit_Tests_Other.sh
[warning] 41-41: Quote this to prevent word splitting.
(SC2046)
research/template_project/tests/functional/single_update.sh
[error] 27-27: Double quote array expansions to avoid re-splitting elements.
(SC2068)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (6)
nemo_rl/utils/venvs.py (1)
98-98: LGTM! Directory-scoped uv sync aligns with workspace pattern. The addition of `--directory git_root` ensures the uv sync command operates from the repository root, which is consistent with the directory-scoped execution pattern introduced elsewhere in this PR (e.g., in virtual_cluster.py). The static analysis warnings about subprocess execution are false positives since `git_root` is computed from the module's file path, not from user input.

.github/CODEOWNERS (1)
62-63: LGTM! Ownership assignment is appropriate. The new CODEOWNERS entry correctly assigns ownership of the template project to @terrykong, aligning with the PR context.
pyproject.toml (1)
176-181: LGTM! Workspace integration with clear rationale. The addition of `research/template_project` to the workspace members enables shared dependency resolution and the global uv.lock. The detailed comment explaining the numpy version conflict is valuable context for future maintainers.

research/template_project/pyproject.toml (1)
1-31: LGTM! Well-structured project configuration. The pyproject.toml follows standard patterns:
- Uses hatchling for builds (consistent with project requirements)
- Correctly declares nemo-rl as a workspace dependency
- Includes appropriate test dependencies and pytest configuration
- Specifies Python 3.12+ requirement matching the .python-version file
The configuration integrates properly with the workspace setup.
nemo_rl/distributed/virtual_cluster.py (1)
46-58: LGTM! Directory-scoped uv commands ensure consistent execution context. The addition of `--directory {git_root}` to all `PY_EXECUTABLES` ensures that uv commands execute from the repository root, which is essential for:
- Resolving the correct pyproject.toml and uv.lock
- Ensuring workspace member visibility
- Maintaining consistency with the multi-project workspace setup
This pattern aligns with the similar change in `nemo_rl/utils/venvs.py` and supports the new research project structure.

research/template_project/.python-version (1)
1-1: Python versions match; root and template `.python-version` both specify 3.12.
Actionable comments posted: 2
♻️ Duplicate comments (8)
tests/unit/L0_Unit_Tests_Other.sh (1)
39-45: Quote command substitutions and variables to prevent word splitting. Lines 41-42 should quote the nested `dirname` command substitutions and the `project_dir` variable to prevent potential word splitting issues if paths contain spaces. Apply this diff:

```diff
-    project_dir=$(dirname $(dirname $i))
-    pushd $project_dir
+    project_dir=$(dirname "$(dirname "$i")")
+    pushd "$project_dir"
```

tools/copyright.sh (1)
27-28: Avoid `echo -e`; it interprets backslash escapes. Using `echo -e` on line 28 interprets backslash escapes in the captured first line (e.g., `\n`, `\t`, `\r`, `\c`), which can cause misidentification of files with missing copyright headers. Apply this diff to emit the content verbatim:

```diff
-first_line=$(head -2 "$path" | grep -iv 'coding=' | head -1)
-echo -e "$path\t$first_line"
+first_line=$(head -2 "$path" | grep -iv 'coding=' | head -1)
+printf '%s\t%s\n' "$path" "$first_line"
```

tests/functional/L1_Functional_Tests_GPU.sh (1)
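As a standalone illustration of the hazard above (a minimal sketch, not part of the suggested fix), the difference between the two commands is easy to reproduce in any Bash shell:

```shell
# A literal backslash-t in the data, as might appear in a file's first line.
line='path\twith\tescapes'
echo -e "$line"        # interprets \t into real tab characters, mangling the content
printf '%s\n' "$line"  # emits the string verbatim, backslashes intact
```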
34-40: Quote variable expansions to prevent word splitting. Lines 36-38 contain unquoted command substitutions and variable references that can cause word splitting if paths contain spaces. Apply this diff to fix the quoting issues:

```diff
 # Research functional tests (self-discovery)
 for test_script in research/*/tests/functional/*.sh; do
-    project_dir=$(dirname $(dirname $test_script))
-    pushd $project_dir
-    time uv run --no-sync bash $test_script
+    project_dir=$(dirname "$(dirname "$test_script")")
+    pushd "$project_dir"
+    time uv run --no-sync bash "$test_script"
     popd
 done
```

research/template_project/tests/unit/test_data_utils.py (2)
1-13: Remove copyright header from test files. Per coding guidelines, test files should not include the copyright header. This is a test-only script, so the header should be removed. Apply this diff:

```diff
-# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
 """Unit tests for template_project.data_utils."""
```

Based on coding guidelines.
57-66: Fix tautological assertion in padding test. Line 64 compares `batch["input_ids"].shape[1]` to itself, which always passes. The comment states "All sequences should have the same length (padded to max)", but the assertion doesn't verify this. Apply this diff to properly verify uniform padding:

```diff
 def test_create_batch_from_padding(tokenizer):
     """Test that create_batch_from correctly pads sequences."""
     sentences = ["short", "this is a much longer sentence"]
     batch = create_batch_from(tokenizer, sentences)
     # All sequences should have the same length (padded to max)
-    assert batch["input_ids"].shape[1] == batch["input_ids"].shape[1]
+    assert batch["input_ids"].shape[0] == 2  # Two sequences
+    # Both sequences should have the same padded length
+    max_len = batch["input_ids"].shape[1]
+    assert max_len > 0
     # Input lengths should reflect the actual (unpadded) lengths
     assert batch["input_lengths"][0] < batch["input_lengths"][1]
```

research/template_project/tests/test_suites/llm/single_update_1n8g.sh (2)
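The padding property being asserted can be illustrated in isolation with a hypothetical `pad_batch` helper (this is a sketch, not the project's actual `create_batch_from`):

```python
def pad_batch(seqs: list[list[int]], pad_id: int = 0) -> tuple[list[list[int]], list[int]]:
    """Right-pad token id sequences to a common length, keeping the true lengths."""
    max_len = max(len(s) for s in seqs)
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    input_lengths = [len(s) for s in seqs]
    return input_ids, input_lengths

ids, lengths = pad_batch([[5], [7, 8, 9]])
# Every padded row shares the max length, while input_lengths keeps the true sizes.
assert all(len(row) == 3 for row in ids)
assert lengths[0] < lengths[1]
```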
24-24: Handle cd failure and ensure PROJECT_ROOT is set (duplicate of prior comment). Make the directory change robust; provide a fallback if PROJECT_ROOT isn't defined by common.env.

```diff
-# Run the experiment
-cd $PROJECT_ROOT
+# Run the experiment
+# Default PROJECT_ROOT to the template project if not provided by common.env
+: "${PROJECT_ROOT:=${NEMO_RL_ROOT}/research/template_project}"
+cd "$PROJECT_ROOT" || { echo "ERROR: cd to PROJECT_ROOT failed: $PROJECT_ROOT"; exit 1; }
```
29-46: Quote forwarded args and file paths; avoid re-splitting (duplicate for "$@"). Prevents argument mangling and path splitting (SC2068); also quote logs/paths.

```diff
-NRL_FORCE_REBUILD_VENVS=true \
-SINGLE_UPDATE_ITERS=$MAX_STEPS \
-uv run single_update.py \
+NRL_FORCE_REBUILD_VENVS=true \
+SINGLE_UPDATE_ITERS="$MAX_STEPS" \
+uv run single_update.py \
     --config configs/grpo_math_1B.yaml \
     cluster.gpus_per_node=8 \
-    cluster.num_nodes=$NUM_NODES \
-    $@ \
-    2>&1 | tee $RUN_LOG
+    cluster.num_nodes="$NUM_NODES" \
+    "$@" \
+    2>&1 | tee "$RUN_LOG"
@@
 if grep -q "All done." "$RUN_LOG"; then
-  echo '{"succeed": "yes"}' > $JSON_METRICS
+  echo '{"succeed": "yes"}' > "$JSON_METRICS"
 else
-  echo '{"succeed": "no"}' > $JSON_METRICS
+  echo '{"succeed": "no"}' > "$JSON_METRICS"
 fi
@@
-uv run $NEMO_RL_ROOT/tests/check_metrics.py $JSON_METRICS \
+uv run "$NEMO_RL_ROOT/tests/check_metrics.py" "$JSON_METRICS" \
     'data["succeed"] == "yes"'
```

research/template_project/tests/functional/single_update.sh (1)
20-28: Quote forwarded args and file paths (duplicate for "$@"). Prevents re-splitting and path issues (SC2068).

```diff
-SINGLE_UPDATE_ITERS=1 uv run coverage run -a --data-file=$PROJECT_ROOT/tests/.coverage --source=$PROJECT_ROOT/nemo_rl \
+SINGLE_UPDATE_ITERS=1 uv run coverage run -a --data-file="$PROJECT_ROOT/tests/.coverage" --source="$PROJECT_ROOT/nemo_rl" \
     single_update.py \
-    --config $PROJECT_ROOT/configs/grpo_math_1B.yaml \
+    --config "$PROJECT_ROOT/configs/grpo_math_1B.yaml" \
     cluster.gpus_per_node=1 \
     cluster.num_nodes=1 \
     policy.train_global_batch_size=1 \
     policy.train_micro_batch_size=1 \
-    $@ \
-    2>&1 | tee $RUN_LOG
+    "$@" \
+    2>&1 | tee "$RUN_LOG"
```
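The SC2068 hazard flagged in both scripts can be demonstrated in isolation (a minimal sketch): an argument containing whitespace survives `"$@"` but is re-split by bare `$@`.

```shell
# Two positional parameters, one of which contains a space.
set -- "one two" three
printf '%s\n' $@ | wc -l      # 3 lines: "one two" was split into two words
printf '%s\n' "$@" | wc -l    # 2 lines: argument boundaries preserved
```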
🧹 Nitpick comments (8)
research/template_project/template_project/data_utils.py (1)
36-38: Clarify token_mask comment. The comment states "use the attention mask as token_mask", but the code creates an all-ones mask instead. Consider clarifying that this creates a full mask (all tokens participate in loss) for simple NLL training, rather than using the attention_mask directly. Apply this diff to improve clarity:

```diff
-    # For simple NLL training, use the attention mask as token_mask
+    # For simple NLL training, create a full token_mask (all tokens participate in loss)
     # (loss will be applied to positions 1..len-1 via NLLLoss)
     token_mask = torch.ones_like(input_ids)
```

research/template_project/tests/test_suites/llm/single_update_1n8g.sh (2)
1-1: Harden the script with strict mode. Enable safer defaults to fail fast on errors and undefined vars. As per coding guidelines.

```diff
 #!/bin/bash
+set -euo pipefail
```
3-6: Quote paths and sourced file. Prevents word-splitting when paths contain spaces. As per coding guidelines.

```diff
-NEMO_RL_ROOT=$(realpath $SCRIPT_DIR/../../../../..)
+NEMO_RL_ROOT=$(realpath "$SCRIPT_DIR/../../../../..")
-# Source common.env from local test suite
-source $SCRIPT_DIR/common.env
+# Source common.env from local test suite
+source "$SCRIPT_DIR/common.env"
```

research/template_project/tests/functional/single_update.sh (2)
3-5: Quote path computations. As per coding guidelines.

```diff
 SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd)
-PROJECT_ROOT=$(realpath $SCRIPT_DIR/../..)
+PROJECT_ROOT=$(realpath "$SCRIPT_DIR/../..")
```

12-17: Quote dirs and handle cd failure. As per coding guidelines.

```diff
-export PYTHONPATH=${PROJECT_ROOT}:${PYTHONPATH:-}
+export PYTHONPATH="${PROJECT_ROOT}:${PYTHONPATH:-}"
-rm -rf $EXP_DIR $LOG_DIR
-mkdir -p $EXP_DIR $LOG_DIR
+rm -rf "$EXP_DIR" "$LOG_DIR"
+mkdir -p "$EXP_DIR" "$LOG_DIR"
-cd $PROJECT_ROOT
+cd "$PROJECT_ROOT" || { echo "ERROR: cd to PROJECT_ROOT failed: $PROJECT_ROOT"; exit 1; }
```

research/template_project/single_update.py (3)
49-55: Add a docstring to main(). Document inputs/behavior for the public entry point. As per coding guidelines.

```diff
-def main(config: MasterConfig) -> None:
+def main(config: MasterConfig) -> None:
+    """Run a minimal GRPO single-update demo.
+
+    Args:
+        config: Fully-resolved MasterConfig dict (OmegaConf->container).
+    """
```

166-176: Add a docstring to parse_args(). Clarify return values. As per coding guidelines.

```diff
 def parse_args() -> tuple[argparse.Namespace, list[str]]:
-    """Parse command line arguments."""
+    """Parse CLI args and collect Hydra-like overrides.
+
+    Returns:
+        Tuple of (known args namespace, list of override strings).
+    """
```

46-46: Remove unused resolver registration (optional). The "mul" resolver isn't used; consider dropping to reduce globals side-effects.

```diff
-OmegaConf.register_new_resolver("mul", lambda a, b: a * b)
+# OmegaConf custom resolvers can be registered here if needed.
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (19)
- .github/CODEOWNERS (1 hunks)
- nemo_rl/distributed/virtual_cluster.py (1 hunks)
- nemo_rl/utils/venvs.py (1 hunks)
- pyproject.toml (1 hunks)
- research/README.md (1 hunks)
- research/template_project/.python-version (1 hunks)
- research/template_project/README.md (1 hunks)
- research/template_project/configs/grpo_math_1B.yaml (1 hunks)
- research/template_project/configs/recipes/llm/single_update_1n8g.yaml (1 hunks)
- research/template_project/pyproject.toml (1 hunks)
- research/template_project/single_update.py (1 hunks)
- research/template_project/template_project/data_utils.py (1 hunks)
- research/template_project/tests/functional/single_update.sh (1 hunks)
- research/template_project/tests/test_suites/llm/common.env (1 hunks)
- research/template_project/tests/test_suites/llm/single_update_1n8g.sh (1 hunks)
- research/template_project/tests/unit/test_data_utils.py (1 hunks)
- tests/functional/L1_Functional_Tests_GPU.sh (1 hunks)
- tests/unit/L0_Unit_Tests_Other.sh (1 hunks)
- tools/copyright.sh (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Files:
- research/template_project/tests/unit/test_data_utils.py
- nemo_rl/distributed/virtual_cluster.py
- research/template_project/template_project/data_utils.py
- nemo_rl/utils/venvs.py
- research/template_project/single_update.py
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)
Files:
- nemo_rl/distributed/virtual_cluster.py
- nemo_rl/utils/venvs.py
**/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Use `uv run` to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling `python` directly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts
Files:
- research/template_project/tests/test_suites/llm/single_update_1n8g.sh
- research/template_project/tests/functional/single_update.sh
- tests/unit/L0_Unit_Tests_Other.sh
- tests/functional/L1_Functional_Tests_GPU.sh
- tools/copyright.sh
🧠 Learnings (3)
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to **/*.py : Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Applied to files:
- research/template_project/tests/unit/test_data_utils.py
- tools/copyright.sh
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : LLM recipe YAML filenames must follow: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml
Applied to files:
research/template_project/configs/recipes/llm/single_update_1n8g.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
PR: NVIDIA-NeMo/RL#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to **/*.sh : Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts
Applied to files:
tools/copyright.sh
🧬 Code graph analysis (4)
research/template_project/tests/unit/test_data_utils.py (1)
research/template_project/template_project/data_utils.py (1)
`create_batch_from` (21-47)
research/template_project/tests/test_suites/llm/single_update_1n8g.sh (1)
research/template_project/tests/test_suites/llm/common.env (1)
`exit_if_max_steps_reached` (15-23)
research/template_project/template_project/data_utils.py (2)
nemo_rl/distributed/batched_data_dict.py (1)
`BatchedDataDict` (75-860)
research/template_project/tests/unit/test_data_utils.py (1)
`tokenizer` (23-27)
research/template_project/single_update.py (7)
- research/template_project/template_project/data_utils.py (1): `create_batch_from` (21-47)
- nemo_rl/algorithms/grpo.py (1): `refit_policy_generation` (498-566)
- nemo_rl/algorithms/loss_functions.py (1): `NLLLoss` (362-438)
- nemo_rl/algorithms/utils.py (1): `get_tokenizer` (157-288)
- nemo_rl/distributed/virtual_cluster.py (3): `RayVirtualCluster` (177-435), `init_ray` (84-170), `shutdown` (407-426)
- nemo_rl/models/generation/__init__.py (1): `configure_generation_config` (24-45)
- nemo_rl/models/generation/vllm/vllm_generation.py (1): `VllmGeneration` (47-838)
🪛 markdownlint-cli2 (0.18.1)
research/template_project/README.md
110-110: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.13.3)
nemo_rl/utils/venvs.py
98-98: subprocess call: check for execution of untrusted input
(S603)
98-98: Starting a process with a partial executable path
(S607)
🪛 Shellcheck (0.11.0)
research/template_project/tests/test_suites/llm/single_update_1n8g.sh
[warning] 12-12: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 13-13: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 24-24: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 33-33: Double quote array expansions to avoid re-splitting elements.
(SC2068)
research/template_project/tests/functional/single_update.sh
[error] 27-27: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/unit/L0_Unit_Tests_Other.sh
[warning] 41-41: Quote this to prevent word splitting.
(SC2046)
tests/functional/L1_Functional_Tests_GPU.sh
[warning] 36-36: Quote this to prevent word splitting.
(SC2046)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (9)
research/template_project/.python-version (1)
1-1: LGTM! The Python version specification is correct and follows the standard format for `.python-version` files.

.github/CODEOWNERS (1)
63-63: LGTM! The code ownership entry is correctly formatted and appropriately assigns ownership of the template project.
research/template_project/pyproject.toml (1)
1-31: LGTM! The project configuration is well-structured with appropriate build system, dependencies, and test setup. The workspace configuration correctly references `nemo-rl` as a workspace dependency.

research/template_project/README.md (1)
1-126: LGTM! The README is comprehensive, well-organized, and provides clear guidance for using the template project. The documentation covers all essential aspects including setup, testing, dependencies, and citation information.
Note: The markdownlint warning about the fenced code block at line 110 is acceptable in this context, as it's displaying a contact information template rather than executable code.
pyproject.toml (1)
176-181: LGTM! The workspace member addition is correctly configured, and the accompanying comments provide valuable context about the rationale for including research projects in the workspace to share the global uv.lock. This helps prevent dependency conflicts, particularly with numpy versioning.

tools/copyright.sh (2)
22-22: LGTM! Adding the `research/` directory to the copyright check scope is appropriate and ensures new research projects comply with copyright requirements.
23-26: Good defensive programming. The empty file check prevents attempting to extract copyright headers from empty files, which is a sensible safeguard.
research/README.md (1)
1-50: LGTM! The research directory README provides comprehensive guidance for contributors, with clear expectations for acceptance criteria, code reviews, ownership, and testing. The documentation strikes a good balance between encouraging contributions and maintaining quality standards.
research/template_project/tests/test_suites/llm/single_update_1n8g.sh (1)
12-13: Verify unused variables NUM_RUNS and NUM_MINUTES. They appear unused (SC2034). Remove them or export if used by the harness.
As per coding guidelines
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: Terry Kong <terrycurtiskong@gmail.com> Co-authored-by: Olivier Delalleau <507137+odelalleau@users.noreply.github.com>
This PR introduces a `research/` directory and a sample template project that also serves as an exemplar of how to construct a loop using nemo-rl utilities. It doesn't introduce any data module, so it is as minimal as possible.

The project is a "uv workspace" member, meaning it shares the global uv.lock; this is necessary to ensure each research project's dependencies are aware of nemo-rl's dependencies. The dependency chain: the research project depends on `nemo_rl` as a direct dependency.

Summary by CodeRabbit
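The uv-workspace arrangement described in the PR description can be sketched as follows. This is an illustrative fragment only: the field names follow uv's workspace conventions, but the exact project metadata here is assumed, not copied from the PR.

```toml
# research/template_project/pyproject.toml (sketch): the research project
# depends on nemo-rl directly, and uv resolves it from the workspace so the
# project shares the repository's global uv.lock. The root pyproject.toml
# would list "research/template_project" under [tool.uv.workspace] members.
[project]
name = "template-project"
requires-python = ">=3.12"
dependencies = ["nemo-rl"]

[tool.uv.sources]
nemo-rl = { workspace = true }
```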