Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
47 changes: 34 additions & 13 deletions nemo_skills/training/nemo_rl/start_grpo.py
Original file line number Diff line number Diff line change
Expand Up @@ -230,7 +230,9 @@ def setup_data(
"env_cls",
"nemo_skills.training.nemo_rl.environments.math_environment.MathEnvironment",
)
ACTOR_ENVIRONMENT_REGISTRY[env_cls_path] = PY_EXECUTABLES.SYSTEM
py_executable_str = env_configs["math"].get("py_executable", "system")
py_executable = getattr(PY_EXECUTABLES, py_executable_str.upper())
ACTOR_ENVIRONMENT_REGISTRY[env_cls_path] = py_executable

module_name, class_name = env_cls_path.rsplit(".", 1)
env_module = importlib.import_module(module_name)
Expand Down Expand Up @@ -315,18 +317,37 @@ def main() -> None:
val_task_to_env,
) = setup_data(tokenizer, config["data"], config["env"])

(
policy,
policy_generation,
cluster,
dataloader,
val_dataloader,
loss_fn,
logger,
checkpointer,
grpo_state,
master_config,
) = setup(config, tokenizer, dataset, val_dataset)
setup_result = setup(config, tokenizer, dataset, val_dataset)

if len(setup_result) == 10: # Nemo-RL main branch
(
policy,
policy_generation,
cluster,
dataloader,
val_dataloader,
loss_fn,
logger,
checkpointer,
grpo_state,
master_config,
) = setup_result
elif len(setup_result) == 11: # For Nano/Super nemo-RL
(
policy,
policy_generation,
nemo_gym_actor,
cluster,
dataloader,
val_dataloader,
loss_fn,
logger,
checkpointer,
grpo_state,
master_config,
) = setup_result
else:
raise ValueError(f"Expected 10 or 11 elements in setup_result, got {len(setup_result)}")
Comment on lines +320 to +350
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, find where grpo_train and async_grpo_train are called in the file
rg "grpo_train|async_grpo_train" nemo_skills/training/nemo_rl/start_grpo.py -A 10 -B 2

Repository: NVIDIA-NeMo/Skills

Length of output: 1838


🏁 Script executed:

# Check if nemo_gym_actor is used anywhere else in the same file
rg "nemo_gym_actor" nemo_skills/training/nemo_rl/start_grpo.py

Repository: NVIDIA-NeMo/Skills

Length of output: 89


🏁 Script executed:

# Find the function definitions of grpo_train and async_grpo_train in the codebase
fd -e py . | xargs rg "def (grpo_train|async_grpo_train)" --max-count=5

Repository: NVIDIA-NeMo/Skills

Length of output: 46


🏁 Script executed:

# Check the setup function to understand what it returns and the context
rg "def setup" nemo_skills/training/nemo_rl/start_grpo.py -A 50 | head -80

Repository: NVIDIA-NeMo/Skills

Length of output: 1983


🏁 Script executed:

# Get more context around the grpo_train and async_grpo_train calls to see full parameter list
rg "async_grpo_train\(" nemo_skills/training/nemo_rl/start_grpo.py -A 15 | head -40

Repository: NVIDIA-NeMo/Skills

Length of output: 631


🏁 Script executed:

# Check the actual line numbers around 320-350 to verify the exact context
sed -n '310,370p' nemo_skills/training/nemo_rl/start_grpo.py

Repository: NVIDIA-NeMo/Skills

Length of output: 2047


Prefix unused nemo_gym_actor with underscore.

In the 11-element branch (Nano/Super nemo-RL), nemo_gym_actor is unpacked at line 339 but never passed to grpo_train() or async_grpo_train(). Prefix it with an underscore to signal intent and silence linting warnings:

-            nemo_gym_actor,
+            _nemo_gym_actor,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
setup_result = setup(config, tokenizer, dataset, val_dataset)
if len(setup_result) == 10: # Nemo-RL main branch
(
policy,
policy_generation,
cluster,
dataloader,
val_dataloader,
loss_fn,
logger,
checkpointer,
grpo_state,
master_config,
) = setup_result
elif len(setup_result) == 11: # For Nano/Super nemo-RL
(
policy,
policy_generation,
nemo_gym_actor,
cluster,
dataloader,
val_dataloader,
loss_fn,
logger,
checkpointer,
grpo_state,
master_config,
) = setup_result
else:
raise ValueError(f"Expected 10 or 11 elements in setup_result, got {len(setup_result)}")
setup_result = setup(config, tokenizer, dataset, val_dataset)
if len(setup_result) == 10: # Nemo-RL main branch
(
policy,
policy_generation,
cluster,
dataloader,
val_dataloader,
loss_fn,
logger,
checkpointer,
grpo_state,
master_config,
) = setup_result
elif len(setup_result) == 11: # For Nano/Super nemo-RL
(
policy,
policy_generation,
_nemo_gym_actor,
cluster,
dataloader,
val_dataloader,
loss_fn,
logger,
checkpointer,
grpo_state,
master_config,
) = setup_result
else:
raise ValueError(f"Expected 10 or 11 elements in setup_result, got {len(setup_result)}")
🧰 Tools
🪛 Ruff (0.14.14)

[warning] 339-339: Unpacked variable nemo_gym_actor is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


[warning] 340-340: Unpacked variable cluster is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


[warning] 350-350: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
In `@nemo_skills/training/nemo_rl/start_grpo.py` around lines 320 - 350, The
11-element unpack from setup_result introduces nemo_gym_actor which is unused;
rename it to _nemo_gym_actor (prefix with underscore) in the unpack of
setup_result to signal intent and silence linters where the 11-element branch is
handled, ensuring other variables (policy, policy_generation, cluster,
dataloader, val_dataloader, loss_fn, logger, checkpointer, grpo_state,
master_config) remain unchanged and that downstream calls to grpo_train() or
async_grpo_train() are left as-is.


# Check if async mode is enabled
if "async_grpo" in config["grpo"] and config["grpo"]["async_grpo"]["enabled"]:
Expand Down