Conversation
📝 Walkthrough

This PR introduces GRPO training support for the Penguin environment in NeMo RL. Changes include a YAML configuration file, a main orchestration script for GRPO training, sanity test automation, tokenizer and timing integration into the Penguin rollout APIs, timing metric collection in rollout processing, and corresponding test updates.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant MainScript as run_grpo_penguin.py
    participant Config as Hydra Config
    participant Dataset as Datasets
    participant Ray
    participant PenguinActor
    participant TrainingLoop as Training/Collection
    User->>MainScript: python run_grpo_penguin.py --config=...
    MainScript->>Config: Load YAML + parse overrides
    Config-->>MainScript: MasterConfig
    MainScript->>Dataset: setup_single_penguin_dataset()
    Dataset-->>MainScript: train_dataset, val_dataset
    MainScript->>Ray: init_ray()
    Ray-->>MainScript: Ray initialized
    MainScript->>MainScript: setup(config, datasets...)
    MainScript-->>MainScript: policy, dataloader, loss_fn, etc.
    alt is_trajectory_collection == True
        MainScript->>PenguinActor: Create with runtime_env
        PenguinActor-->>MainScript: health check passed
        MainScript->>MainScript: collect_trajectories()
        MainScript->>PenguinActor: run_async_penguin_rollout(tokenizer, timer_prefix)
        PenguinActor-->>MainScript: rollout results + timing_metrics
        MainScript->>Dataset: Log to trajectory_collection.jsonl
    else
        MainScript->>TrainingLoop: grpo_train(policy, dataloader, ...)
        TrainingLoop-->>MainScript: training complete
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~35 minutes
Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 3
🧹 Nitpick comments (2)
examples/penguin/run_grpo_penguin.py (1)
216-222: Consider extracting the long error message to an exception class or docstring.

While the detailed error message is helpful, consider moving it to a custom exception class or a module-level constant for better maintainability.
Example:
```python
class UnsupportedPenguinConfigError(ValueError):
    """A non-null `grpo.max_val_samples` parameter is not supported.

    Gym principle is that there is no hidden data pre or post processing from you.
    What you see is what you get. The validation set you pass in will directly be
    used for validation with no additional preprocessing. If you want to have some
    number of repetitions, please include that in your dataset, via `num_repeats`,
    in your dataset config, and `ng_prepare_data` will prepare it accordingly.
    """

# Then use:
raise UnsupportedPenguinConfigError()
```

nemo_rl/experience/rollouts.py (1)
1036-1102: LGTM! Metric aggregation and per-agent logging properly implemented.

The timing instrumentation around metric aggregation and per-agent metric calculation provides good observability. The full result logging via `wandb.Table` enables detailed debugging.
Consider adding `strict=True` to the zip on line 1076 for defensive programming, though the paired nature of `penguin_rows` and `results` makes this low risk:

```diff
- for penguin_row, result in zip(penguin_rows, results):
+ for penguin_row, result in zip(penguin_rows, results, strict=True):
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- `examples/penguin/grpo_dapo17k_bytedtsinghua_qwen3_4binstruct_nf.yaml` (1 hunks)
- `examples/penguin/run_grpo_penguin.py` (1 hunks)
- `examples/penguin/run_penguin_single_node_sanity_tests.sh` (1 hunks)
- `nemo_rl/environments/penguin.py` (4 hunks)
- `nemo_rl/experience/rollouts.py` (3 hunks)
- `tests/unit/environments/test_penguin.py` (3 hunks)
- `tests/unit/experience/test_rollouts.py` (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Files:
- `nemo_rl/experience/rollouts.py`
- `nemo_rl/environments/penguin.py`
- `tests/unit/experience/test_rollouts.py`
- `tests/unit/environments/test_penguin.py`
- `examples/penguin/run_grpo_penguin.py`
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)
Files:
- `nemo_rl/experience/rollouts.py`
- `nemo_rl/environments/penguin.py`
**/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Use `uv run` to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling `python` directly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts
Files:
examples/penguin/run_penguin_single_node_sanity_tests.sh
🧠 Learnings (5)
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.
Applied to files:
- `nemo_rl/experience/rollouts.py`
- `examples/penguin/grpo_dapo17k_bytedtsinghua_qwen3_4binstruct_nf.yaml`
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to nemo_rl/**/*.py : Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Applied to files:
nemo_rl/environments/penguin.py
📚 Learning: 2025-09-18T14:57:31.003Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1006
File: nemo_rl/algorithms/distillation.py:312-354
Timestamp: 2025-09-18T14:57:31.003Z
Learning: The distillation algorithm's cluster setup logic is designed to follow the same patterns used in GRPO for handling distributed training clusters and resource allocation.
Applied to files:
examples/penguin/grpo_dapo17k_bytedtsinghua_qwen3_4binstruct_nf.yaml
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.
Applied to files:
examples/penguin/run_penguin_single_node_sanity_tests.sh
📚 Learning: 2025-09-19T07:28:29.887Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.
Applied to files:
examples/penguin/run_penguin_single_node_sanity_tests.sh
🧬 Code graph analysis (5)
nemo_rl/experience/rollouts.py (2)
nemo_rl/utils/timer.py (5)
- `Timer` (22-248)
- `start` (79-83)
- `time` (110-123)
- `stop` (85-107)
- `get_timing_metrics` (196-233)

nemo_rl/environments/penguin.py (1)
- `run_rollouts` (106-138)
nemo_rl/environments/penguin.py (1)
nemo_rl/utils/timer.py (3)
- `Timer` (22-248)
- `time` (110-123)
- `get_timing_metrics` (196-233)
examples/penguin/run_penguin_single_node_sanity_tests.sh (1)
tests/unit/environments/test_penguin.py (1)
- `penguin` (80-130)
tests/unit/environments/test_penguin.py (1)
nemo_rl/environments/penguin.py (1)
- `run_rollouts` (106-138)
examples/penguin/run_grpo_penguin.py (13)
nemo_rl/models/policy/interfaces.py (1)
- `ColocatablePolicyInterface` (141-168)

nemo_rl/models/generation/interfaces.py (1)
- `GenerationInterface` (215-254)

nemo_rl/utils/logger.py (2)
- `Logger` (804-1039)
- `get_next_experiment_dir` (1328-1362)

nemo_rl/algorithms/grpo.py (2)
- `_should_use_penguin` (728-748)
- `refit_policy_generation` (751-822)

nemo_rl/algorithms/utils.py (1)
- `get_tokenizer` (184-315)

nemo_rl/data/datasets/processed_dataset.py (1)
- `AllTaskProcessedDataset` (31-126)

nemo_rl/data/interfaces.py (1)
- `DatumSpec` (32-40)

nemo_rl/distributed/ray_actor_environment_registry.py (1)
- `get_actor_python_env` (49-64)

nemo_rl/distributed/virtual_cluster.py (1)
- `init_ray` (85-171)

nemo_rl/environments/penguin.py (5)
- `Penguin` (34-209)
- `PenguinConfig` (27-30)
- `penguin_example_to_nemo_rl_datum_spec` (235-248)
- `setup_penguin_config` (217-226)
- `health_check` (103-104)

nemo_rl/experience/rollouts.py (1)
- `run_async_penguin_rollout` (958-1145)

nemo_rl/models/generation/__init__.py (1)
- `configure_generation_config` (25-54)

nemo_rl/utils/config.py (1)
- `parse_hydra_overrides` (146-166)
🪛 Ruff (0.14.3)
nemo_rl/experience/rollouts.py
1076-1076: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
examples/penguin/run_grpo_penguin.py
103-103: Do not assign a lambda expression, use a def
Rewrite passthrough_task_processor as a def
(E731)
103-103: Unused lambda argument: args
(ARG005)
103-103: Unused lambda argument: kwargs
(ARG005)
216-222: Avoid specifying long messages outside the exception class
(TRY003)
239-239: Unpacked variable cluster is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
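A hedged sketch of the E731/ARG005 fix flagged above. The `passthrough_task_processor` name comes from the Ruff output, but its body is not shown in this review, so the no-op body here is an assumption:

```python
# Flagged pattern (E731: lambda assignment; ARG005: unused lambda arguments):
# passthrough_task_processor = lambda *args, **kwargs: None

# Preferred form: a def, with leading underscores marking intentionally
# unused arguments (placeholder body, assumed for illustration).
def passthrough_task_processor(*_args, **_kwargs):
    """No-op task processor that accepts and ignores any arguments."""
    return None


print(passthrough_task_processor("datum", key="value"))  # None
```

The underscore prefix also silences the unused-argument warning without changing the call sites.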
🪛 Shellcheck (0.11.0)
examples/penguin/run_penguin_single_node_sanity_tests.sh
[error] 1-1: Tips depend on target shell and yours is unknown. Add a shebang or a 'shell' directive.
(SC2148)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Lint check
- GitHub Check: Post automodel integration comment / Comment on PR
- GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (12)
tests/unit/experience/test_rollouts.py (1)
785-792: LGTM! Timing metrics properly integrated.

The new timing metric keys align with the Timer instrumentation added to the rollout flow. The test correctly normalizes these values to None during comparison, avoiding flakiness from non-deterministic timing.
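The normalization the review describes can be sketched generically (hypothetical metric keys; this is not the actual test code):

```python
def normalize_timing(metrics: dict, prefix: str = "timing/") -> dict:
    """Replace non-deterministic timing values with None so comparisons are stable."""
    return {k: (None if k.startswith(prefix) else v) for k, v in metrics.items()}


# Timing values vary run to run; rewards and counts do not.
actual = {"reward": 1.0, "timing/rollout/total_time": 0.0137}
expected = {"reward": 1.0, "timing/rollout/total_time": None}
assert normalize_timing(actual) == expected
```

Nulling only the timing namespace keeps the rest of the metrics dict under exact comparison.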
tests/unit/environments/test_penguin.py (2)
145-166: LGTM! Test properly updated for new signature.

The test correctly passes the `penguin_tokenizer` fixture to `run_rollouts` and handles the updated return type (a tuple of results and timing metrics). The empty `timer_prefix` is appropriate in a test context.
169-191: LGTM! Proper normalization for reproducible tests.

The `token_ids` conversion and dummy value substitution for the `prompt_str`/`generation_str` fields ensure deterministic test comparisons while accommodating the new decoded string fields added to penguin results.

nemo_rl/experience/rollouts.py (4)
51-51: LGTM! Timer instrumentation properly initialized.

The Timer import and initialization follow the correct pattern. The `timer_prefix` convention ("timing/rollout") provides clear namespacing for timing metrics.

Also applies to: 991-993
1007-1013: LGTM! Rollout timing properly captured.

The timer context around `penguin_environment.run_rollouts` correctly measures end-to-end rollout time, and the tuple unpacking matches the updated Penguin environment signature.
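The timing pattern praised here can be sketched with a minimal stand-in timer (this `SketchTimer` is an illustration, not the actual `nemo_rl.utils.timer.Timer`):

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SketchTimer:
    """Minimal stand-in for a rollout timer: named spans, cumulative metrics."""

    def __init__(self):
        self._durations = defaultdict(list)

    @contextmanager
    def time(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self._durations[name].append(time.perf_counter() - start)

    def get_timing_metrics(self, prefix: str = "") -> dict:
        return {f"{prefix}/{k}" if prefix else k: sum(v) for k, v in self._durations.items()}


timer = SketchTimer()
with timer.time("env_rollout"):
    time.sleep(0.01)  # stands in for the penguin_environment.run_rollouts(...) call
metrics = timer.get_timing_metrics(prefix="timing/rollout")
print(metrics)
```

Wrapping the environment call in a named span is what lets the caller later merge the span's total into the rollout metrics dict.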
1016-1033: LGTM! Metric preparation properly timed.

The timer context around metric preparation provides visibility into preprocessing overhead. The metrics calculation correctly extracts token counts, rewards, and turns from Penguin results.
1108-1109: LGTM! Timer properly finalized and merged.

The timer is correctly stopped and timing metrics are merged into `rollout_metrics` using sum reduction, which is appropriate for cumulative timing measurements across the batch.
1-267: LGTM! Comprehensive GRPO configuration for Penguin.

The configuration properly sets up GRPO training with the Qwen3-4B-Instruct model for Penguin integration. Key settings are correctly configured:

- `async_engine: true` and `expose_http_server: true` (required for Penguin)
- `max_new_tokens` set to the full context length (per codebase pattern)
- Tool parser configured for Qwen 3 4B Instruct
- Data paths point to the Penguin workspace

The configuration aligns with the new Penguin integration in the codebase.
Based on learnings
nemo_rl/environments/penguin.py (4)
19-19: LGTM! Required imports for new functionality.

The `PreTrainedTokenizerBase` import enables tokenizer-aware postprocessing, and the `Timer` import supports the new timing instrumentation in rollout execution.

Also applies to: 24-24
51-68: LGTM! More defensive config handling with helpful diagnostics.

The use of `get()` with a fallback and `setdefault()` for aiohttp limits makes the initialization more robust. The diagnostic print provides visibility into connection limit settings, which is helpful for debugging concurrency issues.
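The `get()`/`setdefault()` pattern being praised can be illustrated generically (the key names below are stand-ins, not the actual Penguin config keys):

```python
config = {"penguin": {}}

# get() with a fallback avoids a KeyError when the sub-config is absent.
penguin_cfg = config.get("penguin", {})

# setdefault() fills a key only when the caller did not already provide one,
# shown here with aiohttp-style connection limits (hypothetical keys).
penguin_cfg.setdefault("aiohttp_connection_limit", 100)
penguin_cfg.setdefault("aiohttp_connection_limit_per_host", 10)

# A caller-supplied value would survive untouched:
penguin_cfg.setdefault("aiohttp_connection_limit", 999)
print(penguin_cfg["aiohttp_connection_limit"])  # 100
```

Caller-provided values always win, which is what makes `setdefault()` safer than unconditional assignment for filling in limits.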
106-138: LGTM! Comprehensive timing instrumentation added.

The updated signature properly accepts a `tokenizer` and `timer_prefix`, enabling downstream string decoding and timing metrics. The timing instrumentation provides excellent observability:
- Per-task await and postprocessing times
- Total time tracking
- Postprocess percentage calculation
The tuple return `(nemo_rl_results, timing_metrics)` is properly handled by callers.
140-198: LGTM! Tokenizer integration with memory optimization.

The postprocessing now accepts a tokenizer to decode token IDs into human-readable strings. The approach of:
- Converting token IDs to torch tensors for NeMo RL compatibility
- Decoding to strings for logging
- Popping large tensor fields from the result
...provides a good balance between functionality and memory efficiency. The `prompt_str` and `generation_str` fields improve debugging without bloating logs with full token ID tensors.
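A rough sketch of the decode-then-pop flow described above, using a stub in place of the real `PreTrainedTokenizerBase` (the field names are illustrative, and the actual code also converts IDs to torch tensors):

```python
class StubTokenizer:
    """Stand-in for a HF tokenizer's decode(); maps ids to placeholder tokens."""

    def decode(self, token_ids, skip_special_tokens=True):
        return " ".join(f"tok{i}" for i in token_ids)


def postprocess_result(result: dict, tokenizer) -> dict:
    """Decode token ids to strings for logging, then drop the bulky id fields."""
    result["prompt_str"] = tokenizer.decode(result["prompt_token_ids"])
    result["generation_str"] = tokenizer.decode(result["generation_token_ids"])
    # Pop the large fields so logged results stay small.
    result.pop("prompt_token_ids")
    result.pop("generation_token_ids")
    return result


out = postprocess_result(
    {"prompt_token_ids": [1, 2], "generation_token_ids": [3], "reward": 1.0},
    StubTokenizer(),
)
print(out)
```

Decoding once at postprocess time means logs carry readable text while the tensors needed for training are kept separately.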
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: Parth Chadha <pchadha@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
What does this PR do ?
This PR:
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit