docs: End-to-end GRPO Training with NeMo RL tutorial [master branch] by bxyu-nvidia · Pull Request #481 · NVIDIA-NeMo/Gym

bxyu-nvidia · 2025-12-11T01:19:59Z

No description provided.

Signed-off-by: Brian Yu <bxyu@nvidia.com>

copy-pr-bot · 2025-12-11T01:20:02Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


…2e-grpo-tut

copy-pr-bot · 2025-12-11T04:39:19Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


Signed-off-by: Brian Yu <bxyu@nvidia.com>

Signed-off-by: Lawrence Lane <llane@nvidia.com>

docs/tutorials/nemo-rl-grpo/multi-node-training.md

…2e-grpo-tut Signed-off-by: Brian Yu <bxyu@nvidia.com>

Signed-off-by: Brian Yu <bxyu@nvidia.com>

…2e-grpo-tut

docs/tutorials/nemo-rl-grpo/single-node-training.md

shashank3959 · 2025-12-12T23:54:25Z

docs/tutorials/nemo-rl-grpo/setup.md

+
+# Initialize all submodules (Megatron, AutoModel, etc.)
+git submodule update --init --recursive
+


I see you've removed the line below. Without it, I run into issues with the Ray virtual environments stashed in the container.

rm -rf /opt/ray_venvs/*
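
For reference, the cleanup step discussed here could be wrapped as below. This is only a minimal sketch, assuming a POSIX shell inside the container; the `clear_ray_venvs` helper name and its optional directory argument are hypothetical and introduced purely for illustration. Only the `git submodule` command and the `/opt/ray_venvs` path come from this thread.

```shell
#!/usr/bin/env sh
# Hypothetical helper (not part of NeMo RL or NeMo Gym): remove the cached
# Ray virtual environments. Defaults to /opt/ray_venvs, the path quoted in
# the review comment; pass another directory as the first argument if needed.
clear_ray_venvs() {
  dir="${1:-/opt/ray_venvs}"
  # Nothing to do if the directory is absent (e.g. outside the container).
  [ -d "$dir" ] || return 0
  # ${dir:?} makes the shell abort rather than expand to "/*" if dir is empty.
  rm -rf "${dir:?}"/*
}

# Usage, from the repository root inside the container:
#   git submodule update --init --recursive
#   clear_ray_venvs
```

The guard on `${dir:?}` is a defensive choice so that an unset or empty path can never turn the command into `rm -rf /*`.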

docs/tutorials/nemo-rl-grpo/setup.md

docs/tutorials/nemo-rl-grpo/single-node-training.md

Signed-off-by: Frankie Siino <fsiino@nvidia.com> Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Frankie Siino <fsiino@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com>

…b914746e00835f449d2e5649997f6ed6523615f1 Signed-off-by: Brian Yu <bxyu@nvidia.com>

Signed-off-by: Brian Yu <bxyu@nvidia.com>

@bxyu-nvidia

commit 647d1e5
Author: fsiino-nvidia <fsiino@nvidia.com>
Date: Fri Dec 19 18:40:39 2025 -0800

Remove PlainTextResponse response_class (NVIDIA-NeMo#544)

https://nvidia.slack.com/archives/C08TG7CLEGY/p1766191655660079

Initially in NVIDIA-NeMo#290, the `response_class=PlainTextResponse` was added to the `/global_config_dict_yaml` endpoint of the HeadServer as an attempt to debug parsing server info for the `ng_status` command. This led to a parsing error in `load_from_global_config`. This command now uses its own separate endpoint `server_instances`, so this needs to be removed.

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

commit f250e0c
Author: cmunley1 <cmunley@nvidia.com>
Date: Fri Dec 19 16:38:29 2025 -0800

docs: remove trl docs (NVIDIA-NeMo#543)

remove trl from docs, leaving just unsloth. was unclear that they are together. will make a trl section when we have a standalone trl notebook, or a section on trl's docs too.

---------

Signed-off-by: Christian Munley <cmunley@nvidia.com>

commit 34a2b0f
Author: cmunley1 <cmunley@nvidia.com>
Date: Fri Dec 19 14:01:56 2025 -0800

add unsloth and trl to docs (NVIDIA-NeMo#536)

adds a section for single-step training with unsloth and trl

not sure if these should be broken into separate sections. Left as one since the same notebook works for both, but could be confusing.

not sure if we should also add more info about multi-step (hopefully) coming soon.

Signed-off-by: Christian Munley <cmunley@nvidia.com>

commit 146b1a5
Author: cmunley1 <cmunley@nvidia.com>
Date: Fri Dec 19 12:56:33 2025 -0800

python flag for colab venv installation (NVIDIA-NeMo#526)

need to set uv pip install python flag in colab environments when launching servers

usage: `ng_run "+config_paths=[...]" +uv_pip_set_python=true`

defaults to false

For NVIDIA-NeMo#370

Needed for notebook here: https://docs.unsloth.ai/models/nemotron-3#reinforcement-learning--nemo-gym

---------

Signed-off-by: Christian Munley <cmunley@nvidia.com>

commit ba2153a
Author: cmunley1 <cmunley@nvidia.com>
Date: Fri Dec 19 10:42:44 2025 -0800

Salesforce xlam-function-calling-60k resources server (NVIDIA-NeMo#262)

function calling resources server based on https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k

---------

Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>

commit 29d3511
Author: pjin-nvidia <pjin@nvidia.com>
Date: Fri Dec 19 10:28:28 2025 -0800

VLLMModel supports chat template kwargs (NVIDIA-NeMo#538)

Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Peter Jin <pjin@nvidia.com>

commit 7d8fdda
Author: fsiino-nvidia <fsiino@nvidia.com>
Date: Wed Dec 17 18:38:18 2025 -0800

List running server health and status (NVIDIA-NeMo#290)

This implements the `ng_status` command to list all running servers on the system and ping for health check.

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

commit 076d002
Author: fsiino-nvidia <fsiino@nvidia.com>
Date: Tue Dec 16 10:25:14 2025 -0800

Debug server package versions (NVIDIA-NeMo#406)

Adds `ng_pip_list` command to see the underlying uv pip list of the specified environment.

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

commit c192ee4
Author: Lawrence Lane <llane@nvidia.com>
Date: Tue Dec 16 12:19:31 2025 -0500

docs settings update (NVIDIA-NeMo#525)

Signed-off-by: Lawrence Lane <llane@nvidia.com>

commit 8ca39d6
Author: bxyu-nvidia <bxyu@nvidia.com>
Date: Mon Dec 15 19:56:03 2025 -0800

docs: Miscellaneous GRPO tutorial fixes (NVIDIA-NeMo#512)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit 1539b2b
Author: Lawrence Lane <llane@nvidia.com>
Date: Mon Dec 15 18:28:11 2025 -0500

docs: redirect setup (NVIDIA-NeMo#513)

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>

commit 96ccdfc
Author: cmunley1 <cmunley@nvidia.com>
Date: Mon Dec 15 14:31:59 2025 -0800

reasoning-gym resource server (NVIDIA-NeMo#113)

single turn tasks across various domains:

"Reasoning Gym is a community-created Python library of procedural dataset generators and algorithmically verifiable reasoning environments for training reasoning models with reinforcement learning (RL). The goal is to generate virtually infinite training data with adjustable complexity. It currently provides more than 100 tasks over many domains, including but not limited to algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and many common games."

Tested all 100+ environments for errors, and tested training on many, demonstrated convergence.

This dataset of 100+ environments is also used in ProRL (https://arxiv.org/abs/2505.24864)

---------

Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Co-authored-by: ARC Bot <arc-bot@example.com>

commit 8c4c5e3
Author: bxyu-nvidia <bxyu@nvidia.com>
Date: Sun Dec 14 16:38:21 2025 -0800

Bump to v0.2.0 (NVIDIA-NeMo#510)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit 3897ff4
Author: bxyu-nvidia <bxyu@nvidia.com>
Date: Sun Dec 14 16:28:58 2025 -0800

Change to v0.1.1 release version (NVIDIA-NeMo#509)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit b1bf0f4
Author: bxyu-nvidia <bxyu@nvidia.com>
Date: Sun Dec 14 16:24:49 2025 -0800

Update dataset configs with HuggingFace links (NVIDIA-NeMo#508)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit 9a9177e
Author: bxyu-nvidia <bxyu@nvidia.com>
Date: Sun Dec 14 16:12:06 2025 -0800

docs: End-to-end GRPO Training with NeMo RL tutorial [master branch] (NVIDIA-NeMo#481)

Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Co-authored-by: L.B. <llane@nvidia.com>
Co-authored-by: Frankie Siino <fsiino@nvidia.com>

commit d3646c5
Author: Chris Wing <cwing@nvidia.com>
Date: Fri Dec 12 12:20:25 2025 -0800

Reorder README structure (NVIDIA-NeMo#501)

move available environments higher up in the README after the quickstart

Signed-off-by: Chris Wing <cwing@nvidia.com>

commit b9cf8b2
Author: Chris Wing <cwing@nvidia.com>
Date: Fri Dec 12 08:13:32 2025 -0800

Simplify contributing.md (NVIDIA-NeMo#500)

added links to contribute section of docs site and removed redundant content.

links need to be verified after NVIDIA-NeMo#498 is merged to main

---------

Signed-off-by: Chris Wing <cwing@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-authored-by: Lawrence Lane <llane@nvidia.com>

commit eabcbcf
Author: Chris Wing <cwing@nvidia.com>
Date: Fri Dec 12 07:43:04 2025 -0800

FAQ cleanup (NVIDIA-NeMo#499)

This PR removes redundant content from the FAQ and better organizes the documentation structure.

**Removed redundant FAQ sections** now covered in dedicated documentation:
- `ng_version` → `docs/reference/cli-commands.md`
- Config anatomy → `docs/reference/configuration.md` (section was incomplete TODO)
- DCO and commit signing → `CONTRIBUTING.md` and `docs/contribute/development-setup.md`
- Copyright errors → `docs/contribute/development-setup.md`
- CI/CD requirements → `docs/contribute/development-setup.md`

**Reorganized FAQ placement:**
- Moved `docs/how-to-faq.md` → `docs/reference/faq.md` (consistent with other reference docs)
- Repositioned FAQ to bottom of Reference section (after Configuration, CLI Commands, API Reference)
- Updated intro to clarify FAQ provides quick answers while comprehensive docs are developed

---------

Signed-off-by: Chris Wing <cwing@nvidia.com>
Co-authored-by: Lawrence Lane <llane@nvidia.com>

commit fc59615
Author: Chris Wing <cwing@nvidia.com>
Date: Fri Dec 12 07:38:48 2025 -0800

Add environment contribution docs (NVIDIA-NeMo#498)

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Chris Wing <cwing@nvidia.com>
Co-authored-by: Lawrence Lane <llane@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

commit 39ee39e
Author: Chris Wing <cwing@nvidia.com>
Date: Thu Dec 11 15:52:13 2025 -0800

Docs: Contribution Home & Dev Setup (NVIDIA-NeMo#494)

Added types of contribution to contribution overview and replicated dev setup instructions from contributing.md to docs

---------

Signed-off-by: Chris Wing <cwing@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-authored-by: Lawrence Lane <llane@nvidia.com>

commit aa48c20
Author: Chris Wing <cwing@nvidia.com>
Date: Thu Dec 11 14:16:47 2025 -0800

improve framing of training framework integration guide for contributing (NVIDIA-NeMo#493)

Make it more clear this guide is for contributing training framework integrations

Signed-off-by: Chris Wing <cwing@nvidia.com>

commit a4cfd5e
Author: pjin-nvidia <pjin@nvidia.com>
Date: Thu Dec 11 13:31:09 2025 -0800

Misc rollout fixes (NVIDIA-NeMo#447)

Signed-off-by: Peter Jin <pjin@nvidia.com>

commit def5fdd
Author: L.B. <llane@nvidia.com>
Date: Thu Dec 11 15:00:38 2025 -0500

docs: contribute section (NVIDIA-NeMo#490)

- move training content into new contribute section
- create contributing overview page
- add contributing section on home page with link to RL integrations content hub

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>

commit 8f4d638
Author: L.B. <llane@nvidia.com>
Date: Thu Dec 11 14:17:03 2025 -0500

docs: move FAQ (NVIDIA-NeMo#489)

moves how-to-faq to render under "references" and display as FAQ. no material changes to the content.

Signed-off-by: Lawrence Lane <llane@nvidia.com>

commit 54b21db
Author: bxyu-nvidia <bxyu@nvidia.com>
Date: Thu Dec 11 10:27:28 2025 -0800

Fix NeMo Gym Pyproject links (NVIDIA-NeMo#486)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit 82f0f0c
Author: fsiino-nvidia <fsiino@nvidia.com>
Date: Thu Dec 11 10:18:58 2025 -0800

More single tool call filename updates cont (NVIDIA-NeMo#484)

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

commit 8654ecf
Author: L.B. <llane@nvidia.com>
Date: Wed Dec 10 22:08:20 2025 -0500

docs: home pg, quickstart move, gh icon (NVIDIA-NeMo#463)

- adds GH icon + link to global top nav
- rebuilds the home page to standard layout
- adds CTA to quickstart and tutorials
- moves quickstart into get started
- clarifies differences between the quickstart and more detailed onboarding materials

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Chris Wing <cwing@nvidia.com>
Co-authored-by: Chris Wing <cwing@nvidia.com>

commit c345e5d
Author: bxyu-nvidia <bxyu@nvidia.com>
Date: Wed Dec 10 19:05:20 2025 -0800

Fix duplicate reference sections (NVIDIA-NeMo#483)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit be25806
Author: fsiino-nvidia <fsiino@nvidia.com>
Date: Wed Dec 10 17:24:13 2025 -0800

docs: Fix wrong count vs actual (NVIDIA-NeMo#482)

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

commit a3417ce
Author: fsiino-nvidia <fsiino@nvidia.com>
Date: Wed Dec 10 16:58:55 2025 -0800

More single tool call filename updates (NVIDIA-NeMo#480)

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

commit 25808bf
Author: fsiino-nvidia <fsiino@nvidia.com>
Date: Wed Dec 10 16:36:05 2025 -0800

Rename examples simple_weather and stateful_counter (NVIDIA-NeMo#479)

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

commit bf0b0c5
Author: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Date: Wed Dec 10 15:44:25 2025 -0800

Expose server host and port in dataset viewer CLI (NVIDIA-NeMo#476)

Closes https://github.com/NVIDIA-NeMo/Internal-Planning/issues/126

@bxyu-nvidia Per the issue, the PR also changes the default `server_host` to `0.0.0.0` (accessible from everywhere). But I would advise against this for security reasons. I think keeping the default to `127.0.0.1` is the right call even if the user needs to modify the command to access the server.

---------

Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>

commit 993543a
Author: pjin-nvidia <pjin@nvidia.com>
Date: Wed Dec 10 14:38:22 2025 -0800

Miscellaneous infra improvements/fixes (NVIDIA-NeMo#317)

should resolve NVIDIA-NeMo#342

Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Peter Jin <pjin@nvidia.com>

commit 845bf71
Author: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Date: Wed Dec 10 14:15:07 2025 -0800

pyproject typos and grammar fixes (NVIDIA-NeMo#473)

Closes https://github.com/NVIDIA-NeMo/Internal-Planning/issues/132

Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>

commit 81a0013
Author: bxyu-nvidia <bxyu@nvidia.com>
Date: Wed Dec 10 14:11:08 2025 -0800

docs: Improve server reference info (NVIDIA-NeMo#474)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit 1d78f22
Author: bxyu-nvidia <bxyu@nvidia.com>
Date: Wed Dec 10 13:50:27 2025 -0800

Bug: inconsistent documentation around servers running (NVIDIA-NeMo#472)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit 9f26473
Author: bxyu-nvidia <bxyu@nvidia.com>
Date: Wed Dec 10 13:25:42 2025 -0800

docs: Training framework integration (NVIDIA-NeMo#439)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit f67fa48
Author: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Date: Wed Dec 10 13:19:24 2025 -0800

Remove penguin references (NVIDIA-NeMo#469)

After this PR, the only remaining penguin references are in the NeMo-RL tutorial, but these should be fixed with tutorial rewrite.

Closes https://github.com/NVIDIA-NeMo/Internal-Planning/issues/131

Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>

commit eecb93c
Author: L.B. <llane@nvidia.com>
Date: Wed Dec 10 16:13:44 2025 -0500

docs(readme): fix Example Resource Servers table - correct Multi Step… (NVIDIA-NeMo#464)

Update 'Demonstrates' column for Multi Step example:
- Before: Instruction_Following example
- After: Multi-step tool calling

Fixes NVIDIA-NeMo#417

---------

Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>

commit 0e367c2
Author: Sanjay Kariyappa <sanjaykariyappa@users.noreply.github.com>
Date: Thu Dec 11 02:38:51 2025 +0530

add calendar env for multi-turn IF (NVIDIA-NeMo#297)

This PR introduces the **Calendar Resource Server**, a new training environment that challenges models to schedule multiple events on a calendar while satisfying complex temporal constraints. The constraints are mentioned in a multi-turn conversation format (generated synthetically using a role-playing model). Achieving high performance on this benchmark requires the model to satisfy constraints mentioned in different user turns. When trained on this synthetic dataset, we observe an improvement in the model's multi-turn instruction following ability.

The Calendar environment simulates a realistic scheduling task where an AI agent must:
- Schedule multiple events within a working day time window
- Satisfy various temporal constraints:
  - **"before"**: Event must end before a specific time
  - **"after"**: Event must start after a specific time
  - **"between"**: Event must start and end within a time window
  - **"at"**: Event must start at an exact time
- Ensure no time conflicts between events
- Match exact event durations
- Stay within global min/max time boundaries

This environment tests an agent's ability to:
- Parse and understand natural language constraints.
- Follow instructions that are mentioned in multiple user messages.
- Infer scheduling conflicts and satisfy multiple constraints simultaneously.
- Perform temporal reasoning and arithmetic.

- **4 constraint types**: before, after, between, at
- **Time window enforcement**: Global min/max boundaries for all events
- **Conflict detection**: Automatic validation of event overlaps
- **Duration matching**: Exact duration requirements per event

The server includes a robust verification pipeline that:
- Extracts JSON schedules from model responses
- Validates all temporal constraints
- Detects overlapping events
- Returns binary rewards (1 for valid, 0 for invalid)
- Filters out responses with thinking tags (`<think>`)
- Script to generate diverse scheduling scenarios
- Configurable number of events and constraint types
- Natural language constraint descriptions
- Validation data included
- Tests for each constraint type (valid and violation cases)
- Edge cases: empty schedules, wrong event counts, time conflicts
- Complex multi-event scenarios

Qwen3-8b shows steady improvement in rewards when trained with GRPO with a dataset of 4K synthetic samples. Wandb logs are below.

https://wandb.ai/nvidia/skariyappa-nemo-gym-rl-integration/runs/t4v06nbg
https://wandb.ai/nvidia/skariyappa-nemo-gym-rl-integration/runs/70yc23ew
https://wandb.ai/nvidia/skariyappa-nemo-gym-rl-integration/runs/1jnwuhi3

---------

Signed-off-by: Sanjay Kariyappa <skariyappa@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>

commit a182171
Author: bxyu-nvidia <bxyu@nvidia.com>
Date: Wed Dec 10 12:58:07 2025 -0800

Explain where the name Gym comes from; Gym Key Terminology doc is missing some of the old material (NVIDIA-NeMo#470)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

commit d8ecb8b
Author: Chris Wing <cwing@nvidia.com>
Date: Wed Dec 10 10:56:29 2025 -0800

Add benefits to About page aligned with README (NVIDIA-NeMo#452)

Fixes NVIDIA-NeMo#451

Signed-off-by: Chris Wing <cwing@nvidia.com>

commit e08906c
Author: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Date: Wed Dec 10 10:35:33 2025 -0800

docs: Moved configuration system under about (NVIDIA-NeMo#420)

Moved configuration systems under "About" instead of "About>Concepts". Also removed configuration mentions and examples from core abstraction pages

Closes NVIDIA-NeMo#392 and NVIDIA-NeMo#393

---------

Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Signed-off-by: L.B <llane@nvidia.com>
Signed-off-by: Chris Wing <cwing@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: L.B <llane@nvidia.com>
Co-authored-by: Chris Wing <cwing@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>

commit 7aa8306
Author: Chris Wing <cwing@nvidia.com>
Date: Wed Dec 10 05:59:38 2025 -0800

Add Data Designer and links to ecosystem page (NVIDIA-NeMo#462)

Fixes NVIDIA-NeMo#450

Signed-off-by: Chris Wing <cwing@nvidia.com>

commit 287d08d
Author: Chris Wing <cwing@nvidia.com>
Date: Tue Dec 9 12:45:35 2025 -0800

Change NeMo Gym from framework to library (NVIDIA-NeMo#456)

Changed description of NeMo Gym from a framework to library for consistency across NeMo products

Signed-off-by: Chris Wing <cwing@nvidia.com>

…VIDIA-NeMo#481) Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com> Co-authored-by: L.B. <llane@nvidia.com> Co-authored-by: Frankie Siino <fsiino@nvidia.com>

@bxyu-nvidia

commit 647d1e5 Author: fsiino-nvidia <fsiino@nvidia.com> Date: Fri Dec 19 18:40:39 2025 -0800 Remove PlainTextResponse response_class (NVIDIA-NeMo#544) https://nvidia.slack.com/archives/C08TG7CLEGY/p1766191655660079 Initially in NVIDIA-NeMo#290 , the `response_class=PlainTextResponse` was added to the `/global_config_dict_yaml` endpoint of the HeadServer as an attempt to debug parsing server info for the `ng_status` command. This lead to a parsing error in `load_from_global_config`. This command now uses it's own separate endpoint `server_instances`, so this needs to be removed. Signed-off-by: Frankie Siino <fsiino@nvidia.com> commit f250e0c Author: cmunley1 <cmunley@nvidia.com> Date: Fri Dec 19 16:38:29 2025 -0800 docs: remove trl docs (NVIDIA-NeMo#543) remove trl from docs, leaving just unsloth. was unclear that they are together. will make a trl section when we have a standalone trl notebook, or a section on trl's docs too. --------- Signed-off-by: Christian Munley <cmunley@nvidia.com> commit 34a2b0f Author: cmunley1 <cmunley@nvidia.com> Date: Fri Dec 19 14:01:56 2025 -0800 add unsloth and trl to docs (NVIDIA-NeMo#536) adds a section for single-step training with unsloth and trl not sure if these should be broken into separate sections. Left as one since the same notebook works for both, but could be confusing. not sure if we should also add more info about multi-step (hopefully) coming soon. 
Signed-off-by: Christian Munley <cmunley@nvidia.com> commit 146b1a5 Author: cmunley1 <cmunley@nvidia.com> Date: Fri Dec 19 12:56:33 2025 -0800 python flag for colab venv installation (NVIDIA-NeMo#526) need to set uv pip install python flag in colab environments when launching servers usage: `ng_run "+config_paths=[...]" +uv_pip_set_python=true ` defaults to false For NVIDIA-NeMo#370 Needed for notebook here: https://docs.unsloth.ai/models/nemotron-3#reinforcement-learning--nemo-gym --------- Signed-off-by: Christian Munley <cmunley@nvidia.com> commit ba2153a Author: cmunley1 <cmunley@nvidia.com> Date: Fri Dec 19 10:42:44 2025 -0800 Salesforce xlam-function-calling-60k resources server (NVIDIA-NeMo#262) function calling resources server based on https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k --------- Signed-off-by: Christian Munley <cmunley@nvidia.com> Signed-off-by: cmunley1 <cmunley@nvidia.com> commit 29d3511 Author: pjin-nvidia <pjin@nvidia.com> Date: Fri Dec 19 10:28:28 2025 -0800 VLLMModel supports chat template kwargs (NVIDIA-NeMo#538) Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Peter Jin <pjin@nvidia.com> commit 7d8fdda Author: fsiino-nvidia <fsiino@nvidia.com> Date: Wed Dec 17 18:38:18 2025 -0800 List running server health and status (NVIDIA-NeMo#290) This implements the `ng_status` command to list all running servers on the system and ping for health check. --------- Signed-off-by: Frankie Siino <fsiino@nvidia.com> commit 076d002 Author: fsiino-nvidia <fsiino@nvidia.com> Date: Tue Dec 16 10:25:14 2025 -0800 Debug server package versions (NVIDIA-NeMo#406) Adds `ng_pip_list` command to see the underlying uv pip list of the specified environment. 
--------- Signed-off-by: Frankie Siino <fsiino@nvidia.com> commit c192ee4 Author: Lawrence Lane <llane@nvidia.com> Date: Tue Dec 16 12:19:31 2025 -0500 docs settings update (NVIDIA-NeMo#525) Signed-off-by: Lawrence Lane <llane@nvidia.com> commit 8ca39d6 Author: bxyu-nvidia <bxyu@nvidia.com> Date: Mon Dec 15 19:56:03 2025 -0800 docs: Miscellaneous GRPO tutorial fixes (NVIDIA-NeMo#512) Signed-off-by: Brian Yu <bxyu@nvidia.com> commit 1539b2b Author: Lawrence Lane <llane@nvidia.com> Date: Mon Dec 15 18:28:11 2025 -0500 docs: redirect setup (NVIDIA-NeMo#513) Signed-off-by: Lawrence Lane <llane@nvidia.com> Signed-off-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: Brian Yu <bxyu@nvidia.com> commit 96ccdfc Author: cmunley1 <cmunley@nvidia.com> Date: Mon Dec 15 14:31:59 2025 -0800 reasoning-gym resource server (NVIDIA-NeMo#113) single turn tasks across various domains: "Reasoning Gym is a community-created Python library of procedural dataset generators and algorithmically verifiable reasoning environments for training reasoning models with reinforcement learning (RL). The goal is to generate virtually infinite training data with adjustable complexity. It currently provides more than 100 tasks over many domains, including but not limited to algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and many common games." Tested all 100+ environments for errors, and tested training on many, demonstrated convergence. 
This dataset of 100+ environments is also used in ProRL (https://arxiv.org/abs/2505.24864) --------- Signed-off-by: cmunley1 <cmunley@nvidia.com> Signed-off-by: Christian Munley <cmunley@nvidia.com> Co-authored-by: ARC Bot <arc-bot@example.com> commit 8c4c5e3 Author: bxyu-nvidia <bxyu@nvidia.com> Date: Sun Dec 14 16:38:21 2025 -0800 Bump to v0.2.0 (NVIDIA-NeMo#510) Signed-off-by: Brian Yu <bxyu@nvidia.com> commit 3897ff4 Author: bxyu-nvidia <bxyu@nvidia.com> Date: Sun Dec 14 16:28:58 2025 -0800 Change to v0.1.1 release version (NVIDIA-NeMo#509) Signed-off-by: Brian Yu <bxyu@nvidia.com> commit b1bf0f4 Author: bxyu-nvidia <bxyu@nvidia.com> Date: Sun Dec 14 16:24:49 2025 -0800 Update dataset configs with HuggingFace links (NVIDIA-NeMo#508) Signed-off-by: Brian Yu <bxyu@nvidia.com> commit 9a9177e Author: bxyu-nvidia <bxyu@nvidia.com> Date: Sun Dec 14 16:12:06 2025 -0800 docs: End-to-end GRPO Training with NeMo RL tutorial [master branch] (NVIDIA-NeMo#481) Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com> Co-authored-by: L.B. <llane@nvidia.com> Co-authored-by: Frankie Siino <fsiino@nvidia.com> commit d3646c5 Author: Chris Wing <cwing@nvidia.com> Date: Fri Dec 12 12:20:25 2025 -0800 Reorder README structure (NVIDIA-NeMo#501) move available environments higher up in the README after the quickstart Signed-off-by: Chris Wing <cwing@nvidia.com> commit b9cf8b2 Author: Chris Wing <cwing@nvidia.com> Date: Fri Dec 12 08:13:32 2025 -0800 Simplify contributing.md (NVIDIA-NeMo#500) added links to contribute section of docs site and removed redundant content. 
links need to be verified after NVIDIA-NeMo#498 is merged to main --------- Signed-off-by: Chris Wing <cwing@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com> commit eabcbcf Author: Chris Wing <cwing@nvidia.com> Date: Fri Dec 12 07:43:04 2025 -0800 FAQ cleanup (NVIDIA-NeMo#499) This PR removes redundant content from the FAQ and better organizes the documentation structure. **Removed redundant FAQ sections** now covered in dedicated documentation: - `ng_version` → `docs/reference/cli-commands.md` - Config anatomy → `docs/reference/configuration.md` (section was incomplete TODO) - DCO and commit signing → `CONTRIBUTING.md` and `docs/contribute/development-setup.md` - Copyright errors → `docs/contribute/development-setup.md` - CI/CD requirements → `docs/contribute/development-setup.md` **Reorganized FAQ placement:** - Moved `docs/how-to-faq.md` → `docs/reference/faq.md` (consistent with other reference docs) - Repositioned FAQ to bottom of Reference section (after Configuration, CLI Commands, API Reference) - Updated intro to clarify FAQ provides quick answers while comprehensive docs are developed --------- Signed-off-by: Chris Wing <cwing@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com> commit fc59615 Author: Chris Wing <cwing@nvidia.com> Date: Fri Dec 12 07:38:48 2025 -0800 Add environment contribution docs (NVIDIA-NeMo#498) Signed-off-by: Lawrence Lane <llane@nvidia.com> Signed-off-by: Chris Wing <cwing@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> commit 39ee39e Author: Chris Wing <cwing@nvidia.com> Date: Thu Dec 11 15:52:13 2025 -0800 Docs: Contribution Home & Dev Setup (NVIDIA-NeMo#494) Added types of contribution to contribution overview and replicated dev setup instructions from contributing.md to docs --------- Signed-off-by: Chris Wing <cwing@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> 
Co-authored-by: Lawrence Lane <llane@nvidia.com> commit aa48c20 Author: Chris Wing <cwing@nvidia.com> Date: Thu Dec 11 14:16:47 2025 -0800 improve framing of training framework integration guide for contributing (NVIDIA-NeMo#493) Make it more clear this guide is for contributing training framework integrations Signed-off-by: Chris Wing <cwing@nvidia.com> commit a4cfd5e Author: pjin-nvidia <pjin@nvidia.com> Date: Thu Dec 11 13:31:09 2025 -0800 Misc rollout fixes (NVIDIA-NeMo#447) Signed-off-by: Peter Jin <pjin@nvidia.com> commit def5fdd Author: L.B. <llane@nvidia.com> Date: Thu Dec 11 15:00:38 2025 -0500 docs: contribute section (NVIDIA-NeMo#490) - move training content into new contribute section - create contributing overview page - add contributing section on home page with link to RL integrations content hub --------- Signed-off-by: Lawrence Lane <llane@nvidia.com> commit 8f4d638 Author: L.B. <llane@nvidia.com> Date: Thu Dec 11 14:17:03 2025 -0500 docs: move FAQ (NVIDIA-NeMo#489) moves how-to-faq to render under "references" and display as FAQ. no material changes to the content. Signed-off-by: Lawrence Lane <llane@nvidia.com> commit 54b21db Author: bxyu-nvidia <bxyu@nvidia.com> Date: Thu Dec 11 10:27:28 2025 -0800 Fix NeMo Gym Pyproject links (NVIDIA-NeMo#486) Signed-off-by: Brian Yu <bxyu@nvidia.com> commit 82f0f0c Author: fsiino-nvidia <fsiino@nvidia.com> Date: Thu Dec 11 10:18:58 2025 -0800 More single tool call filename updates cont (NVIDIA-NeMo#484) Signed-off-by: Frankie Siino <fsiino@nvidia.com> commit 8654ecf Author: L.B. 
<llane@nvidia.com> Date: Wed Dec 10 22:08:20 2025 -0500 docs: home pg, quickstart move, gh icon (NVIDIA-NeMo#463) - adds GH icon + link to global top nav - rebuilds the home page to standard layout - adds CTA to quickstart and tutorials - moves quickstart into get started - clarifies differences between the quickstart and more detailed onboarding materials --------- Signed-off-by: Lawrence Lane <llane@nvidia.com> Signed-off-by: Chris Wing <cwing@nvidia.com> Co-authored-by: Chris Wing <cwing@nvidia.com> commit c345e5d Author: bxyu-nvidia <bxyu@nvidia.com> Date: Wed Dec 10 19:05:20 2025 -0800 Fix duplicate reference sections (NVIDIA-NeMo#483) Signed-off-by: Brian Yu <bxyu@nvidia.com> commit be25806 Author: fsiino-nvidia <fsiino@nvidia.com> Date: Wed Dec 10 17:24:13 2025 -0800 docs: Fix wrong count vs actual (NVIDIA-NeMo#482) Signed-off-by: Frankie Siino <fsiino@nvidia.com> commit a3417ce Author: fsiino-nvidia <fsiino@nvidia.com> Date: Wed Dec 10 16:58:55 2025 -0800 More single tool call filename updates (NVIDIA-NeMo#480) Signed-off-by: Frankie Siino <fsiino@nvidia.com> commit 25808bf Author: fsiino-nvidia <fsiino@nvidia.com> Date: Wed Dec 10 16:36:05 2025 -0800 Rename examples simple_weather and stateful_counter (NVIDIA-NeMo#479) Signed-off-by: Frankie Siino <fsiino@nvidia.com> commit bf0b0c5 Author: Ahmad Kiswani <kiswani.ahmad@gmail.com> Date: Wed Dec 10 15:44:25 2025 -0800 Expose server host and port in dataset viewer CLI (NVIDIA-NeMo#476) Closes https://github.com/NVIDIA-NeMo/Internal-Planning/issues/126 @bxyu-nvidia Per the issue, the PR also changes the default `server_host` to `0.0.0.0` (accessible from everywhere). But I would advise against this for security reasons. I think keeping the default to `127.0.0.1` is the right call even if the user needs to modify the command to access the server. 
--------- Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com> commit 993543a Author: pjin-nvidia <pjin@nvidia.com> Date: Wed Dec 10 14:38:22 2025 -0800 Miscellaneous infra improvements/fixes (NVIDIA-NeMo#317) should resolve NVIDIA-NeMo#342 Signed-off-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: Peter Jin <pjin@nvidia.com> commit 845bf71 Author: Ahmad Kiswani <kiswani.ahmad@gmail.com> Date: Wed Dec 10 14:15:07 2025 -0800 pyproject typos and grammar fixes (NVIDIA-NeMo#473) Closes https://github.com/NVIDIA-NeMo/Internal-Planning/issues/132 Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com> commit 81a0013 Author: bxyu-nvidia <bxyu@nvidia.com> Date: Wed Dec 10 14:11:08 2025 -0800 docs: Improve server reference info (NVIDIA-NeMo#474) Signed-off-by: Brian Yu <bxyu@nvidia.com> commit 1d78f22 Author: bxyu-nvidia <bxyu@nvidia.com> Date: Wed Dec 10 13:50:27 2025 -0800 Bug: inconsistent documentation around servers running (NVIDIA-NeMo#472) Signed-off-by: Brian Yu <bxyu@nvidia.com> commit 9f26473 Author: bxyu-nvidia <bxyu@nvidia.com> Date: Wed Dec 10 13:25:42 2025 -0800 docs: Training framework integration (NVIDIA-NeMo#439) Signed-off-by: Brian Yu <bxyu@nvidia.com> commit f67fa48 Author: Ahmad Kiswani <kiswani.ahmad@gmail.com> Date: Wed Dec 10 13:19:24 2025 -0800 Remove penguin references (NVIDIA-NeMo#469) After this PR, the only remaining penguin references are in the NeMo-RL tutorial, but these should be fixed with tutorial rewrite. Closes https://github.com/NVIDIA-NeMo/Internal-Planning/issues/131 Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com> commit eecb93c Author: L.B. 
<llane@nvidia.com> Date: Wed Dec 10 16:13:44 2025 -0500 docs(readme): fix Example Resource Servers table - correct Multi Step… (NVIDIA-NeMo#464) Update 'Demonstrates' column for Multi Step example: - Before: Instruction_Following example - After: Multi-step tool calling Fixes NVIDIA-NeMo#417 --------- Signed-off-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: Brian Yu <bxyu@nvidia.com> commit 0e367c2 Author: Sanjay Kariyappa <sanjaykariyappa@users.noreply.github.com> Date: Thu Dec 11 02:38:51 2025 +0530 add calendar env for multi-turn IF (NVIDIA-NeMo#297) This PR introduces the **Calendar Resource Server**, a new training environment that challenges models to schedule multiple events on a calendar while satisfying complex temporal constraints. The constraints are mentioned in a multi-turn conversation format (generated synthetically using a role-playing model). Achieving high performance on this benchmark requires the model to satisfy constraints mentioned in different user turns. When trained on this synthetic dataset, we observe an improvement in the model's multi-turn instruction following ability. The Calendar environment simulates a realistic scheduling task where an AI agent must: - Schedule multiple events within a working day time window - Satisfy various temporal constraints: - **"before"**: Event must end before a specific time - **"after"**: Event must start after a specific time - **"between"**: Event must start and end within a time window - **"at"**: Event must start at an exact time - Ensure no time conflicts between events - Match exact event durations - Stay within global min/max time boundaries This environment tests an agent's ability to: - Parse and understand natural language constraints. - Follow instructions that are mentioned in multiple user messages. - Infer scheduling conflicts and satisfy multiple constraints simultaneously. - Perform temporal reasoning and arithmetic. 
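For reviewers who want a concrete picture of the constraint checks listed above, here is a minimal sketch in Python. It is not the Calendar Resource Server's code: the `Event` type, the minutes-since-midnight encoding, the tuple form of a constraint, and the choice to treat "before" and "after" as inclusive bounds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; times are minutes since midnight."""
    name: str
    start: int
    end: int


# Hypothetical constraint encoding: (kind, t1, t2); t2 is only used by "between".
Constraint = Tuple[str, int, Optional[int]]


def satisfies(event: Event, constraint: Constraint) -> bool:
    """Check one of the four constraint kinds named in the PR description."""
    kind, t1, t2 = constraint
    if kind == "before":   # event must end by t1 (inclusive bound is an assumption)
        return event.end <= t1
    if kind == "after":    # event must start at or after t1 (inclusive is an assumption)
        return event.start >= t1
    if kind == "between":  # event must start and end inside [t1, t2]
        return t2 is not None and event.start >= t1 and event.end <= t2
    if kind == "at":       # event must start exactly at t1
        return event.start == t1
    raise ValueError(f"unknown constraint kind: {kind}")


def no_overlap(events: List[Event]) -> bool:
    """True when no two events share any time; back-to-back events are allowed."""
    ordered = sorted(events, key=lambda e: e.start)
    return all(prev.end <= cur.start for prev, cur in zip(ordered, ordered[1:]))


def binary_reward(
    events: List[Event],
    constraints: Dict[str, Constraint],
    durations: Dict[str, int],
    day_start: int,
    day_end: int,
) -> float:
    """Return 1.0 only if every check passes, otherwise 0.0."""
    # Wrong event count or unexpected event names fail immediately.
    if len(events) != len(durations) or {e.name for e in events} != set(durations):
        return 0.0
    for e in events:
        if e.end - e.start != durations[e.name]:      # exact duration match
            return 0.0
        if e.start < day_start or e.end > day_end:    # global min/max boundaries
            return 0.0
        if e.name in constraints and not satisfies(e, constraints[e.name]):
            return 0.0
    return 1.0 if no_overlap(events) else 0.0
```

A schedule that breaks any single rule (duration, window, per-event constraint, or overlap) gets a reward of 0.0 in this sketch, mirroring the binary reward described in the PR.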
- **4 constraint types**: before, after, between, at - **Time window enforcement**: Global min/max boundaries for all events - **Conflict detection**: Automatic validation of event overlaps - **Duration matching**: Exact duration requirements per event The server includes a robust verification pipeline that: - Extracts JSON schedules from model responses - Validates all temporal constraints - Detects overlapping events - Returns binary rewards (1 for valid, 0 for invalid) - Filters out responses with thinking tags (`<think>`) - Script to generate diverse scheduling scenarios - Configurable number of events and constraint types - Natural language constraint descriptions - Validation data included - Tests for each constraint type (valid and violation cases) - Edge cases: empty schedules, wrong event counts, time conflicts - Complex multi-event scenarios Qwen3-8b shows steady improvement in rewards when trained with GRPO with a dataset of 4K synthetic samples. Wandb logs are below. https://wandb.ai/nvidia/skariyappa-nemo-gym-rl-integration/runs/t4v06nbg https://wandb.ai/nvidia/skariyappa-nemo-gym-rl-integration/runs/70yc23ew https://wandb.ai/nvidia/skariyappa-nemo-gym-rl-integration/runs/1jnwuhi3 --------- Signed-off-by: Sanjay Kariyappa <skariyappa@nvidia.com> Signed-off-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: Brian Yu <bxyu@nvidia.com> commit a182171 Author: bxyu-nvidia <bxyu@nvidia.com> Date: Wed Dec 10 12:58:07 2025 -0800 Explain where the name Gym comes from; Gym Key Terminology doc is missing some of the old material (NVIDIA-NeMo#470) Signed-off-by: Brian Yu <bxyu@nvidia.com> commit d8ecb8b Author: Chris Wing <cwing@nvidia.com> Date: Wed Dec 10 10:56:29 2025 -0800 Add benefits to About page aligned with README (NVIDIA-NeMo#452) Fixes NVIDIA-NeMo#451 Signed-off-by: Chris Wing <cwing@nvidia.com> commit e08906c Author: Ahmad Kiswani <kiswani.ahmad@gmail.com> Date: Wed Dec 10 10:35:33 2025 -0800 docs: Moved configuration system under about (NVIDIA-NeMo#420) 
Moved configuration systems under "About" instead of "About>Concepts". Also removed configuration mentions and examples from core abstraction pages Closes NVIDIA-NeMo#392 and NVIDIA-NeMo#393 --------- Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com> Signed-off-by: L.B <llane@nvidia.com> Signed-off-by: Chris Wing <cwing@nvidia.com> Signed-off-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: L.B <llane@nvidia.com> Co-authored-by: Chris Wing <cwing@nvidia.com> Co-authored-by: Brian Yu <bxyu@nvidia.com> commit 7aa8306 Author: Chris Wing <cwing@nvidia.com> Date: Wed Dec 10 05:59:38 2025 -0800 Add Data Designer and links to ecosystem page (NVIDIA-NeMo#462) Fixes NVIDIA-NeMo#450 Signed-off-by: Chris Wing <cwing@nvidia.com> commit 287d08d Author: Chris Wing <cwing@nvidia.com> Date: Tue Dec 9 12:45:35 2025 -0800 Change NeMo Gym from framework to library (NVIDIA-NeMo#456) Changed description of NeMo Gym from a framework to library for consistency across NeMo products Signed-off-by: Chris Wing <cwing@nvidia.com>

@pjin-nvidia

Add Data Designer and links to ecosystem page (NVIDIA-NeMo#462) Fixes NVIDIA-NeMo#450 Signed-off-by: Chris Wing <cwing@nvidia.com> docs: Moved configuration system under about (NVIDIA-NeMo#420) Moved configuration systems under "About" instead of "About>Concepts". Also removed configuration mentions and examples from core abstraction pages Closes NVIDIA-NeMo#392 and NVIDIA-NeMo#393 --------- Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com> Signed-off-by: L.B <llane@nvidia.com> Signed-off-by: Chris Wing <cwing@nvidia.com> Signed-off-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: L.B <llane@nvidia.com> Co-authored-by: Chris Wing <cwing@nvidia.com> Co-authored-by: Brian Yu <bxyu@nvidia.com> docs: Training framework integration (NVIDIA-NeMo#439) Signed-off-by: Brian Yu <bxyu@nvidia.com> docs: Improve server reference info (NVIDIA-NeMo#474) Signed-off-by: Brian Yu <bxyu@nvidia.com> pyproject typos and grammar fixes (NVIDIA-NeMo#473) Closes https://github.com/NVIDIA-NeMo/Internal-Planning/issues/132 Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com> docs: Fix wrong count vs actual (NVIDIA-NeMo#482) Signed-off-by: Frankie Siino <fsiino@nvidia.com> docs: home pg, quickstart move, gh icon (NVIDIA-NeMo#463) - adds GH icon + link to global top nav - rebuilds the home page to standard layout - adds CTA to quickstart and tutorials - moves quickstart into get started - clarifies differences between the quickstart and more detailed onboarding materials --------- Signed-off-by: Lawrence Lane <llane@nvidia.com> Signed-off-by: Chris Wing <cwing@nvidia.com> Co-authored-by: Chris Wing <cwing@nvidia.com> More single tool call filename updates cont (NVIDIA-NeMo#484) Signed-off-by: Frankie Siino <fsiino@nvidia.com> Fix NeMo Gym Pyproject links (NVIDIA-NeMo#486) Signed-off-by: Brian Yu <bxyu@nvidia.com> docs: move FAQ (NVIDIA-NeMo#489) moves how-to-faq to render under "references" and display as FAQ. no material changes to the content. 
Signed-off-by: Lawrence Lane <llane@nvidia.com> docs: contribute section (NVIDIA-NeMo#490) - move training content into new contribute section - create contributing overview page - add contributing section on home page with link to RL integrations content hub --------- Signed-off-by: Lawrence Lane <llane@nvidia.com> Misc rollout fixes (NVIDIA-NeMo#447) Signed-off-by: Peter Jin <pjin@nvidia.com> Docs: Contribution Home & Dev Setup (NVIDIA-NeMo#494) Added types of contribution to contribution overview and replicated dev setup instructions from contributing.md to docs --------- Signed-off-by: Chris Wing <cwing@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com> Add environment contribution docs (NVIDIA-NeMo#498) Signed-off-by: Lawrence Lane <llane@nvidia.com> Signed-off-by: Chris Wing <cwing@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> FAQ cleanup (NVIDIA-NeMo#499) This PR removes redundant content from the FAQ and better organizes the documentation structure. 
**Removed redundant FAQ sections** now covered in dedicated documentation: - `ng_version` → `docs/reference/cli-commands.md` - Config anatomy → `docs/reference/configuration.md` (section was incomplete TODO) - DCO and commit signing → `CONTRIBUTING.md` and `docs/contribute/development-setup.md` - Copyright errors → `docs/contribute/development-setup.md` - CI/CD requirements → `docs/contribute/development-setup.md` **Reorganized FAQ placement:** - Moved `docs/how-to-faq.md` → `docs/reference/faq.md` (consistent with other reference docs) - Repositioned FAQ to bottom of Reference section (after Configuration, CLI Commands, API Reference) - Updated intro to clarify FAQ provides quick answers while comprehensive docs are developed --------- Signed-off-by: Chris Wing <cwing@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com> Simplify contributing.md (NVIDIA-NeMo#500) added links to contribute section of docs site and removed redundant content. links need to be verified after NVIDIA-NeMo#498 is merged to main --------- Signed-off-by: Chris Wing <cwing@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> Co-authored-by: Lawrence Lane <llane@nvidia.com> docs: End-to-end GRPO Training with NeMo RL tutorial [master branch] (NVIDIA-NeMo#481) Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com> Co-authored-by: L.B. <llane@nvidia.com> Co-authored-by: Frankie Siino <fsiino@nvidia.com> Change to v0.1.1 release version (NVIDIA-NeMo#509) Signed-off-by: Brian Yu <bxyu@nvidia.com> Bump to v0.2.0 (NVIDIA-NeMo#510) Signed-off-by: Brian Yu <bxyu@nvidia.com> reasoning-gym resource server (NVIDIA-NeMo#113) single turn tasks across various domains: "Reasoning Gym is a community-created Python library of procedural dataset generators and algorithmically verifiable reasoning environments for training reasoning models with reinforcement learning (RL). 
The goal is to generate virtually infinite training data with adjustable complexity. It currently provides more than 100 tasks over many domains, including but not limited to algebra, arithmetic, computation, cognition, geometry, graph theory, logic, and many common games." Tested all 100+ environments for errors, and tested training on many, demonstrated convergence. This dataset of 100+ environments is also used in ProRL (https://arxiv.org/abs/2505.24864) --------- Signed-off-by: cmunley1 <cmunley@nvidia.com> Signed-off-by: Christian Munley <cmunley@nvidia.com> Co-authored-by: ARC Bot <arc-bot@example.com> docs: Miscellaneous GRPO tutorial fixes (NVIDIA-NeMo#512) Signed-off-by: Brian Yu <bxyu@nvidia.com> docs settings update (NVIDIA-NeMo#525) Signed-off-by: Lawrence Lane <llane@nvidia.com> List running server health and status (NVIDIA-NeMo#290) This implements the `ng_status` command to list all running servers on the system and ping for health check. --------- Signed-off-by: Frankie Siino <fsiino@nvidia.com> VLLMModel supports chat template kwargs (NVIDIA-NeMo#538) Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Peter Jin <pjin@nvidia.com> Salesforce xlam-function-calling-60k resources server (NVIDIA-NeMo#262) function calling resources server based on https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k --------- Signed-off-by: Christian Munley <cmunley@nvidia.com> Signed-off-by: cmunley1 <cmunley@nvidia.com> python flag for colab venv installation (NVIDIA-NeMo#526) need to set uv pip install python flag in colab environments when launching servers usage: `ng_run "+config_paths=[...]" +uv_pip_set_python=true ` defaults to false For NVIDIA-NeMo#370 Needed for notebook here: https://docs.unsloth.ai/models/nemotron-3#reinforcement-learning--nemo-gym --------- Signed-off-by: Christian Munley <cmunley@nvidia.com> add unsloth and trl to docs (NVIDIA-NeMo#536) adds a section for single-step training with unsloth and trl not sure if these 
should be broken into separate sections. Left as one since the same notebook works for both, but could be confusing. not sure if we should also add more info about multi-step (hopefully) coming soon. Signed-off-by: Christian Munley <cmunley@nvidia.com> docs: remove trl docs (NVIDIA-NeMo#543) remove trl from docs, leaving just unsloth. was unclear that they are together. will make a trl section when we have a standalone trl notebook, or a section on trl's docs too. --------- Signed-off-by: Christian Munley <cmunley@nvidia.com> Remove PlainTextResponse response_class (NVIDIA-NeMo#544) https://nvidia.slack.com/archives/C08TG7CLEGY/p1766191655660079 Initially in NVIDIA-NeMo#290, the `response_class=PlainTextResponse` was added to the `/global_config_dict_yaml` endpoint of the HeadServer as an attempt to debug parsing server info for the `ng_status` command. This led to a parsing error in `load_from_global_config`. This command now uses its own separate endpoint `server_instances`, so this needs to be removed. Signed-off-by: Frankie Siino <fsiino@nvidia.com> Increase test_train_data_utils coverage (NVIDIA-NeMo#553) Overall coverage failure threshold is 95%, and test coverage is too low for train_data_utils, which brings down overall coverage of the ng_dev_test suite. This covers some of those lingering test cases to bring it from 89% to 97%. --------- Signed-off-by: Frankie Siino <fsiino@nvidia.com> Generic Aviary integration (NVIDIA-NeMo#55) This PR enables running Gym on Aviary environments. The two main concepts: - `AviaryResourcesServer`: maps to an Aviary `TaskDataset`: spawns and manages multiple environments - Unlike other `ResourcesServer`s, it doesn't take arbitrary task specs, but an integer index into the `TaskDataset`. Otherwise we'd have data defined in two places - Instead of tool-specific endpoints, we have one `/step` endpoint. This is because: - Aviary environments define their transition function in `step()`. 
Simply calling the bare tools can have undefined behavior (e.g. state isn't updated properly) - Aviary tools are not guaranteed to be available until `reset()` is called. - A `/close` endpoint is added to tear down resources - `AviaryAgent`: analogous to `SimpleAgent`, but: - Request is an integer index (which is forwarded to `AviaryResourcesServer`). In general, we expect `env.reset()` to provide the first messages, not the calling code - All tool calls are sent to `/step` - We rely on the environment to tell us when we're done Two concrete Aviary datasets/environments are integrated: GSM8k with a calculator environment and BixBench with a notebook environment. Adding new ones is pretty lightweight (most of the code in `notebook_app.py` is from defining a BixBench-compatible environment, not the integration). --------- Signed-off-by: Siddharth Narayanan <sid@futurehouse.org> Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Siddharth Narayanan <sidnarayanan@users.noreply.github.com> Signed-off-by: Siddharth Narayanan <sid@edisonscientific.com> Signed-off-by: Christian Munley <cmunley@nvidia.com> Signed-off-by: cmunley1 <cmunley@nvidia.com> Co-authored-by: bxyu-nvidia <bxyu@nvidia.com> Co-authored-by: cmunley1 <cmunley@nvidia.com> ng_dump_config sanity removes API key values (NVIDIA-NeMo#567) Signed-off-by: Brian Yu <bxyu@nvidia.com> Feat: Add reward profiling and fractional reward (NVIDIA-NeMo#83) Adds more descriptive readme, reward profiling, and option for fractional or binary reward. Signed-off-by: abukharin-nv <abukharin@nvidia.com> Co-authored-by: cmunley1 <cmunley@nvidia.com> Single step environments for SWE tasks (NVIDIA-NeMo#561) This PR adds new environments for SWE tasks. The environments can be used for single-step patch generation, test generation, and LLM-as-a-judge. They have been tested for instances from SWE-bench, SWE-Gym, and SWE-rebench. 
Patch and test generation environment runs them against unittests in a containerized environment (Singularity). --------- Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com> Co-authored-by: Test User <test@example.com> NL2Bash using Equivalency Judge (NVIDIA-NeMo#569) Integrating a new dataset using existing equivalency llm judge resource server. Data source: https://huggingface.co/datasets/jiacheng-ye/nl2bash License: https://github.com/TellinaTool/nl2bash/blob/3d1997669ac21c8e19fc1d12f60054d3142ef6c7/LICENSE Train: 8040 unique samples Validation: 50 unique, randomly sampled from train Augmentation on the source (minimal): Added system prompt, output formatting requirement Example of env validation: - base model: `nemotron-nano-3-30b-a3b-bf16` (GA checkpoint) - Step 30 -> 12.50% on Terminal Bench Core - https://wandb.ai/nvidia/nl2bash/runs/mxp1c3mm Train: nl2bash-super-train-0901.jsonl Validation: nl2bash-super-validation-0901.jsonl https://gitlab-master.nvidia.com/bxyu/nemo-gym/-/ml/models/152/versions/176#/ ``` ng_download_dataset_from_gitlab \ +dataset_name=nl2bash-equivalency-judge \ +version=0.0.1 \ +artifact_fpath=nl2bash-super-train-0901.jsonl \ +output_fpath=Gym/data/nl2bash/nl2bash-super-train-0901.jsonl ``` --------- Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com> enh: use agent ref from data in rollouts (NVIDIA-NeMo#568) Makes `agent_name` optional in `ng_collect_rollouts` CLI, allowing it to use `agent_ref` from each data row instead. The NeMo-RL training code already respects per-row `agent_ref`, but the Gym CLI (`ng_collect_rollouts`) required a single hardcoded `agent_name`. This prevented multi-agent rollout collection via CLI. 
- `rollout_collection.py`: Made `agent_name` field optional with `default=None`
- Use `config.agent_name` if specified; otherwise fall back to `row["agent_ref"]["name"]`
- Added validation error if neither source provides an agent name

| Before | After |
|--------|-------|
| `+agent_name=...` required | `+agent_name=...` optional |
| All rows use same agent | Rows can use different agents via `agent_ref` |

--------- Signed-off-by: George Armstrong <georgea@nvidia.com> FastAPI worker support (NVIDIA-NeMo#566) Signed-off-by: Brian Yu <bxyu@nvidia.com> Local vLLM model and other misc improvements (NVIDIA-NeMo#558) Inspired by https://github.com/NVIDIA-NeMo/Gym/pull/318/files#diff-b56c7f31b7793b3a4ac265f84f4c84216f1ed15a3fbee855da9674a7da8714ff by @pjin-nvidia --------- Signed-off-by: Brian Yu <bxyu@nvidia.com> Update math_with_judge artifact paths (NVIDIA-NeMo#582) The default artifact paths for the math_with_judge resource server don't match the filenames for the provided dataset (nvidia/Nemotron-RL-math-OpenMathReasoning) [as saved on Hugging Face](https://huggingface.co/datasets/nvidia/Nemotron-RL-math-OpenMathReasoning/tree/main). This results in an error when attempting to download the files automatically from Hugging Face. The artifact paths for both training and validation need to be updated with the names as shown on Hugging Face for proper downloading. Signed-off-by: Robert Clark <roclark@nvidia.com> Add Hugging Face identifier for coding resource (NVIDIA-NeMo#583) The competitive coding resource config is missing a Hugging Face identifier, which prevents it from being downloaded via Hugging Face using the data preparation tools. 
Without the HF identifier run the following: ``` config_paths="responses_api_models/vllm_model/configs/vllm_model_for_training.yaml,resources_servers/math_with_judge/configs/math_with_judge.yaml,resources_servers/code_gen/configs/code_gen.yaml,resources_servers/workplace_assistant/configs/workplace_assistant.yaml,resources_servers/mcqa/configs/mcqa.yaml,resources_servers/instruction_following/configs/instruction_following.yaml,resources_servers/structured_outputs/configs/structured_outputs_json.yaml" ng_prepare_data "+config_paths=[${config_paths}]" +output_dirpath=data/ +mode=train_preparation +should_download=true +data_source=huggingface ``` This will throw a warning: ``` Dataset `livecodebench_v5_validation` missing huggingface_identifier for HuggingFace backend ``` And eventually this error: ``` Traceback (most recent call last): File "/opt/nemo_rl_venv/bin/ng_prepare_data", line 10, in <module> sys.exit(prepare_data()) ^^^^^^^^^^^^^^ File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 819, in prepare_data data_processor.run(global_config_dict) File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 350, in run dataset_type_to_aggregate_metrics = self.validate_samples_and_aggregate_metrics(server_instance_configs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 657, in validate_samples_and_aggregate_metrics state = self._validate_samples_and_aggregate_metrics_single_dataset(d) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 553, in _validate_samples_and_aggregate_metrics_single_dataset for sample_idx, sample_dict_str in enumerate(self._iter_dataset_lines(dataset_config)): ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 542, 
in _iter_dataset_lines with open(dataset_config.jsonl_fpath) as f: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ FileNotFoundError: [Errno 2] No such file or directory: 'resources_servers/code_gen/data/livecodebench_v5_2024-07-01_2025-02-01_validation.jsonl' ``` This fix will download the validation file as intended and resolve the errors. Signed-off-by: Robert Clark <roclark@nvidia.com> updating swerl_gen config (NVIDIA-NeMo#588) The train and val data paths are swapped in the config. This PR updates them. --------- Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com> Co-authored-by: Test User <test@example.com> NeMo Skills Tools Resource (NVIDIA-NeMo#571) Adds a new resources server that integrates NeMo Skills tools (e.g., stateful Python code execution) with NeMo Gym's verification system. **Key features:** - Executes NeMo Skills tools via the ToolManager (e.g., `stateful_python_code_exec`) - Delegates verification to other resources servers (e.g., `math_with_judge`) The `ns_tools` server acts as a pass-through for verification. When `verify()` is called, it delegates to the configured verifier (default: `math_with_judge`): ``` ns_tools.verify(request) → POST to math_with_judge/verify → returns reward from math_with_judge ``` This allows using NeMo Skills tools while leveraging existing verification infrastructure. 
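The pass-through described above can be sketched roughly as follows. This is an illustration only, not the `ns_tools` implementation: `delegate_verify`, the injected `post` callable, the relative `<verifier>/verify` path, the use of a `verifier_type` key to pick the verifier, and the `reward` key in the response are all assumptions.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: takes a path and a JSON-like payload, returns the
# decoded JSON response. In a real server this would be an HTTP POST.
Post = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# The PR states math_with_judge is the default verifier.
DEFAULT_VERIFIER = "math_with_judge"


def delegate_verify(request: Dict[str, Any], post: Post) -> float:
    """Forward a verify request to the configured verifier and return its reward."""
    # Assumption: the row's "verifier_type" names the target resources server.
    verifier = request.get("verifier_type", DEFAULT_VERIFIER)
    # Assumption: the verifier answers with a JSON object containing "reward".
    response = post(f"{verifier}/verify", request)
    return float(response["reward"])
```

Injecting `post` keeps the sketch free of network code and makes the delegation easy to exercise with a stub, for example `delegate_verify({"expected_answer": "70"}, lambda path, payload: {"reward": 1.0})`.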
```json { "id": "aime25-0", "question": "Find the sum of all integer bases $b>9$ for which $17_b$ is a divisor of $97_b$.", "expected_answer": "70", "verifier_type": "math_with_judge", "agent_ref": {"type": "responses_api_agents", "name": "ns_tools_simple_agent"}, "responses_create_params": { "input": [ {"role": "user", "content": "Solve the following math problem..."} ], "tools": [{ "type": "function", "name": "stateful_python_code_exec", "description": "Execute Python code in a stateful environment.", "parameters": { "type": "object", "properties": {"code": {"type": "string"}}, "required": ["code"] } }] } } ``` --------- Signed-off-by: George Armstrong <georgea@nvidia.com> Add math_formal_lean resource server for Lean4 proof verification (NVIDIA-NeMo#563) - Adds new `math_formal_lean` resource server for Lean4 formal theorem proving - Implements `/verify` endpoint that compiles proofs via sandbox container and returns reward 1.0/0.0 - Includes MiniF2F dataset (244 test problems) with NeMo-Skills aligned prompt format - Comprehensive test suite (31 tests) | File | Description | |------|-------------| | `app.py` | Resource server with verify endpoint | | `sandbox_client.py` | HTTP client for Lean4 sandbox | | `proof_utils.py` | Proof extraction/building utilities | | `prepare_minif2f.py` | Dataset preparation script | | `README.md` | Documentation with licensing info | - [x] Unit tests pass (31/31) - [x] End-to-end test with `ng_collect_rollouts` (0.2 reward on 5 samples) - [x] Tested with gpt-5.1-codex-max model - [x] Pre-commit lint checks pass 🤖 Generated with [Claude Code](https://claude.ai/code) --------- Signed-off-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Aviary rollouts can be configured to return transitions or not (NVIDIA-NeMo#590) Per title. 
This PR retains the current default of returning transitions, but it is reasonable to change that default to match the other Gym agents. Signed-off-by: Siddharth Narayanan <sid@edisonscientific.com> openhands (NVIDIA-NeMo#343) Signed-off-by: Brian Yu <bxyu@nvidia.com> Co-authored-by: Brian Yu <bxyu@nvidia.com> Terminus (judge only) Slicing Environment (NVIDIA-NeMo#594) Refactoring the equivalency llm judge resource server into another judge-based resource server. Main changes include removing regex logic and cleaning up related configs to that. Train data for this environment is still TBD, but a working version: Data source: Sliced terminus prompts from different sources train_jsonl_fpath: `/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/my-envs/terminus-sliced/char/nano3-ga-traindata-char-tokenlen-32768.jsonl` validation_jsonl_fpath: `/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/my-envs/terminus-sliced/char/nano3-ga-valdata-char-tokenlen-16384.jsonl` example train config: `/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/nemo-rl-internal-yifu/training_configs/grpo_nanov3-nickel-capybara-4-nodes-judge-roff-512-49k-seq-reasoning-off-char-data-64x16-temp1-iter-1600.yaml` Example of env validation: base model: early sft checkpoint of nano v3 (`nano-v3-sft-64gbs-nickel-capybara-5e-5-constant-wd-0-load-bal-1e-4-lcx3-pretool-base-temp1-iter-0013600-hf`) Step 50 -> 21.25% on Terminal Bench Core https://wandb.ai/nvidia/terminus-sliced/runs/rs7c40hi Next steps: Will expand this PR with configurable verification options including string matching, string similarity and openapi-based output schema validation. --------- Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com> 0.2.0 new doc stubs (NVIDIA-NeMo#581) Added new doc directories/article stubs for the topics identified in 0.2.0 IA. generated initial pass of structure and some starter content. This will enable contributors to focus more on the topic itself rather than the site build/toctree elements. 
**Feel free to blow away any initial content in these pages**. All stubbed pages have been marked with 🟡 in the toctree for easy discovery. remove 🟡 once the page is finished. <img width="1800" height="1009" alt="image" src="https://github.com/user-attachments/assets/a0bbc63d-05ce-44a2-b31f-fe4b8e0d43db" /> --------- Signed-off-by: Lawrence Lane <llane@nvidia.com> Add tutorial for custom data preparation (NVIDIA-NeMo#596) Added a complete example of preparing a custom dataset for usage with NeMo Gym. The tutorial walks through downloading a dataset from Hugging Face or modifying from a different source, adding the "responses_create_params" field, writing a new resource server config, and preparing the data with "ng_prepare_data". This tutorial can be used as a guide for taking most arbitrary text-based datasets and modifying them to a format that is compatible with NeMo Gym for post-training. Signed-off-by: Robert Clark <roclark@nvidia.com> Fix invalid ref in docs build (NVIDIA-NeMo#604) Signed-off-by: Brian Yu <bxyu@nvidia.com> Fix Nemo-Skills python tool to use http (NVIDIA-NeMo#606) Spawn python_tool as an HTTP server subprocess in ns_tools for better stability and ensure all rollouts get completed. This replaces stdio-based tool execution with HTTP transport. --------- Signed-off-by: George Armstrong <georgea@nvidia.com> Expanding Terminus Slicing PR (NVIDIA-NeMo#597) Expanding PR to include reward logic for string similarity and schema validation --------- Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com> Updating swerl_gen to support custom parsers (NVIDIA-NeMo#624) Adding support for custom parsers and evaluation scripts. Prompt formats for this environment are also simplified. 
--------- Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com> Co-authored-by: Test User <test@example.com> docs: unsloth fix (NVIDIA-NeMo#622) for 5826079 Signed-off-by: cmunley1 <cmunley@nvidia.com> arc-agi resource server (NVIDIA-NeMo#105) Signed-off-by: cmunley1 <cmunley@nvidia.com> Signed-off-by: Christian Munley <cmunley@nvidia.com> arc readme (NVIDIA-NeMo#634) Signed-off-by: Christian Munley <cmunley@nvidia.com> VLLMModel: Add chat template kwargs on tokenize request (NVIDIA-NeMo#636) Signed-off-by: Brian Yu <bxyu@nvidia.com> [docs] Add architecture diagrams (NVIDIA-NeMo#574) Fixes NVIDIA-NeMo#292 This PR covers the rollout collection within NeMo Gym for standalone usage (i.e., not used in conjunction with an RL training framework). For the NeMo RL + Gym integration summary, I will add docs to the NeMo RL page, and update Gym docs with a pointer to those for reference. These docs cover: - Control plane: `ng_run` startup sequence (CLI parsing, config loading, Ray init, server spawning) - Server architecture: Head server, uvicorn/FastAPI initialization - HTTP request flow: Example rollout showing Agent -> Model -> Resources interactions - Data plane: `ng_collect_rollouts` flow starting from the head server discovery This change also adds an extra dependency on `sphinxcontrib.mermaid` for mermaid diagrams to render in the docs page --------- Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com> feat: reward profiling (NVIDIA-NeMo#621) addresses NVIDIA-NeMo#614 aggregates metrics from rollouts with num_repeats to create a reward profiled dataset. New CLI command `ng_profile` accepts - input_jsonl_fpath - original task dataset - rollouts_jsonl_fpath - rollouts file from `ng_collect_rollouts` with num_repeats > 1 - output_jsonl_fpath - output path for reward profiled task dataset - pass_threshold - the reward threshold to count as pass in pass@k calculations. 
  Needed because some envs return partial rewards, or rewards > 1, so it's not as simple as saying reward = 1 or reward > 0 is a pass.

Example usage:

```
ng_collect_rollouts \
    +agent_name=reasoning_gym_simple_agent \
    +input_jsonl_fpath=resources_servers/reasoning_gym/data/train_all.jsonl \
    +output_jsonl_fpath=results/reasoning_gym_alltask_rollouts.jsonl \
    +num_repeats=16

ng_profile \
    +input_jsonl_fpath=resources_servers/reasoning_gym/data/train_all.jsonl \
    +rollouts_jsonl_fpath=results/reasoning_gym_alltask_rollouts.jsonl \
    +output_jsonl_fpath=resources_servers/reasoning_gym/data/train_all_profiled.jsonl \
    +pass_threshold=1.0
```

This creates a new dataset with fields added, for example:

```
"avg_reward": 1.0,
"std_reward": 0.0,
"min_reward": 1.0,
"max_reward": 1.0,
"total_samples": 16,
"pass_rate": 1.0,
"pass_rate_total": 16,
"pass_rate_passed": 16,
"pass_threshold": 1.0
```

As a full example, the original dataset looks like:

```
{
  "responses_create_params": {
    "input": [
      {
        "role": "user",
        "content": "Ghyll yearns for entrepreneurship. Oluwadamiloju finds joy in sculpting. Nikodem welcomes theatre. Sheriff is nuts about roller skating. Ata worships omelettes. Cahlum fancies ice cream. Geordan is devoted to organizing the pantry. Richard savors playing the accordion. Rholmark damns playing the clarinet. Jon savors cooking dinner. Taddy waves away swimming. Rubhan ridicules off-road vehicles. Bhaaldeen dotes martial arts. Grzegorz rejoices in collecting postcards. Niraj adores playing sudoku. Ritchie is fond of dumplings. Jakey esteems playing soccer. Modu disdains playing video games. Demetrius extols electric cars. Justinas desires botany. Shreeram respects segways. Rowen blasts limousines. Kalen prizes the color bronze. Ayyub is obsessed with folklore. Ryan-Lee finds joy in cleaning the refrigerator. Devan welcomes cleaning the gutters. Abu prizes weeding the garden. Kenzi mocks singing. Zenith adores camping. Ericlee admires snowboarding. Connan endorses eagles. Vrishin esteems playing the trumpet. Dissanayake extols ice cream. Marcel favors kindness. Nial laments building model airplanes. Craig-James sneers at goats. Mikee basks in zoology. Kyro is committed to the color black. Danniel approves of the color yellow. Dregan supports space exploration. Antoine bears scrubbing the tub. Alfy spurns the color blue. Madison-Jake disdains the color lemon. Lucus idolizes the color olive. Ramit curses washing the dishes. \nWho savors playing the accordion? Reply only with a name."
      }
    ]
  },
  "question": "Ghyll yearns for entrepreneurship. Oluwadamiloju finds joy in sculpting. Nikodem welcomes theatre. Sheriff is nuts about roller skating. Ata worships omelettes. Cahlum fancies ice cream. Geordan is devoted to organizing the pantry. Richard savors playing the accordion. Rholmark damns playing the clarinet. Jon savors cooking dinner. Taddy waves away swimming. Rubhan ridicules off-road vehicles. Bhaaldeen dotes martial arts. Grzegorz rejoices in collecting postcards. Niraj adores playing sudoku. Ritchie is fond of dumplings. Jakey esteems playing soccer. Modu disdains playing video games. Demetrius extols electric cars. Justinas desires botany. Shreeram respects segways. Rowen blasts limousines. Kalen prizes the color bronze. Ayyub is obsessed with folklore. Ryan-Lee finds joy in cleaning the refrigerator. Devan welcomes cleaning the gutters. Abu prizes weeding the garden. Kenzi mocks singing. Zenith adores camping. Ericlee admires snowboarding. Connan endorses eagles. Vrishin esteems playing the trumpet. Dissanayake extols ice cream. Marcel favors kindness. Nial laments building model airplanes. Craig-James sneers at goats. Mikee basks in zoology. Kyro is committed to the color black. Danniel approves of the color yellow. Dregan supports space exploration. Antoine bears scrubbing the tub. Alfy spurns the color blue. Madison-Jake disdains the color lemon. Lucus idolizes the color olive. Ramit curses washing the dishes. \nWho savors playing the accordion? Reply only with a name.",
  "answer": "Richard",
  "metadata": {
    "source_dataset": "needle_haystack",
    "source_index": 0,
    "question": "Who savors playing the accordion? Reply only with a name.",
    "num_statements": 45,
    "difficulty": {
      "num_statements": [10, 100]
    }
  }
}
```

The output profiled dataset looks like:

```
{
  "responses_create_params": {
    "input": [
      {
        "role": "user",
        "content": "Ghyll yearns for entrepreneurship. Oluwadamiloju finds joy in sculpting. Nikodem welcomes theatre. Sheriff is nuts about roller skating. Ata worships omelettes. Cahlum fancies ice cream. Geordan is devoted to organizing the pantry. Richard savors playing the accordion. Rholmark damns playing the clarinet. Jon savors cooking dinner. Taddy waves away swimming. Rubhan ridicules off-road vehicles. Bhaaldeen dotes martial arts. Grzegorz rejoices in collecting postcards. Niraj adores playing sudoku. Ritchie is fond of dumplings. Jakey esteems playing soccer. Modu disdains playing video games. Demetrius extols electric cars. Justinas desires botany. Shreeram respects segways. Rowen blasts limousines. Kalen prizes the color bronze. Ayyub is obsessed with folklore. Ryan-Lee finds joy in cleaning the refrigerator. Devan welcomes cleaning the gutters. Abu prizes weeding the garden. Kenzi mocks singing. Zenith adores camping. Ericlee admires snowboarding. Connan endorses eagles. Vrishin esteems playing the trumpet. Dissanayake extols ice cream. Marcel favors kindness. Nial laments building model airplanes. Craig-James sneers at goats. Mikee basks in zoology. Kyro is committed to the color black. Danniel approves of the color yellow. Dregan supports space exploration. Antoine bears scrubbing the tub. Alfy spurns the color blue. Madison-Jake disdains the color lemon. Lucus idolizes the color olive. Ramit curses washing the dishes. \nWho savors playing the accordion? Reply only with a name."
      }
    ]
  },
  "question": "Ghyll yearns for entrepreneurship. Oluwadamiloju finds joy in sculpting. Nikodem welcomes theatre. Sheriff is nuts about roller skating. Ata worships omelettes. Cahlum fancies ice cream. Geordan is devoted to organizing the pantry. Richard savors playing the accordion. Rholmark damns playing the clarinet. Jon savors cooking dinner. Taddy waves away swimming. Rubhan ridicules off-road vehicles. Bhaaldeen dotes martial arts. Grzegorz rejoices in collecting postcards. Niraj adores playing sudoku. Ritchie is fond of dumplings. Jakey esteems playing soccer. Modu disdains playing video games. Demetrius extols electric cars. Justinas desires botany. Shreeram respects segways. Rowen blasts limousines. Kalen prizes the color bronze. Ayyub is obsessed with folklore. Ryan-Lee finds joy in cleaning the refrigerator. Devan welcomes cleaning the gutters. Abu prizes weeding the garden. Kenzi mocks singing. Zenith adores camping. Ericlee admires snowboarding. Connan endorses eagles. Vrishin esteems playing the trumpet. Dissanayake extols ice cream. Marcel favors kindness. Nial laments building model airplanes. Craig-James sneers at goats. Mikee basks in zoology. Kyro is committed to the color black. Danniel approves of the color yellow. Dregan supports space exploration. Antoine bears scrubbing the tub. Alfy spurns the color blue. Madison-Jake disdains the color lemon. Lucus idolizes the color olive. Ramit curses washing the dishes. \nWho savors playing the accordion? Reply only with a name.",
  "answer": "Richard",
  "metadata": {
    "source_dataset": "needle_haystack",
    "source_index": 0,
    "question": "Who savors playing the accordion? Reply only with a name.",
    "num_statements": 45,
    "difficulty": {
      "num_statements": [10, 100]
    }
  },
  "avg_reward": 1.0,
  "std_reward": 0.0,
  "min_reward": 1.0,
  "max_reward": 1.0,
  "total_samples": 16,
  "pass_rate": 1.0,
  "pass_rate_total": 16,
  "pass_rate_passed": 16,
  "pass_threshold": 1.0
}
```

In this example, 16/16 rollouts got reward=1, so it is not a very informative case, but it shows the output format.
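The aggregation described above can be sketched roughly as follows. This is not the `ng_profile` implementation: the function name `profile_rewards` is hypothetical, and two details are assumptions that the description does not settle, namely that a rollout passes when its reward is greater than or equal to `pass_threshold`, and that `std_reward` is the population standard deviation.

```python
import statistics


def profile_rewards(rewards, pass_threshold=1.0):
    """Aggregate the rewards of repeated rollouts for a single task.

    Hypothetical helper for illustration; field names mirror the example
    output shown above.
    """
    if not rewards:
        raise ValueError("need at least one reward to profile a task")

    # Assumption: reward >= pass_threshold counts as a pass.
    passed = sum(1 for r in rewards if r >= pass_threshold)

    return {
        "avg_reward": statistics.fmean(rewards),
        # Assumption: population (not sample) standard deviation.
        "std_reward": statistics.pstdev(rewards),
        "min_reward": min(rewards),
        "max_reward": max(rewards),
        "total_samples": len(rewards),
        "pass_rate": passed / len(rewards),
        "pass_rate_total": len(rewards),
        "pass_rate_passed": passed,
        "pass_threshold": pass_threshold,
    }


# 16 repeats that all scored 1.0, as in the example above.
all_pass = profile_rewards([1.0] * 16, pass_threshold=1.0)

# A task with partial rewards: only the two 1.0 rollouts pass.
mixed = profile_rewards([0.5, 1.0, 0.0, 1.0], pass_threshold=1.0)
```

With partial rewards, such as the second call, the threshold is what separates "some reward" from "pass", which is the reason the commit gives for exposing `pass_threshold`.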
---------

Signed-off-by: Christian Munley <cmunley@nvidia.com>

docs: issue 626 (NVIDIA-NeMo#638)

Signed-off-by: Lawrence Lane <llane@nvidia.com>

ci: Enable the test job to build a wheel and publish to test.pypi (NVIDIA-NeMo#651)

Enable the test job to build a wheel and publish to test.pypi

* The workflow expects a .python-version to help build the wheel
* Update the package name from NeMo-Gym to nemo-gym to align with how other packages are named
* This test job currently only runs when it merges to main or a release branch. We sometimes get "too many requests" errors with test pypi if it runs too frequently

Example publishing to test pypi: https://test.pypi.org/project/nemo-gym/0.2.2640rc0/

The GitHub job was already in the repo. I just had to flip an env var to enable it.

---------

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

v1 of text-to-sql (NVIDIA-NeMo#648)

Text-to-SQL environment using LLM-as-a-judge

---------

Signed-off-by: Yev Meyer <ymeyer@nvidia.com>

Yev/text to sql v1.1 (NVIDIA-NeMo#653)

minor update to simplify code and use xml tags

---------

Signed-off-by: Yev Meyer <ymeyer@nvidia.com>

Upstream Super 3 dev 20260205 (NVIDIA-NeMo#654)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

ns tools stability (NVIDIA-NeMo#658)

Summary
- Add disable_session_restore flag to skip O(n) session history replay after sandbox worker restarts (enabled by default in config)
- Add verbose_tool_logging flag to optionally collect per-session timing metrics and log per-call execution times (disabled by default)
- Decrease code execution timeout from 30s to 10s for faster failure on runaway code
- Pass --disable-session-restore CLI flag through to python_tool subprocess
- Update nemo-skills requirement to georgea/super-rl-02062026 (nemo-skills side changes for session restore and dependencies)

---------

Signed-off-by: George Armstrong <georgea@nvidia.com>

Bump package versions to fix security vulnerabilities (NVIDIA-NeMo#667)

Bump package versions to fix security vulnerabilities.
- urllib3: 2.5.0 -> 2.6.3
- mlflow: 3.3.2 -> 3.9.0
- fonttools: 4.59.2 -> 4.61.1
- aiohttp: 3.12.15 -> 3.13.3
- python-multipart: 0.0.20 -> 0.0.22
- ray: 2.50.1 -> 2.52.1

## Related Issue
NVIDIA-NeMo/Internal-Planning#145

Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>

…VIDIA-NeMo#481) Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Lawrence Lane <llane@nvidia.com> Signed-off-by: Frankie Siino <fsiino@nvidia.com> Co-authored-by: L.B. <llane@nvidia.com> Co-authored-by: Frankie Siino <fsiino@nvidia.com>

remove old tutorial

799cf60

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Merge branch 'main' of https://github.com/NVIDIA-NeMo/Gym into bxyu/e2e-grpo-tut

1defada

bxyu-nvidia and others added 22 commits December 10, 2025 21:21

add placeholder files

9465ccb

Signed-off-by: Brian Yu <bxyu@nvidia.com>

add grpo to key terminology

909ef78

Signed-off-by: Brian Yu <bxyu@nvidia.com>

move

51fb276

Signed-off-by: Brian Yu <bxyu@nvidia.com>

fix order

1e643b2

Signed-off-by: Brian Yu <bxyu@nvidia.com>

fix link

af5cc42

Signed-off-by: Brian Yu <bxyu@nvidia.com>

add temp files

5cc5190

Signed-off-by: Brian Yu <bxyu@nvidia.com>

merge

5bdcafb

Signed-off-by: Brian Yu <bxyu@nvidia.com>

clean

1281d1d

Signed-off-by: Brian Yu <bxyu@nvidia.com>

shorten

bb34cdf

Signed-off-by: Brian Yu <bxyu@nvidia.com>

reorganize

671584d

Signed-off-by: Brian Yu <bxyu@nvidia.com>

add links

9d23dce

Signed-off-by: Brian Yu <bxyu@nvidia.com>

impl

9cf3297

Signed-off-by: Brian Yu <bxyu@nvidia.com>

clean

c981d1a

Signed-off-by: Brian Yu <bxyu@nvidia.com>

about workplace assistant

5afb380

Signed-off-by: Brian Yu <bxyu@nvidia.com>

gym config todo

29d08cb

Signed-off-by: Brian Yu <bxyu@nvidia.com>

gym config

7ca97b8

Signed-off-by: Brian Yu <bxyu@nvidia.com>

nemo rl configuration

e47c780

Signed-off-by: Brian Yu <bxyu@nvidia.com>

start setup

cffe67d

Signed-off-by: Brian Yu <bxyu@nvidia.com>

setup

3282478

Signed-off-by: Brian Yu <bxyu@nvidia.com>

clean

3a98f9f

Signed-off-by: Brian Yu <bxyu@nvidia.com>

copy over remaining info

7d8bf7f

Signed-off-by: Brian Yu <bxyu@nvidia.com>

style edits (#495)

935fabe

Signed-off-by: Lawrence Lane <llane@nvidia.com>

shashank3959 reviewed Dec 11, 2025

docs/tutorials/nemo-rl-grpo/multi-node-training.md (outdated)

bxyu-nvidia added 3 commits December 11, 2025 16:05

Merge branch 'main' of https://github.com/NVIDIA-NeMo/Gym into bxyu/e2e-grpo-tut

fccf97b

Signed-off-by: Brian Yu <bxyu@nvidia.com>

clean single node

58219ad

Signed-off-by: Brian Yu <bxyu@nvidia.com>

clean multi node

74d07b3

Signed-off-by: Brian Yu <bxyu@nvidia.com>

bxyu-nvidia added 3 commits December 11, 2025 17:11

fix ref

b2dd897

Signed-off-by: Brian Yu <bxyu@nvidia.com>

clean

c1ba2fc

Signed-off-by: Brian Yu <bxyu@nvidia.com>

clean

295c265

Signed-off-by: Brian Yu <bxyu@nvidia.com>

bxyu-nvidia linked an issue Dec 12, 2025 that may be closed by this pull request

E2E Training flow example - GRPO/DAPO on Nemotron nano v2 9B + workplace assistant #376

Closed

2 tasks

This was referenced Dec 12, 2025

docs: GRPO Training with NeMo RL: Workplace Assistant on Nemotron Nano v2 9B #448

Closed

docs: improving RL tutorial #467

Closed

lbliii and others added 2 commits December 12, 2025 14:06

Merge branch 'main' into bxyu/e2e-grpo-tut

82e06a6

Merge branch 'main' of https://github.com/NVIDIA-NeMo/Gym into bxyu/e2e-grpo-tut

cd72d6a

shashank3959 reviewed Dec 12, 2025

docs/tutorials/nemo-rl-grpo/single-node-training.md

shashank3959 reviewed Dec 12, 2025

shashank3959 reviewed Dec 13, 2025

docs/tutorials/nemo-rl-grpo/setup.md (outdated)

shashank3959 reviewed Dec 13, 2025

docs/tutorials/nemo-rl-grpo/setup.md

shashank3959 reviewed Dec 13, 2025

docs/tutorials/nemo-rl-grpo/single-node-training.md (outdated)

shashank3959 reviewed Dec 13, 2025

docs/tutorials/nemo-rl-grpo/single-node-training.md (outdated)

bxyu-nvidia mentioned this pull request Dec 14, 2025

Add Huggingface support for data preparation #419

Closed

add validation identifiers from e17558d#diff-c9326bd1d43d81a2fd061dbab914746e00835f449d2e5649997f6ed6523615f1

56ce064

Signed-off-by: Brian Yu <bxyu@nvidia.com>

bxyu-nvidia marked this pull request as ready for review December 14, 2025 23:58

bxyu-nvidia requested a review from a team as a code owner December 14, 2025 23:58

bxyu-nvidia mentioned this pull request Dec 14, 2025

fix: Upload other splits to Huggingface #460

Closed

empty commit for QA

4e5e3d9

Signed-off-by: Brian Yu <bxyu@nvidia.com>

bxyu-nvidia merged commit 9a9177e into main Dec 15, 2025
5 checks passed

bxyu-nvidia deleted the bxyu/e2e-grpo-tut branch December 15, 2025 00:12

docs: End-to-end GRPO Training with NeMo RL tutorial [master branch] #481
bxyu-nvidia merged 35 commits into main from bxyu/e2e-grpo-tut


3 participants


		# Initialize all submodules (Megatron, AutoModel, etc.)
		git submodule update --init --recursive
