Merged
35 commits
799cf60
remove old tutorial
bxyu-nvidia Dec 11, 2025
1defada
Merge branch 'main' of https://github.com/NVIDIA-NeMo/Gym into bxyu/e…
bxyu-nvidia Dec 11, 2025
9465ccb
add placeholder files
bxyu-nvidia Dec 11, 2025
909ef78
add grpo to key terminology
bxyu-nvidia Dec 11, 2025
51fb276
move
bxyu-nvidia Dec 11, 2025
1e643b2
fix order
bxyu-nvidia Dec 11, 2025
af5cc42
fix link
bxyu-nvidia Dec 11, 2025
5cc5190
add temp files
bxyu-nvidia Dec 11, 2025
5bdcafb
merge
bxyu-nvidia Dec 11, 2025
1281d1d
clean
bxyu-nvidia Dec 11, 2025
bb34cdf
shorten
bxyu-nvidia Dec 11, 2025
671584d
reorganize
bxyu-nvidia Dec 11, 2025
9d23dce
add links
bxyu-nvidia Dec 11, 2025
9cf3297
impl
bxyu-nvidia Dec 11, 2025
c981d1a
clean
bxyu-nvidia Dec 11, 2025
5afb380
about workplace assistant
bxyu-nvidia Dec 11, 2025
29d08cb
gym config todo
bxyu-nvidia Dec 11, 2025
7ca97b8
gym config
bxyu-nvidia Dec 11, 2025
e47c780
nemo rl configuration
bxyu-nvidia Dec 11, 2025
cffe67d
start setup
bxyu-nvidia Dec 11, 2025
3282478
setup
bxyu-nvidia Dec 11, 2025
3a98f9f
clean
bxyu-nvidia Dec 11, 2025
7d8bf7f
copy over remaining info
bxyu-nvidia Dec 11, 2025
935fabe
style edits (#495)
lbliii Dec 11, 2025
fccf97b
Merge branch 'main' of https://github.com/NVIDIA-NeMo/Gym into bxyu/e…
bxyu-nvidia Dec 12, 2025
58219ad
clean single node
bxyu-nvidia Dec 12, 2025
74d07b3
clean multi node
bxyu-nvidia Dec 12, 2025
b2dd897
fix ref
bxyu-nvidia Dec 12, 2025
c1ba2fc
clean
bxyu-nvidia Dec 12, 2025
295c265
clean
bxyu-nvidia Dec 12, 2025
82e06a6
Merge branch 'main' into bxyu/e2e-grpo-tut
lbliii Dec 12, 2025
cd72d6a
Merge branch 'main' of https://github.com/NVIDIA-NeMo/Gym into bxyu/e…
bxyu-nvidia Dec 12, 2025
c63b1c7
End-to-end GRPO tutorial - HuggingFace datasets support (#497)
bxyu-nvidia Dec 14, 2025
56ce064
add validation identifiers from https://github.com/NVIDIA-NeMo/Gym/pu…
bxyu-nvidia Dec 14, 2025
4e5e3d9
empty commit for QA
bxyu-nvidia Dec 15, 2025
2 changes: 1 addition & 1 deletion README.md
@@ -155,7 +155,7 @@ Purpose: Training-ready environments with curated datasets.
| Google Search | agent | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-knowledge-web_search-mcqa'>Nemotron-RL-knowledge-web_search-mcqa</a> | Multi-choice question answering problems with search tools integrated | Improve knowledge-related benchmarks with search tools | <a href='resources_servers/google_search/configs/google_search.yaml'>config</a> | ✓ | - | Apache 2.0 |
| Math Advanced Calculations | agent | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-math-advanced_calculations'>Nemotron-RL-math-advanced_calculations</a> | An instruction following math environment with counter-intuitive calculators | Improve instruction following capabilities in specific math environments | <a href='resources_servers/math_advanced_calculations/configs/math_advanced_calculations.yaml'>config</a> | ✓ | - | Apache 2.0 |
| Workplace Assistant | agent | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-agent-workplace_assistant'>Nemotron-RL-agent-workplace_assistant</a> | Workplace assistant multi-step tool-using environment | Improve multi-step tool use capability | <a href='resources_servers/workplace_assistant/configs/workplace_assistant.yaml'>config</a> | ✓ | ✓ | Apache 2.0 |
| Mini Swe Agent | coding | <a href='https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified'>SWE-bench_Verified</a> | A software development environment with mini-swe-agent orchestration | Improve software development capabilities, like SWE-bench | <a href='resources_servers/mini_swe_agent/configs/mini_swe_agent.yaml'>config</a> | ✓ | ✓ | MIT |
| Mini Swe Agent | coding | <a href='https://huggingface.co/datasets/SWE-Gym/SWE-Gym'>SWE-Gym</a> | A software development environment with mini-swe-agent orchestration | Improve software development capabilities, like SWE-bench | <a href='resources_servers/mini_swe_agent/configs/mini_swe_agent.yaml'>config</a> | ✓ | ✓ | MIT |
| Instruction Following | instruction_following | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-instruction_following'>Nemotron-RL-instruction_following</a> | Instruction following datasets targeting IFEval and IFBench style instruction following capabilities | Improve IFEval and IFBench | <a href='resources_servers/instruction_following/configs/instruction_following.yaml'>config</a> | ✓ | - | Apache 2.0 |
| Structured Outputs | instruction_following | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-instruction_following-structured_outputs'>Nemotron-RL-instruction_following-structured_outputs</a> | Check if responses are following structured output requirements in prompts | Improve instruction following capabilities | <a href='resources_servers/structured_outputs/configs/structured_outputs_json.yaml'>config</a> | ✓ | ✓ | Apache 2.0 |
| Equivalence Llm Judge | knowledge | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-knowledge-openQA'>Nemotron-RL-knowledge-openQA</a> | Short answer questions with LLM-as-a-judge | Improve knowledge-related benchmarks like GPQA / HLE | <a href='resources_servers/equivalence_llm_judge/configs/equivalence_llm_judge.yaml'>config</a> | ✓ | - | Apache 2.0 |
13 changes: 8 additions & 5 deletions docs/about/concepts/key-terminology.md
@@ -68,15 +68,18 @@ Reward / Reward Signal
SFT (Supervised Fine-Tuning)
Training approach using examples of good model behavior. Shows successful rollouts as training data.

DPO (Direct Preference Optimization)
Training approach using pairs of rollouts where one is preferred over another. Teaches better vs worse responses.

RL (Reinforcement Learning)
Training approach where models learn through trial-and-error interaction with environments using reward signals.

Online vs Offline Training
- **Online**: Model learns while interacting with environment in real-time (RL)
- **Offline**: Model learns from pre-collected rollout data (SFT/DPO)
- **Online**: Model learns while interacting with environment in real-time
- **Offline**: Model learns from pre-collected rollout data

DPO (Direct Preference Optimization)
An offline RL training approach using pairs of rollouts where one is preferred over another. Teaches better vs worse responses.

GRPO (Group Relative Policy Optimization)
Reinforcement learning algorithm that optimizes policies by comparing groups of rollouts relative to each other. Used for online RL training with language models.
```
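The group-relative idea behind GRPO can be sketched in a few lines (illustrative only; `group_relative_advantages` is a hypothetical function, not part of NeMo Gym or NeMo RL):

```python
# Illustrative sketch of GRPO's group-relative advantage: each rollout in a
# group is scored relative to the group's mean reward, so "better than the
# group" gets a positive signal and "worse than the group" a negative one.

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize a group of rollout rewards to zero mean, unit variance."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Two successful rollouts (reward 1.0) and two failures (reward 0.0):
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# Rollouts above the group mean get positive advantages, those below negative.
```

Because the baseline is the group mean rather than a learned value function, the comparison only ever happens between rollouts of the same prompt group.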

## Interaction Patterns
2 changes: 1 addition & 1 deletion docs/contribute/rl-framework-integration/index.md
@@ -8,7 +8,7 @@ These guides cover how to integrate NeMo Gym into a new RL training framework. U
- Contributing NeMo Gym integration for a training framework that does not have one yet

:::{tip}
Just want to train models? Use {ref}`NeMo RL <rl-training-with-nemo-rl>` instead.
Just want to train models? Use {ref}`NeMo RL <training-nemo-rl-grpo-index>` instead.
:::

## Prerequisites
23 changes: 12 additions & 11 deletions docs/index.md
@@ -108,6 +108,7 @@ Collect and view rollouts

::::

<!-- This section needs to match the content in docs/tutorials/index.md -->
## Tutorials

Hands-on tutorials to build and customize your training environments.
Expand All @@ -120,23 +121,23 @@ Hands-on tutorials to build and customize your training environments.
:link-type: doc
Implement or integrate existing tools and define task verification logic.
+++
{bdg-secondary}`custom-environments` {bdg-secondary}`tools`
{bdg-primary}`beginner` {bdg-secondary}`30 min` {bdg-secondary}`custom-environments` {bdg-secondary}`tools`
:::

:::{grid-item-card} {octicon}`database;1.5em;sd-mr-1` Offline Training (SFT, DPO)
:link: tutorials/offline-training-w-rollouts
:link-type: doc
Train with SFT or DPO using collected rollouts.
:::{grid-item-card} {octicon}`workflow;1.5em;sd-mr-1` Offline Training with Rollouts
:link: offline-training-w-rollouts
:link-type: ref
Transform rollouts into training data for {term}`supervised fine-tuning (SFT) <SFT (Supervised Fine-Tuning)>` and {term}`direct preference optimization (DPO) <DPO (Direct Preference Optimization)>`.
+++
{bdg-secondary}`sft` {bdg-secondary}`dpo`
:::

:::{grid-item-card} {octicon}`zap;1.5em;sd-mr-1` RL Training with NeMo RL
:link: tutorials/rl-training-with-nemo-rl
:link-type: doc
Train with GRPO using NeMo RL and NeMo Gym.
:::{grid-item-card} {octicon}`workflow;1.5em;sd-mr-1` GRPO with NeMo RL
:link: training-nemo-rl-grpo-index
:link-type: ref
Learn how to set up NeMo Gym and NeMo RL training environments, run tests, prepare data, and launch single-node and multi-node training runs.
+++
{bdg-secondary}`grpo` {bdg-secondary}`nemo-rl`
{bdg-primary}`training` {bdg-secondary}`rl` {bdg-secondary}`grpo`
:::

::::
@@ -200,8 +201,8 @@ Rollout Collection <get-started/rollout-collection.md>

tutorials/index.md
tutorials/creating-resource-server
tutorials/nemo-rl-grpo/index.md
tutorials/offline-training-w-rollouts
tutorials/rl-training-with-nemo-rl
```

```{toctree}
107 changes: 96 additions & 11 deletions docs/reference/faq.md
@@ -14,27 +14,27 @@ Tests are strongly encouraged and you must have at least one test for every server


# How To: Upload and download a dataset from HuggingFace
The huggingface client requires that your credentials are in `env.yaml`, along with some other pertinent details needed to upload to the designated place.
```yaml
hf_token: {your huggingface token}
hf_organization: {your huggingface org}
hf_collection_name: {your collection}
hf_collection_slug: {your collection slug} # alphanumeric string found at the end of a collection URI

# optional:
hf_dataset_prefix: str # field to override the default value "NeMo-Gym" prepended to the dataset name
hf_dataset_prefix: str # field to override the default value "Nemotron-RL" prepended to the dataset name
```

Naming convention for Huggingface datasets is as follows.

`{hf_organization}/{hf_dataset_prefix}-{domain}-{resource_server_name}-{your dataset name}`
`{hf_organization}/{hf_dataset_prefix}-{domain}-{resource_server OR dataset_name}`

E.g.:

`NVIDIA/Nemo-Gym-Math-math_with_judge-dapo17k`
`nvidia/Nemotron-RL-math-OpenMathReasoning`


You will only need to manually input the `{your dataset name}` portion of the above when inputting the `dataset_name` flag in the upload command (refer to the command below). Everything preceding it will be automatically populated using your config prior to upload.
You will only need to manually input the `{dataset_name}` portion of the above when passing the `dataset_name` flag in the upload command (refer to the command below). Everything preceding it will be automatically populated from your config prior to upload. Note that `dataset_name` is optional and overrides `resource_server` when provided.
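The naming convention above can be sketched as a small helper (hypothetical; `hf_repo_id` is not part of the NeMo Gym CLI, which assembles these pieces from your config):

```python
# Sketch of the HuggingFace dataset naming convention:
# {hf_organization}/{hf_dataset_prefix}-{domain}-{resource_server OR dataset_name}

def hf_repo_id(hf_organization, domain, resource_server,
               dataset_name=None, hf_dataset_prefix="Nemotron-RL"):
    # dataset_name is optional and overrides resource_server when given
    suffix = dataset_name if dataset_name else resource_server
    return f"{hf_organization}/{hf_dataset_prefix}-{domain}-{suffix}"

print(hf_repo_id("nvidia", "math", "math_with_judge",
                 dataset_name="OpenMathReasoning"))
# nvidia/Nemotron-RL-math-OpenMathReasoning
```

Omitting `dataset_name` falls back to the resource server name, which is why the resource server config (the source of `domain`) is required at upload time.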

To upload to Huggingface, use the below command:
```bash
@@ -47,6 +47,45 @@ ng_upload_dataset_to_hf \

Because of the required dataset nomenclature, the resource server config path is required when uploading. Specifically, `domain` is used in the naming of a dataset in Huggingface.

By default, the `split` parameter for uploading is set to `train`, which will run a check on the required fields `{"responses_create_params", "reward_profiles", "expected_answer"}`. Specifying `validation` or `test` bypasses this check:

```bash
resource_config_path="resources_servers/multineedle/configs/multineedle.yaml"
ng_gitlab_to_hf_dataset \
+dataset_name={your dataset name} \
+input_jsonl_fpath=data/multineedle_benchmark_validation.jsonl \
+resource_config_path=${resource_config_path} \
+split=validation
```
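The train-split field check described above can be sketched as follows (illustrative; `check_row` is a hypothetical name, and the real validation lives inside the upload tooling):

```python
# Sketch of the required-field check applied when split="train".
# Field names come from the FAQ; validation/test splits bypass the check.
import json

REQUIRED_TRAIN_FIELDS = {"responses_create_params", "reward_profiles",
                         "expected_answer"}

def check_row(row: dict, split: str = "train"):
    """Return the set of missing required fields for a single JSONL row."""
    if split != "train":
        return set()  # validation/test rows are not checked
    return REQUIRED_TRAIN_FIELDS - row.keys()

row = json.loads('{"responses_create_params": {}, "reward_profiles": []}')
print(check_row(row))                 # {'expected_answer'}
print(check_row(row, "validation"))   # set()
```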

## Uploading with Pull Request workflow
When uploading to an organization repository where you don't have direct write access (e.g., nvidia/), use the `+create_pr=true` flag to create a Pull Request instead of pushing directly. You can also customize the commit message and description.

If you want to specify the revision (branch name), add the `+revision={your branch name}` flag. Excluding `create_pr` (or setting it to `false`) assumes `revision` names an existing branch; setting it to `true` creates a brand-new branch to back the Pull Request.

```bash
ng_upload_dataset_to_hf \
+dataset_name=OpenMathReasoning \
+input_jsonl_fpath=data/validation.jsonl \
+resource_config_path=${resource_config_path} \
+split=validation \
+create_pr=true \
+revision=my-branch-name \
+commit_message="Add validation set" \
+commit_description="Includes 545 examples"
```

The command will output a link to the created Pull Request:
```bash
[Nemo-Gym] - Pull Request created: https://huggingface.co/datasets/nvidia/Nemotron-RL-math-OpenMathReasoning/discussions/1
```

:::{note}
The `commit_message` and `commit_description` parameters work for both direct pushes and Pull Requests. If not provided, HuggingFace auto-generates a commit message based on the filename.
:::
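The interplay of `create_pr` and `revision` described above can be summarized as follows (hypothetical helper for illustration; the real logic lives in `ng_upload_dataset_to_hf`):

```python
# Sketch of where an upload lands given the create_pr / revision flags.

def resolve_push_target(create_pr=False, revision=None):
    """Describe the destination of an upload as (mode, branch)."""
    if create_pr:
        # a brand-new branch backs the Pull Request
        return ("pull_request", revision or "auto-generated branch")
    # without create_pr, revision must name an existing branch
    # ("main" is the default branch on HuggingFace repos)
    return ("direct_push", revision or "main")

print(resolve_push_target(create_pr=True, revision="my-branch-name"))
# ('pull_request', 'my-branch-name')
print(resolve_push_target())
# ('direct_push', 'main')
```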


## Deleting Datasets from Gitlab
You can optionally pass a `+delete_from_gitlab=true` flag to the above command, which deletes the corresponding model and all of its artifacts from GitLab. By default, this is set to `false`.
```bash
resource_config_path="resources_servers/multineedle/configs/multineedle.yaml"
@@ -59,7 +98,7 @@ ng_upload_dataset_to_hf \

There will be a confirmation dialog to confirm the deletion:
```bash
[Nemo-Gym] - Dataset uploaded successful
[Nemo-Gym] - Dataset upload successful
[Nemo-Gym] - Found model 'fs-test' in the registry. Are you sure you want to delete it from Gitlab? [y/N]:
```

@@ -83,13 +122,28 @@ ng_delete_dataset_from_gitlab \
GitLab model names are case-sensitive. There can be models named 'My_Model' and 'my_model' living simultaneously in the registry. When uploading to HuggingFace with the intention of deleting GitLab artifacts, be sure the casing of your HuggingFace dataset name matches that of GitLab's.
:::


## Downloading Datasets from Huggingface
Downloading a dataset from Huggingface is straightforward:

**For structured datasets (with train/validation/test splits):**
```bash
ng_download_dataset_from_hf \
+repo_id=NVIDIA/NeMo-Gym-Instruction_Following-multineedle-{your dataset name} \
+artifact_fpath=multineedle_benchmark.jsonl \
+output_fpath=data/multineedle_benchmark_hf.jsonl
+repo_id=nvidia/Nemotron-RL-knowledge-mcqa \
+output_dirpath=data/mcqa \
+split=train
```
The `split` parameter is optional. If omitted, all available splits will be downloaded as separate JSONL files.


**For raw file repositories (with specific JSONL files):**
```bash
ng_download_dataset_from_hf \
+repo_id=nvidia/Nemotron-RL-instruction_following \
+output_dirpath=data/instruction_following \
+artifact_fpath=instruction_following.jsonl
```
Use `artifact_fpath` when the HuggingFace repo contains raw/arbitrary JSONL files rather than structured dataset splits. You cannot specify both `split` and `artifact_fpath`.
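The split-versus-artifact selection above can be sketched as follows (illustrative; `resolve_download_mode` is a hypothetical name, and the real CLI is `ng_download_dataset_from_hf`):

```python
# Sketch of how the download flags interact: split and artifact_fpath are
# mutually exclusive, and omitting both downloads every available split.

def resolve_download_mode(split=None, artifact_fpath=None):
    if split and artifact_fpath:
        raise ValueError("Specify either split or artifact_fpath, not both")
    if artifact_fpath:
        return ("raw_file", artifact_fpath)   # raw/arbitrary JSONL repo
    if split:
        return ("split", split)               # one structured split
    return ("all_splits", None)               # every split, one JSONL each

print(resolve_download_mode(split="train"))   # ('split', 'train')
print(resolve_download_mode())                # ('all_splits', None)
```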


# How To: Prepare and validate data for PR submission or RL training
@@ -120,6 +174,9 @@ example_multi_step_simple_agent:
dataset_name: example_multi_step
version: 0.0.1
artifact_fpath: example_multi_step/train.jsonl
huggingface_identifier:
repo_id: nvidia/Nemotron-RL-instruction_following
artifact_fpath: instruction_following.jsonl
license: Apache 2.0
- name: validation
type: validation
Expand All @@ -130,6 +187,9 @@ example_multi_step_simple_agent:
dataset_name: example_multi_step
version: 0.0.1
artifact_fpath: example_multi_step/validation.jsonl
huggingface_identifier:
repo_id: nvidia/Nemotron-RL-instruction_following
artifact_fpath: if_validation.jsonl
license: Apache 2.0
- name: example
type: example
@@ -142,7 +202,8 @@ A dataset object consists of:
- Type: train, validation, or example. Train and validation are used by NeMo RL and other training frameworks. More information about the example type is in the next section.
- JSONL fpath: the local file path to your JSONL file for this dataset.
- Num repeats: optionally repeat each row when preparing or collating data. Defaults to 1 if unspecified.
- Gitlab identifier: The remote path to the dataset as held in the Gitlab dataset registry. This field is required for train and validation datasets. (Not required for example datasets since those are required to be committed to Git).
- Gitlab identifier: (NVIDIA internal) The remote path to the dataset as held in the Gitlab dataset registry. This field is required for train and validation datasets. (Not required for example datasets since those are required to be committed to Git).
- HuggingFace identifier: (Public) The remote path to the dataset on HuggingFace. Contains `repo_id` (required) and optionally `artifact_fpath` for raw file repos. If `artifact_fpath` is omitted, the datasets library will infer the `split` from the dataset `type`.
- License: The license of that dataset. Required for train and validation datasets and not required for example datasets, similar in principle to the Gitlab identifier.
- Start idx, end idx: used for slicing your dataset.
```yaml
Expand All @@ -153,6 +214,9 @@ A dataset object consists of:
dataset_name: example_multi_step
version: 0.0.1
artifact_fpath: example_multi_step/validation.jsonl
huggingface_identifier:
repo_id: nvidia/example_multi_step
artifact_fpath: example_validation.jsonl
license: Apache 2.0
```
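The slicing and repeat fields above combine roughly as follows (illustrative sketch; `prepare_rows` is a hypothetical name, and the actual behavior is implemented by `ng_prepare_data`):

```python
# Sketch of how start idx / end idx and num repeats apply to a dataset:
# slice first, then repeat each surviving row in place.

def prepare_rows(rows, start_idx=None, end_idx=None, num_repeats=1):
    """Slice with start/end idx, then repeat each row num_repeats times."""
    sliced = rows[start_idx:end_idx]
    return [row for row in sliced for _ in range(num_repeats)]

rows = [{"id": i} for i in range(5)]
print(prepare_rows(rows, start_idx=1, end_idx=3, num_repeats=2))
# [{'id': 1}, {'id': 1}, {'id': 2}, {'id': 2}]
```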

Expand All @@ -165,11 +229,32 @@ responses_api_models/openai_model/configs/openai_model.yaml"
ng_prepare_data "+config_paths=[$config_paths]" \
+output_dirpath=data/example_multi_step \
+mode=example_validation
```

# Run NeMo Gym servers the exact same way with the same configs!
To download missing datasets automatically, add `+should_download=true`. By default, datasets are downloaded from HuggingFace:
```bash
ng_prepare_data "+config_paths=[$config_paths]" \
+output_dirpath=data/example_multi_step \
+mode=train_preparation \
+should_download=true
```

For NVIDIA internal users, you can download from GitLab instead:

```bash
ng_prepare_data "+config_paths=[$config_paths]" \
+output_dirpath=data/example_multi_step \
+mode=train_preparation \
+should_download=true \
+data_source=gitlab
```

Run NeMo Gym servers the exact same way with the same configs!
```bash
ng_run "+config_paths=[$config_paths]"
```


The `ng_prepare_data` command will:
1. Attempt to load all the datasets you specified from disk. Missing datasets will be reported before any processing is done.
2. For each dataset, read example by example. Check the format and report the filepaths and indices/ranges of offending examples if any.