Merged
35 commits
799cf60
remove old tutorial
bxyu-nvidia Dec 11, 2025
1defada
Merge branch 'main' of https://github.com/NVIDIA-NeMo/Gym into bxyu/e…
bxyu-nvidia Dec 11, 2025
9465ccb
add placeholder files
bxyu-nvidia Dec 11, 2025
909ef78
add grpo to key terminology
bxyu-nvidia Dec 11, 2025
51fb276
move
bxyu-nvidia Dec 11, 2025
1e643b2
fix order
bxyu-nvidia Dec 11, 2025
af5cc42
fix link
bxyu-nvidia Dec 11, 2025
5cc5190
add temp files
bxyu-nvidia Dec 11, 2025
5bdcafb
merge
bxyu-nvidia Dec 11, 2025
1281d1d
clean
bxyu-nvidia Dec 11, 2025
bb34cdf
shorten
bxyu-nvidia Dec 11, 2025
671584d
reorganize
bxyu-nvidia Dec 11, 2025
9d23dce
add links
bxyu-nvidia Dec 11, 2025
9cf3297
impl
bxyu-nvidia Dec 11, 2025
c981d1a
clean
bxyu-nvidia Dec 11, 2025
5afb380
about workplace assistant
bxyu-nvidia Dec 11, 2025
29d08cb
gym config todo
bxyu-nvidia Dec 11, 2025
7ca97b8
gym config
bxyu-nvidia Dec 11, 2025
e47c780
nemo rl configuration
bxyu-nvidia Dec 11, 2025
cffe67d
start setup
bxyu-nvidia Dec 11, 2025
3282478
setup
bxyu-nvidia Dec 11, 2025
3a98f9f
clean
bxyu-nvidia Dec 11, 2025
7d8bf7f
copy over remaining info
bxyu-nvidia Dec 11, 2025
935fabe
style edits (#495)
lbliii Dec 11, 2025
fccf97b
Merge branch 'main' of https://github.com/NVIDIA-NeMo/Gym into bxyu/e…
bxyu-nvidia Dec 12, 2025
58219ad
clean single node
bxyu-nvidia Dec 12, 2025
74d07b3
clean multi node
bxyu-nvidia Dec 12, 2025
b2dd897
fix ref
bxyu-nvidia Dec 12, 2025
c1ba2fc
clean
bxyu-nvidia Dec 12, 2025
295c265
clean
bxyu-nvidia Dec 12, 2025
82e06a6
Merge branch 'main' into bxyu/e2e-grpo-tut
lbliii Dec 12, 2025
cd72d6a
Merge branch 'main' of https://github.com/NVIDIA-NeMo/Gym into bxyu/e…
bxyu-nvidia Dec 12, 2025
c63b1c7
End-to-end GRPO tutorial - HuggingFace datasets support (#497)
bxyu-nvidia Dec 14, 2025
56ce064
add validation identifiers from https://github.com/NVIDIA-NeMo/Gym/pu…
bxyu-nvidia Dec 14, 2025
4e5e3d9
empty commit for QA
bxyu-nvidia Dec 15, 2025
2 changes: 1 addition & 1 deletion README.md
@@ -155,7 +155,7 @@ Purpose: Training-ready environments with curated datasets.
| Google Search | agent | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-knowledge-web_search-mcqa'>Nemotron-RL-knowledge-web_search-mcqa</a> | Multi-choice question answering problems with search tools integrated | Improve knowledge-related benchmarks with search tools | <a href='resources_servers/google_search/configs/google_search.yaml'>config</a> | ✓ | - | Apache 2.0 |
| Math Advanced Calculations | agent | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-math-advanced_calculations'>Nemotron-RL-math-advanced_calculations</a> | An instruction following math environment with counter-intuitive calculators | Improve instruction following capabilities in specific math environments | <a href='resources_servers/math_advanced_calculations/configs/math_advanced_calculations.yaml'>config</a> | ✓ | - | Apache 2.0 |
| Workplace Assistant | agent | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-agent-workplace_assistant'>Nemotron-RL-agent-workplace_assistant</a> | Workplace assistant multi-step tool-using environment | Improve multi-step tool use capability | <a href='resources_servers/workplace_assistant/configs/workplace_assistant.yaml'>config</a> | ✓ | ✓ | Apache 2.0 |
| Mini Swe Agent | coding | <a href='https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified'>SWE-bench_Verified</a> | A software development environment with mini-swe-agent orchestration | Improve software development capabilities, like SWE-bench | <a href='resources_servers/mini_swe_agent/configs/mini_swe_agent.yaml'>config</a> | ✓ | ✓ | MIT |
| Mini Swe Agent | coding | <a href='https://huggingface.co/datasets/SWE-Gym/SWE-Gym'>SWE-Gym</a> | A software development environment with mini-swe-agent orchestration | Improve software development capabilities, like SWE-bench | <a href='resources_servers/mini_swe_agent/configs/mini_swe_agent.yaml'>config</a> | ✓ | ✓ | MIT |
| Instruction Following | instruction_following | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-instruction_following'>Nemotron-RL-instruction_following</a> | Instruction following datasets targeting IFEval and IFBench style instruction following capabilities | Improve IFEval and IFBench | <a href='resources_servers/instruction_following/configs/instruction_following.yaml'>config</a> | ✓ | - | Apache 2.0 |
| Structured Outputs | instruction_following | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-instruction_following-structured_outputs'>Nemotron-RL-instruction_following-structured_outputs</a> | Check if responses are following structured output requirements in prompts | Improve instruction following capabilities | <a href='resources_servers/structured_outputs/configs/structured_outputs_json.yaml'>config</a> | ✓ | ✓ | Apache 2.0 |
| Equivalence Llm Judge | knowledge | <a href='https://huggingface.co/datasets/nvidia/Nemotron-RL-knowledge-openQA'>Nemotron-RL-knowledge-openQA</a> | Short answer questions with LLM-as-a-judge | Improve knowledge-related benchmarks like GPQA / HLE | <a href='resources_servers/equivalence_llm_judge/configs/equivalence_llm_judge.yaml'>config</a> | ✓ | - | Apache 2.0 |
13 changes: 8 additions & 5 deletions docs/about/concepts/key-terminology.md
@@ -68,15 +68,18 @@ Reward / Reward Signal
SFT (Supervised Fine-Tuning)
Training approach using examples of good model behavior. Shows successful rollouts as training data.

DPO (Direct Preference Optimization)
Training approach using pairs of rollouts where one is preferred over another. Teaches better vs worse responses.

RL (Reinforcement Learning)
Training approach where models learn through trial-and-error interaction with environments using reward signals.

Online vs Offline Training
- **Online**: Model learns while interacting with environment in real-time (RL)
- **Offline**: Model learns from pre-collected rollout data (SFT/DPO)
- **Online**: Model learns while interacting with environment in real-time
- **Offline**: Model learns from pre-collected rollout data

DPO (Direct Preference Optimization)
An offline RL training approach using pairs of rollouts where one is preferred over another. Teaches better vs worse responses.

GRPO (Group Relative Policy Optimization)
Reinforcement learning algorithm that optimizes policies by comparing groups of rollouts relative to each other. Used for online RL training with language models.
```
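The group-relative idea behind GRPO can be sketched in a few lines (illustrative only; `group_relative_advantages` is a hypothetical function, not part of NeMo Gym or NeMo RL):

```python
# Illustrative sketch of GRPO's group-relative advantage: each rollout in a
# group is scored relative to the group's mean reward, so "better than the
# group" gets a positive signal and "worse than the group" a negative one.

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize a group of rollout rewards to zero mean, unit variance."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Two successful rollouts (reward 1.0) and two failures (reward 0.0):
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# Rollouts above the group mean get positive advantages, those below negative.
```

Because the baseline is the group mean rather than a learned value function, the comparison only ever happens between rollouts of the same prompt group.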

## Interaction Patterns
2 changes: 1 addition & 1 deletion docs/contribute/rl-framework-integration/index.md
@@ -8,7 +8,7 @@ These guides cover how to integrate NeMo Gym into a new RL training framework. U
- Contributing NeMo Gym integration for a training framework that does not have one yet

:::{tip}
Just want to train models? Use {ref}`NeMo RL <rl-training-with-nemo-rl>` instead.
Just want to train models? Use {ref}`NeMo RL <training-nemo-rl-grpo-index>` instead.
:::

## Prerequisites
23 changes: 12 additions & 11 deletions docs/index.md
@@ -108,6 +108,7 @@ Collect and view rollouts

::::

<!-- This section needs to match the content in docs/tutorials/index.md -->
## Tutorials

Hands-on tutorials to build and customize your training environments.
Expand All @@ -120,23 +121,23 @@ Hands-on tutorials to build and customize your training environments.
:link-type: doc
Implement or integrate existing tools and define task verification logic.
+++
{bdg-secondary}`custom-environments` {bdg-secondary}`tools`
{bdg-primary}`beginner` {bdg-secondary}`30 min` {bdg-secondary}`custom-environments` {bdg-secondary}`tools`
:::

:::{grid-item-card} {octicon}`database;1.5em;sd-mr-1` Offline Training (SFT, DPO)
:link: tutorials/offline-training-w-rollouts
:link-type: doc
Train with SFT or DPO using collected rollouts.
:::{grid-item-card} {octicon}`workflow;1.5em;sd-mr-1` Offline Training with Rollouts
:link: offline-training-w-rollouts
:link-type: ref
Transform rollouts into training data for {term}`supervised fine-tuning (SFT) <SFT (Supervised Fine-Tuning)>` and {term}`direct preference optimization (DPO) <DPO (Direct Preference Optimization)>`.
+++
{bdg-secondary}`sft` {bdg-secondary}`dpo`
:::

:::{grid-item-card} {octicon}`zap;1.5em;sd-mr-1` RL Training with NeMo RL
:link: tutorials/rl-training-with-nemo-rl
:link-type: doc
Train with GRPO using NeMo RL and NeMo Gym.
:::{grid-item-card} {octicon}`workflow;1.5em;sd-mr-1` GRPO with NeMo RL
:link: training-nemo-rl-grpo-index
:link-type: ref
Learn how to set up NeMo Gym and NeMo RL training environments, run tests, prepare data, and launch single-node and multi-node training runs.
+++
{bdg-secondary}`grpo` {bdg-secondary}`nemo-rl`
{bdg-primary}`training` {bdg-secondary}`rl` {bdg-secondary}`grpo`
:::

::::
@@ -200,8 +201,8 @@ Rollout Collection <get-started/rollout-collection.md>

tutorials/index.md
tutorials/creating-resource-server
tutorials/nemo-rl-grpo/index.md
tutorials/offline-training-w-rollouts
tutorials/rl-training-with-nemo-rl
```

```{toctree}
107 changes: 96 additions & 11 deletions docs/reference/faq.md
@@ -14,27 +14,27 @@ Tests are strongly encouraged and you must have at least one test for every server


# How To: Upload and download a dataset from HuggingFace
The huggingface client requires that your credentials are in `env.yaml`, along with some other pertinent details needed to upload to the designated place.
```yaml
hf_token: {your huggingface token}
hf_organization: {your huggingface org}
hf_collection_name: {your collection}
hf_collection_slug: {your collection slug} # alphanumeric string found at the end of a collection URI

# optional:
hf_dataset_prefix: str # field to override the default value "NeMo-Gym" prepended to the dataset name
hf_dataset_prefix: str # field to override the default value "Nemotron-RL" prepended to the dataset name
```

Naming convention for Huggingface datasets is as follows.

`{hf_organization}/{hf_dataset_prefix}-{domain}-{resource_server_name}-{your dataset name}`
`{hf_organization}/{hf_dataset_prefix}-{domain}-{resource_server OR dataset_name}`

E.g.:

`NVIDIA/Nemo-Gym-Math-math_with_judge-dapo17k`
`nvidia/Nemotron-RL-math-OpenMathReasoning`


You will only need to manually input the `{your dataset name}` portion of the above when inputting the `dataset_name` flag in the upload command (refer to the command below). Everything preceding it will be automatically populated using your config prior to upload.
You will only need to manually input the `{dataset_name}` portion of the above when passing the `dataset_name` flag in the upload command (refer to the command below). Everything preceding it will be automatically populated from your config prior to upload. Note that `dataset_name` is optional and overrides `resource_server` when provided.
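The naming convention above can be sketched as a small helper (hypothetical; `hf_repo_id` is not part of the NeMo Gym CLI, which assembles these pieces from your config):

```python
# Sketch of the HuggingFace dataset naming convention:
# {hf_organization}/{hf_dataset_prefix}-{domain}-{resource_server OR dataset_name}

def hf_repo_id(hf_organization, domain, resource_server,
               dataset_name=None, hf_dataset_prefix="Nemotron-RL"):
    # dataset_name is optional and overrides resource_server when given
    suffix = dataset_name if dataset_name else resource_server
    return f"{hf_organization}/{hf_dataset_prefix}-{domain}-{suffix}"

print(hf_repo_id("nvidia", "math", "math_with_judge",
                 dataset_name="OpenMathReasoning"))
# nvidia/Nemotron-RL-math-OpenMathReasoning
```

Omitting `dataset_name` falls back to the resource server name, which is why the resource server config (the source of `domain`) is required at upload time.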

To upload to Huggingface, use the below command:
```bash
@@ -47,6 +47,45 @@ ng_upload_dataset_to_hf \

Because of the required dataset nomenclature, the resource server config path is required when uploading. Specifically, `domain` is used in the naming of a dataset in Huggingface.

By default, the `split` parameter for uploading is set to `train`, which will run a check on the required fields `{"responses_create_params", "reward_profiles", "expected_answer"}`. Specifying `validation` or `test` bypasses this check:

```bash
resource_config_path="resources_servers/multineedle/configs/multineedle.yaml"
ng_gitlab_to_hf_dataset \
+dataset_name={your dataset name} \
+input_jsonl_fpath=data/multineedle_benchmark_validation.jsonl \
+resource_config_path=${resource_config_path} \
+split=validation
```
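The train-split field check described above can be sketched as follows (illustrative; `check_row` is a hypothetical name, and the real validation lives inside the upload tooling):

```python
# Sketch of the required-field check applied when split="train".
# Field names come from the FAQ; validation/test splits bypass the check.
import json

REQUIRED_TRAIN_FIELDS = {"responses_create_params", "reward_profiles",
                         "expected_answer"}

def check_row(row: dict, split: str = "train"):
    """Return the set of missing required fields for a single JSONL row."""
    if split != "train":
        return set()  # validation/test rows are not checked
    return REQUIRED_TRAIN_FIELDS - row.keys()

row = json.loads('{"responses_create_params": {}, "reward_profiles": []}')
print(check_row(row))                 # {'expected_answer'}
print(check_row(row, "validation"))   # set()
```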

## Uploading with Pull Request workflow
When uploading to an organization repository where you don't have direct write access (e.g., nvidia/), use the `+create_pr=true` flag to create a Pull Request instead of pushing directly. You can also customize the commit message and description.

If you want to specify the revision (branch name), add the `+revision={your branch name}` flag. Excluding `create_pr` (or setting it to `false`) assumes `revision` names an existing branch; setting it to `true` creates a brand-new branch to back the Pull Request.

```bash
ng_upload_dataset_to_hf \
+dataset_name=OpenMathReasoning \
+input_jsonl_fpath=data/validation.jsonl \
+resource_config_path=${resource_config_path} \
+split=validation \
+create_pr=true \
+revision=my-branch-name \
+commit_message="Add validation set" \
+commit_description="Includes 545 examples"
```

The command will output a link to the created Pull Request:
```bash
[Nemo-Gym] - Pull Request created: https://huggingface.co/datasets/nvidia/Nemotron-RL-math-OpenMathReasoning/discussions/1
```

:::{note}
The `commit_message` and `commit_description` parameters work for both direct pushes and Pull Requests. If not provided, HuggingFace auto-generates a commit message based on the filename.
:::
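The interplay of `create_pr` and `revision` described above can be summarized as follows (hypothetical helper for illustration; the real logic lives in `ng_upload_dataset_to_hf`):

```python
# Sketch of where an upload lands given the create_pr / revision flags.

def resolve_push_target(create_pr=False, revision=None):
    """Describe the destination of an upload as (mode, branch)."""
    if create_pr:
        # a brand-new branch backs the Pull Request
        return ("pull_request", revision or "auto-generated branch")
    # without create_pr, revision must name an existing branch
    # ("main" is the default branch on HuggingFace repos)
    return ("direct_push", revision or "main")

print(resolve_push_target(create_pr=True, revision="my-branch-name"))
# ('pull_request', 'my-branch-name')
print(resolve_push_target())
# ('direct_push', 'main')
```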


## Deleting Datasets from Gitlab
You can optionally pass a `+delete_from_gitlab=true` flag to the above command, which deletes the corresponding model and all of its artifacts from GitLab. By default, this is set to `false`.
```bash
resource_config_path="resources_servers/multineedle/configs/multineedle.yaml"
@@ -59,7 +98,7 @@ ng_upload_dataset_to_hf \

There will be a confirmation dialog to confirm the deletion:
```bash
[Nemo-Gym] - Dataset uploaded successful
[Nemo-Gym] - Dataset upload successful
[Nemo-Gym] - Found model 'fs-test' in the registry. Are you sure you want to delete it from Gitlab? [y/N]:
```

@@ -83,13 +122,28 @@ ng_delete_dataset_from_gitlab \
GitLab model names are case-sensitive. There can be models named 'My_Model' and 'my_model' living simultaneously in the registry. When uploading to HuggingFace with the intention of deleting GitLab artifacts, be sure the casing of your HuggingFace dataset name matches that of GitLab's.
:::


## Downloading Datasets from Huggingface
Downloading a dataset from Huggingface is straightforward:

**For structured datasets (with train/validation/test splits):**
```bash
ng_download_dataset_from_hf \
+repo_id=NVIDIA/NeMo-Gym-Instruction_Following-multineedle-{your dataset name} \
+artifact_fpath=multineedle_benchmark.jsonl \
+output_fpath=data/multineedle_benchmark_hf.jsonl
+repo_id=nvidia/Nemotron-RL-knowledge-mcqa \
+output_dirpath=data/mcqa \
+split=train
```
The `split` parameter is optional. If omitted, all available splits will be downloaded as separate JSONL files.


**For raw file repositories (with specific JSONL files):**
```bash
ng_download_dataset_from_hf \
+repo_id=nvidia/Nemotron-RL-instruction_following \
+output_dirpath=data/instruction_following \
+artifact_fpath=instruction_following.jsonl
```
Use `artifact_fpath` when the HuggingFace repo contains raw/arbitrary JSONL files rather than structured dataset splits. You cannot specify both `split` and `artifact_fpath`.
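The split-versus-artifact selection above can be sketched as follows (illustrative; `resolve_download_mode` is a hypothetical name, and the real CLI is `ng_download_dataset_from_hf`):

```python
# Sketch of how the download flags interact: split and artifact_fpath are
# mutually exclusive, and omitting both downloads every available split.

def resolve_download_mode(split=None, artifact_fpath=None):
    if split and artifact_fpath:
        raise ValueError("Specify either split or artifact_fpath, not both")
    if artifact_fpath:
        return ("raw_file", artifact_fpath)   # raw/arbitrary JSONL repo
    if split:
        return ("split", split)               # one structured split
    return ("all_splits", None)               # every split, one JSONL each

print(resolve_download_mode(split="train"))   # ('split', 'train')
print(resolve_download_mode())                # ('all_splits', None)
```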


# How To: Prepare and validate data for PR submission or RL training
@@ -120,6 +174,9 @@ example_multi_step_simple_agent:
dataset_name: example_multi_step
version: 0.0.1
artifact_fpath: example_multi_step/train.jsonl
huggingface_identifier:
repo_id: nvidia/Nemotron-RL-instruction_following
artifact_fpath: instruction_following.jsonl
license: Apache 2.0
- name: validation
type: validation
Expand All @@ -130,6 +187,9 @@ example_multi_step_simple_agent:
dataset_name: example_multi_step
version: 0.0.1
artifact_fpath: example_multi_step/validation.jsonl
huggingface_identifier:
repo_id: nvidia/Nemotron-RL-instruction_following
artifact_fpath: if_validation.jsonl
license: Apache 2.0
- name: example
type: example
@@ -142,7 +202,8 @@ A dataset object consists of:
- Type: train, validation, or example. Train and validation are used by NeMo RL and other training frameworks. More information about the example type is in the next section.
- JSONL fpath: the local file path to your JSONL file for this dataset.
- Num repeats: optionally repeat each row when preparing or collating data. Defaults to 1 if unspecified.
- Gitlab identifier: The remote path to the dataset as held in the Gitlab dataset registry. This field is required for train and validation datasets. (Not required for example datasets since those are required to be committed to Git).
- Gitlab identifier: (NVIDIA internal) The remote path to the dataset as held in the Gitlab dataset registry. This field is required for train and validation datasets. (Not required for example datasets since those are required to be committed to Git).
- HuggingFace identifier: (Public) The remote path to the dataset on HuggingFace. Contains `repo_id` (required) and optionally `artifact_fpath` for raw file repos. If `artifact_fpath` is omitted, the datasets library will infer the `split` from the dataset `type`.
- License: The license of that dataset. Required for train and validation datasets and not required for example datasets, similar in principle to the Gitlab identifier.
- Start idx, end idx: used for slicing your dataset.
```yaml
Expand All @@ -153,6 +214,9 @@ A dataset object consists of:
dataset_name: example_multi_step
version: 0.0.1
artifact_fpath: example_multi_step/validation.jsonl
huggingface_identifier:
repo_id: nvidia/example_multi_step
artifact_fpath: example_validation.jsonl
license: Apache 2.0
```
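The slicing and repeat fields above combine roughly as follows (illustrative sketch; `prepare_rows` is a hypothetical name, and the actual behavior is implemented by `ng_prepare_data`):

```python
# Sketch of how start idx / end idx and num repeats apply to a dataset:
# slice first, then repeat each surviving row in place.

def prepare_rows(rows, start_idx=None, end_idx=None, num_repeats=1):
    """Slice with start/end idx, then repeat each row num_repeats times."""
    sliced = rows[start_idx:end_idx]
    return [row for row in sliced for _ in range(num_repeats)]

rows = [{"id": i} for i in range(5)]
print(prepare_rows(rows, start_idx=1, end_idx=3, num_repeats=2))
# [{'id': 1}, {'id': 1}, {'id': 2}, {'id': 2}]
```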

Expand All @@ -165,11 +229,32 @@ responses_api_models/openai_model/configs/openai_model.yaml"
ng_prepare_data "+config_paths=[$config_paths]" \
+output_dirpath=data/example_multi_step \
+mode=example_validation
```

# Run NeMo Gym servers the exact same way with the same configs!
To download missing datasets automatically, add `+should_download=true`. By default, datasets are downloaded from HuggingFace:
```bash
ng_prepare_data "+config_paths=[$config_paths]" \
+output_dirpath=data/example_multi_step \
+mode=train_preparation \
+should_download=true
```

For NVIDIA internal users, you can download from GitLab instead:

```bash
ng_prepare_data "+config_paths=[$config_paths]" \
+output_dirpath=data/example_multi_step \
+mode=train_preparation \
+should_download=true \
+data_source=gitlab
```

Run NeMo Gym servers the exact same way with the same configs!
```bash
ng_run "+config_paths=[$config_paths]"
```


The `ng_prepare_data` command will:
1. Attempt to load all the datasets you specified from disk. Missing datasets will be reported before any processing is done.
2. For each dataset, read example by example. Check the format and report the filepaths and indices/ranges of offending examples if any.