From 960723ad71910778877209b5e36abc26779f6efd Mon Sep 17 00:00:00 2001
From: Lawrence Lane
Date: Tue, 9 Dec 2025 12:29:15 -0500
Subject: [PATCH 1/2] docs(ecosystem): add NeMo Data Designer section with Gym differentiation and usage guidance

Signed-off-by: Lawrence Lane
---
 docs/about/ecosystem.md | 66 +++++++++++++++++++++++++++++++++++++++++
 docs/conf.py            |  7 +++++
 2 files changed, 73 insertions(+)

diff --git a/docs/about/ecosystem.md b/docs/about/ecosystem.md
index 8947c3609..7af8aa823 100644
--- a/docs/about/ecosystem.md
+++ b/docs/about/ecosystem.md
@@ -19,8 +19,74 @@ The [NeMo Framework](https://github.com/NVIDIA-NeMo) is NVIDIA's GPU-accelerated
 * **NeMo RL**: Scalable reinforcement learning toolkit
 * **NeMo Gym**: RL environment infrastructure and rollout collection (this project)
 * **NeMo Curator**: Data preprocessing and curation
+* **NeMo Data Designer**: Synthetic data generation for post-training
 * **NeMo Evaluator**: Model evaluation and benchmarking
 * **NeMo Guardrails**: Programmable safety guardrails
 * And more...
 
 **NeMo Gym's Role**: Within this ecosystem, Gym focuses on standardizing scalable rollout collection for RL training. It provides unified interfaces to heterogeneous RL environments and curated resource servers with verification logic. This makes it practical to generate large-scale, high-quality training data for NeMo RL and other training frameworks.
+
+---
+
+## NeMo Gym and NeMo Data Designer
+
+[NeMo Data Designer](https://nvidia-nemo.github.io/DataDesigner/) is a general framework for generating high-quality synthetic data from scratch or using seed data. Both tools generate training data, but they serve different use cases and employ different generation strategies.
+
+### Key Differences
+
+::::{grid} 1 1 2 2
+:gutter: 3
+
+:::{grid-item-card} NeMo Data Designer
+**Synthetic data generation**
+
+Generates training data using LLM prompting combined with statistical samplers. Data Designer excels at creating diverse datasets with controlled distributions, meaningful correlations between fields, and built-in validation.
+
+**Best for:**
+- Generating diverse post-training datasets at scale
+- Creating data with specific statistical properties
+- Simulating tool call patterns and responses
+- Rapid iteration on data characteristics
+:::
+
+:::{grid-item-card} NeMo Gym
+**Real environment interactions**
+
+Generates training data through actual interactions with live environments. Gym executes real tool calls, runs actual verification logic, and produces reward scores from genuine environment feedback.
+
+**Best for:**
+- Collecting rollouts with verified reward signals
+- Training agents that need real tool execution
+- Environments requiring actual API calls or code execution
+- RL training with ground-truth verification
+:::
+
+::::
+
+### When to Use Each Tool
+
+| Use Case | Recommended Tool |
+|----------|------------------|
+| Generate diverse SFT data with controlled distributions | Data Designer |
+| Simulate tool calling patterns for initial training | Data Designer |
+| Collect RL rollouts with real reward signals | Gym |
+| Execute actual tools (code, APIs, search) during generation | Gym |
+| Create datasets with statistical diversity guarantees | Data Designer |
+| Train agents on verified real-world interactions | Gym |
+| Rapid prototyping of data characteristics | Data Designer |
+| Ground-truth verification from live environments | Gym |
+
+### Complementary Workflows
+
+Data Designer and Gym complement each other in a typical training pipeline:
+
+1. **Bootstrap with Data Designer**: Generate initial synthetic datasets for supervised fine-tuning (SFT). Use statistical samplers to ensure diversity and LLM columns to create realistic tool-calling patterns.
+
+2. **Refine with Gym**: Transition to Gym for reinforcement learning. Collect rollouts from real environment interactions where tool calls execute against actual systems and verification produces ground-truth rewards.
+
+3. **Iterate**: Use insights from Gym rollouts to refine Data Designer configurations for the next training cycle.
+
+:::{seealso}
+- [NeMo Data Designer Documentation](https://nvidia-nemo.github.io/DataDesigner/)
+- [NeMo Data Designer GitHub Repository](https://github.com/NVIDIA-NeMo/DataDesigner)
+:::
diff --git a/docs/conf.py b/docs/conf.py
index 5b6bdc379..1079f9cd1 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -84,6 +84,13 @@ html_theme = "nvidia_sphinx_theme"
 
 
 html_theme_options = {
+    "icon_links": [
+        {
+            "name": "GitHub",
+            "url": "https://github.com/NVIDIA-NeMo/Gym",
+            "icon": "fa-brands fa-github",
+        }
+    ],
     "switcher": {
         "json_url": "../versions1.json",
         "version_match": release,

From 93d4769ba16fae3aecaf17b8f8b968aa2a13e353 Mon Sep 17 00:00:00 2001
From: Lawrence Lane
Date: Tue, 9 Dec 2025 12:30:50 -0500
Subject: [PATCH 2/2] docs(ecosystem): replace grid cards with list-table for key differences

Signed-off-by: Lawrence Lane
---
 docs/about/ecosystem.md | 51 +++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 28 deletions(-)

diff --git a/docs/about/ecosystem.md b/docs/about/ecosystem.md
index 7af8aa823..473834466 100644
--- a/docs/about/ecosystem.md
+++ b/docs/about/ecosystem.md
@@ -34,34 +34,29 @@ The [NeMo Framework](https://github.com/NVIDIA-NeMo) is NVIDIA's GPU-accelerated
 
 ### Key Differences
 
-::::{grid} 1 1 2 2
-:gutter: 3
-
-:::{grid-item-card} NeMo Data Designer
-**Synthetic data generation**
-
-Generates training data using LLM prompting combined with statistical samplers. Data Designer excels at creating diverse datasets with controlled distributions, meaningful correlations between fields, and built-in validation.
-
-**Best for:**
-- Generating diverse post-training datasets at scale
-- Creating data with specific statistical properties
-- Simulating tool call patterns and responses
-- Rapid iteration on data characteristics
-:::
-
-:::{grid-item-card} NeMo Gym
-**Real environment interactions**
-
-Generates training data through actual interactions with live environments. Gym executes real tool calls, runs actual verification logic, and produces reward scores from genuine environment feedback.
-
-**Best for:**
-- Collecting rollouts with verified reward signals
-- Training agents that need real tool execution
-- Environments requiring actual API calls or code execution
-- RL training with ground-truth verification
-:::
-
-::::
+```{list-table}
+:header-rows: 1
+:widths: 20 40 40
+
+* - Aspect
+  - NeMo Data Designer
+  - NeMo Gym
+* - **Approach**
+  - Synthetic data generation using LLM prompting combined with statistical samplers
+  - Real environment interactions with live tool execution
+* - **Tool Calls**
+  - Simulates tool calling patterns and responses
+  - Executes actual tool calls against real systems
+* - **Verification**
+  - Built-in validation with quality checks
+  - Ground-truth verification from live environments
+* - **Rewards**
+  - Validation-based scoring
+  - Reward signals from genuine environment feedback
+* - **Best For**
+  - Generating diverse post-training datasets at scale, creating data with specific statistical properties, rapid prototyping
+  - Collecting RL rollouts with verified rewards, training agents on real APIs or code execution
+```
 
 ### When to Use Each Tool
 