diff --git a/README.md b/README.md
index 39d8b59c1..ab9785519 100644
--- a/README.md
+++ b/README.md
@@ -74,7 +74,7 @@ policy_model_name: gpt-4.1-2025-04-14" > env.yaml
 ```
 
 > [!NOTE]
-> We use GPT-4.1 in this quickstart because it provides low latency (no reasoning step) and works reliably out-of-the-box. NeMo Gym is **not limited to OpenAI models**—you can use self-hosted models via vLLM or any OpenAI-compatible inference server. See the [documentation](https://docs.nvidia.com/nemo/gym/latest/get-started/setup-installation.html) for details.
+> We use GPT-4.1 in this quickstart because it provides low latency (no reasoning step) and works reliably out-of-the-box. NeMo Gym is **not limited to OpenAI models**—you can use self-hosted models via vLLM or any OpenAI-compatible inference server. See the [documentation](https://docs.nvidia.com/nemo/gym/latest/get-started/detailed-setup.html) for details.
 
 ### Start Servers
 
diff --git a/docs/about/concepts/index.md b/docs/about/concepts/index.md
index 75f67af47..78f32cc93 100644
--- a/docs/about/concepts/index.md
+++ b/docs/about/concepts/index.md
@@ -23,25 +23,25 @@ Each explainer below covers one foundational idea and links to deeper material.
 :::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` Core Components
 :link: core-components
 :link-type: ref
-Understand how Models, Resources, and Agents remain decoupled yet coordinated as independent HTTP services, including which endpoints each component exposes.
+Understand the three server components that make up a training environment.
+:::
+
+:::{grid-item-card} {octicon}`gear;1.5em;sd-mr-1` Configuration System
+:link: configuration-concepts
+:link-type: ref
+Understand how servers are configured and connected.
 :::
 
 :::{grid-item-card} {octicon}`check-circle;1.5em;sd-mr-1` Task Verification
 :link: task-verification
 :link-type: ref
-Explore how resource servers score agent outputs with `verify()` implementations that transform correctness, quality, and efficiency checks into reward signals.
+Understand the importance of verification and common implementation patterns.
 :::
 
 :::{grid-item-card} {octicon}`iterations;1.5em;sd-mr-1` Key Terminology
 :link: key-terminology
 :link-type: ref
-Essential vocabulary for agent training, RL workflows, and NeMo Gym. This glossary defines terms you'll encounter throughout the tutorials and documentation.
-:::
-
-:::{grid-item-card} {octicon}`gear;1.5em;sd-mr-1` Configuration System
-:link: configuration-concepts
-:link-type: ref
-Understand the three-level config pattern and why server IDs and implementations are independent choices.
+Essential vocabulary for agent training, RL workflows, and NeMo Gym.
 :::
 
 ::::
diff --git a/docs/about/concepts/task-verification.md b/docs/about/concepts/task-verification.md
index 5bfc3e5f6..c9750bf2d 100644
--- a/docs/about/concepts/task-verification.md
+++ b/docs/about/concepts/task-verification.md
@@ -1,6 +1,6 @@
 (task-verification)=
 
-# Task verification
+# Task Verification
 
 **Goal**: Understand what task verification is and how rewards drive model training.
 
diff --git a/docs/about/index.md b/docs/about/index.md
index 8705eaa13..24fff6726 100644
--- a/docs/about/index.md
+++ b/docs/about/index.md
@@ -28,17 +28,16 @@ Embedding custom training environments directly within training frameworks is co
 - Interoperable with existing environments, systems, and RL training frameworks
 - Growing collection of training environments and datasets for Reinforcement Learning from Verifiable Reward (RLVR)
 
+NeMo Gym achieves this through a modular, server-based architecture.
+
 :::{tip}
 The name "NeMo Gym" comes from historical reinforcement learning literature, where the word "Gym" refers to a collection of RL training environments!
 :::
 
 ## Core Components
 
-A training environment consists of three server components:
+A training environment in NeMo Gym consists of three server components:
 
 - **Agents**: Orchestrate the rollout lifecycle—calling models, executing tool calls via resources, and coordinating verification.
 - **Models**: Stateless text generation using LLM inference endpoints (OpenAI-compatible or vLLM).
 - **Resources**: Define tasks, tool implementations, and verification logic. Provide what agents need to run and score rollouts.
-  - **Example - Web Search**: Task = answer knowledge questions; Tools = `search()` and `browse()`; Verification = checks if answer matches expected result
-  - **Example - Math with Code**: Task = solve math problems; Tool = `execute_python()`; Verification = checks if final answer is mathematically correct
-  - **Example - Code Generation**: Task = implement solution to coding problem; Tools = none; Verification = runs unit tests against generated code
diff --git a/docs/conf.py b/docs/conf.py
index 5b6bdc379..1079f9cd1 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -84,6 +84,13 @@ html_theme = "nvidia_sphinx_theme"
 
 
 html_theme_options = {
+    "icon_links": [
+        {
+            "name": "GitHub",
+            "url": "https://github.com/NVIDIA-NeMo/Gym",
+            "icon": "fa-brands fa-github",
+        }
+    ],
     "switcher": {
         "json_url": "../versions1.json",
         "version_match": release,
diff --git a/docs/get-started/setup-installation.md b/docs/get-started/detailed-setup.md
similarity index 99%
rename from docs/get-started/setup-installation.md
rename to docs/get-started/detailed-setup.md
index 8ddea69b9..838e98a84 100644
--- a/docs/get-started/setup-installation.md
+++ b/docs/get-started/detailed-setup.md
@@ -1,6 +1,6 @@
-(gs-setup-installation)=
+(gs-detailed-setup)=
 
-# Setup and Installation
+# Detailed Setup Guide
 
 :::{card}
 
diff --git a/docs/get-started/index.md b/docs/get-started/index.md
index 062619e26..3f9209c5b 100644
--- a/docs/get-started/index.md
+++ b/docs/get-started/index.md
@@ -1,58 +1,137 @@
 (gs-index)=
 
-# Get Started with NeMo Gym
+# Quickstart
 
-**Estimated Time**: 25-30 minutes
-
-This guided tutorial is designed for users new to training models with reinforcement learning (RL). These tutorials walk you through the complete journey from installation to generating training data at scale.
-
-**By the end of this tutorial series, you will have:**
-
-✅ A working NeMo Gym installation with servers running
-✅ The ability to generate rollouts for RL training
+This is the quickstart—get running in under 5 minutes. For a more detailed walkthrough, see the [Detailed Setup Guide](detailed-setup.md).
 
 ## Before You Start
 
-Make sure you have these prerequisites ready before beginning the tutorials:
+Make sure you have these prerequisites ready:
 
 - **Git** for cloning the repository
 - **OpenAI API key** with available credits (requires ~$0.01-0.10 for all tutorials)
-- Basic command-line familiarity
 
 ---
 
-## Tutorial Path
+(gs-quickstart)=
+## Quickstart
+
+Follow the tabs sequentially to install NeMo Gym, start the servers, and collect your first verified rollouts for RL training.
+
+::::{tab-set}
+
+:::{tab-item} 1. Set Up
+
+**Install NeMo Gym**
+
+Get NeMo Gym installed and ready to use:
+
+```bash
+# Clone the repository
+git clone git@github.com:NVIDIA-NeMo/Gym.git
+cd Gym
+
+# Install UV (Python package manager)
+curl -LsSf https://astral.sh/uv/install.sh | sh
+source $HOME/.local/bin/env
+
+# Create virtual environment
+uv venv --python 3.12
+source .venv/bin/activate
+
+# Install NeMo Gym
+uv sync --extra dev --group docs
+```
+
+**Configure Your API Key**
+
+Create an `env.yaml` file that contains your OpenAI API key and the {term}`Policy Model` you want to use. Replace `your-openai-api-key` with your actual key. This file helps keep your secrets out of version control while still making them available to NeMo Gym.
+
+```bash
+echo "policy_base_url: https://api.openai.com/v1
+policy_api_key: your-openai-api-key
+policy_model_name: gpt-4.1-2025-04-14" > env.yaml
+```
+
+> **Note:** We use GPT-4.1 in this quickstart because it provides low latency (no reasoning step) and works reliably out-of-the-box. NeMo Gym is **not limited to OpenAI models**—you can use self-hosted models via vLLM or any OpenAI-compatible inference server that supports function calling. Refer to the [Detailed Setup Guide](detailed-setup.md) for details.
+
+:::
+
+:::{tab-item} 2. Start Servers
+
+**Terminal 1** (start servers):
+
+```bash
+# Start servers (this will keep running)
+config_paths="resources_servers/example_simple_weather/configs/simple_weather.yaml,\
+responses_api_models/openai_model/configs/openai_model.yaml"
+ng_run "+config_paths=[${config_paths}]"
+```
+
+**Terminal 2** (interact with agent):
 
-Follow these tutorials in sequence to start collecting rollouts with NeMo Gym:
+```bash
+# In a NEW terminal, activate environment
+source .venv/bin/activate
 
-::::{grid} 1 1 1 1
+# Interact with your agent
+python responses_api_agents/simple_agent/client.py
+```
+
+:::
+
+:::{tab-item} 3. Collect Rollouts
+
+**Terminal 2** (keep servers running in Terminal 1):
+
+```bash
+# Create a simple dataset with one query
+echo '{"responses_create_params":{"input":[{"role":"developer","content":"You are a helpful assistant."},{"role":"user","content":"What is the weather in Seattle?"}]}}' > weather_query.jsonl
+
+# Collect verified rollouts
+ng_collect_rollouts \
+    +agent_name=simple_weather_simple_agent \
+    +input_jsonl_fpath=weather_query.jsonl \
+    +output_jsonl_fpath=weather_rollouts.jsonl
+
+# View the result
+cat weather_rollouts.jsonl | python -m json.tool
+```
+
+This generates training data with verification scores!
+
+:::
+
+:::{tab-item} 4. Clean Up Servers
+
+**Terminal 1** with the running servers: Ctrl+C to stop the `ng_run` process.
+
+:::
+::::
+
+## What's Next?
+
+Now that you can generate rollouts, choose your path:
+
+::::{grid} 1 1 2 2
 :gutter: 3
 
-:::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` 1. Setup and Installation
-:link: setup-installation
-:link-type: doc
+:::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` Use an Existing Training Environment
+:link: https://github.com/NVIDIA-NeMo/Gym#-available-resource-servers
 
-Get NeMo Gym installed and servers running with your first successful agent interaction.
+Browse the available resource servers on GitHub to find a training-ready environment that matches your goals.
 +++
-{bdg-secondary}`environment` {bdg-secondary}`first-run`
+{bdg-secondary}`github` {bdg-secondary}`resource-servers`
 :::
 
-:::{grid-item-card} {octicon}`iterations;1.5em;sd-mr-1` 2. Rollout Collection
-:link: rollout-collection
+:::{grid-item-card} {octicon}`tools;1.5em;sd-mr-1` Build a Custom Training Environment
+:link: ../tutorials/creating-resource-server
 :link-type: doc
 
-Generate your first batch of rollouts and understand how they become training data.
+Implement or integrate existing tools and define task verification logic.
 +++
-{bdg-secondary}`training-data` {bdg-secondary}`scale`
+{bdg-secondary}`tutorial` {bdg-secondary}`custom-tools`
 :::
 
 ::::
 
----
-
-:::{tip}
-**New to reinforcement learning?** Do not worry—these tutorials introduce RL concepts naturally as you learn rollout collection.
-
-- For deeper conceptual understanding, explore the [About](../about/index.md) section.
-- For quick definitions, refer to the [Glossary](../about/concepts/key-terminology.md).
-:::
\ No newline at end of file
diff --git a/docs/get-started/rollout-collection.md b/docs/get-started/rollout-collection.md
index 693c01a1d..64250d1af 100644
--- a/docs/get-started/rollout-collection.md
+++ b/docs/get-started/rollout-collection.md
@@ -1,6 +1,6 @@
 (gs-collecting-rollouts)=
 
-# Collecting Rollouts
+# Rollout Collection
 
 In the previous tutorial, you set up NeMo Gym and ran your first agent interaction. But to train an agent with reinforcement learning, you need hundreds or thousands of these interactions—each one scored and saved. That's what rollout collection does.
 
@@ -18,12 +18,12 @@ In the previous tutorial, you set up NeMo Gym and ran your first agent interacti
 
 :::
 
-:::{button-ref} setup-installation
+:::{button-ref} detailed-setup
 :color: secondary
 :outline:
 :ref-type: doc
 
-← Previous: Setup and Installation
+← Previous: Detailed Setup Guide
 :::
 
 ---
@@ -32,7 +32,7 @@ In the previous tutorial, you set up NeMo Gym and ran your first agent interacti
 
 Make sure you have:
 
-- ✅ Completed [Setup and Installation](setup-installation.md)
+- ✅ Completed [Detailed Setup Guide](detailed-setup.md)
 - ✅ Servers still running (or ready to restart them)
 - ✅ `env.yaml` configured with your OpenAI API key
 - ✅ Virtual environment activated
@@ -56,7 +56,7 @@ Each line contains a `responses_create_params` object with:
 
 ## 2. Verify Servers Are Running
 
-If you still have servers running from the [Setup and Installation](setup-installation.md) tutorial, proceed to the next step.
+If you still have servers running from the [Detailed Setup Guide](detailed-setup.md) tutorial, proceed to the next step.
 
 If not, start them again:
 
@@ -198,7 +198,9 @@ Congratulations! You now have a working NeMo Gym installation and understand how
 :::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` Use an Existing Training Environment
 :link: https://github.com/NVIDIA-NeMo/Gym#-available-resource-servers
 
-Browse the available resource servers to find a training-ready environment that matches your goals.
+Browse the available resource servers on GitHub to find a training-ready environment that matches your goals.
++++
+{bdg-secondary}`github` {bdg-secondary}`resource-servers`
 :::
 
 :::{grid-item-card} {octicon}`tools;1.5em;sd-mr-1` Build a Custom Training Environment
@@ -206,6 +208,8 @@ Browse the available resource servers to find a training-ready environment that
 :link-type: doc
 
 Implement or integrate existing tools and define task verification logic.
++++
+{bdg-secondary}`tutorial` {bdg-secondary}`custom-tools`
 :::
 
 ::::
diff --git a/docs/index.md b/docs/index.md
index db07d5ce8..f7007fc14 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,130 +1,148 @@
+---
+description: "NeMo Gym is an open-source library for building reinforcement learning (RL) training environments for large language models (LLMs)"
+categories:
+  - documentation
+  - home
+tags:
+  - reinforcement-learning
+  - llm-training
+  - rollout-collection
+  - agent-environments
+personas:
+  - Data Scientists
+  - Machine Learning Engineers
+  - RL Researchers
+difficulty: beginner
+content_type: index
+---
+
 (gym-home)=
 
 # NeMo Gym Documentation
 
-[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym) is a library for building reinforcement learning (RL) training environments for large language models (LLMs). It provides infrastructure to develop environments, scale rollout collection, and integrate seamlessly with your preferred training framework.
+[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym) is a library for building reinforcement learning (RL) training environments for large language models (LLMs). NeMo Gym provides infrastructure to develop environments, scale rollout collection, and integrate seamlessly with your preferred training framework.
 
 A training environment consists of three server components: **Agents** orchestrate the rollout lifecycle—calling models, executing tool calls via resources, and coordinating verification. **Models** provide stateless text generation using LLM inference endpoints. **Resources** define tasks, tool implementations, and verification logic.
 
-## Quickstart
-
-Run a training environment and start collecting rollouts for training in under 5 minutes.
-
-::::{tab-set}
-
-:::{tab-item} 1. Set Up
+````{div} sd-d-flex-row
+```{button-ref} gs-quickstart
+:ref-type: ref
+:color: primary
+:class: sd-rounded-pill sd-mr-3
 
-**Install NeMo Gym**
-
-Get NeMo Gym installed and ready to use:
-
-```bash
-# Clone the repository
-git clone git@github.com:NVIDIA-NeMo/Gym.git
-cd Gym
-
-# Install UV (Python package manager)
-curl -LsSf https://astral.sh/uv/install.sh | sh
-source $HOME/.local/bin/env
-
-# Create virtual environment
-uv venv --python 3.12
-source .venv/bin/activate
-
-# Install NeMo Gym
-uv sync --extra dev --group docs
+Quickstart
 ```
 
-**Configure Your API Key**
-
-Create an `env.yaml` file that contains your OpenAI API key and the {term}`Policy Model` you want to use. Replace `your-openai-api-key` with your actual key. This file helps keep your secrets out of version control while still making them available to NeMo Gym.
+```{button-ref} tutorials/index
+:ref-type: doc
+:color: secondary
+:class: sd-rounded-pill
 
-```bash
-echo "policy_base_url: https://api.openai.com/v1
-policy_api_key: your-openai-api-key
-policy_model_name: gpt-4.1-2025-04-14" > env.yaml
+Explore Tutorials
 ```
+````
 
-> **Note:** We use GPT-4.1 in this quickstart because it provides low latency (no reasoning step) and works reliably out-of-the-box. NeMo Gym is **not limited to OpenAI models**—you can use self-hosted models via vLLM or any OpenAI-compatible inference server that supports function calling. See the setup guide for details.
-
-:::
-
-:::{tab-item} 2. Start Servers
-
-**Terminal 1** (start servers):
-
-```bash
-# Start servers (this will keep running)
-config_paths="resources_servers/example_single_tool_call/configs/example_single_tool_call.yaml,\
-responses_api_models/openai_model/configs/openai_model.yaml"
-ng_run "+config_paths=[${config_paths}]"
-```
+---
 
-**Terminal 2** (interact with agent):
+## Introduction to NeMo Gym
 
-```bash
-# In a NEW terminal, activate environment
-source .venv/bin/activate
+Understand NeMo Gym's purpose and core components before diving into tutorials.
 
-# Interact with your agent
-python responses_api_agents/simple_agent/client.py
-```
+::::{grid} 1 2 2 2
+:gutter: 1 1 1 2
 
+:::{grid-item-card} {octicon}`book;1.5em;sd-mr-1` About NeMo Gym
+:link: about/index
+:link-type: doc
+Motivation and benefits of NeMo Gym.
++++
+{bdg-secondary}`motivation` {bdg-secondary}`benefits`
 :::
 
-:::{tab-item} 3. Collect Rollouts
+:::{grid-item-card} {octicon}`gear;1.5em;sd-mr-1` Concepts
+:link: about/concepts/index
+:link-type: doc
+Core components, configuration, verification and RL terminology.
++++
+{bdg-secondary}`agents` {bdg-secondary}`models` {bdg-secondary}`resources`
+:::
 
-**Terminal 2** (keep servers running in Terminal 1):
+:::{grid-item-card} {octicon}`globe;1.5em;sd-mr-1` Ecosystem
+:link: about/ecosystem
+:link-type: doc
+Understand how NeMo Gym fits within the NVIDIA NeMo Framework.
++++
+{bdg-secondary}`nemo-framework`
+:::
 
-```bash
-# Create a simple dataset with one query
-echo '{"responses_create_params":{"input":[{"role":"developer","content":"You are a helpful assistant."},{"role":"user","content":"What is the weather in Seattle?"}]}}' > weather_query.jsonl
+::::
 
-# Collect verified rollouts
-ng_collect_rollouts \
-    +agent_name=single_tool_call_simple_agent \
-    +input_jsonl_fpath=weather_query.jsonl \
-    +output_jsonl_fpath=weather_rollouts.jsonl
+## Get Started
 
-# View the result
-cat weather_rollouts.jsonl | python -m json.tool
-```
+Install and run NeMo Gym to start collecting rollouts.
 
-This generates training data with verification scores!
+::::{grid} 1 2 2 2
+:gutter: 1 1 1 2
 
+:::{grid-item-card} {octicon}`rocket;1.5em;sd-mr-1` Quickstart
+:link: get-started/index
+:link-type: doc
+Run a training environment and start collecting rollouts in under 5 minutes.
 :::
 
-:::{tab-item} 4. Clean Up Servers
-
-**Terminal 1** with the running servers: Ctrl+C to stop the `ng_run` process.
-
+:::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` Detailed Setup Guide
+:link: get-started/detailed-setup
+:link-type: doc
+Detailed walkthrough of running your first training environment.
++++
+{bdg-secondary}`environment` {bdg-secondary}`configuration`
 :::
-::::
 
----
+:::{grid-item-card} {octicon}`iterations;1.5em;sd-mr-1` Rollout Collection
+:link: get-started/rollout-collection
+:link-type: doc
+Collect and view rollouts
++++
+{bdg-secondary}`rollouts` {bdg-secondary}`training-data`
+:::
 
-## What's Next?
+::::
 
-Now that you can generate rollouts, choose your path:
+## Tutorials
 
-::::{grid} 1 1 2 2
-:gutter: 3
+Hands-on tutorials to build and customize your training environments.
 
-:::{grid-item-card} {octicon}`package;1.5em;sd-mr-1` Use an Existing Training Environment
-:link: https://github.com/NVIDIA-NeMo/Gym#-available-resource-servers
+::::{grid} 1 2 2 2
+:gutter: 1 1 1 2
 
-Browse the available resource servers to find a training-ready environment that matches your goals.
+:::{grid-item-card} {octicon}`tools;1.5em;sd-mr-1` Build a Resource Server
+:link: tutorials/creating-resource-server
+:link-type: doc
+Implement or integrate existing tools and define task verification logic.
++++
+{bdg-secondary}`custom-environments` {bdg-secondary}`tools`
 :::
 
-:::{grid-item-card} {octicon}`tools;1.5em;sd-mr-1` Build a Custom Training Environment
-:link: tutorials/creating-resource-server
+:::{grid-item-card} {octicon}`database;1.5em;sd-mr-1` Offline Training (SFT, DPO)
+:link: tutorials/offline-training-w-rollouts
 :link-type: doc
+Train with SFT or DPO using collected rollouts.
++++
+{bdg-secondary}`sft` {bdg-secondary}`dpo`
+:::
 
-Implement or integrate existing tools and define task verification logic.
+:::{grid-item-card} {octicon}`zap;1.5em;sd-mr-1` RL Training with NeMo RL
+:link: tutorials/rl-training-with-nemo-rl
+:link-type: doc
+Train with GRPO using NeMo RL and NeMo Gym.
++++
+{bdg-secondary}`grpo` {bdg-secondary}`nemo-rl`
 :::
 
 ::::
 
+---
+
 ```{toctree}
 :hidden:
 Home
@@ -135,7 +153,7 @@ Home
 :hidden:
 :maxdepth: 2
 
-about/index.md
+Overview
 Concepts
 Ecosystem
 ```
@@ -145,12 +163,11 @@ Ecosystem
 :hidden:
 :maxdepth: 1
 
-Overview
-get-started/setup-installation.md
-get-started/rollout-collection.md
+Quickstart
+Detailed Setup Guide
+Rollout Collection
 ```
 
-
 ```{toctree}
 :caption: Tutorials
 :hidden:
diff --git a/docs/tutorials/creating-resource-server.md b/docs/tutorials/creating-resource-server.md
index b54dbf101..9f8ffd255 100644
--- a/docs/tutorials/creating-resource-server.md
+++ b/docs/tutorials/creating-resource-server.md
@@ -13,7 +13,7 @@ Learn how to create a custom resource server to implement tools, verifiers, and
 
 :::{grid-item-card} {octicon}`bookmark;1em;` **Prerequisites**
 
-- Completed {doc}`../get-started/setup-installation`
+- Completed {doc}`../get-started/detailed-setup`
 - Basic Python and FastAPI knowledge
 
 :::