
arc-agi resource server #105

Merged
bxyu-nvidia merged 14 commits into main from cmunley1/arc-agi
Jan 29, 2026

Conversation

@cmunley1
Contributor

No description provided.

@copy-pr-bot

copy-pr-bot bot commented Sep 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cmunley1 cmunley1 changed the title from "arc-agi resources server" to "arc-agi" Sep 26, 2025
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
@cmunley1 cmunley1 marked this pull request as ready for review October 1, 2025 17:43
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
@cmunley1
Contributor Author

cmunley1 commented Oct 3, 2025

might remove berman agent before merging

@cmunley1 cmunley1 requested a review from bxyu-nvidia October 3, 2025 19:13
@cwing-nvidia cwing-nvidia added the resources-server Resources servers (math, code, etc.) label Oct 23, 2025
@cmunley1 cmunley1 changed the title from "arc-agi" to "arc-agi resource server" Oct 31, 2025
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
@bxyu-nvidia bxyu-nvidia merged commit e51ad14 into main Jan 29, 2026
5 checks passed
@bxyu-nvidia bxyu-nvidia deleted the cmunley1/arc-agi branch January 29, 2026 18:17
Kelvin0110 pushed a commit to Kelvin0110/Gym that referenced this pull request Feb 16, 2026
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Kelvin0110 added a commit to Kelvin0110/Gym that referenced this pull request Feb 16, 2026
Add Data Designer and links to ecosystem page (NVIDIA-NeMo#462)

Fixes NVIDIA-NeMo#450

Signed-off-by: Chris Wing <cwing@nvidia.com>

docs: Moved configuration system under about (NVIDIA-NeMo#420)

Moved configuration systems under "About" instead of "About>Concepts".
Also removed configuration mentions and examples from core abstraction
pages

Closes NVIDIA-NeMo#392 and
NVIDIA-NeMo#393

---------

Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
Signed-off-by: L.B <llane@nvidia.com>
Signed-off-by: Chris Wing <cwing@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: L.B <llane@nvidia.com>
Co-authored-by: Chris Wing <cwing@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>

docs: Training framework integration (NVIDIA-NeMo#439)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

docs: Improve server reference info (NVIDIA-NeMo#474)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

pyproject typos and grammar fixes (NVIDIA-NeMo#473)

Closes https://github.com/NVIDIA-NeMo/Internal-Planning/issues/132

Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>

docs: Fix wrong count vs actual (NVIDIA-NeMo#482)

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

docs: home pg, quickstart move, gh icon (NVIDIA-NeMo#463)

- adds GH icon + link to global top nav
- rebuilds the home page to standard layout
- adds CTA to quickstart and tutorials
- moves quickstart into get started
- clarifies differences between the quickstart and more detailed
onboarding materials

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Chris Wing <cwing@nvidia.com>
Co-authored-by: Chris Wing <cwing@nvidia.com>

More single tool call filename updates cont (NVIDIA-NeMo#484)

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

Fix NeMo Gym Pyproject links (NVIDIA-NeMo#486)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

docs: move FAQ (NVIDIA-NeMo#489)

moves how-to-faq to render under "references" and display as FAQ. no
material changes to the content.

Signed-off-by: Lawrence Lane <llane@nvidia.com>

docs: contribute section (NVIDIA-NeMo#490)

- move training content into new contribute section
- create contributing overview page
- add contributing section on home page with link to RL integrations
content hub

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>

Misc rollout fixes (NVIDIA-NeMo#447)

Signed-off-by: Peter Jin <pjin@nvidia.com>

Docs: Contribution Home & Dev Setup (NVIDIA-NeMo#494)

Added types of contribution to contribution overview and replicated dev
setup instructions from contributing.md to docs

---------

Signed-off-by: Chris Wing <cwing@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-authored-by: Lawrence Lane <llane@nvidia.com>

Add environment contribution docs (NVIDIA-NeMo#498)

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Chris Wing <cwing@nvidia.com>
Co-authored-by: Lawrence Lane <llane@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

FAQ cleanup (NVIDIA-NeMo#499)

This PR removes redundant content from the FAQ and better organizes the
documentation structure.

**Removed redundant FAQ sections** now covered in dedicated
documentation:
- `ng_version` → `docs/reference/cli-commands.md`
- Config anatomy → `docs/reference/configuration.md` (section was
incomplete TODO)
- DCO and commit signing → `CONTRIBUTING.md` and
`docs/contribute/development-setup.md`
- Copyright errors → `docs/contribute/development-setup.md`
- CI/CD requirements → `docs/contribute/development-setup.md`

**Reorganized FAQ placement:**
- Moved `docs/how-to-faq.md` → `docs/reference/faq.md` (consistent with
other reference docs)
- Repositioned FAQ to bottom of Reference section (after Configuration,
CLI Commands, API Reference)
- Updated intro to clarify FAQ provides quick answers while
comprehensive docs are developed

---------

Signed-off-by: Chris Wing <cwing@nvidia.com>
Co-authored-by: Lawrence Lane <llane@nvidia.com>

Simplify contributing.md (NVIDIA-NeMo#500)

added links to contribute section of docs site and removed redundant
content.
links need to be verified after NVIDIA-NeMo#498 is merged to main

---------

Signed-off-by: Chris Wing <cwing@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-authored-by: Lawrence Lane <llane@nvidia.com>

docs: End-to-end GRPO Training with NeMo RL tutorial [master branch] (NVIDIA-NeMo#481)

Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Co-authored-by: L.B. <llane@nvidia.com>
Co-authored-by: Frankie Siino <fsiino@nvidia.com>

Change to v0.1.1 release version (NVIDIA-NeMo#509)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Bump to v0.2.0 (NVIDIA-NeMo#510)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

reasoning-gym resource server (NVIDIA-NeMo#113)

single turn tasks across various domains: "Reasoning Gym is a
community-created Python library of procedural dataset generators and
algorithmically verifiable reasoning environments for training reasoning
models with reinforcement learning (RL). The goal is to generate
virtually infinite training data with adjustable complexity.

It currently provides more than 100 tasks over many domains, including
but not limited to algebra, arithmetic, computation, cognition,
geometry, graph theory, logic, and many common games."

Tested all 100+ environments for errors and tested training on many of
them, which demonstrated convergence.

This dataset of 100+ environments is also used in ProRL
(https://arxiv.org/abs/2505.24864)

---------

Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Co-authored-by: ARC Bot <arc-bot@example.com>

docs: Miscellaneous GRPO tutorial fixes (NVIDIA-NeMo#512)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

docs settings update (NVIDIA-NeMo#525)

Signed-off-by: Lawrence Lane <llane@nvidia.com>

List running server health and status (NVIDIA-NeMo#290)

This implements the `ng_status` command to list all running servers on
the system and ping for health check.

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

VLLMModel supports chat template kwargs (NVIDIA-NeMo#538)

Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Peter Jin <pjin@nvidia.com>

Salesforce xlam-function-calling-60k resources server (NVIDIA-NeMo#262)

function calling resources server based on
https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k

---------

Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>

python flag for colab venv installation (NVIDIA-NeMo#526)

need to set uv pip install python flag in colab environments when
launching servers

usage: `ng_run "+config_paths=[...]" +uv_pip_set_python=true`

defaults to false

For NVIDIA-NeMo#370

Needed for notebook here:
https://docs.unsloth.ai/models/nemotron-3#reinforcement-learning--nemo-gym

---------

Signed-off-by: Christian Munley <cmunley@nvidia.com>

add unsloth and trl to docs  (NVIDIA-NeMo#536)

adds a section for single-step training with unsloth and trl

not sure if these should be broken into separate sections. Left as one
since the same notebook works for both, but could be confusing.

not sure if we should also add more info about multi-step (hopefully)
coming soon.

Signed-off-by: Christian Munley <cmunley@nvidia.com>

docs: remove trl docs (NVIDIA-NeMo#543)

remove trl from docs, leaving just unsloth.

was unclear that they are together.

will make a trl section when we have a standalone trl notebook, or a
section on trl's docs too.

---------

Signed-off-by: Christian Munley <cmunley@nvidia.com>

Remove PlainTextResponse response_class (NVIDIA-NeMo#544)

https://nvidia.slack.com/archives/C08TG7CLEGY/p1766191655660079

Initially in NVIDIA-NeMo#290, the `response_class=PlainTextResponse` was added to
the `/global_config_dict_yaml` endpoint of the HeadServer as an attempt
to debug parsing server info for the `ng_status` command. This led to a
parsing error in `load_from_global_config`. The `ng_status` command now
uses its own separate endpoint, `server_instances`, so the
`response_class` needs to be removed.

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

Increase test_train_data_utils coverage (NVIDIA-NeMo#553)

The overall coverage failure threshold is 95%, and test coverage is too low
for train_data_utils, which brings down the overall coverage of the
ng_dev_test suite. This covers some of those lingering test cases to
bring it from 89% to 97%.

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

Generic Aviary integration (NVIDIA-NeMo#55)

This PR enables running Gym on Aviary environments. The two main
concepts:

- `AviaryResourcesServer`: maps to an Aviary `TaskDataset`: spawns and
  manages multiple environments
  - Unlike other `ResourcesServer`s, it doesn't take arbitrary task specs,
    but an integer index into the `TaskDataset`. Otherwise we'd have data
    defined in two places
  - Instead of tool-specific endpoints, we have one `/step` endpoint. This
    is because:
    - Aviary environments define their transition function in `step()`.
      Simply calling the bare tools can have undefined behavior (e.g. state
      isn't updated properly)
    - Aviary tools are not guaranteed to be available until `reset()` is
      called.
  - A `/close` endpoint is added to tear down resources
- `AviaryAgent`: analogous to `SimpleAgent`, but:
  - Request is an integer index (which is forwarded to
    `AviaryResourcesServer`). In general, we expect `env.reset()` to provide
    the first messages, not the calling code
  - All tool calls are sent to `/step`
  - We rely on the environment to tell us when we're done
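
The control flow described in the list above can be sketched with a toy environment. This is only an illustration under stated assumptions: `ToyEnv`, its `add` tool, the return shape of `step()`, and `run_episode` are hypothetical and do not mirror the Aviary or Gym APIs, and a fixed stub stands in for the model.

```python
class ToyEnv:
    """Hypothetical stand-in for one environment; not the Aviary API."""

    def __init__(self, target: int) -> None:
        self.target = target
        self.total = 0

    def reset(self) -> list[dict]:
        # The environment, not the caller, provides the first messages.
        self.total = 0
        return [{"role": "user", "content": f"Reach {self.target} by calling add."}]

    def step(self, tool_call: dict) -> tuple[list[dict], float, bool]:
        # The transition function owns the state update, which is why every
        # tool call goes through step() instead of calling the tool directly.
        self.total += tool_call["arguments"]["amount"]
        done = self.total >= self.target
        reward = 1.0 if self.total == self.target else 0.0
        return [{"role": "tool", "content": str(self.total)}], reward, done


def run_episode(env: ToyEnv, max_steps: int = 10) -> tuple[list[dict], float]:
    messages = env.reset()
    reward = 0.0
    for _ in range(max_steps):
        # Stub policy in place of a model: always call add(1).
        tool_call = {"name": "add", "arguments": {"amount": 1}}
        observations, reward, done = env.step(tool_call)
        messages.extend(observations)
        if done:  # the environment decides when the episode is over
            break
    return messages, reward


messages, reward = run_episode(ToyEnv(target=3))
print(len(messages), reward)
```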

Two concrete Aviary datasets/environments are integrated: GSM8k with a
calculator environment and BixBench with a notebook environment. Adding
new ones is pretty lightweight (most of the code in `notebook_app.py` is
from defining a BixBench-compatible environment, not the integration).

---------

Signed-off-by: Siddharth Narayanan <sid@futurehouse.org>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Siddharth Narayanan <sidnarayanan@users.noreply.github.com>
Signed-off-by: Siddharth Narayanan <sid@edisonscientific.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>
Co-authored-by: cmunley1 <cmunley@nvidia.com>

ng_dump_config sanity removes API key values (NVIDIA-NeMo#567)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Feat: Add reward profiling and fractional reward (NVIDIA-NeMo#83)

Adds more descriptive readme, reward profiling, and option for
fractional or binary reward.

Signed-off-by: abukharin-nv <abukharin@nvidia.com>
Co-authored-by: cmunley1 <cmunley@nvidia.com>

Single step environments for SWE tasks (NVIDIA-NeMo#561)

This PR adds new environments for SWE tasks. The environments can be
used for single-step patch generation, test generation, and
LLM-as-a-judge. They have been tested for instances from SWE-bench,
SWE-Gym, and SWE-rebench. The patch and test generation environments run
them against unit tests in a containerized environment (Singularity).

---------

Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com>
Co-authored-by: Test User <test@example.com>

NL2Bash using Equivalency Judge  (NVIDIA-NeMo#569)

Integrating a new dataset using the existing equivalency LLM judge
resource server.

Data source: https://huggingface.co/datasets/jiacheng-ye/nl2bash
License:
https://github.com/TellinaTool/nl2bash/blob/3d1997669ac21c8e19fc1d12f60054d3142ef6c7/LICENSE
Train: 8040 unique samples
Validation: 50 unique, randomly sampled from train
Augmentation on the source (minimal): Added system prompt, output
formatting requirement

Example of env validation:
- base model: `nemotron-nano-3-30b-a3b-bf16` (GA checkpoint)
- Step 30 -> 12.50% on Terminal Bench Core
- https://wandb.ai/nvidia/nl2bash/runs/mxp1c3mm

Train:  nl2bash-super-train-0901.jsonl
Validation:  nl2bash-super-validation-0901.jsonl

https://gitlab-master.nvidia.com/bxyu/nemo-gym/-/ml/models/152/versions/176#/
```
ng_download_dataset_from_gitlab \
    +dataset_name=nl2bash-equivalency-judge \
    +version=0.0.1 \
    +artifact_fpath=nl2bash-super-train-0901.jsonl \
    +output_fpath=Gym/data/nl2bash/nl2bash-super-train-0901.jsonl
```

---------

Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>

enh: use agent ref from data in rollouts (NVIDIA-NeMo#568)

Makes `agent_name` optional in `ng_collect_rollouts` CLI, allowing it to
use `agent_ref` from each data row instead.

The NeMo-RL training code already respects per-row `agent_ref`, but the
Gym CLI (`ng_collect_rollouts`) required a single hardcoded
`agent_name`. This prevented multi-agent rollout collection via CLI.

- `rollout_collection.py`: Made `agent_name` field optional with
`default=None`
- Use `config.agent_name` if specified; otherwise fall back to
`row["agent_ref"]["name"]`
- Added validation error if neither source provides an agent name

| Before | After |
|--------|-------|
| `+agent_name=...` required | `+agent_name=...` optional |
| All rows use same agent | Rows can use different agents via `agent_ref` |
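
The fallback order described above could look roughly like the following. This is a minimal sketch, not the code in `rollout_collection.py`: the helper name `resolve_agent_name`, the error message, and the agent names in the sample rows are hypothetical.

```python
from typing import Any, Optional


def resolve_agent_name(config_agent_name: Optional[str], row: dict[str, Any]) -> str:
    """Pick the agent for one data row (hypothetical helper, not the Gym API)."""
    # An explicit +agent_name=... value takes precedence over the row.
    if config_agent_name is not None:
        return config_agent_name
    # Otherwise fall back to the per-row agent_ref, if the row has one.
    agent_ref = row.get("agent_ref") or {}
    name = agent_ref.get("name")
    if name is None:
        raise ValueError("No agent name: set +agent_name=... or add agent_ref.name to the row")
    return name


rows = [
    {"agent_ref": {"type": "responses_api_agents", "name": "agent_a"}},
    {"agent_ref": {"type": "responses_api_agents", "name": "agent_b"}},
]
print([resolve_agent_name(None, row) for row in rows])           # per-row agents
print([resolve_agent_name("fixed_agent", row) for row in rows])  # single agent for all rows
```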

---------

Signed-off-by: George Armstrong <georgea@nvidia.com>

FastAPI worker support (NVIDIA-NeMo#566)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Local vLLM model and other misc improvements (NVIDIA-NeMo#558)

Inspired by
https://github.com/NVIDIA-NeMo/Gym/pull/318/files#diff-b56c7f31b7793b3a4ac265f84f4c84216f1ed15a3fbee855da9674a7da8714ff
by @pjin-nvidia

---------

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Update math_with_judge artifact paths (NVIDIA-NeMo#582)

The default artifact paths for the math_with_judge resource server
don't match the filenames for the provided dataset
(nvidia/Nemotron-RL-math-OpenMathReasoning) [as saved on Hugging
Face](https://huggingface.co/datasets/nvidia/Nemotron-RL-math-OpenMathReasoning/tree/main).
This results in an error when attempting to download the files
automatically from Hugging Face. The artifact paths for both training
and validation need to be updated with the names as shown on Hugging
Face for proper downloading.

Signed-off-by: Robert Clark <roclark@nvidia.com>

Add Hugging Face identifier for coding resource (NVIDIA-NeMo#583)

The competitive coding resource config is missing a Hugging Face
identifier which prevents it from being downloaded via Hugging Face
using the data preparation tools.

Without the HF identifier, run the following to reproduce the problem:

```
config_paths="responses_api_models/vllm_model/configs/vllm_model_for_training.yaml,resources_servers/math_with_judge/configs/math_with_judge.yaml,resources_servers/code_gen/configs/code_gen.yaml,resources_servers/workplace_assistant/configs/workplace_assistant.yaml,resources_servers/mcqa/configs/mcqa.yaml,resources_servers/instruction_following/configs/instruction_following.yaml,resources_servers/structured_outputs/configs/structured_outputs_json.yaml"
ng_prepare_data "+config_paths=[${config_paths}]" +output_dirpath=data/ +mode=train_preparation +should_download=true +data_source=huggingface
```

This will throw a warning:

```
Dataset `livecodebench_v5_validation` missing huggingface_identifier for HuggingFace backend
```

And eventually this error:

```
Traceback (most recent call last):
  File "/opt/nemo_rl_venv/bin/ng_prepare_data", line 10, in <module>
    sys.exit(prepare_data())
             ^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 819, in prepare_data
    data_processor.run(global_config_dict)
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 350, in run
    dataset_type_to_aggregate_metrics = self.validate_samples_and_aggregate_metrics(server_instance_configs)
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 657, in validate_samples_and_aggregate_metrics
    state = self._validate_samples_and_aggregate_metrics_single_dataset(d)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 553, in _validate_samples_and_aggregate_metrics_single_dataset
    for sample_idx, sample_dict_str in enumerate(self._iter_dataset_lines(dataset_config)):
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/nemo-rl/3rdparty/Gym-workspace/Gym/nemo_gym/train_data_utils.py", line 542, in _iter_dataset_lines
    with open(dataset_config.jsonl_fpath) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'resources_servers/code_gen/data/livecodebench_v5_2024-07-01_2025-02-01_validation.jsonl'
```

This fix will download the validation file as intended and resolve the
errors.

Signed-off-by: Robert Clark <roclark@nvidia.com>

updating swerl_gen config (NVIDIA-NeMo#588)

The train and val data paths are swapped in the config. This PR updates
them.

---------

Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com>
Co-authored-by: Test User <test@example.com>

NeMo Skills Tools Resource (NVIDIA-NeMo#571)

Adds a new resources server that integrates NeMo Skills tools (e.g.,
stateful Python code execution) with NeMo Gym's verification system.

**Key features:**
- Executes NeMo Skills tools via the ToolManager (e.g.,
`stateful_python_code_exec`)
- Delegates verification to other resources servers (e.g.,
`math_with_judge`)

The `ns_tools` server acts as a pass-through for verification. When
`verify()` is called, it delegates to the configured verifier (default:
`math_with_judge`):

```
ns_tools.verify(request)
    → POST to math_with_judge/verify
    → returns reward from math_with_judge
```

This allows using NeMo Skills tools while leveraging existing
verification infrastructure.

```json
{
  "id": "aime25-0",
  "question": "Find the sum of all integer bases $b>9$ for which $17_b$ is a divisor of $97_b$.",
  "expected_answer": "70",
  "verifier_type": "math_with_judge",
  "agent_ref": {"type": "responses_api_agents", "name": "ns_tools_simple_agent"},
  "responses_create_params": {
    "input": [
      {"role": "user", "content": "Solve the following math problem..."}
    ],
    "tools": [{
      "type": "function",
      "name": "stateful_python_code_exec",
      "description": "Execute Python code in a stateful environment.",
      "parameters": {
        "type": "object",
        "properties": {"code": {"type": "string"}},
        "required": ["code"]
      }
    }]
  }
}
```
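
The pass-through verification can be pictured as a dispatch on the row's `verifier_type`. The sketch below is an assumption-laden illustration, not the `ns_tools` implementation: plain Python callables stand in for the HTTP `POST` to another server's `/verify`, the exact-match check inside `math_with_judge_verify` is a toy, and the `answer` field is hypothetical.

```python
from typing import Callable


def math_with_judge_verify(request: dict) -> dict:
    """Toy stand-in for another resources server's /verify endpoint."""
    reward = 1.0 if request.get("answer") == request.get("expected_answer") else 0.0
    return {"reward": reward}


# Hypothetical registry; a real server would POST to the configured verifier.
VERIFIERS: dict[str, Callable[[dict], dict]] = {
    "math_with_judge": math_with_judge_verify,
}
DEFAULT_VERIFIER = "math_with_judge"


def ns_tools_verify(request: dict) -> dict:
    # Pass-through: pick the verifier named in the row and return its reward.
    verifier_type = request.get("verifier_type", DEFAULT_VERIFIER)
    if verifier_type not in VERIFIERS:
        raise KeyError(f"Unknown verifier_type: {verifier_type}")
    return VERIFIERS[verifier_type](request)


request = {"expected_answer": "70", "verifier_type": "math_with_judge", "answer": "70"}
print(ns_tools_verify(request))
```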

---------

Signed-off-by: George Armstrong <georgea@nvidia.com>

Add math_formal_lean resource server for Lean4 proof verification (NVIDIA-NeMo#563)

- Adds new `math_formal_lean` resource server for Lean4 formal theorem
proving
- Implements `/verify` endpoint that compiles proofs via sandbox
container and returns reward 1.0/0.0
- Includes MiniF2F dataset (244 test problems) with NeMo-Skills aligned
prompt format
- Comprehensive test suite (31 tests)

| File | Description |
|------|-------------|
| `app.py` | Resource server with verify endpoint |
| `sandbox_client.py` | HTTP client for Lean4 sandbox |
| `proof_utils.py` | Proof extraction/building utilities |
| `prepare_minif2f.py` | Dataset preparation script |
| `README.md` | Documentation with licensing info |

- [x] Unit tests pass (31/31)
- [x] End-to-end test with `ng_collect_rollouts` (0.2 reward on 5
samples)
- [x] Tested with gpt-5.1-codex-max model
- [x] Pre-commit lint checks pass

🤖 Generated with [Claude Code](https://claude.ai/code)

---------

Signed-off-by: Stephen Ge <stepheng@nvidia.com>
Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>

Aviary rollouts can be configured to return transitions or not (NVIDIA-NeMo#590)

Per title. This PR retains the current default of returning transitions,
but it is reasonable to change that default to match the other Gym
agents.

Signed-off-by: Siddharth Narayanan <sid@edisonscientific.com>

openhands (NVIDIA-NeMo#343)

Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>

Terminus (judge only) Slicing Environment (NVIDIA-NeMo#594)

Refactoring the equivalency LLM judge resource server into another
judge-based resource server. Main changes include removing the regex
logic and cleaning up the configs related to it.

Train data for this environment is still TBD, but a working version:
Data source: Sliced terminus prompts from different sources
train_jsonl_fpath:
`/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/my-envs/terminus-sliced/char/nano3-ga-traindata-char-tokenlen-32768.jsonl`
validation_jsonl_fpath:
`/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/my-envs/terminus-sliced/char/nano3-ga-valdata-char-tokenlen-16384.jsonl`
example train config:
`/lustre/fsw/portfolios/llmservice/users/kbhardwaj/dev/nemo-rl-internal-yifu/training_configs/grpo_nanov3-nickel-capybara-4-nodes-judge-roff-512-49k-seq-reasoning-off-char-data-64x16-temp1-iter-1600.yaml`

Example of env validation:

base model: early sft checkpoint of nano v3
(`nano-v3-sft-64gbs-nickel-capybara-5e-5-constant-wd-0-load-bal-1e-4-lcx3-pretool-base-temp1-iter-0013600-hf`)
Step 50 -> 21.25% on Terminal Bench Core
https://wandb.ai/nvidia/terminus-sliced/runs/rs7c40hi

Next steps:
Will expand this PR with configurable verification options including
string matching, string similarity and openapi-based output schema
validation.

---------

Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>

0.2.0 new doc stubs (NVIDIA-NeMo#581)

Added new doc directories/article stubs for the topics identified in
0.2.0 IA. generated initial pass of structure and some starter content.
This will enable contributors to focus more on the topic itself rather
than the site build/toctree elements. **Feel free to blow away any
initial content in these pages**.

All stubbed pages have been marked with 🟡 in the toctree for easy
discovery. remove 🟡 once the page is finished.

<img width="1800" height="1009" alt="image"
src="https://github.com/user-attachments/assets/a0bbc63d-05ce-44a2-b31f-fe4b8e0d43db"
/>

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>

Add tutorial for custom data preparation (NVIDIA-NeMo#596)

Added a complete example of preparing a custom dataset for usage with
NeMo Gym. The tutorial walks through downloading a dataset from Hugging
Face or modifying from a different source, adding the
"responses_create_params" field, writing a new resource server config,
and preparing the data with "ng_prepare_data". This tutorial can be used
as a guide for taking most arbitrary text-based datasets and modifying
them to a format that is compatible with NeMo Gym for post-training.
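
As a rough illustration of the "responses_create_params" step, the sketch below wraps a question in the single-user-message shape used by other datasets in this repository. The helper name `to_gym_row` and the source field names `question` and `answer` are assumptions about the input data; the tutorial itself may structure this differently.

```python
import json


def to_gym_row(sample: dict) -> dict:
    """Add a responses_create_params field to one sample (hypothetical helper)."""
    row = dict(sample)  # keep the original fields for verification
    row["responses_create_params"] = {
        "input": [{"role": "user", "content": sample["question"]}]
    }
    return row


raw_samples = [{"question": "What is 2 + 2?", "answer": "4"}]
# One JSON object per line, i.e. JSONL.
jsonl_lines = [json.dumps(to_gym_row(sample)) for sample in raw_samples]
print(jsonl_lines[0])
```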

Signed-off-by: Robert Clark <roclark@nvidia.com>

Fix invalid ref in docs build (NVIDIA-NeMo#604)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

Fix Nemo-Skills python tool to use http (NVIDIA-NeMo#606)

Spawn python_tool as an HTTP server subprocess in ns_tools for better
stability and ensure all rollouts get completed. This replaces
stdio-based tool execution with HTTP transport.

---------

Signed-off-by: George Armstrong <georgea@nvidia.com>

Expanding Terminus Slicing PR (NVIDIA-NeMo#597)

Expanding PR to include reward logic for string similarity and schema
validation

---------

Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>

Updating swerl_gen to support custom parsers (NVIDIA-NeMo#624)

Adding support for custom parsers and evaluation scripts. Prompt formats
for this environment are also simplified.

---------

Signed-off-by: Atefeh Sohrabizadeh <asohrabizade@nvidia.com>
Co-authored-by: Test User <test@example.com>

docs: unsloth fix (NVIDIA-NeMo#622)

for 5826079

Signed-off-by: cmunley1 <cmunley@nvidia.com>

arc-agi resource server (NVIDIA-NeMo#105)

Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>

arc readme (NVIDIA-NeMo#634)

Signed-off-by: Christian Munley <cmunley@nvidia.com>

VLLMModel: Add chat template kwargs on tokenize request (NVIDIA-NeMo#636)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

[docs] Add architecture diagrams (NVIDIA-NeMo#574)

Fixes NVIDIA-NeMo#292

This PR covers the rollout collection within NeMo Gym for standalone
usage (i.e., not used in conjunction with an RL training framework). For the
NeMo RL + Gym integration summary, I will add docs to the NeMo RL page,
and update Gym docs with a pointer to those for reference.

These docs cover:
- Control plane: `ng_run` startup sequence (CLI parsing, config loading,
Ray init, server spawning)
- Server architecture: Head server, uvicorn/FastAPI initialization
- HTTP request flow: Example rollout showing Agent -> Model -> Resources
interactions
- Data plane: `ng_collect_rollouts` flow starting from the headserver
discovery

This change also adds an extra dependency on `sphinxcontrib.mermaid` for
mermaid diagrams to render in the docs page

---------

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>

feat: reward profiling (NVIDIA-NeMo#621)

addresses NVIDIA-NeMo#614

aggregates metrics from rollouts with num_repeats to create a reward
profiled dataset.

New cli command `ng_profile` accepts
- input_jsonl_fpath - original task dataset
- rollouts_jsonl_fpath - rollouts file from `ng_collect_rollouts` with
num_repeats > 1
- output_jsonl_fpath - output path for reward profiled task dataset
- pass_threshold - the reward threshold to count as a pass in pass@k
calculations. Needed because some envs return partial rewards or
rewards > 1, so it's not enough to say that reward = 1 or reward > 0 is a
pass.

Example usage

```
ng_collect_rollouts \
+agent_name=reasoning_gym_simple_agent \
+input_jsonl_fpath=resources_servers/reasoning_gym/data/train_all.jsonl \
+output_jsonl_fpath=results/reasoning_gym_alltask_rollouts.jsonl \
+num_repeats=16

ng_profile \
    +input_jsonl_fpath=resources_servers/reasoning_gym/data/train_all.jsonl \
    +rollouts_jsonl_fpath=results/reasoning_gym_alltask_rollouts.jsonl \
    +output_jsonl_fpath=resources_servers/reasoning_gym/data/train_all_profiled.jsonl \
    +pass_threshold=1.0
```

This creates a new dataset with fields added, for example:

```
  "avg_reward": 1.0,
  "std_reward": 0.0,
  "min_reward": 1.0,
  "max_reward": 1.0,
  "total_samples": 16,
  "pass_rate": 1.0,
  "pass_rate_total": 16,
  "pass_rate_passed": 16,
  "pass_threshold": 1.0

```
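
The fields above could be derived from the repeated rewards of one task roughly as follows. This is a minimal sketch, not the `ng_profile` implementation: the function name `profile_rewards`, the `>=` comparison against `pass_threshold`, and the use of the population standard deviation are all assumptions.

```python
import statistics


def profile_rewards(rewards: list[float], pass_threshold: float) -> dict:
    """Aggregate the rewards of one task across repeats (hypothetical helper)."""
    if not rewards:
        raise ValueError("need at least one reward")
    # Assumption: a rollout passes when its reward reaches the threshold.
    passed = sum(1 for reward in rewards if reward >= pass_threshold)
    return {
        "avg_reward": statistics.fmean(rewards),
        "std_reward": statistics.pstdev(rewards),  # assumption: population std
        "min_reward": min(rewards),
        "max_reward": max(rewards),
        "total_samples": len(rewards),
        "pass_rate": passed / len(rewards),
        "pass_rate_total": len(rewards),
        "pass_rate_passed": passed,
        "pass_threshold": pass_threshold,
    }


# 16 repeats that all score 1.0, as in the example fields above.
print(profile_rewards([1.0] * 16, pass_threshold=1.0))
```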

As a full example, the original dataset looks like:
```
{
  "responses_create_params": {
    "input": [
      {
        "role": "user",
        "content": "Ghyll yearns for entrepreneurship. Oluwadamiloju finds joy in sculpting. Nikodem welcomes theatre. Sheriff is nuts about roller skating. Ata worships omelettes. Cahlum fancies ice cream. Geordan is devoted to organizing the pantry. Richard savors playing the accordion. Rholmark damns playing the clarinet. Jon savors cooking dinner. Taddy waves away swimming. Rubhan ridicules off-road vehicles. Bhaaldeen dotes martial arts. Grzegorz rejoices in collecting postcards. Niraj adores playing sudoku. Ritchie is fond of dumplings. Jakey esteems playing soccer. Modu disdains playing video games. Demetrius extols electric cars. Justinas desires botany. Shreeram respects segways. Rowen blasts limousines. Kalen prizes the color bronze. Ayyub is obsessed with folklore. Ryan-Lee finds joy in cleaning the refrigerator. Devan welcomes cleaning the gutters. Abu prizes weeding the garden. Kenzi mocks singing. Zenith adores camping. Ericlee admires snowboarding. Connan endorses eagles. Vrishin esteems playing the trumpet. Dissanayake extols ice cream. Marcel favors kindness. Nial laments building model airplanes. Craig-James sneers at goats. Mikee basks in zoology. Kyro is committed to the color black. Danniel approves of the color yellow. Dregan supports space exploration. Antoine bears scrubbing the tub. Alfy spurns the color blue. Madison-Jake disdains the color lemon. Lucus idolizes the color olive. Ramit curses washing the dishes. \nWho savors playing the accordion? Reply only with a name."
      }
    ]
  },
  "question": "Ghyll yearns for entrepreneurship. Oluwadamiloju finds joy in sculpting. Nikodem welcomes theatre. Sheriff is nuts about roller skating. Ata worships omelettes. Cahlum fancies ice cream. Geordan is devoted to organizing the pantry. Richard savors playing the accordion. Rholmark damns playing the clarinet. Jon savors cooking dinner. Taddy waves away swimming. Rubhan ridicules off-road vehicles. Bhaaldeen dotes martial arts. Grzegorz rejoices in collecting postcards. Niraj adores playing sudoku. Ritchie is fond of dumplings. Jakey esteems playing soccer. Modu disdains playing video games. Demetrius extols electric cars. Justinas desires botany. Shreeram respects segways. Rowen blasts limousines. Kalen prizes the color bronze. Ayyub is obsessed with folklore. Ryan-Lee finds joy in cleaning the refrigerator. Devan welcomes cleaning the gutters. Abu prizes weeding the garden. Kenzi mocks singing. Zenith adores camping. Ericlee admires snowboarding. Connan endorses eagles. Vrishin esteems playing the trumpet. Dissanayake extols ice cream. Marcel favors kindness. Nial laments building model airplanes. Craig-James sneers at goats. Mikee basks in zoology. Kyro is committed to the color black. Danniel approves of the color yellow. Dregan supports space exploration. Antoine bears scrubbing the tub. Alfy spurns the color blue. Madison-Jake disdains the color lemon. Lucus idolizes the color olive. Ramit curses washing the dishes. \nWho savors playing the accordion? Reply only with a name.",
  "answer": "Richard",
  "metadata": {
    "source_dataset": "needle_haystack",
    "source_index": 0,
    "question": "Who savors playing the accordion? Reply only with a name.",
    "num_statements": 45,
    "difficulty": {
      "num_statements": [
        10,
        100
      ]
    }
  }
}
```

The output profiled dataset looks like:
```
{
  "responses_create_params": {
    "input": [
      {
        "role": "user",
        "content": "Ghyll yearns for entrepreneurship. Oluwadamiloju finds joy in sculpting. Nikodem welcomes theatre. Sheriff is nuts about roller skating. Ata worships omelettes. Cahlum fancies ice cream. Geordan is devoted to organizing the pantry. Richard savors playing the accordion. Rholmark damns playing the clarinet. Jon savors cooking dinner. Taddy waves away swimming. Rubhan ridicules off-road vehicles. Bhaaldeen dotes martial arts. Grzegorz rejoices in collecting postcards. Niraj adores playing sudoku. Ritchie is fond of dumplings. Jakey esteems playing soccer. Modu disdains playing video games. Demetrius extols electric cars. Justinas desires botany. Shreeram respects segways. Rowen blasts limousines. Kalen prizes the color bronze. Ayyub is obsessed with folklore. Ryan-Lee finds joy in cleaning the refrigerator. Devan welcomes cleaning the gutters. Abu prizes weeding the garden. Kenzi mocks singing. Zenith adores camping. Ericlee admires snowboarding. Connan endorses eagles. Vrishin esteems playing the trumpet. Dissanayake extols ice cream. Marcel favors kindness. Nial laments building model airplanes. Craig-James sneers at goats. Mikee basks in zoology. Kyro is committed to the color black. Danniel approves of the color yellow. Dregan supports space exploration. Antoine bears scrubbing the tub. Alfy spurns the color blue. Madison-Jake disdains the color lemon. Lucus idolizes the color olive. Ramit curses washing the dishes. \nWho savors playing the accordion? Reply only with a name."
      }
    ]
  },
  "question": "Ghyll yearns for entrepreneurship. Oluwadamiloju finds joy in sculpting. Nikodem welcomes theatre. Sheriff is nuts about roller skating. Ata worships omelettes. Cahlum fancies ice cream. Geordan is devoted to organizing the pantry. Richard savors playing the accordion. Rholmark damns playing the clarinet. Jon savors cooking dinner. Taddy waves away swimming. Rubhan ridicules off-road vehicles. Bhaaldeen dotes martial arts. Grzegorz rejoices in collecting postcards. Niraj adores playing sudoku. Ritchie is fond of dumplings. Jakey esteems playing soccer. Modu disdains playing video games. Demetrius extols electric cars. Justinas desires botany. Shreeram respects segways. Rowen blasts limousines. Kalen prizes the color bronze. Ayyub is obsessed with folklore. Ryan-Lee finds joy in cleaning the refrigerator. Devan welcomes cleaning the gutters. Abu prizes weeding the garden. Kenzi mocks singing. Zenith adores camping. Ericlee admires snowboarding. Connan endorses eagles. Vrishin esteems playing the trumpet. Dissanayake extols ice cream. Marcel favors kindness. Nial laments building model airplanes. Craig-James sneers at goats. Mikee basks in zoology. Kyro is committed to the color black. Danniel approves of the color yellow. Dregan supports space exploration. Antoine bears scrubbing the tub. Alfy spurns the color blue. Madison-Jake disdains the color lemon. Lucus idolizes the color olive. Ramit curses washing the dishes. \nWho savors playing the accordion? Reply only with a name.",
  "answer": "Richard",
  "metadata": {
    "source_dataset": "needle_haystack",
    "source_index": 0,
    "question": "Who savors playing the accordion? Reply only with a name.",
    "num_statements": 45,
    "difficulty": {
      "num_statements": [
        10,
        100
      ]
    }
  },
  "avg_reward": 1.0,
  "std_reward": 0.0,
  "min_reward": 1.0,
  "max_reward": 1.0,
  "total_samples": 16,
  "pass_rate": 1.0,
  "pass_rate_total": 16,
  "pass_rate_passed": 16,
  "pass_threshold": 1.0
}
```

In this example, all 16 of 16 samples got reward=1, so the statistics are
degenerate (std_reward is 0.0); it only illustrates the output format.
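The profiling fields in the output above can be reproduced with a minimal sketch like the one below. It is not the code from this PR: the function name `profile_rewards` is hypothetical, and it assumes that a sample passes when its reward is at least `pass_threshold` and that `std_reward` is a population standard deviation. Neither assumption is confirmed by the example, since all 16 rewards are identical.

```python
import statistics


def profile_rewards(rewards, pass_threshold=1.0):
    """Aggregate per-sample rewards into the profiling fields shown above.

    Assumptions (not confirmed by the PR): pass means reward >= pass_threshold,
    and std_reward is the population standard deviation.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    passed = sum(1 for r in rewards if r >= pass_threshold)
    return {
        "avg_reward": statistics.fmean(rewards),
        "std_reward": statistics.pstdev(rewards),
        "min_reward": min(rewards),
        "max_reward": max(rewards),
        "total_samples": len(rewards),
        "pass_rate": passed / len(rewards),
        "pass_rate_total": len(rewards),
        "pass_rate_passed": passed,
        "pass_threshold": pass_threshold,
    }


# The example above: 16 rollouts, all with reward 1.0.
stats = profile_rewards([1.0] * 16)
```

A prompt with mixed rewards, e.g. `profile_rewards([1.0, 0.0, 1.0, 0.0])`, would give a pass_rate of 0.5 and a non-zero std_reward, which is the more informative case for difficulty filtering.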

---------

Signed-off-by: Christian Munley <cmunley@nvidia.com>

docs: issue 626 (NVIDIA-NeMo#638)

Signed-off-by: Lawrence Lane <llane@nvidia.com>

ci: Enable the test job to build a wheel and publish to test.pypi (NVIDIA-NeMo#651)

Enable the test job to build a wheel and publish to test.pypi

* The workflow expects a .python-version to help build the wheel
* Update the package name from NeMo-Gym to nemo-gym to align with how
other packages are named
* This test job currently runs only on merges to main or a release
branch. Running it more often sometimes triggers "too many requests"
errors from test.pypi.org

Example publishing to test pypi:
https://test.pypi.org/project/nemo-gym/0.2.2640rc0/

The GitHub job was already in the repo. I just had to flip an env var to
enable it.

---------

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

v1 of text-to-sql (NVIDIA-NeMo#648)

Text-to-SQL environment using LLM-as-a-judge

---------

Signed-off-by: Yev Meyer <ymeyer@nvidia.com>

Yev/text to sql v1.1 (NVIDIA-NeMo#653)

minor update to simplify code and use xml tags

---------

Signed-off-by: Yev Meyer <ymeyer@nvidia.com>

Upstream Super 3 dev 20260205 (NVIDIA-NeMo#654)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

ns tools stability (NVIDIA-NeMo#658)

Summary
- Add disable_session_restore flag to skip O(n) session history replay
after sandbox worker restarts (enabled by default in config)
- Add verbose_tool_logging flag to optionally collect per-session timing
metrics and log per-call execution times (disabled by default)
- Decrease code execution timeout from 30s to 10s for faster failure on
runaway code
- Pass --disable-session-restore CLI flag through to python_tool
subprocess
- Update nemo-skills requirement to georgea/super-rl-02062026
(nemo-skills side changes for session restore and dependencies)

---------

Signed-off-by: George Armstrong <georgea@nvidia.com>

Bump package versions to fix security vulnerabilities (NVIDIA-NeMo#667)

Bump package versions to fix security vulnerabilities.
- urllib3: 2.5.0 -> 2.6.3
- mlflow: 3.3.2 -> 3.9.0
- fonttools: 4.59.2 -> 4.61.1
- aiohttp: 3.12.15 -> 3.13.3
- python-multipart: 0.0.20 -> 0.0.22
- ray: 2.50.1 -> 2.52.1

## Related Issue
NVIDIA-NeMo/Internal-Planning#145

Signed-off-by: Sebastian Rogawski <srogawski@nvidia.com>
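
One way to confirm an environment picked up the bumped versions is sketched below. This is an illustration, not part of the PR: the helper names `parse_version` and `outdated` are hypothetical, the target versions from the list above are treated as minimums (an assumption), and the version parser is deliberately simplified (it ignores suffixes such as `rc0`).

```python
import re

# Target versions from the bump list above, treated here as minimums (assumption).
MINIMUMS = {
    "urllib3": "2.6.3",
    "mlflow": "3.9.0",
    "fonttools": "4.61.1",
    "aiohttp": "3.13.3",
    "python-multipart": "0.0.22",
    "ray": "2.52.1",
}


def parse_version(text):
    """Return the leading integer of each dot-separated component as a tuple.

    Simplified on purpose: suffixes such as 'rc0' are dropped.
    """
    parts = []
    for component in text.split("."):
        match = re.match(r"\d+", component)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def outdated(installed, minimums=MINIMUMS):
    """Return sorted package names that are missing or below their minimum."""
    return sorted(
        name
        for name, minimum in minimums.items()
        if name not in installed
        or parse_version(installed[name]) < parse_version(minimum)
    )


# Example: everything is current except urllib3, still at the old version.
example = dict(MINIMUMS, urllib3="2.5.0")
stale = outdated(example)
```

To check a live environment, the `installed` mapping could be built with the standard library's `importlib.metadata.version(name)`, catching `PackageNotFoundError` for packages that are not installed.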
fsiino-nvidia pushed a commit that referenced this pull request Feb 21, 2026
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
fsiino-nvidia pushed a commit that referenced this pull request Feb 21, 2026
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
abubakaria56 pushed a commit to abubakaria56/Gym that referenced this pull request Mar 2, 2026
Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>