Add nemo-skills-core subpackage for lightweight installs#1229
Signed-off-by: George Armstrong <georgea@nvidia.com>
    _EVALUATOR_MAP_PATHS[eval_type] = None
    _resolved_evaluator_map[eval_type] = eval_fn
Setting _EVALUATOR_MAP_PATHS[eval_type] = None creates a fragile state. If _resolved_evaluator_map is ever cleared or doesn't contain the eval_type, _get_evaluator_fn will call _resolve(None) and crash.
Suggested change:
    -_EVALUATOR_MAP_PATHS[eval_type] = None
    -_resolved_evaluator_map[eval_type] = eval_fn
    +# Store function directly, bypassing the lazy resolution path
    +_resolved_evaluator_map[eval_type] = eval_fn
Good catch, switched to a "<dynamically-registered>" sentinel to be safe.
Actually, reverting this back to None. The _resolved_evaluator_map cache is internal and never cleared, so this scenario cannot happen in practice. Per our project guidelines: "Don't add error handling, fallbacks, or validation for scenarios that can't happen." If the cache were somehow corrupted, crashing is the correct signal.
📝 Walkthrough
Adds a lightweight core package and requirements, documents installation and the Core/Pipeline dependency boundary, reorganizes optional extras in packaging, adds a CI step for uv, implements lazy evaluator resolution, refactors dataset loading to prefer local modules and delegate cluster handling to a new pipeline dataset module, and adds a runtime guard for pipeline imports.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Core as nemo_skills.dataset.utils
participant Pipeline as nemo_skills.pipeline.dataset
participant Cluster
rect rgba(100, 150, 200, 0.5)
Note over User,Core: Local-only flow (default)
User->>Core: get_dataset_module(dataset, data_dir=None)
Core->>Core: import from nemo_skills.dataset or local path
Core-->>User: return dataset module
end
rect rgba(200, 100, 150, 0.5)
Note over User,Cluster: Cluster flow (deprecated in Core)
User->>Core: get_dataset_module(dataset, cluster_config=...)
Core->>Core: emit DeprecationWarning
Core->>Pipeline: delegate get_dataset_module(...)
Pipeline->>Cluster: fetch / download cluster module (remote)
Cluster-->>Pipeline: module content / __init__.py
Pipeline->>Core: imported module
Core-->>User: return dataset module
end
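The two flows in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the PR's implementation: the two helper functions are hypothetical stand-ins for the Core local loader and for `nemo_skills.pipeline.dataset.get_dataset_module`, and the warning text is invented for the example.

```python
import warnings


def _local_get_dataset_module(dataset, data_dir=None):
    # Hypothetical stand-in for the Core local import logic.
    return f"local:{dataset}"


def _pipeline_get_dataset_module(dataset, data_dir=None, cluster_config=None):
    # Hypothetical stand-in for nemo_skills.pipeline.dataset.get_dataset_module.
    return f"cluster:{dataset}"


def get_dataset_module(dataset, data_dir=None, cluster_config=None):
    """Core entry point: local by default, deprecated delegation otherwise."""
    if cluster_config is None:
        return _local_get_dataset_module(dataset, data_dir)
    # Cluster flow: warn, then hand off to the Pipeline-level loader.
    warnings.warn(
        "Passing cluster_config to the Core loader is deprecated; "
        "call the Pipeline loader instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return _pipeline_get_dataset_module(dataset, data_dir, cluster_config)
```

Callers that never pass `cluster_config` stay entirely inside Core, which is what allows Core to be installed without the Pipeline dependencies.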
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@nemo_skills/evaluation/evaluator/__init__.py`:
- Around line 113-117: The error message incorrectly labels
`_EVALUATOR_MAP_PATHS.keys()` as "All supported types" when it only contains
function-based evaluators; update the ValueError text in the raise block (the
code that references eval_type) to clearly distinguish class-based vs
function-based types by either listing both maps together (combine
`_EVALUATOR_CLASS_MAP_PATHS.keys()` and `_EVALUATOR_MAP_PATHS.keys()`) or
renaming the second label to "Function-based evaluator types" so users see
accurate descriptions of `_EVALUATOR_CLASS_MAP_PATHS` and
`_EVALUATOR_MAP_PATHS`.
- Around line 106-107: register_evaluator currently stores None into
_EVALUATOR_MAP_PATHS[eval_type], which will cause AttributeError when code later
iterates or calls _resolve expecting a path string; change register_evaluator so
it stores a sentinel string (e.g. "<dynamic>") into
_EVALUATOR_MAP_PATHS[eval_type] instead of None, and ensure any
resolution/display logic in _resolve/_get_evaluator_fn treats that sentinel as a
dynamic entry (or filters it out) so rsplit is only called on real path strings;
update references to _EVALUATOR_MAP_PATHS, register_evaluator,
_resolved_evaluator_map, _get_evaluator_fn, and _resolve accordingly.
- Line 137: Remove the leftover debug print statement print(f"evaluator:
{evaluator}") from the module (it should not be in production code); either
delete that line or replace it with an appropriate logger.debug call using the
module logger (e.g., logger.debug("evaluator: %s", evaluator)) so diagnostics
use the configured logging system and not stdout—locate the print by searching
for the exact string and update in the __init__ module where the evaluator
variable is in scope.
- Around line 93-94: Remove the debug print statement print(f"evaluator:
{evaluator}") from the module so it no longer emits debug output; locate the
temporary print in the evaluator initialization block near where EVALUATOR_MAP
and EVALUATOR_CLASS_MAP are set and delete that single line, leaving the maps
and the helper functions (_get_evaluator_fn, _get_evaluator_cls, evaluate,
get_evaluator_class) intact so iteration via EVALUATOR_MAP/EVALUATOR_CLASS_MAP
still works per the documented design.
In `@nemo_skills/pipeline/dataset.py`:
- Around line 60-62: The check uses cluster_config.get("executor") which masks a
missing-key error; change it to access the key directly
(cluster_config["executor"]) so missing executor raises immediately, and keep
the logic that if cluster_config is None or cluster_config["executor"] in (None,
"none") then return _get_local_dataset_module(dataset, data_dir); update any
related code paths that assume executor exists (e.g., the code around
get_unmounted_path in nemo_skills/pipeline/utils/mounts.py) to rely on the same
direct-access semantics to fail fast on misconfiguration.
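The difference between `.get("executor")` and direct key access in the last item can be shown with a small sketch. The function name below is hypothetical and the config dicts are made up for illustration; only the condition itself comes from the review comment.

```python
def uses_local_dataset_module(cluster_config):
    # None means "no cluster at all"; otherwise the "executor" key is required,
    # so a config that omits it raises KeyError instead of silently going local.
    return cluster_config is None or cluster_config["executor"] in (None, "none")


def uses_local_dataset_module_lenient(cluster_config):
    # The .get() variant: a missing key is indistinguishable from executor=None.
    return cluster_config is None or cluster_config.get("executor") in (None, "none")
```

With the lenient variant, an empty or misconfigured dict quietly takes the local path; with direct access, the same input fails immediately with `KeyError`.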
🧹 Nitpick comments (6)

CONTRIBUTING.md (1)
56-59: Fenced code block missing language specifier.
Minor nit from markdownlint: adding a language (e.g., `text`) would silence MD040.
Proposed fix:
    -```
    +```text
     Pipeline can import from Core. Core CANNOT import from Pipeline.
    -```
    +```

core/requirements.txt (1)
17-27: Section label "math evaluation" is misleading: several packages below it aren't math-specific.
`mcp`, `numpy`, `openai`, `requests`, `rich`, `tqdm`, and `transformers` are general-purpose dependencies, not math evaluation specific. Consider either reorganizing sections or using a broader label like `# --- general / shared ---`.

nemo_skills/pipeline/dataset.py (3)
39-51: Imported module outlives its backing file.
`import_from_path` is called inside a `TemporaryDirectory` context manager. Once the `with` block exits, the downloaded `__init__.py` is deleted, but the module object (and its `__file__` attribute) still references the now-removed path. This works at runtime because CPython caches the compiled bytecode in memory, but it can cause confusing errors if any downstream code inspects `module.__file__` or attempts a reload.
Consider moving the temp directory lifecycle to the caller or keeping it alive longer if module introspection is needed.

44-50: Chain the re-raised exception for clearer tracebacks.
Per the static analysis hint (B904), `raise ... from err` preserves the original traceback context.
Proposed fix:
     try:
         cluster_download_file(cluster_config, cluster_dataset_path, tmp_path)
    -except FileNotFoundError:
    -    raise RuntimeError(
    +except FileNotFoundError as err:
    +    raise RuntimeError(
             f"Init file {mounted_path} not found on the cluster. "
             f"Please check the dataset name you're using. Did you forget to run prepare data commands?"
    -    )
    +    ) from err

109-113: Chain the re-raised `RuntimeError` for clearer tracebacks.
Same B904 pattern: add `from err` to preserve the original `ModuleNotFoundError` context.
Proposed fix:
    -except ModuleNotFoundError:
    -    raise RuntimeError(
    +except ModuleNotFoundError as err:
    +    raise RuntimeError(
             f"Dataset {dataset} not found in any of the searched locations: "
             f"{data_dir if data_dir else 'nemo_skills.dataset'}, {extra_datasets}"
    -    )
    +    ) from err

nemo_skills/dataset/utils.py (1)
116-135: Chain re-raised exceptions for clearer tracebacks.
Same pattern as flagged in `pipeline/dataset.py`: the `raise RuntimeError(...)` statements at lines 120 and 126 inside `except` clauses should use `from` to preserve the original exception context.
Proposed fix:
     except ModuleNotFoundError:
         dataset = dataset.replace(".", "/")
         extra_datasets = extra_datasets or os.environ.get("NEMO_SKILLS_EXTRA_DATASETS")
         if extra_datasets is None:
    -        raise RuntimeError(f"Dataset {dataset} not found in {data_dir if data_dir else 'nemo_skills.dataset'}")
    +        raise RuntimeError(
    +            f"Dataset {dataset} not found in {data_dir if data_dir else 'nemo_skills.dataset'}"
    +        ) from None
         if extra_datasets_type == ExtraDatasetType.local or extra_datasets_type is None:
             with add_to_path(extra_datasets):
                 try:
                     dataset_module = importlib.import_module(dataset)
    -            except ModuleNotFoundError:
    -                raise RuntimeError(
    +            except ModuleNotFoundError as err:
    +                raise RuntimeError(
                         f"Dataset {dataset} not found in any of the searched locations: "
                         f"{data_dir if data_dir else 'nemo_skills.dataset'}, {extra_datasets}"
    -                )
    +                ) from err
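The "imported module outlives its backing file" nitpick for `pipeline/dataset.py` can be reproduced in isolation. The sketch below is an assumption-laden illustration: `import_from_path` here is a hypothetical stand-in built on `importlib.util`, not the repository's helper, and the module contents (`DATASET_GROUP`) are invented for the demo.

```python
import importlib.util
import os
import tempfile


def import_from_path(path, module_name="tmp_dataset_module"):
    # Hypothetical stand-in for the repo's helper: load a module from a file path.
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp_dir:
    init_path = os.path.join(tmp_dir, "__init__.py")
    with open(init_path, "w") as f:
        f.write("DATASET_GROUP = 'math'\n")
    module = import_from_path(init_path)

# The directory is gone now, but the module's attributes were created when its
# body executed, so they remain usable.
dataset_group = module.DATASET_GROUP
# __file__ still names the deleted path, which is what can confuse later
# introspection or reload attempts.
file_exists = os.path.exists(module.__file__)
```

Attribute access keeps working after the `with` block exits, while anything that reopens `module.__file__` would fail, which matches the concern raised in the comment.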
    _EVALUATOR_MAP_PATHS[eval_type] = None
    _resolved_evaluator_map[eval_type] = eval_fn
Setting _EVALUATOR_MAP_PATHS[eval_type] = None is fragile. If _resolved_evaluator_map gets cleared or doesn't contain eval_type, _get_evaluator_fn will call _resolve(None) and crash with ValueError: not enough values to unpack.
The current implementation works only because the function is immediately added to _resolved_evaluator_map, but this implicit dependency is error-prone. Consider either:
- Not setting `_EVALUATOR_MAP_PATHS[eval_type]` at all (just use `_resolved_evaluator_map`)
- Setting it to a sentinel string that provides a better error message if accidentally resolved
Good catch, switched to a "<dynamically-registered>" sentinel to be safe.
Actually, reverting this back to None. The _resolved_evaluator_map cache is internal and never cleared, so this scenario cannot happen in practice. Per our project guidelines: "Don't add error handling, fallbacks, or validation for scenarios that can't happen." If the cache were somehow corrupted, crashing is the correct signal.
Signed-off-by: George Armstrong <georgea@nvidia.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@nemo_skills/pipeline/dataset.py`:
- Around line 54-62: The function _get_default_dataset_module currently drops
extra_datasets and extra_datasets_type when cluster_config is None by delegating
to _get_local_dataset_module(dataset, data_dir); update
_get_default_dataset_module to forward extra_datasets and extra_datasets_type
into the local call (e.g., call _get_local_dataset_module(dataset, data_dir,
extra_datasets=..., extra_datasets_type=...)) so get_dataset_module's outer
ModuleNotFoundError path remains reachable and callers' extra_datasets are
honored; ensure the function signature for _get_default_dataset_module accepts
the extra_* params and that _get_local_dataset_module is invoked with those
parameters.
🧹 Nitpick comments (3)

nemo_skills/pipeline/dataset.py (2)
39-51: Chain exception context with `from` when re-raising.
Static analysis (B904) correctly flags that re-raising inside `except` without `from` loses the original traceback context. This applies here and at lines 109-113.
Proposed fix:
    -except FileNotFoundError:
    -    raise RuntimeError(
    +except FileNotFoundError as exc:
    +    raise RuntimeError(
             f"Init file {mounted_path} not found on the cluster. "
             f"Please check the dataset name you're using. Did you forget to run prepare data commands?"
    -    )
    +    ) from exc

91-113: Chain the inner `RuntimeError` re-raise with `from`.
Same B904 issue as above: preserve context for debugging.
Proposed fix:
    -except ModuleNotFoundError:
    -    raise RuntimeError(
    +except ModuleNotFoundError as exc:
    +    raise RuntimeError(
             f"Dataset {dataset} not found in any of the searched locations: "
             f"{data_dir if data_dir else 'nemo_skills.dataset'}, {extra_datasets}"
    -    )
    +    ) from exc

nemo_skills/evaluation/evaluator/__init__.py (1)
93-94: Semantic change in `EVALUATOR_MAP` / `EVALUATOR_CLASS_MAP` values.
These aliases now expose dotted-path strings instead of resolved callables/classes. Any downstream code (external plugins, scripts) that iterates `.values()` expecting callables will break silently. The comment on lines 90-92 documents the intent, and the repo itself only uses these for key enumeration, so this is safe internally. Just worth noting for external consumers if this is a public API.
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
core/requirements.txt
Outdated
    # No cluster orchestration deps (nemo_run, typer, etc.)
    # --- code evaluation ---
Are you sure this covers all benchmarks? Generally, we should move toward keeping these reqs really simple and installing most benchmark-specific requirements at runtime, but for now we probably need some more packages here. E.g. `datasets` is almost certainly needed, and then other benchmark-specific things like `sacrebleu`, etc.
I revisited the separation. This should contain all the reqs not needed for cluster orchestration now.
requirements/pipeline.txt
Outdated
    nemo-evaluator-launcher<0.1.47
    nemo_run @ git+https://github.com/NVIDIA-NeMo/Run
    typer >= 0.13
    wandb
This is actually a core dependency: it's used in summarize-results, which is required for core functionality. Currently summarize-results is in a weird half-pipeline state, but we should fix it to cleanly separate it into pipeline and non-pipeline components via #779.
CONTRIBUTING.md
Outdated
    | CLI commands, cluster orchestration, experiment tracking | `requirements/pipeline.txt` |
    | Everything else (dataset-specific deps, benchmark-specific packages) | `requirements/main.txt` only |

    Dependencies in `core/requirements.txt` should be things that a typical `GenerationTask` run with PythonTool would need. Dataset-specific or benchmark-specific packages (e.g., `faiss-cpu`, `sacrebleu`, `func-timeout`) go only in `requirements/main.txt`.
this part I don't fully understand - I think benchmark-specific packages should go to core for now as otherwise the code will fail when those benchmarks are used e.g. in evaluator. Eventually we should migrate to jit install, but it's not done yet, so I'd put those into core
Fair. My original scope was pretty PythonTool specific, but I think we can come up with something that makes a little more sense in terms of aligning the core code with core dependencies.
yeah it's now in core and there is a clearer description of what's in pipeline vs core
CONTRIBUTING.md
Outdated
    Dependencies in `core/requirements.txt` should be things that a typical `GenerationTask` run with PythonTool would need. Dataset-specific or benchmark-specific packages (e.g., `faiss-cpu`, `sacrebleu`, `func-timeout`) go only in `requirements/main.txt`.

    All core and pipeline deps must also appear in `requirements/main.txt` (the monolithic file used for default installs).
Can we not link multiple requirements files in pyproject.toml? Do we have to duplicate?
we should be able to do that. It should be implemented that way now--I updated this at one point so it links against the file rather than duplicating. Will fix.
CONTRIBUTING.md
Outdated
    **When writing new core code:**

    - If you need something from `nemo_skills.pipeline`, your code probably belongs in pipeline, not core. Move it.
    - If you have a function that works locally but *also* needs a cluster variant, put the local version in core and a cluster-aware wrapper in `nemo_skills/pipeline/` (see `pipeline/dataset.py` for the pattern).
I actually think that if we have a case like this, it means we need to redesign something. Ideally the separation should be clean, and we shouldn't need to duplicate functionality. E.g. the dataset module part is a bit messy and there is probably a better way to do it: a pipeline level that only manages pulling from the cluster, and a local level that always assumes things are present locally and is called directly inside pipeline.
Makes sense, updated the docs here to reflect that and made the implementation more consistent with the guidance here/there.
Per review feedback: all benchmark-specific packages should go to core for now since JIT install is not yet implemented. Previously only PythonTool-specific deps were in core while benchmark deps like datasets, sacrebleu, faiss-cpu, etc. were only in main.txt. This led to an inconsistent boundary where math grader deps were in core but BFCL deps were not, despite both being benchmark-specific. Addresses review comments #1, #4, #6 on PR #1229. Signed-off-by: George Armstrong <georgea@nvidia.com>
pyproject.toml now composes default dependencies from core/requirements.txt + requirements/pipeline.txt instead of maintaining a separate monolithic main.txt that duplicated both. This ensures a single source of truth for each dependency: it lives in exactly one requirements file, and pyproject.toml references both. Addresses review comment #5 on PR #1229. Signed-off-by: George Armstrong <georgea@nvidia.com>
Creates the test file referenced in docs/basics/installation.md that verifies the core/pipeline dependency boundary. Tests import each core module in a subprocess where nemo_run and nemo_skills.pipeline are blocked, ensuring core has no top-level pipeline dependencies. Addresses review comment #2 on PR #1229. Signed-off-by: George Armstrong <georgea@nvidia.com>
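The subprocess-based boundary check described in this commit can be sketched as follows. This is an illustration of the general technique, not the test file from the PR: the helper name is hypothetical, and the demo uses standard-library modules (`csv` and its C helper `_csv`) so it runs without nemo_skills installed. It relies on the documented behavior that a `None` entry in `sys.modules` makes any import of that name fail.

```python
import subprocess
import sys
import textwrap


def imports_with_blocked(module_name, blocked):
    """Return True if `module_name` imports in a fresh interpreter where
    every name in `blocked` is made unimportable."""
    script = textwrap.dedent(
        f"""
        import importlib
        import sys

        # A None entry in sys.modules makes `import name` raise ImportError.
        for name in {list(blocked)!r}:
            sys.modules[name] = None

        importlib.import_module({module_name!r})
        """
    )
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

For the real boundary test, one would presumably call something like `imports_with_blocked("nemo_skills.dataset.utils", ["nemo_run", "nemo_skills.pipeline"])` for each core module; running in a subprocess keeps the blocked entries from leaking into the test runner's own `sys.modules`.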
Rewrite the dependency boundary section to:
- Define core as "everything needed for inference + evaluation" (not just PythonTool-specific deps)
- Remove references to deleted requirements/main.txt
- Clarify that all benchmark evaluator deps go to core until JIT install is implemented
- Improve dataset module separation guidance (pipeline = cluster I/O only, core = all local logic)
- Add note about summarize-results refactor (issue #779)

Addresses review comments #3, #4, #6, #7 on PR #1229.
Signed-off-by: George Armstrong <georgea@nvidia.com>
Refactor pipeline/dataset.py so it ONLY handles cluster I/O (SSH downloads, mount path resolution) and delegates all local import/resolution logic to core's dataset/utils.py.

Key changes:
- Extract cluster-specific loading into _get_cluster_dataset_module()
- For local extra_datasets fallback, delegate to core instead of reimplementing add_to_path + import_module
- For non-cluster cases, delegate entirely to core from the start
- Remove duplicated local import logic that was parallel to core

Addresses review comment #7 on PR #1229.
Signed-off-by: George Armstrong <georgea@nvidia.com>
The section labels (agent runtime, math evaluation, code evaluation, benchmark evaluator deps) were misleading since many deps span multiple categories. Keep it as a flat alphabetical list. Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Pipeline no longer calls importlib.import_module or add_to_path directly; all import/module-resolution logic lives in core.

Pipeline's only responsibilities are now:
- Local executor: unmount paths via get_unmounted_path, then delegate to core, then map returned paths back to mounted form
- Remote executor: SSH download via cluster_download_file for custom data_dir or cluster-type extra_datasets

Addresses review comment #7 on PR #1229.
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Move detailed core/pipeline boundary docs from CONTRIBUTING.md and installation.md into docs/core-pipeline-boundary.md. Add symlink at requirements/core.txt pointing to core/requirements.txt for discoverability. Signed-off-by: George Armstrong <georgea@nvidia.com>
…rable-pipeline
Signed-off-by: George Armstrong <georgea@nvidia.com>
# Conflicts:
#   nemo_skills/dataset/utils.py
#   nemo_skills/evaluation/evaluator/__init__.py
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Directories with hyphens (e.g., answer-judge, math-500, llama3-instruct) cannot be imported via `import` statement. Use importlib.import_module() which handles arbitrary module names correctly. Signed-off-by: George Armstrong <georgea@nvidia.com>
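A short, self-contained demonstration of the point in this commit message: the `import` statement cannot name a hyphenated directory, while `importlib.import_module()` accepts the name as a plain string. The temporary package and its `NAME` attribute are made up for the demo; only the directory name `math-500` comes from the commit message.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway package whose directory name contains a hyphen.
tmp_root = tempfile.mkdtemp()
package_dir = os.path.join(tmp_root, "math-500")
os.makedirs(package_dir)
with open(os.path.join(package_dir, "__init__.py"), "w") as f:
    f.write("NAME = 'math-500'\n")

# `import math-500` is not valid syntax, so the statement form cannot be used.
try:
    compile("import math-500", "<demo>", "exec")
    statement_is_valid = True
except SyntaxError:
    statement_is_valid = False

# import_module takes the name as a string, so the hyphen is not a problem.
sys.path.insert(0, tmp_root)
try:
    module = importlib.import_module("math-500")
finally:
    sys.path.remove(tmp_root)
```

The same applies to the other names mentioned in the commit, such as `answer-judge` and `llama3-instruct`.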
Conflicts resolved:
- evaluator/__init__.py: kept lazy loading, added critpt + dsbench entries
- requirements/main.txt: deleted (using core/ + pipeline/ split), added openpyxl, pandas, pyxlsb to core/requirements.txt

Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
commit a5da597 Author: Igor Gitman <igitman@nvidia.com> Date: Fri Mar 6 12:13:36 2026 -0800 Revert "Eval kit support (#1239)" (#1294) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit b237e33 Author: George <37293288+Jorjeous@users.noreply.github.com> Date: Fri Mar 6 20:25:37 2026 +0400 Eval kit support (#1239) Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> commit dc28bbf Author: George Armstrong <georgea@nvidia.com> Date: Thu Mar 5 10:17:44 2026 -0800 Python direct tool calling without MCP (#1286) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 12454dd Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Wed Mar 4 13:06:21 2026 -0800 Allow het servers for nemo-rl jobs (#1223) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> commit 8884a68 Author: Prasoon Varshney <prasoon1995@gmail.com> Date: Wed Mar 4 10:24:02 2026 -0800 Support source_lang param for translation recipe (#1290) Signed-off-by: Prasoon Varshney <prasoonv@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> commit 4618b19 Author: Meriem B. 
<113170426+ka00ri@users.noreply.github.com> Date: Wed Mar 4 18:59:28 2026 +0100 Add MMLU-Pro 10% optimized subset for checkpoint selection (#1285) Signed-off-by: Meriem Boubdir <mboubdir@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> commit 5ac8609 Author: Talor Abramovich <talor19@gmail.com> Date: Wed Mar 4 02:30:06 2026 +0200 Add SPEED-Bench (within repo) (#1279) Signed-off-by: Talor Abramovich <talora@nvidia.com> Signed-off-by: talora <talora@nvidia.com> Signed-off-by: Talor Abramovich <talor19@gmail.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com> commit c31eec5 Author: George Armstrong <georgea@nvidia.com> Date: Tue Mar 3 12:18:15 2026 -0800 Fix os.getlogin() crash in ns setup (#1289) Signed-off-by: George Armstrong <georgea@nvidia.com> commit c228e66 Author: George Armstrong <georgea@nvidia.com> Date: Tue Mar 3 11:04:54 2026 -0800 Fix streaming TypeError when delta.content is None (#1267) (#1288) Signed-off-by: George Armstrong <georgea@nvidia.com> commit aa47923 Author: Matvei Novikov <mnovikov@nvidia.com> Date: Mon Mar 2 16:28:41 2026 -0800 Add LibTrace recipe for generating domain-specific reasoning data (#1224) Signed-off-by: jubick1337 <mnovikov@nvidia.com> Signed-off-by: mnovikov <mnovikov@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> commit 313cad7 Author: Stephen Ge <stepheng@nvidia.com> Date: Mon Mar 2 18:28:49 2026 -0500 fix: clean parse-failure retries in prover (#1284) Signed-off-by: Stephen Ge <stepheng@nvidia.com> commit 813cfa3 Author: George Armstrong <georgea@nvidia.com> Date: Mon Mar 2 15:10:08 2026 -0800 tst: rollback inference-api to integrate (#1287) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 31735f9 Author: Valentin Mendelev <vmendelev@nvidia.com> Date: Mon Mar 2 23:11:25 2026 
+0100 Add backend-agnostic unified inference server with NeMo ASR and TTS backends (#1250) Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com> commit d4ef8c0 Author: George <37293288+Jorjeous@users.noreply.github.com> Date: Fri Feb 27 23:58:54 2026 +0400 Update promt_config to working with openai format + inline setup (#1210) Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> commit e879cbc Author: George Armstrong <georgea@nvidia.com> Date: Fri Feb 27 10:41:23 2026 -0800 Update noc tutorial (#1282) Signed-off-by: George Armstrong <georgea@nvidia.com> commit f6e3505 Author: George Armstrong <georgea@nvidia.com> Date: Fri Feb 27 10:17:33 2026 -0800 Add noc reasoning tutorial (#1278) Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com> Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com> Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Amparo Canaveras <acanaveras@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: acanaveras <142839082+acanaveras@users.noreply.github.com> Co-authored-by: rajeshwarid179 <rdevaramani@nvidia.com> commit fc2072a Author: Jiacheng Xu <jcxu@utexas.edu> Date: Fri Feb 27 10:10:25 2026 -0800 CritPt generation add prompt_format=None (#1280) Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> commit c8abe5d Author: Igor Gitman <igitman@nvidia.com> Date: Fri Feb 27 09:31:26 2026 -0800 New slurm customization parameters (account, containers) (#1209) Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> commit 2b38cce Author: George Armstrong <georgea@nvidia.com> Date: Wed Feb 25 17:59:52 2026 
-0800 Add nemo-skills-core subpackage for lightweight installs (#1229) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 9fa8e83 Author: Dheeraj Peri <peri.dheeraj@gmail.com> Date: Wed Feb 25 12:56:35 2026 -0800 feat: add custom judge type support for external repo integration (#1274) Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: Dheeraj Peri <dperi@nvidia.com> Signed-off-by: suriya <sgunasekar@nvidia.com> Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Minho Ryu <ryumin93@gmail.com> Co-authored-by: Yongqiang Wang <yongqiang.seagull@gmail.com> Co-authored-by: Suriya Gunasekar <sgunasekar@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Jiacheng Xu <jcxu@utexas.edu> Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com> commit 8a32b13 Author: Igor Gitman <igitman@nvidia.com> Date: Tue Feb 24 15:24:42 2026 -0800 Exclude numb3rs form test_eval.py (#1275) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit 6da2219 Author: George <37293288+Jorjeous@users.noreply.github.com> Date: Mon Feb 23 18:37:46 2026 +0400 Numb3rs ds addition (#1174) Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> commit ad034b5 Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com> Date: Sun Feb 22 11:55:24 2026 -0800 Add DSBench-DA evaluation (#1254) Squash merge of changes during code-review. 
Signed-off-by: suriya <sgunasekar@nvidia.com> commit 7593ab3 Author: Jiacheng Xu <jcxu@utexas.edu> Date: Fri Feb 20 16:42:01 2026 -0800 Add CritPt benchmark (#1200) Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> commit 58c31b2 Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com> Date: Fri Feb 20 16:19:22 2026 -0800 Fix no_answer metric overcounting in _compute_pass_at_k (#1245) Signed-off-by: suriya <sgunasekar@nvidia.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> commit 1f1a2e7 Author: Igor Gitman <igitman@nvidia.com> Date: Fri Feb 20 15:58:40 2026 -0800 Fix incorrect prompt tokens count due to HF api update (#1264) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit 8ebc6f5 Author: Igor Gitman <igitman@nvidia.com> Date: Fri Feb 20 09:05:33 2026 -0800 Remove deprecated dataset group (#1263) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit ea4177f Author: Yongqiang Wang <yongqiang.seagull@gmail.com> Date: Thu Feb 19 19:57:25 2026 -0500 fix deps (#1258) commit 60905a7 Author: Minho Ryu <ryumin93@gmail.com> Date: Fri Feb 20 09:39:39 2026 +0900 Add aime26 (#1256) Signed-off-by: bzantium <ryumin93@gmail.com> commit b28afc5 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Feb 19 16:18:25 2026 -0800 Rename custom -> external benchmarks (#1262) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit 6cc9c45 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Feb 19 16:10:33 2026 -0800 Add reference to internal benchmarks repo (#1261) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit 5202af6 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Feb 19 16:08:05 2026 -0800 Remove incorrect presence-penalty setting (#1259) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit 144c70b Author: Igor Gitman <igitman@nvidia.com> Date: Thu Feb 19 15:26:33 2026 -0800 Adding an option to store benchmarks in external repo (#1240) Signed-off-by: 
Igor Gitman <igitman@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> commit 10e6e39 Author: George <37293288+Jorjeous@users.noreply.github.com> Date: Thu Feb 19 19:57:21 2026 +0400 update vllm miltimodal for api calls convenience (#1213) Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com> Co-authored-by: mmkrtchyan <mmkrtchyan@nvidia.com> commit 1ba4219 Author: Nick Ludwig <nliudvig@nvidia.com> Date: Wed Feb 18 03:28:23 2026 +0400 Fix --server_container not being applied to dependent jobs (#1244) Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> commit 9517614 Author: Wasi Ahmad <wasiahmad@ucla.edu> Date: Mon Feb 16 11:13:24 2026 -0800 Support mini-swe-agent as agent harness (#1212) Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: i-vainn <imoshkov@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: Charlie Truong <chtruong@nvidia.com> Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com> Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: Stephen Ge <stepheng@nvidia.com> Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: Mateusz Winiarek <mwiniarek@nvidia.com> Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com> Signed-off-by: Wei Du <wedu@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com> Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Co-authored-by: Ivan <imoshkov@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Charlie Truong <chtruong@nvidia.com> Co-authored-by: Nick Ludwig <nliudvig@nvidia.com> Co-authored-by: Wojciech 
Prazuch <wojciechprazuch3@gmail.com> Co-authored-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com> Co-authored-by: Minho Ryu <ryumin93@gmail.com> Co-authored-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: Jiacheng Xu <jcxu@utexas.edu> Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com> Co-authored-by: Mateusz Winiarek <72758259+Froxyy-dev@users.noreply.github.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Meline Mkrtchyan <72409758+melllinia@users.noreply.github.com> Co-authored-by: Wei Du <wedu@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Sean Naren <snarenthiran@nvidia.com> Co-authored-by: Mehrzad Samadi <mehrzadsamadi@gmail.com> Co-authored-by: anowaczynski-nvidia <anowaczynski@nvidia.com> Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> commit a3d44dc Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com> Date: Fri Feb 13 22:32:15 2026 -0800 Add --installation_command support to prepare_data (#1243) Signed-off-by: suriya <sgunasekar@nvidia.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> commit e80d524 Author: George Armstrong <georgea@nvidia.com> Date: Thu Feb 12 17:26:00 2026 -0800 Fix CI disk space for Docker image builds (#1241) Signed-off-by: George Armstrong <georgea@nvidia.com> commit d22236c Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Wed Feb 11 17:55:00 2026 -0800 Fix answerbench prompt parsing (#1235) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit 2401628 Author: George Armstrong <georgea@nvidia.com> Date: Wed Feb 11 14:56:43 2026 -0800 feat: add lockfiles for reproducible sandbox builds (#1233) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 5a0a84d Author: Wasi Ahmad <wasiahmad@ucla.edu> Date: Wed Feb 11 13:30:03 2026 -0800 removing datasets version restriction for LCB eval (#1230) 
Signed-off-by: wasiahmad <wasiahmad@ucla.edu> commit ef0a890 Author: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com> Date: Wed Feb 11 12:03:16 2026 +0400 Gnalbandyan/add physics (#1214) Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com> Signed-off-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com> commit bd9d30c Author: Wasi Ahmad <wasiahmad@ucla.edu> Date: Tue Feb 10 15:13:27 2026 -0800 LCB generic prompting (#1215) Signed-off-by: wasiahmad <wasiahmad@ucla.edu> commit 7d6c49a Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Sat Feb 7 08:45:46 2026 -0800 Add support for different variations of nemo-rl (#1220) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit b19ba96 Author: George Armstrong <georgea@nvidia.com> Date: Fri Feb 6 21:40:56 2026 -0800 Add multi-node sandbox support for SLURM clusters (#1218) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 8950bb0 Author: anowaczynski-nvidia <anowaczynski@nvidia.com> Date: Sat Feb 7 01:38:00 2026 +0100 support structured outputs in hle judge for optional AA compatibility (#1186) Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> commit b84f7a2 Author: Igor Gitman <igitman@nvidia.com> Date: Fri Feb 6 14:51:02 2026 -0800 A small update on running tests docs (#1219) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit 8e838e1 Author: George Armstrong <georgea@nvidia.com> Date: Thu Feb 5 18:01:35 2026 -0800 feat: add flag to disable sandbox replay (#1217) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 5fd9085 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Feb 5 15:57:01 2026 -0800 Add an option to limit number of tool calls (#1216) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit d820200 Author: Igor Gitman <igitman@nvidia.com> Date: Tue Feb 3 10:43:55 2026 -0800 Add arena-hard v2 (#1205) Signed-off-by: bzantium <ryumin93@gmail.com> 
Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: bzantium <ryumin93@gmail.com> commit a30920e Author: Igor Gitman <igitman@nvidia.com> Date: Mon Feb 2 10:53:55 2026 -0800 Fix mkdocs warnings (#1204) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit 19d7788 Author: Ivan <imoshkov@nvidia.com> Date: Mon Feb 2 23:25:13 2026 +0500 Fix infinite wait in sandbox.wait_for_sandbox (#1206) Signed-off-by: i-vainn <imoshkov@nvidia.com> commit 3e65fbf Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Fri Jan 30 19:38:38 2026 -0800 Improve tts (#1203) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit 250c862 Author: Nick Ludwig <nliudvig@nvidia.com> Date: Fri Jan 30 22:12:29 2026 +0400 SWE-bench: fix SWE-agent hanging, adjust expected scores (#1202) Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> commit 7ded756 Author: Ivan <imoshkov@nvidia.com> Date: Fri Jan 30 09:57:41 2026 +0500 Add proper token counting to code execution model (#1184) Signed-off-by: i-vainn <imoshkov@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> commit b986304 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Jan 29 17:57:07 2026 -0800 Upgrade containers (#1198) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit 3b44f02 Author: Dan Lord <blahblahasdf@gmail.com> Date: Thu Jan 29 16:40:47 2026 -0800 Fix incorrect string format (#1199) Signed-off-by: dlord <dlord@nvidia.com> commit c4854b8 Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Thu Jan 29 13:43:36 2026 -0800 Update nemo-rl to latest (#1087) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: dgitman <dgitman@nvidia.com>
Adds a lightweight nemo-skills-core subpackage (core/ subdirectory) with only inference, evaluation, and tool calling deps. Default pip install nemo-skills is unchanged (installs everything).
Changes
- core/pyproject.toml + core/requirements.txt: New subpackage installable via pip install ./core or git URL with #subdirectory=core. Single source of truth for core deps, referenced by both core and root pyproject.toml.
- nemo_skills/pipeline/__init__.py: Import guard using importlib.metadata -- importing pipeline modules with only core installed raises a clear ImportError instead of a cryptic ModuleNotFoundError.
- nemo_skills/_cli_stub.py: Stub ns CLI entry point for core-only installs that prints a helpful message.
- nemo_skills/evaluation/evaluator/__init__.py: Lazy evaluator registry using string paths instead of eager imports, so core-only installs don't fail on benchmark-specific deps (faiss, func_timeout, etc.).
- nemo_skills/dataset/utils.py + nemo_skills/pipeline/dataset.py: Moved cluster-dependent dataset logic into pipeline module to keep core free of nemo_run imports.
- requirements/pipeline.txt: New requirements file for pipeline-only deps (nemo_run, typer, etc.).
- .github/workflows/tests.yml: Install uv in CI for use with testing installation.
Summary by CodeRabbit
New Features
Documentation
Chores