Add aime26 by bzantium · Pull Request #1256 · NVIDIA-NeMo/Skills

bzantium · 2026-02-19T00:27:06Z

resolved: #1255

Summary by CodeRabbit

New Features
- Added AIME 2026 dataset support with preconfigured math evaluation defaults (overridable at runtime).
- Added dataset preparation tooling to export AIME 2026 into JSONL for evaluation, including a command-line entry point and progress feedback.

coderabbitai · 2026-02-19T00:28:29Z

📝 Walkthrough

Walkthrough

Adds an AIME 2026 dataset package with module-level evaluation/generation constants and a CLI-able prepare script that loads the MathArena/aime_2026 dataset and exports split-specific JSONL manifests.

Changes

Cohort / File(s)	Summary
Package init `nemo_skills/dataset/aime26/__init__.py`	Adds module-level constants: `DATASET_GROUP="math"`, `METRICS_TYPE="math"`, `GENERATION_ARGS="++prompt_config=generic/math ++eval_type=math"`.
Data preparation script `nemo_skills/dataset/aime26/prepare.py`	Adds `format_entry()`, `write_data_to_file()`, and `main()`; loads `MathArena/aime_2026` (specified split), formats entries (`id` prefixed with `aime26-`, `problem`, `expected_answer`) and writes UTF-8 JSONL with progress reporting.
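
For illustration, the constants in the table are plain module-level strings. The sketch below restates them as listed and adds a hypothetical helper, `parse_overrides` (not part of NeMo-Skills), to show how the `++key=value` string breaks down into individual settings.

```python
# Module-level defaults as listed in the Changes table.
DATASET_GROUP = "math"
METRICS_TYPE = "math"
GENERATION_ARGS = "++prompt_config=generic/math ++eval_type=math"


def parse_overrides(args: str) -> dict[str, str]:
    """Hypothetical helper: split '++key=value' tokens into a dict."""
    overrides = {}
    for token in args.split():
        # Drop the leading '++' and split on the first '=' only.
        key, _, value = token.removeprefix("++").partition("=")
        overrides[key] = value
    return overrides


overrides = parse_overrides(GENERATION_ARGS)
```

Here `overrides` maps `prompt_config` to `generic/math` and `eval_type` to `math`; how the real pipeline consumes the string is not shown in this PR.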

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI
    participant Prepare as prepare.py
    participant HF as "HuggingFace Dataset\n(MathArena/aime_2026)"
    participant FS as "Local Filesystem"

    User->>CLI: run prepare.py --split <split>
    CLI->>Prepare: invoke main(args)
    Prepare->>HF: load_dataset(..., split=split)
    HF-->>Prepare: dataset iterator
    loop for each entry
        Prepare->>Prepare: format_entry(entry)
        Prepare->>FS: append JSONL line
        FS-->>Prepare: write ACK
    end
    Prepare->>User: exit (file written)
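
The loop in the diagram can be sketched without network access by swapping the Hub dataset for a couple of stand-in rows. The field names (`problem_idx`, `problem`, `answer`) follow the review diff; the sample rows and the `export_jsonl` name are assumptions made for this example only.

```python
import json
import tempfile
from pathlib import Path

# Stand-in rows; the real script pulls these from MathArena/aime_2026.
SAMPLE_ROWS = [
    {"problem_idx": 1, "problem": "Compute 1 + 1.", "answer": 2},
    {"problem_idx": 2, "problem": "Compute 2 * 3.", "answer": 6},
]


def format_entry(entry: dict) -> dict:
    """Map a raw row to the id / problem / expected_answer layout."""
    return {
        "id": f"aime26-{entry['problem_idx']}",
        "problem": entry["problem"],
        "expected_answer": str(entry["answer"]),
    }


def export_jsonl(rows: list, output_file: Path) -> int:
    """Write one JSON object per line and return the number of lines written."""
    count = 0
    with open(output_file, "wt", encoding="utf-8") as fout:
        for row in rows:
            fout.write(json.dumps(format_entry(row), ensure_ascii=False) + "\n")
            count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "test.jsonl"
    written = export_jsonl(SAMPLE_ROWS, out)
    lines = out.read_text(encoding="utf-8").splitlines()
```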

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Add apex-shortlist dataset #1080: Adds the same dataset-level constants and a prepare.py that exports MathArena datasets to JSONL.
add musan dataset #1139: Adds a dataset package with identical module-level constants and a prepare.py entrypoint following the same export pattern.
first version of 3 imo-bench datasets #1047: Introduces dataset package initializers with the same configuration constants and analogous preparation scripts.

Suggested labels

run GPU tests

Suggested reviewers

gwarmstrong
Kipok

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Add aime26' clearly and concisely describes the main change: adding support for the AIME 2026 benchmark to the codebase.
Linked Issues check	✅ Passed	The PR successfully implements the objective from issue `#1255` by adding the AIME 2026 benchmark with dataset loading, formatting, and preparation utilities.
Out of Scope Changes check	✅ Passed	All changes are directly aligned with adding AIME 2026 benchmark support; no unrelated modifications are present.


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

nemo_skills/dataset/aime26/prepare.py (1)

23-28: Missing type hints on functions.

As per coding guidelines, "Use type hints for simple types (dict, list, int, float, existing classes) in Python code."

Suggested type hints

-def format_entry(entry):
-    return {
+def format_entry(entry: dict) -> dict:
+    return {
         "id": f"aime26-{entry['problem_idx']}",
         "problem": entry["problem"],
         "expected_answer": str(entry["answer"]),
     }

-def write_data_to_file(output_file, data):
+def write_data_to_file(output_file: Path, data: list) -> None:

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/dataset/aime26/prepare.py` around lines 23 - 28, The function
format_entry lacks type hints; update its signature to accept a typed parameter
(e.g., entry: Dict[str, Any] or Mapping[str, Any]) and declare the return type
(e.g., Dict[str, str]) and add the necessary typing imports (from typing import
Any, Dict or Mapping) so format_entry(entry: Dict[str, Any]) -> Dict[str, str]
is explicit; ensure the keys "id", "problem", and "expected_answer" are
reflected in the return type annotation.
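
One possible reading of this suggestion is sketched below. The `FormattedEntry` `TypedDict` is an addition for illustration, not something the review asks for; a plain `dict[str, str]` return annotation would satisfy the comment equally well.

```python
from typing import Any, Mapping, TypedDict


class FormattedEntry(TypedDict):
    """Illustrative return type naming the three output keys."""

    id: str
    problem: str
    expected_answer: str


def format_entry(entry: Mapping[str, Any]) -> FormattedEntry:
    # str() keeps expected_answer a string even when the source answer is an int.
    return {
        "id": f"aime26-{entry['problem_idx']}",
        "problem": entry["problem"],
        "expected_answer": str(entry["answer"]),
    }


result = format_entry({"problem_idx": 7, "problem": "stub problem", "answer": 42})
```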

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemo_skills/dataset/aime26/__init__.py`:
- Line 1: Update the copyright year in the module header of
nemo_skills/dataset/aime26/__init__.py from 2024 to 2026 to match prepare.py;
open the file, locate the top-of-file copyright comment (the string "Copyright
(c) 2024, NVIDIA CORPORATION.  All rights reserved."), and change the year to
2026 so both files use the same copyright year.

In `@nemo_skills/dataset/aime26/prepare.py`:
- Around line 38-43: In main(), the call load_dataset("MathArena/aime_2026",
split="train") is hardcoded to "train" while the output file is named using
args.split (output_file = data_dir / f"{args.split}.jsonl"); add a brief
clarifying comment above the load_dataset line (inside the main function)
explaining that the HuggingFace dataset only exposes a single "train" split but
the benchmark uses that data as the target split (e.g., test/validation), so we
intentionally load "train" regardless of args.split to produce the appropriately
named output file.
- Line 48: The parser.add_argument call for "--split" uses choices=("test")
which is a string, causing argparse to check membership against characters;
change it to a proper iterable like a single-element tuple or list (e.g.,
choices=("test",) or choices=["test"]) in the parser.add_argument call to ensure
only "test" is accepted; update the choices parameter in the
parser.add_argument("--split", ...) invocation and mirror the style used by
other dataset prepare.py files.
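
The `choices` pitfall is easy to reproduce in isolation. In this minimal sketch, `build_parser` and `accepts` are helper names invented for the demonstration; only `argparse` itself comes from the standard library.

```python
import argparse

# Parentheses alone do not create a tuple: ("test") is just the string "test",
# while ("test",) is a one-element tuple.


def build_parser(choices) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--split", default="test", choices=choices)
    return parser


def accepts(choices, value: str) -> bool:
    """Return True if argparse accepts --split=value for the given choices."""
    try:
        build_parser(choices).parse_args(["--split", value])
        return True
    except SystemExit:  # argparse exits with status 2 on an invalid choice
        return False


string_choices_accept_t = accepts(("test"), "t")   # string membership check
tuple_choices_accept_t = accepts(("test",), "t")   # proper one-element tuple
```

With the string form, the single character `t` is accepted; with the tuple form it is rejected and only `test` passes. Whether the string form also accepts longer values depends on the Python version, so the sketch relies only on the single-character case.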


coderabbitai · 2026-02-19T00:32:35Z

nemo_skills/dataset/aime26/__init__.py

@@ -0,0 +1,18 @@
+# Copyright (c) 2024, NVIDIA CORPORATION.  All rights reserved.


⚠️ Potential issue | 🟡 Minor

Copyright year mismatch with prepare.py.

This file says Copyright (c) 2024 while prepare.py says Copyright (c) 2026. Since this is a new file for a 2026 benchmark, the year should likely be 2026 to be consistent.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@nemo_skills/dataset/aime26/__init__.py` at line 1, Update the copyright year in the module header of nemo_skills/dataset/aime26/__init__.py from 2024 to 2026 to match prepare.py; open the file, locate the top-of-file copyright comment (the string "Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved."), and change the year to 2026 so both files use the same copyright year.

coderabbitai · 2026-02-19T00:32:35Z

nemo_skills/dataset/aime26/prepare.py

+def main(args):
+    dataset = load_dataset("MathArena/aime_2026", split="train")
+    data_dir = Path(__file__).absolute().parent
+    data_dir.mkdir(exist_ok=True)
+    output_file = data_dir / f"{args.split}.jsonl"
+    write_data_to_file(output_file, dataset)


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if other prepare.py scripts in this repo also load a different split than the output name
rg -n 'load_dataset.*split=' --glob 'nemo_skills/dataset/*/prepare.py' -A2 -B2

Repository: NVIDIA-NeMo/Skills

Length of output: 10510

Add a clarifying comment explaining the hardcoded split.

The dataset is loaded with split="train" (line 39) regardless of args.split, but the output file is named {args.split}.jsonl. This pattern—where the HuggingFace split differs from the output filename—appears elsewhere in the codebase (e.g., simpleqa), suggesting it's intentional because the HuggingFace dataset only exposes a train split but the benchmark treats it as a test set. A brief comment explaining this would prevent future confusion.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@nemo_skills/dataset/aime26/prepare.py` around lines 38 - 43, In main(), the call load_dataset("MathArena/aime_2026", split="train") is hardcoded to "train" while the output file is named using args.split (output_file = data_dir / f"{args.split}.jsonl"); add a brief clarifying comment above the load_dataset line (inside the main function) explaining that the HuggingFace dataset only exposes a single "train" split but the benchmark uses that data as the target split (e.g., test/validation), so we intentionally load "train" regardless of args.split to produce the appropriately named output file.
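
A sketch of what the requested comment could look like in context. To keep it runnable offline, the loader is passed in as a parameter and replaced by a stub; `plan_export`, `HF_SPLIT`, and `fake_load_dataset` are illustrative names, and the real `main()` calls `load_dataset` from the `datasets` library directly.

```python
import argparse
from pathlib import Path

HF_SPLIT = "train"  # per the review, the only split the Hub dataset exposes


def plan_export(args: argparse.Namespace, load_fn, data_dir: Path):
    # The Hub dataset only ships a "train" split, but the benchmark treats
    # that data as its evaluation set. The loaded split is therefore fixed,
    # and args.split is used only to name the output file.
    dataset = load_fn("MathArena/aime_2026", split=HF_SPLIT)
    output_file = data_dir / f"{args.split}.jsonl"
    return dataset, output_file


calls = []


def fake_load_dataset(name: str, split: str) -> list:
    """Stub standing in for datasets.load_dataset; records how it was called."""
    calls.append((name, split))
    return [{"problem_idx": 1, "problem": "stub", "answer": 0}]


dataset, output_file = plan_export(
    argparse.Namespace(split="test"), fake_load_dataset, Path(".")
)
```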

nemo_skills/dataset/aime26/prepare.py

Signed-off-by: bzantium <ryumin93@gmail.com>

coderabbitai

🧹 Nitpick comments (1)

nemo_skills/dataset/aime26/prepare.py (1)
31-35: Missing type hints; consider pre-computing entries before opening the output file.

Two points:

1. Type hints: same guideline violation as format_entry.

♻️ Proposed fix for type hints

-def write_data_to_file(output_file, data):
+def write_data_to_file(output_file: Path, data) -> None:

2. Computation / I/O ordering: format_entry is called inside the open() block. A KeyError on a malformed entry mid-stream would leave a partially-written JSONL file on disk with no indication. Pre-materializing the formatted list before opening the file costs negligible memory for an AIME dataset (30 problems) and eliminates partial-output risk:

♻️ Proposed fix for computation ordering

 def write_data_to_file(output_file: Path, data) -> None:
+    formatted = [format_entry(entry) for entry in tqdm(data, desc="Processing")]
     with open(output_file, "wt", encoding="utf-8") as fout:
-        for entry in tqdm(data, desc=f"Writing {output_file.name}"):
-            json.dump(format_entry(entry), fout, ensure_ascii=False)
+        for entry in formatted:
+            json.dump(entry, fout, ensure_ascii=False)
             fout.write("\n")

As per coding guidelines, "Use type hints for simple types" and "perform all computations before re-opening files for writing to avoid accidental data loss if code fails during execution."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@nemo_skills/dataset/aime26/prepare.py` around lines 31 - 35, Add type hints
to write_data_to_file (e.g., def write_data_to_file(output_file: Path, data:
Iterable[dict]) -> None) and precompute formatted entries before opening the
file to avoid partial writes: build a list (e.g., formatted =
[format_entry(entry) for entry in data]) then open the output_file and iterate
over formatted with tqdm, json.dump each item and write a newline; reference the
existing write_data_to_file and format_entry symbols when making the change.
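
The effect of the reordering can be checked with a deliberately malformed entry. This is a stdlib-only sketch: `tqdm` is omitted, and the sample rows are made up for the demonstration.

```python
import json
import tempfile
from pathlib import Path


def format_entry(entry: dict) -> dict:
    return {
        "id": f"aime26-{entry['problem_idx']}",
        "problem": entry["problem"],
        "expected_answer": str(entry["answer"]),
    }


def write_data_to_file(output_file: Path, data) -> None:
    # Format everything first: a KeyError here is raised before the file is opened.
    formatted = [format_entry(entry) for entry in data]
    with open(output_file, "wt", encoding="utf-8") as fout:
        for item in formatted:
            json.dump(item, fout, ensure_ascii=False)
            fout.write("\n")


good = {"problem_idx": 1, "problem": "stub", "answer": 3}
bad = {"problem_idx": 2, "problem": "row without an answer field"}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "test.jsonl"
    try:
        write_data_to_file(target, [good, bad])
        failed = False
    except KeyError:
        failed = True
    file_created = target.exists()
```

Because formatting fails before `open()` is reached, no partial `test.jsonl` is left behind.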

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@nemo_skills/dataset/aime26/prepare.py`:
- Around line 38-43: The call in main that uses
load_dataset("MathArena/aime_2026", split="train") is hardcoded and should be
annotated so future readers understand this is intentional; update the main
function (around load_dataset, dataset variable) to add a concise clarifying
comment noting that the HF dataset only exposes a "train" split and therefore
args.split is not used for loading (but still used for output_file), so the
hardcoded split should remain; ensure the comment references args.split and
write_data_to_file to make the intent obvious.


bzantium · 2026-02-19T02:26:08Z

@Kipok I fixed the bug above and attached an evaluation.

Model Evaluation: Qwen3-30B-A3B-Thinking-2507

The following table summarizes the performance scores across different AIME (American Invitational Mathematics Examination) datasets.

Task	Score
aime24	91.00
aime25	82.33
aime26	87.67

Full technical breakdown (JSON execution data), including pass@k and majority@n statistics:

{
  "aime26": {
    "pass@1": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 87.66666666666667,
      "no_answer": 0.0
    },
    "pass@2": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 91.25925925925927,
      "no_answer": 0.0
    },
    "pass@1[avg-of-2]": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 91.66666666666667,
      "no_answer": 0.0,
      "reasoning_tokens_statistics": {
        "avg": 0.0,
        "std_dev_across_runs": 0.0,
        "avg_sample_std_dev": 0.0,
        "std_err_across_runs": 0.0
      },
      "answer_tokens_statistics": {
        "avg": 17334.65,
        "std_dev_across_runs": 215.1725935150659,
        "avg_sample_std_dev": 2548.012145553645,
        "std_err_across_runs": 152.14999999999964
      },
      "symbolic_correct_statistics": {
        "avg": 0.9166666666666666,
        "std_dev_across_runs": 0.02357022603955158,
        "avg_sample_std_dev": 0.023570226039551587,
        "std_err_across_runs": 0.016666666666666663
      }
    },
    "pass@3": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 92.25,
      "no_answer": 0.0
    },
    "pass@1[avg-of-3]": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 87.77777777777777,
      "no_answer": 0.0,
      "reasoning_tokens_statistics": {
        "avg": 0.0,
        "std_dev_across_runs": 0.0,
        "avg_sample_std_dev": 0.0,
        "std_err_across_runs": 0.0
      },
      "answer_tokens_statistics": {
        "avg": 17662.333333333332,
        "std_dev_across_runs": 587.6042233794223,
        "avg_sample_std_dev": 3435.5203326507794,
        "std_err_across_runs": 339.25345654507043
      },
      "symbolic_correct_statistics": {
        "avg": 0.8777777777777778,
        "std_dev_across_runs": 0.06938886664887108,
        "avg_sample_std_dev": 0.07698003589195011,
        "std_err_across_runs": 0.04006168083848876
      }
    },
    "pass@4": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 92.76190476190476,
      "no_answer": 0.0
    },
    "pass@1[avg-of-4]": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 86.66666666666667,
      "no_answer": 0.0,
      "reasoning_tokens_statistics": {
        "avg": 0.0,
        "std_dev_across_runs": 0.0,
        "avg_sample_std_dev": 0.0,
        "std_err_across_runs": 0.0
      },
      "answer_tokens_statistics": {
        "avg": 18331.816666666666,
        "std_dev_across_runs": 1422.3282145833991,
        "avg_sample_std_dev": 4074.4175305478043,
        "std_err_across_runs": 711.1641072916996
      },
      "symbolic_correct_statistics": {
        "avg": 0.8666666666666667,
        "std_dev_across_runs": 0.06085806194501844,
        "avg_sample_std_dev": 0.0859116756396542,
        "std_err_across_runs": 0.03042903097250922
      }
    },
    "pass@5": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 93.05555555555557,
      "no_answer": 0.0
    },
    "pass@1[avg-of-5]": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 86.66666666666667,
      "no_answer": 0.0,
      "reasoning_tokens_statistics": {
        "avg": 0.0,
        "std_dev_across_runs": 0.0,
        "avg_sample_std_dev": 0.0,
        "std_err_across_runs": 0.0
      },
      "answer_tokens_statistics": {
        "avg": 18328.713333333333,
        "std_dev_across_runs": 1231.791912576507,
        "avg_sample_std_dev": 4199.473728063009,
        "std_err_across_runs": 550.8740901311095
      },
      "symbolic_correct_statistics": {
        "avg": 0.8666666666666667,
        "std_dev_across_runs": 0.05270462766947298,
        "avg_sample_std_dev": 0.08123619671700687,
        "std_err_across_runs": 0.02357022603955158
      }
    },
    "pass@6": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 93.22222222222221,
      "no_answer": 0.0
    },
    "pass@1[avg-of-6]": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 86.66666666666669,
      "no_answer": 0.0,
      "reasoning_tokens_statistics": {
        "avg": 0.0,
        "std_dev_across_runs": 0.0,
        "avg_sample_std_dev": 0.0,
        "std_err_across_runs": 0.0
      },
      "answer_tokens_statistics": {
        "avg": 18596.200000000004,
        "std_dev_across_runs": 1281.8516894624663,
        "avg_sample_std_dev": 4426.99811124109,
        "std_err_across_runs": 523.3137608512666
      },
      "symbolic_correct_statistics": {
        "avg": 0.8666666666666668,
        "std_dev_across_runs": 0.04714045207910316,
        "avg_sample_std_dev": 0.08885962402813656,
        "std_err_across_runs": 0.019245008972987525
      }
    },
    "pass@7": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 93.30555555555556,
      "no_answer": 0.0
    },
    "pass@1[avg-of-7]": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 87.14285714285714,
      "no_answer": 0.0,
      "reasoning_tokens_statistics": {
        "avg": 0.0,
        "std_dev_across_runs": 0.0,
        "avg_sample_std_dev": 0.0,
        "std_err_across_runs": 0.0
      },
      "answer_tokens_statistics": {
        "avg": 18356.109523809522,
        "std_dev_across_runs": 1331.4617977182336,
        "avg_sample_std_dev": 4374.646122160459,
        "std_err_across_runs": 503.2452567064904
      },
      "symbolic_correct_statistics": {
        "avg": 0.8714285714285714,
        "std_dev_across_runs": 0.044839513942303275,
        "avg_sample_std_dev": 0.08814386586003546,
        "std_err_across_runs": 0.016947743257192553
      }
    },
    "pass@8": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 93.33333333333333,
      "no_answer": 0.0
    },
    "pass@1[avg-of-8]": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 87.5,
      "no_answer": 0.0,
      "reasoning_tokens_statistics": {
        "avg": 0.0,
        "std_dev_across_runs": 0.0,
        "avg_sample_std_dev": 0.0,
        "std_err_across_runs": 0.0
      },
      "answer_tokens_statistics": {
        "avg": 18257.5375,
        "std_dev_across_runs": 1263.8300049487398,
        "avg_sample_std_dev": 4177.041148241753,
        "std_err_across_runs": 446.83138338314086
      },
      "symbolic_correct_statistics": {
        "avg": 0.875,
        "std_dev_across_runs": 0.04272466296411038,
        "avg_sample_std_dev": 0.08346764803530463,
        "std_err_across_runs": 0.015105449452916093
      }
    },
    "pass@9": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 93.33333333333333,
      "no_answer": 0.0
    },
    "pass@1[avg-of-9]": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 87.03703703703702,
      "no_answer": 0.0,
      "reasoning_tokens_statistics": {
        "avg": 0.0,
        "std_dev_across_runs": 0.0,
        "avg_sample_std_dev": 0.0,
        "std_err_across_runs": 0.0
      },
      "answer_tokens_statistics": {
        "avg": 18251.462962962964,
        "std_dev_across_runs": 1182.345167975312,
        "avg_sample_std_dev": 4085.528060506555,
        "std_err_across_runs": 394.1150559917707
      },
      "symbolic_correct_statistics": {
        "avg": 0.8703703703703703,
        "std_dev_across_runs": 0.042309850588132816,
        "avg_sample_std_dev": 0.0914098904572087,
        "std_err_across_runs": 0.014103283529377605
      }
    },
    "pass@10": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 93.33333333333333,
      "no_answer": 0.0
    },
    "pass@1[avg-of-10]": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 87.66666666666667,
      "no_answer": 0.0,
      "reasoning_tokens_statistics": {
        "avg": 0.0,
        "std_dev_across_runs": 0.0,
        "avg_sample_std_dev": 0.0,
        "std_err_across_runs": 0.0
      },
      "answer_tokens_statistics": {
        "avg": 18102.21,
        "std_dev_across_runs": 1210.5279378880143,
        "avg_sample_std_dev": 4033.7664918246287,
        "std_err_across_runs": 382.8025454992963
      },
      "symbolic_correct_statistics": {
        "avg": 0.8766666666666667,
        "std_dev_across_runs": 0.044583116994975275,
        "avg_sample_std_dev": 0.08953305854857022,
        "std_err_across_runs": 0.01409841948938835
      }
    },
    "majority@2": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 91.66666666666667,
      "no_answer": 0.0
    },
    "majority@3": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 91.1111111111111,
      "no_answer": 0.0
    },
    "majority@4": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 90.83333333333333,
      "no_answer": 0.0
    },
    "majority@5": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 90.66666666666667,
      "no_answer": 0.0
    },
    "majority@6": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 90.55555555555557,
      "no_answer": 0.0
    },
    "majority@7": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 93.33333333333333,
      "no_answer": 0.0
    },
    "majority@8": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 93.33333333333333,
      "no_answer": 0.0
    },
    "majority@9": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 93.33333333333333,
      "no_answer": 0.0
    },
    "majority@10": {
      "num_entries": 30,
      "avg_tokens": 18102,
      "gen_seconds": 6223,
      "symbolic_correct": 93.33333333333333,
      "no_answer": 0.0
    }
  }
}
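
As a quick sanity check on the JSON above, the reported `std_err_across_runs` values look like `std_dev_across_runs` divided by the square root of the number of runs. That is an observation from the numbers, not a statement about how NeMo-Skills computes them. The sketch below checks three of the `symbolic_correct_statistics` entries.

```python
import math

# runs -> (std_dev_across_runs, std_err_across_runs) for symbolic_correct,
# copied from the pass@1[avg-of-N] entries in the JSON.
REPORTED = {
    2: (0.02357022603955158, 0.016666666666666663),
    3: (0.06938886664887108, 0.04006168083848876),
    10: (0.044583116994975275, 0.01409841948938835),
}


def standard_error(std_dev: float, runs: int) -> float:
    """Standard error of the mean across independent runs."""
    return std_dev / math.sqrt(runs)


consistent = all(
    math.isclose(standard_error(std_dev, runs), std_err, rel_tol=1e-6)
    for runs, (std_dev, std_err) in REPORTED.items()
)
```

The aime26 score of 87.67 in the summary table matches the `pass@1[avg-of-10]` value of `symbolic_correct`.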

Kipok

thanks!

Signed-off-by: bzantium <ryumin93@gmail.com>

Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit a5da597 Author: Igor Gitman <igitman@nvidia.com> Date: Fri Mar 6 12:13:36 2026 -0800 Revert "Eval kit support (#1239)" (#1294) Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit b237e33 Author: George <37293288+Jorjeous@users.noreply.github.com> Date: Fri Mar 6 20:25:37 2026 +0400 Eval kit support (#1239) Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

commit dc28bbf Author: George Armstrong <georgea@nvidia.com> Date: Thu Mar 5 10:17:44 2026 -0800 Python direct tool calling without MCP (#1286) Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 12454dd Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Wed Mar 4 13:06:21 2026 -0800 Allow het servers for nemo-rl jobs (#1223) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit 8884a68 Author: Prasoon Varshney <prasoon1995@gmail.com> Date: Wed Mar 4 10:24:02 2026 -0800 Support source_lang param for translation recipe (#1290) Signed-off-by: Prasoon Varshney <prasoonv@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 4618b19 Author: Meriem B. <113170426+ka00ri@users.noreply.github.com> Date: Wed Mar 4 18:59:28 2026 +0100 Add MMLU-Pro 10% optimized subset for checkpoint selection (#1285) Signed-off-by: Meriem Boubdir <mboubdir@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 5ac8609 Author: Talor Abramovich <talor19@gmail.com> Date: Wed Mar 4 02:30:06 2026 +0200 Add SPEED-Bench (within repo) (#1279) Signed-off-by: Talor Abramovich <talora@nvidia.com> Signed-off-by: talora <talora@nvidia.com> Signed-off-by: Talor Abramovich <talor19@gmail.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>

commit c31eec5 Author: George Armstrong <georgea@nvidia.com> Date: Tue Mar 3 12:18:15 2026 -0800 Fix os.getlogin() crash in ns setup (#1289) Signed-off-by: George Armstrong <georgea@nvidia.com>

commit c228e66 Author: George Armstrong <georgea@nvidia.com> Date: Tue Mar 3 11:04:54 2026 -0800 Fix streaming TypeError when delta.content is None (#1267) (#1288) Signed-off-by: George Armstrong <georgea@nvidia.com>

commit aa47923 Author: Matvei Novikov <mnovikov@nvidia.com> Date: Mon Mar 2 16:28:41 2026 -0800 Add LibTrace recipe for generating domain-specific reasoning data (#1224) Signed-off-by: jubick1337 <mnovikov@nvidia.com> Signed-off-by: mnovikov <mnovikov@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 313cad7 Author: Stephen Ge <stepheng@nvidia.com> Date: Mon Mar 2 18:28:49 2026 -0500 fix: clean parse-failure retries in prover (#1284) Signed-off-by: Stephen Ge <stepheng@nvidia.com>

commit 813cfa3 Author: George Armstrong <georgea@nvidia.com> Date: Mon Mar 2 15:10:08 2026 -0800 tst: rollback inference-api to integrate (#1287) Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 31735f9 Author: Valentin Mendelev <vmendelev@nvidia.com> Date: Mon Mar 2 23:11:25 2026 +0100 Add backend-agnostic unified inference server with NeMo ASR and TTS backends (#1250) Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com>

commit d4ef8c0 Author: George <37293288+Jorjeous@users.noreply.github.com> Date: Fri Feb 27 23:58:54 2026 +0400 Update promt_config to working with openai format + inline setup (#1210) Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com>

commit e879cbc Author: George Armstrong <georgea@nvidia.com> Date: Fri Feb 27 10:41:23 2026 -0800 Update noc tutorial (#1282) Signed-off-by: George Armstrong <georgea@nvidia.com>

commit f6e3505 Author: George Armstrong <georgea@nvidia.com> Date: Fri Feb 27 10:17:33 2026 -0800 Add noc reasoning tutorial (#1278) Signed-off-by: Amparo Canaveras <acanaveras@nvidia.com> Signed-off-by: rajeshwarid179 <rdevaramani@nvidia.com> Signed-off-by: acanaveras <142839082+acanaveras@users.noreply.github.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Amparo Canaveras <acanaveras@nvidia.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: acanaveras <142839082+acanaveras@users.noreply.github.com> Co-authored-by: rajeshwarid179 <rdevaramani@nvidia.com>

commit fc2072a Author: Jiacheng Xu <jcxu@utexas.edu> Date: Fri Feb 27 10:10:25 2026 -0800 CritPt generation add prompt_format=None (#1280) Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com>

commit c8abe5d Author: Igor Gitman <igitman@nvidia.com> Date: Fri Feb 27 09:31:26 2026 -0800 New slurm customization parameters (account, containers) (#1209) Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 2b38cce Author: George Armstrong <georgea@nvidia.com> Date: Wed Feb 25 17:59:52 2026 -0800 Add nemo-skills-core subpackage for lightweight installs (#1229) Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 9fa8e83 Author: Dheeraj Peri <peri.dheeraj@gmail.com> Date: Wed Feb 25 12:56:35 2026 -0800 feat: add custom judge type support for external repo integration (#1274) Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: Dheeraj Peri <dperi@nvidia.com> Signed-off-by: suriya <sgunasekar@nvidia.com> Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Minho Ryu <ryumin93@gmail.com> Co-authored-by: Yongqiang Wang <yongqiang.seagull@gmail.com> Co-authored-by: Suriya Gunasekar <sgunasekar@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Jiacheng Xu <jcxu@utexas.edu> Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com>

commit 8a32b13 Author: Igor Gitman <igitman@nvidia.com> Date: Tue Feb 24 15:24:42 2026 -0800 Exclude numb3rs form test_eval.py (#1275) Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 6da2219 Author: George <37293288+Jorjeous@users.noreply.github.com> Date: Mon Feb 23 18:37:46 2026 +0400 Numb3rs ds addition (#1174) Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com>

commit ad034b5 Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com> Date: Sun Feb 22 11:55:24 2026 -0800 Add DSBench-DA evaluation (#1254) Squash merge of changes during code-review. Signed-off-by: suriya <sgunasekar@nvidia.com>

commit 7593ab3 Author: Jiacheng Xu <jcxu@utexas.edu> Date: Fri Feb 20 16:42:01 2026 -0800 Add CritPt benchmark (#1200) Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit 58c31b2 Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com> Date: Fri Feb 20 16:19:22 2026 -0800 Fix no_answer metric overcounting in _compute_pass_at_k (#1245) Signed-off-by: suriya <sgunasekar@nvidia.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit 1f1a2e7 Author: Igor Gitman <igitman@nvidia.com> Date: Fri Feb 20 15:58:40 2026 -0800 Fix incorrect prompt tokens count due to HF api update (#1264) Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 8ebc6f5 Author: Igor Gitman <igitman@nvidia.com> Date: Fri Feb 20 09:05:33 2026 -0800 Remove deprecated dataset group (#1263) Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit ea4177f Author: Yongqiang Wang <yongqiang.seagull@gmail.com> Date: Thu Feb 19 19:57:25 2026 -0500 fix deps (#1258)

commit 60905a7 Author: Minho Ryu <ryumin93@gmail.com> Date: Fri Feb 20 09:39:39 2026 +0900 Add aime26 (#1256) Signed-off-by: bzantium <ryumin93@gmail.com>

commit b28afc5 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Feb 19 16:18:25 2026 -0800 Rename custom -> external benchmarks (#1262) Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 6cc9c45 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Feb 19 16:10:33 2026 -0800 Add reference to internal benchmarks repo (#1261) Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 5202af6 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Feb 19 16:08:05 2026 -0800 Remove incorrect presence-penalty setting (#1259) Signed-off-by: Igor Gitman <igitman@nvidia.com>

commit 144c70b Author: Igor Gitman <igitman@nvidia.com> Date: Thu Feb 19 15:26:33 2026 -0800 Adding an option to store benchmarks in external repo (#1240) Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com>

commit 10e6e39 Author: George <37293288+Jorjeous@users.noreply.github.com> Date: Thu Feb 19 19:57:21 2026 +0400 update vllm miltimodal for api calls convenience (#1213) Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com> Co-authored-by: mmkrtchyan <mmkrtchyan@nvidia.com>

commit 1ba4219 Author: Nick Ludwig <nliudvig@nvidia.com> Date: Wed Feb 18 03:28:23 2026 +0400 Fix --server_container not being applied to dependent jobs (#1244) Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>

commit 9517614 Author: Wasi Ahmad <wasiahmad@ucla.edu> Date: Mon Feb 16 11:13:24 2026 -0800 Support mini-swe-agent as agent harness (#1212) Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: i-vainn <imoshkov@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: Charlie Truong <chtruong@nvidia.com> Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com> Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: Stephen Ge <stepheng@nvidia.com> Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: Mateusz Winiarek <mwiniarek@nvidia.com> Signed-off-by: mmkrtchyan <mmkrtchyan@nvidia.com> Signed-off-by: Wei Du <wedu@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com> Signed-off-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Co-authored-by: Ivan <imoshkov@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Charlie Truong <chtruong@nvidia.com> Co-authored-by: Nick Ludwig <nliudvig@nvidia.com> Co-authored-by: Wojciech Prazuch <wojciechprazuch3@gmail.com> Co-authored-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com> Co-authored-by: Minho Ryu <ryumin93@gmail.com> Co-authored-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: Jiacheng Xu <jcxu@utexas.edu> Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com> Co-authored-by: Mateusz Winiarek <72758259+Froxyy-dev@users.noreply.github.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Meline Mkrtchyan <72409758+melllinia@users.noreply.github.com> Co-authored-by: Wei Du <wedu@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Sean Naren <snarenthiran@nvidia.com> Co-authored-by: Mehrzad Samadi <mehrzadsamadi@gmail.com> Co-authored-by: anowaczynski-nvidia <anowaczynski@nvidia.com> Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com>

commit a3d44dc Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com> Date: Fri Feb 13 22:32:15 2026 -0800 Add --installation_command support to prepare_data (#1243) Signed-off-by: suriya <sgunasekar@nvidia.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

commit e80d524 Author: George Armstrong <georgea@nvidia.com> Date: Thu Feb 12 17:26:00 2026 -0800 Fix CI disk space for Docker image builds (#1241) Signed-off-by: George Armstrong <georgea@nvidia.com>

commit d22236c Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Wed Feb 11 17:55:00 2026 -0800 Fix answerbench prompt parsing (#1235) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>

commit 2401628 Author: George Armstrong <georgea@nvidia.com> Date: Wed Feb 11 14:56:43 2026 -0800 feat: add lockfiles for reproducible sandbox builds (#1233) Signed-off-by: George Armstrong <georgea@nvidia.com>

commit 5a0a84d Author: Wasi Ahmad <wasiahmad@ucla.edu> Date: Wed Feb 11 13:30:03 2026 -0800 removing datasets version restriction for LCB eval (#1230)
Prazuch <wojciechprazuch3@gmail.com> Co-authored-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com> Co-authored-by: Minho Ryu <ryumin93@gmail.com> Co-authored-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: Jiacheng Xu <jcxu@utexas.edu> Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: George <37293288+Jorjeous@users.noreply.github.com> Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com> Co-authored-by: Mateusz Winiarek <72758259+Froxyy-dev@users.noreply.github.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Meline Mkrtchyan <72409758+melllinia@users.noreply.github.com> Co-authored-by: Wei Du <wedu@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Sean Naren <snarenthiran@nvidia.com> Co-authored-by: Mehrzad Samadi <mehrzadsamadi@gmail.com> Co-authored-by: anowaczynski-nvidia <anowaczynski@nvidia.com> Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> commit a3d44dc Author: Suriya Gunasekar <sgunasekar@users.noreply.github.com> Date: Fri Feb 13 22:32:15 2026 -0800 Add --installation_command support to prepare_data (#1243) Signed-off-by: suriya <sgunasekar@nvidia.com> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> commit e80d524 Author: George Armstrong <georgea@nvidia.com> Date: Thu Feb 12 17:26:00 2026 -0800 Fix CI disk space for Docker image builds (#1241) Signed-off-by: George Armstrong <georgea@nvidia.com> commit d22236c Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Wed Feb 11 17:55:00 2026 -0800 Fix answerbench prompt parsing (#1235) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit 2401628 Author: George Armstrong <georgea@nvidia.com> Date: Wed Feb 11 14:56:43 2026 -0800 feat: add lockfiles for reproducible sandbox builds (#1233) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 5a0a84d Author: Wasi Ahmad <wasiahmad@ucla.edu> Date: Wed Feb 11 13:30:03 2026 -0800 removing datasets version restriction for LCB eval (#1230) 
Signed-off-by: wasiahmad <wasiahmad@ucla.edu> commit ef0a890 Author: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com> Date: Wed Feb 11 12:03:16 2026 +0400 Gnalbandyan/add physics (#1214) Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com> Signed-off-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com> commit bd9d30c Author: Wasi Ahmad <wasiahmad@ucla.edu> Date: Tue Feb 10 15:13:27 2026 -0800 LCB generic prompting (#1215) Signed-off-by: wasiahmad <wasiahmad@ucla.edu> commit 7d6c49a Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Sat Feb 7 08:45:46 2026 -0800 Add support for different variations of nemo-rl (#1220) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit b19ba96 Author: George Armstrong <georgea@nvidia.com> Date: Fri Feb 6 21:40:56 2026 -0800 Add multi-node sandbox support for SLURM clusters (#1218) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 8950bb0 Author: anowaczynski-nvidia <anowaczynski@nvidia.com> Date: Sat Feb 7 01:38:00 2026 +0100 support structured outputs in hle judge for optional AA compatibility (#1186) Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> commit b84f7a2 Author: Igor Gitman <igitman@nvidia.com> Date: Fri Feb 6 14:51:02 2026 -0800 A small update on running tests docs (#1219) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit 8e838e1 Author: George Armstrong <georgea@nvidia.com> Date: Thu Feb 5 18:01:35 2026 -0800 feat: add flag to disable sandbox replay (#1217) Signed-off-by: George Armstrong <georgea@nvidia.com> commit 5fd9085 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Feb 5 15:57:01 2026 -0800 Add an option to limit number of tool calls (#1216) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit d820200 Author: Igor Gitman <igitman@nvidia.com> Date: Tue Feb 3 10:43:55 2026 -0800 Add arena-hard v2 (#1205) Signed-off-by: bzantium <ryumin93@gmail.com> 
Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: bzantium <ryumin93@gmail.com> commit a30920e Author: Igor Gitman <igitman@nvidia.com> Date: Mon Feb 2 10:53:55 2026 -0800 Fix mkdocs warnings (#1204) Signed-off-by: Igor Gitman <igitman@nvidia.com> commit 19d7788 Author: Ivan <imoshkov@nvidia.com> Date: Mon Feb 2 23:25:13 2026 +0500 Fix infinite wait in sandbox.wait_for_sandbox (#1206) Signed-off-by: i-vainn <imoshkov@nvidia.com> commit 3e65fbf Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Fri Jan 30 19:38:38 2026 -0800 Improve tts (#1203) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit 250c862 Author: Nick Ludwig <nliudvig@nvidia.com> Date: Fri Jan 30 22:12:29 2026 +0400 SWE-bench: fix SWE-agent hanging, adjust expected scores (#1202) Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> commit 7ded756 Author: Ivan <imoshkov@nvidia.com> Date: Fri Jan 30 09:57:41 2026 +0500 Add proper token counting to code execution model (#1184) Signed-off-by: i-vainn <imoshkov@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> commit b986304 Author: Igor Gitman <igitman@nvidia.com> Date: Thu Jan 29 17:57:07 2026 -0800 Upgrade containers (#1198) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Sadegh Mahdavi <smahdavi@nvidia.com> commit 3b44f02 Author: Dan Lord <blahblahasdf@gmail.com> Date: Thu Jan 29 16:40:47 2026 -0800 Fix incorrect string format (#1199) Signed-off-by: dlord <dlord@nvidia.com> commit c4854b8 Author: Sadegh Mahdavi <smahdavi4@gmail.com> Date: Thu Jan 29 13:43:36 2026 -0800 Update nemo-rl to latest (#1087) Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>

Signed-off-by: bzantium <ryumin93@gmail.com>

Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: dgitman <dgitman@nvidia.com>

bzantium force-pushed the feature/#1255 branch from bd0fab5 to 63188c9 Compare February 19, 2026 00:28

coderabbitai bot reviewed Feb 19, 2026

Add aime26

ef883de

Signed-off-by: bzantium <ryumin93@gmail.com>

bzantium force-pushed the feature/#1255 branch from 63188c9 to ef883de Compare February 19, 2026 02:20

coderabbitai bot reviewed Feb 19, 2026

Kipok approved these changes Feb 20, 2026

Kipok merged commit 60905a7 into NVIDIA-NeMo:main Feb 20, 2026
1 check passed

talorabr pushed a commit to talorabr/Nemo-Skills that referenced this pull request Feb 22, 2026

Add aime26 (NVIDIA-NeMo#1256)

897eb37

Signed-off-by: bzantium <ryumin93@gmail.com>

bzantium deleted the feature/#1255 branch February 23, 2026 07:30

Kipok pushed a commit that referenced this pull request Feb 24, 2026

Add aime26 (#1256)

598c841

Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: Igor Gitman <igitman@nvidia.com>

dgtm777 pushed a commit that referenced this pull request Mar 18, 2026

Add aime26 (#1256)

9245104

Signed-off-by: bzantium <ryumin93@gmail.com>

dgtm777 pushed a commit that referenced this pull request Mar 18, 2026

Add aime26 (#1256)

d832f41

Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: dgitman <dgitman@nvidia.com>

Kipok merged 1 commit into NVIDIA-NeMo:main from bzantium:feature/#1255

		@@ -0,0 +1,18 @@
		# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.

bzantium commented Feb 19, 2026

Model Evaluation: Qwen3-30B-A3B-Thinking-2507
