Evaluation on Livecodebench-pro by wasiahmad · Pull Request #1115 · NVIDIA-NeMo/Skills

wasiahmad · 2025-12-16T04:37:35Z

Dataset: https://huggingface.co/datasets/QAQAQAQAQ/LiveCodeBench-Pro
Test cases: https://huggingface.co/datasets/QAQAQAQAQ/LiveCodeBench-Pro-Testcase

Summary by CodeRabbit

Release Notes

New Features
- Added C++ language support for LiveCodeBench Pro code evaluation with configurable sandbox environments.
- Introduced enhanced evaluator configuration with adjustable timeout and process management settings.

Improvements
- Streamlined evaluation pipeline with improved code preprocessing and result handling.


Signed-off-by: wasiahmad <wasiahmad@ucla.edu>


coderabbitai · 2025-12-16T04:43:07Z

Walkthrough

The PR introduces LiveCodeBench-Pro evaluation support by adding configuration constants, dataset preparation utilities for downloading and processing remote test cases, and a new evaluator function with automatic dependency installation and sample processing.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Configuration Updates `nemo_skills/dataset/livecodebench-pro/__init__.py` | Added new constant `EVAL_SPLIT` ("test_25q2"); updated `GENERATION_ARGS` to use `cpp_codegen` instead of `python_codegen`. |
| Dataset Preparation `nemo_skills/dataset/livecodebench-pro/prepare.py` | Added `download_testcases()` function for remote test case downloads; added `process_problem_splits()` for transforming problem data into per-split JSONL files; updated main workflow to execute download and processing pipeline instead of inline dataset handling. |
| Evaluation Implementation `nemo_skills/evaluation/evaluator/code.py` | Added `LiveCodeBenchProEvaluatorConfig` dataclass with sandbox, language, test paths, timeout, and process configuration; added `eval_livecodebench_pro()` function with automatic livecodebench package installation, sample preprocessing, evaluation execution, and result aggregation; extended preprocess logic to handle closing `</think>` tags. |

Sequence Diagram

sequenceDiagram
    participant User
    participant Evaluator as eval_livecodebench_pro()
    participant HF as Hugging Face Hub
    participant LiveCodeBench as livecodebench lib
    participant Sandbox as Local Sandbox

    User->>Evaluator: Call with config (test_dir, language=cpp)
    Evaluator->>Evaluator: Import livecodebench
    alt Package not found
        Evaluator->>HF: Install from Git URL
        HF-->>Evaluator: Installation complete
    end
    
    Evaluator->>Evaluator: Read samples from JSONL
    Evaluator->>Evaluator: Preprocess samples (strip_whitespace=True)
    Evaluator->>Evaluator: Add code_list field per sample
    
    Evaluator->>LiveCodeBench: Call evaluate() with<br/>language, test_file, timeout
    LiveCodeBench->>Sandbox: Execute test cases<br/>(num_processes=12)
    Sandbox-->>LiveCodeBench: Test results
    LiveCodeBench-->>Evaluator: Evaluation results file
    
    Evaluator->>Evaluator: Load results & attach<br/>graded_list per sample
    Evaluator->>Evaluator: Rewrite JSONL with results
    Evaluator-->>User: Return evaluated samples
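
The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `evaluate_jsonl` and `grade_fn` are hypothetical names (the latter stands in for the livecodebench evaluation call), and the `completion` input field is an assumption. Only `code_list`, `graded_list`, and `question_id` come from the review text.

```python
import json
from pathlib import Path
from typing import Callable


def evaluate_jsonl(path: str, grade_fn: Callable[[list[dict]], dict]) -> list[dict]:
    """Read samples, grade them, and rewrite the same JSONL file with results."""
    file_path = Path(path)
    with file_path.open(encoding="utf-8") as fin:
        samples = [json.loads(line) for line in fin if line.strip()]

    # Assumption: each sample already carries its extracted code under "completion".
    for sample in samples:
        sample["code_list"] = [sample["completion"].strip()]

    # grade_fn is hypothetical: it maps question_id -> list of per-solution results.
    grades = grade_fn(samples)

    for sample in samples:
        sample["graded_list"] = grades[sample["question_id"]]

    # Rewrite the input file in place so each line now includes its grades.
    with file_path.open("w", encoding="utf-8") as fout:
        for sample in samples:
            fout.write(json.dumps(sample) + "\n")
    return samples
```

Injecting the grader as a callable keeps the file handling testable without a sandbox or the livecodebench package installed.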

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

- Automatic package installation logic in `eval_livecodebench_pro()`: Review the try-except flow for the livecodebench import and Git-based installation to ensure error handling is robust.
- Sample preprocessing pipeline: Verify that the preprocess behavior changes (whitespace stripping, `code_list` field addition) don't conflict with existing evaluation logic.
- File I/O operations: Check JSONL read/write sequences and the intermediate results file handling to confirm there are no data loss or corruption paths.
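
For the first review point, a robust import-with-fallback-install pattern might look like the sketch below. The helper name `import_or_install` and the `pip_spec` argument are assumptions for illustration; the Git URL the PR actually installs from is not shown in this review.

```python
import importlib
import subprocess
import sys
from types import ModuleType


def import_or_install(module_name: str, pip_spec: str) -> ModuleType:
    """Import module_name, installing pip_spec first if the import fails."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # check=True raises CalledProcessError if pip fails, so a broken
        # install surfaces here instead of as a confusing ImportError later.
        subprocess.run([sys.executable, "-m", "pip", "install", pip_spec], check=True)
        # Make the import system notice the newly installed package.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
```

Using `sys.executable -m pip` installs into the interpreter that is running the evaluator, which avoids picking up a different `pip` from `PATH`.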

Possibly related PRs

PyPy3 execution support for LiveCodeBench evaluation #614: Directly related; both PRs add/modify LiveCodeBench evaluation code with evaluator config and utility modules.

Suggested reviewers

Kipok

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly relates to the main objective of the PR, which adds evaluation support for the LiveCodeBench-Pro benchmark. |


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

nemo_skills/evaluation/evaluator/code.py (1)
239-246: Consider using str | None type annotations for optional fields.

The test_file and test_dir fields default to None but are typed as str. For consistency with BaseEvaluatorConfig (which uses str | None), consider updating the type hints.
 @nested_dataclass(kw_only=True)
 class LiveCodeBenchProEvaluatorConfig(BaseEvaluatorConfig):
     sandbox: dict = field(default_factory=lambda: {"sandbox_type": "local"})
     language: str = "cpp"  # use either "python" or "cpp"
-    test_file: str = None
-    test_dir: str = None  # path to the unit tests directory
+    test_file: str | None = None
+    test_dir: str | None = None  # path to the unit tests directory
     timeout: int = 6
     num_processes: int = 12

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9ddefa5 and 6f73340.

📒 Files selected for processing (3)

nemo_skills/dataset/livecodebench-pro/__init__.py (1 hunks)
nemo_skills/dataset/livecodebench-pro/prepare.py (1 hunks)
nemo_skills/evaluation/evaluator/code.py (2 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

nemo_skills/evaluation/evaluator/code.py (3)

nemo_skills/utils.py (1)

nested_dataclass (69-102)

nemo_skills/evaluation/evaluator/base.py (1)

BaseEvaluatorConfig (27-31)

nemo_skills/evaluation/evaluator/__init__.py (1)

evaluate (117-131)

🪛 Ruff (0.14.8)

nemo_skills/dataset/livecodebench-pro/prepare.py

69-69: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: pre-commit
GitHub Check: unit-tests

🔇 Additional comments (7)

nemo_skills/dataset/livecodebench-pro/__init__.py (1)

18-19: LGTM!

The configuration constants are consistent with the rest of the PR: EVAL_SPLIT = "test_25q2" matches the 25q2 split defined in prepare.py, and cpp_codegen aligns with the default language: str = "cpp" in LiveCodeBenchProEvaluatorConfig.

nemo_skills/dataset/livecodebench-pro/prepare.py (3)

22-29: LGTM!

The repository constants and split definitions are clear and well-structured. The tuple format (tag, split_name, expected_count) provides useful validation data.
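
A sketch of how a `(tag, split_name, expected_count)` tuple can drive validation is shown below. The tags and counts are placeholder values and `check_split_counts` is a hypothetical helper; the real definitions live in `prepare.py` and are not reproduced in this review.

```python
# Placeholder values for illustration only; not the counts used in prepare.py.
SPLITS = [
    ("25q1", "test_25q1", 2),
    ("25q2", "test_25q2", 3),
]


def check_split_counts(rows_by_tag: dict[str, list[dict]]) -> None:
    """Raise ValueError if any split has a different size than expected."""
    for tag, split_name, expected_count in SPLITS:
        actual = len(rows_by_tag.get(tag, []))
        if actual != expected_count:
            raise ValueError(
                f"{split_name}: expected {expected_count} problems, found {actual}"
            )
```

Checking counts right after download catches a truncated or changed upstream dataset before any JSONL files are written.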

32-42: LGTM!

The error handling appropriately logs the failure and re-raises to propagate the error.
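
The log-and-re-raise pattern described here can be sketched as below. `download_with_logging` and its `fetch` callable are hypothetical and stand in for whatever download call the PR makes.

```python
import logging
from typing import Callable

LOG = logging.getLogger(__name__)


def download_with_logging(fetch: Callable[[], str], what: str) -> str:
    """Run fetch(), logging any failure before re-raising it unchanged."""
    try:
        return fetch()
    except Exception:
        # LOG.exception records the traceback; a bare `raise` keeps the
        # original exception type and stack for the caller.
        LOG.exception("Failed to download %s", what)
        raise
```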

73-83: LGTM!

The main block properly validates the HF_TOKEN environment variable and orchestrates the two-step workflow.
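
A minimal sketch of that kind of up-front environment check follows. The helper name `require_env` and the error message are assumptions; only the `HF_TOKEN` variable name comes from the review.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the preparation script")
    return value
```

Calling `require_env("HF_TOKEN")` at the top of the main block fails fast, before any download work starts.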

nemo_skills/evaluation/evaluator/code.py (3)

125-134: LGTM!

Good defensive change to handle the edge case where the generation contains a </think> closing tag without the opening tag.
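
One way to handle that edge case is sketched below. This is an assumed approach rather than the PR's actual logic, and `strip_reasoning` is a hypothetical name: it keeps only the text after the last `</think>` tag, whether or not an opening tag is present.

```python
def strip_reasoning(generation: str) -> str:
    """Drop everything up to and including the last </think> tag, if present."""
    closing_tag = "</think>"
    if closing_tag in generation:
        # rsplit with maxsplit=1 splits on the final occurrence only.
        return generation.rsplit(closing_tag, 1)[1].strip()
    return generation.strip()
```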

292-293: LGTM!

The pattern of moving the eval results file to prevent recomputation is consistent with other evaluators in this file.
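
A sketch of moving a results file aside once it has been consumed is shown below. The function name `archive_results` and the `-saved` suffix are assumptions; the exact naming convention used in `code.py` is not shown in this review.

```python
import shutil
from pathlib import Path


def archive_results(results_path: str, suffix: str = "-saved") -> Path:
    """Rename e.g. out_eval_results.json to out_eval_results-saved.json."""
    src = Path(results_path)
    dst = src.with_name(f"{src.stem}{suffix}{src.suffix}")
    shutil.move(str(src), str(dst))
    return dst
```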

285-290: Verify question_id field exists in the LiveCodeBench-Pro HuggingFace dataset.

The code at line 289 accesses sample["question_id"] to look up evaluation results in the grades dictionary. While the prepare.py script preserves all fields from the source dataset via output_record = dict(row), the presence of question_id in the original HuggingFace repository (QAQAQAQAQ/LiveCodeBench-Pro) should be explicitly confirmed in documentation or code comments to ensure the data pipeline is robust.
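
Until the field is documented, a defensive lookup along these lines would make a missing `question_id` fail with a clear message. `attach_grades` is a hypothetical helper, and the shape of `grades` (a mapping from `question_id` to a list of results) is an assumption.

```python
def attach_grades(samples: list[dict], grades: dict[str, list[bool]]) -> None:
    """Attach graded_list to each sample, failing loudly on missing keys."""
    for index, sample in enumerate(samples):
        if "question_id" not in sample:
            raise KeyError(f"sample {index} has no 'question_id' field")
        question_id = sample["question_id"]
        if question_id not in grades:
            raise KeyError(f"no evaluation result for question_id {question_id!r}")
        sample["graded_list"] = grades[question_id]
```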


wasiahmad · 2025-12-18T01:02:24Z

@gwarmstrong this PR is ready to be merged. I have checked it by evaluating the Qwen3-30B-A3B-Thinking-2507 and Qwen3-235B-A22B-Thinking-2507 models on the LCB-Pro dataset. The results align with our expectations.


wasiahmad added 5 commits December 4, 2025 19:26

data downloading modified

2f01be8

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

making test_25q2 as default split

f23866d

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

updating data prep logic

0b7bd0d

Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

updating data prep logic

359e757

Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

updating data prep logic

1560196

Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

coderabbitai bot reviewed Dec 16, 2025


wasiahmad enabled auto-merge (squash) December 18, 2025 01:02

wasiahmad requested a review from gwarmstrong December 18, 2025 01:03

gwarmstrong and others added 20 commits December 18, 2025 18:26

MAINT update langugage-data dependency (#1076)

dd376d6

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

MAINT: Add audio requirements to vllm image (#1081)

01384aa

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Add apex-shortlist dataset (#1080)

c91e459

Signed-off-by: i-vainn <imoshkov@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Introduce regex for small differences of formatting from judge (#1082)

eb99be2

Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Add LCB Prompts, fix regex bug in robust_eval, remove CR, make summar…

a6f475c

…ize_robustness generic for more benchmarks, update docstrings. (#1079) Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

MAINT pin nemo-evaluator (#1095)

6cb9b79

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Update issue templates

f96d242

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Delete .github/ISSUE_TEMPLATE directory

087d762

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

enable blank issues (#1096)

fdbefe9

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Fix input_file path handling when executor is "none" (#1089)

c2c38cd

Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

TST for #1089 (#1097)

a915e8d

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Stepheng/prover cleanup (#1078)

a5b3bd7

Signed-off-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

FEAT Add Tavily Search (#1085)

f796b77

Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

updating code extraction logic (#1086)

f40f3a1

Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Sandbox add stem (#1101)

0727665

Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Handle none output in wmtp24++ (#1091)

321edab

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: Mateusz Winiarek <mwiniarek@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

ENH enable sandbox env overrides in generate (#1107)

d9e6d23

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Search Tool Parameter updates (#1112)

f56614b

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

wasiahmad and others added 19 commits December 18, 2025 18:26

fixing metric issue and missing problem-id issue

d031317

Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

adding metric type for lcb-pro

b2b06ac

Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

debugging

8982271

Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

fixed a minor issue

32ad110

Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

SWE-bench: don't pass external environment variables into Apptainer c…

9bb52d2

…ontainers (#1116) Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Adding clan PR with AudioBench and Librispeech PC. (#1103)

f99e5cb

Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Schema overrides for tool-calling (#1118)

ad51e99

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

FIX tool call error handling and search tool errors (#1120)

28e7567

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Use run.Script for generate pipeline (#1052)

464561d

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Port ICPC changes to IOI (#1046)

e3aad78

Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Co-authored-by: Mehrzad Samadi <mehrzadsamadi@gmail.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

replace raise error with LOG.warning in AA LCR dataset prepare (#1119)

58eb7d9

Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

FIX tavily search results return type (#1123)

5575646

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Revert "Use run.Script for generate pipeline (#1052)" (#1125)

0e64314

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

Fix: add serialized_output on bad request (#1127)

f3f9c90

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

update paper link (#1128)

da917f6

Signed-off-by: Wei Du <wedu@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

update paper link, references to dataset, self-correction differences (…

dca79a6

…#1129) Signed-off-by: Stephen Ge <stepheng@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

updating documentation

d94e953

Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

FIX ioi ignore (#1131)

35dc934

Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

download AA-LCR_extracted-text.zip via hf_hub_download (#1126)

b149479

Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

wasiahmad force-pushed the livecodebench_pro branch from 6a0260d to 2ff5b51 Compare December 19, 2025 02:27

fixing conflicts

18e7043

Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

wasiahmad force-pushed the livecodebench_pro branch from 24c9fa3 to 18e7043 Compare December 19, 2025 02:46

gwarmstrong approved these changes Dec 19, 2025


wasiahmad merged commit 7205c43 into main Dec 19, 2025
5 checks passed

wasiahmad deleted the livecodebench_pro branch December 19, 2025 03:03

blahblahasdf pushed a commit to blahblahasdf/Skills that referenced this pull request Jan 8, 2026

Evaluation on Livecodebench-pro (NVIDIA-NeMo#1115)

a90cc44

Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: dlord <dlord@nvidia.com>

hsiehjackson pushed a commit that referenced this pull request Jan 13, 2026

Evaluation on Livecodebench-pro (#1115)

8ddcdf4

Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

wasiahmad added a commit that referenced this pull request Feb 4, 2026

Evaluation on Livecodebench-pro (#1115)

c52c04e

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

dgtm777 pushed a commit that referenced this pull request Mar 18, 2026

Evaluation on Livecodebench-pro (#1115)

e545e46

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

dgtm777 pushed a commit that referenced this pull request Mar 18, 2026

Evaluation on Livecodebench-pro (#1115)

490f6b8

Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: dgitman <dgitman@nvidia.com>

wasiahmad merged 54 commits into main from livecodebench_pro

15 participants
