LCB generic prompting by wasiahmad · Pull Request #1215 · NVIDIA-NeMo/Skills

wasiahmad · 2026-02-05T22:12:19Z

Summary by CodeRabbit

New Features
- Unified "default reasoning" prompt applied across LiveCodeBench Python/C++ variants.
- Dataset records now include explicit formatting hints and preserved starter code fields.
Refactor
- Prompt templates reorganized to use labeled Question/Format/Answer sections and require step‑by‑step reasoning.
- Starter code is stored as formatted code blocks and question assembly moved to distinct fields.

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

coderabbitai · 2026-02-05T22:14:49Z

📝 Walkthrough

Walkthrough

Consolidates livecodebench prompt templates to a new default_reasoning prompt and refactors dataset preparation to emit structured fields (question_content, formatting_message, starter_code) instead of assembling a composite question string. Several language-specific prompt YAMLs were removed or updated.
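To make the change concrete, here is a minimal sketch of how three structured fields could be filled into a sectioned template. The real text of `default_reasoning.yaml` is not reproduced in this thread, so `TEMPLATE`, `build_prompt`, and the sample record are illustrative assumptions, not NeMo-Skills code.

```python
FENCE = "`" * 3  # built programmatically so this example stays a single code block

# Hypothetical template with labeled sections; the real YAML wording is not shown in this PR.
TEMPLATE = (
    "### Question\n{question_content}\n\n"
    "### Format\n{formatting_message}\n{starter_code}\n\n"
    "### Answer\n"
)


def build_prompt(record: dict) -> str:
    """Fill the template from the structured fields of one dataset record."""
    return TEMPLATE.format(
        question_content=record["question_content"],
        formatting_message=record["formatting_message"],
        starter_code=record["starter_code"],
    )


record = {
    "question_content": "Given an integer n, print n squared.",
    "formatting_message": "Read from stdin and write to stdout.",  # placeholder wording
    "starter_code": f"{FENCE}python\n# YOUR CODE HERE\n{FENCE}",
}
prompt_text = build_prompt(record)
print(prompt_text)
```

Keeping the fields separate means a template can reorder or relabel the sections without the prepare scripts having to rebuild a composite string.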

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Dataset generation args: `nemo_skills/dataset/livecodebench/__init__.py`, `nemo_skills/dataset/livecodebench-cpp/__init__.py`, `nemo_skills/dataset/livecodebench-pro/__init__.py` | Switched `GENERATION_ARGS` to use `eval/livecodebench/default_reasoning` (replaced language-specific prompt configs). C++ variant changed from a string to a single-element tuple. |
| Dataset preparation mapping: `nemo_skills/dataset/livecodebench/prepare.py`, `nemo_skills/dataset/livecodebench-cpp/prepare.py`, `nemo_skills/dataset/livecodebench-pro/prepare.py` | Refactored map logic to set `formatting_message` and a fenced `starter_code` field (language-specific fences). Removed construction of composite `question`; retained `question_content` and no longer drop `question_content`/`starter_code` from outputs. |
| Prompt files removed: `nemo_skills/prompt/config/eval/livecodebench/cpp_codegen.yaml`, `.../cpp_codegen_reasoning.yaml`, `.../python_codegen.yaml`, `.../python_codegen_reasoning.yaml` | Deleted language-specific and prior reasoning prompt templates. |
| Prompt files added/updated: `nemo_skills/prompt/config/eval/livecodebench/default.yaml`, `.../default_reasoning.yaml`, `nemo_skills/prompt/config/gpt-oss/livecodebench.yaml` | Added `default_reasoning.yaml` (step-by-step reasoning template using `question_content`, `formatting_message`, `starter_code`) and updated other templates to use the new structured placeholders and explicit answer formatting. |
| Docs: `docs/releases/openreasoning/training.md` | Replaced references from `python_codegen_reasoning` to `default_reasoning` in documentation. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Prompting fix to improve LCB score for GPTOSS #1169: Modifies the same gpt-oss livecodebench prompt template with overlapping structural/format changes.
Add LCB Prompts, fix regex bug in robust_eval, remove CR, make summarize_robustness generic for more benchmarks, update docstrings. #1079: Related LiveCodeBench prompt/config updates and placeholder restructuring.

Suggested labels

enhancement

Suggested reviewers

titu1994
Kipok

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'LCB generic prompting' is vague and lacks specificity about what changes are being made to the LiveCodeBench prompting system. | Use a more descriptive title that clarifies the main objective, such as 'Generalize LiveCodeBench prompt templates to support multiple languages' or 'Refactor LiveCodeBench prompts to use unified default_reasoning template'. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

greptile-apps

1 file reviewed, 1 comment

greptile-apps · 2026-02-05T22:16:40Z

Additional Comments (1)

nemo_skills/dataset/livecodebench/prepare.py
Invalid cast/map order
clean_data() removes public_test_cases/private_test_cases when keep_all_columns=False (they’re in remove_columns), but then calls dataset.cast_column("public_test_cases", ...) / cast_column("private_test_cases", ...) before map(remove_columns=...). This will raise at runtime because those columns don’t exist in the dataset after the map step (and the cast is pointless if the columns are about to be removed anyway). Either keep these columns (remove them from remove_columns) or move the casts into the keep_all_columns branch where the columns are retained.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docs/releases/openreasoning/training.md (1)
98-108: ⚠️ Potential issue | 🔴 Critical

Key mismatch will cause runtime failure: prompt template requires question_content, formatting_message, and starter_code fields.

The apply_format function passes {'question': question} to prompt.fill() at line 102, but the default_reasoning.yaml template uses {question_content}, {formatting_message}, and {starter_code} as placeholders. Since prompt.fill() uses Python's str.format() method, it will raise a KeyError when trying to fill these missing placeholders.

Update the apply_format function to provide all required fields:
elem['input'] = prompt.fill({'question_content': question, 'formatting_message': '...', 'starter_code': '...'}, format_as_string=True)
Alternatively, use a prompt template that matches the provided input (e.g., one that only requires {question}).
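The failure mode described above can be reproduced with plain `str.format`, independent of NeMo-Skills. This is a minimal sketch; the template string is a stand-in that only borrows the three placeholder names from the comment, not the contents of `default_reasoning.yaml`.

```python
# Stand-in template using the three placeholders named in the review comment.
template = "{question_content}\n\n{formatting_message}\n\n{starter_code}"

# Input shaped like the old docs snippet, which only provides `question`.
old_style_input = {"question": "Print the sum of two integers."}

try:
    template.format(**old_style_input)
    missing_key = None
except KeyError as err:
    # str.format raises KeyError for the first placeholder it cannot resolve.
    missing_key = err.args[0]

print(f"missing placeholder: {missing_key}")
```

Extra keys such as `question` are silently ignored by `str.format`; only missing ones raise.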

🧹 Nitpick comments (1)

nemo_skills/dataset/livecodebench-pro/prepare.py (1)
72-73: Consider re-raising the exception after logging.

Per coding guidelines, exceptions that aren't expected to be normally raised should cause clear failures rather than being silently handled. The current code prints the error but continues processing, which could result in partial data generation without a clear failure signal.
♻️ Proposed fix to fail clearly on error
         except Exception as e:
             print(f"    Error processing split {split}: {e}")
+            raise
Alternatively, if partial success is intentional, consider tracking failures and returning a non-zero exit code at the end.
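The alternative mentioned above, tracking failures and returning a non-zero exit code, could look roughly like this. `process_split`, `prepare_all`, and the split names are hypothetical stand-ins for whatever `prepare.py` does per split.

```python
import sys


def process_split(split: str) -> None:
    """Hypothetical per-split work; fails for one split to show the flow."""
    if split == "broken_split":
        raise ValueError("simulated download failure")


def prepare_all(splits: list[str]) -> list[str]:
    """Process every split, collecting the names of those that failed."""
    failed = []
    for split in splits:
        try:
            process_split(split)
        except Exception as e:
            print(f"    Error processing split {split}: {e}")
            failed.append(split)
    return failed


def main(splits: list[str]) -> int:
    """Return 0 when every split succeeded, 1 otherwise."""
    failed = prepare_all(splits)
    if failed:
        print(f"Failed splits: {', '.join(failed)}", file=sys.stderr)
        return 1
    return 0


exit_code = main(["split_a", "broken_split"])
```

In a script entry point this would typically be wired up as `sys.exit(main(splits))`, so the caller or CI job sees the failure even though the remaining splits were still processed.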

greptile-apps

2 files reviewed, 2 comments

greptile-apps · 2026-02-06T18:49:56Z

Additional Comments (2)

nemo_skills/prompt/config/eval/aai/livecodebench.yaml
Prompt placeholder mismatch

This prompt template uses {question}, but the LiveCodeBench prepare scripts in this PR stopped emitting a question field (they now emit question_content, formatting_message, and starter_code). Prompt.fill() calls str.format(**input_dict) (see nemo_skills/prompt/utils.py), so this will raise KeyError: 'question' at runtime for AAI LiveCodeBench evals unless the template or dataset fields are updated to match.
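One way to catch this kind of mismatch before a job runs is to compare a template's placeholders with the fields a record provides. The sketch below uses only the standard library; `missing_fields` and the sample template are hypothetical and not part of `nemo_skills/prompt/utils.py`.

```python
from string import Formatter


def missing_fields(template: str, record: dict) -> set[str]:
    """Return placeholder names used by `template` that `record` does not provide."""
    placeholders = {
        field_name
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name  # skips literal-only segments (None) and positional "{}" ("")
    }
    return placeholders - set(record)


# Stand-in for a template that still uses {question}.
aai_template = "Solve the problem:\n{question}"
new_record = {
    "question_content": "Print the sum of two integers.",
    "formatting_message": "Read from stdin and write to stdout.",
    "starter_code": "# YOUR CODE HERE",
}
print(missing_fields(aai_template, new_record))
```

The check assumes simple placeholder names; attribute or index lookups such as `{a.b}` would be reported with the full `a.b` string.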

docs/releases/openreasoning/training.md
Docs example will break

This snippet uses prompt.fill({'question': question}), but the new eval/livecodebench/default_reasoning config expects question_content / formatting_message / starter_code placeholders. As written, users following the docs will hit KeyError during str.format() when {question_content} etc. are missing.

Kipok · 2026-02-06T23:50:59Z

docs/releases/openreasoning/training.md

 dataset = load_dataset("nvidia/Nemotron-Post-Training-Dataset-v1", split="code")

-prompt = get_prompt('eval/livecodebench/python_codegen_reasoning', tokenizer='Qwen/Qwen2.5-32B-Instruct', system_message="")
+prompt = get_prompt('eval/livecodebench/default_reasoning', tokenizer='Qwen/Qwen2.5-32B-Instruct', system_message="")


is this a breaking change?

wasiahmad: No, because we reduced the prompt configs and set the defaults accordingly.

Kipok · 2026-02-06T23:51:27Z

nemo_skills/dataset/livecodebench-cpp/prepare.py

-            question += f"{PromptConstants.FORMATTING_WITHOUT_STARTER_CODE}\n\n"
-            question += "```cpp\n// YOUR CODE HERE\n```\n\n"
+            data["formatting_message"] = PromptConstants.FORMATTING_WITHOUT_STARTER_CODE
+            data["starter_code"] = "```cpp\n// YOUR CODE HERE\n```\n\n"


do i understand correctly that all lcb jobs will fail unless prepare data is redone?

wasiahmad: You are right, @Kipok.

greptile-apps

1 file reviewed, 1 comment

greptile-apps · 2026-02-06T23:55:02Z

Additional Comments (1)

nemo_skills/prompt/config/robustness/code_prompts/ns_python_codegen.yaml
Stale prompt reference

This file still points to the deleted eval/livecodebench/python_codegen.yaml (see header on lines 1-2). As-is, anyone relying on this robustness prompt config will be using an out-of-date template with a {question} field that no longer matches the LiveCodeBench dataset outputs in this PR ({question_content}, {formatting_message}, {starter_code}). Update this robustness prompt (or remove the misleading header) to reference the new unified LiveCodeBench prompt and expected fields.

greptile-apps

3 files reviewed, 3 comments

nemo_skills/dataset/livecodebench/prepare.py

nemo_skills/dataset/livecodebench-cpp/prepare.py

nemo_skills/dataset/livecodebench-pro/prepare.py

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

nemo_skills/dataset/livecodebench-cpp/prepare.py (1)

76-80: ⚠️ Potential issue | 🟡 Minor

The keep_all_columns parameter in clean_data is inaccessible from prepare in the C++ variant.

The clean_data function accepts keep_all_columns (line 46), but the prepare function (line 76) does not expose this parameter and never passes it through—it always defaults to False. The Python variant's prepare function correctly accepts and forwards this parameter (see Python variant line 98 and 107), allowing column filtering to be controlled by the caller. The C++ variant is missing this feature.
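A sketch of the pass-through being asked for, using deliberately simplified signatures. The real `prepare` and `clean_data` take other arguments not shown in this thread, and the dropped column names here are only illustrative.

```python
def clean_data(rows: list[dict], keep_all_columns: bool = False) -> list[dict]:
    """Simplified stand-in: drop test-case columns unless asked to keep everything."""
    if keep_all_columns:
        return rows
    dropped = {"public_test_cases", "private_test_cases"}  # illustrative column names
    return [{k: v for k, v in row.items() if k not in dropped} for row in rows]


def prepare(rows: list[dict], keep_all_columns: bool = False) -> list[dict]:
    """Expose the flag and forward it instead of silently relying on the default."""
    return clean_data(rows, keep_all_columns=keep_all_columns)


rows = [{"question_content": "q", "public_test_cases": "[]", "private_test_cases": "[]"}]
print(prepare(rows))
print(prepare(rows, keep_all_columns=True))
```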

🧹 Nitpick comments (1)

nemo_skills/dataset/livecodebench/prepare.py (1)

25-28: PromptConstants is duplicated across livecodebench and livecodebench-cpp.

Both prepare.py files define nearly identical PromptConstants classes, differing only in the language name ("python" vs "c++"). Consider extracting a shared base into a common module, parameterized by language, to avoid drift between the two copies.
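A shared, language-parameterized version might look like the sketch below. The message wording is a placeholder rather than the text in either `prepare.py`, and the field names (`language`, `fence_tag`, `comment`) are assumptions made for illustration.

```python
from dataclasses import dataclass

FENCE = "`" * 3  # built programmatically so this example stays a single code block


@dataclass(frozen=True)
class PromptConstants:
    language: str   # name used in message text, e.g. "python" or "c++"
    fence_tag: str  # tag after the opening fence, e.g. "python" or "cpp"
    comment: str    # line-comment prefix, e.g. "#" or "//"

    @property
    def formatting_without_starter_code(self) -> str:
        # Placeholder wording; the real message text lives in prepare.py.
        return f"Write a complete {self.language} program that reads stdin and writes stdout."

    @property
    def empty_starter_code(self) -> str:
        """Fenced placeholder block used when a problem has no starter code."""
        return f"{FENCE}{self.fence_tag}\n{self.comment} YOUR CODE HERE\n{FENCE}"


PYTHON = PromptConstants(language="python", fence_tag="python", comment="#")
CPP = PromptConstants(language="c++", fence_tag="cpp", comment="//")

print(CPP.empty_starter_code)
```

With one definition, the Python and C++ prepare scripts would differ only in which instance they import, which removes the risk of the two copies drifting apart.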

wasiahmad added 2 commits February 5, 2026 14:11

lcb generalized prompting support

27e58bf

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

lcb generalized prompting support

e6d077a

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

wasiahmad requested review from Kipok and titu1994 February 5, 2026 22:13

wasiahmad enabled auto-merge (squash) February 5, 2026 22:13

greptile-apps bot reviewed Feb 5, 2026

coderabbitai bot reviewed Feb 5, 2026

Merge branch 'main' into lcb_prompt_generalization

92b064c

greptile-apps bot reviewed Feb 6, 2026

Kipok reviewed Feb 6, 2026

Merge branch 'main' into lcb_prompt_generalization

7e4c567

greptile-apps bot reviewed Feb 6, 2026

Merge branch 'main' into lcb_prompt_generalization

d8c7bb7

greptile-apps bot reviewed Feb 7, 2026

Merge branch 'main' into lcb_prompt_generalization

8a07420

greptile-apps bot reviewed Feb 7, 2026

nemo_skills/dataset/livecodebench/prepare.py (outdated, resolved)

nemo_skills/dataset/livecodebench-cpp/prepare.py (outdated, resolved)

nemo_skills/dataset/livecodebench-pro/prepare.py (outdated, resolved)

wasiahmad added 2 commits February 8, 2026 00:29

removing extra \n from starter_code

5ccb141

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

removing extra \n\n from starter_code

781e3f3

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

greptile-apps bot reviewed Feb 8, 2026

coderabbitai bot reviewed Feb 8, 2026

Kipok added the run GPU tests label Feb 10, 2026

Kipok approved these changes Feb 10, 2026

wasiahmad merged commit bd9d30c into main Feb 10, 2026
6 checks passed

wasiahmad deleted the lcb_prompt_generalization branch February 10, 2026 23:13

dgtm777 pushed a commit that referenced this pull request Mar 18, 2026

LCB generic prompting (#1215)

25d8a4e

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>

dgtm777 pushed a commit that referenced this pull request Mar 18, 2026

LCB generic prompting (#1215)

a201d6a

Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: dgitman <dgitman@nvidia.com>
