PyPy3 execution support for LiveCodeBench evaluation #614
Signed-off-by: Feng Chen <42473790+fchen97@users.noreply.github.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Evelina <ebakhturina@nvidia.com>
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Shubham Toshniwal <stoshniwal@nvidia.com>
Signed-off-by: fayejf <fayejf07@gmail.com> Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Feng Chen <42473790+fchen97@users.noreply.github.com>
Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Sanyam Kapoor <sanyamk@nvidia.com> Signed-off-by: Igor Gitman <igor.a.gitman@gmail.com> Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com>
Signed-off-by: Wei Du <wedu@nvidia.com>
Signed-off-by: Sanyam Kapoor <sanyamk@nvidia.com>
Signed-off-by: Wei Du <wedu@nvidia.com>
Signed-off-by: Shubham Toshniwal <stoshniwal@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Shubham Toshniwal <stoshniwal@nvidia.com> Signed-off-by: fayejf <fayejf07@gmail.com> Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: Feng Chen <42473790+fchen97@users.noreply.github.com> Signed-off-by: Sanyam Kapoor <sanyamk@nvidia.com> Signed-off-by: Igor Gitman <igor.a.gitman@gmail.com> Signed-off-by: Wei Du <wedu@nvidia.com> Signed-off-by: Shubham Toshniwal <shtoshni@gmail.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Feng Chen <42473790+fchen97@users.noreply.github.com> Co-authored-by: Sanyam Kapoor <3909933+activatedgeek@users.noreply.github.com> Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com> Co-authored-by: Wei Du <wedu@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Actionable comments posted: 0
🧹 Nitpick comments (1)
docs/evaluation/code.md (1)
185-249: Fix fenced code block languages to satisfy markdownlint.
markdownlint (MD040) is flagging each of these LiveCodeBench snippets because the fences lack a language hint. Tagging them with the appropriate language (bash/json) will unblock the lint job and keep syntax highlighting consistent.
Apply this diff to cover all occurrences:
-```
+```bash
 ns prepare_data livecodebench --release_version v6 --start_date 2024-08 --end_date 2025-05
@@
-```
+```bash
 ns prepare_data livecodebench --release_version v6 --start_date 2024-08 --end_date 2025-05 --keep_all_columns --cluster=<CLUSTER_NAME> --data_dir=<DATA_DIR>
@@
-```
+```bash
 ns eval \
     --cluster=<CLUSTER_NAME> \
     --model=nvidia/OpenReasoning-Nemotron-32B \
@@
     ++inference.tokens_to_generate=65536
@@
-```
+```bash
 --extra_eval_args="++eval_config.interpreter=pypy3 ++eval_config.test_file=<DATA_DIR>/livecodebench/test_v6_2408_2505.jsonl"
@@
-```
+```json
 {
   "livecodebench": { "pass@1": { "accuracy": 71.14 ... }},
   "livecodebench-easy": { "pass@1": { "accuracy": 99.09 ... }},
   "livecodebench-hard": { "pass@1": { "accuracy": 46.30 ... }},
   "livecodebench-medium": { "pass@1": { "accuracy": 85.10 ... }}
 }
@@
-```
+```bash
 --benchmarks=livecodebench:3
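A quick local check can catch unlabeled fences before the lint job does. The sketch below is a hypothetical helper (`find_unlabeled_fences` is not part of markdownlint or NeMo-Skills) that approximates MD040 by reporting the line number of every opening fence without an info string. It deliberately ignores edge cases such as fences nested inside lists or block quotes.

```python
import re

# An opening or closing fence: up to 3 spaces of indent, then 3+ backticks or tildes.
FENCE_RE = re.compile(r"^(\s{0,3})(`{3,}|~{3,})(.*)$")


def find_unlabeled_fences(markdown_text):
    """Return 1-based line numbers of opening fences that have no info string."""
    unlabeled = []
    open_fence = None  # (fence character, fence length) while inside a block
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        match = FENCE_RE.match(line)
        if not match:
            continue
        marker = match.group(2)
        info = match.group(3).strip()
        if open_fence is None:
            # This line opens a block; flag it when no language is given.
            open_fence = (marker[0], len(marker))
            if not info:
                unlabeled.append(lineno)
        elif marker[0] == open_fence[0] and len(marker) >= open_fence[1] and not info:
            # A bare fence of the same kind and sufficient length closes the block.
            open_fence = None
    return unlabeled
```

Running it over the contents of docs/evaluation/code.md should list roughly the same lines markdownlint reports, assuming the file only uses top-level fences.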
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/evaluation/code.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/evaluation/code.md
185-185: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
192-192: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
204-204: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
226-226: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
234-234: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
247-247: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: unit-tests
- GitHub Check: pre-commit
Signed-off-by: Sanyam Kapoor <sanyamk@nvidia.com>
Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com>
Signed-off-by: Igor Gitman <igitman@nvidia.com>
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Actionable comments posted: 0
🧹 Nitpick comments (1)
docs/evaluation/code.md (1)
186-248: Add info strings to new code fences.
All newly added code blocks (commands, JSON output) lack language specifiers, which triggers MD040 and hurts syntax highlighting. Please annotate them with something like shell for CLI commands and json for metrics snippets.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
docs/evaluation/code.md (1 hunks)
nemo_skills/evaluation/evaluator/__init__.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- nemo_skills/evaluation/evaluator/__init__.py
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
4970d6e to 4042286
Actionable comments posted: 1
♻️ Duplicate comments (2)
nemo_skills/evaluation/evaluator/livecodebench.py (2)
40-71: Install packages in the same sandbox session used for evaluation.
install_packages() (lines 40-57) creates its own sandbox context, installs the LiveCodeBench library, then closes that sandbox. The actual evaluation at line 71 opens a new sandbox session. For isolated sandbox types (Docker, SSH, remote HTTP), the second sandbox starts clean and will not have access to the packages installed in the first session, causing the from livecodebench.evaluate import evaluate statement to fail at runtime.
Move the package installation inside the evaluation sandbox session:
-    if not await install_packages(eval_config):
-        return
-
     async with sandbox_context(eval_config.sandbox) as sandbox:
+        if not await install_packages(eval_config, sandbox):
+            return
+
         for jsonl_file in unroll_files(cfg.input_files):
And update install_packages to accept an active sandbox:
-async def install_packages(eval_config: LiveCodeBenchEvaluatorConfig) -> bool:
+async def install_packages(eval_config: LiveCodeBenchEvaluatorConfig, sandbox) -> bool:
     """
     Installs required packages in a temporary sandbox.
     Returns True on success, False on failure.
     """
-    async with sandbox_context(eval_config.sandbox) as sandbox:
-        LOG.info(f"Installing livecodebench with {eval_config.interpreter}...")
-        pip_cmd = "pip" if eval_config.interpreter == "python" else "pypy3 -m pip"
-        git_url = LIVECODEBENCH_PYTHON_GIT_URL if eval_config.interpreter == "python" else LIVECODEBENCH_PYPY3_GIT_URL
-        cmd = f"{pip_cmd} install {git_url}"
-
-        result, _ = await sandbox.execute_code(cmd, language="shell", timeout=300)
-        if result.get("process_status") != "completed":
-            LOG.warning(f"Failed to install livecodebench: {result.get('stderr', 'Unknown error')}")
-            return False
-
-        LOG.info("Successfully installed livecodebench.")
-        return True
+    LOG.info(f"Installing livecodebench with {eval_config.interpreter}...")
+    pip_cmd = "pip" if eval_config.interpreter == "python" else "pypy3 -m pip"
+    git_url = LIVECODEBENCH_PYTHON_GIT_URL if eval_config.interpreter == "python" else LIVECODEBENCH_PYPY3_GIT_URL
+    cmd = f"{pip_cmd} install {git_url}"
+
+    result, _ = await sandbox.execute_code(cmd, language="shell", timeout=300)
+    if result.get("process_status") != "completed":
+        LOG.warning(f"Failed to install livecodebench: {result.get('stderr', 'Unknown error')}")
+        return False
+
+    LOG.info("Successfully installed livecodebench.")
+    return True
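The concern can be reproduced without a real sandbox. The sketch below is a toy model, not the NeMo-Skills implementation: `FakeSandbox` and this zero-argument `sandbox_context` are hypothetical stand-ins that assume each context yields a fresh, isolated environment, which is the behaviour the comment attributes to Docker, SSH, and remote HTTP sandboxes. Only the `execute_code` call shape and the `process_status` key are taken from the diff above.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSandbox:
    """Hypothetical isolated sandbox: installed packages live only for one session."""

    def __init__(self):
        self.installed = set()

    async def execute_code(self, cmd, language="shell", timeout=300):
        if language == "shell" and cmd.startswith("pip install "):
            self.installed.add(cmd.split()[-1])
            return {"process_status": "completed"}, None
        # Any other command stands in for the evaluation script, which
        # needs livecodebench to be importable in this same session.
        status = "completed" if "livecodebench" in self.installed else "error"
        return {"process_status": status}, None


@asynccontextmanager
async def sandbox_context():
    # Assumption: every context entry starts from a clean environment.
    yield FakeSandbox()


async def install_packages(sandbox):
    result, _ = await sandbox.execute_code("pip install livecodebench")
    return result.get("process_status") == "completed"


async def evaluate_separate_sessions():
    """Install in one session, evaluate in another (the flagged pattern)."""
    async with sandbox_context() as sandbox:
        await install_packages(sandbox)
    async with sandbox_context() as sandbox:
        result, _ = await sandbox.execute_code("run evaluation", language="python")
    return result["process_status"]


async def evaluate_shared_session():
    """Install and evaluate inside the same session (the suggested pattern)."""
    async with sandbox_context() as sandbox:
        if not await install_packages(sandbox):
            return "error"
        result, _ = await sandbox.execute_code("run evaluation", language="python")
    return result["process_status"]
```

Whether the real sandbox behaves this way depends on its type: a sandbox that persists state across sessions would not show the failure, so the toy model only illustrates the isolated case.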
127-129: Avoid asyncio.run when an event loop may already be running.
asyncio.run() will raise RuntimeError: asyncio.run() cannot be called from a running event loop if the caller is async-aware (e.g., Jupyter notebooks, async CLI frameworks, Triton job managers). This breaks the function in those environments.
Replace the wrapper with a version that checks for an active loop:
 def eval_livecodebench(cfg):
-    """Synchronous wrapper to run the async evaluation."""
-    asyncio.run(eval_livecodebench_async(cfg))
+    """Run the async evaluation, reusing an existing loop when present."""
+    try:
+        loop = asyncio.get_running_loop()
+    except RuntimeError:
+        asyncio.run(eval_livecodebench_async(cfg))
+    else:
+        return loop.create_task(eval_livecodebench_async(cfg))
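The failure mode is easy to demonstrate in isolation. In the minimal sketch below, `evaluate_async` and `evaluate` are hypothetical stand-ins for eval_livecodebench_async and its synchronous wrapper. It shows that the wrapper works from plain synchronous code and raises RuntimeError when called while a loop is running. Note that the suggested replacement returns a task in the running-loop branch, so an async caller would still have to await it to observe completion.

```python
import asyncio


async def evaluate_async(cfg):
    """Hypothetical stand-in for the async evaluation entry point."""
    await asyncio.sleep(0)
    return f"evaluated {cfg}"


def evaluate(cfg):
    """Synchronous wrapper; only valid when no event loop runs in this thread."""
    coro = evaluate_async(cfg)
    try:
        return asyncio.run(coro)
    except RuntimeError:
        # Close the un-awaited coroutine to avoid a "never awaited" warning.
        coro.close()
        raise


async def sync_wrapper_fails_inside_loop():
    """Return True if calling the sync wrapper from async code raises."""
    try:
        evaluate("demo")
    except RuntimeError:
        return True
    return False


async def async_caller():
    # Async-aware callers can await the coroutine directly instead.
    return await evaluate_async("demo")
```

One design option this suggests is to expose both entry points and let async callers await the coroutine, rather than having one function change its return type depending on context.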
🧹 Nitpick comments (4)
nemo_skills/dataset/livecodebench/prepare.py (1)
154-157: Consider extracting the exception message.
The static analysis tool flags a long exception message inline. While this is a minor style issue, extracting it to a module-level constant improves maintainability.
Apply this diff:
+CUSTOM_SPLIT_ERROR_MSG = (
+    "If preparing a custom split, you must specify all "
+    "--release_version, --start_date, and --end_date arguments."
+)
+
 ...
 if args.release_version == "all" or args.start_date == "all" or args.end_date == "all":
-    raise ValueError(
-        "If preparing a custom split, you must specify all "
-        "--release_version, --start_date, and --end_date arguments."
-    )
+    raise ValueError(CUSTOM_SPLIT_ERROR_MSG)
nemo_skills/evaluation/evaluator/code.py (1)
152-163: Clarify the purpose and scope of eval_livecodebench_pro.
This function only transforms field names (task_id → problem_id, completion → text_response) and sets response_meta to None. It does not invoke any external evaluation library or compute correctness. The name eval_livecodebench_pro implies evaluation, but this is strictly a post-processing step.
Consider renaming to postprocess_livecodebench_pro or adding a docstring that clarifies this is a schema transformation, not an evaluation workflow:
 def eval_livecodebench_pro(cfg):
+    """Post-process LiveCodeBench-Pro samples: rename fields and add response_meta."""
     for jsonl_file in unroll_files(cfg.input_files):
Alternatively, if this function is intended to be called by a separate evaluation harness, document that expectation.
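To make the "schema transformation" reading concrete, here is a minimal sketch of the per-sample step as the comment describes it. The function names are hypothetical, and carrying any other keys over unchanged is an assumption, since the comment only mentions the two renames and the added response_meta field.

```python
import json


def postprocess_sample(sample):
    """Rename task_id/completion and add response_meta; no correctness check."""
    # Assumption: keys other than the two renamed ones pass through untouched.
    converted = {k: v for k, v in sample.items() if k not in ("task_id", "completion")}
    converted["problem_id"] = sample["task_id"]
    converted["text_response"] = sample["completion"]
    converted["response_meta"] = None
    return converted


def postprocess_jsonl_lines(lines):
    """Apply the transformation to an iterable of JSONL lines, skipping blanks."""
    return [json.dumps(postprocess_sample(json.loads(line))) for line in lines if line.strip()]
```

Nothing in this step executes the generated code, which is why a name or docstring that signals post-processing would be less misleading.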
docs/evaluation/code.md (1)
186-248: Add language specifiers to fenced code blocks.
The markdown linter flags that fenced code blocks at lines 186, 193, 205, 227, 235, and 248 are missing language specifiers. Adding bash or shell identifiers improves syntax highlighting and readability.
Apply this diff:
-```
+```bash
 ns prepare_data livecodebench --release_version v6 --start_date 2024-08 --end_date 2025-05
...
-```
+```bash
 ns prepare_data livecodebench --release_version v6 --start_date 2024-08 --end_date 2025-05 --keep_all_columns --cluster=<CLUSTER_NAME> --data_dir=<DATA_DIR>
...
-```
+```bash
 ns eval \
 ...
...
-```
+```bash
 --extra_eval_args="++eval_config.interpreter=pypy3 ++eval_config.test_file=<DATA_DIR>/livecodebench/test_v6_2408_2505.jsonl"
...
-```
+```json
 {
   "livecodebench": { "pass@1": { "accuracy": 71.14 ... }},
   ...
 }
...
-```
+```bash
 --benchmarks=livecodebench:3
nemo_skills/evaluation/evaluator/livecodebench.py (1)
64-80: Extract long exception messages to improve maintainability.
Static analysis flags lines 64, 66, and 80 for embedding long exception messages inline. Extracting them to module-level constants improves readability and maintainability.
Apply this diff:
+INVALID_PYTHON_INTERPRETER_MSG = "Python interpreter must be 'python' or 'pypy3'."
+CPP_REQUIRES_TEST_FILE_MSG = "C++ evaluation requires a test_file."
+MIXED_RELEASE_VERSIONS_MSG = "All samples should have the same release version. Found: {versions}"
+
 ...
 if eval_config.language == "python" and eval_config.interpreter not in ["python", "pypy3"]:
-    raise ValueError("Python interpreter must be 'python' or 'pypy3'.")
+    raise ValueError(INVALID_PYTHON_INTERPRETER_MSG)
 if eval_config.language == "cpp" and eval_config.test_file is None:
-    raise ValueError("C++ evaluation requires a test_file.")
+    raise ValueError(CPP_REQUIRES_TEST_FILE_MSG)
 ...
 versions = {s["release_version"] for s in samples}
 if len(versions) > 1:
-    raise ValueError(f"All samples should have the same release version. Found: {versions}")
+    raise ValueError(MIXED_RELEASE_VERSIONS_MSG.format(versions=versions))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
docs/evaluation/code.md (1 hunks)
nemo_skills/dataset/livecodebench/prepare.py (6 hunks)
nemo_skills/evaluation/evaluator/__init__.py (1 hunks)
nemo_skills/evaluation/evaluator/code.py (2 hunks)
nemo_skills/evaluation/evaluator/livecodebench.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- nemo_skills/evaluation/evaluator/__init__.py
🧰 Additional context used
🧬 Code graph analysis (2)
nemo_skills/evaluation/evaluator/code.py (2)
nemo_skills/file_utils.py (1)
unroll_files (21-32)
nemo_skills/utils.py (1)
get_logger_name (130-131)
nemo_skills/evaluation/evaluator/livecodebench.py (3)
nemo_skills/code_execution/sandbox.py (2)
get_sandbox (419-422)
close (77-79)
nemo_skills/evaluation/evaluator/code.py (1)
preprocess_code (36-92)
nemo_skills/utils.py (2)
get_logger_name (130-131)
nested_dataclass (49-82)
🪛 markdownlint-cli2 (0.18.1)
docs/evaluation/code.md
186-186: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
193-193: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
205-205: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
227-227: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
235-235: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
248-248: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.13.1)
nemo_skills/dataset/livecodebench/prepare.py
154-157: Avoid specifying long messages outside the exception class
(TRY003)
nemo_skills/evaluation/evaluator/livecodebench.py
64-64: Avoid specifying long messages outside the exception class
(TRY003)
66-66: Avoid specifying long messages outside the exception class
(TRY003)
80-80: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: unit-tests
🔇 Additional comments (4)
nemo_skills/evaluation/evaluator/code.py (1)
26-27: Import fix confirmed.
The import of unroll_files from nemo_skills.file_utils is now correct and resolves the ImportError flagged in the previous review.
nemo_skills/evaluation/evaluator/livecodebench.py (1)
103-103: LGTM: Safe command construction.
Using shlex.quote to escape the Python code passed to -c prevents shell injection and correctly handles special characters in the evaluation script.
nemo_skills/dataset/livecodebench/prepare.py (2)
122-125: No downstream JSONL readers detected; output format change is self-contained.
All JSONL handling lives in prepare.py; no consumer code parses these files expecting a fixed schema.
96-97: Cast columns only when they are present.
Lines 96-97 unconditionally cast public_test_cases and private_test_cases to large_string, but when keep_all_columns=False, these columns are removed at line 98. The datasets library will raise a KeyError when attempting to cast a column that has already been removed from the dataset.
Move the casting before the removal, or conditionally cast only when keep_all_columns=True:
+    if keep_all_columns:
+        dataset = dataset.cast_column("public_test_cases", Value("large_string"))
+        dataset = dataset.cast_column("private_test_cases", Value("large_string"))
     remove_columns = []
     if not keep_all_columns:
         remove_columns = [
             "question_title",
             "contest_id",
             "metadata",
             "question_content",
             "platform",
             "question_id",
             "starter_code",
             "public_test_cases",
             "private_test_cases",
         ]
-    dataset = dataset.cast_column("public_test_cases", Value("large_string"))
-    dataset = dataset.cast_column("private_test_cases", Value("large_string"))
     dataset = dataset.map(map_fn, remove_columns=remove_columns)
Likely an incorrect or invalid review comment.
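The shlex.quote point noted for livecodebench.py line 103 can be illustrated with a small sketch. The helper name `build_interpreter_command` is hypothetical and this is not the code under review; it only shows why quoting the script passed to -c keeps it a single shell argument regardless of the quotes, spaces, or `$` it contains.

```python
import shlex


def build_interpreter_command(code, interpreter="python"):
    """Build a shell command that runs `code` via `<interpreter> -c`."""
    # shlex.quote wraps the script so the shell passes it through verbatim
    # as one argument instead of interpreting its special characters.
    return f"{interpreter} -c {shlex.quote(code)}"


def roundtrip_arguments(command):
    """Split the command the way a POSIX shell would, for inspection."""
    return shlex.split(command)
```

Splitting the built command back with shlex.split should yield exactly three arguments: the interpreter, -c, and the original script text.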
Actionable comments posted: 5
🧹 Nitpick comments (2)
docs/evaluation/code.md (2)
223-229: Replace PyPy3 snippet with a runnable command and use bash fence.
A full command avoids ambiguity and is immediately executable.
-##### Pypy3 Evaluation
+##### PyPy3 Evaluation
@@
-To run with the Pypy3 interpreter, modify the `--extra_eval_args` flag as shown below.
-```
---extra_eval_args="++eval_config.interpreter=pypy3 ++eval_config.test_file=<DATA_DIR>/livecodebench/test_v6_2408_2505.jsonl"
-```
+To run with the PyPy3 interpreter, use:
+```bash
+ns eval \
+    --cluster=<CLUSTER_NAME> \
+    --model=nvidia/OpenReasoning-Nemotron-32B \
+    --server_type=vllm \
+    --server_args="--async-scheduling" \
+    --server_nodes=1 \
+    --server_gpus=8 \
+    --benchmarks=livecodebench \
+    --split=test_v6_2408_2505 \
+    --data_dir=<DATA_DIR> \
+    --output_dir=<OUTPUT_DIR> \
+    --extra_eval_args="++eval_config.interpreter=pypy3 ++eval_config.test_file=<DATA_DIR>/livecodebench/test_v6_2408_2505.jsonl" \
+    --with_sandbox \
+    ++inference.temperature=0.6 \
+    ++inference.top_p=0.95 \
+    ++inference.tokens_to_generate=65536
+```
196-221: Optional: add a local (non-Slurm) example.
Many users run locally; adding a minimal local command improves usability.
Tip: For local runs, omit --cluster and use a smaller --server_gpus (e.g., 1). Example:
```bash
ns eval \
    --model=nvidia/OpenReasoning-Nemotron-32B \
    --server_type=vllm \
    --benchmarks=livecodebench \
    --split=test_v6_2408_2505 \
    --data_dir=<DATA_DIR> \
    --output_dir=<OUTPUT_DIR> \
    --extra_eval_args="++eval_config.interpreter=python" \
    --with_sandbox
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 404228692685c29e5298af14b5707a0cbd0df366 and 27fde4f274ab1f6f5104a59a8a6e53bb4ed26164.
📒 Files selected for processing (1)
docs/evaluation/code.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/evaluation/code.md
185-185: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
192-192: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
204-204: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
226-226: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
234-234: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
247-247: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: unit-tests
🔇 Additional comments (1)
docs/evaluation/code.md (1)
205-221: Confirm vLLM flag support and test file path
- `--async-scheduling` is forwarded verbatim by the CLI; manually verify your vLLM server image supports this flag.
- The evaluator config keys `interpreter` (used for Python vs. PyPy) and `test_file` (required for C++ runs) match those in `livecodebench.py`.
- Ensure `<DATA_DIR>/livecodebench/test_v6_2408_2505.jsonl` (i.e. `{data_dir}/livecodebench/{split}.jsonl`) aligns with the output of `ns prepare_data --keep_all_columns`.
Kipok left a comment
thanks @wasiahmad! Just a small comment, but otherwise looks good
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Actionable comments posted: 1
♻️ Duplicate comments (6)
docs/evaluation/code.md (6)
185-187: Add bash fence to the data prep snippet.
Without a language tag markdownlint (MD040) keeps failing; please mark this command as bash.
-```
+```bash
 ns prepare_data livecodebench --release_version v6 --start_date 2024-08 --end_date 2025-05
189-194: Fix PyPy3 casing and add bash fence.
Line 189 should read “PyPy3”, and the shell example needs a bash fence to satisfy MD040.
-##### For Pypy3 Evaluation:
-If you plan to evaluate using the Pypy3 interpreter, you must add the `--keep_all_columns` flag during data preparation. This will download a larger dataset (~1.9GB) containing the necessary test cases. So, we recommend downloading the dataset into a Slurm cluster location.
-```
+##### For PyPy3 Evaluation:
+If you plan to evaluate using the PyPy3 interpreter, you must add the `--keep_all_columns` flag during data preparation. This will download a larger dataset (~1.9GB) containing the necessary test cases. So, we recommend downloading the dataset into a Slurm cluster location.
+```bash
 ns prepare_data livecodebench --release_version v6 --start_date 2024-08 --end_date 2025-05 --keep_all_columns --cluster=<CLUSTER_NAME> --data_dir=<DATA_DIR>
204-221: Mark the evaluation command as bash.
The primary eval command also needs a bash fence for linting and readability.
-```
+```bash
 ns eval \
     --cluster=<CLUSTER_NAME> \
     --model=nvidia/OpenReasoning-Nemotron-32B \
@@
     ++inference.top_p=0.95 \
     ++inference.tokens_to_generate=65536
227-228: Add bash fence to the PyPy3 override snippet.
This single-line CLI flag should be fenced as bash to clear MD040.
-```
+```bash
 --extra_eval_args="++eval_config.interpreter=pypy3 ++eval_config.test_file=<DATA_DIR>/livecodebench/test_v6_2408_2505.jsonl"
234-253: Tag the metrics block as text (lint fix).
The metrics excerpt needs a language on the fence; text works well here.
-```
+```text
 -------------------------- livecodebench --------------------------
 evaluation_mode | num_entries | avg_tokens | gen_seconds | accuracy
 pass@1 | 454 | 15995 | 2188 | 71.15%
@@
 ----------------------- livecodebench-medium ----------------------
 evaluation_mode | num_entries | avg_tokens | gen_seconds | accuracy
 pass@1 | 141 | 14178 | 1889 | 85.11%
259-261: Add bash fence to the repeats snippet.
Marking this CLI flag as bash resolves MD040 and improves copy/paste.
-```
+```bash
 --benchmarks=livecodebench:3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/evaluation/code.md (1 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.18.1)
docs/evaluation/code.md
185-185: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
192-192: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
204-204: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
226-226: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
234-234: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
259-259: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pre-commit
- GitHub Check: unit-tests
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Actionable comments posted: 2
♻️ Duplicate comments (4)
docs/evaluation/code.md (4)
185-187: Add a bash fence to the prepare command.
Apply the markdownlint fix so the snippet lint-checks cleanly and gets syntax highlighting.
-```
+```bash
 ns prepare_data livecodebench --release_version v6 --start_date 2024-08 --end_date 2025-05
-```
+```
189-194: Fix PyPy3 casing and tag the command as bash.
Consistent branding and a language-tagged fence keep the docs polished.
-##### For Pypy3 Evaluation:
+##### For PyPy3 Evaluation:
@@
-```
+```bash
 ns prepare_data livecodebench --release_version v6 --start_date 2024-08 --end_date 2025-05 --keep_all_columns --cluster=<CLUSTER_NAME> --data_dir=<DATA_DIR>
-```
+```
205-221: Annotate the full eval command as bash.
Needed for markdownlint MD040 and better UX.
-```
+```bash
 ns eval \
     --cluster=<CLUSTER_NAME> \
@@
     ++inference.top_p=0.95 \
     ++inference.tokens_to_generate=65536
-```
+```
260-261: Add a bash fence to the repeat-run flag.
Keeps lint happy and clarifies that this is a CLI flag.
-```
+```bash
 --benchmarks=livecodebench:3
-```
+```
    --extra_eval_args="++eval_config.interpreter=pypy3 ++eval_config.test_file=<DATA_DIR>/livecodebench/test_v6_2408_2505.jsonl"
    ```
Tag the interpreter override snippet as bash.
Prevents MD040 lint failures.
-```
+```bash
--extra_eval_args="++eval_config.interpreter=pypy3 ++eval_config.test_file=<DATA_DIR>/livecodebench/test_v6_2408_2505.jsonl"
-```
+```
🤖 Prompt for AI Agents
In docs/evaluation/code.md around lines 227-228, the code fence containing the
--extra_eval_args snippet is not tagged with a language which triggers MD040;
update the opening fence to ```bash so the snippet is explicitly labeled as bash
and ensure the closing ``` fence remains present and correctly placed.
    -------------------------- livecodebench --------------------------
    evaluation_mode | num_entries | avg_tokens | gen_seconds | accuracy
    pass@1          | 454         | 15995      | 2188        | 71.15%

    ------------------------ livecodebench-easy -----------------------
    evaluation_mode | num_entries | avg_tokens | gen_seconds | accuracy
    pass@1          | 110         | 5338       | 1806        | 99.09%

    ------------------------ livecodebench-hard -----------------------
    evaluation_mode | num_entries | avg_tokens | gen_seconds | accuracy
    pass@1          | 203         | 23031      | 2188        | 46.31%

    ----------------------- livecodebench-medium ----------------------
    evaluation_mode | num_entries | avg_tokens | gen_seconds | accuracy
    pass@1          | 141         | 14178      | 1889        | 85.11%
    ```
Declare the metrics block as plain text.
The ASCII table isn’t JSON; marking it as text satisfies MD040 and keeps formatting intact.
-```
+```text
-------------------------- livecodebench --------------------------
evaluation_mode | num_entries | avg_tokens | gen_seconds | accuracy
@@
pass@1 | 141 | 14178 | 1889 | 85.11%
-```
+```
🤖 Prompt for AI Agents
In docs/evaluation/code.md around lines 235 to 253, the ASCII metrics table is
currently in a fenced code block without a language, triggering MD040; change
the fence to declare the block as plain text by adding "text" after the opening
triple backticks (i.e., use ```text) and keep the closing triple backticks
unchanged so the table renders as plain text and preserves formatting.
Signed-off-by: Feng Chen <42473790+fchen97@users.noreply.github.com> Signed-off-by: Evelina <ebakhturina@nvidia.com> Signed-off-by: Sadegh Mahdavi <smahdavi@nvidia.com> Signed-off-by: Igor Gitman <igitman@nvidia.com> Signed-off-by: Shubham Toshniwal <stoshniwal@nvidia.com> Signed-off-by: fayejf <fayejf07@gmail.com> Signed-off-by: fayejf <36722593+fayejf@users.noreply.github.com> Signed-off-by: Sanyam Kapoor <sanyamk@nvidia.com> Signed-off-by: Igor Gitman <igor.a.gitman@gmail.com> Signed-off-by: Wei Du <wedu@nvidia.com> Signed-off-by: Shubham Toshniwal <shtoshni@gmail.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: Hovhannes Tamoyan <htamoyan@htamoyan-mlt.client.nvidia.com> Signed-off-by: tamohannes <hovhannes.tamoyan@gmail.com> Signed-off-by: Sadegh Mahdavi <smahdavi4@gmail.com> Signed-off-by: Michal Bien <mbien@nvidia.com> Signed-off-by: jubick1337 <mattyson.so@gmail.com> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Signed-off-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Signed-off-by: i-vainn <1vanmoshkov@mail.ru> Signed-off-by: fzyzcjy <ch271828n@outlook.com> Signed-off-by: David Mosallanezhad <dmosallanezh@dmosallanezh-mlt.client.nvidia.com> Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: Makesh Sreedhar <makeshn@nvidia.com> Signed-off-by: alessiodevoto <devoto.alessio@gmail.com> Signed-off-by: Matvei Novikov <mattyson.so@gmail.com> Signed-off-by: Shuoyang Ding <shuoyangd@nvidia.com> Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: Stephen Ge <stepheng@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: darraghdog <dhanley@nvidia.com> Signed-off-by: Valentin Mendelev <vmendelev@nvidia.com> Signed-off-by: Igor Gitman <igitman@cs-oci-ord-login-01.cm.cluster> Signed-off-by: Adam Rajfer <arajfer@nvidia.com> Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> 
Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com> Signed-off-by: Lizzie Wei <lizziew@nvidia.com> Co-authored-by: Feng Chen <42473790+fchen97@users.noreply.github.com> Co-authored-by: Igor Gitman <igitman@nvidia.com> Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Co-authored-by: Sadegh Mahdavi <smahdavi4@gmail.com> Co-authored-by: Shubham Toshniwal <shtoshni@gmail.com> Co-authored-by: fayejf <36722593+fayejf@users.noreply.github.com> Co-authored-by: Sanyam Kapoor <3909933+activatedgeek@users.noreply.github.com> Co-authored-by: Igor Gitman <igor.a.gitman@gmail.com> Co-authored-by: Wei Du <wedu@nvidia.com> Co-authored-by: Nick Ludwig <nliudvig@nvidia.com> Co-authored-by: Sean Naren <snarenthiran@nvidia.com> Co-authored-by: Hovhannes Tamoyan <hovhannes.tamoyan@gmail.com> Co-authored-by: Hovhannes Tamoyan <htamoyan@htamoyan-mlt.client.nvidia.com> Co-authored-by: Michał Bień <michal@mbien.pl> Co-authored-by: Matvei Novikov <mattyson.so@gmail.com> Co-authored-by: Cheng-Ping Hsieh <37269846+hsiehjackson@users.noreply.github.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Aleksander Ficek <37374704+aleksficek@users.noreply.github.com> Co-authored-by: i-vainn <1vanmoshkov@mail.ru> Co-authored-by: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Co-authored-by: Shubham Toshniwal <stoshniwal@nvidia.com> Co-authored-by: David <dmosallanezh@nvidia.com> Co-authored-by: David Mosallanezhad <dmosallanezh@dmosallanezh-mlt.client.nvidia.com> Co-authored-by: Daria Gitman <dgitman@nvidia.com> Co-authored-by: Darragh Hanley <darraghdog@users.noreply.github.com> Co-authored-by: Wei Du <wedu@wedu-mlt.client.nvidia.com> Co-authored-by: Shantanu Acharya <shantanua@nvidia.com> Co-authored-by: Shantanu Acharya <shan.sacharya@gmail.com> Co-authored-by: smajumdar <titu1994@gmail.com> Co-authored-by: Ivan <42346810+i-vainn@users.noreply.github.com> Co-authored-by: makeshn <makesh.24@gmail.com> Co-authored-by: Sid Jain <tmfs10@gmail.com> 
Co-authored-by: i-vainn <imoshkov@nvidia.com> Co-authored-by: Alessio Devoto <50107094+alessiodevoto@users.noreply.github.com> Co-authored-by: Xin Yu <60579067+xinyu-dev@users.noreply.github.com> Co-authored-by: Xin Yu <mightycamole@Tumole-Macbook-2024.local> Co-authored-by: Hemil Desai <hemil.desai10@gmail.com> Co-authored-by: Feng Chen <fengchen@nvidia.com> Co-authored-by: Nick Ludwig <nick.ludwig.g@gmail.com> Co-authored-by: shuoyangd <shuoyangd@users.noreply.github.com> Co-authored-by: Stephen Ge <stephen.ge@gmail.com> Co-authored-by: vmendelev <vmendelev@nvidia.com> Co-authored-by: Avinash Vem <avem@nvidia.com> Co-authored-by: Igor Gitman <igitman@cs-oci-ord-login-01.cm.cluster> Co-authored-by: Adam Rajfer <arajfer@nvidia.com> Co-authored-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: Jiacheng Xu <jcxu@utexas.edu> Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Vladimir Bataev <artbataev@gmail.com> Co-authored-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com> Co-authored-by: Vladimir Bataev <vbataev@nvidia.com> Co-authored-by: Lizzie Wei <elizabeth.m.wei@gmail.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>
This PR adds LiveCodeBench score calculation to nemo-skills. All supporting scripts are included at nemo-skills/evaluation/evaluator/livecodebench, and the evaluation function at nemo_skills/evaluation/evaluator/livecodebench.py has been updated.
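For readers unfamiliar with this kind of evaluation, the idea can be sketched roughly as follows. This is an illustrative sketch only, not the code added in this PR: the names `run_solution`, `passes_all_tests`, and `score`, the sample layout, and the rule that a solution counts as correct only when every test matches after whitespace stripping are all assumptions made for the example.

```python
import os
import subprocess
import sys
import tempfile
from typing import List, Optional, Tuple


def run_solution(code: str, stdin_text: str,
                 interpreter: str = sys.executable,
                 timeout: float = 5.0) -> Optional[str]:
    """Run `code` as a script under `interpreter`; return stdout, or None on error/timeout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run([interpreter, path], input=stdin_text,
                              capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    finally:
        os.unlink(path)  # always clean up the temporary script
    return proc.stdout if proc.returncode == 0 else None


def passes_all_tests(code: str, tests: List[Tuple[str, str]],
                     interpreter: str = sys.executable) -> bool:
    """Hypothetical rule: correct only if every (stdin, expected stdout) pair matches."""
    for stdin_text, expected in tests:
        out = run_solution(code, stdin_text, interpreter)
        if out is None or out.strip() != expected.strip():
            return False
    return True


def score(samples: List[dict], interpreter: str = sys.executable) -> float:
    """Percentage of samples whose code passes all of its tests."""
    if not samples:
        return 0.0
    solved = sum(passes_all_tests(s["code"], s["tests"], interpreter) for s in samples)
    return 100.0 * solved / len(samples)
```

Passing `interpreter="pypy3"` would run the same samples under PyPy3, assuming a `pypy3` binary is available on `PATH`; the default uses the interpreter that is running the script.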
Summary by CodeRabbit
New Features
Documentation