Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
📝 Walkthrough
The PR introduces LiveCodeBench-Pro evaluation support by adding configuration constants, dataset preparation utilities for downloading and processing remote test cases, and a new evaluator function with automatic dependency installation and sample processing.
Sequence Diagram
sequenceDiagram
participant User
participant Evaluator as eval_livecodebench_pro()
participant HF as Hugging Face Hub
participant LiveCodeBench as livecodebench lib
participant Sandbox as Local Sandbox
User->>Evaluator: Call with config (test_dir, language=cpp)
Evaluator->>Evaluator: Import livecodebench
alt Package not found
Evaluator->>HF: Install from Git URL
HF-->>Evaluator: Installation complete
end
Evaluator->>Evaluator: Read samples from JSONL
Evaluator->>Evaluator: Preprocess samples (strip_whitespace=True)
Evaluator->>Evaluator: Add code_list field per sample
Evaluator->>LiveCodeBench: Call evaluate() with<br/>language, test_file, timeout
LiveCodeBench->>Sandbox: Execute test cases<br/>(num_processes=12)
Sandbox-->>LiveCodeBench: Test results
LiveCodeBench-->>Evaluator: Evaluation results file
Evaluator->>Evaluator: Load results & attach<br/>graded_list per sample
Evaluator->>Evaluator: Rewrite JSONL with results
Evaluator-->>User: Return evaluated samples
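The "import, else install" branch in the diagram can be sketched as follows. This is a minimal illustration, not the PR's code: `import_or_install`, `pip_install`, and the injectable `installer` parameter are hypothetical names, and the requirement string passed in (for example a Git URL) is left to the caller.

```python
import importlib
import subprocess
import sys
from types import ModuleType
from typing import Callable


def pip_install(requirement: str) -> None:
    """Install a requirement into the environment of the running interpreter."""
    subprocess.run([sys.executable, "-m", "pip", "install", requirement], check=True)


def import_or_install(
    module_name: str,
    requirement: str,
    installer: Callable[[str], None] = pip_install,
) -> ModuleType:
    """Import module_name; if it is missing, install requirement and retry once."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        installer(requirement)
        # Make sure the import system notices the newly installed package.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
```

Passing the installer as a parameter keeps the fallback testable without touching the network; if the second import still fails, the `ImportError` propagates to the caller.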
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🧹 Nitpick comments (1)
nemo_skills/evaluation/evaluator/code.py (1)
239-246: Consider using str | None type annotations for optional fields.
The test_file and test_dir fields default to None but are typed as str. For consistency with BaseEvaluatorConfig (which uses str | None), consider updating the type hints.
 @nested_dataclass(kw_only=True)
 class LiveCodeBenchProEvaluatorConfig(BaseEvaluatorConfig):
     sandbox: dict = field(default_factory=lambda: {"sandbox_type": "local"})
     language: str = "cpp"  # use either "python" or "cpp"
-    test_file: str = None
-    test_dir: str = None  # path to the unit tests directory
+    test_file: str | None = None
+    test_dir: str | None = None  # path to the unit tests directory
     timeout: int = 6
     num_processes: int = 12
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
nemo_skills/dataset/livecodebench-pro/__init__.py (1 hunks)
nemo_skills/dataset/livecodebench-pro/prepare.py (1 hunks)
nemo_skills/evaluation/evaluator/code.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
nemo_skills/evaluation/evaluator/code.py (3)
nemo_skills/utils.py (1): nested_dataclass (69-102)
nemo_skills/evaluation/evaluator/base.py (1): BaseEvaluatorConfig (27-31)
nemo_skills/evaluation/evaluator/__init__.py (1): evaluate (117-131)
🪛 Ruff (0.14.8)
nemo_skills/dataset/livecodebench-pro/prepare.py
69-69: Do not catch blind exception: Exception (BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: pre-commit
- GitHub Check: unit-tests
🔇 Additional comments (7)
nemo_skills/dataset/livecodebench-pro/__init__.py (1)
18-19: LGTM! The configuration constants are consistent with the rest of the PR: EVAL_SPLIT = "test_25q2" matches the 25q2 split defined in prepare.py, and cpp_codegen aligns with the default language: str = "cpp" in LiveCodeBenchProEvaluatorConfig.
nemo_skills/dataset/livecodebench-pro/prepare.py (3)
22-29: LGTM! The repository constants and split definitions are clear and well-structured. The tuple format (tag, split_name, expected_count) provides useful validation data.
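As an illustration of how a (tag, split_name, expected_count) tuple can serve as validation data, here is a hedged sketch. The function `check_split_counts` and the shape of `loaded` (a mapping from split name to a list of records) are assumptions for this example; the tag and count in the usage note are placeholders, not values from the PR.

```python
def check_split_counts(
    splits: list[tuple[str, str, int]],
    loaded: dict[str, list],
) -> list[str]:
    """Return one human-readable message per split whose size differs from expected."""
    mismatches = []
    for tag, split_name, expected_count in splits:
        actual = len(loaded.get(split_name, []))
        if actual != expected_count:
            mismatches.append(
                f"{split_name} ({tag}): expected {expected_count} records, got {actual}"
            )
    return mismatches
```

For example, `check_split_counts([("placeholder_tag", "test_25q2", 3)], {"test_25q2": [{}, {}]})` reports a single mismatch, which a preparation script could log or turn into an error.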
32-42: LGTM! The error handling appropriately logs the failure and re-raises to propagate the error.
73-83: LGTM! The main block properly validates the HF_TOKEN environment variable and orchestrates the two-step workflow.
nemo_skills/evaluation/evaluator/code.py (3)
125-134: LGTM! Good defensive change to handle the edge case where the generation contains a </think> closing tag without the opening tag.
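The edge case can be sketched like this. It is an assumption-laden illustration rather than the PR's implementation: `strip_reasoning` is a hypothetical name, and the choice to leave text without any closing tag unchanged is made only for this example.

```python
def strip_reasoning(generation: str, close_tag: str = "</think>") -> str:
    """Drop everything up to and including the last closing reasoning tag.

    Works whether or not a matching opening tag is present, since only the
    closing tag is used as the delimiter.
    """
    if close_tag in generation:
        # rsplit keeps only the text after the final closing tag.
        return generation.rsplit(close_tag, 1)[-1].strip()
    return generation
```

Keying on the closing tag alone means a generation whose opening tag was consumed by the prompt template is still handled the same way as a fully tagged one.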
292-293: LGTM! The pattern of moving the eval results file to prevent recomputation is consistent with other evaluators in this file.
285-290: Verify question_id field exists in the LiveCodeBench-Pro HuggingFace dataset.
The code at line 289 accesses sample["question_id"] to look up evaluation results in the grades dictionary. While the prepare.py script preserves all fields from the source dataset via output_record = dict(row), the presence of question_id in the original HuggingFace repository (QAQAQAQAQ/LiveCodeBench-Pro) should be explicitly confirmed in documentation or code comments to ensure the data pipeline is robust.
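One way to make that dependency explicit is to validate the key before the lookup. The sketch below is hypothetical: `attach_grades` is not a function from the PR, and it assumes `grades` maps each question_id to a dict containing a "graded_list" entry.

```python
def attach_grades(samples: list[dict], grades: dict[str, dict]) -> list[dict]:
    """Attach graded_list to each sample, failing early with a clear message."""
    missing_key = [i for i, s in enumerate(samples) if "question_id" not in s]
    if missing_key:
        raise KeyError(f"samples at indices {missing_key} have no 'question_id' field")

    ungraded = [s["question_id"] for s in samples if s["question_id"] not in grades]
    if ungraded:
        raise KeyError(f"no evaluation results for question ids {ungraded}")

    for sample in samples:
        sample["graded_list"] = grades[sample["question_id"]]["graded_list"]
    return samples
```

Checking all samples up front reports every offending record at once, rather than stopping at the first bare `KeyError` partway through the rewrite of the JSONL file.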
@gwarmstrong this PR is ready to be merged. I verified it by evaluating the Qwen3-30B-A3B-Thinking-2507 and Qwen3-235B-A22B-Thinking-2507 models on the LCB-Pro dataset. The results align with our expectations.
Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: i-vainn <imoshkov@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
…ize_robustness generic for more benchmarks, update docstrings. (#1079) Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: bzantium <ryumin93@gmail.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: Stephen Ge <stepheng@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: Jiacheng Xu <jiachengx@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Jiacheng Xu <jiachengx@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: i-vainn <imoshkov@nvidia.com> Signed-off-by: Grigor Nalbandyan <gnalbandyan@nvidia.com> Co-authored-by: Nick Ludwig <nliudvig@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Ivan <imoshkov@nvidia.com> Co-authored-by: Wojciech Prazuch <wojciechprazuch3@gmail.com> Co-authored-by: gnalbandyan <153070076+gnalbandyan@users.noreply.github.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: Sanyam Kapoor <sanyamk@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: George Armstrong <georgea@nvidia.com> Signed-off-by: Mateusz Winiarek <mwiniarek@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
…ontainers (#1116) Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: George Zelenfroind <gzelenfroind@nvidia.com> Signed-off-by: George <37293288+Jorjeous@users.noreply.github.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com> Co-authored-by: Mehrzad Samadi <mehrzadsamadi@gmail.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com> Signed-off-by: George Armstrong <georgea@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: Wei Du <wedu@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
…#1129) Signed-off-by: Stephen Ge <stepheng@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: Arkadiusz Nowaczynski <anowaczynski@nvidia.com> Co-authored-by: George Armstrong <georgea@nvidia.com> Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
6a0260d to 2ff5b51
24c9fa3 to 18e7043
Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: dlord <dlord@nvidia.com>
Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
Signed-off-by: wasiahmad <wasiahmad@ucla.edu> Signed-off-by: dgitman <dgitman@nvidia.com>
Dataset: https://huggingface.co/datasets/QAQAQAQAQ/LiveCodeBench-Pro
Test cases: https://huggingface.co/datasets/QAQAQAQAQ/LiveCodeBench-Pro-Testcase