Evaluation on LiveBench-Coding#821

Merged
Kipok merged 8 commits into main from feat/livebench-coding
Sep 18, 2025

@wasiahmad
Collaborator

This PR adds evaluation support for LiveBench-Coding to nemo-skills. The primary changes are:

  • Dataset download through nemo-skills/dataset/livebench-coding
  • Evaluation logic implemented at nemo_skills/evaluation/evaluator/code.py

@wasiahmad wasiahmad requested a review from Kipok September 18, 2025 08:08
@wasiahmad
Collaborator Author

wasiahmad commented Sep 18, 2025

Evaluation results for Qwen2.5-Coder-32B-Instruct. For reference, the paper (Table 2, page 18) reports 56.8 pass@1.

Our evaluation gives the following (greedy decoding, but still run 10 times):

"pass@1[avg-of-10]": {
    "num_entries": 128,
    "avg_tokens": 282,
    "gen_seconds": 897,
    "accuracy": 54.843750000000014,
    "accuracy_statistics": {
        "avg": 0.5484375,
        "std_dev_across_runs": 0.009603692143013424,
        "avg_sample_std_dev": 0.01391647534032558,
        "std_err_across_runs": 0.003036954111898594
    }
}
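
As a quick sanity check on the numbers above, the sketch below recomputes the standard error from the reported standard deviation and measures the gap to the paper's figure. It assumes `std_err_across_runs` is the usual standard error of the mean (standard deviation across runs divided by the square root of the number of runs). That assumption is consistent with the reported values but is not confirmed by the PR. The variable names are illustrative and not part of nemo-skills.

```python
import math

# Values copied from the reported metrics above.
num_runs = 10  # from "avg-of-10"
avg = 0.5484375
std_dev_across_runs = 0.009603692143013424
reported_std_err = 0.003036954111898594
paper_pass_at_1 = 56.8  # figure quoted from the paper (Table 2, page 18)

# Assumed formula: standard error of the mean = std dev / sqrt(number of runs).
std_err = std_dev_across_runs / math.sqrt(num_runs)

# Accuracy in percent, and the gap to the paper's number in percentage points.
accuracy_pct = avg * 100
gap = paper_pass_at_1 - accuracy_pct

print(f"recomputed std_err: {std_err:.6f} (reported {reported_std_err:.6f})")
print(f"accuracy: {accuracy_pct:.2f}%, gap to paper: {gap:.2f} points")
```

Under that assumption the recomputed standard error agrees with the reported one, and the measured accuracy sits roughly 2 points below the paper's 56.8.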

@wasiahmad wasiahmad marked this pull request as draft September 18, 2025 08:37
@wasiahmad wasiahmad marked this pull request as ready for review September 18, 2025 09:01
Collaborator

@Kipok Kipok left a comment

Please update the docs. Also, Wasi, please start using -s for your commits so that the DCO check passes. At some point we will enforce it and stop merging any PR that fails it.

Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
@Kipok Kipok merged commit 882675a into main Sep 18, 2025
4 of 5 checks passed
@Kipok Kipok deleted the feat/livebench-coding branch September 18, 2025 19:28
fayejf pushed a commit that referenced this pull request Sep 19, 2025
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: fayejf <fayejf07@gmail.com>
jiacheng-xu pushed a commit that referenced this pull request Sep 19, 2025
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
wasiahmad added a commit that referenced this pull request Oct 1, 2025
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
hsiehjackson pushed a commit that referenced this pull request Oct 8, 2025
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
hsiehjackson pushed a commit that referenced this pull request Dec 5, 2025
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
hsiehjackson pushed a commit that referenced this pull request Jan 13, 2026
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: wasiahmad <wasiahmad@ucla.edu>
Signed-off-by: dgitman <dgitman@nvidia.com>