Fix missing evaluator for SWE-bench by ludwig-n · Pull Request #874 · NVIDIA-NeMo/Skills

ludwig-n · 2025-10-01T11:35:43Z

Fixes an issue introduced by #825 where this line was failing because SweBenchGenerationTask has no evaluator attribute.

Edit: the coderabbit summary is nonsense, I'm just overriding the existing apply_evaluation_hook function to be a no-op for SWE-bench because we don't use an evaluator here (yet).

Summary by CodeRabbit

Refactor
- Introduced an asynchronous evaluation hook to the evaluation workflow, enabling a standardized post-generation processing step.
- Current behavior remains unchanged; data passes through without modification.
Chores
- Prepared the evaluation pipeline for future extensibility with a non-disruptive hook point.

Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>

coderabbitai · 2025-10-01T11:35:51Z

Walkthrough

Introduces an asynchronous hook method apply_evaluation_hook(self, data_point) to SweBenchGenerationTask in nemo_skills/inference/eval/swebench.py. The method currently returns the input data_point unchanged, with a comment indicating evaluation is performed directly after generation.

Changes

Cohort / File(s)	Summary
Evaluation Hook Addition `nemo_skills/inference/eval/swebench.py`	Added async method apply_evaluation_hook(self, data_point) that returns data_point unchanged; comment notes evaluation occurs immediately after generation.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant U as Caller
    participant T as SweBenchGenerationTask
    rect rgb(240,248,255)
    note over T: Generation phase
    U->>T: generate(data_point)
    T-->>U: generated_output
    end
    rect rgb(245,255,240)
    note over T: New hook (post-generation)
    U->>T: apply_evaluation_hook(data_point)
    T-->>U: data_point (unchanged)
    end
    rect rgb(255,250,240)
    note over T: Evaluation phase
    U->>T: evaluate(generated_output, data_point)
    T-->>U: evaluation_result
    end

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

I twitch my ears at hooks so new,
A nibble of code, no change in view—
Pass-through carrot, crisp and plain,
Post-gen hop, then tests remain.
In fields of eval, I bound with cheer,
Small tweak today—more growth next year! 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title precisely identifies the main change of restoring the missing evaluator for the SWE-bench task, directly reflecting the pull request’s objective to fix the evaluator attribute error in SweBenchGenerationTask.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✨ Finishing touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch ludwig-n/swe-bench-fix-evaluator

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Kipok

@ludwig-n would be great to add slurm tests for swe-bench, so that we can catch issues like this in the future automatically

Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: dgitman <dgitman@nvidia.com>

Make apply_evaluation_hook a no-op for SWE-bench

8447e4d

Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>

Kipok approved these changes Oct 1, 2025

View reviewed changes

Kipok merged commit b01671c into main Oct 1, 2025
6 checks passed

Kipok deleted the ludwig-n/swe-bench-fix-evaluator branch October 1, 2025 20:28

SeanNaren pushed a commit to SeanNaren/NeMo-Skills that referenced this pull request Oct 9, 2025

Fix missing evaluator for SWE-bench (NVIDIA-NeMo#874)

1b8577d

Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

SeanNaren pushed a commit that referenced this pull request Oct 9, 2025

Fix missing evaluator for SWE-bench (#874)

6d19fb5

Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: SeanNaren <snarenthiran@nvidia.com>

coderabbitai bot mentioned this pull request Dec 23, 2025

Evaluation support for SWE-rebench #1102

Merged

dgtm777 pushed a commit that referenced this pull request Mar 18, 2026

Fix missing evaluator for SWE-bench (#874)

257c501

Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com> Signed-off-by: dgitman <dgitman@nvidia.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix missing evaluator for SWE-bench#874

Fix missing evaluator for SWE-bench#874
Kipok merged 1 commit intomainfrom
ludwig-n/swe-bench-fix-evaluator

ludwig-n commented Oct 1, 2025 •

edited

Loading

Uh oh!

coderabbitai bot commented Oct 1, 2025 •

edited

Loading

Uh oh!

Kipok left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ludwig-n commented Oct 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Pre-merge checks and finishing touches

Uh oh!

Kipok left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ludwig-n commented Oct 1, 2025 •

edited

Loading

coderabbitai bot commented Oct 1, 2025 •

edited

Loading