Skip to content

Fix missing evaluator for SWE-bench#874

Merged
Kipok merged 1 commit intomainfrom
ludwig-n/swe-bench-fix-evaluator
Oct 1, 2025
Merged

Fix missing evaluator for SWE-bench#874
Kipok merged 1 commit intomainfrom
ludwig-n/swe-bench-fix-evaluator

Conversation

@ludwig-n
Copy link
Collaborator

@ludwig-n ludwig-n commented Oct 1, 2025

Fixes an issue introduced by #825 where this line was failing because SweBenchGenerationTask has no evaluator attribute.

Edit: the coderabbit summary is nonsense, I'm just overriding the existing apply_evaluation_hook function to be a no-op for SWE-bench because we don't use an evaluator here (yet).

Summary by CodeRabbit

  • Refactor

    • Introduced an asynchronous evaluation hook to the evaluation workflow, enabling a standardized post-generation processing step.
    • Current behavior remains unchanged; data passes through without modification.
  • Chores

    • Prepared the evaluation pipeline for future extensibility with a non-disruptive hook point.

Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 1, 2025

Walkthrough

Introduces an asynchronous hook method apply_evaluation_hook(self, data_point) to SweBenchGenerationTask in nemo_skills/inference/eval/swebench.py. The method currently returns the input data_point unchanged, with a comment indicating evaluation is performed directly after generation.

Changes

Cohort / File(s) Summary
Evaluation Hook Addition
nemo_skills/inference/eval/swebench.py
Added async method apply_evaluation_hook(self, data_point) that returns data_point unchanged; comment notes evaluation occurs immediately after generation.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant U as Caller
    participant T as SweBenchGenerationTask
    rect rgb(240,248,255)
    note over T: Generation phase
    U->>T: generate(data_point)
    T-->>U: generated_output
    end
    rect rgb(245,255,240)
    note over T: New hook (post-generation)
    U->>T: apply_evaluation_hook(data_point)
    T-->>U: data_point (unchanged)
    end
    rect rgb(255,250,240)
    note over T: Evaluation phase
    U->>T: evaluate(generated_output, data_point)
    T-->>U: evaluation_result
    end
Loading

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

I twitch my ears at hooks so new,
A nibble of code, no change in view—
Pass-through carrot, crisp and plain,
Post-gen hop, then tests remain.
In fields of eval, I bound with cheer,
Small tweak today—more growth next year! 🥕🐇

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title precisely identifies the main change of restoring the missing evaluator for the SWE-bench task, directly reflecting the pull request’s objective to fix the evaluator attribute error in SweBenchGenerationTask.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ludwig-n/swe-bench-fix-evaluator

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Collaborator

@Kipok Kipok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ludwig-n would be great to add slurm tests for swe-bench, so that we can catch issues like this in the future automatically

@Kipok Kipok merged commit b01671c into main Oct 1, 2025
6 checks passed
@Kipok Kipok deleted the ludwig-n/swe-bench-fix-evaluator branch October 1, 2025 20:28
SeanNaren pushed a commit to SeanNaren/NeMo-Skills that referenced this pull request Oct 9, 2025
Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
SeanNaren pushed a commit that referenced this pull request Oct 9, 2025
Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>
Signed-off-by: SeanNaren <snarenthiran@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: Nikolai Ludwig <nliudvig@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants