add swe-rebench to excluded datasets #1154

Merged
gwarmstrong merged 1 commit into main from georgea/fix-integration-tests-swe-rebench
Jan 6, 2026
@gwarmstrong
Collaborator

@gwarmstrong gwarmstrong commented Jan 6, 2026

Summary by CodeRabbit

  • Tests
    • "swe-rebench" dataset is now excluded from evaluation runs.

Signed-off-by: George Armstrong <georgea@nvidia.com>
@coderabbitai
Contributor

coderabbitai bot commented Jan 6, 2026

📝 Walkthrough

The pull request adds "swe-rebench" to the EXCLUDED_DATASETS list in the GPU test evaluation script. This prevents the dataset from being processed during dataset preparation and evaluation phases.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Test Dataset Configuration: tests/gpu-tests/test_eval.py | Added "swe-rebench" to EXCLUDED_DATASETS list to skip this dataset during evaluation. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Possibly related PRs

  • FIX ioi ignore #1131: Modifies the same EXCLUDED_DATASETS set in tests/gpu-tests/test_eval.py, consolidating ioi entries.

Suggested labels

run GPU tests

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately describes the main change: adding 'swe-rebench' to the excluded datasets list in the test file. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c039f5 and 1fc4fb2.

📒 Files selected for processing (1)
  • tests/gpu-tests/test_eval.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: gpu-tests-qwen
  • GitHub Check: Greptile Review
  • GitHub Check: pre-commit
  • GitHub Check: unit-tests
🔇 Additional comments (1)
tests/gpu-tests/test_eval.py (1)

40-40: LGTM!

The addition of "swe-rebench" to the exclusion list is correctly placed and follows the existing pattern. Grouping it with "swe-bench" makes logical sense.


@greptile-apps
Contributor

greptile-apps bot commented Jan 6, 2026

Greptile Summary

Added swe-rebench to the EXCLUDED_DATASETS set in test_eval.py:40. This exclusion follows the same pattern as swe-bench (line 39), which is appropriate because:

  • SWE-rebench requires explicit parameters like container_formatter, start_date, and end_date in its prepare.py script
  • The dataset doesn't support the simple max_samples parameter used by the test suite
  • SWE-rebench was recently added in Evaluation support for SWE-rebench #1102 and shares the same evaluation infrastructure as swe-bench

The change maintains consistency with the existing exclusion policy stated in the comment on line 27: "These don't support max_samples, require explicit parameters, or are very heavy to prepare"
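For readers unfamiliar with the test, the exclusion mechanism described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in tests/gpu-tests/test_eval.py: the `dataset_root` parameter, the `demo` helper, and the `toy-dataset` name are assumptions made up for the example, and the set shown is a two-entry subset of whatever the real EXCLUDED_DATASETS contains.

```python
import tempfile
from pathlib import Path

# Illustrative subset only; the real set in test_eval.py has more entries.
EXCLUDED_DATASETS = {"swe-bench", "swe-rebench"}


def get_preparable_datasets(dataset_root: Path) -> list[str]:
    """Return dataset directory names that ship a prepare.py and are not excluded.

    The dataset_root argument is an assumption for this sketch; the real
    helper may locate the dataset directory differently.
    """
    return sorted(
        path.name
        for path in dataset_root.iterdir()
        if path.is_dir()
        and (path / "prepare.py").exists()
        and path.name not in EXCLUDED_DATASETS
    )


def demo() -> list[str]:
    """Build a throwaway dataset tree and run the filter over it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # "toy-dataset" is a hypothetical dataset name used only here.
        for name in ("toy-dataset", "swe-bench", "swe-rebench"):
            (root / name).mkdir()
            (root / name / "prepare.py").touch()
        # A directory without prepare.py is never considered preparable.
        (root / "no-prepare").mkdir()
        return get_preparable_datasets(root)


if __name__ == "__main__":
    print(demo())
```

With a filter of this shape, adding one string to the set is enough to keep a dataset out of both the preparation and evaluation steps, which matches the one-line nature of this PR.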

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The change is a single-line addition that correctly excludes swe-rebench from automated testing. The exclusion is justified and consistent with the existing pattern for swe-bench. No logical issues, syntax errors, or security concerns exist.
  • No files require special attention

Important Files Changed

| Filename | Overview |
|---|---|
| tests/gpu-tests/test_eval.py | Added swe-rebench to excluded datasets list - consistent with swe-bench exclusion pattern |

Sequence Diagram

sequenceDiagram
    participant Dev as Developer
    participant Test as test_aaa_prepare_and_eval_all_datasets
    participant GDS as get_preparable_datasets()
    participant Datasets as Dataset Directory
    
    Dev->>Test: Run test suite
    Test->>GDS: Get list of preparable datasets
    GDS->>Datasets: Scan dataset directory
    Datasets-->>GDS: Return all datasets with prepare.py
    GDS->>GDS: Filter out EXCLUDED_DATASETS (includes swe-rebench)
    GDS-->>Test: Return filtered dataset list
    Test->>Test: Prepare and evaluate datasets
    Note over Test,GDS: swe-rebench now excluded<br/>like swe-bench (requires<br/>explicit parameters)

@gwarmstrong gwarmstrong merged commit a04f8e0 into main Jan 6, 2026
6 of 7 checks passed
@gwarmstrong gwarmstrong deleted the georgea/fix-integration-tests-swe-rebench branch January 6, 2026 20:08
blahblahasdf pushed a commit to blahblahasdf/Skills that referenced this pull request Jan 8, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: dlord <dlord@nvidia.com>
hsiehjackson pushed a commit that referenced this pull request Jan 13, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
@coderabbitai coderabbitai bot mentioned this pull request Feb 11, 2026
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
dgtm777 pushed a commit that referenced this pull request Mar 18, 2026
Signed-off-by: George Armstrong <georgea@nvidia.com>
Signed-off-by: dgitman <dgitman@nvidia.com>