WIP: evals for api review command #2606

theobarberbany · 2025-11-27T12:59:36Z

This is experimental.

Vibe coded test suite to test our vibe coded commands to review our vibe coded code.

Uses an LLM as a judge (Claude) to review the output of the /api-review command.

Directory Structure

tests/eval/testdata/
├── golden/                     # Ground truth tests - single isolated issues
│   ├── missing-optional-doc/
│   │   ├── patch.diff          # Triggers ONLY missing-optional-doc
│   │   └── expected.txt
│   ├── undocumented-enum/
│   │   ├── patch.diff          # Triggers ONLY undocumented-enum
│   │   └── expected.txt
│   ├── missing-featuregate/
│   │   ├── patch.diff          # Triggers ONLY missing-featuregate
│   │   └── expected.txt
│   └── valid-api-change/
│       ├── patch.diff          # Triggers NO issues
│       └── expected.txt
└── integration/                # Complex scenarios - multiple issues
    ├── new-field-all-issues/
    │   ├── patch.diff          # Triggers multiple issues together
    │   └── expected.txt
    └── partial-documentation/
        ├── patch.diff
        └── expected.txt

Test Case Format

patch.diff

Standard git diff format:

diff --git a/config/v1/types.go b/config/v1/types.go
--- a/config/v1/types.go
+++ b/config/v1/types.go
@@ -10,0 +11,4 @@
+// MyField does something
+// +optional
+// +kubebuilder:validation:Enum=Foo;Bar
+MyField string `json:"myField"`

expected.txt

One expected issue per line:

enum values Foo and Bar not documented in comment
optional field does not explain behavior when omitted

Empty file means the API change should pass review with no issues.
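
As a sketch of the format just described, expected.txt could be parsed as follows. `ParseExpected` is a hypothetical helper, and trimming whitespace and skipping blank lines are assumptions of this sketch rather than documented behavior of the suite.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// ParseExpected returns one issue per non-blank line of an expected.txt file.
// An empty (or all-blank) file yields no issues, meaning the patch should
// pass review cleanly.
func ParseExpected(content string) []string {
	var issues []string
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		// Assumption: surrounding whitespace and blank lines carry no meaning.
		if line := strings.TrimSpace(sc.Text()); line != "" {
			issues = append(issues, line)
		}
	}
	return issues
}

func main() {
	content := "enum values Foo and Bar not documented in comment\n" +
		"optional field does not explain behavior when omitted\n"
	for i, issue := range ParseExpected(content) {
		fmt.Printf("%d: %s\n", i+1, issue)
	}
}
```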

Note: Order of issues in expected.txt does not matter. Comparison uses semantic matching, not exact string matching.
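
One way an order-independent comparison like this might be structured, as a sketch only. The `Judge` signature, the `Result` type, and the `Compare` function are hypothetical; in the real suite the judge is an LLM call, whereas here a case-insensitive substring check stands in so the example runs offline. Matching is greedy (each reported issue satisfies at most one expected issue), which is a simplification.

```go
package main

import (
	"fmt"
	"strings"
)

// Judge reports whether a reported issue means the same thing as an expected
// one. Hypothetical signature: the real suite would ask an LLM here.
type Judge func(expected, actual string) bool

// Result lists what did not line up between expected and reported issues.
type Result struct {
	Missing    []string // expected issues that no reported issue matched
	Unexpected []string // reported issues that matched nothing expected
}

// Pass is true when every expected issue was found and nothing extra was reported.
func (r Result) Pass() bool { return len(r.Missing) == 0 && len(r.Unexpected) == 0 }

// Compare matches expected against actual issues regardless of order.
func Compare(expected, actual []string, same Judge) Result {
	used := make([]bool, len(actual))
	var res Result
	for _, want := range expected {
		found := false
		for i, got := range actual {
			if !used[i] && same(want, got) {
				used[i], found = true, true
				break
			}
		}
		if !found {
			res.Missing = append(res.Missing, want)
		}
	}
	for i, got := range actual {
		if !used[i] {
			res.Unexpected = append(res.Unexpected, got)
		}
	}
	return res
}

// containsJudge is an offline stand-in for the LLM judge.
func containsJudge(expected, actual string) bool {
	return strings.Contains(strings.ToLower(actual), strings.ToLower(expected))
}

func main() {
	expected := []string{
		"enum values Foo and Bar not documented in comment",
		"optional field does not explain behavior when omitted",
	}
	// Same issues, different order and extra prefix.
	actual := []string{
		"myField: optional field does not explain behavior when omitted",
		"myField: enum values Foo and Bar not documented in comment",
	}
	res := Compare(expected, actual, containsJudge)
	fmt.Println("pass:", res.Pass(), "missing:", res.Missing, "unexpected:", res.Unexpected)
}
```

Keeping the judge behind a function type lets the comparison logic be unit tested deterministically, with the LLM call swapped in only for the actual evals.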

openshift-ci-robot · 2025-11-27T12:59:39Z

Pipeline controller notification
This repository is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

openshift-ci · 2025-11-27T12:59:41Z

Hello @theobarberbany! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

coderabbitai · 2025-11-27T12:59:41Z

Important

Review skipped

Auto reviews are limited based on label configuration.

🚫 Excluded labels (none allowed) (1)

do-not-merge/work-in-progress

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


openshift-ci · 2025-11-27T12:59:58Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign everettraven for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Nov 27, 2025

openshift-ci bot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on Nov 27, 2025

openshift-ci bot requested review from JoelSpeed and everettraven November 27, 2025 12:59

theobarberbany force-pushed the test-evals branch 2 times, most recently from 99de6d4 to 06bca4c on November 27, 2025 13:54

theobarberbany marked this pull request as draft November 27, 2025 14:15

theobarberbany added 2 commits November 27, 2025 15:43

Basic evals for /api-review command

4ea8b69

This builds a basic Go test suite that uses Claude as a judge to review the output of the /api-review command. See tests/eval/DESIGN.md for more details.

WIP: Refactor + golden/integration + different models

0c769f0

theobarberbany force-pushed the test-evals branch from 06bca4c to 0c769f0 on November 27, 2025 16:13

openshift-ci bot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) and removed the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on Nov 27, 2025
