Conversation

@asamal4 (Contributor) commented Aug 12, 2025

Modified the data model and arguments required for multi-eval.

Summary by CodeRabbit

  • New Features

    • Added a CLI option to choose the evaluation endpoint type (streaming or query), defaulting to streaming, while keeping a separate option for specifying the agent provider.
  • Tests

    • Updated several evaluation configurations to use a standardized list-based eval_types format (e.g., response_eval:accuracy) for multiple conversation test cases.

@openshift-ci openshift-ci bot requested review from carbonin and omertuc August 12, 2025 00:10
@coderabbitai coderabbitai bot commented Aug 12, 2025

Walkthrough

Adds a new CLI option --endpoint_type to test/evals/eval.py (choices: streaming, query; default: streaming) and re-adds --agent_provider so both options exist; --endpoint_type is not yet consumed anywhere else. Updates six entries in test/evals/eval_data.yaml, replacing the scalar eval_type with the list-based eval_types: [response_eval:accuracy]. No public API changes.
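For orientation, here is a minimal sketch of how the two options would be defined with argparse in test/evals/eval.py. The --endpoint_type block mirrors the diff quoted later in this review; the --agent_provider default and help text are assumptions rather than values taken from the file:

```python
import argparse

# Sketch only: the real eval.py defines more options than shown here.
parser = argparse.ArgumentParser(description="Agent goal evaluation")  # description assumed

parser.add_argument(
    "--agent_provider",
    default="openai",  # assumed default; see eval.py for the real value
    help="Agent provider to use for evaluation",  # assumed help text
)
parser.add_argument(
    "--endpoint_type",
    choices=["streaming", "query"],
    default="streaming",
    help="Endpoint type to use for agent queries (default: streaming)",
)

args = parser.parse_args()
```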

Changes

CLI argument parsing (test/evals/eval.py):
Added --endpoint_type (choices: streaming, query; default: streaming); the --agent_provider block was moved/re-added so both options now exist. Minor formatting and whitespace tweaks; endpoint_type is not referenced elsewhere.

Evaluation data updates (test/evals/eval_data.yaml):
Six conversation entries (basic_introduction_conv, basic_cluster_request_conv, list_versions_conv, available_operators_conv, sno_requirements_conv, multinode_requirements_conv) changed the scalar eval_type to eval_types: [response_eval:accuracy]; other fields are unchanged.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

Suggested labels

approved, lgtm

Suggested reviewers

  • carbonin
  • eranco74
  • omertuc
  • jhernand

@asamal4 (Contributor, Author) commented Aug 12, 2025

/hold
Waiting for lightspeed-core/lightspeed-evaluation#24.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0038bbb and b510403.

📒 Files selected for processing (2)
  • test/evals/eval.py (1 hunks)
  • test/evals/eval_data.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
🔇 Additional comments (3)
test/evals/eval_data.yaml (1)

5-5: LGTM! Consistent data model change for multi-eval support.

The transformation from the scalar eval_type: judge-llm to the list-based eval_types: [response_eval:accuracy] across all six evaluation entries is consistent and aligns with the PR objective of supporting multi-eval functionality. The single-item list format also keeps the change backward compatible.

Also applies to: 12-12, 19-19, 26-26, 33-33, 40-40
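As an illustration of the shape of that change, here is a before/after sketch for one entry; the conv_id key and surrounding layout are assumptions for readability, and only the evaluation key and its values come from this PR:

```yaml
# Before (abridged; the conv_id field name is assumed for illustration)
- conv_id: basic_introduction_conv
  eval_type: judge-llm

# After: the scalar becomes a single-item list, leaving room for additional evals
- conv_id: basic_introduction_conv
  eval_types:
    - response_eval:accuracy
```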

test/evals/eval.py (2)

24-24: LGTM! Consistent formatting improvements.

The trailing commas after help strings and the consistent formatting improve maintainability and follow Python best practices.

Also applies to: 28-29, 34-35, 42-42, 46-46, 51-51, 57-58, 63-64, 69-70, 75-76


44-46: Agent provider argument maintained alongside new endpoint type.

Both --agent_provider and --endpoint_type arguments are now available, which provides flexibility for configuring different aspects of the evaluation system. The separation of concerns is appropriate.

@asamal4 asamal4 force-pushed the data-model-change-for-multieval branch from b510403 to fc03a56 on August 13, 2025 at 17:16
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
test/evals/eval.py (1)

37-42: Unused CLI option --endpoint_type (still not wired up).

This flag is defined but not consumed in this file, and prior review found no usage elsewhere in the repo. If AgentGoalEval doesn't read it, this is a dead CLI option that can confuse users.

Do one of the following:

  • Remove the flag until the underlying evaluation code supports it.
  • Or ensure it’s actually consumed (either by AgentGoalEval or by branching here).

Option A — remove the unused flag:

-    parser.add_argument(
-        "--endpoint_type",
-        choices=["streaming", "query"],
-        default="streaming",
-        help="Endpoint type to use for agent queries (default: streaming)",
-    )

Option B — keep the flag but confirm it is used by downstream code. Run this to verify usage across the repo:

#!/bin/bash
# Verify whether endpoint_type is referenced anywhere in the repo
rg -n -A3 -B3 '\bendpoint_type\b'
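If Option B is taken, a minimal illustration of consuming the flag by branching at the call site is sketched below; the run_* helpers are placeholders for whatever streaming/query code paths the evaluation actually uses, not functions that exist in this repo:

```python
import argparse

def run_streaming_eval(args: argparse.Namespace) -> None:
    """Placeholder for the streaming-endpoint evaluation path (not part of this PR)."""

def run_query_eval(args: argparse.Namespace) -> None:
    """Placeholder for the query-endpoint evaluation path (not part of this PR)."""

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--endpoint_type",
        choices=["streaming", "query"],
        default="streaming",
        help="Endpoint type to use for agent queries (default: streaming)",
    )
    args = parser.parse_args()
    # Branch on the flag so it is actually consumed rather than silently ignored.
    if args.endpoint_type == "streaming":
        run_streaming_eval(args)
    else:
        run_query_eval(args)

if __name__ == "__main__":
    main()
```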
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b510403 and fc03a56.

📒 Files selected for processing (2)
  • test/evals/eval.py (1 hunks)
  • test/evals/eval_data.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/evals/eval_data.yaml
🔇 Additional comments (2)
test/evals/eval.py (2)

25-30: LGTM on help text improvements for CLI options.

Help strings are clear and consistent. Defaults are documented inline, which improves UX.

Also applies to: 31-36, 51-52, 57-58, 63-64, 69-70, 75-76


24-24: No-op formatting change.

Blank line is fine; no functional impact.

@asamal4 (Contributor, Author) commented Aug 13, 2025

/unhold
@eranco74 @carbonin PTAL, the LCORE PR is merged now.

@andrej1991 (Collaborator)

/lgtm

@andrej1991 (Collaborator)

/approve

@eranco74 eranco74 (Collaborator) left a comment

/lgtm

@eranco74 (Collaborator)

/approve

@openshift-ci openshift-ci bot commented Aug 14, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrej1991, asamal4, eranco74

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 5ca55bd into rh-ecosystem-edge:main Aug 14, 2025
5 checks passed