Conversation

@asamal4 (Contributor) commented Aug 12, 2025

Modified the data model and arguments required for multi-eval.

Summary by CodeRabbit

  • New Features

    • Added a CLI option to choose the evaluation endpoint type (streaming or query), defaulting to streaming, while keeping a separate option for specifying the agent provider.
  • Tests

    • Updated several evaluation configurations to use a standardized list-based eval_types format (e.g., response_eval:accuracy) for multiple conversation test cases.

@openshift-ci openshift-ci bot requested review from carbonin and omertuc August 12, 2025 00:10
@coderabbitai coderabbitai bot commented Aug 12, 2025

Walkthrough

Adds a new CLI option --endpoint_type to test/evals/eval.py (choices: streaming, query; default: streaming) and re-adds --agent_provider so both options exist; --endpoint_type is not yet consumed anywhere else. Updates six entries in test/evals/eval_data.yaml, replacing the scalar eval_type with the list-based eval_types: [response_eval:accuracy]. No public API changes.
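For orientation, here is a minimal sketch of how the two options would be defined with argparse in test/evals/eval.py. The --endpoint_type block mirrors the diff quoted later in this review; the --agent_provider default and help text are assumptions rather than values taken from the file:

```python
import argparse

# Sketch only: the real eval.py defines more options than shown here.
parser = argparse.ArgumentParser(description="Agent goal evaluation")  # description assumed

parser.add_argument(
    "--agent_provider",
    default="openai",  # assumed default; see eval.py for the real value
    help="Agent provider to use for evaluation",  # assumed help text
)
parser.add_argument(
    "--endpoint_type",
    choices=["streaming", "query"],
    default="streaming",
    help="Endpoint type to use for agent queries (default: streaming)",
)

args = parser.parse_args()
```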

Changes

CLI argument parsing (test/evals/eval.py):
Added --endpoint_type (choices: streaming, query; default: streaming); the --agent_provider block was moved/re-added so both options now exist. Minor formatting and whitespace tweaks; endpoint_type is not referenced elsewhere.

Evaluation data updates (test/evals/eval_data.yaml):
Six conversation entries (basic_introduction_conv, basic_cluster_request_conv, list_versions_conv, available_operators_conv, sno_requirements_conv, multinode_requirements_conv) changed the scalar eval_type to eval_types: [response_eval:accuracy]; other fields are unchanged.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

Suggested labels

approved, lgtm

Suggested reviewers

  • carbonin
  • eranco74
  • omertuc
  • jhernand

@asamal4 (Contributor, Author) commented Aug 12, 2025

/hold
Waiting for lightspeed-core/lightspeed-evaluation#24.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0038bbb and b510403.

📒 Files selected for processing (2)
  • test/evals/eval.py (1 hunks)
  • test/evals/eval_data.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
🔇 Additional comments (3)
test/evals/eval_data.yaml (1)

5-5: LGTM! Consistent data model change for multi-eval support.

The transformation from the scalar eval_type: judge-llm to the list-based eval_types: [response_eval:accuracy] across all six evaluation entries is consistent and aligns with the PR objective of supporting multi-eval functionality. The single-item list format also keeps the change backward compatible.

Also applies to: 12-12, 19-19, 26-26, 33-33, 40-40
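As an illustration of the shape of that change, here is a before/after sketch for one entry; the conv_id key and surrounding layout are assumptions for readability, and only the evaluation key and its values come from this PR:

```yaml
# Before (abridged; the conv_id field name is assumed for illustration)
- conv_id: basic_introduction_conv
  eval_type: judge-llm

# After: the scalar becomes a single-item list, leaving room for additional evals
- conv_id: basic_introduction_conv
  eval_types:
    - response_eval:accuracy
```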

test/evals/eval.py (2)

24-24: LGTM! Consistent formatting improvements.

The trailing commas after help strings and the consistent formatting improve maintainability and follow Python best practices.

Also applies to: 28-29, 34-35, 42-42, 46-46, 51-51, 57-58, 63-64, 69-70, 75-76


44-46: Agent provider argument maintained alongside new endpoint type.

Both --agent_provider and --endpoint_type arguments are now available, which provides flexibility for configuring different aspects of the evaluation system. The separation of concerns is appropriate.

@asamal4 asamal4 force-pushed the data-model-change-for-multieval branch from b510403 to fc03a56 on August 13, 2025 at 17:16
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
test/evals/eval.py (1)

37-42: Unused CLI option --endpoint_type (still not wired up).

This flag is defined but not consumed in this file, and prior review found no usage elsewhere in the repo. If AgentGoalEval doesn't read it, this is a dead CLI option that can confuse users.

Do one of the following:

  • Remove the flag until the underlying evaluation code supports it.
  • Or ensure it’s actually consumed (either by AgentGoalEval or by branching here).

Option A — remove the unused flag:

-    parser.add_argument(
-        "--endpoint_type",
-        choices=["streaming", "query"],
-        default="streaming",
-        help="Endpoint type to use for agent queries (default: streaming)",
-    )

Option B — keep the flag but confirm it is used by downstream code. Run this to verify usage across the repo:

#!/bin/bash
# Verify whether endpoint_type is referenced anywhere in the repo
rg -n -A3 -B3 '\bendpoint_type\b'
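If Option B is taken, a minimal illustration of consuming the flag by branching at the call site is sketched below; the run_* helpers are placeholders for whatever streaming/query code paths the evaluation actually uses, not functions that exist in this repo:

```python
import argparse

def run_streaming_eval(args: argparse.Namespace) -> None:
    """Placeholder for the streaming-endpoint evaluation path (not part of this PR)."""

def run_query_eval(args: argparse.Namespace) -> None:
    """Placeholder for the query-endpoint evaluation path (not part of this PR)."""

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--endpoint_type",
        choices=["streaming", "query"],
        default="streaming",
        help="Endpoint type to use for agent queries (default: streaming)",
    )
    args = parser.parse_args()
    # Branch on the flag so it is actually consumed rather than silently ignored.
    if args.endpoint_type == "streaming":
        run_streaming_eval(args)
    else:
        run_query_eval(args)

if __name__ == "__main__":
    main()
```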
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b510403 and fc03a56.

📒 Files selected for processing (2)
  • test/evals/eval.py (1 hunks)
  • test/evals/eval_data.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/evals/eval_data.yaml
🔇 Additional comments (2)
test/evals/eval.py (2)

25-30: LGTM on help text improvements for CLI options.

Help strings are clear and consistent. Defaults are documented inline, which improves UX.

Also applies to: 31-36, 51-52, 57-58, 63-64, 69-70, 75-76


24-24: No-op formatting change.

Blank line is fine; no functional impact.

@asamal4 (Contributor, Author) commented Aug 13, 2025

/unhold
@eranco74 @carbonin PTAL, the LCORE PR is merged now.

@andrej1991 (Collaborator)

/lgtm

@andrej1991 (Collaborator)

/approve

@eranco74 eranco74 (Collaborator) left a comment

/lgtm

@eranco74 (Collaborator)

/approve

@openshift-ci openshift-ci bot commented Aug 14, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrej1991, asamal4, eranco74

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 5ca55bd into rh-ecosystem-edge:main Aug 14, 2025
5 checks passed