eval: data model change for multi-eval #117
Conversation
Walkthrough: Adds a new CLI option (--endpoint_type) for agent queries and changes the evaluation data model's eval_type field from a scalar to a list to support multi-eval.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
/hold
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- test/evals/eval.py (1 hunks)
- test/evals/eval_data.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
🔇 Additional comments (3)
test/evals/eval_data.yaml (1)
5-5: LGTM! Consistent data model change for multi-eval support.

The transformation from scalar `eval_type: judge-llm` to list-based `eval_type: [response_eval:accuracy]` across all six evaluation entries is consistent and aligns with the PR objective of supporting multi-eval functionality. The change maintains backward compatibility by using a single-item list format.

Also applies to: 12-12, 19-19, 26-26, 33-33, 40-40
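Not part of the PR itself, but as an illustration of the backward-compatibility point: a minimal Python sketch (the function name and the inline sample data are hypothetical, assuming PyYAML) of a loader that accepts both the legacy scalar form and the new list form of `eval_type`:

```python
import yaml

def normalize_eval_types(entry: dict) -> list:
    """Return eval_type as a list, accepting both the legacy scalar form
    (eval_type: judge-llm) and the new list form
    (eval_type: [response_eval:accuracy])."""
    value = entry.get("eval_type", [])
    if isinstance(value, str):
        return [value]      # legacy scalar: wrap in a single-item list
    return list(value)      # already a list: pass through unchanged

entries = yaml.safe_load("""
- eval_id: example_eval
  eval_type: [response_eval:accuracy]
- eval_id: legacy_eval
  eval_type: judge-llm
""")
for entry in entries:
    print(entry["eval_id"], normalize_eval_types(entry))
```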
test/evals/eval.py (2)
24-24: LGTM! Consistent formatting improvements.

The addition of trailing commas after help strings and the consistent formatting enhance code maintainability and follow Python best practices.

Also applies to: 28-29, 34-35, 42-42, 46-46, 51-51, 57-58, 63-64, 69-70, 75-76
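For illustration only (the option name below is hypothetical, not taken from eval.py), the formatting pattern the comment refers to: each keyword argument of `add_argument` sits on its own line, and the final help string ends with a trailing comma:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--eval_data_file",  # hypothetical option, for illustration
    default="eval_data.yaml",
    help="Path to the evaluation data file (default: eval_data.yaml)",  # trailing comma
)
```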
44-46: Agent provider argument maintained alongside new endpoint type.

Both `--agent_provider` and `--endpoint_type` arguments are now available, which provides flexibility for configuring different aspects of the evaluation system. The separation of concerns is appropriate.
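A sketch of how the two options might sit side by side. The `--endpoint_type` details (choices, default, help text) come from the diff quoted later in this review; the `--agent_provider` help text is an assumption, since its definition is not shown in this thread:

```python
import argparse

parser = argparse.ArgumentParser(description="Agent goal evaluation")
parser.add_argument(
    "--agent_provider",  # assumption: default and help text not shown in this thread
    help="Agent provider to run the evaluation against",
)
parser.add_argument(
    "--endpoint_type",
    choices=["streaming", "query"],
    default="streaming",
    help="Endpoint type to use for agent queries (default: streaming)",
)
args = parser.parse_args([])  # parse an empty list so the sketch runs standalone
print(args.endpoint_type)     # -> streaming
```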
Force-pushed from b510403 to fc03a56 (compare)
Actionable comments posted: 1
♻️ Duplicate comments (1)
test/evals/eval.py (1)
37-42: Unused CLI option --endpoint_type (still not wired up).

This flag is defined but not consumed in this file, and a prior review found no usage elsewhere in the repo. If `AgentGoalEval` doesn't read it, this is a dead CLI option that can confuse users. Take one of these actions:

- Remove the flag until the underlying evaluation code supports it.
- Or ensure it's actually consumed, either by `AgentGoalEval` or by branching here (see the sketch after the verification script below).

Option A (remove the unused flag):
```diff
-    parser.add_argument(
-        "--endpoint_type",
-        choices=["streaming", "query"],
-        default="streaming",
-        help="Endpoint type to use for agent queries (default: streaming)",
-    )
```

Option B (keep the flag, but confirm it is used by downstream code). Run this to verify usage across the repo:
```bash
#!/bin/bash
# Verify whether endpoint_type is referenced anywhere in the repo
rg -n -A3 -B3 '\bendpoint_type\b'
```
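If Option B is chosen, the wiring could look like the sketch below. `AgentGoalEval`'s real constructor is not shown in this thread, so the class here is a stand-in stub and its keyword arguments are assumptions:

```python
class AgentGoalEval:
    """Stand-in stub; the real class lives in the eval harness."""

    def __init__(self, provider: str, endpoint_type: str = "streaming"):
        self.provider = provider
        self.endpoint_type = endpoint_type

    def run(self) -> None:
        # Branch on the endpoint type so the flag has a real effect.
        if self.endpoint_type == "streaming":
            print(f"querying {self.provider} via the streaming endpoint")
        else:
            print(f"querying {self.provider} via the query endpoint")

# Hypothetical wiring from the parsed CLI args:
evaluator = AgentGoalEval(provider="example-provider", endpoint_type="query")
evaluator.run()
```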
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- test/evals/eval.py (1 hunks)
- test/evals/eval_data.yaml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- test/evals/eval_data.yaml
🔇 Additional comments (2)
test/evals/eval.py (2)
25-30: LGTM on help text improvements for CLI options.

Help strings are clear and consistent. Defaults are documented inline, which improves UX.

Also applies to: 31-36, 51-52, 57-58, 63-64, 69-70, 75-76
24-24: No-op formatting change.

Blank line is fine; no functional impact.
/lgtm

/approve
eranco74 left a comment:

/lgtm

/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andrej1991, asamal4, eranco74

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
Merged 5ca55bd into rh-ecosystem-edge:main
Modified data model/arguments required for multi-eval
Summary by CodeRabbit

New Features
- New --endpoint_type CLI option (streaming or query) for agent queries.

Tests
- Evaluation data updated: eval_type is now a list (e.g., [response_eval:accuracy]) to support multi-eval.