eval: update script as per latest lsc logic #55
Conversation
Walkthrough
The script was refactored to consolidate evaluation execution and result reporting. Manual iteration and detailed per-evaluation output were removed. Now, a single method handles all evaluations, and a summary is printed. Some argument defaults were updated, and the logging configuration was slightly reformatted.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
📜 Recent review details
Configuration used: CodeRabbit UI
📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (1)
Hi @asamal4. Thanks for your PR. I'm waiting for a rh-ecosystem-edge member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@coderabbitai review
✅ Actions performed: Review triggered.
/ok-to-test |
Actionable comments posted: 0
🧹 Nitpick comments (1)
test/evals/eval.py (1)
21-30: Consider making configuration more flexible.
The argument configuration is appropriate for the evaluation setup and aligns with the latest lsc logic. However, consider making these values configurable through command-line arguments or environment variables for better flexibility across different environments.
For example, you could add command-line argument parsing:
    import argparse

    parser = argparse.ArgumentParser(description="Agent goal evaluation")
    parser.add_argument("--eval-data-yaml", default="assisted_eval_data.yaml")
    parser.add_argument("--agent-endpoint", default="http://localhost:8090")
    parser.add_argument("--agent-provider", default="gemini")
    # ... other arguments
    args = parser.parse_args()
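With flags like those, a run against a non-default environment might look like `python test/evals/eval.py --agent-endpoint http://staging-host:8090 --agent-provider gemini` (a hypothetical invocation; these flag names exist only in the sketch above, not necessarily in the script).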
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
test/evals/eval.py (1 hunks)
🔇 Additional comments (3)
test/evals/eval.py (3)
1-6: LGTM! Clean docstring and imports.
The updated docstring clearly describes the script's purpose, and the import statements are appropriate for the functionality.
10-17: Well-configured logging setup.
The logging configuration appropriately sets a WARNING level by default to reduce noise while enabling INFO level for the specific lsc_agent_eval logger to track evaluation progress. The format includes all necessary details for debugging.
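For illustration, a minimal sketch of such a setup, assuming standard library logging calls — the exact format string here is invented, not the script's confirmed one:

```python
import logging

# Quiet by default: only WARNING and above from the root logger.
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",  # assumed format
)

# Keep evaluation progress from the lsc_agent_eval library visible at INFO.
logging.getLogger("lsc_agent_eval").setLevel(logging.INFO)
```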
32-44: Excellent refactoring with clear result handling.
The use of run_evaluation() and get_result_summary() greatly simplifies the workflow while preserving precise success/failure reporting, and the exit codes plus emojis improve usability. Because AgentGoalEval and its get_result_summary() method are imported from the external lsc_agent_eval library (not defined locally), please manually verify that:
- get_result_summary() returns a dictionary.
- The dictionary contains numeric entries for the keys "FAIL" and "ERROR" (see the sketch below).
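To make that verification concrete, here is a minimal sketch of how the consolidated flow might consume the summary. Only run_evaluation(), get_result_summary(), and the "FAIL"/"ERROR" keys come from the review above; the constructor signature and configuration fields are assumptions, not the library's confirmed API:

```python
import sys
from argparse import Namespace

from lsc_agent_eval import AgentGoalEval  # external library discussed above

# Hypothetical configuration object; the real script builds its own.
config = Namespace(
    eval_data_yaml="assisted_eval_data.yaml",
    agent_endpoint="http://localhost:8090",
    agent_provider="gemini",
)

evaluator = AgentGoalEval(config)  # constructor signature is an assumption
evaluator.run_evaluation()         # one call runs every evaluation case

summary = evaluator.get_result_summary()  # expected: dict of numeric counts
failures = summary.get("FAIL", 0) + summary.get("ERROR", 0)

if failures > 0:
    print(f"❌ {failures} evaluation(s) failed or errored")
    sys.exit(1)
print("✅ All evaluations passed")
```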
@asamal4: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.
/hold |
/unhold |
@carbonin PTAL |
/ok-to-test |
Thanks for doing this @asamal4. I tested it out just to be sure. The output now looks like this (with some artificial failures added as examples):

If there's no way to prevent writing the results dir, can you add it to .gitignore?
@carbonin I updated the folder name, added it to .gitignore, and removed the output from the README.
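For reference, the change implies a single ignore rule along these lines — the actual directory name isn't shown in this thread, so the one below is purely illustrative:

```gitignore
# Hypothetical name for the evaluation results directory
eval_results/
```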
/ok-to-test |
/lgtm |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: asamal4, carbonin

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/ok-to-test |
/retest |
/ok-to-test |
/lgtm |
/ok-to-test |
/close |
@asamal4: Closed this PR, in response to the /close command above.
Summary by CodeRabbit
- Updated .gitignore to exclude evaluation output directories, keeping the repository clean.