Add automatic tests for metrics #939
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Pull Request Overview
This pull request adds an automated testing framework for LightEval metrics to ensure their reliability and correctness. The automated testing system allows developers to define test cases with input/output pairs in JSON files, which are then automatically validated against metric implementations.
- Creates an automated testing framework for metrics with JSON-based test case definitions (a sketch follows this list)
- Moves existing unit tests from `tests/tasks` to `tests/unit/tasks` for better organization
- Fixes broken metrics by correcting function signatures and implementing missing dependencies
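To make the JSON-based definitions concrete, here is a hypothetical sketch of one test-case file. The field names (`metric_class`, `input`, `expected_output`) are illustrative assumptions, not the schema this PR actually defines:

```python
# Hypothetical sketch of a metric test-case file; field names are assumptions
# for illustration, not the PR's actual schema.
import json

test_cases = [
    {
        "metric_class": "exact_match",   # which metric implementation to test
        "input": {
            "golds": ["Paris"],          # reference answer(s)
            "predictions": ["Paris"],    # model output(s)
        },
        "expected_output": 1.0,          # value the metric must produce
    }
]

# Each metric would get a file like tests/unit/metrics/test_cases/exact_match.json
with open("exact_match.json", "w") as f:
    json.dump(test_cases, f, indent=2)
```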
Reviewed Changes
Copilot reviewed 57 out of 82 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `tests/unit/tasks/test_registry.py` | Updates custom task import paths to reflect the new unit test folder structure |
| `tests/unit/metrics/test_metrics_automated.py` | Core automated testing framework implementation with test case models and execution logic |
| `tests/unit/metrics/test_automated_metrics_pytest.py` | Pytest integration for the automated testing framework |
| `tests/unit/metrics/pytest.ini` | Pytest configuration for metric testing |
| `tests/unit/metrics/test_cases/*.json` | JSON test case files for various metrics (stored in Git LFS) |
| `src/lighteval/metrics/metrics_sample.py` | Fixes broken metric implementations, including function signatures and NLTK dependencies |
| `src/lighteval/metrics/metrics_corpus.py` | Adds the NLTK download and fixes the F1 score calculation |
| `src/lighteval/metrics/metrics.py` | Corrects the F1 score averaging parameter |
| `src/lighteval/metrics/imports/summac.py` | Removes a deprecated tokenizer parameter |
| `src/lighteval/tasks/extended/lcb/main.py` | Adds the missing `batched_compute` parameter |
| `pyproject.toml` | Updates test dependencies |
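The PR itself doesn't show how to invoke the suite; assuming a standard pytest setup, it could presumably be run through pytest's Python API (the path comes from the file table above):

```python
# Hypothetical invocation of the automated metric tests; "-v" is the ordinary
# pytest verbose flag, and the path is taken from the file table above.
import pytest

pytest.main(["tests/unit/metrics/test_automated_metrics_pytest.py", "-v"])
```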
Missing a doc section in "Adding a new metric" to explain how to create a test file. A bunch of nits.
But overall, cool new mechanism, with clearer logic! GG
```diff
@@ -1 +1,2 @@
 *.json filter=lfs diff=lfs merge=lfs -text
+tests/unit/metrics/test_cases/*.json -filter -diff -merge text
```
do not use git-lfs for json files in the test_cases dir
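For context on why the diff above achieves this: in `.gitattributes`, the last matching pattern wins for each attribute, and prefixing an attribute with `-` unsets it. The added line therefore strips the LFS `filter`/`diff`/`merge` attributes from the test-case JSON files and marks them as plain `text`, while other `*.json` files in the repo stay on Git LFS.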
Pull Request Overview
Copilot reviewed 59 out of 84 changed files in this pull request and generated 4 comments.
```python
if metric_params != {}:
    metric = self.METRIC_CLASSES[metric_class].value
    metric_enum_value = copy.deepcopy(metric)(metric_params)
else:
    metric_enum_value = self.METRIC_CLASSES[metric_class].value
```
Line 126 should access the metric enum, not the value. It should be `metric = self.METRIC_CLASSES[metric_class]` without `.value`, then call `metric.value` on line 127.
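One reading of that suggestion, as a hedged sketch (this is a fragment of the same method as the snippet above, so `self`, `copy`, and `METRIC_CLASSES` come from that surrounding context):

```python
if metric_params != {}:
    # Access the enum member first, without .value ...
    metric = self.METRIC_CLASSES[metric_class]
    # ... then resolve .value only when instantiating with the overridden
    # params, so the deepcopy operates on the metric object, not the enum.
    metric_enum_value = copy.deepcopy(metric.value)(metric_params)
else:
    metric_enum_value = self.METRIC_CLASSES[metric_class].value
```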
- Adds a mechanism to auto-test metrics: when creating a metric, you now create a JSON file with test cases (input, output, and expected results).
- Moves unit tests to a `tests/unit` folder.
- Fixes broken metrics.

---------

Co-authored-by: Copilot <[email protected]>