
Conversation


@NathanHB (Member) commented Aug 27, 2025

  • Adds a mechanism to automatically test metrics. When creating a metric, you now create a JSON file with test cases (input, output, and expected results); see the sketch below.
  • Moves unit tests to a tests/unit folder.
  • Fixes broken metrics.
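
For illustration, such a test-case file might deserialize to something like the structure below. Every field name here is a hypothetical stand-in, not the PR's actual schema; the real files live in tests/unit/metrics/test_cases/*.json, with the matching models in tests/unit/metrics/test_metrics_automated.py.

# Hypothetical shape of one metric test-case file, written as the Python
# dict it would deserialize to. All field names are illustrative assumptions.
example_test_file = {
    "metric_class": "exact_match",
    "test_cases": [
        {
            "name": "identical strings match",
            "input": {"golds": ["Paris"], "predictions": ["Paris"]},
            "expected_output": 1.0,
        },
        {
            "name": "different strings do not match",
            "input": {"golds": ["Paris"], "predictions": ["London"]},
            "expected_output": 0.0,
        },
    ],
}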

@HuggingFaceDocBuilderDev (Collaborator) commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@NathanHB marked this pull request as draft August 27, 2025
@NathanHB marked this pull request as ready for review September 8, 2025
@NathanHB requested reviews from clefourrier and Copilot September 8, 2025
Copilot AI (Contributor) left a comment

Pull Request Overview

This pull request adds an automated testing framework for LightEval metrics to ensure their reliability and correctness. The framework lets developers define test cases with input/output pairs in JSON files, which are then automatically validated against the metric implementations.

  • Creates an automated testing framework for metrics with JSON-based test case definitions
  • Moves existing unit tests from tests/tasks to the tests/unit/tasks folder for better organization
  • Fixes broken metrics by correcting function signatures and implementing missing dependencies

Reviewed Changes

Copilot reviewed 57 out of 82 changed files in this pull request and generated 4 comments.

Summary per file:

  • tests/unit/tasks/test_registry.py: updates custom task import paths to reflect the new unit test folder structure
  • tests/unit/metrics/test_metrics_automated.py: core automated testing framework implementation, with test case models and execution logic
  • tests/unit/metrics/test_automated_metrics_pytest.py: pytest integration for the automated testing framework
  • tests/unit/metrics/pytest.ini: pytest configuration for metric testing
  • tests/unit/metrics/test_cases/*.json: JSON test case files for various metrics (stored in Git LFS)
  • src/lighteval/metrics/metrics_sample.py: fixes broken metric implementations, including function signatures and NLTK dependencies
  • src/lighteval/metrics/metrics_corpus.py: adds an NLTK download and fixes the F1 score calculation
  • src/lighteval/metrics/metrics.py: corrects the F1 score averaging parameter
  • src/lighteval/metrics/imports/summac.py: removes a deprecated tokenizer parameter
  • src/lighteval/tasks/extended/lcb/main.py: adds the missing batched_compute parameter
  • pyproject.toml: updates test dependencies
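
As a rough sketch (not the PR's actual implementation, which lives in tests/unit/metrics/test_automated_metrics_pytest.py), JSON test cases like these typically drive pytest through parametrization. The field names and the metric lookup below are assumptions:

# Hypothetical pytest integration sketch; the field names and the name-to-metric
# lookup are assumptions, not the PR's real code.
import json
from pathlib import Path

import pytest

CASE_DIR = Path("tests/unit/metrics/test_cases")


def iter_cases():
    # Yield one pytest param per test case found in the JSON files.
    for path in sorted(CASE_DIR.glob("*.json")):
        data = json.loads(path.read_text())
        for case in data["test_cases"]:
            yield pytest.param(data["metric_class"], case, id=f"{path.stem}:{case['name']}")


@pytest.mark.parametrize(("metric_name", "case"), iter_cases())
def test_metric_case(metric_name, case):
    from lighteval.metrics.metrics import Metrics  # assumed to expose a Metrics enum

    metric = Metrics[metric_name].value        # hypothetical name-to-metric lookup
    result = metric.compute(**case["input"])   # hypothetical compute signature
    assert result == pytest.approx(case["expected_output"])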


@clefourrier (Member) left a comment

Missing a doc section in "Adding a new metric" to explain how to create a test file. A bunch of nits.

But overall, cool new mechanism, with clearer logic! GG

@NathanHB self-assigned this Sep 15, 2025
Review thread on the .gitattributes change:

@@ -1 +1,2 @@
 *.json filter=lfs diff=lfs merge=lfs -text
+tests/unit/metrics/test_cases/*.json -filter -diff -merge text

@NathanHB (Member, Author) commented:

do not use git-lfs for json files in the test_cases dir
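
(A note on the syntax, not from the thread: in .gitattributes, a leading "-" unsets an attribute, so the added line unsets the filter, diff, and merge drivers that the repo-wide *.json LFS rule turned on, and marks the files as text. The test-case JSON files are therefore committed as plain text.)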

Copilot AI (Contributor) left a comment

Pull Request Overview

Copilot reviewed 59 out of 84 changed files in this pull request and generated 4 comments.



Comment on lines 125 to 129
if metric_params != {}:
    metric = self.METRIC_CLASSES[metric_class].value
    metric_enum_value = copy.deepcopy(metric)(metric_params)
else:
    metric_enum_value = self.METRIC_CLASSES[metric_class].value

Copilot AI commented Sep 16, 2025:

Line 126 should access the metric enum, not the value. It should be metric = self.METRIC_CLASSES[metric_class] without .value, then call metric.value on line 127.
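
Applied to the snippet above, one plausible reading of that suggestion is the sketch below; exactly where .value lands is our interpretation of the comment, not the committed fix:

if metric_params != {}:
    # Grab the enum member first; deepcopy the underlying metric object so that
    # parameterizing it presumably does not mutate the value shared via the enum.
    metric = self.METRIC_CLASSES[metric_class]
    metric_enum_value = copy.deepcopy(metric.value)(metric_params)
else:
    metric_enum_value = self.METRIC_CLASSES[metric_class].value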


@NathanHB changed the title from "Add auto tests for metrics" to "Add automatic tests for metrics" Sep 17, 2025
@NathanHB merged commit 16318bb into main Sep 17, 2025 (5 checks passed)
NathanHB added a commit that referenced this pull request Sep 19, 2025 (Co-authored-by: Copilot <[email protected]>).