Add automatic tests for metrics #939
Merged
34 commits
d8cfc2e Fixe Sampling Metrics and Evals (NathanHB)
7ae5da5 remove breakpoint (NathanHB)
a00f3c0 add auto tests for metrics (NathanHB)
a892260 Merge branch 'main' into nathan-add-tests-for-metrics (NathanHB)
bf25211 Delete tests/unit/metrics/test_cases/README.md (NathanHB)
2b65d08 Delete tests/unit/metrics/test_unit_harness_metrics.py (NathanHB)
594b942 add pip as test dependency, for spacy to work correctly (NathanHB)
6db8263 Merge branch 'nathan-add-tests-for-metrics' of github.com:huggingface… (NathanHB)
9f7c2be fix tests and reorg files (NathanHB)
e1a55ac fix tests and reorg files (NathanHB)
c9e7243 better tests, passing (NathanHB)
e493b49 Merge remote-tracking branch 'origin/main' into nathan-add-tests-for-… (NathanHB)
5f323b7 Merge remote-tracking branch 'origin/main' into nathan-add-tests-for-… (NathanHB)
3d7b448 fix tests (NathanHB)
0c4a554 fix faithfullness metric (NathanHB)
594c269 adds corpus level metric testing (NathanHB)
fc01e6b fix bleu metric (NathanHB)
c574035 fix bleu metric (NathanHB)
e127955 Merge branch 'main' into nathan-add-tests-for-metrics (NathanHB)
51db828 fix tests after merge (NathanHB)
70a5a10 Delete tests/slow_tests/test_sglang_model.py (NathanHB)
6384835 test simpleqa judge (NathanHB)
3c9aec6 Merge branch 'nathan-add-tests-for-metrics' of github.com:huggingface… (NathanHB)
b5b82a8 fix avg at k (NathanHB)
bf740a3 remove test files from git lfs cache (NathanHB)
ef216dc re-add test-files to actual repo (NathanHB)
f903ee0 use SKIPPED_METRIC list instead of hardcoding all metric names (NathanHB)
86892e9 Merge remote-tracking branch 'origin/main' into nathan-add-tests-for-… (NathanHB)
23e9714 Update tests/unit/metrics/test_metrics_automated.py (NathanHB)
048b407 fix tests (NathanHB)
c4aebce remove breakpoint (NathanHB)
432345e remove breakpoint (NathanHB)
dab1dae Merge branch 'main' into nathan-add-tests-for-metrics (NathanHB)
fd27034 fix quality (NathanHB)
@@ -1 +1,2 @@
*.json filter=lfs diff=lfs merge=lfs -text
tests/unit/metrics/test_cases/*.json -filter -diff -merge text
File renamed without changes.
@@ -0,0 +1,18 @@
[tool:pytest]
testpaths = .
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts =
    -v
    --tb=short
    --strict-markers
    --disable-warnings
markers =
    slow: marks tests as slow (deselect with '-m "not slow"')
    unit: marks tests as unit tests
    integration: marks tests as integration tests
    automated: marks tests as automated metric tests
filterwarnings =
    ignore::DeprecationWarning
    ignore::PendingDeprecationWarning
@@ -0,0 +1,104 @@
# MIT License

# Copyright (c) 2024 The HuggingFace Team

# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:

# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.

# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.

"""
Pytest integration for the automated metric testing framework.

This module provides pytest fixtures and test functions that can load and run
test cases from JSON files.
"""

import json
from pathlib import Path
from typing import List

import pytest
from test_metrics_automated import AutomatedMetricTester, MetricTestSuite


@pytest.fixture
def metric_tester():
    """Fixture providing an AutomatedMetricTester instance."""
    return AutomatedMetricTester()


def load_test_suite_from_file(file_path: str) -> MetricTestSuite:
    """Load a test suite from a JSON file."""
    with open(file_path, "r") as f:
        data = json.load(f)
    return MetricTestSuite(**data)


def get_test_suite_files() -> List[str]:
    """Get all test suite JSON files from the test_cases directory."""
    test_cases_dir = Path(__file__).parent / "test_cases"
    if not test_cases_dir.exists():
        return []

    json_files = list(test_cases_dir.glob("*.json"))
    return [str(f) for f in json_files]


def parametrize_test_suites():
    """Create parametrized test cases for all test suite files."""
    test_files = get_test_suite_files()
    if not test_files:
        pytest.skip("No test suite files found")

    return test_files


class TestAutomatedMetrics:
    """Test class for automated metric testing with pytest."""

    @pytest.mark.parametrize("test_file", parametrize_test_suites())
    def test_metric_suite(self, metric_tester, test_file):
        """Test a complete metric test suite from a JSON file."""
        test_suite = load_test_suite_from_file(test_file)

        # Run all test cases in the suite
        results = metric_tester.run_test_suite(test_suite)

        # Separate failed tests from skipped tests
        failed_tests = [r for r in results if not r["success"] and not r.get("skipped", False)]
        skipped_tests = [r for r in results if r.get("skipped", False)]

        if failed_tests:
            # Create detailed error message
            error_msg = f"Test suite '{test_suite.name}' failed with {len(failed_tests)} failed tests:\n"
            for result in failed_tests:
                error_msg += f"\n - {result['test_case']}: "
                if result["error"]:
                    error_msg += f"Error: {result['error']}"
                else:
                    error_msg += f"Expected {result['expected']}, got {result['actual']}"

            pytest.fail(error_msg)

        # Log skipped tests
        if skipped_tests:
            print(f"\nSkipped {len(skipped_tests)} tests in '{test_suite.name}':")
            for result in skipped_tests:
                print(f" - {result['test_case']}: {result.get('skip_reason', 'Unknown reason')}")

        # All non-skipped tests passed
        assert len(failed_tests) == 0, f"Expected all non-skipped tests to pass, but {len(failed_tests)} failed"
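The failed/skipped bookkeeping in `test_metric_suite` can be tried out on its own, without pytest or the metric framework. The sketch below is illustrative only: it assumes result dictionaries carry the keys the test reads (`success`, `skipped`, `skip_reason`, `test_case`, `error`, `expected`, `actual`). The helper names `partition_results` and `format_failures` and the sample data are hypothetical and not part of this PR.

```python
from typing import Any, Dict, List, Tuple

Result = Dict[str, Any]


def partition_results(results: List[Result]) -> Tuple[List[Result], List[Result]]:
    """Split results into (failed, skipped); a skipped result never counts as failed."""
    skipped = [r for r in results if r.get("skipped", False)]
    failed = [r for r in results if not r["success"] and not r.get("skipped", False)]
    return failed, skipped


def format_failures(suite_name: str, failed: List[Result]) -> str:
    """Build a multi-line failure report in the same spirit as the test's error message."""
    lines = [f"Test suite '{suite_name}' failed with {len(failed)} failed tests:"]
    for r in failed:
        if r.get("error"):
            detail = f"Error: {r['error']}"
        else:
            detail = f"Expected {r['expected']}, got {r['actual']}"
        lines.append(f" - {r['test_case']}: {detail}")
    return "\n".join(lines)


# Hypothetical results, invented for illustration only.
sample_results = [
    {"test_case": "case_ok", "success": True, "error": None, "expected": 1.0, "actual": 1.0},
    {"test_case": "case_bad", "success": False, "error": None, "expected": 1.0, "actual": 0.5},
    {"test_case": "case_skip", "success": False, "skipped": True, "skip_reason": "needs GPU"},
]

failed, skipped = partition_results(sample_results)
report = format_failures("demo_suite", failed)
print(report)
```

Because the skipped check is applied in both list comprehensions, a result marked `skipped` with `success` set to `False` is reported as skipped and does not fail the suite.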
do not use git-lfs for json files in the test_cases dir
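The PR text does not say why the test-case JSON files are kept out of Git LFS. One plausible reason (an assumption) is that a checkout without LFS content leaves small text pointer files in place of the real JSON, and `json.load` then fails on them. The sketch below shows how a loader could detect that situation. `looks_like_lfs_pointer` and `load_json_or_explain` are hypothetical helpers, not part of this PR, and the sample pointer's `oid` and `size` are made up.

```python
import json

# Git LFS pointer files begin with this version line.
LFS_POINTER_PREFIX = "version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(text: str) -> bool:
    """Return True if the text appears to be a Git LFS pointer rather than real content."""
    return text.lstrip().startswith(LFS_POINTER_PREFIX)


def load_json_or_explain(text: str):
    """Parse JSON, raising a clearer error when given an LFS pointer instead."""
    if looks_like_lfs_pointer(text):
        raise ValueError("Got a Git LFS pointer instead of JSON; the LFS content was not fetched")
    return json.loads(text)


# Made-up pointer content, for illustration only.
sample_pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:" + "0" * 64 + "\n"
    "size 1234\n"
)

print(looks_like_lfs_pointer(sample_pointer))
print(load_json_or_explain('{"name": "demo_suite"}'))
```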