8 changes: 3 additions & 5 deletions .github/workflows/main.yml
@@ -115,9 +115,7 @@ jobs:
- name: Install minimal dependencies
run: pip install -r requirements.min.txt
- name: Install package
run: pip install -e .[dev,spark,fsspec]
- name: Run pip-audit
run: pip-audit --ignore-vuln PYSEC-2024-48 --ignore-vuln GHSA-jw8x-6495-233v --ignore-vuln GHSA-4hq2-rpgc-r8r7
run: pip install -e .[dev,spark,fsspec,llm]
- name: Run Tests
run: python -m pytest --durations=50
test:
@@ -155,7 +153,7 @@ jobs:
uses: ./.github/share-actions/get-bikes-dataset-cached

- name: Install package
run: pip install -e .[dev,spark,fsspec]
run: pip install -e .[dev,spark,fsspec,llm]
- name: Run Tests
run: python -m pytest --durations=50

@@ -173,7 +171,7 @@ jobs:
cache: "pip"
cache-dependency-path: setup.py
- name: Install dependencies
run: pip install -e ".[dev]"
run: pip install -e .
- name: Install wheel
run: pip install wheel
- name: Build package
6 changes: 3 additions & 3 deletions .github/workflows/ui.yml
@@ -151,8 +151,8 @@ jobs:
uses: ./.github/share-actions/ui-node-pnpm-install

- name: Install Playwright Browsers
working-directory: ui
run: pnpm dlx playwright@1.43.0 install --with-deps
working-directory: ui/service
run: pnpm exec playwright install --with-deps chromium

- name: 🔍 Get bikes dataset cached
uses: ./.github/share-actions/get-bikes-dataset-cached
@@ -162,7 +162,7 @@

- name: Wait UI to be ready to test
working-directory: ui/service
run: pnpm wait-on tcp:127.0.0.1:8000 -t 200000
run: pnpm wait-on tcp:127.0.0.1:8000 -t 4m

- name: Run Service Playwright tests
working-directory: ui/service
3 changes: 3 additions & 0 deletions docs/book/reference/all-metrics.md
@@ -272,6 +272,9 @@ Check for regular expression matches.
| **DoesNotContain()** <ul><li>Checks if the text does not contain any or all specified items. </li><li> Returns True/False for every input. </li></ul> Example use:<br> `DoesNotContain(items=["as a large language model"])` | **Required:** <br> `items: List[str]` <br><br>**Optional:**<ul><li>`display_name`</li><li>`mode = 'all'` or `'any'`</li><li>`case_sensitive = True` or `False`</li></ul> |
| **IncludesWords()** <ul><li> Checks if the text includes **any** (default) or **all** specified words. </li><li> Considers only vocabulary words (from NLTK vocabulary). </li><li> By default, considers inflected and variant forms of the same word. </li><li> Returns True/False for every input. </li></ul> Example use:<br> `IncludesWords(words_list=['booking', 'hotel', 'flight'])` | **Required:** <br> `words_list: List[str]` <br><br>**Optional:**<ul><li>`display_name`</li><li>`mode = 'any'` or `'all'`</li><li>`lemmatize = True` or `False`</li></ul> |
| **ExcludesWords()** <ul><li>Checks if the text excludes all specified words.</li><li> Considers only vocabulary words (from NLTK vocabulary). </li><li>By default, considers inflected and variant forms of the same word. </li><li>Returns True/False for every input. </li></ul> Example use:<br> `ExcludesWords(words_list=['buy', 'sell', 'bet'])`| **Required:** <br>`words_list: List[str]` <br><br>**Optional:**<ul><li>`display_name`</li><li>`mode = 'all'` or `'any'`</li><li>`lemmatize = True` or `False`</li></ul> |
| **ItemMatch()** <ul><li>Checks whether the text contains **any** (default) or **all** specified items that are specific to each row (represented as tuples). </li><li>Returns True/False for each row. </li></ul> Example use:<br> `ItemMatch(with_column="expected")`| **Required:** <br>`with_column: str`<br><br>**Optional:**<ul><li>`display_name`</li><li>`mode = 'all'` or `'any'`</li><li>`case_sensitive = True` or `False`</li></ul> |
| **ItemNoMatch()** <ul><li>Checks whether the text excludes **any** (default) or **all** specified items that are specific to each row (represented as tuples). </li><li>Returns True/False for each row. </li></ul> Example use:<br> `ItemNoMatch(with_column="forbidden")`| **Required:** <br>`with_column: str`<br><br>**Optional:**<ul><li>`display_name`</li><li>`mode = 'all'` or `'any'`</li><li>`case_sensitive = True` or `False`</li></ul> |
| **JSONSchemaMatch()** <ul><li>Checks if the text contains a JSON object matching the **expected_schema**. Supports exact (**exact_match=True**) or minimal (**exact_match=False**) matching, with optional strict type validation (**validate_types=True**). </li><li>Returns True/False for each row. </li></ul> Example use:<br> `JSONSchemaMatch(expected_schema={"name": str, "age": int}, exact_match=False, validate_types=True)`| **Required:** <br>`expected_schema: Dict[str, type]`<br><br>**Optional:**<ul><li>`exact_match = True` or `False`</li><li>`validate_types = True` or `False`</li></ul> |
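The row-wise matching rule that `ItemMatch` and `ItemNoMatch` document can be illustrated with a short, self-contained sketch. This is an illustrative reimplementation of the behavior the table describes, not Evidently's internal code; `item_match` and the sample `rows` are hypothetical names:

```python
# Illustrative sketch of per-row item matching: each row carries its own
# tuple of items (as ItemMatch reads them from `with_column`).
def item_match(text: str, row_items: tuple, mode: str = "any", case_sensitive: bool = True) -> bool:
    """True if `text` contains any/all of this row's items."""
    if not case_sensitive:
        text = text.lower()
        row_items = tuple(item.lower() for item in row_items)
    found = (item in text for item in row_items)
    return all(found) if mode == "all" else any(found)


rows = [
    ("the flight was on time", ("flight", "hotel")),
    ("no relevant words here", ("flight", "hotel")),
]
print([item_match(text, items) for text, items in rows])  # [True, False]
```

`ItemNoMatch` reports the opposite check: that the text excludes the row's items.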

## Descriptors: Text stats

66 changes: 66 additions & 0 deletions examples/data_generators.py
@@ -0,0 +1,66 @@
from evidently.experimental.dataset_generators.llm.index import DataCollectionProvider
from evidently.experimental.dataset_generators.llm.questions import QADatasetFromSeedGenerator
from evidently.experimental.dataset_generators.llm.questions import QADatasetGenerator
from evidently.options.base import Options


def print_qa(generated):
    """Print the generated question/answer/context rows."""
    for _, row in generated.iterrows():
        print("Q", row["questions"])
        if "answers" in row:
            print("A", row["answers"])
        if "context" in row:
            print("C", row["context"])
        print()


def generate_from_file():
    file_path = "../cloud_quickstart_tracing.pdf"
    data = DataCollectionProvider.from_files(file_path, chunk_size=50, chunk_overlap=20, splitter="simple")

    generator = QADatasetGenerator(
        data_collection=data,
        provider="openai",
        model="gpt-4o-mini",
        num_questions=5,
        options=Options.from_any_options(None),
    )
    print_qa(generator.generate())


def main():
    data = DataCollectionProvider.from_chunks(chunks=["I am a banana", "My spoon is too big"])
    generator = QADatasetGenerator(
        data_collection=data,
        provider="openai",
        model="gpt-4o-mini",
        num_questions=5,
        options=Options.from_any_options(None),
    )
    print_qa(generator.generate())

    generator = QADatasetFromSeedGenerator(
        seed_question="What is 'kek'?",
        num_questions=5,
        provider="openai",
        model="gpt-4o-mini",
        options=Options.from_any_options(None),
    )
    print_qa(generator.generate())


if __name__ == "__main__":
    main()
    # generate_from_file()
@@ -0,0 +1,117 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Evidently Dataset ROUGE Summary Metric"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"from evidently.report import Report\n",
"from evidently.metrics import ROUGESummaryMetric"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"current_data = {\n",
" \"summary\": [\"hello there\", \"general kenobi\"],\n",
"}\n",
"\n",
"reference_data = {\n",
" \"summary\": [\"hello there\", \"no de\"]\n",
"}\n",
"\n",
"current_df = pd.DataFrame(current_data)\n",
"reference_df = pd.DataFrame(reference_data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"report = Report(metrics=[\n",
" ROUGESummaryMetric(column_name=\"summary\", rouge_n=2)\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"report.run(current_data=current_df, reference_data=reference_df)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"report.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"report.as_dict()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"report.as_dataframe()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.19"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
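The notebook above exercises `ROUGESummaryMetric`, which relies on the `rouge-score` package. As a quick intuition for what ROUGE-N measures, here is a hand-rolled sketch of bigram-overlap F1; `rouge_n` and `ngrams` are illustrative names, not Evidently or rouge-score APIs:

```python
# Illustrative ROUGE-N sketch (N=2 by default): F1 over n-gram overlap
# between a candidate summary and a reference summary.
from collections import Counter


def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n(candidate: str, reference: str, n: int = 2) -> float:
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    if not cand or not ref:
        return 0.0
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(rouge_n("hello there", "no de"))  # 0.0, as in the notebook's data
```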
@@ -116,7 +116,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.13"
"version": "3.8.19"
}
},
"nbformat": 4,
1 change: 1 addition & 0 deletions requirements.dev.txt
@@ -16,6 +16,7 @@ pip-audit
pyspark
ruff==0.3.7
pre-commit==3.5.0
evaluate==0.4.1

# service dependencies
litestar>=2.7.1
2 changes: 2 additions & 0 deletions requirements.min.txt
@@ -31,3 +31,5 @@ openai==1.16.2
evaluate==0.4.1
transformers[torch]==4.39.3
sentence-transformers==2.7.0
rouge-score==0.1.2
chromadb==0.4.0
9 changes: 9 additions & 0 deletions setup.cfg
@@ -106,6 +106,15 @@ ignore_missing_imports = True
[mypy-litellm.*]
ignore_missing_imports = True

[mypy-chromadb.*]
ignore_missing_imports = True

[mypy-llama_index.*]
ignore_missing_imports = True

[mypy-pypdf.*]
ignore_missing_imports = True

[tool:pytest]
testpaths=tests
python_classes=*Test
4 changes: 4 additions & 0 deletions setup.py
@@ -76,6 +76,7 @@
"deprecation>=2.1.0",
"uuid6>=2024.7.10",
"cryptography>=43.0.1",
"evaluate>=0.4.1",
],
extras_require={
"dev": [
@@ -96,12 +97,15 @@
"ruff==0.3.7",
"pre-commit==3.5.0",
"pytest-asyncio==0.23.7",
"evaluate>=0.4.1",
],
"llm": [
"openai>=1.16.2",
"evaluate>=0.4.1",
"transformers[torch]>=4.39.3",
"sentence-transformers>=2.7.0",
"rouge-score>=0.1.2",
"chromadb>=0.4.0",
],
"spark": ["pyspark>=3.4.0"],
"fsspec": [
6 changes: 6 additions & 0 deletions src/evidently/descriptors/__init__.py
@@ -3,6 +3,7 @@
from .custom_descriptor import CustomPairColumnEval
from .hf_descriptor import HuggingFaceModel
from .hf_descriptor import HuggingFaceToxicityModel
from .json_schema_match_descriptor import JSONSchemaMatch
from .llm_judges import BiasLLMEval
from .llm_judges import ContextQualityLLMEval
from .llm_judges import DeclineLLMEval
@@ -19,6 +20,8 @@
from .sentiment_descriptor import Sentiment
from .text_contains_descriptor import Contains
from .text_contains_descriptor import DoesNotContain
from .text_contains_descriptor import ItemMatch
from .text_contains_descriptor import ItemNoMatch
from .text_length_descriptor import TextLength
from .text_part_descriptor import BeginsWith
from .text_part_descriptor import EndsWith
@@ -47,6 +50,8 @@
"EndsWith",
"DoesNotContain",
"IncludesWords",
"ItemMatch",
"ItemNoMatch",
"ExcludesWords",
"TextLength",
"TriggerWordsPresence",
@@ -55,5 +60,6 @@
"SentenceCount",
"Sentiment",
"RegExp",
"JSONSchemaMatch",
"_registry",
]
11 changes: 11 additions & 0 deletions src/evidently/descriptors/_registry.py
@@ -15,6 +15,11 @@
"evidently.descriptors.hf_descriptor.HuggingFaceToxicityModel",
"evidently:descriptor:HuggingFaceToxicityModel",
)
register_type_alias(
    FeatureDescriptor,
    "evidently.descriptors.json_schema_match_descriptor.JSONSchemaMatch",
    "evidently:descriptor:JSONSchemaMatch",
)
register_type_alias(
FeatureDescriptor, "evidently.descriptors.llm_judges.BiasLLMEval", "evidently:descriptor:BiasLLMEval"
)
@@ -72,6 +77,12 @@
"evidently.descriptors.text_contains_descriptor.DoesNotContain",
"evidently:descriptor:DoesNotContain",
)
register_type_alias(
    FeatureDescriptor, "evidently.descriptors.text_contains_descriptor.ItemMatch", "evidently:descriptor:ItemMatch"
)
register_type_alias(
    FeatureDescriptor, "evidently.descriptors.text_contains_descriptor.ItemNoMatch", "evidently:descriptor:ItemNoMatch"
)
register_type_alias(
FeatureDescriptor, "evidently.descriptors.text_length_descriptor.TextLength", "evidently:descriptor:TextLength"
)
23 changes: 23 additions & 0 deletions src/evidently/descriptors/json_schema_match_descriptor.py
@@ -0,0 +1,23 @@
from typing import Dict

from evidently.features import json_schema_match_feature
from evidently.features.generated_features import FeatureDescriptor
from evidently.features.generated_features import GeneratedFeature


class JSONSchemaMatch(FeatureDescriptor):
    class Config:
        type_alias = "evidently:descriptor:JSONSchemaMatch"

    expected_schema: Dict[str, type]
    validate_types: bool = False
    exact_match: bool = False

    def feature(self, column_name: str) -> GeneratedFeature:
        return json_schema_match_feature.JSONSchemaMatch(
            column_name=column_name,
            expected_schema=self.expected_schema,
            validate_types=self.validate_types,
            exact_match=self.exact_match,
            display_name=self.display_name,
        )
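The descriptor delegates to `json_schema_match_feature.JSONSchemaMatch`. A rough sketch of the semantics its parameters suggest — minimal vs. exact key matching, plus optional type validation — might look like the following. This is a hedged illustration with a hypothetical `json_schema_match` helper, not the feature's actual implementation:

```python
import json
from typing import Dict


def json_schema_match(text: str, expected_schema: Dict[str, type],
                      exact_match: bool = False, validate_types: bool = False) -> bool:
    """True if `text` parses as a JSON object matching the expected schema."""
    try:
        obj = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return False
    if not isinstance(obj, dict):
        return False
    # exact matching: the key sets must be identical, not just a superset
    if exact_match and set(obj) != set(expected_schema):
        return False
    for key, expected_type in expected_schema.items():
        if key not in obj:
            return False
        if validate_types and not isinstance(obj[key], expected_type):
            return False
    return True


print(json_schema_match('{"name": "Ann", "age": 3}', {"name": str, "age": int}))  # True
```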