Add ItemMatch + ItemNoMatch Descriptors by jon-bown · Pull Request #1338 · evidentlyai/evidently

jon-bown · 2024-10-06T20:53:11Z

Implement ItemMatch, ItemNoMatch, descriptor + feature and associated unit tests.

Resolves issue #1311
Resolves issue #1310

emeli-dral · 2024-10-07T16:05:51Z

Hi @jon-bown ,
Thanks for the contribution! It looks great, and I believe might need a small adjustment.

I ran an example based on your pytest for ItemMatch descriptor and it failed with TypeError, could you please take a look?

Here is an example:

import pandas as pd
from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import ItemMatch, ItemNoMatch

data = {
        "generated": [
            "You should consider purchasing Nike or Adidas shoes.",
            "I eat apples, grapes, and oranges",
            "grapes, oranges, apples.",
            "Oranges are more sour than grapes.",
            "This test doesn't have the words.",
        ],
        "forbidden": [
            ["nike", "adidas", "puma"],
            ["grapes", "apples", "oranges"],
            ["Apples", "Oranges", "Grapes"],
            ["orange", "sweet", "grape"],
            ["none", "of", "these"],
        ],
    }

df = pd.DataFrame(data)

report = Report(metrics=[
    TextEvals(column_name="generated", descriptors=[
        ItemMatch(with_column="forbidden", case_sensitive=False)
            ]
        )
])

report.run(reference_data=None, current_data=df)
report

jon-bown · 2024-10-07T17:49:36Z

Hi @jon-bown , Thanks for the contribution! It looks great, and I believe might need a small adjustment.

I ran an example based on your pytest for ItemMatch descriptor and it failed with TypeError, could you please take a look?

Here is an example:

import pandas as pd
from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import ItemMatch, ItemNoMatch

data = {
        "generated": [
            "You should consider purchasing Nike or Adidas shoes.",
            "I eat apples, grapes, and oranges",
            "grapes, oranges, apples.",
            "Oranges are more sour than grapes.",
            "This test doesn't have the words.",
        ],
        "forbidden": [
            ["nike", "adidas", "puma"],
            ["grapes", "apples", "oranges"],
            ["Apples", "Oranges", "Grapes"],
            ["orange", "sweet", "grape"],
            ["none", "of", "these"],
        ],
    }

df = pd.DataFrame(data)

report = Report(metrics=[
    TextEvals(column_name="generated", descriptors=[
        ItemMatch(with_column="forbidden", case_sensitive=False)
            ]
        )
])

report.run(reference_data=None, current_data=df)
report

Hi @emeli-dral, I was also seeing this in my unit tests but didn't realize it was related to the data definition not handling list types in the input data. I've added some type handling to the data_preprocessing.py file, let me know what you think?

emeli-dral · 2024-10-09T17:57:18Z

First of all, I apologise for the delay.
You’ve correctly identified the issue: the improper handling of the list type in the current implementation. After reviewing this, we’ve decided that full support for the list type in the data definition will be handled separately, so there’s no need for you to implement that in this PR.

To resolve this, I suggest the following changes:

1/ Change from List to Tuple:
Please modify the items from being a list to a tuple. Since tuple is immutable, this will help ensure stability when dealing with feature data types.

2/ Revert Changes in data_preprocessing.py:
Revert any changes made to data_preprocessing.py related to list handling, as they will no longer be necessary after switching to tuples.

Thus we will expect users to explicitly create an item as a tuple rather than a list, which is a reasonable requirement. I'll make sure that this expectation is clearly documented for users.

Many thanks for your work on this!

jon-bown · 2024-10-09T20:46:46Z

First of all, I apologise for the delay. You’ve correctly identified the issue: the improper handling of the list type in the current implementation. After reviewing this, we’ve decided that full support for the list type in the data definition will be handled separately, so there’s no need for you to implement that in this PR.

To resolve this, I suggest the following changes:

1/ Change from List to Tuple: Please modify the items from being a list to a tuple. Since tuple is immutable, this will help ensure stability when dealing with feature data types.

2/ Revert Changes in data_preprocessing.py: Revert any changes made to data_preprocessing.py related to list handling, as they will no longer be necessary after switching to tuples.

Thus we will expect users to explicitly create an item as a tuple rather than a list, which is a reasonable requirement. I'll make sure that this expectation is clearly documented for users.

Many thanks for your work on this!

I've reverted the data_preprocessing.py file. As far as the changing from list to tuple, do you want me to modify anything in my feature implementation for this? The error previously was being thrown before the feature was called, so even if I did modify something that error would still be thrown unless they do what I'm doing in the unit tests.

emeli-dral · 2024-10-11T13:10:15Z

Hi @jon-bown,

I took a look at the error, and luckily it’s a very minor issue!

The error occurs during the generation of the display_name in the following section of the code:

def _as_column(self) -> ColumnName:
    160     return self._create_column(
    161         self._feature_column_name(),
--> 162         default_display_name=f"Text Contains of {self.mode} [{', '.join(self.items)}] for {self.column_name}",
    163     )

The issue arises with {' , '.join(self.items)}, where it’s assumed that self.items is a class variable, but in this case, the items are actually in the second column.

To resolve this, I recommend simplifying the default_display_name to something like "Text contains defined items" and "Text does not contain defined items".

Additionally, I think the same fix could work for the other issue. How about using "Text contains defined words" and "Text does not contain defined words"?

Once these changes are made, we should be good to merge your PRs. Feel free to also add the new descriptors to the list of descriptors under the Text Evals heading on the documentation page.

Thanks again for your contributions!

jon-bown · 2024-10-12T04:01:05Z

Hi @emeli-dral

Thanks for catching that, I've updated the default_display_name in the _as_column function. Is there anything you think I should change for the _feature_column_name function? I went off of the other two features for the style, but let me know if what I have is ok! I will work on updating the docs for these features.

emeli-dral · 2024-10-14T15:42:34Z

Hi @jon-bown,

Thanks a ton for all your hard work on this PR! You've done an awesome job, and your efforts have really paid off. I appreciate the attention to detail and how smoothly you handled everything.

Excited to merge this in!

elenasamuylova · 2024-10-14T16:01:03Z

Thanks for the contribution @jon-bown ! 🚀

jon-bown added 2 commits October 6, 2024 14:40

initial commit

5bd46c7

add unit tests

4e86c7b

Add nunique handler for list type

151f953

jon-bown added 2 commits October 9, 2024 14:39

revert data_preprocessing change

99041fa

Update unit tests to convert to tuples first

53ec168

Revised default display name for generated column

cad734f

jon-bown added 2 commits October 12, 2024 13:58

Add ItemMatch/NoMatch to docs

c3e9925

Clarify mode parameter in docs

9795d13

emeli-dral merged commit 634568b into evidentlyai:main Oct 14, 2024

emeli-dral added the hacktoberfest Accepted contributions will count towards your hacktoberfest PRs label Oct 14, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ItemMatch + ItemNoMatch Descriptors#1338

Add ItemMatch + ItemNoMatch Descriptors#1338
emeli-dral merged 8 commits intoevidentlyai:mainfrom
jon-bown:1311/jonbown/item-match

jon-bown commented Oct 6, 2024 •

edited

Loading

Uh oh!

emeli-dral commented Oct 7, 2024

Uh oh!

jon-bown commented Oct 7, 2024 •

edited

Loading

Uh oh!

emeli-dral commented Oct 9, 2024

Uh oh!

jon-bown commented Oct 9, 2024

Uh oh!

emeli-dral commented Oct 11, 2024 •

edited

Loading

Uh oh!

jon-bown commented Oct 12, 2024 •

edited

Loading

Uh oh!

emeli-dral commented Oct 14, 2024

Uh oh!

elenasamuylova commented Oct 14, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jon-bown commented Oct 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

emeli-dral commented Oct 7, 2024

Uh oh!

jon-bown commented Oct 7, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

emeli-dral commented Oct 9, 2024

Uh oh!

jon-bown commented Oct 9, 2024

Uh oh!

emeli-dral commented Oct 11, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jon-bown commented Oct 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

emeli-dral commented Oct 14, 2024

Uh oh!

elenasamuylova commented Oct 14, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jon-bown commented Oct 6, 2024 •

edited

Loading

jon-bown commented Oct 7, 2024 •

edited

Loading

emeli-dral commented Oct 11, 2024 •

edited

Loading

jon-bown commented Oct 12, 2024 •

edited

Loading