Skip to content

Add ItemMatch + ItemNoMatch Descriptors#1338

Merged
emeli-dral merged 8 commits intoevidentlyai:mainfrom
jon-bown:1311/jonbown/item-match
Oct 14, 2024
Merged

Add ItemMatch + ItemNoMatch Descriptors#1338
emeli-dral merged 8 commits intoevidentlyai:mainfrom
jon-bown:1311/jonbown/item-match

Conversation

@jon-bown
Copy link
Contributor

@jon-bown jon-bown commented Oct 6, 2024

Implement ItemMatch, ItemNoMatch, descriptor + feature and associated unit tests.

Resolves issue #1311
Resolves issue #1310

@emeli-dral
Copy link
Contributor

Hi @jon-bown ,
Thanks for the contribution! It looks great, and I believe might need a small adjustment.

I ran an example based on your pytest for ItemMatch descriptor and it failed with TypeError, could you please take a look?

Here is an example:

import pandas as pd
from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import ItemMatch, ItemNoMatch

data = {
        "generated": [
            "You should consider purchasing Nike or Adidas shoes.",
            "I eat apples, grapes, and oranges",
            "grapes, oranges, apples.",
            "Oranges are more sour than grapes.",
            "This test doesn't have the words.",
        ],
        "forbidden": [
            ["nike", "adidas", "puma"],
            ["grapes", "apples", "oranges"],
            ["Apples", "Oranges", "Grapes"],
            ["orange", "sweet", "grape"],
            ["none", "of", "these"],
        ],
    }

df = pd.DataFrame(data)

report = Report(metrics=[
    TextEvals(column_name="generated", descriptors=[
        ItemMatch(with_column="forbidden", case_sensitive=False)
            ]
        )
])

report.run(reference_data=None, current_data=df)
report

@jon-bown
Copy link
Contributor Author

jon-bown commented Oct 7, 2024

Hi @jon-bown , Thanks for the contribution! It looks great, and I believe might need a small adjustment.

I ran an example based on your pytest for ItemMatch descriptor and it failed with TypeError, could you please take a look?

Here is an example:

import pandas as pd
from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import ItemMatch, ItemNoMatch

data = {
        "generated": [
            "You should consider purchasing Nike or Adidas shoes.",
            "I eat apples, grapes, and oranges",
            "grapes, oranges, apples.",
            "Oranges are more sour than grapes.",
            "This test doesn't have the words.",
        ],
        "forbidden": [
            ["nike", "adidas", "puma"],
            ["grapes", "apples", "oranges"],
            ["Apples", "Oranges", "Grapes"],
            ["orange", "sweet", "grape"],
            ["none", "of", "these"],
        ],
    }

df = pd.DataFrame(data)

report = Report(metrics=[
    TextEvals(column_name="generated", descriptors=[
        ItemMatch(with_column="forbidden", case_sensitive=False)
            ]
        )
])

report.run(reference_data=None, current_data=df)
report

Hi @emeli-dral, I was also seeing this in my unit tests but didn't realize it was related to the data definition not handling list types in the input data. I've added some type handling to the data_preprocessing.py file, let me know what you think?

@emeli-dral
Copy link
Contributor

First of all, I apologise for the delay.
You’ve correctly identified the issue: the improper handling of the list type in the current implementation. After reviewing this, we’ve decided that full support for the list type in the data definition will be handled separately, so there’s no need for you to implement that in this PR.

To resolve this, I suggest the following changes:

1/ Change from List to Tuple:
Please modify the items from being a list to a tuple. Since tuple is immutable, this will help ensure stability when dealing with feature data types.

2/ Revert Changes in data_preprocessing.py:
Revert any changes made to data_preprocessing.py related to list handling, as they will no longer be necessary after switching to tuples.

Thus we will expect users to explicitly create an item as a tuple rather than a list, which is a reasonable requirement. I'll make sure that this expectation is clearly documented for users.

Many thanks for your work on this!

@jon-bown
Copy link
Contributor Author

jon-bown commented Oct 9, 2024

First of all, I apologise for the delay. You’ve correctly identified the issue: the improper handling of the list type in the current implementation. After reviewing this, we’ve decided that full support for the list type in the data definition will be handled separately, so there’s no need for you to implement that in this PR.

To resolve this, I suggest the following changes:

1/ Change from List to Tuple: Please modify the items from being a list to a tuple. Since tuple is immutable, this will help ensure stability when dealing with feature data types.

2/ Revert Changes in data_preprocessing.py: Revert any changes made to data_preprocessing.py related to list handling, as they will no longer be necessary after switching to tuples.

Thus we will expect users to explicitly create an item as a tuple rather than a list, which is a reasonable requirement. I'll make sure that this expectation is clearly documented for users.

Many thanks for your work on this!

I've reverted the data_preprocessing.py file. As far as the changing from list to tuple, do you want me to modify anything in my feature implementation for this? The error previously was being thrown before the feature was called, so even if I did modify something that error would still be thrown unless they do what I'm doing in the unit tests.

@emeli-dral
Copy link
Contributor

emeli-dral commented Oct 11, 2024

Hi @jon-bown,

I took a look at the error, and luckily it’s a very minor issue!

The error occurs during the generation of the display_name in the following section of the code:

def _as_column(self) -> ColumnName:
    160     return self._create_column(
    161         self._feature_column_name(),
--> 162         default_display_name=f"Text Contains of {self.mode} [{', '.join(self.items)}] for {self.column_name}",
    163     )

The issue arises with {' , '.join(self.items)}, where it’s assumed that self.items is a class variable, but in this case, the items are actually in the second column.

To resolve this, I recommend simplifying the default_display_name to something like "Text contains defined items" and "Text does not contain defined items".

Additionally, I think the same fix could work for the other issue. How about using "Text contains defined words" and "Text does not contain defined words"?

Once these changes are made, we should be good to merge your PRs. Feel free to also add the new descriptors to the list of descriptors under the Text Evals heading on the documentation page.

Thanks again for your contributions!

@jon-bown
Copy link
Contributor Author

jon-bown commented Oct 12, 2024

Hi @emeli-dral

Thanks for catching that, I've updated the default_display_name in the _as_column function. Is there anything you think I should change for the _feature_column_name function? I went off of the other two features for the style, but let me know if what I have is ok! I will work on updating the docs for these features.

@emeli-dral
Copy link
Contributor

Hi @jon-bown,

Thanks a ton for all your hard work on this PR! You've done an awesome job, and your efforts have really paid off. I appreciate the attention to detail and how smoothly you handled everything.

Excited to merge this in!

@emeli-dral emeli-dral merged commit 634568b into evidentlyai:main Oct 14, 2024
@emeli-dral emeli-dral added the hacktoberfest Accepted contributions will count towards your hacktoberfest PRs label Oct 14, 2024
@elenasamuylova
Copy link
Contributor

Thanks for the contribution @jon-bown ! 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hacktoberfest Accepted contributions will count towards your hacktoberfest PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants