Treat entities with same type and value but dissimilar bodies as different entities? #135

Closed
keshav47 opened this issue Mar 29, 2022 · 1 comment · Fixed by #138

@keshav47
Contributor

Describe the bug
Imagine this scenario:
We need to capture 10-digit mobile numbers uttered by the user from transcripts using ListEntityPlugin.

"alternatives": [
        [
            {
                "confidence": 0.9299549,
                "transcript": "my number is 9041270333"
            },
            {
                "confidence": 0.9299549,
                "transcript": "my claim is 1041270333"
            },
            {
                "confidence": 0.9299549,
                "transcript": "my claim is 1041270333"
            }
        ]
    ],

We pass the following pattern to ListEntityPlugin:

entity_patterns:
  number_pattern:
    mobile:
    - \b\d{10}\b
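
To see what this pattern matches on the alternatives above, here is a minimal sketch that uses only Python's `re` module. It does not go through ListEntityPlugin; the `transcripts` list and the variable names are assumptions that simply restate the example.

```python
import re
from collections import Counter

# Transcripts copied from the alternatives in this issue.
transcripts = [
    "my number is 9041270333",
    "my claim is 1041270333",
    "my claim is 1041270333",
]

# Same pattern as in entity_patterns: a standalone run of exactly 10 digits.
mobile_pattern = re.compile(r"\b\d{10}\b")

# Collect every match across all alternatives and count how often each occurs.
matches = [m for t in transcripts for m in mobile_pattern.findall(t)]
counts = Counter(matches)
print(counts)
```

The count shows that 1041270333 appears in two alternatives and 9041270333 in one, which is the information the consensus step would need to keep.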

Applying the current logic inside the entity_consensus function, i.e.

entity_type_value_group = py_.group_by(
    entities, lambda entity: (entity.type, entity.get_value())
)

we capture the following (entity type, value) group:
{('number_pattern', 'mobile'): [KeywordEntity(body='9041270333', type='number_pattern', parsers=['ListEntityPlugin'], score=0, alternative_index=0, alternative_indices=None, latent=False, value='mobile', entity_type='number_pattern', _meta={})]}

Because of this, we miss the other number, 1041270333, even though it occurs more often.
To include this value, we can replace the group-by logic above with this:

entity_type_value_group = py_.group_by(
    entities, lambda entity: (entity.body, entity.get_value())
)

This will capture the other entity as well:

{('9041270333', 'mobile'): [KeywordEntity(body='9041270333', type='number_pattern', parsers=['ListEntityPlugin'], score=0, alternative_index=0, alternative_indices=None, latent=False, value='mobile', entity_type='number_pattern', _meta={})], ('1041270333', 'mobile'): [KeywordEntity(body='1041270333', type='number_pattern', parsers=['ListEntityPlugin'], score=0, alternative_index=0, alternative_indices=None, latent=False, value='mobile', entity_type='number_pattern', _meta={})]}
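
The effect of the two grouping keys can be reproduced outside the library. The following is a minimal sketch, not the entity_consensus implementation: `Entity` is a hypothetical stand-in for KeywordEntity, `group_by` is a plain-Python substitute for `py_.group_by`, and the `alternative_index` values are assumed for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical stand-in for KeywordEntity with only the fields used here."""
    body: str
    type: str
    value: str
    alternative_index: int  # assumed values, for illustration only

    def get_value(self) -> str:
        return self.value


def group_by(entities, key):
    """Plain-Python substitute for py_.group_by: bucket items by key(item)."""
    groups = defaultdict(list)
    for entity in entities:
        groups[key(entity)].append(entity)
    return dict(groups)


# One entity per transcript alternative from the example in this issue.
entities = [
    Entity("9041270333", "number_pattern", "mobile", 0),
    Entity("1041270333", "number_pattern", "mobile", 1),
    Entity("1041270333", "number_pattern", "mobile", 2),
]

# Current key: every match shares ("number_pattern", "mobile"), so one group.
by_type_value = group_by(entities, lambda e: (e.type, e.get_value()))

# Proposed key: the matched text separates the two distinct numbers.
by_body_value = group_by(entities, lambda e: (e.body, e.get_value()))

print(len(by_type_value))
print({key: len(group) for key, group in by_body_value.items()})
```

With the (type, value) key all three matches fall into a single group, so the distinct numbers cannot be told apart; with the (body, value) key the number 1041270333 forms its own group of two.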

Disclaimer:
The group-by aggregation suggested above will not work with Datetime entities. Can someone suggest a better alternative that handles this edge case?

@keshav47 changed the title from "Petition to refactor, group_by(entity.type, entity.get_value()) with group_by(entity.body, entity.get_value()) inside entity_consensus functionality." to "Treat entities with same type and value but dissimilar bodies as different entities?" on Mar 29, 2022
@ltbringer
Contributor

I can support an expression in the candidate keys.
The current understanding is:

candidates = {
   "<entity_type>": {
       "<entity_value>": [
           ....
       ]
   }
}

Each pattern is assumed to take its parent key as its value, rather than the value parsed from the matched text.

By supporting:

candidates = {
   "<entity_type>": {
       "__value__": [
           ....
       ]
   }
}

we can let the plugin understand that the parsed value should be used instead.
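
One way to read this proposal is sketched below. It is an assumption about the intended behaviour, not the plugin's actual code: `extract` and `VALUE_PLACEHOLDER` are hypothetical names, and only the `__value__` key itself comes from the comment above.

```python
import re

# Placeholder key from the proposal; the constant name is hypothetical.
VALUE_PLACEHOLDER = "__value__"


def extract(candidates, transcript):
    """Hypothetical helper: return (entity_type, value, body) for each match."""
    found = []
    for entity_type, value_map in candidates.items():
        for value_key, patterns in value_map.items():
            for pattern in patterns:
                for match in re.finditer(pattern, transcript):
                    body = match.group()
                    # With the placeholder, the matched text becomes the value;
                    # otherwise the parent key is used, as in the current scheme.
                    value = body if value_key == VALUE_PLACEHOLDER else value_key
                    found.append((entity_type, value, body))
    return found


fixed = {"number_pattern": {"mobile": [r"\b\d{10}\b"]}}
parsed = {"number_pattern": {VALUE_PLACEHOLDER: [r"\b\d{10}\b"]}}

print(extract(fixed, "my claim is 1041270333"))
print(extract(parsed, "my claim is 1041270333"))
```

Under this reading, grouping by (type, value) would separate 9041270333 from 1041270333 without changing the group-by key, since the value now carries the matched number.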

ltbringer added a commit that referenced this issue Mar 29, 2022
* add: support for aggregation of numeric values.

* docs: content update.

* update: remove zen of python.