
Fix special token cleanup when using language models #5354

Merged: 5 commits merged into 1.8.x on Mar 3, 2020

Conversation

dakshvar22
Contributor

Proposed changes:

Status (please check what you already did):

  • reformat files using black (please check Readme for instructions)

@dakshvar22 dakshvar22 changed the base branch from master to 1.8.x March 2, 2020 17:22
@dakshvar22 dakshvar22 requested a review from tabergma March 2, 2020 17:23
Contributor

@tabergma tabergma left a comment

👍

def cleanup_tokens(
token_ids_string: List[Tuple[int, Text]], delimiter: Text
) -> Tuple[List[int], List[Text]]:
"""Utility method to apply specific delimiter based cleanup on list of tokens"""
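The signature suggests the function takes (id, string) pairs and strips a model-specific sub-word delimiter (e.g. "##" for BERT wordpieces) before unzipping into two parallel lists. A minimal sketch of that idea, under those assumptions; the actual Rasa implementation in this PR may handle the delimiter differently:

```python
from typing import List, Text, Tuple


def cleanup_tokens(
    token_ids_string: List[Tuple[int, Text]], delimiter: Text
) -> Tuple[List[int], List[Text]]:
    """Strip a sub-word delimiter from token strings and drop emptied tokens.

    Sketch only: delimiter handling here is an assumption, not Rasa's code.
    """
    # Remove the delimiter marker (e.g. "##") from every token string.
    cleaned = [
        (token_id, string.replace(delimiter, ""))
        for token_id, string in token_ids_string
    ]
    # Drop tokens whose string became empty after cleanup.
    cleaned = [(token_id, string) for token_id, string in cleaned if string]
    # Unpack into two parallel sequences and convert to the annotated list types.
    token_ids, token_strings = zip(*cleaned)
    return list(token_ids), list(token_strings)
```

For example, `cleanup_tokens([(101, "##hel"), (102, "lo")], "##")` yields `([101, 102], ["hel", "lo"])`.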
Contributor
We should use the full docstring syntax, e.g. including Args and Returns, but we already have another issue open for that.

Contributor Author

Thanks for pointing it out. Will address it as part of that issue.

Comment on lines +18 to +19
token_ids, token_strings = zip(*token_ids_string)
return token_ids, token_strings
Contributor

Suggested change
- token_ids, token_strings = zip(*token_ids_string)
- return token_ids, token_strings
+ return zip(*token_ids_string)

Contributor Author

@tabergma I get a type error if I do that, which is why I had to unpack it first and then return.
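The type error is plausible: in Python 3, `zip(...)` returns an iterator of tuples, not the `Tuple[List[int], List[Text]]` the signature promises, so returning it directly fails static type checking. A small standalone illustration of the mismatch and one way to satisfy the annotation (the exact fix used in the PR may differ):

```python
from typing import List, Text, Tuple

pairs: List[Tuple[int, Text]] = [(101, "hel"), (102, "lo")]

# zip(*pairs) transposes the pairs, but yields tuples, not lists:
token_ids, token_strings = zip(*pairs)
print(token_ids)      # (101, 102)
print(token_strings)  # ('hel', 'lo')

# To satisfy a Tuple[List[int], List[Text]] return type, convert explicitly:
result: Tuple[List[int], List[Text]] = (list(token_ids), list(token_strings))
print(result)  # ([101, 102], ['hel', 'lo'])
```

Returning `zip(*pairs)` unchanged would hand the caller a single iterator rather than the two lists the annotation advertises, which is what a checker like mypy flags.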

Contributor

ok, thanks for the explanation, then I guess it is fine

@dakshvar22 dakshvar22 merged commit 2d28eb8 into 1.8.x Mar 3, 2020
@dakshvar22 dakshvar22 deleted the bugfix_lm_token_cleanup branch March 3, 2020 10:53
Development

Successfully merging this pull request may close these issues.

Rasa 1.8.0 bug xlnet "entity_recognition": true
2 participants