
Fix special token cleanup when using language models #5354

Merged: 5 commits merged into 1.8.x on Mar 3, 2020

Conversation

dakshvar22
Contributor

Proposed changes:

Status (please check what you already did):

  • reformat files using black (please check Readme for instructions)

@dakshvar22 dakshvar22 changed the base branch from master to 1.8.x March 2, 2020 17:22
@dakshvar22 dakshvar22 requested a review from tabergma March 2, 2020 17:23
Contributor

@tabergma tabergma left a comment

👍

def cleanup_tokens(
token_ids_string: List[Tuple[int, Text]], delimiter: Text
) -> Tuple[List[int], List[Text]]:
"""Utility method to apply specific delimiter based cleanup on list of tokens"""
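The signature suggests the function takes (id, string) pairs and strips a model-specific sub-word delimiter (e.g. "##" for BERT wordpieces) before unzipping into two parallel lists. A minimal sketch of that idea, under those assumptions; the actual Rasa implementation in this PR may handle the delimiter differently:

```python
from typing import List, Text, Tuple


def cleanup_tokens(
    token_ids_string: List[Tuple[int, Text]], delimiter: Text
) -> Tuple[List[int], List[Text]]:
    """Strip a sub-word delimiter from token strings and drop emptied tokens.

    Sketch only: delimiter handling here is an assumption, not Rasa's code.
    """
    # Remove the delimiter marker (e.g. "##") from every token string.
    cleaned = [
        (token_id, string.replace(delimiter, ""))
        for token_id, string in token_ids_string
    ]
    # Drop tokens whose string became empty after cleanup.
    cleaned = [(token_id, string) for token_id, string in cleaned if string]
    # Unpack into two parallel sequences and convert to the annotated list types.
    token_ids, token_strings = zip(*cleaned)
    return list(token_ids), list(token_strings)
```

For example, `cleanup_tokens([(101, "##hel"), (102, "lo")], "##")` yields `([101, 102], ["hel", "lo"])`.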
Contributor
We should use the full docstring syntax, e.g. including Args and Returns, but we already have another issue open for that.

Contributor Author

Thanks for pointing it out. Will address it as part of that issue.

Comment on lines +18 to +19
token_ids, token_strings = zip(*token_ids_string)
return token_ids, token_strings
Contributor

Suggested change
- token_ids, token_strings = zip(*token_ids_string)
- return token_ids, token_strings
+ return zip(*token_ids_string)

Contributor Author

@tabergma I get a type error if I do that, which is why I had to unpack it first and then return.
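The type error is plausible: in Python 3, `zip(...)` returns an iterator of tuples, not the `Tuple[List[int], List[Text]]` the signature promises, so returning it directly fails static type checking. A small standalone illustration of the mismatch and one way to satisfy the annotation (the exact fix used in the PR may differ):

```python
from typing import List, Text, Tuple

pairs: List[Tuple[int, Text]] = [(101, "hel"), (102, "lo")]

# zip(*pairs) transposes the pairs, but yields tuples, not lists:
token_ids, token_strings = zip(*pairs)
print(token_ids)      # (101, 102)
print(token_strings)  # ('hel', 'lo')

# To satisfy a Tuple[List[int], List[Text]] return type, convert explicitly:
result: Tuple[List[int], List[Text]] = (list(token_ids), list(token_strings))
print(result)  # ([101, 102], ['hel', 'lo'])
```

Returning `zip(*pairs)` unchanged would hand the caller a single iterator rather than the two lists the annotation advertises, which is what a checker like mypy flags.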

Contributor

ok, thanks for the explanation, then I guess it is fine

@dakshvar22 dakshvar22 merged commit 2d28eb8 into 1.8.x Mar 3, 2020
@dakshvar22 dakshvar22 deleted the bugfix_lm_token_cleanup branch March 3, 2020 10:53
Development

Successfully merging this pull request may close these issues.

Rasa 1.8.0 bug xlnet "entity_recognition": true
2 participants