Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rasa test nlu produces wrong partitions on using the percentages argument #6120

Closed
AMR-KELEG opened this issue Jul 1, 2020 · 4 comments · Fixed by #6121
Closed

Rasa test nlu produces wrong partitions on using the percentages argument #6120

AMR-KELEG opened this issue Jul 1, 2020 · 4 comments · Fixed by #6121
Assignees
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@AMR-KELEG
Copy link
Contributor

AMR-KELEG commented Jul 1, 2020

Rasa version: rasa==1.10.4

Python version: Python 3.6.9

Operating system (windows, osx, ...): Ubuntu 18

Issue:
Currently rasa test nlu allows specifying percentages of the training dataset to be excluded while training using the -P flag. This isn't currently done correctly.
E.g: If the percentages were -P 50 25 then you would expect that the number of training samples would be 500 and 250 if the original training dataset size was 1000 samples.
BUT, the way the function currently work will generate number of training samples of 500 and 125 samples.

rasa/rasa/nlu/test.py

Lines 1848 to 1865 in 4830308

train, test = data.train_test_split()
io_utils.write_text_file(test.nlu_as_markdown(), test_path)
for percentage in exclusion_percentages:
percent_string = f"{percentage}%_exclusion"
_, train = train.train_test_split(percentage / 100)
# only count for the first run and ignore the others
if run == 0:
training_examples_per_run.append(len(train.training_examples))
model_output_path = os.path.join(run_path, percent_string)
train_split_path = os.path.join(model_output_path, "train")
train_nlu_split_path = os.path.join(train_split_path, TRAIN_DATA_FILE)
train_nlg_split_path = os.path.join(train_split_path, NLG_DATA_FILE)
io_utils.create_path(train_nlu_split_path)
io_utils.write_text_file(train.nlu_as_markdown(), train_nlu_split_path)
io_utils.write_text_file(train.nlg_as_markdown(), train_nlg_split_path)

Command or request that led to error:

rasa test nlu -c config.yml config_dup.yml -r 1 -p 50 25

I am willing to submit a PR to solve this so kindly assign the issue to me.

Thanks

@AMR-KELEG AMR-KELEG added area:rasa-oss 🎡 Anything related to the open source Rasa framework type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors. labels Jul 1, 2020
@sara-tagger
Copy link
Collaborator

Thanks for the issue, @koaning will get back to you about it soon!

You may find help in the docs and the forum, too 🤗

@koaning
Copy link
Contributor

koaning commented Jul 2, 2020

Just to check, are you using tensorboard in your runs?

@AMR-KELEG
Copy link
Contributor Author

Just to check, are you using tensorboard in your runs?

Nope.

@tabergma
Copy link
Contributor

tabergma commented Jul 3, 2020

@AMR-KELEG Thanks for catching this! Seem like you are correct. And thanks for create a PR already. We will give it a review soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:rasa-oss 🎡 Anything related to the open source Rasa framework type:bug 🐛 Inconsistencies or issues which will cause an issue or problem for users or implementors.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants