training a classifier should overwrite the .lex #484

Open
kordjamshidi opened this issue Aug 3, 2017 · 9 comments

kordjamshidi (Member) commented Aug 3, 2017

It seems that if the .lex file of a classifier was created earlier and exists in the default path, retraining the classifier adds features to the same lexicon; that is, the lexicon is not overwritten.
(We need tests for load, save, and for classifiers created from scratch. Related to #411.)
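For anyone hitting this, a minimal workaround sketch in Scala, assuming the model files sit in a default models/ directory; the directory, the classifier name, and the file names below are placeholders, not the actual defaults:

```scala
import java.nio.file.{Files, Paths}

// Hypothetical default model directory and classifier name; adjust to the real setup.
val modelDir = Paths.get("models")
val name     = "myClassifier"

// Delete the lexicon/model files left over from a previous run so that
// train() rebuilds the lexicon instead of appending to the old one.
Seq(s"$name.lex", s"$name.lc")
  .map(f => modelDir.resolve(f))
  .foreach(p => Files.deleteIfExists(p))
```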

kordjamshidi (Member Author)

@danyaljj do you have any comments on this?

danyaljj (Member) commented Aug 4, 2017

Just to clarify: are you saying that training a model writes to disk (the lexicon file) before/without calling save()?

kordjamshidi (Member Author)

No, with or without save() is not the issue. The issue is that when a .lex file already exists from a previous run, train() simply reuses it and adds new features to it, which makes the lexicon size explode as we run the app and call train() repeatedly (in separate, independent runs).

danyaljj (Member) commented Aug 4, 2017

I see. So you think we should always remove the lexicon file at the beginning of train()?

kordjamshidi (Member Author)

I expected it to be overwritten by default; we need a way to indicate whether we want to continue training or to train from scratch. Simply removing those files at the beginning of training would be problematic when we want to initialize models from existing .lex and .lc files.
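Something like the following is what I have in mind (just a sketch, not the current API; fromScratch, the model directory, and the file names are made-up placeholders):

```scala
import java.nio.file.{Files, Paths}

// Sketch of the expected behaviour; all names below are hypothetical placeholders.
object TrainFromScratchSketch {
  private val modelDir = Paths.get("models")
  private val name     = "myClassifier"

  // Drop any previously saved lexicon/model files for this classifier.
  private def removeModelFiles(): Unit =
    Seq(s"$name.lex", s"$name.lc")
      .map(f => modelDir.resolve(f))
      .foreach(p => Files.deleteIfExists(p))

  def train(iterations: Int, fromScratch: Boolean = true): Unit = {
    if (fromScratch) removeModelFiles() // rebuild the lexicon from scratch
    // else: keep the existing .lex/.lc and continue training on top of them
    // ... the usual learning loop would go here ...
  }
}
```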

danyaljj (Member) commented Aug 4, 2017

Right, I agree it's tricky.
We could ask the user at the beginning of training:

Do you want to remove existing model files? [Y/N]

What do you think?

kordjamshidi (Member Author)

Sounds good to me. @Rahgooy might have comments.

Rahgooy (Collaborator) commented Aug 4, 2017

I think that is fine for training a single model, but when we want to train multiple models, say in a loop, the user would have to wait for each model to finish training and then enter [Y/N]. IMO, the better option is to have it as a parameter or something similar.
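For the multi-model case, roughly something like this (again only a sketch; the classifier names, trainOne, and the flag are placeholders):

```scala
// Sketch: train several classifiers in a loop, controlling the overwrite
// behaviour with a flag instead of an interactive [Y/N] prompt.
val overwriteExisting = true
val classifierNames   = Seq("classifierA", "classifierB", "classifierC")

classifierNames.foreach { name =>
  if (overwriteExisting) {
    // remove name.lex / name.lc here before training (as discussed above)
  }
  // trainOne(name) // placeholder for the actual per-model training call
}
```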

kordjamshidi (Member Author) commented Aug 4, 2017

In fact, for joint training we have the init parameter: here
