
Split training dataset from test dataset in evaluation #22

Open
dzieciou opened this issue Mar 11, 2021 · 2 comments

The section "Choosing stemming table" does not explain exactly how the evaluation was done.

I should differentiate between data used for training and data used for validation. Was the same data used for validation of both tables?
I would also like to repeat the evaluation using training data different from the validation data, just as Andrzej Białecki did in his original implementation, to account for the fact that a bigger training table may be overfitted and fail to handle new words (see my discussion in https://datascience.stackexchange.com/questions/84652/stemmer-or-dictionary).

In particular:

  • split the data into training and test sets
  • validate both on the test data alone and on training+test data (see the sketch below)
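
A minimal sketch of that protocol, assuming the dictionary is available as (word, stem) pairs. The `load_pairs` loader and `build_table` are hypothetical stand-ins for the real table-training code; only the split-and-score logic is concrete:

```python
import random

def train_test_split(pairs, test_ratio=0.2, seed=42):
    """Shuffle (word, stem) pairs and cut them into two disjoint sets."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * (1 - test_ratio))
    return pairs[:cut], pairs[cut:]

def accuracy(stem, pairs):
    """Fraction of pairs for which stem(word) matches the expected stem."""
    return sum(stem(word) == expected for word, expected in pairs) / len(pairs)

# pairs = load_pairs("polimorf.tab")       # hypothetical loader
# train, test = train_test_split(pairs)
# table = build_table(train)               # train on the training half only
# print("held-out words:", accuracy(table.stem, test))
# print("all words:     ", accuracy(table.stem, train + test))
```

Comparing the two numbers should expose overfitting: a table that merely memorizes its training data will score much higher on train+test than on the held-out test set.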

dzieciou commented Mar 13, 2021

  • How well does it handle words unknown during training? This is easy for the stemmer trained on the original dictionary: just use words from PoliMorf that are not in the original dictionary. But how to test this for the stemmer trained on the PoliMorf dictionary, if that is the most complete dictionary I know of? (See the sketch after this list.)
  • How well do they handle words known during training?
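
A sketch of that known/unknown breakdown, reusing `accuracy` from the earlier snippet; the loaders and table names are again illustrative:

```python
def split_by_training_vocab(pairs, training_words):
    """Partition (word, stem) pairs by whether the word was seen in training."""
    seen = [(w, s) for w, s in pairs if w in training_words]
    unseen = [(w, s) for w, s in pairs if w not in training_words]
    return seen, unseen

# polimorf_pairs = load_pairs("polimorf.tab")              # hypothetical loader
# training_words = {w for w, _ in original_dict_pairs}
# seen, unseen = split_by_training_vocab(polimorf_pairs, training_words)
# print("known words:  ", accuracy(table.stem, seen))
# print("unknown words:", accuracy(table.stem, unseen))
```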

dzieciou commented

Context sensitive stemming for web search
https://dl.acm.org/doi/10.1145/1277741.1277851
