Is it possible to add words or correct words in the dictionary (or document it)? #93

Open
ojak opened this issue Feb 26, 2015 · 2 comments

ojak commented Feb 26, 2015

There are words that are missing or mis-identified by the language parser. Is there a way to add a word to the parsing dictionary? If not, what would be the best way to handle such cases?

For example, with default settings, the word _spicy_ is tagged as FW (foreign word):

> sentence("This is a spicy pepper.").apply(:tokenize, :category).words[3]
=> Word (70319169769500)  --- "spicy"  ---  {:tag=>"FW", :category=>"unknown"}   --- []
@louismullie
Owner

The best way would be to build a custom dictionary and search/replace for the specific words. Currently, you're using the default tagger (which is :lingua). You could also try the alternate taggers (:brill or :stanford). The specifics of each tagger are abstracted away from the interface, so adding a word to the parsing dictionary would require creating a base class for the taggers (https://github.com/louismullie/treat/tree/master/lib/treat/workers/lexicalizers/taggers) that handles an :override_tags option, and plugging it into the initialize methods of the child classes.
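
A minimal sketch of the search/replace idea, written in plain Ruby so it runs without the gem. Everything named here is an assumption for illustration: `OVERRIDE_TAGS`, `TaggedWord`, and `apply_tag_overrides` are hypothetical and not part of Treat, and the input tags other than the FW on "spicy" are made up as sample data. The idea is simply to run the tagger as usual, then correct known-bad tags from a lookup table.

```ruby
# Hypothetical custom dictionary: lowercase word => corrected Penn Treebank tag.
OVERRIDE_TAGS = {
  'spicy' => 'JJ' # adjective, instead of FW (foreign word)
}.freeze

# Stand-in for a tagged word. With Treat, this role would be played by
# the word entities returned from the sentence.
TaggedWord = Struct.new(:value, :tag)

# Returns new TaggedWord objects, replacing the tag wherever the
# override table has an entry for the (case-insensitive) word.
def apply_tag_overrides(words, overrides = OVERRIDE_TAGS)
  words.map do |word|
    corrected = overrides.fetch(word.value.downcase, word.tag)
    TaggedWord.new(word.value, corrected)
  end
end

# Illustrative input only; these are not real tagger results.
tagged = [
  TaggedWord.new('a', 'DT'),
  TaggedWord.new('spicy', 'FW'),
  TaggedWord.new('pepper', 'NN')
]

apply_tag_overrides(tagged).each do |word|
  puts "#{word.value}: #{word.tag}"
end
```

Anything derived from the tag, such as the category, would have to be recomputed after the override. An :override_tags option on a shared tagger base class would apply the same lookup inside the tagger rather than as a post-processing pass.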

ojak commented Feb 26, 2015

OK, thanks. I'll look into that approach and let you know how it goes.
