There are words that are missing or mis-identified by the language parser. Is there a way to add a word to the parsing dictionary? If not, what would be the best way to handle such cases?
For example, with default settings, the word _spicy_ is tagged as FW (foreign word):
> sentence("This is a spicy pepper.").apply(:tokenize,:category).words[3]
=> Word(70319169769500) --- "spicy" --- {:tag=>"FW",:category=>"unknown"} --- []
ojak changed the title from "Is it possible to add or correct words to the dictionary (or add docs for how to do it?" to "Is it possible to add words or correct words in the dictionary (or document it)?" on Feb 26, 2015
The best way would be to build a custom dictionary and search/replace for the specific words. Currently, you're using the default tagger (which is :lingua). You could also try alternate taggers (:brill or :stanford). The specifics of each tagger are abstracted away from the interface, so adding a word to the parsing dictionary would require creating a base class for the taggers (https://github.com/louismullie/treat/tree/master/lib/treat/workers/lexicalizers/taggers) that handles an :override_tags option, and plugging it into the initialize methods of the child classes.
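To make the custom-dictionary suggestion concrete, here is a minimal sketch of the search/replace step in plain Ruby. It does not call Treat at all: `OVERRIDE_TAGS`, `TaggedWord`, and `apply_tag_overrides` are hypothetical names invented for this illustration, and the tags on the other words are assumed values rather than real tagger output. The idea is only that, after tagging, each word is looked up in a hand-maintained table and its tag is replaced when an entry exists.

```ruby
# Hypothetical override table (not part of Treat):
# lowercase word => corrected Penn Treebank tag.
OVERRIDE_TAGS = {
  'spicy' => 'JJ' # adjective
}.freeze

# Stand-in for a tagged word. A real Treat word object is not used here.
TaggedWord = Struct.new(:value, :tag)

# Returns a new list in which every word found in the override table
# carries the corrected tag; all other words keep the tag they came with.
def apply_tag_overrides(words, overrides = OVERRIDE_TAGS)
  words.map do |word|
    corrected = overrides.fetch(word.value.downcase, word.tag)
    TaggedWord.new(word.value, corrected)
  end
end

# Assumed tagger output for the sentence in the question; only the
# "FW" tag on "spicy" comes from the original report.
tagged = [
  TaggedWord.new('This', 'DT'),
  TaggedWord.new('is', 'VBZ'),
  TaggedWord.new('a', 'DT'),
  TaggedWord.new('spicy', 'FW'),
  TaggedWord.new('pepper', 'NN')
]

fixed = apply_tag_overrides(tagged)
fixed.each { |w| puts "#{w.value}\t#{w.tag}" }
```

Applying the overrides after tagging keeps the table independent of which tagger (:lingua, :brill, or :stanford) produced the tags, which is roughly what a shared :override_tags option would have to do inside the library.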