Is your feature request related to a problem? Please describe.
The current ODSCategories keyword map has many contextually duplicate keywords to accommodate various forms of the same keyword, for example "school" and "schools" to accommodate the plural form of the word. This "duplication" adds unnecessary bulk to the mapping: it lengthens the time needed to categorise and makes the mapping more cumbersome to curate.
The ultimate aim of this task is to:
reduce the size of the category-keyword map,
while also maintaining or improving the volume of matched datasets having those keywords.
Describe the solution you'd like
Required: we can regex match keywords in plural form (an "s" suffix)
Required: ODSCategories_Keywords must retain the keyword as it appears in the mapping (not the variant found in the dataset)
Optional: consider other forms of plurals ("ies", "es")
Optional: consider plural forms in phrases (word groups)
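A minimal sketch of the two required points, to make the intent concrete. The map contents, `KEYWORD_MAP`, `keyword_pattern` and `match_keywords` are all hypothetical names for illustration, not the project's actual code. It assumes the map stores singular keywords and that matching is case-insensitive on whole words:

```python
import re

# Hypothetical keyword map: category -> singular keywords only.
KEYWORD_MAP = {
    "Education": ["school", "university"],
    "Transport": ["bus", "road"],
}


def keyword_pattern(keyword: str) -> re.Pattern:
    """Match the keyword as a whole word, with an optional "s" suffix."""
    return re.compile(r"\b" + re.escape(keyword) + r"s?\b", re.IGNORECASE)


def match_keywords(text: str, keyword_map: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per category, the map keywords found in the text.

    The returned value is always the keyword as written in the map,
    never the variant that appeared in the dataset text.
    """
    matches: dict[str, list[str]] = {}
    for category, keywords in keyword_map.items():
        found = [kw for kw in keywords if keyword_pattern(kw).search(text)]
        if found:
            matches[category] = found
    return matches


result = match_keywords("Primary Schools and roads in Glasgow", KEYWORD_MAP)
```

Here "Schools" and "roads" are recorded as "school" and "road". Note that this simple pattern does not catch "buses" or "universities"; those fall under the optional "es" / "ies" items above.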
Describe alternatives you've considered
Stemming. We could stem each word down to its root, but stemming might introduce all manner of complexities we may not need to handle just yet. Simple "s" suffix matches will be able to remove a whole chunk of our current keyword-category map - low-hanging fruit.
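As a rough illustration of how much a plain "s" suffix rule could trim from the map without any stemming, the sketch below drops a keyword only when its form without the trailing "s" is already listed. `collapse_plurals` and the sample list are hypothetical, not taken from the existing map:

```python
def collapse_plurals(keywords: list[str]) -> list[str]:
    """Drop keywords whose singular form (minus a trailing "s") is also listed."""
    present = set(keywords)
    return [
        kw for kw in keywords
        if not (kw.endswith("s") and kw[:-1] in present)
    ]


# Hypothetical sample: "schools" is redundant once "school" matches plurals.
sample = ["school", "schools", "bus", "class", "road", "roads"]
reduced = collapse_plurals(sample)
```

Words that merely end in "s", such as "bus" or "class", are kept because their truncated form is not in the list.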
Additional context
See relevant docs: How-to-modify-category-keywords