[Persian][Numeral] Extract incorrect number from text #713

Open
AmirMohamadBabaee opened this issue Feb 8, 2023 · 1 comment

Comments

@AmirMohamadBabaee

While checking the numeral dimension, I found a problem with the Persian numeral extractor.
In Persian, صدرا is a first name and صد means "hundred". When the name is fed into Duckling, the output is 100را, which is incorrect. The same thing happens with the verb بده ("give"), which is converted to ب10 because ده means "ten" in Persian.
How can I configure Duckling so that these rules apply only to space-separated tokens?
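
To make the failure mode concrete, here is a minimal sketch in Python that reproduces the same behaviour with an unanchored alternation. It is not Duckling code: the `NUMERALS` table and `naive_replace` function are hypothetical stand-ins, and Python's `re` module is used only to illustrate substring matching.

```python
import re

# Hypothetical two-entry stand-in for a Persian numeral rule (not Duckling's real pattern).
NUMERALS = {"صد": 100, "ده": 10}

# Unanchored alternation: it matches the numeral words anywhere, even inside longer words.
pattern = re.compile("|".join(NUMERALS))

def naive_replace(text: str) -> str:
    """Replace every numeral substring with its value, ignoring word boundaries."""
    return pattern.sub(lambda m: str(NUMERALS[m.group(0)]), text)

print(naive_replace("صدرا"))  # the name comes back with its first two letters replaced by 100
print(naive_replace("بده"))   # the verb comes back with its last two letters replaced by 10
```

Because the pattern has no notion of where a word starts or ends, any word that happens to contain a numeral word is rewritten.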

@AmirMohamadBabaee changed the title from "extract incorrect number from text in Persian" to "[Persian][Numeral] Extract incorrect number from text" on Feb 8, 2023
@abolfazlakbary

abolfazlakbary commented Nov 30, 2024

Hello @AmirMohamadBabaee, I ran into the same error. "بده" contains the substring "ده", so the regex matches "ده" inside it. I solved it by wrapping the pattern in boundary groups: (?:^|\\s) at the beginning and (?:$|\\s) at the end. This ensures "ده" is recognized as an entity only when it is a standalone word, not a substring of another word.
For example, in the ruleToNineteen function, write the regex like this:
regex "(?:^|\\s)(صفر|یک|سه|چهارده|چهار|پنج|شی?ش|هفت|هشت|نه|یازده|دوازده|سیزده|پ(ا|و)نزده|ش(ا|و)نزده|هی?فده|هی?جده|نوزده|ده|دو)(?:$|\\s)"
Note the two backslashes before s (\ + \ + s): the pattern is a Haskell string literal, so the backslash itself has to be escaped.
When you test again, the problem should be fixed, I hope :)
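
For anyone who wants to check the boundary idea outside Duckling, here is a rough Python sketch. It uses Python's `re` module rather than Duckling's engine, and `WORDS`, `consuming`, `lookaround` and `matches` are hypothetical names with a shortened word list. It also compares the whitespace-consuming groups with a lookaround variant, since a consumed space cannot serve as the leading boundary of the next word. I have not verified the lookaround form inside Duckling itself, so treat it as an assumption to test.

```python
import re

# Shortened, hypothetical word list; the real rule has many more alternatives.
WORDS = "چهارده|چهار|ده|دو"

# Boundary groups as in the comment above: they consume the surrounding whitespace.
consuming = re.compile(r"(?:^|\s)(" + WORDS + r")(?:$|\s)")

# Assumed alternative: lookarounds check the boundary without consuming it.
lookaround = re.compile(r"(?<!\S)(" + WORDS + r")(?!\S)")

def matches(pattern: re.Pattern, text: str) -> list[str]:
    """Return the numeral words the pattern finds in text."""
    return [m.group(1) for m in pattern.finditer(text)]

print(matches(consuming, "بده"))      # no match: the numeral is only a substring
print(matches(consuming, "ده"))       # standalone word is matched
print(matches(consuming, "ده دو"))    # second word is missed: its leading space was consumed
print(matches(lookaround, "ده دو"))   # both words are matched
```

With the consuming form, two numeral words separated by a single space yield only the first one in this sketch, which may matter for composite numbers. The lookaround form avoids that, at the cost of relying on lookbehind support in the regex engine.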
