Arabic language support #139
Replies: 10 comments 15 replies
-
Hey! As I said in reply to your previous language support request, I would like some feedback on the current state of the engine. We currently split text on whitespace. CJK characters are split one by one, and each is treated as a separate word. Thank you!
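To make the behavior described above concrete, here is a minimal sketch of "split on whitespace, then split CJK characters one by one". It is not the engine's actual code: the `is_cjk` and `tokenize` names and the chosen Unicode ranges are assumptions for illustration only.

```python
from typing import List


def is_cjk(ch: str) -> bool:
    """Rough CJK check covering a few common blocks (assumed, not exhaustive)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF   # Hangul Syllables
    )


def tokenize(text: str) -> List[str]:
    """Split on whitespace; emit every CJK character as its own token."""
    tokens: List[str] = []
    for chunk in text.split():
        current = ""
        for ch in chunk:
            if is_cjk(ch):
                # Flush any pending non-CJK run, then emit the CJK character alone.
                if current:
                    tokens.append(current)
                    current = ""
                tokens.append(ch)
            else:
                current += ch
        if current:
            tokens.append(current)
    return tokens
```

With this approach an Arabic phrase is only split on spaces, so each Arabic word stays intact but receives no normalization or stemming.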
-
Hi @Kerollmops, I think it would be great if you used what Apache Solr uses. Character normalization factories are very important, for example solr.ArabicNormalizationFilterFactory. To have a good Arabic search engine, you need to be able to search by stem. There should also be some kind of character normalization, with an option for users to run exact searches that ignore normalization if they wish, something like quoted search. For example, in Arabic, we have alef hamza above, alef hamza below, and alef hamza. Here is how they look:
Character normalization should be able to treat all variations of hamza as the last one, so hamza above and hamza below can be ignored. Think of it as in German:
Hope this helps.
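As a rough illustration of the normalization idea above, the sketch below folds alef with hamza above (U+0623) and alef with hamza below (U+0625) into bare alef (U+0627), and skips normalization for quoted queries. Reading "the last one" as bare alef is an assumption, and the `normalize_alef` and `matches` helpers are hypothetical names, not part of Solr or Meilisearch.

```python
# Assumed mapping: fold hamza-carrying alef forms into bare alef.
ALEF_VARIANTS = {
    "\u0623": "\u0627",  # ALEF WITH HAMZA ABOVE -> ALEF
    "\u0625": "\u0627",  # ALEF WITH HAMZA BELOW -> ALEF
}
_ALEF_TABLE = str.maketrans(ALEF_VARIANTS)


def normalize_alef(text: str) -> str:
    """Replace the hamza-carrying alef variants with bare alef."""
    return text.translate(_ALEF_TABLE)


def matches(query: str, document: str) -> bool:
    """Substring match; a double-quoted query is compared without normalization."""
    if len(query) >= 2 and query.startswith('"') and query.endswith('"'):
        # Exact search: the user asked to ignore normalization.
        return query[1:-1] in document
    return normalize_alef(query) in normalize_alef(document)
```

For example, an unquoted query written with bare alef would match a document written with hamza above, while the same query in quotes would not.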
-
I'm adding here the comment from @mohamed-foly, which could definitely be useful.
-
Also, we've just released a new version of our tokenizer that makes it easier to contribute. This means anyone can add an Arabic normalizer/segmenter and finally make Meilisearch work with Arabic the way the Arabic community expects 😄 Feel free to open a PR or ask any question on the tokenizer repo, named charabia 🎉
-
Any updates on this?
-
Hello all!
All these issues are open to external contributions for the whole month, so don't hesitate to contribute! 🧑‍💻 This is another step in enhancing Arabic language support. Depending on future feedback, we will be able to go further. Thanks for all your feedback! ✍️
-
Hello all! I have a simple question about Arabic ligatures: are they only useful for formatting text, or do they carry a particular meaning compared to the decomposed form? For instance
Thanks a lot for your help!
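For context on the ligature question above, here is a small sketch of how a precomposed ligature maps to its decomposed form under Unicode compatibility normalization. It uses U+FEFB (the lam-alef ligature, isolated form) as an illustrative character chosen here, not the example from the original comment, and `decompose_ligatures` is a hypothetical helper name.

```python
import unicodedata


def decompose_ligatures(text: str) -> str:
    """Apply NFKC, which replaces Arabic presentation-form ligatures
    with their underlying letter sequences (it also folds other
    compatibility characters, so it is broader than ligatures alone)."""
    return unicodedata.normalize("NFKC", text)


# U+FEFB is a single code point; after NFKC it becomes lam (U+0644) + alef (U+0627).
ligature = "\uFEFB"
decomposed = decompose_ligatures(ligature)
```

If both the indexed text and the query go through the same step, the ligature and the two-letter spelling would compare equal. Whether that is desirable for search is exactly the open question here.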
-
I hope someone can help with this: meilisearch/charabia#204
-
Any update on this?
-
Hello, commenting on this again in case someone can work on it. In Arabic there is "ال", which means the same as "the" but is attached to the start of the word instead of being a separate word. So "باب" means "door", but "الباب" means "the door". I am not sure how Meilisearch handles "the" in English, but I think this should also be handled in some way in Arabic.
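One naive way to handle the "ال" prefix described above is to strip it before indexing and querying. The sketch below is only an illustration under that assumption: `strip_definite_article` and its `min_stem_len` parameter are hypothetical, and a plain prefix check will wrongly strip words whose root happens to start with the same two letters.

```python
AL = "\u0627\u0644"  # the definite article "ال" (alef + lam)


def strip_definite_article(word: str, min_stem_len: int = 2) -> str:
    """Remove a leading "ال" when at least `min_stem_len` letters remain."""
    if word.startswith(AL) and len(word) - len(AL) >= min_stem_len:
        return word[len(AL):]
    return word
```

With this, "الباب" and "باب" would reduce to the same token, so a search for either form could find both.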
-
Related: https://github.com/meilisearch/MeiliSearch/issues/553