Persian isn't supported. #42

niyumard · 2020-01-08T19:23:47Z

Hi, I see Persian texts like this using stutter:

It seems stutter doesn't support RTL languages and doesn't use a suitable font for them.

jamestomasino · 2020-01-08T20:31:51Z

Hi @niyumard! Thanks for logging the issue. You're absolutely right about the current state. Beyond that, the way I break up word-forms is specifically English-based. I don't have the knowledge to re-implement that in a way that would support other languages, let alone RTL ones.

That being said, it's about time I looked into a way to let people submit localizations. If I can make it easy for others to contribute their own language parts, we can start tackling this.

jamestomasino · 2020-01-09T17:48:57Z

While I haven't really made progress on Persian, I have added some basic locale support to Stutter. More work will be required to handle RTL, but if you have any LTR languages you want to work with, all you need to do is modify the JSON object at the top of parts.js. I found a list of common prefixes and suffixes for Spanish, and left all other word-splitting behavior the same as English. Hopefully others will manage to PR in other languages!

niyumard · 2020-01-09T18:54:47Z

I suggest that you let users use their font of choice; I think that might help.

I also tried changing "__stutter_right" to "__stutter_left" and it helps! There's still a problem, though, because Persian/Arabic script is cursive by nature rather than made of separate block letters.

niyumard · 2020-01-10T20:38:29Z

You may be able to solve the cursive problem by using this character: "ـ"
https://en.wikipedia.org/wiki/Kashida
For example, adding it to س gives سـ, which is the right form for the start or middle of a word, while س on its own is used at the end of a word.

jamestomasino · 2020-01-10T20:49:51Z

Ahh, so I'll need to make my word-divide character a configurable value in the JSON object as well. That's very good to know.

Other than the display being in the wrong direction, is Stutter reading through Persian text in the correct direction so that each word is in the correct order? If so, I think the steps needed to add support would be:

1. Add functionality to display RTL languages visually RTL
2. Add support for other fonts that can display the characters properly (possibly by using CSS Variables and a user-defined string)
3. Add a custom word divide character
4. Modify the regex properties for the language to parse it correctly

Can you think of anything else?
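
The first three steps above could be sketched roughly as follows. This is only an illustration, not Stutter's actual code: the `locales` object, its field names (`dir`, `fontFamily`, `wordDivider`), the English divider value, the CSS variable name `--stutter-font-family`, and both functions are assumptions made up for this example.

```javascript
// Hypothetical locale entries. The field names and values are assumptions
// for illustration only; they are not Stutter's real schema.
const locales = {
  en: { dir: 'ltr', fontFamily: 'sans-serif', wordDivider: '-' },
  // U+0640 is the kashida (tatweel) character suggested earlier in this thread.
  fa: { dir: 'rtl', fontFamily: 'Tahoma, sans-serif', wordDivider: '\u0640' }
};

// Pure helper: work out the display settings for a language code,
// falling back to English when the language is unknown.
function displayStyleFor(lang) {
  const locale = locales[lang] || locales.en;
  return {
    direction: locale.dir,
    '--stutter-font-family': locale.fontFamily
  };
}

// DOM helper: apply the settings to the reader element.
// The stylesheet would then use: font-family: var(--stutter-font-family, sans-serif);
function applyDisplayStyle(element, lang) {
  const style = displayStyleFor(lang);
  element.setAttribute('dir', style.direction);
  element.style.setProperty('--stutter-font-family', style['--stutter-font-family']);
}
```

Keeping the lookup separate from the DOM call means the locale logic can be checked without a browser, and a user-defined font string could later replace `fontFamily` without touching the rest.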

niyumard · 2020-01-11T16:22:34Z

> Is Stutter reading through Persian text in the correct direction so that each word is in the correct order?

Yes, the order is right.

> can display the characters properly

The characters are displayed properly, but I'd rather see them in another font; this one is too ugly for Persian text. So maybe it's not much of a priority, but if you can make it happen, it'd be great.

> Can you think of anything else?

Not really. I'm not sure how Stutter divides words.

jamestomasino · 2020-01-24T14:16:58Z

I've added more information to the README regarding localization. I moved the locales content to its own JSON file as well. I'll need to add more features for Persian than are currently available, but if you'd like to start creating a "fa" entry, that would be helpful. I assume the first regular expression will still work since it's just splitting on whitespace. The second one, which splits on "." or ",", will probably need to be changed. Finally, the presub section will need a lot of love.

That stuff collectively is the 4th item in the checklist above. I'll have to do 1-3 myself.
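
As a rough illustration of what the two expressions might look like for Persian, here is a minimal sketch. The constant and function names are hypothetical and this is not the format of Stutter's locale file; the only facts relied on are that JavaScript's `\s` does not match the zero-width non-joiner (U+200C), and that Persian text uses the Arabic comma (U+060C), semicolon (U+061B) and question mark (U+061F).

```javascript
// Whitespace split. \s does not include U+200C (zero-width non-joiner),
// so a word written with a ZWNJ inside it stays in one piece.
const WORD_SPLIT = /\s+/;

// '.' and ',' plus the Arabic comma, semicolon and question mark.
// The capturing group keeps the punctuation marks in the result.
const PUNCTUATION_SPLIT = /([.,\u060C\u061B\u061F])/;

// Split running text into words.
function splitWords(text) {
  return text.split(WORD_SPLIT).filter(Boolean);
}

// Split one word at punctuation, keeping each mark as its own piece.
function splitOnPunctuation(word) {
  return word.split(PUNCTUATION_SPLIT).filter(Boolean);
}
```

The whitespace expression can stay as it is; only the punctuation class changes.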

niyumard · 2020-02-19T22:33:19Z

Well, it seems I can't master regex at the moment. How about I write down the Persian alphabet and common prefixes here?

jamestomasino · 2020-02-19T22:43:22Z

Let's start with that and see how it goes. :)

niyumard · 2020-02-20T07:39:47Z

Here is the Persian alphabet:

ا ب پ ت ث ج چ ح خ د ذ ر ز ژ س ش ص ض ط ظ ع غ ف ق ک گ ل م ن و ه ی
But some people also use these characters:
ك ء ة آ إ ي ئ ؤ

Compound words may be written in several ways. The correct way is to separate the parts with a zero-width non-joiner, but some people put a space inside the word and some use no separator at all. For example:
correct form for a prefix:

می‌خواهم

but people also use:

می خواهم

and

میخواهم

correct form for a suffix:

کتاب‌ها

but people also use:

کتاب ها

or

کتابها

so here are some prefixes:

می

and here are some suffixes:

ها های تر ترین کده گان گانه گر وار ستان

Any time there's a zero-width non-joiner, you can easily split the word into two parts at that point, although the parts obviously belong together. For example:

کم‌محبت = کم + محبت

I hope that it helps!
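
A minimal sketch of how the rules above could be turned into code, assuming a helper that receives one whitespace-delimited word at a time. The function name, the minimum stem length, and the longest-suffix-first ordering are assumptions for illustration, not anything Stutter does today.

```javascript
const ZWNJ = '\u200C'; // zero-width non-joiner

// Affixes taken from the lists above. Suffixes are sorted longest first
// so that a longer suffix is preferred over a shorter one it ends with.
const PREFIXES = ['\u0645\u06CC']; // می
const SUFFIXES = ['ها', 'های', 'تر', 'ترین', 'کده', 'گان', 'گانه', 'گر', 'وار', 'ستان']
  .sort((a, b) => b.length - a.length);

// Assumption: never leave a stem shorter than this many letters.
const MIN_STEM = 2;

function splitPersianWord(word) {
  // An explicit ZWNJ is the most reliable boundary, so use it when present.
  if (word.includes(ZWNJ)) {
    return word.split(ZWNJ).filter(Boolean);
  }
  const parts = [];
  let stem = word;
  const prefix = PREFIXES.find(
    p => stem.startsWith(p) && stem.length - p.length >= MIN_STEM
  );
  if (prefix) {
    parts.push(prefix);
    stem = stem.slice(prefix.length);
  }
  const suffix = SUFFIXES.find(
    s => stem.endsWith(s) && stem.length - s.length >= MIN_STEM
  );
  if (suffix) {
    parts.push(stem.slice(0, -suffix.length), suffix);
  } else {
    parts.push(stem);
  }
  return parts;
}
```

This handles the ZWNJ form and the form with no separator. The form written with a space arrives as two separate words after whitespace splitting, so it would have to be handled earlier. Plain affix matching is also naive: a word that merely happens to begin or end with one of these letter sequences would be split too.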

niyumard · 2021-02-06T16:13:17Z

I think I've found the main problem. It seems that separating words down to letters (or a group of letters) isn't a good idea for cursive scripts in which the letters change shape according to their position in the word. When the extension tries to make one letter red, it does so by separating that single letter, so it gets separated and is shown in the wrong way. In Persian and languages with Arabic script in general, the letters change shape according to their adjacent letters.

For example in the word کتاب, the letter ت becomes ـتـ when it's medial and surrounded with certain other letters.
The same thing goes for other letters as well. They change shape according to their position in the word as mentioned in the wiki.

What we need is to introduce kashida in Stutter.
For the letter ت, if it's isolated, it doesn't need any kashida.
If it's the first letter, it needs to be connected to the next letter; in that case the browser itself renders it correctly. If you copy تا and remove the first character, you can see what happens.
If we want to separate it, though, we need one kashida: تـ‌ا
If it's in the middle, it needs two kashida characters, one before and one after it: ـتـ
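
The rule could be sketched like this. It is a simplified illustration under stated assumptions, not Stutter's code: `splitWithKashida` and its helpers are hypothetical names, the letter test only covers the basic Arabic block plus the extra Persian letters listed earlier in this thread, and the list of letters that never connect to the following letter is written out by hand.

```javascript
const TATWEEL = '\u0640'; // kashida

// Letters that connect only to the letter before them, never to the next one:
// alef forms, dal, zal, re, ze, zhe, vav forms, te marbuta.
const RIGHT_JOINING_ONLY = new Set(
  '\u0627\u0622\u0623\u0625\u062F\u0630\u0631\u0632\u0698\u0648\u0624\u0629'
);
const NON_JOINING = new Set('\u0621'); // standalone hamza

// Rough letter test: basic Arabic letters plus pe, che, zhe, kaf, gaf, ye.
const ARABIC_LETTER = /^[\u0621-\u064A\u067E\u0686\u0698\u06A9\u06AF\u06CC]$/;

const joinsToNext = ch =>
  ARABIC_LETTER.test(ch) && !RIGHT_JOINING_ONLY.has(ch) && !NON_JOINING.has(ch);
const joinsToPrevious = ch => ARABIC_LETTER.test(ch) && !NON_JOINING.has(ch);

// Split `word` into the text before, inside and after [start, end),
// adding a kashida on both sides of every cut that severs a connection.
function splitWithKashida(word, start, end) {
  const chars = Array.from(word);
  const connected = i =>
    i > 0 && i < chars.length &&
    joinsToNext(chars[i - 1]) && joinsToPrevious(chars[i]);

  let pre = chars.slice(0, start).join('');
  let focus = chars.slice(start, end).join('');
  let post = chars.slice(end).join('');

  if (connected(start)) {
    pre += TATWEEL;
    focus = TATWEEL + focus;
  }
  if (connected(end)) {
    focus += TATWEEL;
    post = TATWEEL + post;
  }
  return { pre, focus, post };
}
```

For کتاب with the focus on ت, `splitWithKashida(word, 1, 2)` gives کـ, ـتـ and ـاب. With the focus on ا, no kashida is added after it, because alef never connects to the following letter.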

jamestomasino added the "enhancement" (New feature or request) and "help wanted" (Extra attention is needed) labels on Jan 8, 2020

jamestomasino self-assigned this on Jan 8, 2020
