Add plurals support for CSV translation files #1291

Open
dalexeev opened this issue Aug 1, 2020 · 3 comments

@dalexeev
Member

dalexeev commented Aug 1, 2020

Describe the project you are working on:

A game (main language is Russian).

Describe the problem or limitation you are having in your project:

CSV translation files do not support plurals. godotengine/godot#40443 adds plurals support for .po files, but CSV was overlooked.
.po files are designed to use English as the primary language, while CSV also allows identifiers.

Comments

Me:

It looks like the tr_n function is not very suitable if you are using an identifier system:

```gdscript
tr_n("MY_ID", "", n) # Or `tr_n("MY_ID", "MY_ID", n)`?
```

From the docs:

> There are two approaches to generate multilingual language games and applications. Both are based on a key:value system. The first is to use one of the languages as the key (usually English), the second is to use a specific identifier. <...> In general, games use the second approach and a unique ID is used for each string.

@Calinou:

@dalexeev In my experience, gettext PO files are heavily centered around using English text as identifiers. On the other hand, custom formats (like Godot's CSV format) and XLIFF tend to recommend using keys as identifiers.

@pycbouh:

> In my experience, gettext PO files are heavily centered around using English text as identifiers.

This is definitely how its creators intended it to be used for translating Linux, but the file format itself does not enforce this as a rule in any way. If you use keys as identifiers, some tools may warn you that your translation language is English (POEdit does that, for one), but handling this is up to the user, which in this case is the engine.

Describe the feature / enhancement and how it helps to overcome the problem or limitation:

We should implement plurals support for CSV as well, for example like this:

| KEY | en | ru |
|---|---|---|
| DAYS_AGO[0] | %d day ago | %d день назад |
| DAYS_AGO[1] | %d days ago | %d дня назад |
| DAYS_AGO[2] | - | %d дней назад |

Usage:

```gdscript
var s = tr_n(n, "DAYS_AGO") % n
```

That is, we just have to make n the first argument, and it will be compatible with both systems.

Admittedly, some cells remain empty, but there are relatively few of them. Note that strings without numeric substitution still require only one row:

| KEY | en | ru |
|---|---|---|
| REGULAR_KEY | Regular key | Обычный ключ |
| ... | ... | ... |
| SPECIAL_KEY[0] | %d key | %d ключ |
| SPECIAL_KEY[1] | %d keys | %d ключа |
| SPECIAL_KEY[2] | | %d ключей |
| ... | ... | ... |
| ANOTHER_KEY | Another key | Другой ключ |
| ... | ... | ... |
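To make the first layout concrete, here is a small standalone Python sketch (not Godot code) of loading such a table and resolving a plural lookup by building the adjusted key `KEY[i]`. It assumes a comma-separated file whose first column holds the keys and whose remaining columns are locales, and it treats empty or `-` cells as missing; `load_csv`, `tr_n` and the `plural_index` callback are hypothetical names.

```python
import csv
import io
from typing import Callable


def load_csv(text: str) -> dict[str, dict[str, str]]:
    """Map each locale column to {key: translation}, skipping empty or '-' cells."""
    rows = csv.reader(io.StringIO(text))
    locales = next(rows)[1:]  # the first header cell names the key column
    table: dict[str, dict[str, str]] = {loc: {} for loc in locales}
    for row in rows:
        if not row:  # tolerate blank lines
            continue
        key, cells = row[0], row[1:]
        for loc, cell in zip(locales, cells):
            if cell and cell != "-":
                table[loc][key] = cell
    return table


def tr_n(table: dict[str, dict[str, str]], locale: str, n: int, key: str,
         plural_index: Callable[[int], int]) -> str:
    """Hypothetical lookup: choose the plural form for n, then read KEY[i]."""
    adjusted = f"{key}[{plural_index(n)}]"
    return table[locale].get(adjusted, adjusted)  # echo the adjusted key if missing


def english(n: int) -> int:
    """English rule: form 0 for exactly one, form 1 otherwise."""
    return 0 if n == 1 else 1


CSV_TEXT = """KEY,en,ru
REGULAR_KEY,Regular key,Обычный ключ
DAYS_AGO[0],%d day ago,%d день назад
DAYS_AGO[1],%d days ago,%d дня назад
DAYS_AGO[2],-,%d дней назад
"""

table = load_csv(CSV_TEXT)
print(tr_n(table, "en", 5, "DAYS_AGO", english) % 5)  # 5 days ago
```

Regular keys and plural keys live side by side in the same table; only rows with an `[i]` suffix are reached through the plural lookup.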

There is another option:

| KEY | en[0] | en[1] | ru[0] | ru[1] | ru[2] |
|---|---|---|---|---|---|
| JUST_KEY | Just a key | | Просто ключ | | |
| DAYS_AGO | %d day ago | %d days ago | %d день назад | %d дня назад | %d дней назад |
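The second layout can be read by parsing the `lang[i]` column headers. The standalone Python sketch below (hypothetical `load_wide_csv`, same comma-separated assumption as a Godot-style CSV) groups the cells into a list of plural forms per key and locale; a bare `en` header is treated as form 0.

```python
import csv
import io
import re

# A header like "ru[2]" means: locale "ru", plural form 2.
FORM_HEADER = re.compile(r"^(\w+)\[(\d+)\]$")


def load_wide_csv(text: str) -> dict[str, dict[str, list[str]]]:
    """Map locale -> {key: [form 0, form 1, ...]} from 'lang[i]' columns."""
    rows = csv.reader(io.StringIO(text))
    columns = []
    for name in next(rows)[1:]:
        m = FORM_HEADER.match(name)
        # A bare "en" header is treated as plural form 0.
        columns.append((m.group(1), int(m.group(2))) if m else (name, 0))
    table: dict[str, dict[str, list[str]]] = {}
    for row in rows:
        if not row:
            continue
        key = row[0]
        for (locale, index), cell in zip(columns, row[1:]):
            if not cell:  # empty cell: this locale has no such form
                continue
            forms = table.setdefault(locale, {}).setdefault(key, [])
            forms.extend([""] * (index + 1 - len(forms)))  # pad up to index
            forms[index] = cell
    return table


WIDE_TEXT = """KEY,en[0],en[1],ru[0],ru[1],ru[2]
JUST_KEY,Just a key,,Просто ключ,,
DAYS_AGO,%d day ago,%d days ago,%d день назад,%d дня назад,%d дней назад
"""

table = load_wide_csv(WIDE_TEXT)
print(table["ru"]["DAYS_AGO"])
```

Because all forms of a string sit in one row, the first column could just as well hold English source text instead of an identifier, which is what lets this variant serve both systems.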

But I prefer the first option, because most strings don't have numeric substitutions, and in the second variant each language requires multiple columns. If we split the table into two files (one for tr() and one for tr_n()), there would be no empty cells at all. But this is not good either, because it complicates the workflow (two files instead of one). Overall, the first option is the best compromise.

Describe how your proposal will work, with code, pseudocode, mockups, and/or diagrams:

It's not hard to implement. Here's an example to help you understand how this should work:

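```gdscript
# Resolve a plural key such as DAYS_AGO[2] for the current locale.
func tr_n(n: int, key: String) -> String:
    return tr("%s[%d]" % [key, f(n)])

# Return the index of the plural form to use for n in the current locale.
func f(n: int) -> int:
    match TranslationServer.get_locale():
        "en_US":
            # English: form 0 for exactly 1, form 1 otherwise.
            if n == 1:
                return 0
            else:
                return 1
        "ru_RU":
            # Russian: one (1, 21, ...), few (2-4, 22-24, ...), many (the rest).
            if n % 10 == 1 and n % 100 != 11:
                return 0
            elif n % 10 >= 2 and n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
                return 1
            else:
                return 2
        _:
            # ... rules for other locales go here; fall back to the first form.
            return 0
```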
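To sanity-check the Russian branch, here is the same rule ported directly to Python (an illustration only, not Godot code). With it, 1 and 21 take form 0, 2 and 22 take form 1, and 5, 11, 12, 25 and 111 take form 2, which shows how the `n % 100` checks handle the 11 to 14 exception.

```python
def ru_plural_index(n: int) -> int:
    """Russian plural form index, using the same conditions as the GDScript sketch."""
    if n % 10 == 1 and n % 100 != 11:
        return 0  # 1, 21, 31, 101, ...
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1  # 2-4, 22-24, 102-104, ...
    return 2      # 0, 5-20, 25-30, 111-114, ...


def en_plural_index(n: int) -> int:
    """English: form 0 for exactly 1, form 1 otherwise."""
    return 0 if n == 1 else 1


for n in (1, 2, 5, 11, 12, 21, 22, 25, 111):
    print(n, ru_plural_index(n))
```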

The only caveat is that the first option only works with identifiers, while the second option also works with English strings as the primary key.

If this enhancement will not be used often, can it be worked around with a few lines of script?:

This is a commonly used feature. In addition, there is currently no way to globally redefine the tr_n function.

Is there a reason why this should be core and not an add-on in the asset library?:

.po files are not a complete replacement for CSV (see above). Therefore, CSV should support plurals as well as .po files.


@akien-mga:

For CSV plurals, I would suggest opening a proposal indeed and doing research on how plurals are handled by other projects that support CSV translations.

From what I found, there are many different CSV translation workflows, and the few that support plurals have it hacked in, e.g. in a way similar to what is suggested here, but there's no common standard. It's a simple system, so we can indeed design our own plurals logic, but if there were a somewhat "popular" way of doing plurals with CSV, used e.g. in other game engines, it would be best for us to follow that.

@Zylann

Zylann commented Aug 3, 2020

Side note:

> .po files are designed to use English as the primary language

That might be a convention, but I don't think it's a requirement. I do use identifiers with .po files in my game and it works fine in Godot. I don't know where these differing conventions come from, but they are not enforced by the formats themselves.

@dalexeev
Member Author

dalexeev commented Aug 3, 2020

@Zylann The API added in godotengine/godot#40443 assumes:

```gdscript
# tr_n(message, plural_message, n, context = "")
var s = tr_n("%d day ago", "%d days ago", n) % n
```

```po
# ru.po
msgid "%d day ago"
msgid_plural "%d days ago"
msgstr[0] "%d день назад"
msgstr[1] "%d дня назад"
msgstr[2] "%d дней назад"
```

If using IDs:

```gdscript
# tr_n(message, plural_message, n, context = "")
var s = tr_n("DAYS_AGO", "", n) % n
```

```po
# en.po
msgid "DAYS_AGO"
msgid_plural ""
msgstr[0] "%d day ago"
msgstr[1] "%d days ago"
```
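One practical problem with the ID variant shows up when a string has no translation. Under gettext's fallback convention, an untranslated plural lookup returns the singular msgid when n is 1 and msgid_plural otherwise, so passing "" as the plural yields an empty string for most n. Python's standard gettext module is used here only to illustrate that convention; whether Godot's fallback behaves identically is an assumption.

```python
import gettext

# NullTranslations has no catalog, so ngettext() applies gettext's fallback:
# the singular msgid when n == 1, the plural msgid otherwise.
catalog = gettext.NullTranslations()

english_source = catalog.ngettext("%d day ago", "%d days ago", 5) % 5
id_based = catalog.ngettext("DAYS_AGO", "", 5)

print(english_source)  # 5 days ago
print(repr(id_based))  # ''
```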

I suggested changing the order of the arguments:

> That is, we just have to make n the first argument, and it will be compatible with both systems.

```gdscript
# tr_n(n, message, plural_message = "", context = "")
var s = tr_n(n, "DAYS_AGO") % n
```
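Here is a rough Python sketch of how the reordered signature could serve both systems: an empty plural_message selects the identifier path (`KEY[i]`), otherwise the message is treated as a source-text msgid. The two in-memory tables and the English-only plural rule are hypothetical stand-ins for loaded CSV and .po data, not Godot's implementation.

```python
def plural_index_en(n: int) -> int:
    """English: form 0 for exactly 1, form 1 otherwise."""
    return 0 if n == 1 else 1


# Hypothetical catalogs for a single (English) locale.
ID_TABLE = {"DAYS_AGO[0]": "%d day ago", "DAYS_AGO[1]": "%d days ago"}
PO_TABLE = {("", "%d day ago"): ["%d day ago", "%d days ago"]}


def tr_n(n: int, message: str, plural_message: str = "", context: str = "",
         plural_index=plural_index_en) -> str:
    """Sketch of tr_n(n, message, plural_message = "", context = "")."""
    i = plural_index(n)
    if not plural_message:
        # Identifier system: the key alone selects the row KEY[i].
        return ID_TABLE.get(f"{message}[{i}]", message)
    # Source-text system: look up the msgid (with context) and pick form i.
    forms = PO_TABLE.get((context, message))
    if forms and i < len(forms):
        return forms[i]
    # No translation: gettext-style fallback to the source strings.
    return message if n == 1 else plural_message


print(tr_n(5, "DAYS_AGO") % 5)                          # 5 days ago
print(tr_n(1, "%d day ago", "%d days ago") % 1)         # 1 day ago
print(tr_n(3, "%d hour ago", "%d hours ago") % 3)       # 3 hours ago
```

Putting n first means the identifier call needs only two arguments, while source-text calls keep the familiar singular/plural pair.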

However, CSV still needs full plurals support, if only because CSV files can be opened in any spreadsheet application, while .po files are inconvenient to edit without specialized software.

@SkyLucilfer

SkyLucilfer commented Aug 23, 2020

I have implemented this feature. It works as the proposal describes: tr_n(n, "DAYS_AGO") fetches the correct plural translation from the CSV using an adjusted key, i.e. DAYS_AGO[0], DAYS_AGO[1], etc., depending on the locale and n.

The PR should be coming soon.
