Replace current CSV translation system with gettext #347

Closed
Vovkiv opened this issue Dec 22, 2023 · 9 comments · Fixed by #475

@Vovkiv
Contributor

Vovkiv commented Dec 22, 2023

This is a fairly big step, but a manageable one.
gettext is currently the most widely used system for translating software and games.
It has many nice features that every translator will appreciate! But it might be less obvious than something like a CSV file.

What should be done:

  1. Convert the CSV to gettext. This is a relatively easy task: most of the time you just need to replace keys like "tooltip_1" with your source text (so, effectively, move the text from the CSV file into your source scripts).
  2. Convert the other translations (Bulgarian, German(?), and others) to gettext. This is also a relatively easy task; some copy-pasting is required.
  3. Check that this thing even works, and catch bugs if they arise.
  4. Somehow teach/introduce you to gettext basics. That might be a little harder, but also manageable.
  5. Add translation documentation. It doesn't necessarily have anything to do with gettext, but it is still something that should be taken seriously if you want people to spend their time translating your software.
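
To make steps 1 and 2 a bit more concrete, here is a minimal sketch of a one-off conversion script. It is written in Python rather than GDScript because it would run outside the project. It assumes the CSV has a header of the form `keys,en,<locale>,...`, which is an assumption about the file layout and not something checked against the repo; `po_escape` and `csv_to_po` are hypothetical helper names. The English text becomes the `msgid`, and the old key is kept as a comment.

```python
import csv
import io


def po_escape(text):
    """Escape a string for use inside a double-quoted PO value."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def csv_to_po(csv_text, source_locale="en"):
    """Return {locale: PO text} built from a keyed translation CSV.

    Assumes a header row like `keys,en,uk,...` (an assumed layout).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    locales = [n for n in reader.fieldnames if n not in ("keys", source_locale)]
    catalogs = {}
    for locale in locales:
        # Minimal PO header entry: empty msgid, metadata in msgstr.
        lines = [
            'msgid ""',
            'msgstr ""',
            f'"Language: {locale}\\n"',
            '"Content-Type: text/plain; charset=UTF-8\\n"',
            "",
        ]
        for row in rows:
            source = po_escape(row[source_locale] or "")
            translation = po_escape(row[locale] or "")
            key = row["keys"]
            lines.append(f"#. Former CSV key: {key}")
            lines.append(f'msgid "{source}"')
            lines.append(f'msgstr "{translation}"')
            lines.append("")
        catalogs[locale] = "\n".join(lines)
    return catalogs
```

The script only produces the PO files; replacing the keys in scripts and scenes with the source text would still have to be done by hand or with a separate search-and-replace.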
@Vovkiv
Contributor Author

Vovkiv commented Dec 22, 2023

I might try to start this tomorrow. It might be a fairly hard step, since I'm not familiar with your codebase.

@MewPurPur
Owner

MewPurPur commented Dec 22, 2023

Somehow teach/introduce you to gettext basics. That might be a little harder, but also manageable.

This should be given its own section in CONTRIBUTING.md

@Vovkiv
Contributor Author

Vovkiv commented Dec 22, 2023

After some initial digging I found that, apparently, the first step is actually going to be hard, lol

Some issues are related to actual bugs in Godot. You have some checkboxes (for example, in the settings window for inputs) that are all instanced from a single checkbox scene, so using gettext from a script there is not an option unless every checkbox gets its own script. I think it's probably not a good idea to refer to the setting in checkboxes as an exported string rather than from the checkbox's own script. Auto-translate won't kick in either (normally Godot can collect text from nodes with auto-translate set to true) because of this bug: godotengine/godot#79144. Apparently, there is a bug that prevents gettext from collecting text from instanced scenes, like your checkboxes.

There is also a problem with the tab container (I don't like it overall, if only for that weird feature where it takes each tab's name from its child instead of letting you specify it in a more normal way). Every tab child node would need its own script with its own tr call so that gettext is able to collect it. Probably some class that the tabs inherit from.
Currently, you have a for loop in the parent that translates their names, while every tab should probably translate itself (that might also be better for decoupling the tab container from its children).

And overall, translations are implemented very differently in every part of the program, which is not good even if we put gettext aside. Some nodes translate themselves, while some nodes translate other nodes instead. Some rely on the "auto-translate" property, while others simply translate from scripts.

The conversion will most likely require reworking entire parts of some components. I'm not sure you are going to be happy with that, and it might require much more time than I initially thought.
Also, off-topic, but why do you connect signals visually in the editor? Isn't that considered an anti-pattern? And it is kind of confusing...

So, what do you say?

@Vovkiv
Contributor Author

Vovkiv commented Dec 22, 2023

Sadly, implementing a game/program with gettext is easier when you are starting from scratch, not when you already have an entire codebase built around the idea of CSV. That's also why I try to tell developers that they need to consider moving to gettext as early as possible. (In the future, consider using gettext from the start, please.)

So this proposal might need to wait until some parts of the codebase get refactored a bit.
I already have some idea of how translations might be implemented in a more cohesive and clean way. Maybe some composition will do, like adding a simple node as a child of each label/button/etc. that needs to be translated. That node would be responsible for actually translating stuff.
In fact, I think this might work with less significant codebase changes, but again, would you be interested in this kind of solution?

@MewPurPur
Owner

MewPurPur commented Dec 22, 2023

I think you should first focus on drafting out what the "For translators" section in CONTRIBUTING.md would be, because it will ultimately be the deciding factor on whether your work is accepted or not. I want to objectively weigh the pros and cons.

The thing is, after thinking about it, from what I know right now, I'm leaning towards no. So I want to know what the workflow would be for translators before having you do massive amounts of work that I may ultimately reject.

  • Needing to download more specialized programs just to be able to work on GodSVG is a huge downside, and the program you had mentioned as free and open-source only mentioned a free trial on its website. (If everything is like this, it's a complete dealbreaker)
  • Having a more complex workflow is also a downside, we have to gauge if this added complexity is worth it.
  • Being more automatic is neutral - while it improves certain situations, like finding where a string was used, it also locks you into implementing things a certain way, which might not be the best way. (You've listed this is an upside, I think it's a downside though)
  • Being a widely used standard also isn't too relevant - so is CSV. Even AAA titles usually use spreadsheets still.

The features you listed sound good, but I'm doubtful that they will justify the downsides mentioned above.

  • Pluralization is nice, but this looks like it would make sense maybe if I had a big app with thousands of readable strings. GodSVG is more like an assisted SVG code editor, so I doubt it will ever even hit 200 strings. Pluralization might come up a couple of times, but not enough to justify its own solution.
  • The same holds for context, which also hasn't come into play a single time yet. Anyway, that's why I'm using placeholder strings; if "bat" needs to be a flying animal one time and a sports tool another time, I'll just name the strings "#bat_animal" and "#wooden_bat" or something, and then you can look at the English translation to figure out what to do.
  • I use placeholder strings, so if all localizations were done in scripts, you'd be able to easily fetch them anyway, with or without external tools.
  • The other management-related things are also not as clear-cut as they would be in a normal codebase: GodSVG is never going to have enough strings to start losing track. We just have to be careful when adding, modifying, or removing strings - and I'm already extremely thorough with inspecting every PR that modifies the translations file. And with this system, I'll need to be thorough anyway with ensuring new strings can be fetched by gettext.

Besides that, the answer to almost every "Why is X done like this" question is "Because it solved an immediate problem". GodSVG is not even in beta, I can't worry about clean code when even the most fundamental systems are still getting refactored. It's a gradual process. I also don't see why signals connecting in the editor would be bad or confusing.

@Vovkiv
Contributor Author

Vovkiv commented Dec 22, 2023

I think you should first focus on drafting out what the "For translators" section in CONTRIBUTING.md would be, because it will ultimately be the deciding factor on whether your work is accepted or not. I want to objectively weigh the pros and cons.

I'll open a pull request later, then.

But I still want to provide some counterpoints (maybe slightly off-topic) anyway.

* Needing to download more specialized programs just to be able to work on GodSVG is a huge downside, and the program you had mentioned as free and open-source only mentioned a free trial on its website. (If everything is like this, it's a complete dealbreaker)

(I was talking about this: https://poedit.net/download/, which is free and open source. I think I misspelled it several times yesterday: instead of "Poedit" I wrote "POEditor", which is an unrelated thing. POEditor is actually a web translation tool, like Weblate. Sorry about that.)

First, you absolutely can edit and work with both CSV and PO files using nothing but a text editor.
But in both cases it's not the best idea to do so.
Try working with CSV by hand like this:
[image]

Instead of using something like LibreOffice (which in 99% of cases you need to download manually, unless you use a Linux distro that provides LibreOffice out of the box):
[image]

So, ultimately, you need some special program to work with translations anyway, be it gettext files, CSV, or INI.
Also, for comparison's sake, this is how translating a PO file by hand looks:
[image]
I would say that even editing PO files by hand is easier and cleaner. After all, gettext was designed for that kind of use case.

Also, don't forget that translators already need to download Godot and know Git just to download and open your project.
And translators usually don't translate one project and never do it again. They, like me, maintain translations for several projects. That means they already have tools for working with translations, like Poedit, LibreOffice, etc.
And this is why I was referring to standardization: when every project has its own CSV/INI/JSON file for translations, the translation system works differently from project to project. With gettext, all a translator needs to know is where the POT file is. They open it with a standard program (like Poedit), click "create new translation", and translate. The same goes for updating an existing translation after some time (let's say a translator made a translation for you, but then disappeared). Over time, you add new strings or remove old ones. The next translator might run your program, see that it is not 100% translated, and go to your repo to help. All they need to do is (like the previous translator) find the PO file with the translation they need (in my case, uk_UA.po), open it in Poedit, click "update translation from pot", and pick the POT file, which will most likely be in the same folder as the PO file. Then everything is updated: strings that no longer exist are removed, and strings that changed slightly are marked as such.
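
To illustrate the "update from POT" step, here is a rough Python sketch of the idea. It is a simplified model and not the algorithm Poedit or msgmerge actually use: exact matches keep their translation, strings that changed slightly borrow the old translation and are flagged for review, and old strings that are gone from the template are reported as obsolete. The function name `update_from_template` and the similarity cutoff are assumptions made for the example.

```python
import difflib


def update_from_template(template_ids, old_catalog, cutoff=0.6):
    """Merge an old {msgid: msgstr} catalog against fresh template msgids.

    Returns (new_catalog, fuzzy, obsolete):
      new_catalog maps every template msgid to a msgstr ("" if untranslated),
      fuzzy is the set of msgids whose translation came from a similar old
      string and needs review,
      obsolete lists old msgids that are no longer in the template.
    """
    template_set = set(template_ids)
    new_catalog = {}
    fuzzy = set()
    used_old = set()
    for msgid in template_ids:
        if msgid in old_catalog:
            # Unchanged string: keep the translation as-is.
            new_catalog[msgid] = old_catalog[msgid]
            used_old.add(msgid)
            continue
        # Only old strings that vanished from the template can be "renamed".
        candidates = [
            m for m in old_catalog
            if m not in template_set and m not in used_old
        ]
        close = difflib.get_close_matches(msgid, candidates, n=1, cutoff=cutoff)
        if close and old_catalog[close[0]]:
            new_catalog[msgid] = old_catalog[close[0]]
            fuzzy.add(msgid)
            used_old.add(close[0])
        else:
            new_catalog[msgid] = ""
    obsolete = [m for m in old_catalog if m not in used_old]
    return new_catalog, fuzzy, obsolete
```

For example, if "Open file" became "Open a file" in the template, the old translation would be carried over and flagged as fuzzy, so the translator sees exactly which strings need another look.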

That's where standardization shines, even for small projects! I often don't have time to keep up with every program that I translate. I just open the POT and update my PO. After that I immediately know what changed, and I don't need to check git logs, releases, or anything else!

With a non-gettext system, YOU as the developer need to take care of this: making sure that you removed deprecated strings and all of that. With gettext, you simply generate a new POT file once in a while (your translators can do the same; it's easy to do in Godot itself). It just makes things less prone to "Ah, I forgot that string, damn!". And while this might not have happened to you yet, it happened several times, in several languages, in a project where I was editing text (they were using a script file from their game engine, like GDScript, lol).

* Having a more complex workflow is also a downside, we have to gauge if this added complexity is worth it.

It's a pretty standard way to work with translations.
You open the translation file with whatever tool you have, make some changes, and then (ideally) test them in the program.
I would say dealing with the CSV file was more complex, simply because even looking at a CSV file is hard: you always have the entire text file or (in the case of LibreOffice) the entire spreadsheet in front of your eyes, plus keys, plus other languages (because in your case, all translations are inside a single CSV).

* Being more automatic is neutral - while it improves certain situations, like finding where a string was used, it also locks you into implementing things a certain way, which might not be the best way. (You've listed this is an upside, I think it's a downside though)

I list it as an upside because it forces you to make your translations more consistent and predictable: where exactly you apply them, when, and how. (That's why, for example, I'm not the biggest fan of the auto-translate property.) Applying them all over the place might become very problematic once a refactor is required.
But this is more IMO than anything.

* Being a widely used standard also isn't too relevant - so is CSV. Even AAA titles usually use spreadsheets still.

There are several problems with that.
First, the CSV file format itself is widely used, yes, but localization features in it aren't. There are no standards for any localization feature in CSV. No standard to define pluralization, no gender variants (which are actually used quite often, e.g. to refer to the player by gender if they can choose it; this is especially important in non-English languages, where you often need several gender forms just to say something as simple as "Hey, you! Can you help me, please?", and the number of genders may vary from language to language), no standard for comments between translators (or comments in general), no context, nothing.

And second, AAA game devs are very conservative in their translation choices, and the translation tools and workflow for their games often suck. Around 2 months ago, I finished a translation (not for an AAA game, but the translation experience was pretty close) for a game with almost ~3k strings (maybe closer to 2-2.5k), and the author used an INI file (which also doesn't include any localization features). Do I need to say that it wasn't a very good experience? Even finding strings that weren't translated was painful. (gettext tools do provide ways to count this kind of information.) I can't even imagine managing games with 10k or more strings with this kind of system!
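
As a rough illustration of that kind of counting, here is a small Python sketch that tallies translated, untranslated, and fuzzy entries in PO text. It is deliberately minimal: `po_stats` is a hypothetical helper, it ignores plural forms and `msgctxt`, and it does not unescape values. In practice an existing tool (for example `msgfmt --statistics`, or the counters in a PO editor) would do this.

```python
def po_stats(po_text):
    """Count translated, untranslated and fuzzy entries in simple PO text."""
    stats = {"translated": 0, "untranslated": 0, "fuzzy": 0}
    entries = []
    current = None
    field = None  # which value a continuation line belongs to
    pending_fuzzy = False
    for raw in po_text.splitlines():
        line = raw.strip()
        if line.startswith("#,"):
            # Flag comment; applies to the next entry.
            pending_fuzzy = "fuzzy" in line
            field = None
        elif line.startswith("msgid "):
            current = {
                "msgid": line[6:].strip()[1:-1],
                "msgstr": "",
                "fuzzy": pending_fuzzy,
            }
            entries.append(current)
            pending_fuzzy = False
            field = "msgid"
        elif line.startswith("msgstr ") and current is not None:
            current["msgstr"] = line[7:].strip()[1:-1]
            field = "msgstr"
        elif line.startswith('"') and field is not None:
            # Continuation of a multi-line msgid or msgstr.
            current[field] += line[1:-1]
        else:
            field = None
    for entry in entries:
        if entry["msgid"] == "":
            continue  # the header entry has an empty msgid
        if entry["fuzzy"]:
            stats["fuzzy"] += 1
        elif entry["msgstr"]:
            stats["translated"] += 1
        else:
            stats["untranslated"] += 1
    return stats
```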

The features you listed sound good, but I'm doubtful that they will justify the downsides mentioned above.

Welp, that's fair.
Again, it's just easier to implement when you are starting out, not midway through, sadly.

* Pluralization is nice, but this looks like it would make sense maybe if I had a big app with thousands of readable strings. GodSVG is more like an assisted SVG code editor, so I doubt it will ever even hit 200 strings. Pluralization might come up a couple of times, but not enough to justify its own solution.

The problem here is that IF you need pluralization, it becomes problematic with CSV or something like it (remember that every language has its own rules regarding plural forms). CSV never had any standard for this, and Godot doesn't even have its own way to pluralize with CSV yet: godotengine/godot-proposals#1291 (there seems to be a ready PR for that, but it has been left pretty much untouched for almost 4 years...).
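
To show why per-language rules matter, here is a small Python sketch comparing English (two forms) with Ukrainian (three forms). The Ukrainian rule follows the usual gettext `Plural-Forms` expression for the language; the `FORMS` table and the `n_files` helper are made up for the example.

```python
def english_plural_index(n):
    """English: singular for exactly 1, plural otherwise."""
    return 0 if n == 1 else 1


def ukrainian_plural_index(n):
    """Ukrainian: three forms, chosen by the last one or two digits."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2


# Hypothetical per-locale table: (rule, plural forms in index order).
FORMS = {
    "en": (english_plural_index, ["{n} file", "{n} files"]),
    "uk": (ukrainian_plural_index, ["{n} файл", "{n} файли", "{n} файлів"]),
}


def n_files(locale, n):
    """Format "n files" using the locale's plural rule."""
    rule, forms = FORMS[locale]
    return forms[rule(n)].format(n=n)
```

With this, 21 maps to the first Ukrainian form ("21 файл") while 11 maps to the third ("11 файлів"), which a simple singular/plural pair of CSV keys cannot express.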

* The same holds for context, which also hasn't come into play a single time yet. Anyway, that's why I'm using placeholder strings; if "bat" needs to be a flying animal one time and a sports tool another time, I'll just name the strings "#bat_animal" and "#wooden_bat" or something, and then you can look at the English translation to figure out what to do.

Contexts in gettext are not just simple hint tags; they have a technical purpose. (And being able to sort strings by their context is a big improvement for the workflow. For example, sorting strings by settings tab, e.g. "Visual options", "Input options", etc.)
Consider this:
a = tr("Bat", "Animal")
b = tr("Bat", "Item")
This will generate 2 separate strings in the POT file.
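
As a toy model of why those end up as two entries, here is a Python sketch of a catalog keyed by (context, source text). The `tr` function below is a hypothetical stand-in written for this example, not Godot's implementation, and the catalog contents are illustrative.

```python
# Catalog keyed by (context, msgid): the same source text under different
# contexts is stored as separate entries with separate translations.
UK_CATALOG = {
    ("Animal", "Bat"): "Кажан",
    ("Item", "Bat"): "Бита",
}


def tr(message, context=""):
    """Look up a translation by context and text; fall back to the source text."""
    return UK_CATALOG.get((context, message), message)
```

A catalog keyed only by the source text could hold just one translation for "Bat", which is exactly the situation that contexts avoid.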

And the problem here is that everything might look right from your language's point of view, while from my language's point of view I might have a bad time.
For example, I was recently translating Extension Manager for GNOME, and the author used the single word "Downloads" for both "Sort extensions by download count" and "How many times was this extension downloaded?" in several places. From their point of view there was nothing wrong, because "Downloads" works fine for them, so why would it be wrong? But for almost all other languages it was. Later they split the string using gettext contexts, after I pointed it out and other translators caught up.

* I use placeholder strings, so if all localizations were done in scripts, you'd be able to easily fetch them anyway, with or without external tools.

Well, that's only true if you are actually familiar with the codebase. Most people who might want to contribute translations or help with them won't be able to easily find anything, or just glance at the context in which a string is used.
Also, a problem with CSV: you can't tell how many times a string is used and where.
Again, if you are already familiar with the codebase, it is no big deal. For a translator it might be painful, especially if they need to test edge-case translations, like error messages and error popups, which usually aren't easy to trigger and test properly without hacking stuff together.

* The other management-related things are also not as clear-cut as they would be in a normal codebase: GodSVG is never going to have enough strings to start losing track.

These are mostly management features for USERS. gettext makes this easy for you and for users, because once you have gettext set up, you never really touch the translation files again. You just add the scripts or scenes that need to be translated (adding a scene to the POT generation will add its dependencies as well) and write translations.

We just have to be careful when adding, modifying, or removing strings - and I'm already extremely thorough with inspecting every PR that modifies the translations file. And with this system, I'll need to be thorough anyway with ensuring new strings can be fetched by gettext.

That's exactly why pseudolocalization should be used, though, even with a CSV file, just in case.
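
For anyone unfamiliar with the idea, here is a minimal Python sketch of pseudolocalization: every translatable string is replaced with an accented, padded, bracketed variant, so anything that still shows up as plain English in the UI was never passed through the translation system. The character mapping, the padding ratio, and the `pseudolocalize` name are arbitrary choices for this example, and it does not protect format placeholders like `{n}`.

```python
# Map some ASCII letters to accented look-alikes.
ACCENTED = str.maketrans(
    "aeiouAEIOUcCnNyY",
    "áéíóúÁÉÍÓÚçÇñÑýÝ",
)


def pseudolocalize(text, expansion=0.3):
    """Return an accented, padded, bracketed version of text.

    The padding simulates longer translations; the brackets make
    truncated strings easy to spot in the UI.
    """
    padding = "~" * max(1, round(len(text) * expansion))
    return "[" + text.translate(ACCENTED) + padding + "]"
```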

I also don't see why signals connecting in the editor would be bad or confusing.

Mostly because it's harder to track which node connects which signal to which function. Connecting signals in a node's own script makes it easier to track where signals are emitted and to whom.
But again, this is more IMO.

@MewPurPur
Owner

MewPurPur commented Dec 22, 2023

Okay, if this is openable in a basic text editor and in simple programs, then I don't think the downsides I brought up are as valid. I do still think the extra dependencies are a notable downside though, as currently the barrier for contributing is really low because there's only Git (unavoidable) and Godot (a very simple download and most people who find the project already have it). Libreoffice or other spreadsheet tools are usually around by default too, so I disagree with that point.

So yeah, I'd appreciate writing a draft paragraph for "CONTRIBUTING.md" for translators, i.e. outlining the steps you need to take if your PR includes some new strings, or if you want to do translations. This will decide if I'll accept or reject the idea. (that is, I'd accept a PR implementing this, but I'm not planning to work on it personally yet, I think there are bigger priorities at the moment)

@Vovkiv
Contributor Author

Vovkiv commented Dec 22, 2023

Okay, if this is openable in a basic text editor and in simple programs, then I don't think the downsides I brought up are as valid. I do still think the extra dependencies are a notable downside though, as currently the barrier for contributing is really low because there's only Git (unavoidable) and Godot (a very simple download and most people who find the project already have it). Libreoffice or other spreadsheet tools are usually around by default too, so I disagree with that point.

There are still many people who might use this program without knowing what "Godot" is. And I wouldn't call Godot a simple download, considering it weighs slightly over 100 MB, and configuring Git on Windows is more of a PITA than it should be. Meanwhile, a spreadsheet program of choice may well not be available, or displaying CSV correctly might require some input from the user (some slightly older versions of Microsoft Excel can't even recognize CSV without additional manipulation, which is already more problematic than opening a format with software that fully supports it, such as Poedit, which weighs less than 30 MB and is available for all major platforms without any strings attached).

Also, a WIP version of the contribution guide is done; now I need to create the pull request.

@MewPurPur
Owner

godotengine/godot#87530

This is relevant; it would address the grievances with the translation system if it gets merged in a timely manner.
