Translations Roadmap #498

Closed
17 of 25 tasks
emlove opened this issue Oct 26, 2017 · 30 comments
Labels
Feature Request Should be a discussion localization Translations and Localization

Comments

@emlove
Contributor

emlove commented Oct 26, 2017

Description

This will be a tracking issue to keep track of the remaining work for the translations infrastructure.

Documentation Notes

Some things we need to capture in our write-up. WIP here: home-assistant/home-assistant.io#3792

  • Translations need to minimally include a translation for their own language tag. Other languages currently aren't used, but could be used in the future to display current-language options in the language select drop-down.
  • Translations use BCP 47 for language tags https://tools.ietf.org/html/bcp47
  • Need to decide on case conventions
  • Strings need to be escaped for JSON. Otherwise special characters should be included as-is.
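As a rough illustration of the JSON escaping note, here is a minimal Python sketch. The keys and strings are made up for the example and are not part of the real schema. It shows that quotes get escaped while characters such as umlauts can stay as-is.

```python
import json

# Hypothetical translation entries; keys and strings are illustrative only.
translations = {
    "panel": {
        "quote_example": 'Sag "Hallo"',
        "umlaut_example": "Übersicht",
    }
}

# ensure_ascii=False keeps characters such as "Ü" as-is instead of \u00dc,
# while quotes and backslashes are still escaped as JSON requires.
encoded = json.dumps(translations, ensure_ascii=False, indent=4)
print(encoded)

# Round trip: decoding gives back exactly the original strings.
decoded = json.loads(encoded)
```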
@c727
Contributor

c727 commented Oct 26, 2017

As we have seen today, submitting translations via GitHub PRs isn't the best solution, because we get many PRs for the same language and there is a lot of discussion about the best translation for some strings. Also, users have to insert the correct language tag themselves and can submit invalid JSON files.
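For the invalid-file problem, a small pre-merge check could catch the two issues mentioned here. This is only a sketch: `check_translation` and `TAG_PATTERN` are hypothetical names, and the pattern covers just a simplified subset of BCP 47 (language, optional script, optional region).

```python
import json
import re

# Simplified pattern (an assumption for illustration): 2-3 letter language,
# optional 4-letter script, optional 2-letter region. Not full BCP 47.
TAG_PATTERN = re.compile(r"^[a-z]{2,3}(-[A-Z][a-z]{3})?(-[A-Z]{2})?$")


def check_translation(filename, content):
    """Return a list of problems found in a submitted translation file."""
    problems = []
    tag = filename[:-5] if filename.endswith(".json") else filename
    if not filename.endswith(".json") or not TAG_PATTERN.match(tag):
        problems.append("unexpected file name: " + filename)
    try:
        data = json.loads(content)
    except json.JSONDecodeError as err:
        problems.append("invalid JSON: " + err.msg)
    else:
        if not isinstance(data, dict):
            problems.append("top level must be a JSON object")
    return problems


print(check_translation("en-US.json", '{"panel": {}}'))
print(check_translation("en_US.json", '{"panel": }'))
```

The first call reports no problems; the second reports both the file name and the broken JSON.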

Use a free third-party service:
- crowdin: https://crowdin.com/page/open-source-project-setup-request (point 4 could be problematic if balloob wants to charge money for the upcoming cloud service)
- lingohub: https://lingohub.com/pricing/
- poeditor: https://poeditor.com/pricing/ (1000 strings)

Optimize current strings:
We should check whether the current phrases are accurate and a good base for translation.
panel:
"States" --> "Overview": easier to understand for new users, and easier to translate
script:
"Activate" button --> "Run": "activate a scene" is OK, but you "run a script"

Documentation:
We can do it similarly to this: https://wiki.lineageos.org/translate-howto.html

@emlove
Contributor Author

emlove commented Oct 26, 2017

Yes, moving to an external service is on the roadmap.

@c727
Contributor

c727 commented Oct 26, 2017

First draft of the guideline docs: home-assistant/home-assistant.io#3792

@c727
Contributor

c727 commented Oct 27, 2017

I tested poeditor.com, lingohub.com, crowdin.com, lokalise.co and some other free ones...

The UIs of poeditor.com and lingohub.com are not intuitive; they are no fun to use.

crowdin.com is very popular. The confirmation dialogs are a bit annoying if you have to translate many phrases. IMO it has its advantages in big projects with many split translation files. Also, it doesn't allow you to have "commercial products related to the Open Source project".

lokalise.co is super easy to set up. It is free for open source, and machine translation works out of the box (Google, MS, "R"). File import and export are easy, and it has voting, chat, history, images...
Check it out: https://lokalise.co/features

@emlove
Contributor Author

emlove commented Oct 27, 2017

Lokalise.co seems like it has a good feature-set to me. Free full-featured open source plan seems like a solid win too.

@balloob
Member

balloob commented Oct 29, 2017

What does it mean for Lokalise to have seats? Does that mean that we have to find people to give accounts to instead of people making their own accounts and helping translate the software?

@c727
Contributor

c727 commented Oct 29, 2017

You can set your project to private or public. Private means you have to invite people by email and name. Public means they can join by using a link. Each joined user takes one seat, but open source projects have unlimited seats overall.

@balloob
Member

balloob commented Oct 30, 2017

Alright cool. I'll reach out to them

@balloob
Member

balloob commented Oct 30, 2017

I reached out to them. Let's see what they say.

@balloob
Member

balloob commented Oct 30, 2017

Lokalise came back to me and arranged a free enterprise plan for us. @c727 and @armills, since both of you are championing this project, send me your emails on Discord and I'll make you admins for this.

@c727
Contributor

c727 commented Oct 30, 2017

Back to capitalization: let's say only the first letter of a phrase is capitalized. We can still use CSS text-transform: capitalize to capitalize every word.

@emlove
Contributor Author

emlove commented Oct 30, 2017

Looking at more examples, it seems more to me that it's going to vary on a case by case basis, and maybe our docs should just say to follow the conventions set by the English translation as it makes sense for that language. I'd rather see "Log Out" than "Log out" in the sidebar, but on the logbook page, "Showing entries for" makes the most sense.

I'd rather not lean on CSS transforms unless we need them, since I think it's preferable to have a human in the loop to make judgement calls on words like "of" / "the". I think most of our translations are really only used in one context anyway. It might be hard to say until it's a little more fleshed out.
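To make the "of" / "the" concern concrete, here is a small Python sketch that mimics capitalizing every word, roughly what an automatic transform does. The helper name and the third phrase are made up for illustration.

```python
import string


def capitalize_every_word(phrase):
    """Capitalize each whitespace-separated word, with no judgement calls."""
    return string.capwords(phrase)


for phrase in ["log out", "showing entries for", "state of the sensor"]:
    print(capitalize_every_word(phrase))
```

The last phrase comes out as "State Of The Sensor", which is the kind of result a human translator would likely avoid.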

@c727
Contributor

c727 commented Oct 30, 2017

Well, I would go with the Material guidelines, but if both of you say "Log Out" it's OK.

@emlove
Contributor Author

emlove commented Oct 30, 2017

Ah, good find. I'd say we should just link to the Material writing style as our official style guide. (Meaning we'd change to "Log out")

@andrey-git
Contributor

What are the plans regarding various services.yaml files?

@emlove
Contributor Author

emlove commented Oct 30, 2017

@andrey-git The rough roadmap I have in mind looks like this:
The frontend component will have a new service that allows additional translation resources to be registered. The frontend component will consolidate all the additional registered resources and make them available to Polymer. Polymer can then overlay the server-provided translations on top of the built-in translations when saving the resources to the hass object.
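The overlay step could look roughly like the recursive merge below. This is a sketch in Python for brevity rather than the actual frontend code; `overlay`, the `my_component` key, and the `service` section are hypothetical.

```python
def overlay(base, extra):
    """Return a dict with `extra` layered on top of `base`.

    Nested dicts are merged key by key; any other value in `extra`
    replaces the value in `base`. Neither input is modified.
    """
    merged = dict(base)  # shallow copy; untouched sub-dicts are shared
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = overlay(merged[key], value)
        else:
            merged[key] = value
    return merged


built_in = {"panel": {"config": "Configuration", "states": "Overview"}}
server_provided = {
    "panel": {"my_component": "My component"},
    "service": {"hypothetical_key": "Example"},
}
resources = overlay(built_in, server_provided)
print(resources)
```

Built-in keys survive unless a registered resource defines the same key, in which case the server-provided value wins.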

This doesn't cover the backend repo format of how to store things, but that's more of an implementation detail we can decide on once we have a starting point.

I added some loose bullet points for the initial implementation to the description.

@c727
Contributor

c727 commented Oct 30, 2017

lokalise QA:

1Q: How to add new keys to lokalise?
1A: Upload an updated en.json or add them by hand.

2Q: How do we implement native name and tag?
2A: a) As armills suggested a few days ago, we don't need the tag inside the file as it's already in the file name.
To make exporting easy, I suggest defining the native language name like this: "language.name" = "English (US)".
b) On the other hand, armills also said that he may want to show the language selection in the current language, which means we would have to translate all language names for every language.
ab

3Q: How do we move our translations from GitHub to lokalise?
3A: a) We can delete the language key in each file and upload them to lokalise (current setup like 2a). We have to make sure that the keys are correct and that the translations match our (updated) guidelines.
b) We delete the GitHub translations, because some were submitted before we changed States to Overview, some are incomplete, and some don't match our guidelines. There aren't that many of them overall.

4Q: How do we move our translations back to GitHub?
4A: a) Manually download the files and upload them to GitHub before each release.
b) Setup a webhook or other form of automation.

5Q: What's the output file name from lokalise?
5A: Default is en_US.json but we can change to en-US.json (via custom ISO code).

6Q: Do we want to proofread?
6A: In my opinion yes to 'protect' our translations and allow people to vote for alternatives.

7Q: Who do we want as proofreader?
7A: tbd

8Q: How can users add a new language?
8A: They have to request it (GitHub issue or Discord). --> An admin has to add it from the list and modify the tag (ISO code) if necessary (change en_US to en-US, remove the country code if it isn't needed).

en.json for 2a)

{
    "language": {
        "name": "English"
    },
    "panel": {
        "config": "Configuration",
        "hassio": "Hass.io",
        "history": "History",
        "log_out": "Log out",
        "logbook": "Logbook",
        "mailbox": "Mailbox",
        "map": "Map",
        "shopping_list": "Shopping list",
        "states": "Overview"
    }
}

@balloob
Member

balloob commented Oct 30, 2017

Let's not include a translation of each language in each language, that's a lot of bytes that hardly ever get used. Let's just render each language in how that language writes their own language.

@emlove
Contributor Author

emlove commented Oct 30, 2017

My thoughts on the QA:

  1. The master English translation should be in git. Adding new keys to the schema is something that needs to be coordinated with the code, and should go through the normal PR process. For the initial deployment, a one-shot idempotent script that updates Lokalise and can be run periodically is sufficient. Long-term this should be auto-triggered with GitHub webhooks, and probably live in a lambda. Translation keys should only ever be changed/defined by running this script.

  2. I really don't like the idea of using a special keyword for native name. I think a translation key with different meanings in different languages will just cause problems for us down the road.
    I suggest we add a key in the base English translation for every language we support. This means that we won't have the problem of different keys being present in different languages, and are setting ourselves up to be future-proof. Other languages are only required to define their own name, although translating other names is welcome.
    At some point we're going to need to implement our own language picker, and I imagine something like below (without the flags). We may or may not want to include native and current language names, but we shouldn't paint ourselves into a corner right now when designing the schema.
    Example language picker

  3. I'd propose that we host our master translation set on Lokalise. I think that once we've run our schema update script to define the allowed keys, doing a one-time manual import to migrate our current data set is fine. After migrating the git translations can be removed.

  4. I don't think we should check the translations into git at all. I think we should fetch them as part of script/build_frontend, similar to script/update_mdi.py, and include them in the final PyPI package.

  5. I think we've already decided on BCP47 (en-US.json) with the other discussions, to match the format browsers report. My sentiment here is that it sucks browsers don't use the ISO formats, but we'll create fewer problems for ourselves by following the browser convention.

  6. This one might have to evolve as we go. I don't have any preference here.

  7. We should probably leave it to community voting.

  8. This seems reasonable to me. Also pending point 2 it may involve a PR to add another language to the base schema in git.
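For point 1, the idempotent part of such a script could be sketched as a pure "plan" step that compares the keys in the master English file with the keys already known to the translation service. The actual upload to Lokalise is deliberately left out. `flatten`, `plan_sync`, the dotted key format, and `panel.old_key` are assumptions for illustration.

```python
def flatten(tree, prefix=""):
    """Flatten nested translation JSON into dotted keys, e.g. 'panel.map'."""
    flat = {}
    for key, value in tree.items():
        path = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def plan_sync(master, remote_keys):
    """Work out which keys to add or remove so the service matches git."""
    wanted = set(flatten(master))
    return {
        "add": sorted(wanted - remote_keys),
        "remove": sorted(remote_keys - wanted),
    }


master = {"panel": {"config": "Configuration", "map": "Map", "states": "Overview"}}
remote = {"panel.config", "panel.map", "panel.old_key"}
plan = plan_sync(master, remote)
print(plan)
```

Once the plan has been applied, running it again yields empty "add" and "remove" lists, which is what makes periodic runs safe.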

@c727
Contributor

c727 commented Oct 31, 2017

  1. What do you think about a separate languages.json that includes tag, native_name, english_name, flag_icon or whatever we need for each supported language?

  2. Yes we decided for BCP47 100 times and I don't want to change or discuss this :P ... This was meant as a FYI: For example for English (US) we have to change the ISO Code from en_US to en-US to have the BCP47 style (en-US.json). Also for Chinese and Indian languages those ISO Codes have to be edited once.

6/7. Voting requires proofreading: https://docs.lokalise.co/article/QdiPOXw8TX-translation-upvoting
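If the ISO codes were not edited in Lokalise, the exported file names could also be normalized after download. A minimal sketch, where `to_bcp47_filename` is a hypothetical helper and `zh_TW.json` is an assumed export name:

```python
def to_bcp47_filename(filename):
    """Turn an export name like 'en_US.json' into 'en-US.json'."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        # No extension at all: treat the whole name as the tag.
        return filename.replace("_", "-")
    return stem.replace("_", "-") + dot + ext


for name in ["en_US.json", "zh_TW.json", "hu.json"]:
    print(name, "->", to_bcp47_filename(name))
```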

@c727
Contributor

c727 commented Oct 31, 2017

btw this is what I get with Chinese language user account (navigator.languages):
tag

navigator.language is also zh-TW. Are you sure BCP47 is working?
__
the guy who translated to Chinese has

zh-CN
zh
en-US
en
zh-TW

For Hindi I get

hi-IN
hi
en-US
en

and navigator.language is hi

The first one always matches the default iso code lokalise is using.
Hungarian has

hu-HU
hu
...

and navigator.language is hu

@emlove
Contributor Author

emlove commented Oct 31, 2017

  1. I don't think a separate file makes sense. I especially don't want to call out a separate "English" name. If we're that worried about size we can just use the special keyword and only ever include the native name.

For a Chinese user whose browser reports "zh", do they expect simplified or traditional scripts? We might have to just pick one to be the root "zh" if that's the case.

@c727
Contributor

c727 commented Oct 31, 2017

There is no root for "zh" because the variants use different characters; check #520.

Chinese users have "zh-CN" for Simplified or "zh-TW" for Traditional. Some added both, some online guides give advice to add zh-Hant/s by hand.

@emlove
Contributor Author

emlove commented Oct 31, 2017

OK, let's keep the Chinese discussion in #520 then. It's not a systematic problem for our roadmap.

@emlove
Contributor Author

emlove commented Oct 31, 2017

OK, I have an updated plan for 2 that should give us the best of both worlds. We'll build the full schema with all language names I've described above, but for the time being we'll only compile the native name into the files we serve to the frontend. This keeps our schema flexible for the future, without bundling unnecessary strings into the current translations.
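A rough sketch of that compile step, assuming the language names live under a `language` section keyed by tag (that layout is an assumption, not something settled in this thread):

```python
def compile_for_frontend(tag, translation):
    """Copy a translation, keeping only the name of its own language."""
    compiled = dict(translation)
    names = translation.get("language", {})
    compiled["language"] = {tag: names[tag]} if tag in names else {}
    return compiled


# Hypothetical full German translation with several language names.
german = {
    "language": {"de": "Deutsch", "en": "Englisch", "hu": "Ungarisch"},
    "panel": {"map": "Karte"},
}
print(compile_for_frontend("de", german))
```

The full schema stays in the source translation, and only the native name ends up in the file served to the frontend.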

@c727
Contributor

c727 commented Nov 1, 2017

BCP47 vs LOCALE:

  1. The main argument for BCP47 was that web browsers use it, but they don't; they use LOCALE.
  2. The current implementation for BCP47 is incomplete. The extended language tag is missing, so zh-xx-Hanx-TW -> en
  3. Currently we transform the LOCALE from the browser to BCP47: zh-TW -> zh-Hanx-TW
  4. For our use case, the BCP47 script subtag is only useful for a few languages. It is more useful for websites that filter thousands of books or documents.
  5. The "BCP47 guide" I've seen adds the tag to the browser languages -> ["zh-TW", "zh", "en-US", "en", "zh-Hanx"]. We begin our detection with the first element.
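To illustrate points 2 and 5, here is a simplified sketch of first-match detection over the browser's language list. It is written in Python for brevity and is not the actual frontend logic; `pick_language` and the sets of supported tags are assumptions.

```python
def pick_language(browser_languages, supported, default="en"):
    """Return the first usable tag: exact match first, then base language."""
    for tag in browser_languages:
        if tag in supported:
            return tag
        base = tag.split("-")[0]
        if base in supported:
            return base
    return default


browser = ["zh-TW", "zh", "en-US", "en"]

# Translations named with script subtags never match "zh-TW" or "zh",
# so detection falls through to English.
print(pick_language(browser, {"en", "hi", "zh-Hans", "zh-Hant"}))  # en

# Translations named like the browser's own tags match on the first element.
print(pick_language(browser, {"en", "hi", "zh-CN", "zh-TW"}))  # zh-TW
```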

lokalise.co
We should bring it up again before the next release, or users will flood GitHub again with incorrect translation files once they read about translations.

  1. It's easier to use and maintain.
  2. It's independent of BCP47 vs LOCALE

@dzungpv
Contributor

dzungpv commented Nov 6, 2017

https://ackuna.com/ is a free translation service; I have been using it for some years.

@cgarwood cgarwood added the localization Translations and Localization label Sep 21, 2018
@zsarnett
Contributor

@armills any update to this? What needs to be done next? New vision?

@emlove
Contributor Author

emlove commented Nov 30, 2018

I think for the moment everything that's unchecked would still be a good improvement. Although it's become more of a wish list than a roadmap now that the framework is out and in use.

@zsarnett zsarnett added Feature Request Should be a discussion and removed Road Map labels Nov 30, 2018
@ghost ghost deleted a comment from iantrich Mar 7, 2019
@ghost

ghost commented Mar 7, 2019

This issue was moved by iantrich to home-assistant/ui-schema#245.

@ghost ghost closed this as completed Mar 7, 2019
@ghost ghost mentioned this issue Oct 4, 2019
25 tasks
tkdrob pushed a commit to tkdrob/frontend that referenced this issue Apr 20, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Jul 7, 2022