Skip to content

Conversation

@rk-for-zulip
Copy link
Contributor

Add a script to simplify pulling translation data.

Closely related to #3897, which it may subsume and/or should be integrated into.

@rk-for-zulip rk-for-zulip requested a review from gnprice February 19, 2020 03:11
@rk-for-zulip rk-for-zulip force-pushed the automate-tx-hookup branch 2 times, most recently from 76be4c2 to 5a6b1f0 Compare February 19, 2020 18:45
Copy link
Contributor

@chrisbobbe chrisbobbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, it looks like this automates a ton of work! I just added two small comments. This PR does a lot of the work of #3897 (I saw your reference to it). The only things that Greg's PR adds beyond this one, at least as I've found, are

  1. Lithuanian, in the settings page (see my comment about this)
  2. Docs in docs/howto/translations.md, which would have to change because of these nice automations.

But doing your own git diff or git range-diff is probably a good idea, either one of you. 🙂


/** All the locales for which we have translation databases. */
const rawLocales /* : string[] */ = await (async () => {
// $FlowFixMe: Flow doesn't know about promises.opendir
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// $FlowFixMe: Flow doesn't know about promises.opendir

Neither, apparently, does Node 10, which I think is the latest version we explicitly require for anything (see 9f82dfb).

So it might be nice to have a friendly error message if the version is less than v12.12.0, or at least a comment saying that at least that version is required. I get this, currently:

$ node tools/generate-txn-import.js 
(node:83083) UnhandledPromiseRejectionWarning: TypeError: fs.promises.opendir is not a function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed – but I still get a warning:

(node:8216) ExperimentalWarning: The fs.promises API is experimental

... but I'm also running a somewhat older version of Node 10. If this is an issue for other people I can swap in the relevant sync calls instead.

Copy link
Contributor

@chrisbobbe chrisbobbe Mar 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also running a somewhat older version of Node 10.

I'm puzzled that it would work on any version of Node 10; the docs at v12.12.0 say it was introduced in v12.12.0. 🤷‍♂

(Never mind, I see that you switched from opendir to readdir.)

{ locale: 'id-ID', name: 'Indonesian', nativeName: 'Bahasa Indonesia' },
{ locale: 'it', name: 'Italian', nativeName: 'Italiano' },
{ locale: 'ja', name: 'Japanese', nativeName: '日本語' },
{ locale: 'ko', name: 'Korean', nativeName: '한국어' },
Copy link
Contributor

@chrisbobbe chrisbobbe Mar 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e5ac532 also adds Lithuanian and no others beyond the ones you've added. (Your choice to leave out Lithuanian may have been intentional, as a consequence of using a particular threshold that Greg didn't use.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly that. Serbian and Swedish already fall below that arbitrary threshold, though, so it makes sense to lower it. (Done.)

gnprice added a commit to gnprice/zulip-mobile that referenced this pull request Mar 9, 2020
For the libdef in flow-typed/, I reran the command in the comment
at the top.

For the other two, I copied changes from a draft branch of Ray's (now
PR zulip#3915) which includes some scripts to fully automate generating
all these files.  I dropped parts of that branch's changes where it
fixed naming so that we can, e.g., offer both zh-Hans and zh-Hant --
we should do that, but it'll need a migration for existing users who
have `zh` in their settings, so it's not as completely trivial as the
rest of these changes.

Co-authored-by: Ray Kraesig <[email protected]>
@rk-for-zulip rk-for-zulip mentioned this pull request Mar 9, 2020
@rk-for-zulip
Copy link
Contributor Author

(The following commit message is no longer in the set of PR commits (as the commit it was attached to was subsumed into #3897). It's got some scripts I used, though, so I'm stashing it here for now.)

languages: Add more languages to the language-selection screen.

Manually add all languages with more than (somewhat arbitrarily) 10
translated entries.

This was aided significantly by the following script, which prints
(among other things) the number of non-default entries in each
Transifex messages-database:

```bash
(for file in $(ls static/translations/messages_*); do
    echo $file: $(git diff --no-index --shortstat -- static/translations/messages_en.json $file);
done) | sort -k 5rn
```

... and also by the following script, which takes the language data
found at https://stackoverflow.com/a/20623472 (as "./langdata.json")
and prints all the potentially-relevant entries.

```bash
(
    CODES=$(
    ls static/translations/messages_* |
    perl -ne '/messages_(..)/ or die; print "$1\n"' |
    sort | uniq | xargs echo
    );
    node -e "
    const codes = ('${CODES[@]}').split(' ');
    first = s => s.split(/[;,]|( \()/)[0];
    data = require('./langdata.json');
    out = codes.map(code => {
        const entry = data.filter(s => s.code == code)[0];
        return Object.fromEntries( Object.entries(entry)
        .map(([k,v]) => [k==='code'?'locale':k,first(v)])
        );
    });
    out.forEach(s => s.nativeName = s.nativeName.replace(/./, c=>c.toUpperCase()));
    console.log(out);
    "
)
```

Note that the language data therein is not quite what we want (for
example, "Russian" is given as "Русский язык" rather than "Русский",
and there is no distinction between traditional and simplified
Chinese), but all added entries were verified against Wikipedia and/or
other common language-selection lists on the wider Internet.

@rk-for-zulip
Copy link
Contributor Author

(This is still sort of "in progress" – although not actively, of late – but that's better represented by marking it as a draft.)

Create a script (intended for use immediately after a Transifex pull)
to create automatic imports of all available Transifex message
databases.

This script makes no attempt to name languages in human-readable
fashion (unless the human in question can read ISO-639 codes), nor to
distinguish between languages we should and should not expose for
selection. These remain the responsibility of settings/languages.js.

It is also a known failing of this script (as it is of the system
which it automates) that all messages are loaded at app start time,
rather than only when the relevant language is selected.
Run the new script `tools/generate-txn-import.js`, and commit its
output.
Add a simple utility function based on `_.partition` (or dozens of
other utilities).
Add import-time validation code to confirm that all languages listed
in `languages.js` actually have associated Transifex databases.
For convenience when comparing to file lists and other databases, list
languages in the source code by locale code, not by English name.

Sort languages for display when exporting, to preserve the existing
order in the language-selection menu.
@rk-for-zulip
Copy link
Contributor Author

rk-for-zulip commented Apr 24, 2020

So here's a relatively milquetoast version of the translation-automation script. It does little more than generate the existing files that look like they already should be autogenerated.

Because it doesn't have access to the data in languages.js, it can't do additional checks for whether we really need or want the additional imports. As such, the script also adds a few new useless imports from react-intl and from unused-but-extant messages_*.json files. (We already have a fair number of these, so this isn't a new problem.)

I have some additional (unpolished, but functional) work that more or less pulls the data currently in languages.js out into a YAML file, which is then read by the new script to generate exactly the imports and settings-data that we actually want. I was originally planning to submit that as a successor to this PR, but it's starting to look like it'll be cleaner to squash them together and present it as an alternate PR instead.

EDIT: ... and this has been done, as #4063. (It was indeed much cleaner.)

@rk-for-zulip rk-for-zulip marked this pull request as ready for review April 24, 2020 01:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants