python-format flag incorrectly forced on all extractors #35
Comments
That is correct. The extractor interface currently does not provide ways to hand out flags. Not sure yet how to properly fix it, but the place to fix it is definitely the extractor interface. I suppose the correct thing to do would be to add a new optional last item to the message tuple. If it's missing the old guessing behavior kicks in, otherwise the flags are accepted from the extractor. |
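A minimal sketch of the "optional last item" idea from the comment above. It assumes extractors yield `(lineno, funcname, message, comments)` tuples and that a fifth item, when present, carries the flags. The names `guess_flags`, `flags_for`, and `PERCENT_PLACEHOLDER` are hypothetical and the regular expression is a simplified stand-in for the existing heuristic, not Babel's actual code.

```python
import re

# Simplified stand-in for the guessing heuristic (assumption, not Babel's regexp).
PERCENT_PLACEHOLDER = re.compile(
    r"%(?:\([^)]*\))?[-#0 +]?\d*(?:\.\d+)?[diouxXeEfFgGcrs]"
)


def guess_flags(message):
    """Old behaviour: guess flags from the message text alone."""
    return {"python-format"} if PERCENT_PLACEHOLDER.search(message) else set()


def flags_for(result):
    """Return the flags for one extractor result.

    A 4-tuple falls back to guessing; a 5-tuple supplies its own flags
    as the last item, and those are accepted as-is.
    """
    if len(result) == 5:
        return set(result[4])
    return guess_flags(result[2])


# Legacy extractor: no flags item, so the guess applies.
legacy = (12, "_", "Hello %(name)s", [])
# Extractor for a non-Python source: explicitly declares "no flags".
explicit = (3, None, "50% discount", [], ())

print(flags_for(legacy))
print(flags_for(explicit))
```

With this shape, existing extractors keep working unchanged, while an extractor for a non-Python source can pass an empty flags item to switch the guessing off.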
Not sure if this is related, but the current heuristic used to determine if a message is actually a |
Issues with the heuristic detection of Python format strings are separate from this. |
For what it's worth, I have spent the last few days adding an extraction system to lingua, which fixes this among a few other things. |
By "fix", do you mean you extended Babel's extractor interface? If so, could you contribute that to Babel? I was about to do so, but don't want to duplicate effort. I'm not sure if lingua is meant to be an extension to Babel or an alternative. |
I fixed this purely from lingua's point of view: lingua no longer uses Babel at all. Instead it now has its own extraction command ( |
Sigh, a pity the waste of resources in duplicating code bases... :-\ Help Babel do the right thing, or help lingua? Both do the wrong thing wrt strftime() formats, but at least Babel already works with Python 3 while lingua currently does not... |
Lingua in git works perfectly on Python 3. It needs a few last changes and a bit more documentation before I can make a release, but that should happen in the next few days. |
@lelit can you explain why you think lingua does the wrong thing with strftime formats? If it does something wrong I would like to fix that. |
I tried lingua this morning, from a fresh clone of the git repo: at least this line prevents Python 3 from even loading the module. Maybe it's one of the few last changes, dunno, or maybe you have a different branch for Py3? Wrt strftime formats, both projects implement the same heuristic to determine if a string contains %x placeholders, but it's not adequate for strftime, because the translated format may contain a different set of them (think about %p, i.e. "AM"/"PM", which is used by Englishmen, while other languages, Italian to mention one, do not use it and instead prefer %H for 24-hour-based hours): the usual IMHO, a more correct way, although not perfect, would be to check if all %-placeholders in the string belong to the possible strftime() codes. In other words, strftime() formats shouldn't be marked as "python-format", otherwise most polints I know will wrongly emit alerts about non-matching translations. |
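A rough sketch of the check proposed above: treat a string as a strftime format when every %-placeholder is a strftime directive. The directive letters are taken from the Python `time.strftime` documentation. The function names are hypothetical, and the extra guard (at least one directive that printf does not also define) is an assumption added here so that a string such as "%d items" is not mistaken for a date format.

```python
import re

# Directive letters listed for time.strftime in the Python documentation.
STRFTIME_CODES = set("aAbBcdHIjmMpSUwWxXyYzZ%")
# Conversion characters of Python percent-formatting.
PRINTF_CONVERSIONS = set("diouxXeEfFgGcrs%")


def percent_directives(text):
    """Return the character following each '%' ('' if '%' ends the text)."""
    return re.findall(r"%(.?)", text)


def looks_like_strftime(text):
    """Heuristic only: True if the text plausibly is a strftime format."""
    found = percent_directives(text)
    if not found or any(code not in STRFTIME_CODES for code in found):
        return False
    # Assumption: require one directive that is not also a printf conversion,
    # because letters such as 'd' and 'c' are valid in both notations.
    return any(code not in PRINTF_CONVERSIONS for code in found)


for sample in ("%I:%M %p", "%H:%M", "%d items", "Hello %(name)s"):
    print(sample, looks_like_strftime(sample))
```

A string that passes this check could be left without the "python-format" flag, so a translation from "%I:%M %p" to "%H:%M" would not trigger a placeholder mismatch warning.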
Ah, the last commit indeed broke Python 3 support. That was stupid, and I’ll fix that immediately. The strftime thing is interesting, and something I had never run into before. Luckily it is quite easy to fix. I have a local fix which I’ll push as soon as I have an internet connection again. For what it is worth I do think Babel’s use of the “python-format” flag is incorrect: as I see it this is really c-format. Python format as defined in the Python documentation is {..} notation, something which Babel does not support currently. This is also the naming used by xgettext. |
I saw your fixes on lingua... I will try it out again :-) But is there any reason to keep two different implementations of the extraction machinery? I mean, is there any chance that lingua and Babel cooperate in this area, instead of wasting precious hours duplicating each other's code? I can agree with what you state about the python-format and c-format confusion, but then I do not see the point of python-format at all, unless you restrict it to {digit}, i.e. positional arguments, and not generically to {name} placeholders. AFAICT, the only purpose of the c-format marker is to let polints check that the translated string contains the same set of placeholders as the original string (assuming it is possible to change the order, generically depending on the actual function using the string, for example the fprintf(3) manpage mentions the syntax Thank you! |
I see no particular reason to have two implementations. lingua was originally created because Babel's python extractor did not support the i18n syntax used by Pyramid (via translationstring) and Zope (via zope.i18nmessageid), and I needed a new plugin for ZPT files. Unfortunately Babel has not been actively maintained for a long time and bug reports were ignored, which led to a growing desire to drop our dependency on it, which is what I worked on over the last few days. I suspect this is also why polib was created instead of using Babel. As I see it, Babel currently tries to do too many things: a Python API for CLDR, a pot/po-handling framework, a gettext implementation and an implementation of the PO/MO file format. Perhaps it would be useful to split Babel up into separate pieces, each with a single responsibility. This is the direction I wanted to take with lingua: I want it to be nothing other than a tool to extract messages from source code. I haven't made up my mind if it should include a tool to compile PO files, when msgfmt already does that very well. And yes, removing the polint tool currently in lingua is on the to-do list :) I do see a point in having a python-format flag: polint tools can use that to check that all placeholders from the msgid are also present in the msgstr, perhaps ignoring changes in format specifier. For example if you translate |
For what it's worth I do use Babel for its CLDR API - it is much, much nicer than the CLDR wrappers in zope.i18n (even if that is setting the bar low :) ). |
Yes, right, I missed the obvious checks on typos. |
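The lint check discussed in the last few comments could look roughly like this: collect the named placeholders of the msgid and report those missing from the msgstr, ignoring the format specifier. This is only an illustration; `named_placeholders` and `missing_placeholders` are hypothetical helpers, and positional placeholders are deliberately left out.

```python
import re

# Matches the name inside %(name)... and ignores the conversion that follows.
NAMED_PLACEHOLDER = re.compile(r"%\((\w+)\)")


def named_placeholders(text):
    """Return the set of placeholder names used in a %(name)s style string."""
    return set(NAMED_PLACEHOLDER.findall(text))


def missing_placeholders(msgid, msgstr):
    """Names present in msgid but absent from msgstr, sorted for stable output."""
    return sorted(named_placeholders(msgid) - named_placeholders(msgstr))


msgid = "%(count)d files in %(folder)s"

# Changing the specifier (d -> s) is tolerated; only the names are compared.
print(missing_placeholders(msgid, "%(count)s bestanden in %(folder)s"))
# A typo in a placeholder name is reported.
print(missing_placeholders(msgid, "%(cuont)d bestanden in %(folder)s"))
```

Comparing names rather than whole placeholders lets a translator adjust a specifier while still catching misspelled or dropped names.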
During extraction, Message instances can be created with the "python-format" flag, indicating that the message string contains Python percent-formatting placeholders. To avoid setting the flag erroneously because the string source is not Python code or otherwise is not expected to contain such placeholders, the extractor interface must be extended to allow extractor functions to indicate which flags are valid. Fixes python-babel#35
I disagree. For example, how would the python extractor know whether the flag should be
|
Is there a particular reason why the PYTHON_FORMAT regexp accepts whitespace after the initial |
Because Python allows it:
Note the leading whitespace in the result, which is significant here. This is inherited from C, which also allows it:
#include <stdio.h>
#include <stdlib.h>
int main() {
    printf("% d\n", 1);
    return 0;
}
produces:
|
Oh, didn't know about that! Thanks! |
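For reference, the space flag in Python percent-formatting can be checked with a couple of lines. This is a standalone illustration and not code from the thread: the space flag leaves a blank before a positive number and is replaced by the sign for a negative one.

```python
# The space flag reserves a position for the sign of a number.
positive = "% d" % 1
negative = "% d" % -1

print(repr(positive))  # ' 1'
print(repr(negative))  # '-1'
```

This is why a heuristic that accepts the flag will also match ordinary text such as "50% discount", where a percent sign is followed by a space and the letter d.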
This is a copy of old ticket 318.
During extraction the results from an extractor plugin are used to create a message which is then added to the catalog. When the Message instance is created it can set the python-format flag if the text matches a specific regular expression. That means that a text like this:
will be flagged as a python-format string even if it does not originate from Python code, or from a piece of Python code that will never use formatting. This is currently breaking our translations in a pretty bad way.
As far as I can see the extractor is the only thing that should decide the flags for a message. The Message class itself should never try to guess flags or force them upon plugins, especially since extractors have no way to override this behaviour.