Localization Units Formatting #118
Comments
For a minimal example of how a flat string would be the wrong answer, consider the work required to make […]. On the other hand, if we do provide a `formatToParts`-style API, we can get something like this out of our formatter:

```javascript
[ 'Hello, ', { var: '$user', value: $user } ]
```

and have a much easier time all around. So while that doesn't exactly speak to localization units, it is a significantly smaller step to go from array output to object output, i.e. translating a whole unit at once, or at least giving it a scope only once. On the other hand, sometimes you do want to format just one message, so the API should support that too. Continuing with the […]:

```javascript
// hello-prompt.en.js
import { createElement } from 'react'

export default {
  meta: { role: () => ["modal window"] /* ... */ },
  elements: {
    label: ({ userName }) => ["Hello, ", createElement('strong', null, [userName]), "!"],
    "button-ok": { label: () => ["Ok"] /* ... */ },
    // ...
  }
}
```

And then provide an API on top of that, maybe something like this (ignoring e.g. error checking):

```javascript
import { getMessage, getUnit } from 'some/where'

// called e.g. formatMessage('hello-prompt', 'en', ['elements', 'label'], { userName: 'Bob' })
// would resolve to ['Hello, ', <strong>['Bob']</strong>, '!']
export async function formatMessage(unitId, locale, path, scope) {
  const unit = await getUnit(unitId, locale)
  const message = getMessage(unit, path)
  return message(scope)
}

// called e.g. formatUnit('hello-prompt', 'en', { userName: 'Bob' })
// returns a function that formats any message of the unit by its path
export async function formatUnit(unitId, locale, scope) {
  const unit = await getUnit(unitId, locale)
  return (path) => {
    const message = getMessage(unit, path)
    return message(scope)
  }
}
```

And walking through that, I think our focus should be on the first part: on enabling the work that goes into parsing and transforming messages into an executable form, while making sure that no unnecessary limitations are imposed on other API layers that may work with the data in all sorts of ways. One point where these layers interact is the […] In other words, if we can have a […]
This is again one of those things that should probably be warned about by the linter, but which the language should allow.
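To make the layering above concrete, here is a minimal, self-contained sketch of the `getUnit`/`getMessage` layer that the example imports from `'some/where'`. Everything in it is an assumption for illustration: an in-memory `units` object stands in for real resource loading, and plain `{ tag, children }` objects stand in for React elements so that it runs without React. Messages still return formatToParts-style arrays, and `formatUnit` binds the scope only once.

```javascript
// Hypothetical in-memory unit store standing in for 'some/where'.
// Plain { tag, children } objects stand in for React elements.
const units = {
  "hello-prompt": {
    en: {
      meta: { role: () => ["modal window"] },
      elements: {
        label: ({ userName }) => ["Hello, ", { tag: "strong", children: [userName] }, "!"],
        "button-ok": { label: () => ["Ok"] },
      },
    },
  },
};

// Hypothetical loader; async to mirror resources fetched at runtime.
async function getUnit(unitId, locale) {
  const unit = units[unitId]?.[locale];
  if (!unit) throw new Error(`No unit ${unitId} for locale ${locale}`);
  return unit;
}

// Walk a path such as ['elements', 'label'] down to a message function.
function getMessage(unit, path) {
  const message = path.reduce((node, key) => node?.[key], unit);
  if (typeof message !== "function") throw new Error(`No message at ${path.join(".")}`);
  return message;
}

async function formatMessage(unitId, locale, path, scope) {
  const unit = await getUnit(unitId, locale);
  return getMessage(unit, path)(scope);
}

async function formatUnit(unitId, locale, scope) {
  const unit = await getUnit(unitId, locale);
  // The scope is bound once; each call formats one message of the unit.
  return (path) => getMessage(unit, path)(scope);
}
```

With this store, `formatMessage('hello-prompt', 'en', ['elements', 'label'], { userName: 'Bob' })` resolves to `['Hello, ', { tag: 'strong', children: ['Bob'] }, '!']`, and a missing locale rejects instead of silently producing a string.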
I'm going to continue to beat the building-blocks-only drum, same as in #65. I think this issue is a great example of innovation that should be made possible by the low-level MessageFormat 2.0 API. I'd like to see it implemented as a userland solution (which might get standardized in the future independently of MF). There are many things to get wrong if we set out to design a holistic solution, which is why I'm a big believer in low-level, agnostic API design that is only concerned with returning a valid sentence in the target language.

Similar to @eemeli, I think that a […] @zbraniecki's list of 13 "it doesn't scale" questions is full of questions with no obvious good answers. The questions about sync/async, before-paint localization, fallback, and retranslation all involve tradeoffs which I'd prefer be made by the API consumers, not by the standard itself. They're great questions, btw, but I think our job is to let other people answer them to suit their business and non-business needs :)

I'd also like to caution against nested units, in particular arbitrarily nested ones. At the extreme end of this idea, the whole app is a single unit with multiple descendants. This might even be theoretically correct, but in practice it's rarely desired. From the tooling perspective, it's convenient to have a definitive "leaf" type which cannot have any nesting inside; nested units mean there's no such type. Furthermore, in the nested model, the hierarchy of localization units within other units can become tightly coupled with the layout of the source code, which spells trouble for any sort of refactoring. Even CSS needs patterns like BEM to reduce that kind of coupling. IMO the best way to solve this problem is to avoid it altogether by storing translations in a flat list of non-nestable units.

To conclude, I think my views are best summarized by the following bullet point from your "What's in scope?" list: […]
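One way to read the flat-list suggestion, sketched under assumptions: the store shape, the `validateFlatUnits` helper, and the placeholder syntax inside the strings are all hypothetical. Unit IDs map directly to leaf messages, and a simple check rejects any unit that tries to contain another unit, so tooling always knows where the "leaf" type is.

```javascript
// Hypothetical flat store: unit IDs map to leaf messages only.
// Unit IDs are flat keys, so moving UI code around does not force the
// translations to be restructured.
function validateFlatUnits(units) {
  const errors = [];
  for (const [unitId, unit] of Object.entries(units)) {
    for (const [key, value] of Object.entries(unit)) {
      // Anything other than a message string would be a nested unit.
      if (typeof value !== "string") {
        errors.push(`${unitId}.${key}: units must not nest; expected a message string`);
      }
    }
  }
  return errors;
}

const flat = {
  "hello-prompt": { title: "Welcome", label: "Hello, {$userName}!" },
  "hello-prompt-button-ok": { label: "Ok" },
};

const nested = {
  "hello-prompt": { title: "Welcome", "button-ok": { label: "Ok" } },
};
```

Here `validateFlatUnits(flat)` returns an empty array, while `validateFlatUnits(nested)` reports the `hello-prompt.button-ok` entry.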
@romulocintra hah! Great example of the bad UX that results from per-string localization! @eemeli: I think what you're prototyping is going in the direction of solving the issue, but I don't see yet how you intend to execute the "at runtime I bind the element to its localization unit" part (not a critique, just an observation that I think this is an important piece of the puzzle). @stas: No need to be defensive about sticking to your position. I think it's a very valuable one, and since we are brainstorming many areas and angles, it is natural that we each bring our perspective to each angle. I see it as a good thing :) That said, I'm confused by your response, because your first and last sentences seem incompatible to me: […]
If we go with (c), then we will not tie our work to it (compare that to (b)), and therefore we will not verify that our decisions actually enable such an approach to work. To make the distinction between (b) and (c) tangible: when we discuss whether we should allow messages to reference one another, we may find that this paradigm is the strongest justification for such a feature. There are more pieces like that. As with other areas, localization is not easy to split into separate, independent layers where you can design one layer in a vacuum and expect another layer to just hook in.
I understood (b) as saying that this was the right paradigm for UI localization, and that MF2 should focus on enabling just it. I'm closer to the opinion that it's one of many paradigms, hence I picked (c). I'm also not sure what tying our work to it implies. I think it's helpful to keep this approach as one of the many use cases that we want to make possible. Is this what you're proposing?
I don't really see how they're related to the point of one being a blocker for the other. It might be a good idea to discuss this in a separate issue.
@zbraniecki I realized on our call yesterday that I had presumed the main benefit of Localization Units was around "Context". Can you confirm what you consider the main benefit to be? Catching up on the thread, I'm unsure whether it's "Context" or "Ease of integration".
@stasm - perhaps I'm not very good at explaining the avenues I see forward! Let me try again:
I don't think this should be treated as a particular issue; it's just an example. To make it more generic: if we find a feature that would be primarily necessary for such […]
Great question! I don't think I have a clear answer, but let me try (treat it as an input to a brainstorm):
I think those two have been the actual reality of GUI apps for a long time, and they are certainly true for HTML (and thus the Web). I come with a worry that the vibe in this group is that we have "enough on our plate" with lower-level considerations, so we'll be eager to cast everything we can out of sight and out of mind. "It's not our problem" and "that's for the future" would be reasonable statements if we had a good plan for making breaking changes in the future and a reasonable hope that, if needed, we could adapt our data model. But I don't think we have that luxury. If we are successful, JS apps will be written with it, and many of them will work with React UI, HTML, etc. If we are successful, some future W3C WebL10n Working Group will be kicked off to standardize HTML/DOM bindings for localization, and it would be really, really bad if they had to conclude that JS message formatting is not compatible with that model. So my hope is that we will conclude here that the LU model is a good candidate for the system that MessageFormat 2.0 should be a foundation for, and thus we will consider features deemed necessary for LU to be necessary, or at least highly valuable, for MF 2.0.
The way you are explaining it triggers one question in my mind: what is the mission of this working group? For me, it’s to build a successor to MessageFormat that offers more than the current version. So, what is the current version offering? […]
So, unless we decide to get very serious about solving complex linguistic problems, which would require lexicons in every language (and could also be tricky to fit natively in mobile browsers), what can this group realistically tackle? My personal thoughts after months of discussions: […]
Maybe I’m being too conservative, but the more we discuss, the more I realize that this problem is complex and that it’s quite easy to get lost in what to tackle. Going back to your original proposal, I love the idea of a “Localization Unit”, but I also agree with @stasm that this could be handled by the library of the technology that would also support the new syntax. Ideally, this syntax should also be agnostic of file formats, programming languages, etc. I’m not sure, based on your explanation, whether this is option b) or c), but I think we should keep this in mind as one of the paradigms that we should easily integrate with, if this makes any sense.
I am totally on board with the idea to have some way to "group" things.
[…] ends up rendered like this (using _ for underscores; I can't find the markdown for it): […]
There we have […] But there are a lot of good ideas there (if you can ignore the fact that the format is XLIFF :-) The spec not only describes the structures, but also why they are needed, how to use them, etc. It will help when we get to the "map the data model to XLIFF" part :-)
Already touched on in my Localization concepts doc: slide 16, "A MessageFormat Data Model", and slide 17, "Elango's data model proposal" (PlaceholderType: OPEN / CLOSE / STANDALONE) (sorry, I can't find the link right now). And also in XLIFF: http://docs.oasis-open.org/xliff/xliff-core/v2.1/os/xliff-core-v2.1-os.html#inlineCodes. We need it anyway for other things that require open/close concepts (BiDi spans, formatting).
Placeholders should not contain localizable content, or we end up in the "deep nesting, message inside message" problem. Note: I've used […] For the data model, it does not matter whether that is a link, bold text, or a span with a style and an "onclick" event.
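A minimal sketch of the open/close/standalone idea, assuming a message is an array of text strings and placeholder objects. The `PlaceholderType` values mirror the OPEN / CLOSE / STANDALONE names mentioned above, but the object shape and the `checkMessage` helper are hypothetical. Placeholders carry only an ID and an opaque `ref` to native markup, so the check rejects any placeholder that carries localizable `text`, as well as any open/close pair that does not nest properly.

```javascript
// Hypothetical placeholder kinds, in the spirit of XLIFF 2 inline codes.
const PlaceholderType = Object.freeze({ OPEN: "open", CLOSE: "close", STANDALONE: "standalone" });

// Returns null if the message is well-formed, otherwise an error string.
function checkMessage(parts) {
  const stack = [];
  for (const part of parts) {
    if (typeof part === "string") continue; // plain localizable text
    if ("text" in part) return `placeholder ${part.id} contains localizable text`;
    if (part.type === PlaceholderType.OPEN) {
      stack.push(part.id);
    } else if (part.type === PlaceholderType.CLOSE) {
      const open = stack.pop();
      if (open !== part.id) return `close ${part.id} does not match open ${open}`;
    } else if (part.type !== PlaceholderType.STANDALONE) {
      return `unknown placeholder type ${part.type}`;
    }
  }
  return stack.length === 0 ? null : `unclosed placeholder ${stack[stack.length - 1]}`;
}

// "Click <a>here</a> to continue <img/>": the data model does not care
// whether "a" is a link, bold text, or a styled span with a click handler.
const message = [
  "Click ",
  { type: PlaceholderType.OPEN, id: "a", ref: "link-1" },
  "here",
  { type: PlaceholderType.CLOSE, id: "a" },
  " to continue ",
  { type: PlaceholderType.STANDALONE, id: "img", ref: "icon-1" },
];
```

`checkMessage(message)` returns `null`; a message whose placeholder embeds its own text is rejected, which keeps the model away from "message inside message" nesting.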
Just echoing @mihnita's points 6 and 7 and attempting to straighten out our terminology. What @zbraniecki calls a localization unit in his example really is a localization group (of units), as per the localization object model (as described in XLIFF 2) and also our agreed vocabulary. While units can have multiple segments in a linear order (that can be changed in the target language using […]) […] As @mihnita hinted, the XLIFF 2 spec, when read ignoring its XMLisms, describes a generally valid object model. That's why we set up the XLIFF OMOS TC, which works on restating the LIOM independently of the traditional XML serialization (to help abstract the XML-independent business logic for wider reuse in I18n and L10n).
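To illustrate the terminology, here is a hypothetical plain-object rendering of a group that holds units, each holding an ordered list of segments with a source and an optional target. The field names are illustrative and not taken from the XLIFF 2 schema or the LIOM, and joining segments with a single space is a simplification; a real serialization would preserve the content between segments.

```javascript
// Hypothetical object model: group -> units -> ordered segments.
const helloPrompt = {
  kind: "group",
  id: "hello-prompt",
  units: [
    { kind: "unit", id: "title", segments: [{ source: "Welcome", target: "Bienvenue" }] },
    {
      kind: "unit",
      id: "label",
      segments: [
        { source: "Hello, Bob!", target: "Bonjour, Bob !" },
        { source: "Please confirm.", target: "Veuillez confirmer." },
      ],
    },
    // Not translated yet: no target on its only segment.
    { kind: "unit", id: "button-ok", segments: [{ source: "Ok" }] },
  ],
};

// Join a unit's segments in order, falling back to the source text
// for any segment that has no target yet.
function unitText(unit) {
  return unit.segments.map((s) => s.target ?? s.source).join(" ");
}

// Resolve every unit of a group, keyed by unit ID.
function groupTexts(group) {
  return Object.fromEntries(group.units.map((u) => [u.id, unitText(u)]));
}
```

In this model the dialog as a whole is the group, while `title`, `label` and `button-ok` are the units inside it.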
In spite of the awesome amount of detail (thanks @zbraniecki for the brain dump!), I suspect this issue has, on the one hand, been superseded for MFv2 by various choices made along the way, particularly that we force complete patterns (no concatenation), and, on the other hand, been pushed out of scope, because TU management, segmentation, and such are more applicable to resource formats wrapping around MFv2 message strings. We should do things consistent with best practices in our syntax (like "complete thought patterns"), but not introduce additional features without cause. Note that I have previously championed incorporating structures helpful to the localization process, such as XLIFF or ITS markup functions or comment syntax, but we have, as a group, excluded these features from the syntax and (so far) the default registry. Marking […]
Closing resolve-candidates per discussion in the 2023-07-24 call.
This is a complete braindump of my late night revelation that may be genius, crazy, foolish or any combination of those.
Background
It started with the realization that the irk I have with the name of our group overlaps with the irk that Mihai expressed, but for different reasons. Mihai said "I think we may come up with something very different than MF 1.0, so naming it 2.0 is misleading and may implicitly steer us toward trying to salvage MF similarity for compatibility reasons which may be a sunk cost fallacy" (paraphrase mine).
I reacted positively to that, because I recognize that there is a natural drift to "add to MF 1.0" just like I may have a drift to "bring Fluent to MF 2.0", and I think it may be limiting us in designing the optimal solution.
But as I dug deeper I realized that the concern I have is with the word "Message". The fact that we talk about formatting messages is already misaligned with how I think modern UI localization mental models should work.
For a simple textual app, you can have something like:
and MessageFormat 1.0 contains a data model, syntax, logic, and an API to internationalize this line of code.
But UI paradigms are fundamentally different.
Let me give you an example:
Example
What does it mean to localize it? What is the "message" and what do we mean by "formatting" it in such context?
There's definitely going to be some formatting going on: there are 4 strings in this widget, and an icon. But what is the "message"?
Well, you can decompose this widget into four separate widgets (title, label, button-ok, button-cancel) and try to say "each one of those has a value and that value is a message!", and I believe that's the most common model of approaching it.
But it doesn't scale in so many ways:
[…] `5` is actually a numerical text input, or a select dropdown, or your text for this widget is a list of items where the structure and number of items should be controlled by the localizer. How do you handle that when you are merely formatting a single string and you have no notion that it is part of a UI that is a nested tree structure with attributes, events, text, icons and data?

Two topics that are intertwined but separate
I recognize that there are two topics here; my last question is from a slightly different category.
I believe that the questions are related, because they relate to breaking with the idea that a message is a string and a UI is a list of messages.
In this model, UI is a tree (not list!) of compound widgets, each having multiple strings inside it, and each string may have its own UI fragment inside it.
Both of those issues are rooted in how UI is different from plain text, but we should imho treat those two questions separately and be open to having different solutions, or even considering one in scope, and another out of scope.
I'm bringing them up here because I want to challenge us with thinking about end-to-end localization of UI, and then you need to consider both.
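A hypothetical sketch of the "UI is a tree, not a list" point, using plain objects rather than any real toolkit. Each widget node has several localizable strings (its text and attributes), and a depth-first walk shows that this made-up version of the example dialog needs eight strings spread over five widgets, rather than one message per widget. Node types, attribute names, and the `collectStrings` helper are all assumptions for illustration.

```javascript
// Hypothetical widget tree: a compound widget whose nodes each carry
// several localizable strings (text plus attributes).
const dialog = {
  type: "Dialog",
  l10n: { title: "Hello prompt" },
  children: [
    { type: "Icon", l10n: { alt: "Greeting icon" }, children: [] },
    { type: "Label", l10n: { text: "Hello, Bob!", tooltip: "Your greeting" }, children: [] },
    { type: "Button", id: "button-ok", l10n: { text: "Ok", accessKey: "O" }, children: [] },
    { type: "Button", id: "button-cancel", l10n: { text: "Cancel", accessKey: "C" }, children: [] },
  ],
};

// Walk the tree depth-first and report each localizable string with the
// path of the widget it belongs to and the attribute it fills.
function collectStrings(node, path = []) {
  const here = [...path, node.id ?? node.type];
  const own = Object.entries(node.l10n ?? {}).map(([attr, value]) => ({
    path: here.join("/"),
    attr,
    value,
  }));
  return own.concat(...node.children.map((child) => collectStrings(child, here)));
}
```

A per-string model would have to invent and manage a separate key for every one of these entries, while a unit-per-widget model keeps them together.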
How to design it?
Designing that system is actually very tricky if you stick to thinking of the localization step of the UI toolkit as taking messages (strings), formatting them, and then applying them in the correct positions in the UI widget.
You need a lot of boilerplate code that has to be controlled either by the developer writing the code, by the widget code, or by the toolkit, and in each case it is non-trivial, makes sync/async hard to handle, limits fallbacks and, I will argue, ...
misses the point.
Localization Unit
Because you cannot localize compound, nested, rich user interface widgets by formatting "messages".
You need a concept that is broader than a single string - something I started calling in my mind "Localization Unit".
Think of all the data needed to localize the above example:
And once you have it, you can do the most natural thing: you can bind such a UI element to a corresponding localization unit.
or:
Such binding is declarative, just like applying a CSS class onto an element is, and it allows the engine to understand that before layout and painting steps for this element some resources need to be retrieved, their Localization Units must be resolved and the combination of the element and its localization unit is what gets laid out and painted.
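A minimal sketch of declarative binding followed by resolution before paint, assuming plain objects for elements and an async `loadUnit(id)` that returns a function from arguments to the element's localized content. `bindL10n` and `resolveBeforePaint` are hypothetical names. Binding only records the unit ID, much like adding a class; the resolver then fetches each distinct unit once and applies it to every element bound to it.

```javascript
// Hypothetical declarative binding: only records which unit an element uses.
function bindL10n(element, unitId, args = {}) {
  element.l10n = { unitId, args };
  return element;
}

// Before layout and paint: collect bound elements, fetch each distinct
// unit once, then resolve every element together with its unit.
async function resolveBeforePaint(root, loadUnit) {
  const bound = [];
  (function walk(node) {
    if (node.l10n) bound.push(node);
    (node.children ?? []).forEach((child) => walk(child));
  })(root);

  const ids = [...new Set(bound.map((el) => el.l10n.unitId))];
  const loaded = new Map(await Promise.all(ids.map(async (id) => [id, await loadUnit(id)])));

  for (const el of bound) {
    const unit = loaded.get(el.l10n.unitId);
    // One unit can provide the text and any attributes of the element.
    el.localized = unit(el.l10n.args);
  }
  return root;
}
```

Because the engine sees all bindings before painting, it can batch and deduplicate resource loading instead of each call site fetching and formatting strings on its own.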
This model has a huge number of benefits:
[…] `button-ok` is a standalone message

LocalizationUnitFormatter

In ICU we actually already have a notion of such an intermediate representation of data: `FormattedX`. For example, `DateTimeFormatter` produces `FormattedDateTime`, which has a lot of information allowing users to introspect, operate on, and maybe even manipulate formatted data. The user can also just `toString()` it to get the result.

What if we had a `LocalizationFormatter` with a `format` method that returns a `FormattedLocalizationUnit`, which has all the information needed for a UI toolkit to combine it with a `Label`, `MenuItem`, `Button` or any other widget and produce a `LocalizedElement` or `LocalizedWidget` that will then be laid out and painted?

And for the imperative case, we could still have `toString`, which would take the value of the `LocalizationUnit`, if it has one, and just print it as a string for the familiar `printf` experience.

What's in scope?

I don't know yet. It's kind of a fresh realization, and I'm not sure if my recommendation for the group is to:

a) Consider Localization Unit in scope as a level above MessageFormatter.
b) Consider Localization Unit out of scope, but the right paradigm for UI localization, and therefore work on having MessageFormat 2.0 be a good lower-level API for it.
c) Consider Localization Unit one of many paradigms for UI localization and not tie our work to it.
d) Consider Localization Unit a bad paradigm and design a better one.

Why am I raising it?
The reason I think it is important is that we need to decide early on whether our target is: […]
and we are OK thinking of the receiving end as flat textual strings, or whether we want to embrace the fact that this is not how UI localization works today.
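To contrast the two receiving ends, here is a hypothetical sketch of the `LocalizationFormatter` / `FormattedLocalizationUnit` idea from the LocalizationUnitFormatter section. The class shapes, the `{ value, attributes }` unit format, and the example units are assumptions, not an existing ICU or MessageFormat API. The result keeps attributes for a toolkit to apply to a widget, while `toString()` still yields a flat string for the `printf` case.

```javascript
// Hypothetical result type: keeps the structure a UI toolkit needs,
// while toString() gives the familiar flat string for imperative code.
class FormattedLocalizationUnit {
  constructor(id, value, attributes) {
    this.id = id;
    this.value = value; // null for units that only carry attributes
    this.attributes = attributes;
  }

  toString() {
    return this.value ?? "";
  }
}

// Hypothetical formatter over a map of units; each unit is a function
// from arguments to { value, attributes }.
class LocalizationFormatter {
  constructor(units) {
    this.units = units;
  }

  format(id, args = {}) {
    const unit = this.units[id];
    if (!unit) throw new Error(`Unknown localization unit: ${id}`);
    const { value = null, attributes = {} } = unit(args);
    return new FormattedLocalizationUnit(id, value, attributes);
  }
}

const formatter = new LocalizationFormatter({
  "button-ok": () => ({ value: "Ok", attributes: { accesskey: "O", title: "Confirm" } }),
  "hello-label": ({ userName }) => ({ value: `Hello, ${userName}!` }),
});
```

A toolkit can read `formatter.format("button-ok").attributes` to set up a button, while `` `${formatter.format("hello-label", { userName: "Bob" })}` `` gives `"Hello, Bob!"` for code that only wants a string.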
A `Label` may have multiple attributes, icons, and other values; each one may be a nested structure of data, and localization may bring its own UI fragments that need to be overlaid on source fragments.

The function in which you call `printf` is not the right place to synchronously annotate the UI with a string, because then the toolkit doesn't know that the UI is localized, cannot retranslate, cannot cache, cannot invalidate that cache, and cannot have responsive localization.

I think that decisions around it will have deep consequences for our thinking about many items on our wishlist (#3).
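A hypothetical sketch of what the toolkit gains when it knows about bindings instead of receiving strings through `printf`: the `Localizer` class below (an assumed name and API) caches formatted results per locale and retranslates every bound element when the locale changes. A one-shot string assignment at the call site leaves the toolkit unable to do either.

```javascript
// Hypothetical toolkit-side localizer that owns the element-to-unit bindings.
class Localizer {
  constructor(resources, locale) {
    this.resources = resources; // { [locale]: { [unitId]: (args) => string } }
    this.locale = locale;
    this.bindings = new Map(); // element -> { unitId, args }
    this.cache = new Map(); // "locale|unitId|args" -> formatted string
  }

  bind(element, unitId, args = {}) {
    this.bindings.set(element, { unitId, args });
    this.applyTo(element);
  }

  // Retranslation: every bound element is updated, nothing is missed.
  setLocale(locale) {
    this.locale = locale;
    this.cache.clear(); // drop results formatted for the previous locale
    for (const element of this.bindings.keys()) this.applyTo(element);
  }

  applyTo(element) {
    const { unitId, args } = this.bindings.get(element);
    const key = `${this.locale}|${unitId}|${JSON.stringify(args)}`;
    if (!this.cache.has(key)) {
      this.cache.set(key, this.resources[this.locale][unitId](args));
    }
    element.text = this.cache.get(key);
  }
}
```

For example, after `l10n.bind(label, "greeting", { name: "Bob" })`, switching with `l10n.setLocale("de")` updates `label.text` without any call site having to format the string again.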
I wrote a separate comment on Raph's new UI toolkit paradigm over the last day of wrangling with this concept. If you're interested in a more tangible application of what it may look like, consider reading raphlinus/crochet#7.