Semantic Comments Proposal: String Versions #141

Open
zbraniecki opened this issue Jun 1, 2018 · 5 comments
Labels
FUTURE Ideas and requests to consider after Fluent 1.0 semantic comments

Comments

@zbraniecki
Collaborator

This is part of the series of proposals spanning out of the meta #16.

String Versions

All localization systems have to accommodate string changes as part of the project life cycle. While adding and removing strings is fairly well understood and covered by Fluent through the l10n-id model, string updates are more complicated.

We identified three states of invalidation that can happen to a message:

  1. (trivial) source-locale only change
  2. (minor) subtle change to tone, punctuation or wording without affecting the meaning
  3. (major) any other change - meaning, message shape, location in the UI etc.

At the moment at Mozilla we support (1) and (3). For (2), we usually lean toward (3), and if the change is really minor, we treat it as (1).

Limitations

That model works quite well, but has several limitations:

1) Any change to the message, even if the message does not lose its meaning, invalidates all translations.

That means an en-US change of tone requires l10n-drivers to decide whether to invalidate the work of 100 people just to inform them about an en-US-specific update.

2) If we deem the change small enough, we have no way to inform localizers of an "optional" update.

In case we decide to go with (1) for that particular change, we have no way to communicate to localizers that there's anything to look at.

Solution

Semantic comments create an opportunity to change that by separating out (2) as a soft-fuzzy mode. It would apply only to cases where the string change is subtle enough that the old message remains valid for production, while still allowing localizers to learn about the update and consider updating their translation.

This fits quite well into the feature scope of #139 because it doesn't affect runtime, and in practice it mostly allows us to separate out (1) from (2).
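
The three states and their proposed handling can be sketched as a small decision function. This is only an illustration of the model described above, not part of Fluent or any existing tooling: the names `Change`, `Outcome` and `apply_change` are hypothetical, and the assumption that a new ID starts again at revision 1 is mine.

```python
from dataclasses import dataclass
from enum import Enum


class Change(Enum):
    TRIVIAL = 1  # source-locale only change
    MINOR = 2    # tone, punctuation or wording; meaning preserved
    MAJOR = 3    # any other change


@dataclass(frozen=True)
class Outcome:
    message_id: str
    rev: int
    translations_valid: bool   # can existing translations still ship?
    localizers_notified: bool  # does tooling flag the message for review?


def apply_change(message_id: str, rev: int, change: Change, new_id: str = "") -> Outcome:
    """Hypothetical mapping from a change level to its effect on a message."""
    if change is Change.TRIVIAL:
        # Same ID, same revision: nothing for localizers to look at.
        return Outcome(message_id, rev, True, False)
    if change is Change.MINOR:
        # Same ID, bumped revision: translations stay valid but are flagged.
        return Outcome(message_id, rev + 1, True, True)
    if not new_id:
        raise ValueError("a major change needs a new message ID")
    # New ID: existing translations are invalidated (assumed to restart at rev 1).
    return Outcome(new_id, 1, False, True)
```

Under these assumptions, a minor change to `history-remember-option` at revision 1 keeps the ID, moves it to revision 2, and leaves all translations usable.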

But I believe that this feature can have a more subtle impact on the Fluent ecosystem by nurturing a culture of thinking about the social contract. Instead of perceiving every change to a string as requiring an ID update, developers would evaluate how their changes affect the social contract with localizers.
In most cases they'd still bump the ID, but with an understanding of why they are doing it, while at the same time being incentivized to minimize changes to copy in order to preserve the social contract and the work of the 100 localizers.

It is my hope that the latter will also increase the value of the Fluent system by making it better at salvaging useful translations.

Case study

To illustrate the latter, I'm going to present an example. Two weeks ago we landed this change:

-history-remember-option =
-    .label = Remember my browsing and download history
+history-remember-browser-option =
+    .label = Remember browsing and download history

This change is useful and goes beyond fixing a spelling error in the source locale, thus clearly not qualifying for (1). On the other hand, while updating the string will likely be useful for many locales, many others will probably not have to update their translation as a result of this change, since the subject in such a sentence is implicit or already absent.

For example, in Polish the exact translation would be "History of browsing and file downloads", and hence no change is required.

But today, we had to update the ID and as a result invalidate all localizations of the message, because otherwise we would have no way to notify those who do want to update their translation. It means that either all 100 localizers update their string in time, or users will see the message in en-US, just because we wanted to flag this string as potentially worth updating.

With semantic comments and string versions, such a change would look like this:

-history-remember-option =
-    .label = Remember my browsing and download history
+# @rev 2
+history-remember-option =
+    .label = Remember browsing and download history

As a result, all 100 localizations of this message would remain valid, but localizers would be notified in their toolchain that their translation of history-remember-option is outdated (implicitly at @rev 1). They would then have a choice: either mark it as valid (and apply @rev 2 in their localization), or fine-tune their translation.
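
A minimal sketch of how a toolchain might detect such outdated translations follows. It assumes the `# @rev N` comment syntax from the example above and treats a missing comment as revision 1. The regexes and the functions `read_revisions` and `outdated_messages` are hypothetical helpers written for this illustration, not part of any Fluent parser, and the placeholder in `PL` stands in for a real translation.

```python
import re

# Assumed syntax, taken from the proposal's example: "# @rev N" directly above a message.
REV_RE = re.compile(r"^#\s*@rev\s+(\d+)\s*$")
# Simplified message/term ID pattern; attribute lines are indented and do not match.
ID_RE = re.compile(r"^(-?[A-Za-z][A-Za-z0-9_-]*)\s*=")


def read_revisions(ftl_text):
    """Map each message ID to its revision; a missing @rev means revision 1."""
    revisions = {}
    pending_rev = None
    for line in ftl_text.splitlines():
        rev_match = REV_RE.match(line)
        id_match = ID_RE.match(line)
        if rev_match:
            pending_rev = int(rev_match.group(1))
        elif id_match:
            revisions[id_match.group(1)] = 1 if pending_rev is None else pending_rev
            pending_rev = None
        elif not line.startswith("#"):
            # A blank or non-comment line detaches any pending @rev comment.
            pending_rev = None
    return revisions


def outdated_messages(source_text, locale_text):
    """Return IDs whose translation exists but lags behind the source revision."""
    source = read_revisions(source_text)
    locale = read_revisions(locale_text)
    return sorted(
        msg_id for msg_id, rev in source.items()
        if msg_id in locale and locale[msg_id] < rev
    )


EN_US = """\
# @rev 2
history-remember-option =
    .label = Remember browsing and download history
"""

PL = """\
history-remember-option =
    .label = (existing translation)
"""

print(outdated_messages(EN_US, PL))
```

With these inputs the translation is reported as outdated but is never dropped; once the localizer adds `# @rev 2` above their message, the list becomes empty.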

The end result is that we were able to notify the localizers, preserve the translations, and minimize the friction.

p.s. I think an interesting exercise would be to see how many locales changed the string as a result of this, and for how many the invalidation was just friction in the system.

@flodolo
Contributor

flodolo commented Jun 1, 2018

p.s. I think an interesting exercise would be to see how many locales changed the string in result of this, and for how many the invalidation was just a friction in the system.

We have 41 translations for the new string: 16 unchanged, 25 changed.

@flodolo
Contributor

flodolo commented Jun 1, 2018

I've always been a fan of this idea, but with time I've started wondering what's the problem we're trying to solve, and if it's worth the effort.

How common is case 2? It's hard to tell, but my feeling is that it's not very common, at least not as common as 3. Probably the largest change of this type was removing periods from strings in preferences, but I consider that an exception caused by lack of copy/UX reviews accumulating over years. Hopefully that should not happen these days.

We know developers hate having to change string IDs, because it requires code changes. But most of those cases fall into 3, and that wouldn't improve. So, we're probably not going to make developers much happier.

Supporting string revisions also requires non-trivial tooling changes:

  • How is a translated string at rev 1 going to look in Pontoon, when en-US gets the string at rev 2?
  • Pontoon would require serialization changes, given it currently uses en-US (and its comments) as a template (but this is something that would be needed for other semantic comments anyway).

@Pike
Contributor

Pike commented Jun 1, 2018

But I believe that this feature can have a more subtle impact on Fluent ecosystem by nurturing the culture of thinking about the social contract. Instead of a culture where developers perceive every change to the string as requiring ID update, developers would be evaluating their changes to the social contract with localizers.

My perception of the status-quo is that we have a versioning scheme, which relies on developers making decisions on behalf of 100+ other teams. Right now they're given two options, and I don't see adding a third option is going to make that decision easier for them.

Instead, I'm afraid that they'll use that revisioning option in cases that are clearly major, but they'd have punted the decision from themselves to 100 localizers. That would create a bad UX on Nightly, and possibly a worse UX on Beta/Release in a cross-channel environment.

Now, we can hope that we can educate developers to not do that. I just don't have high hopes :-/ Then this would be a small fraction of our localization work, as flod said.

At that point, this feature is only going to be as good as its implementation in Pontoon: VCS sync, dashboards, editing support. If I have a rev on a complex string, can I highlight if it's in label or tooltip? How do I keep track of rev 3 and 4? Getting this right will be a significant amount of work.

Looking at 2018 in Pontoon, I think that 2019 is already pretty booked.

@zbraniecki
Collaborator Author

zbraniecki commented Jun 6, 2018

How common is case 2?

That's a great question. Being able to answer it would help us evaluate the value of this proposal.

So, we're probably not going to make developers much happier.

I tried to address that in my initial statement. I don't see your reply engaging with that comment of mine, so I'll reiterate it.
My belief is that besides reducing the number of cases where a string ID change is needed, we'll also bring the nature of the social contract to the front and make it more explicit. It's a little like how Rust makes ownership and borrowing explicit, triggering more conscious decisions around them.
My hope is that with three "tiers" we will be able to guide developers through a conscious decision-making process about what changes they're making to the social contract and what the consequences are.
That, in turn, will hopefully also incentivize preserving the social contract and, as a result, reduce the number of tier 3 changes.

Right now they're given two options, and I don't see adding a third option is going to make that decision easier for them.

This is not my experience.
In most cases I've seen, the developer is not making any decision and remains fully unaware of what has to happen and why. They're either trained to always change the ID even when it's not needed, or they don't know about it and are instructed by their reviewers or by flod to change it (sometimes in the form of a patch backout).

This is much closer to the (in)famous 5 Monkeys and a Banana Experiment, which is a bad project culture with negative impact on how people see l10n overall.

Instead, I'm afraid that they'll use that revisioning option in cases that are clearly major, but they'd have punted the decision from themselves to 100 localizers.

I understand that concern and I don't treat it lightly. I believe we can mitigate that risk with the use of tooling, and the worst case scenario seems analogous to the current state when the developer doesn't update the social contract ID when they change it.

Then this would be a small fraction of our localization work, as flod said.

Do you have any data on that, or an idea how we could collect data on this?

I'm also suggesting that such a change would trigger a shift in the culture of treating strings, with the hope of increasing the share of tier 2 cases over time.

if I have a rev on a complex string, can I highlight if it's in label or tooltip? How do I keep track of rev 3 and 4?

Those are valid questions that need to be answered, but they seem to be answerable and solvable. I'd like to avoid using implementation questions that can be solved as a reason to drop an idea that can bring value.
For that reason, I suggest we separate conversation about how we'll do something, and when we'll do it, from whether it is a good idea and the value of it for the project.

@stasm stasm added the FUTURE Ideas and requests to consider after Fluent 1.0 label Jun 29, 2018
@flodolo
Contributor

flodolo commented Sep 8, 2018

Noting that "safely" is very subjective, and I might have messed up numbers along the way (I was adding data while translating), here's what I got.

Total strings: 444
Changesets: 120
Versioned strings: 69 (15.54%)
Strings that could be safely tagged: 45 (10.14%)

Safely tagged = old string can be still used, without introducing errors or showing a string with a meaning too far from the new value.
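
The percentages above can be re-derived from the raw counts, which is a useful check given the caveat about possibly messed-up numbers. The snippet below is just that arithmetic; the helper name `share` is hypothetical, and it assumes both percentages are taken against the 444 total strings.

```python
TOTAL_STRINGS = 444  # total strings reported in the comment above


def share(count: int, total: int = TOTAL_STRINGS) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


print(share(69))  # versioned strings
print(share(45))  # strings that could be safely tagged
```

Both results match the figures quoted in the comment, so at least the counts and percentages are internally consistent.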

4 participants