Compare three remaining options (draft) #496
Co-authored-by: Eemeli Aro <[email protected]>
I think I'm still missing some context to fully understand what is proposed, but let me provide some feedback:
1a: I personally like the direction of this syntax. I don't really understand the differences between unquoted, quoted and 'bare' patterns. My expectation would be that 'bare' patterns consolidate whitespace (any whitespace, not just ASCII), and that both the other options would preserve whitespace. In that case, the two could be merged and potentially only use single braces, as the keywords are prefixed with a sigil.
2a: I'm less excited about this syntax. I like the simple 'this is complex, use code-mode' marker, but I'm not a fan of the current code-mode syntax. What puts me off the most are the freestanding 'when' parameters and the requirement for quoting all patterns. The pattern syntax and whitespace handling are as I would expect, and are what I would like to see in 1a as a replacement for
3a: Same thoughts as about 1a; it might be easier to write since you'd have to balance fewer braces. I would like to see a more simplified pattern syntax, as mentioned above.
Regarding sigils: I like the look of both the
- Candidate 3a uses a sigil-keyword sequence `%when` that required at least some additional escaping.

It is reasonable to think that we might modify this particular part of the syntax
to improve usability. **_Keep in mind the need for single-line authoring._**
Does "single-line authoring" mean that:
- you must be able to express any possible functionality on a single line, or
- carriage returns are irrelevant: any MF2.0 string must render the same if all CRs are removed, or
- something else?
It's the former. Single-line authoring means that syntactically-allowed-but-not-required whitespace is removed. It is shorthand for: many people will author messages in a resource format in which the message is a string with file-local escaping. Think about that in addition to the pretty formatting we use in examples like this:
{input $var :function opt=val}
{%match {$foo}}
{%when foo}
You have {$foo}
{%when *}
You have really {$foo}
That might be single line as:
myMessage = "{input $var :function opt=val}{%match {$foo}}{%when foo}You have {$foo}{%when *}You have really {$foo}"
(Note that I have trimmed whitespace off of the pattern in the single-line example.)
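A minimal sketch of the "pretty" to single-line conversion described above. The helper name `to_single_line` is hypothetical, and the message uses the candidate 3a style sigils from this thread; it is not part of any MF2 tooling. It is deliberately naive: it strips every line, so it also drops any whitespace an author meant to keep at the edges of a pattern.

```python
def to_single_line(pretty: str) -> str:
    """Join a pretty-printed message into a single line.

    Assumption: every line break and all indentation in the input is
    syntactically optional. Pattern-exterior whitespace is lost too,
    because each line is stripped before joining.
    """
    return "".join(line.strip() for line in pretty.splitlines())


pretty = """\
{input $var :function opt=val}
{%match {$foo}}
{%when foo}
    You have {$foo}
{%when *}
    You have really {$foo}
"""

single = to_single_line(pretty)
print(f'myMessage = "{single}"')
```

Whether a real resource format could collapse a message this way depends on which whitespace the chosen syntax treats as significant.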
Why is there an unbalanced
The set of options that we've been discussing in the last month is based on the idea that "simple messages occur most frequently, so let's make that easier to type, and the tradeoff is that the non-simple messages get a little harder". In other words, we drop the delimiters on the simple message so that it starts in text mode, and then we add delimiters to non-simple messages to make them start in code mode. Okay, fine, sounds reasonable.
The key aspect of Option 2a (to me): it solves the above goal with the least amount of churn and complexity. Our current syntax has no caveats or gotchas about how to interpret a message. It's pretty unambiguous and concise. Relative to our current syntax, Option 2a just moves delimiters around from simple to non-simple messages, like a sort of Conservation of Delimiters going on.
The problem with the optional delimiters that Options 1a and 3a introduce is really a chain reaction of multiple decisions that lead us to contradict our previous decisions:
How do we determine what goes in the Comparison Matrix? I will propose a suggestion here to include the following 2 columns, based on the above observations of the chain reaction of complexity, or based on important impact we've been urged to consider:
|
exploration/syntax-exploration-2.md
| Option | Description | Doesn’t Nest {} | Doesn’t Need More Escapes | Doesn’t Require Quoted Pattern | Counted {} works | Multiple Expression Syntaxes |
| :----- | :------------------------------------------------------------- | :-------------- | :------------------------ | :----------------------------- | :--------------- | :------------ |
| 1a | Invert for text mode, distinguish statements from placeholders | - | + | + | + | - |
| 2a | Text first, current syntax for complex messages | - | + | - | - | + |
| 3a | Use sigils for code mode | + | - | + | + | + |
| Option | Description | Doesn’t Nest {} | Doesn’t Need More Escapes | Doesn’t Require Quoted Pattern | Counted {} works | Multiple Expression Syntaxes | Amt of Complexity Added | Legibility of single-line |
| :----- | :------------------------------------------------------------- | :-------------- | :------------------------ | :----------------------------- | :--------------- | :------------ | :-------------- | :-------------- |
| 1a | Invert for text mode, distinguish statements from placeholders | - | + | + | + | - | - | - |
| 2a | Text first, current syntax for complex messages | - | + | - | - | + | + | + |
| 3a | Use sigils for code mode | + | - | + | + | + | - | - |
@echeran The table was meant to include mainly objective differences between the candidates. I think "complexity added" and "legibility of single-line" are somewhat subjective. Do you agree?
The one part of this table that should probably change in that light is that "doesn't require quoted pattern" should probably be "allows unquoted pattern", and I will make that change in a second. There are strong arguments for why quoted-pattern is a feature and not something to avoid, which I expect proponents of 2a will include in their thinking for why they like that candidate.
While I think 2a is a viable option forward, it also introduces some new complexity, which wasn't present in the original always-start-in-code-mode syntax. So it's not strictly equivalent to (complexity of original) + (complexity of simple unquoted patterns), because the sum itself introduces new challenges. I think part of the ongoing exercise is to address this emergent complexity.
In 2a, variant patterns are delimited in a way that's unseen in the simple message mode. It's essentially two syntaxes in one (thanks, @eemeli, for making this point yesterday when we chatted). Furthermore, it nests braces in a way that may be surprising, even if it's internally consistent. Some people that I've talked to thought this was too noisy; others appreciated the unambiguity. I've come to accept that there isn't one correct answer here.
I attempted to illustrate my mental model about 2a vs. 1a/3a in the following drawing:
2a requires "ascending" to layer 2, which is again a text layer; I call it the nested model. 1a and 3a "descend" back to layer 0 when they enter variant patterns; I call it the flat model. Both models are valid models of thinking, but my current opinion is that going back to layer 0 is something that many people will expect to be able to do. Additionally, the flat model can be transformed to a nested model if we allow variant patterns to be optionally delimited with
Interestingly enough, I think this illustration hints at some extra challenges related to whitespace handling in the flat model that we've been discussing. At the same time, it may suggest a possible solution: what if we shifted our mental models to thinking about trimming around statements, rather than around patterns?
Lastly, I think we're missing a fourth option: introduce a marker that enters code mode like in 2a (back when it was using
|
Among the 3 (plus suggested variants thereof), I like the simplicity of 1a the best. The worst part of 1a is the leading/trailing spaces. I presume the example below is illustrating that with the alternatives.
I suggest a change to 1a to disallow the {{. The leading/trailing spaces are going to be unusual, and having a single way to achieve them is going to increase overall understandability. So I favor just the following, for the rare cases where it is needed.
|
Assuming the goal here is to preserve the whitespace, I would simplify to this:
If those simple braces introduce problems, this could be an alternative:
|
Thanks all for "overnight" (from
Please don't vote on this PR. The goal, you'll recall, is to merge it and then discuss (including stack ranking) on an issue that I raise today. Please, to the degree possible, do not try to lobby in this PR either. Try to focus on material changes to specific candidates or the wording representing them.
Currently unaddressed from the above is the question @echeran raises about the code mode delimiters for option 2a. We can do one of two things to address this:
@macchiati Note that we have a design document about pattern exterior whitespace (PEWS) here. The PEWS handling is not really a part of the design choice, except to note that
@stasm I love the picture: it made my morning.
All: Is there any support for adding an "option 4"? (Note that it would be related to the "blocks" family of options in our original list; this would seem to be "option 5b" 😉.) I don't want to creep back into having a large number of options. @stasm would you really favor "5b" or is this more of a thought experiment?
I would strongly prefer keeping only one 2-ish candidate, esp. as its presentation includes the note:
Regarding the representation of that candidate, I think it should be up to the people who previously voted highly for it, i.e. @macchiati, @echeran, @markusicu, @mihnita, and @stasm. If there's disagreement, we should probably revert to its previous
I would strongly prefer not adding a new candidate to this selection round. Selecting an overall direction for our syntax will still allow for "block" exploration as a follow-on step, much like selecting a syntax now may allow for external-whitespace considerations to be made (again) as a follow-on step. |
@echeran noted:
This came from conversation near the end of the 2023-10-16 call, which the notes capture partially, e.g.:
The specific option (Note: I have personal opinions about this, but this comment is merely to answer the question "where did that come from") |
I'm in favor of an unbalanced delimiter, ideally not composed of curly brackets. The reason is that, because curlies are used in placeholders, I've already seen a few people from the small sample that I approached attempt to put text around the
This comment was marked as off-topic.
I'd say it's a viable alternative to 1a and 3a in the family of autotrimming syntaxes. I also think it can be a reasonable middle ground between 1a/3a and 2a due to its block preamble. That said, I do appreciate the need to keep the list short. Could this extra proposal (5b) be considered instead of 3a? The
Having discussed offline with @echeran and @stasm (and attempted to reach others, with no success), I'm going to merge @echeran's changes into this PR and then merge the PR per our discussion in teleconference. I will raise a new "voting" issue and send email/slack to the group explaining the next steps. Thanks to all contributors. |
Co-authored-by: Elango Cheran <[email protected]>
From my perspective: the reason why the balanced ( If we tag "enter in code mode" with something else ( Next, why open with An option would be something like
Separate issue. I've been trying to think more like an HTML developer, and I also checked again the DOM localization proposal and the Google Soy format (which is kind of a templating language). I think that the "automatic trimming of spaces" will also hurt people used to HTML. Let's say I do this:
<style>
.foo { white-space: pre; }
#bar { white-space: pre-wrap; }
</style>
...
<p>
Hello world one!
</p>
<p space="preserve"> Hello world two! </p>
<p class="foo"> Hello world three! </p>
<p id="bar"> Hello world four! </p>
This will render with a space in front of the first message, and preserve all spaces for messages 2, 3 and 4. Now I am asked to internationalize this and prepare it for translation, using DOM localization. So I do:
<style>
.foo { white-space: pre; }
#bar { white-space: pre-wrap; }
</style>
...
<p l10n="msg1">
Hello world one!
</p>
<p l10n="msg2" space="preserve"> Hello world two!</p>
<p l10n="msg3" class="foo"> Hello world three!</p>
<p l10n="msg4" id="bar"> Hello world four!</p>
and the "message catalog" (might even be extracted automatically,
{
"msg1": "Hello world one!",
"msg2": " Hello world two!",
"msg3": " Hello world three!",
"msg4": " Hello world four!"
}
One would expect everything to render 100% the same. But if the messages automatically go through MF2, the spaces in msg2, 3, and 4 are trimmed (by MF2). So it is one of those cases where "ah, this looks familiar", but then I am hurt by it because it really isn't the same.
Yes, the answer is "if you want your spaces wrap the message in
But why should I be hurt by that and forced to fix it? That is the reason why I am arguing for WYSIWYG, both in simple mode and in complex mode.
**Note:** I chose JSON to store the strings instead of the properties-like format in the proposal, so as not to introduce another layer of unknown behavior with the message catalog (I don't know if the proposed
TLDR: trimming will actually hurt people familiar with the HTML behavior.
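The concern can be made concrete with a small sketch. Both formatter functions are hypothetical stand-ins, not MF2 APIs: `format_trimming` assumes a syntax that strips pattern-exterior whitespace from simple messages, and `format_wysiwyg` assumes one that keeps the string exactly as stored in the catalog.

```python
# Catalog as it might be extracted from the markup above.
catalog = {
    "msg1": "Hello world one!",
    "msg2": " Hello world two!",
    "msg3": " Hello world three!",
    "msg4": " Hello world four!",
}


def format_trimming(pattern: str) -> str:
    # Assumption: an auto-trimming syntax drops leading/trailing whitespace.
    return pattern.strip()


def format_wysiwyg(pattern: str) -> str:
    # Assumption: a WYSIWYG syntax returns the pattern text untouched.
    return pattern


# Messages whose rendered text would change if the syntax auto-trims.
changed = [
    key
    for key, pattern in catalog.items()
    if format_trimming(pattern) != format_wysiwyg(pattern)
]
print(changed)
```

Under these assumptions only the messages that carry a leading space are affected, which matches the msg2, 3, and 4 cases described above.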
@mihnita Thanks for this. I know the whitespace issue is tempting, but, as noted elsewhere, I don't think it's that material to the choice of core syntax. Admittedly, I agree that there are many strings for which there is pattern-significant whitespace that needs to both be exposed to the localization process and not trimmed by the MF2 parser.
However, I think we can only concern ourselves with the message string that is actually presented to the MF2 parser. It's useful to note that some file syntaxes may unhelpfully trim MF2 messages stored in them. But that's a problem for a different part of the tech stack. We need to focus on our needs, not those of putative resource formats. Similarly, we might carefully preserve whitespace throughout the authoring and localization process only for HTML (or some other presentation environment) to trim the formatted output of MF2. This is also not our problem, so long as our API was faithful in producing the correct results, external spaces and all.
For me, the concern has two parts: (i) should we allow unquoted patterns? and (ii) if we do, how do we do boundary detection on the resulting unquoted patterns?
If we don't allow unquoted patterns, boundary detection is not an issue. The tradeoff for quoting is whether we inconvenience authors of 100% of patterns to support what appears to be a smaller number of space-significant patterns within those messages. The answer to this can be "yes", particularly if the resulting syntax is also highly consistent and easy to write.
If we allow unquoted patterns, the problem becomes "how can I tell unintentional pattern exterior whitespace (PEWS) from intentional PEWS?" There are a number of examples in the whitespace design doc. When we alter the syntax, do we want to require authors to be fastidious about whitespace or not? Option 7 in the whitespace design document answers "yes". Other options in that design doc showed alternatives.
We have a rough consensus around optionally quoting patterns in a 1a/3a-like syntax. But first we need to choose a syntax 🙈 |
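One way to picture optional quoting is the sketch below. It assumes, purely for illustration, that `{{` and `}}` are the quoting delimiters (the thread mentions `{{` for candidate 1a) and that unquoted patterns are trimmed. The function name `resolve_pattern` and the exact rules are hypothetical, and the check is not a parser: it ignores escapes and patterns that merely happen to begin and end with doubled braces.

```python
def resolve_pattern(raw: str) -> str:
    """Return the pattern text under an assumed optional-quoting rule.

    - Quoted:   "{{ text }}" keeps everything between the delimiters.
    - Unquoted: surrounding whitespace is treated as unintentional
      and trimmed.
    """
    candidate = raw.strip()
    if candidate.startswith("{{") and candidate.endswith("}}"):
        # Intentional pattern-exterior whitespace survives inside quotes.
        return candidate[2:-2]
    return candidate


print(repr(resolve_pattern("   You have {$foo}   ")))
print(repr(resolve_pattern("   {{  You have {$foo}  }}   ")))
```

With a rule of this shape, authors only pay the quoting cost for the patterns where exterior whitespace is significant.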