Compare three remaining options (draft) #496

Merged
aphillips merged 23 commits into main from aphillips-syntax-exploration-2 on Oct 18, 2023


aphillips
Member

No description provided.

@aphillips aphillips requested review from stasm and eemeli October 16, 2023 19:41
exploration/syntax-exploration-2.md (outdated, resolved review comments)
@vdelau

vdelau commented Oct 17, 2023

I think I'm still missing some context to fully understand what is proposed, but let me provide some feedback:

1a: I personally like the direction of this syntax. I don't really understand the differences between unquoted, quoted and 'bare' patterns. My expectation would be that 'bare' patterns consolidate whitespace (any whitespace, not just ASCII), and that both the other options would preserve whitespace. In that case, the two could be merged and potentially only use single braces, as the keywords are prefixed with a sigil.

2a: I'm less excited about this syntax. I like the simple 'this is complex, use code-mode' marker, but I'm not a fan of the current code-mode syntax. What puts me off the most are the freestanding 'when' parameters and the requirement to quote all patterns. The pattern syntax and whitespace handling, however, are what I would expect, and what I would like to see in 1a as a replacement for {{ / }} and {| / |}.

3a: Same thoughts as about 1a; it might be easier to write, since you'd have to balance fewer braces. I would like to see the more simplified pattern syntax mentioned above.

Regarding sigils: I like the look of both the # and % characters. While I can see how # could be problematic, any format would have to deal with these types of characters anyway. I could see that using more typical quote symbols would be more problematic and would probably require additional escaping. I also like the >> sigil, as it reminds me a bit of a REPL prompt or HEREDOC. I think using the closing symbol in this way would not have such a strong need for pairing. It might still be problematic for XML syntaxes.

- Candidate 3a uses a sigil-keyword sequence `%when` that requires at least some additional escaping.

It is reasonable to think that we might modify this particular part of the syntax
to improve usability. **_Keep in mind the need for single-line authoring._**


Does "single-line authoring" mean that:

  • you must be able to express any possible functionality on a single line
    or
  • Carriage Returns are irrelevant. Any MF2.0 string must render the same if all CRs are removed.
    or
  • something else?

Member Author


It's the former. Single-line authoring means that syntactically-allowed-but-not-required whitespace is removed. It is shorthand for: many people will author messages in a resource format in which the message is a string with file-local escaping. Think about that in addition to the pretty formatting we use in examples like this:

{input $var :function opt=val}
{%match {$foo}}
{%when foo}
    You have {$foo}
{%when *}
    You have really {$foo}

That might be written on a single line as:

myMessage = "{input $var :function opt=val}{%match {$foo}}{%when foo}You have {$foo}{%when *}You have really {$foo}"

(Note that I have trimmed whitespace off of the pattern in the single-line example.)
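A minimal Python sketch of the mechanical transformation described above, assuming (as in the example) that every line break and every indent is optional whitespace rather than part of a pattern. `to_single_line` is a hypothetical helper, not part of any MF2 tooling.

```python
def to_single_line(pretty: str) -> str:
    """Collapse a pretty-printed message into its single-line form.

    Hypothetical helper: it assumes every line break and every bit of
    indentation in `pretty` is optional whitespace, so each line can be
    stripped and the pieces concatenated with nothing in between.
    """
    return "".join(line.strip() for line in pretty.splitlines())


pretty = """\
{input $var :function opt=val}
{%match {$foo}}
{%when foo}
    You have {$foo}
{%when *}
    You have really {$foo}
"""

single = to_single_line(pretty)
print(single)
# {input $var :function opt=val}{%match {$foo}}{%when foo}You have {$foo}{%when *}You have really {$foo}
```

Because the helper cannot tell optional whitespace from pattern whitespace, it would also strip a space an author meant to keep at the edge of a pattern; that is the boundary problem any syntax with unquoted patterns has to solve.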

@echeran
Collaborator

echeran commented Oct 18, 2023

Why is there an unbalanced >> for Option 2a, where did that come from? Yes, I see the note about the sigil choice ("Do not fixate on the specific character..."), but it's the unbalanced part that completely misses the point about Option 2a.

The set of options that we've been discussing in the last month are based on the idea that "simple messages occur most frequently, so let's make that easier to type, and the tradeoff is that the non-simple messages get a little harder". In other words, we drop the delimiters on the simple message so that it starts in text mode, and then we add delimiters to non-simple messages to make them start in code mode. Okay, fine, sounds reasonable.

The key aspect of Option 2a (to me): it solves the above goal with the least amount of churn and complexity.

Our current syntax has no caveats or gotchas about how to interpret a message. It's pretty unambiguous and concise. Relative to our current syntax, Option 2a just moves delimiters around, from simple messages to non-simple ones, like a sort of Conservation of Delimiters going on.

The problem with the optional delimiters that Options 1a and 3a introduce is really a chain reaction of multiple decisions that lead us to contradict our previous decisions:

  1. With optional delimiters, you can't have a selection message anymore that looks like `match {$foo} when bar1 I'm only happy when it rains when bar2 ...`. If you did, you would have to say "pattern delimiters are optional, except when you have a select message and your pattern contains the word 'when'", which is terrible.
  2. To solve that problem, you realize that now you need to tack on a sigil to the keywords (3a), or you wrap all code-mode stuff in delimiters (1a).
  3. If we tack on a sigil to keywords (3a), now we've implicitly contradicted the result of our lengthy discussion in mid-2022 about sigils/delimiters vs. keywords. One camp was in favor of using 3 pairs of delimiters in code mode ({ }, [ ], ( )) because it was unambiguous and concise; the other camp wanted keywords because they were clear and friendly for humans, and the SQL-like nature was seen as a plus. We went with keywords for human friendliness. Adding sigils makes it look more like code again. Where do we stand on sigils/delimiters vs. keywords in mid-2023? Has our stance changed? Do we want to revisit this? Can we be clear so we can stay consistent in the future?
  4. By wrapping each code mode bit in order to stay in text mode (1a), we have significantly reduced the legibility of messages in their single-line representation. Try to pick out the patterns from within the entire non-simple message in the 2 following examples:
    {#match {$foo}}{#when foo}Hello {$foo} you have a {$var}{#when *}{$foo} hello you have a {$var}
    
    
    match {$foo}when foo{Hello {$foo} you have a {$var}}when *{{$foo} hello you have a {$var}}
    
    The first is Option 1a, the second is Option 2a. With 1a, you've obviously reduced legibility, which is the high order bit. The fact that Option 1a "conserves delimiters" by moving them from the patterns to the code mode portions doesn't help or hurt, but it does have a cost to it.
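To make point 1 concrete, here is a deliberately naive Python sketch (hypothetical functions, not a proposed parser) that splits a selection message on a bare `when` keyword, next to one that splits on a delimited `{#when ...}` statement as in the 1a example.

```python
import re


def naive_variants(message: str) -> list[str]:
    """Split an unquoted selection message on the bare keyword `when`.

    Deliberately naive, hypothetical parser: it assumes `when` only ever
    appears as a keyword, which unquoted, undelimited patterns cannot promise.
    """
    parts = re.split(r"\bwhen\b", message)
    # parts[0] is the `match ...` statement; the rest are "keys + pattern".
    return [part.strip() for part in parts[1:]]


def wrapped_variants(message: str) -> list[str]:
    """Split on a delimited statement such as `{#when ...}` (1a-style)."""
    return message.split("{#when ")[1:]


bare = "match {$foo} when bar1 I'm only happy when it rains when bar2 Nope"
wrapped = "{#match {$foo}}{#when bar1}I'm only happy when it rains{#when bar2}Nope"

print(naive_variants(bare))
# ["bar1 I'm only happy", 'it rains', 'bar2 Nope']
print(wrapped_variants(wrapped))
# ["bar1}I'm only happy when it rains", 'bar2}Nope']
```

The bare-keyword split yields three variants because the `when` inside the pattern is read as a keyword; the delimited statement leaves the pattern text intact, at the cost of extra delimiters.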

How do we determine what goes in the Comparison Matrix? Based on the chain reaction of complexity described above, and on the impact we've been urged to consider, I propose including the following 2 columns:

  1. Amount of complexity introduced
  2. Legibility of the single-line representation

Comment on lines 12 to 16
| Option | Description | Doesn’t Nest {} | Doesn’t Need More Escapes | Doesn’t Require Quoted Pattern | Counted {} works | Multiple Expression Syntaxes |
| :----- | :------------------------------------------------------------- | :-------------- | :------------------------ | :----------------------------- | :--------------- | :------------ |
| 1a | Invert for text mode, distinguish statements from placeholders | - | + | + | + | - |
| 2a | Text first, current syntax for complex messages | - | + | - | - | + |
| 3a | Use sigils for code mode | + | - | + | + | + |
Collaborator


Suggested change
| Option | Description | Doesn’t Nest {} | Doesn’t Need More Escapes | Doesn’t Require Quoted Pattern | Counted {} works | Multiple Expression Syntaxes | Amount of Complexity Added | Legibility of Single-Line Representation |
| :----- | :------------------------------------------------------------- | :-------------- | :------------------------ | :----------------------------- | :--------------- | :------------ | :-------------- | :-------------- |
| 1a | Invert for text mode, distinguish statements from placeholders | - | + | + | + | - | - | - |
| 2a | Text first, current syntax for complex messages | - | + | - | - | + | + | + |
| 3a | Use sigils for code mode | + | - | + | + | + | - | - |

Member Author


@echeran The table was meant to include mainly objective differences between the candidates. I think "complexity added" and "legibility of single-line" are somewhat subjective. Do you agree?

The one part of this table that should probably change in that light is the "doesn't require quoted pattern" column, which should probably be "allows unquoted pattern"; I will make that change in a second. There are strong arguments for why a quoted pattern is a feature and not something to avoid, which I expect proponents of 2a will include in their thinking for why they like that candidate.

@stasm
Collaborator

stasm commented Oct 18, 2023

> The set of options that we've been discussing in the last month are based on the idea that "simple messages occur most frequently, so let's make that easier to type, and the tradeoff is that the non-simple messages get a little harder". In other words, we drop the delimiters on the simple message so that it starts in text mode, and then we add delimiters to non-simple messages to make them start in code mode. Okay, fine, sounds reasonable.
>
> The key aspect of Option 2a (to me): it solves the above goal with the least amount of churn and complexity.

While I think 2a is a viable way forward, it also introduces some new complexity that wasn't present in the original always-start-in-code-mode syntax. So it's not strictly equivalent to (complexity of original) + (complexity of simple unquoted patterns), because the sum itself introduces new challenges. I think part of the ongoing exercise is to address this emergent complexity.

In 2a, variant patterns are delimited in a way that's unseen in the simple message mode. It's essentially two syntaxes in one (thanks, @eemeli, for making this point yesterday when we chatted).

Furthermore, it nests braces in a way that may be surprising, even if it's internally consistent. Some people that I've talked to thought this was too noisy; others appreciated the unambiguity. I've come to accept that there isn't one correct answer here.


I attempted to illustrate my mental model about 2a vs. 1a/3a in the following drawing:

[Image: drawing of the text and code layers in 2a (nested model) vs. 1a/3a (flat model); file "Screenshot 2023-10-18 at 13 29 37"]

2a requires "ascending" to layer 2, which is again a text layer; I call it the nested model. 1a and 3a "descend" back to layer 0 when they enter variant patterns; I call it the flat model. Both are valid mental models, but my current opinion is that going back to layer 0 is something that many people will expect to be able to do.

Additionally, the flat model can be transformed into a nested model if we allow variant patterns to be optionally delimited with {{/}} (or some other syntax).

Interestingly enough, I think this illustration hints at some of the extra challenges related to whitespace handling in the flat model that we've been discussing. At the same time, it may suggest a possible solution: what if we shifted our mental model to trimming around statements, rather than around patterns?


Lastly, I think we're missing a fourth option: introduce a marker that enters code mode like in 2a (back when it was using {{ }} for code mode), but also go back to layer 0 for variant patterns, like 1a/3a. Something like the following (all sigils TBD):

{{ input {$count :number}
   match {$count :plural}
}}                         ← The }} ends the code-only preamble.
   %[1] One thing.         ← Delimit the variant keys rather than text.
   %[*] Many things.       ↵
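A rough Python sketch of how the "5b" shape above might be split into a code preamble and flat, layer-0 variants. The `{{`/`}}` preamble delimiters and `%[key]` markers follow the example, whose sigils are explicitly TBD; `parse_block_message`, the brace-depth scan, and trimming each pattern around its `%[key]` statement are assumptions for illustration only.

```python
import re

# One variant: %[key] followed by its pattern, up to the next %[ or the end.
VARIANT = re.compile(r"%\[([^\]]*)\](.*?)(?=%\[|\Z)", re.S)


def parse_block_message(message: str) -> tuple[str, dict[str, str]]:
    """Split a message in the hypothetical "5b" shape into preamble and variants.

    Assumed syntax: a complex message opens with `{{`, the code-only preamble
    ends at the matching `}}`, and the text after it is back in text mode,
    with each variant introduced by `%[key]` and trimmed around that marker.
    """
    if not message.startswith("{{"):
        # Simple message: treat it as one catch-all pattern (assumption).
        return "", {"*": message}
    depth = 0  # nesting depth of {...} expressions inside the preamble
    for i in range(2, len(message)):
        ch = message[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            if depth > 0:
                depth -= 1
                continue
            if message[i:i + 2] != "}}":
                raise ValueError("preamble must be closed with }}")
            preamble = message[2:i].strip()
            body = message[i + 2:]
            variants = {key: text.strip() for key, text in VARIANT.findall(body)}
            return preamble, variants
    raise ValueError("unterminated {{ preamble")


example = """{{ input {$count :number}
   match {$count :plural}
}}
   %[1] One thing.
   %[*] Many things.
"""

preamble, variants = parse_block_message(example)
print(variants)  # {'1': 'One thing.', '*': 'Many things.'}
```

The brace-depth scan is what lets a single-line form such as `{{input {$n :number} match {$n :plural}}}%[1]One.%[*]Many.` find the right `}}`, since the preamble's own placeholders end in `}` too.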

@macchiati
Member

Among the 3 (plus suggested variants thereof), I like the simplicity of 1a the best.

The worst part of 1a is the leading/trailing spaces. I presume the example below is illustrating that with the alternatives.

{#match {$foo :function option=value} {$bar :function option=value}}
{#when a b} {{  {$foo} is {$bar}  }}
{#when x y} {{  {$foo} is {$bar}  }}
{#when * *} {|  |}{$foo} is {$bar}{|  |}

I suggest a change to 1a to disallow the {{. The leading/trailing spaces are going to be unusual, and having a single way to achieve them is going to increase overall understandability. So I favor just the following, for the rare cases where it is needed.

{#match {$foo :function option=value} {$bar :function option=value}}
{#when a b} {|  |}{$foo} is {$bar}{|  |}
{#when x y} {|  |}{$foo} is {$bar}{|  |}
{#when * *} {|  |}{$foo} is {$bar}{|  |}

@vdelau

vdelau commented Oct 18, 2023

> I suggest a change to 1a to disallow the {{. The leading/trailing spaces are going to be unusual, and having a single way to achieve them is going to increase overall understandability. So I favor just the following, for the rare cases where it is needed.
>
> {#match {$foo :function option=value} {$bar :function option=value}}
> {#when a b} {|  |}{$foo} is {$bar}{|  |}
> {#when x y} {|  |}{$foo} is {$bar}{|  |}
> {#when * *} {|  |}{$foo} is {$bar}{|  |}

Assuming the goal here is to preserve the whitespace, I would simplify to this:

{#match {$foo :function option=value} {$bar :function option=value}}
{#when a b} {  {$foo} is {$bar}  }
{#when x y} {  {$foo} is {$bar}  }
{#when * *} {  {$foo} is {$bar}  }

If those simple braces introduce problems, this could be an alternative:

{#match {$foo :function option=value} {$bar :function option=value}}
{#when a b} {|  {$foo} is {$bar}  |}
{#when x y} {|  {$foo} is {$bar}  |}
{#when * *} {|  {$foo} is {$bar}  |}

@aphillips
Member Author

Thanks all for "overnight" (from America/Los_Angeles point of view) contributions.

Please don't vote on this PR. The goal, you'll recall, is to merge it and then discuss (including stack ranking) on an issue that I raise today. Please, to the degree possible, do not try to lobby in this PR either. Try to focus on material changes to specific candidates or the wording representing them.

Currently unaddressed from the above is the question @echeran raises about the code mode delimiters for option 2a. We can do one of two things to address this:

  1. Everyone agrees to a single candidate for the code mode sigil (>> or {} or whatever for open-only or we agree to go to @echeran's enclosing syntax)
  2. I create an "option 2b" with Elango's syntax

@macchiati Note that we have a design document about pattern exterior whitespace (PEWS) here. The PEWS handling is not really part of the design choice, except to note that 2a (which always quotes the pattern) does not need to handle PEWS. We can revisit the PEWS handling after we have chosen a syntax, if necessary (although I hope it is not).

@stasm I love the picture: it made my morning. All: Is there any support for adding an "option 4" (noting that it would be related to the "blocks" family of options in our original list; this would seem to be "option 5b" 😉)? I don't want to creep back into having a large number of options. @stasm, would you really favor "5b", or is this more of a thought experiment?

@eemeli
Collaborator

eemeli commented Oct 18, 2023

> Currently unaddressed from the above is the question @echeran raises about the code mode delimiters for option 2a. We can do one of two things to address this:
>
>   1. Everyone agrees to a single candidate for the code mode sigil (>> or {} or whatever for open-only or we agree to go to @echeran's enclosing syntax)
>
>   2. I create an "option 2b" with Elango's syntax

I would strongly prefer keeping only one 2-ish candidate, esp. as its presentation includes the note:

> The use of >> to represent the "starting code-mode sigil" is not final. Do not fixate on the specific character sequence when choosing (or not) this design.

Regarding the representation of that candidate, I think it should be up to the people who previously voted highly for it, i.e. @macchiati, @echeran, @markusicu, @mihnita, and @stasm. If there's disagreement, we should probably revert to its previous {{ ... }} representation.

> Is there any support for adding an "option 4" (noting that it would be related to the "blocks" family of options in our original list; this would seem to be "option 5b" 😉)? I don't want to creep back into having a large number of options. @stasm, would you really favor "5b", or is this more of a thought experiment?

I would strongly prefer not adding a new candidate to this selection round. Selecting an overall direction for our syntax will still allow for "block" exploration as a follow-on step, much like selecting a syntax now may allow for external-whitespace considerations to be made (again) as a follow-on step.

@aphillips
Member Author

@echeran noted:

> Why is there an unbalanced >> for Option 2a, where did that come from?

This came from conversation near the end of the 2023-10-16 call, which the notes capture partially, e.g.:

> MIH: I’m not against removing the double curlies etc. If we have another way to enter code mode instead of curlies, that’s good but then we need to escape the sigil. Two open curlies require closing them. It feels wrong to not close opened brackets. Once you enter code mode, you stay in code mode.

The specific option >> was suggested (possibly as a joke?) by Ujjwal in the chat. Lacking another suggestion, I used that, thinking other options might be hotly debated (hence the text you quote above and the section about available sigils at the end).

(Note: I have personal opinions about this, but this comment is merely to answer the question "where did that come from")

@stasm
Collaborator

stasm commented Oct 18, 2023

> Regarding the representation of that candidate, I think it should be up to the people who previously voted highly for it (...)

I'm in favor of an unbalanced delimiter, ideally not composed of curly brackets. The reason is that, because curlies are used in placeholders, I've already seen a few people from the small sample that I approached attempt to put text around the match. This was one of the main reasons why we picked code-first mode in #256. Can we add this risk to the table?

@stasm

This comment was marked as off-topic.

@stasm
Collaborator

stasm commented Oct 18, 2023

> @stasm I love the picture: it made my morning. All: Is there any support for adding an "option 4" (noting that it would be related to the "blocks" family of options in our original list; this would seem to be "option 5b" 😉)? I don't want to creep back into having a large number of options. @stasm, would you really favor "5b", or is this more of a thought experiment?

I'd say it's a viable alternative to 1a and 3a in the family of autotrimming syntaxes. I also think it can be a reasonable middle ground between 1a/3a and 2a due to its block preamble. That said, I do appreciate the need to keep the list short.

Could this extra proposal (5b) be considered instead of 3a? The when syntax is the same in 3a and 5b. OTOH, 3a's match/input/local statements are similar to 1a in that each of them needs to be introduced separately by a special character. 5b brings something new to the discussion.

@aphillips
Member Author

Having discussed offline with @echeran and @stasm (and attempted to reach others, with no success), I'm going to merge @echeran's changes into this PR and then merge the PR per our discussion in teleconference. I will raise a new "voting" issue and send email/slack to the group explaining the next steps. Thanks to all contributors.

@aphillips aphillips merged commit df76819 into main Oct 18, 2023
@aphillips aphillips deleted the aphillips-syntax-exploration-2 branch October 18, 2023 19:53
@mihnita
Collaborator

mihnita commented Oct 18, 2023

From my perspective: the reason for the balanced ({{...}}) delimiters in 2a is that, as developers, we are trained (Pavlov reflex level) to expect open brackets (no matter what kind) to have matching closing brackets.
Otherwise the compiler will yell at us, the runtime will throw exceptions, the world will collapse.

If we tag "enter in code mode" with something else (#/bin/mf2 :-), all good with me :-)

Next, why open with { and not something else (>>, #/, whatever)? Because we already require plain text to escape {, and we start in text mode.
If we choose something else, then the next question is: "but what if I need my message to start with that?"
And now we need a way to escape that "other" thing.

An option would be something like {#mf} (or whatever): correct open/close, and no extra character to escape.
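The escaping argument can be sketched in Python under stated assumptions: backslash as the escape character, `{` and `}` always escaped in literal text, and `code_opener` standing for whatever sequence enters code mode. `escape_text` is hypothetical and not the MF2 escaping rules.

```python
def escape_text(text: str, code_opener: str) -> str:
    """Escape literal text for a hypothetical text-first message syntax.

    Assumptions for illustration: backslash is the escape character, `{` and
    `}` must always be escaped because they delimit placeholders, and
    `code_opener` is whatever sequence switches a message into code mode.
    """
    escaped = text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")
    if code_opener != "{" and escaped.startswith(code_opener):
        # A non-brace opener at the very start would be misread as code mode,
        # so it needs an escape rule of its own.
        escaped = "\\" + escaped
    return escaped


print(escape_text(">> is a shell prompt", ">>"))  # \>> is a shell prompt
print(escape_text(">> is a shell prompt", "{"))   # >> is a shell prompt
print(escape_text("{braces}", "{"))               # \{braces\}
```

With `{` as the opener, the escaping already required for placeholders covers messages that begin with `{`; any other opener adds a new rule for messages that happen to start with it.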

@mihnita
Collaborator

mihnita commented Oct 18, 2023

Separate issue.

I've been trying to think more like an HTML developer, and I also re-checked the DOM localization proposal and the Google Soy format (which is a kind of templating language).

And I think that the "automatic trimming of spaces" will also hurt people used to html.

Let's say I do this:

<style>
  .foo { white-space: pre; }
  #bar { white-space: pre-wrap; }
</style>
...
<p>
   Hello world one!
</p>
<p space="preserve">           Hello world two!      </p>
<p class="foo">           Hello world three!      </p>
<p id="bar">           Hello world four!      </p>

This will render with a space in front of the first message, and will preserve all the spaces for messages 2, 3, and 4.

Now I am asked to internationalize this and prepare it for translation using DOM localization.

So I do:

<style>
  .foo { white-space: pre; }
  #bar { white-space: pre-wrap; }
</style>
...
<p l10n="msg1">
   Hello world one!
</p>
<p l10n="msg2" space="preserve">           Hello world two!</p>
<p l10n="msg3" class="foo">           Hello world three!</p>
<p l10n="msg4" id="bar">           Hello world four!</p>

and the "message catalog" (might even be extracted automatically, gettext-like):

{
"msg1": "Hello world one!",
"msg2": "           Hello world two!",
"msg3": "           Hello world three!",
"msg4": "           Hello world four!"
}

One would expect everything to render 100% the same.
All I did was move the strings in a "string bundle".

But IF the messages automatically go through MF2, the spaces in msg2, 3, and 4 are trimmed (by MF2).
And things don't work like before, where I had one or more leading spaces rendered.

So it is one of those where "ah, this looks familiar", but then I am hurt by it because it really isn't the same.
Our trimming of spaces interacts (negatively) with the way the browser treats spaces.
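A small Python sketch of the difference, using hypothetical formatter functions that isolate only the whitespace behavior in question (neither is the MF2 API). The catalog mirrors the JSON above, with the fourth key written as `msg4`.

```python
# Message catalog from the example, with a distinct key for each string.
catalog = {
    "msg1": "Hello world one!",
    "msg2": "           Hello world two!",
    "msg3": "           Hello world three!",
    "msg4": "           Hello world four!",
}


def format_trimming(pattern: str) -> str:
    """Hypothetical formatter that trims exterior whitespace of a simple message."""
    return pattern.strip()


def format_wysiwyg(pattern: str) -> str:
    """Hypothetical formatter that returns a simple message exactly as written."""
    return pattern


# Keys whose output differs depending on which behavior the parser applies.
changed = [key for key, pattern in catalog.items()
           if format_trimming(pattern) != format_wysiwyg(pattern)]
print(changed)  # ['msg2', 'msg3', 'msg4']
```

Whether the leading spaces that survive WYSIWYG formatting are then shown is decided by the page's `white-space` CSS, which is the second "switch" the comment refers to.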

Yes, the answer is: "if you want your spaces, wrap the message in {...}; it is allowed (and optional)".

But why should I be hurt by that and forced to fix it?
I already control what happens with the spaces somewhere else (in html or css).
Every time we try to control a single behavior with several switches, we are asking for trouble, because the switches interfere.
And as a translator (sometimes even as a developer) I have no idea what the CSS says about the spaces.
They are there. Should I escape them, or not?

That is the reason why I am arguing for WYSIWYG, both in simple mode and in complex mode.
So in 2a the simple message Hello world! does not trim the spaces.
The storage file might do that.
Or the rendering engine (HTML?) might do that.
But the string that the MF2 API sees should not do that. If it does, it hinders more than helps.

**Note:** I chose JSON to store the strings instead of the properties-like format in the proposal, so as not to introduce another layer of unknown behavior with the message catalog (I don't know whether the proposed .messages format trims the spaces or not).


TLDR: trimming will actually hurt people familiar with the HTML behavior.

@aphillips
Member Author

@mihnita Thanks for this.

I know the whitespace issue is tempting, but, as noted elsewhere, I don't think it's that material to the choice of core syntax. Admittedly, 2a provides a syntax that quotes the pattern, and thus has what you're calling WYSIWYG built-in. It might be one of the things that makes you prefer that syntax. However, I would caution you not to overlook the other syntaxes if what you really want to do is ensure that we force the patterns to be quoted inside the syntax. Each of them can also quote the pattern and a useful discussion will be whether to require this or not.

I agree that there are many strings for which there is pattern-significant whitespace that needs to both be exposed to the localization process and not trimmed by the MF2 parser.

However, I think we can only concern ourselves with the message string that is actually presented to the MF2 parser. It's useful to note that some file syntaxes may unhelpfully trim MF2 messages stored in them. But that's a problem for a different part of the tech stack. We need to focus on our needs, not those of putative resource formats.

Similarly, we might carefully preserve whitespace throughout the authoring and localization process only for HTML (or some other presentation environment) to trim the formatted output of MF2. This is also not our problem, so long as our API was faithful in producing the correct results, external spaces and all.

For me, the concern has two parts: (i) should we allow unquoted patterns? and (ii) if we do, how do we do boundary detection on the resulting unquoted patterns?

If we don't allow unquoted patterns, boundary detection is not an issue. The tradeoff for quoting is whether we inconvenience authors of 100% of patterns to support what appears to be a smaller number of space-significant patterns within those messages. The answer to this can be "yes", particularly if the resulting syntax is also highly consistent and easy to write.

If we allow unquoted patterns, the problem becomes "how can I tell unintentional pattern exterior whitespace (PEWS) from intentional PEWS?" There are a number of examples in the whitespace design doc. When we alter the syntax, do we want to require authors to be fastidious about whitespace or not?
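One way to picture the boundary-detection question is this Python sketch of a hypothetical 1a-like rule: a variant body quoted with `{{...}}` is kept verbatim, while an unquoted body is trimmed. `variant_pattern` and the exact rule are assumptions for illustration, not one of the options in the whitespace design document.

```python
def variant_pattern(body: str) -> str:
    """Extract the pattern from one variant body in a hypothetical 1a-like syntax.

    Assumed rule: whitespace outside the body is never significant; a body
    quoted as {{...}} keeps its interior verbatim, so intentional pattern
    exterior whitespace (PEWS) survives; an unquoted body is trimmed, treating
    its exterior whitespace as unintentional.
    """
    body = body.strip()
    if body.startswith("{{") and body.endswith("}}"):
        return body[2:-2]
    return body


print(repr(variant_pattern(" {{  {$foo} is {$bar}  }}")))  # '  {$foo} is {$bar}  '
print(repr(variant_pattern("   {$foo} is {$bar}   ")))     # '{$foo} is {$bar}'
```

Under such a rule, authors who need PEWS must quote, while everyone else can stay unquoted without being fastidious about whitespace; the check here is purely lexical, and a real grammar would have to define the quoted form precisely.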

Option 7 in the whitespace design document answers that "yes". Other options in that design doc showed alternatives. We have a rough consensus around optionally quoting patterns in a 1a/3a-like syntax. But first we need to choose a syntax 🙈


8 participants