Description on how soft line breaks are treated in browsers doesn't consider the combination of Firefox & Chinese/Japanese text and should be improved #744

Open
tats-u opened this issue Aug 15, 2023 · 4 comments

@tats-u

tats-u commented Aug 15, 2023

https://spec.commonmark.org/0.30/#softbreak

The description of soft line breaks seemed ambiguous or questionable to me.

A soft line break may be rendered in HTML either as a line ending or as a space.

Does this mean Markdown-to-HTML converters are allowed to convert a soft line break in Markdown to either "\n" (or possibly "\r" or "\r\n") or " " in HTML? In specifications, "may" generally means "is allowed to but does not have to" (RFC 2119), which confused me.
I have no idea when "a soft line break is rendered in HTML as a line ending". Is it when white-space: pre or some other value is specified in CSS? Also, is there a case where a soft line break is rendered in HTML as something other than a line ending or a space?

The result will be the same in browsers.

This is wrong. How "\n" in HTML is rendered differs among browsers when the text contains Chinese or Japanese.

https://drafts.csswg.org/css-text-4/#line-break-transform

Then any remaining segment break is either transformed into a space (U+0020) or removed depending on the context before and after the break. The rules for this operation are UA-defined in this level.

This means that how a soft line break is rendered depends on each browser's implementation.

In languages that have no word separators, such as Chinese, “unbreaking” a line requires joining the two lines with no intervening space.

這個段落是那麼長,
在一行寫不行。最好
用三行寫。

這個段落是那麼長,在一行寫不行。最好用三行寫。

As of now, only Firefox follows this recommendation. (However, when the text is copied and pasted somewhere else, spaces are inserted just as in the other browsers!)

https://codepen.io/tats-u/pen/YzdKKyN

<p lang="zh-Hant-tw">這個段落是那麼長,
在一行寫不行。最好
用三行寫。</p>

[Screenshot: Firefox (intended)]

[Screenshot: Edge (WebKit / Blink / IE; not intended; the space after "," is selected)]

https://codepen.io/tats-u/pen/poQQVyR (Which kinds of CJK characters remove a newline between them? → Unlike Japanese, Korean is treated like alphanumeric characters.)

<p>잘자
(
잘자
)
잘자
잘자
あああ
(
あああ
)
ああああ
1
ああ
ああ
。
1
。</p>

[Screenshot: Firefox]

[Screenshot: Edge (and other WebKit- and Blink-based browsers / IE)]

https://spec.commonmark.org/dingus/?text=%23%23%20%E6%97%A5%E6%9C%AC%E8%AA%9E%E3%81%A8%E4%B8%AD%E5%9B%BD%E8%AA%9E%E3%81%AE%E4%BE%8B%0A%0A%E3%81%93%E3%82%8C%E3%81%AF%E6%97%A5%E6%9C%AC%0A%E8%AA%9E%E3%81%AE%E6%96%87%E7%AB%A0%E3%81%A7%0A%E3%81%99%E3%80%82%E8%BF%99%E6%98%AF%E4%B8%80%0A%E4%B8%AA%E4%B8%AD%E6%96%87%E5%8F%A5%E5%AD%90%E3%80%82%0A%0A

[Screenshot: Firefox (looks natural)]

[Screenshot: Edge (WebKit / Blink / IE; doesn't look natural)]

From these results, we can conclude that in Firefox, only a "\n" between Chinese or Japanese characters (Han/kana) or punctuation marks is removed instead of being replaced with " ".
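The observed rule can be written down as a small predicate. This is a minimal Python sketch, not Firefox's actual code: the function names are hypothetical, and Unicode block ranges are an assumed stand-in for the real Han/kana/punctuation classification. It only models what the test pages above show.

```python
# Approximate Unicode block ranges for Han, kana, and CJK punctuation.
# Assumption: block ranges stand in for the real Script property; this models
# only the behavior observed in the test pages, not Firefox's implementation.
_HAN_KANA_PUNCT = (
    (0x3000, 0x303F),  # CJK Symbols and Punctuation, e.g. "、" and "。"
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
    (0xFF00, 0xFF9F),  # Fullwidth forms and halfwidth katakana (not halfwidth Hangul)
)


def is_han_kana_or_punct(ch: str) -> bool:
    """Return True if `ch` falls into one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _HAN_KANA_PUNCT)


def firefox_like_break(before: str, after: str) -> str:
    """What a line feed between two characters becomes under the observed rule."""
    if is_han_kana_or_punct(before) and is_han_kana_or_punct(after):
        return ""  # removed between Han/kana/punctuation on both sides
    return " "  # replaced with a space everywhere else (Korean, Latin, digits)
```

With this model, "あ" + "。" joins with nothing, while Korean "잘" + "자" and "あ" + "1" get a space, which matches the Firefox screenshots.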

Also,

The result will be the same in browsers.

This sentence must be replaced with something like:

The result will follow the CSS Text Module specification and might depend on browsers (but should be the same in languages that use a space to separate words).

@tats-u tats-u changed the title Description on how soft line breaks are treated in browsers Description on how soft line breaks are treated in browsers should be improved Aug 15, 2023
@tats-u tats-u changed the title Description on how soft line breaks are treated in browsers should be improved Description on how soft line breaks are treated in browsers doesn't consider the combination of Firefox & Chinese/Japanese text and should be improved Aug 15, 2023
@wooorm
Contributor

wooorm commented Aug 15, 2023

In my opinion, the phrasing “rendered in HTML” is a bit weird; it's more like: “when compiled to HTML, a soft line break may be shown as a line ending or as a space”.

To recap this issue:

  • CSS was changed to allow browsers to be smarter in some cases
  • Some browsers now have different defaults, so this text in the CM spec is no longer correct

Right?

Compared to your suggestion, I don’t think it’s good to mention deep specs.
How about:

- (A soft line break may be rendered in HTML either as a [line ending](https://spec.commonmark.org/0.30/#line-ending) or as a space. The result will be the same in browsers. In the examples here, a [line ending](https://spec.commonmark.org/0.30/#line-ending) will be used.)
+ (A soft line break may be shown by browsers as a [line ending](https://spec.commonmark.org/0.30/#line-ending), a space, or nothing at all. In the examples here, a [line ending](https://spec.commonmark.org/0.30/#line-ending) will be used.)

I’d also personally prefer to be a bit stronger in our markdown spec, and say that we actually specify \n -> \n (trimmed)?

@tats-u
Author

tats-u commented Aug 16, 2023

“when compiled to HTML, a soft line break may be shown as a line ending or as a space”

It is much clearer than the expression in the spec.

CSS was changed to allow browsers to be smarter in some cases

Correct. The first change was introduced in Working Draft 15 of the CSS Text Module Level 3 in 2011.

Otherwise, if the script context on one side of the line feed is Hangul, then the line feed is converted to a space (U+0020).
Otherwise, if the East Asian Width property [UAX11] of both the character before and after the line feed is F, W, or H (not A), then the line feed is removed.
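For comparison, the two quoted steps of the 2011 draft can be sketched as a hypothetical Python helper (not code from any browser). Assumptions: "the Hangul script" is approximated by the Hangul Unicode blocks, the UAX #11 property comes from the standard `unicodedata.east_asian_width`, the earlier steps that the leading "Otherwise" refers to are not modeled, and every remaining case falls back to a space.

```python
import unicodedata

# Assumption: Hangul Unicode blocks approximate the Hangul script.
_HANGUL_RANGES = (
    (0x1100, 0x11FF),  # Hangul Jamo
    (0x3130, 0x318F),  # Hangul Compatibility Jamo
    (0xA960, 0xA97F),  # Hangul Jamo Extended-A
    (0xAC00, 0xD7AF),  # Hangul Syllables
    (0xD7B0, 0xD7FF),  # Hangul Jamo Extended-B
    (0xFFA0, 0xFFDC),  # Halfwidth Hangul
)

_WIDE = {"F", "W", "H"}  # East Asian Width values that trigger removal ("not A")


def is_hangul(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _HANGUL_RANGES)


def transform_line_feed(before: str, after: str) -> str:
    """Apply the two quoted steps to a line feed between `before` and `after`."""
    # Step 1: Hangul on either side -> the line feed becomes a space.
    if is_hangul(before) or is_hangul(after):
        return " "
    # Step 2: East Asian Width F, W, or H on both sides -> the line feed is removed.
    if (unicodedata.east_asian_width(before) in _WIDE
            and unicodedata.east_asian_width(after) in _WIDE):
        return ""
    # Assumed fallback for every case the quoted steps don't cover.
    return " "
```

The order matters: Hangul syllables have East Asian Width W, so without the Hangul check running first, line feeds in Korean text would be removed as well. Ambiguous-width characters such as "α" keep the space even next to "あ".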

In 2021, the behavior was changed to be browser-defined because of w3c/csswg-drafts#5086. A strict rule existed in the version just before that change (though WebKit-based browsers and IE didn't follow it at all).

As you know, even today, more than 10 years later, no browser other than Firefox follows it. Firefox changed its behavior in 2008.

Some browsers now have different defaults, so this text in the CM spec is no longer correct

We might have to say "The CM spec has ignored the behavior of some browsers." instead. It depends on when the first CM spec (before v0.5, in 2014) was published. Firefox's current behavior has existed since 2008. I don't believe the CM spec predates Firefox's change, even though Markdown itself seems to have been born in 2004. At least we can't say "now", because Firefox's change is as much as 15 years old.

Compared to your suggestion, I don’t think it’s good to mention deep specs.

FYI, at first I thought HTML itself had decided the rule and tried to find one in the HTML spec, but I couldn't. Finally I found it in the CSS spec instead. I do not want readers of the CM spec to repeat the same mistake. I want those who want to find the most basic specification to access to the CSS spec first instead of the HTML spec.

(A soft line break may be shown by browsers as a line ending, a space, or nothing at all. In the examples here, a line ending will be used.)

It will be clearer if we split the description in the former sentence into 2 phases:

  • Markdown => HTML
  • HTML => rendering by browsers

A soft line break must be converted to (rendered as) a line ending or a space in HTML. In the examples here, a line ending will be used. A line ending in HTML is rendered as a space or simply removed by browsers.

@wooorm
Contributor

wooorm commented Aug 16, 2023

at first I thought HTML itself had decided the rule

For HTML, it’s all “inter-element whitespace”.
CM cares about HTML, not really about CSS.

I do not want readers of the CM spec to repeat the same mistake.

Can you put this “mistake” into concrete words? What are you worried about that other people might do?

I want those who want to find the most basic specification to access to the CSS spec first instead of the HTML spec.

What?

It will be clearer if we split the description in the former sentence into 2 phases:

I don’t want to talk about CSS, just the markdown -> html part? I feel like it’s better to not touch on CSS if we don’t need it, and keep it simple?

@tats-u
Author

tats-u commented Aug 27, 2023

Can you put this “mistake” into concrete words? What are you worried about that other people might do?

I thought HTML also had a rule for how to render “inter-element whitespace” on the screen, and I tried to find one in the HTML spec first.
I worry that other people looking for the rule, like me, will also turn the HTML spec, not the CSS spec, upside down first.

What?

Could you elaborate on that "What"?

I don’t want to talk about CSS, just the markdown -> html part?

We wouldn't have to mention CSS if the CM spec banned converting the soft line break to anything other than a newline.
If it allows converting it to " " or "", we need to encourage developers of formatters, renderers, and converters that convert the soft line break to a space or remove it to align those conversion rules with how browsers render line breaks.

I want the CM spec to mean one of the following two things (1. or 2.):

  1. Softbreak must be converted to a newline.
  2. All of the following:
    i. Softbreak must be converted to a newline or a space, or just be removed.
    ii. Converting it to a newline is the most reliable and recommended way.
    iii. If it is not converted to a newline, it is recommended to conform to the way browsers display line breaks.

Once we describe the details of "the way browsers display line breaks" in 2-iii, we can't avoid mentioning the CSS spec.
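Option 2 could look roughly like this inside a converter. The sketch is hypothetical (`SoftBreak` and `join_paragraph` are not from any existing library): it joins the source lines of one paragraph, trims the spaces around each soft line break as the CM spec already requires, and then emits a newline (2-ii), a space, or a browser-like result (2-iii). The browser-like mode follows the Firefox behavior observed earlier in this thread, approximated with Unicode block ranges; hard line breaks are assumed absent.

```python
from enum import Enum


class SoftBreak(Enum):
    NEWLINE = "newline"            # 2-ii: most reliable, recommended
    SPACE = "space"                # allowed by 2-i
    BROWSER_LIKE = "browser_like"  # 2-iii: follow how browsers display it


# Assumption: block ranges approximate Han / kana / CJK punctuation.
_CJK = (
    (0x3000, 0x30FF),  # CJK punctuation, Hiragana, Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
    (0xFF00, 0xFF9F),  # Fullwidth forms and halfwidth katakana
)


def _is_cjk(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in _CJK)


def join_paragraph(lines: list[str], policy: SoftBreak = SoftBreak.NEWLINE) -> str:
    """Join one paragraph's source lines, rendering each soft line break per `policy`.

    Assumes no hard line breaks, so spaces around each break are simply trimmed.
    """
    if not lines:
        return ""
    parts = [line.strip(" ") for line in lines]
    out = parts[0]
    for line in parts[1:]:
        if policy is SoftBreak.NEWLINE:
            sep = "\n"
        elif policy is SoftBreak.SPACE:
            sep = " "
        elif out and line and _is_cjk(out[-1]) and _is_cjk(line[0]):
            sep = ""  # browser-like: removed between CJK characters
        else:
            sep = " "  # browser-like: a space everywhere else
        out += sep + line
    return out


if __name__ == "__main__":
    # The Japanese/Chinese paragraph from the dingus example above.
    src = ["これは日本", "語の文章で", "す。这是一", "个中文句子。"]
    print(join_paragraph(src, SoftBreak.BROWSER_LIKE))
    # prints: これは日本語の文章です。这是一个中文句子。
```

Keeping NEWLINE as the default reflects 2-ii: the HTML output stays faithful to the source, and the browser decides how to display the break.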
