Add &nnbsp; entity for U+202F #5121

ygoe · 2019-12-03T11:46:46Z

There's &nbsp; for U+00A0. It's a no-break space with the full width of a regular space. It can be used between numbers and their short unit names, or in other places.

Typography and regional norms require (or at least recommend) using a thin no-break space (or narrow no-break space) in several places:

As a thousands separator (to avoid the ambiguity between point and comma), Source or DIN 5008
Between abbreviated words like “z. B.” (German: zum Beispiel), Source
As fine space before certain punctuation in French, Source

(These are the first and best sources I could find now. There may be better or more authoritative sources available, but they're usually hard to find.)

While it is technically possible to create a keyboard layout that produces this character, not many users have this installed and even then it's hard to distinguish it from other space characters when reading and revising text. Most editors don't even show a replacement symbol for this space character.
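To make the "hard to distinguish" problem concrete, here is a minimal Python sketch of the kind of review aid an editor could offer: it replaces every space character other than U+0020, and every Unicode format character, with a visible tag. The function name `reveal_invisibles` and the `<U+XXXX NAME>` tag format are illustrative assumptions, not an existing tool.

```python
import unicodedata


def reveal_invisibles(text: str) -> str:
    """Replace easily confused invisible characters with a visible <U+XXXX NAME> tag."""
    out = []
    for ch in text:
        category = unicodedata.category(ch)
        # "Zs" covers space separators such as U+00A0 and U+202F (plain U+0020 is kept);
        # "Cf" covers format characters such as ZWJ, SHY and the bidi controls.
        if (category == "Zs" and ch != " ") or category == "Cf":
            out.append(f"<U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}>")
        else:
            out.append(ch)
    return "".join(out)


print(reveal_invisibles("z.\u202fB. costs 5\u00a0km"))
# z.<U+202F NARROW NO-BREAK SPACE>B. costs 5<U+00A0 NO-BREAK SPACE>km
```

Matching on Unicode general categories rather than a hand-written list means new space or format characters are caught without updating the code.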

AFAIK Wikipedia suggests writing &nbsp; in these places. And that's probably a good idea in team projects as well. But this is actually the wrong character in these places.

To use the correct narrow no-break space, one has to use a different HTML entity representation, like &#8239; or &#x202F;, which are frankly hard to remember or recognise.
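For comparison, the two numeric forms can be derived mechanically from the code point. This short Python sketch (the helper name `numeric_refs` is made up for illustration) prints both forms for U+202F and uses the standard library's `html.unescape` to check that they decode to the same character, while `&nbsp;` decodes to the wider U+00A0.

```python
import html
import unicodedata
from typing import Tuple

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE
NBSP = "\u00a0"   # U+00A0 NO-BREAK SPACE


def numeric_refs(ch: str) -> Tuple[str, str]:
    """Return the (decimal, hexadecimal) numeric character references for ch."""
    cp = ord(ch)
    return f"&#{cp};", f"&#x{cp:X};"


dec, hexa = numeric_refs(NNBSP)
print(dec, hexa, unicodedata.name(NNBSP))  # &#8239; &#x202F; NARROW NO-BREAK SPACE

# Both numeric forms decode to U+202F ...
assert html.unescape(dec) == html.unescape(hexa) == NNBSP
# ... whereas the existing named reference decodes to a different character.
assert html.unescape("&nbsp;") == NBSP != NNBSP
```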

As a solution, the new entity &nnbsp; should be added to HTML to make it easy to write readable text following the correct typographic rules and recommendations.


kosek · 2019-12-18T10:33:48Z

If a new entity is added, the effort should be coordinated with MathML to keep the entity definitions synchronized: https://w3c.github.io/xml-entities/

ygoe · 2020-06-07T18:49:34Z

Mozilla is not interested in this. I guess that's a bad starting point already? I don't have the best experiences with the Chrome developers, maybe I'll try it there anyway.

Unfortunately, entities aren't extensible in HTML, so I can't even run my own little happy solution.

Celdron · 2020-08-10T21:02:59Z

If the HTML standard evolves, Mozilla and others must follow the new specification; that's obvious.

I'm currently interested in having an &nnbsp; entity, or an equivalent, for a French wiki project, as a narrow no-break space is recommended in some cases, as explained by ygoe.

Furthermore, HTML entities exist for numerous characters that are, in my opinion, almost never used, like &pr; and such.

r12a · 2020-08-11T12:56:58Z

In my opinion this would be extremely useful for French authors, but also for other languages. The NNBSP character was initially added to Unicode for Mongolian suffix handling, where it is important to visually distinguish between spaces separating suffixes and those separating words. It is also being proposed as an ideal fit for a morphological separator in the numerous languages written in the Canadian Aboriginal script (see w3c/amlreq#4). An entity would significantly help authors produce correct (and better machine-readable) text in all these languages.

[@annevk could you add i18n-mlreq and i18n-amlreq labels to the repo, so I can alert those folks to the discussion? Thanks.]

Here is an extension of this issue, which I can raise in a new issue if preferred.

There are other invisible characters for which a named character reference would be very useful for producing correctly authored Unicode text, for the same reasons as mentioned in the first comment. Here, for example, is a list of formatting characters used for Arabic, but most are essential characters for all RTL script-based languages.

Characters with entities:

&zwj;
&zwnj;
&rlm;
&lrm;

Characters without entities:
RLI
LRI
FSI
PDI
RLE
LRE
PDF
RLM
LRM
CGJ
ALM
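To make the list concrete, the Python sketch below maps each short name to its code point (the mapping is standard Unicode data; the dictionary itself is only an illustration) and searches the standard library's copy of the HTML named character reference table, `html.entities.html5`, for references that decode to each character. The helper `named_refs_for` is a hypothetical name.

```python
import html.entities
import unicodedata
from typing import List

# Short names used above, mapped to their Unicode code points.
INVISIBLES = {
    "ZWJ": 0x200D, "ZWNJ": 0x200C,
    "RLI": 0x2067, "LRI": 0x2066, "FSI": 0x2068, "PDI": 0x2069,
    "RLE": 0x202B, "LRE": 0x202A, "PDF": 0x202C,
    "RLM": 0x200F, "LRM": 0x200E,
    "CGJ": 0x034F, "ALM": 0x061C,
}


def named_refs_for(cp: int) -> List[str]:
    """Return the semicolon-terminated HTML named references that decode to exactly chr(cp)."""
    ch = chr(cp)
    return sorted(name for name, value in html.entities.html5.items()
                  if value == ch and name.endswith(";"))


for short, cp in INVISIBLES.items():
    refs = " ".join("&" + name for name in named_refs_for(cp)) or "(none)"
    print(f"{short:<5} U+{cp:04X}  {unicodedata.name(chr(cp)):<27} {refs}")
```

In this table, RLM and LRM come out with &rlm; and &lrm;, the same references as in the first list, while the isolates RLI, LRI, FSI and PDI have none.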

Keyboards generally don't address the problem of inputting the characters, but it's also a problem that the characters themselves are invisible. It would really help to have named character references. As someone who works with people who use these languages, and works with them myself, it seems to me that from a user's perspective it would be well worth the effort to add them. I don't remember why that hasn't happened before now.

annevk · 2020-08-14T16:54:39Z

(New labels are to be introduced through https://github.com/whatwg/meta.)

xfq · 2020-08-15T02:12:12Z

(New labels are to be introduced through https://github.com/whatwg/meta.)

I just filed whatwg/meta#182

hsivonen · 2020-10-30T08:25:57Z

I believe I've commented previously along the following lines when this has come up:

For wiki projects, it's irrelevant whether this is in HTML. The wiki software processes the wiki syntax before generating HTML output, so wiki software can introduce whatever macro expansions its developers see fit and users find useful.
In the case of HTML itself, I think the backward-compatibility characteristics of this feature request are bad. The requested feature doesn't expand the expressiveness of HTML in any way: you can already express U+202F unescaped in UTF-8 or escaped as a numeric reference. However, if a named entity were added, it would break in currently existing HTML parsers (not only in currently existing browsers). This could lead either to unwanted breakage or to non-usage of the feature (i.e. using the numeric form or unescaped UTF-8 anyway for better compat).
Making this change would set a precedent for others to request named entities for characters they find important, causing a repeat of the previous point over and over again.
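As a rough illustration of the macro-expansion approach in the first point, a wiki or build step could rewrite its own shorthand names into numeric references before emitting HTML. In this Python sketch the names in `CUSTOM_REFS` (including `nnbsp`) are hypothetical author-side shorthands, not HTML named character references, and `expand_custom_refs` is a made-up helper.

```python
import html
import re

# Hypothetical shorthand names chosen by the wiki/build tool; NOT part of HTML.
CUSTOM_REFS = {
    "nnbsp": 0x202F,  # NARROW NO-BREAK SPACE
    "rli": 0x2067,    # RIGHT-TO-LEFT ISOLATE
    "lri": 0x2066,    # LEFT-TO-RIGHT ISOLATE
    "pdi": 0x2069,    # POP DIRECTIONAL ISOLATE
}

# Only "&name;" for the names above matches; every other reference is left untouched.
_CUSTOM = re.compile("&(" + "|".join(map(re.escape, CUSTOM_REFS)) + ");")


def expand_custom_refs(source: str) -> str:
    """Rewrite the custom shorthands into hexadecimal numeric character references."""
    return _CUSTOM.sub(lambda m: f"&#x{CUSTOM_REFS[m.group(1)]:X};", source)


out = expand_custom_refs("z.&nnbsp;B. &rli;abc&pdi; &amp; &nbsp;")
print(out)  # z.&#x202F;B. &#x2067;abc&#x2069; &amp; &nbsp;
assert html.unescape(out) == "z.\u202fB. \u2067abc\u2069 & \u00a0"
```

Because the emitted HTML only contains numeric references, it decodes the same way in every existing parser, which is the compatibility property the second point is concerned with.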

annevk · 2020-10-30T08:29:47Z

Curious to hear what others think, but I tend to agree. Perhaps the best course of action here would be to update https://github.com/whatwg/html/blob/master/FAQ.md and close this type of feature request.

fantasai · 2020-10-30T15:13:18Z

@hsivonen I think what makes this request a bit different from others is that it's for invisible characters. As @r12a points out, it's hard to work with invisible characters. And letting wiki markup handle it isn't helpful at all: this is something that needs to work across all input modes into HTML, because it has to be reliable and consistent to be useful to the people who need them.

So while I understand your general premise about the update cycle being, potentially, 5 years or so, I think it's worth it in this case. If we want to take the time to batch up all the invisible characters we need to care about so we can do it at once, let's do that and make a coordinated update to the parser that makes languages that need invisible characters easier to typeset in HTML.

ygoe · 2020-11-01T13:30:07Z

What wikis or any other applications do is entirely irrelevant here. And following @hsivonen's argumentation, any progress is bad. So why care at all? Just leave it forever as it was defined some 30 years ago. Never change a running system (which is generally bad advice).

I'm fully aware that not all existing HTML parsers and renderers will properly handle this overnight when it's added. It'll take time. But we're in the fortunate (and also unfortunate) situation that the number of relevant HTML parsers in use is very limited, and these are actively maintained and automatically updated most of the time. So changes like this will eventually trickle through to all users and in a few years we can benefit from it without worrying too much. If you're not willing to wait such a long time, you shouldn't work in such projects. Web projects already have a large number of dependencies on browsers and this could be just one of them. As soon as you discover that all browsers that support everything else you already need also support this entity, you can safely use it.

Also, of course I can use any Unicode character directly. But this one hasn't made it onto physical or software-defined keyboards. Neither has NBSP. Or SHY. Or MINUS. So this argument is moot. Also, of course I can escape any Unicode character by its code point value. But nobody will remember those numbers, which means that 1. nobody will be able to fluently write these characters and 2. nobody will be able to fluently read and understand them. This is about as big a usability fail as it can get. Besides, we already have similar entities, like NBSP. Why do they exist? I imagine they exist because they cannot be typed on keyboards, their code point cannot be remembered, NBSP is even visually indistinguishable from a more common character (SP), and its use is sometimes required.

While not strictly "required" and not used as often, NNBSP falls into exactly the same category. So I definitely see a reason for its existence as an entity. And it doesn't hurt anybody: any undefined HTML entity is invalid markup, and the "nnbsp" entity is undefined, so it can safely be assigned. The same goes for other invisible Unicode whitespace, like some zero-width characters that affect wrapping and/or hyphenation.

hsivonen · 2020-11-02T08:19:31Z

But this one hasn't made it onto physical or software-defined keyboards.

Why is that?

Crissov · 2021-03-14T11:10:39Z

In addition to what @fantasai said: for some characters, when no named character reference is available, the choice is not between direct UTF encoding and a numeric character reference, but between the proper character and some inferior replacement character. For invisible characters in particular, that’s either a space or nothing.


aphillips · 2023-03-04T22:12:11Z

I was actioned by I18N to reopen this issue.

We are well aware of #7071 which notes that HTML will not add new named character references. The argument in favor of that policy is that newly added named entities would be broken in all parsers (not just browsers) until such time as the parsers adopted the change and that this would be a barrier to use (users would not adopt the new entities because they do not work).

The sense of I18N is that we want to reopen the discussion anyway. We have a particular interest in the new isolating bidi controls, although other invisible characters are also in this request. Invisible characters are hard to use and harder to manage when authoring a page. When using NCRs, the user must memorize the code point number, which is more prone to error. Most of these characters have memorable short names that lend themselves to entities, such as RLI for U+2067 RIGHT TO LEFT ISOLATE.

Adding the invisible characters to the named entity list would not enable users soon, but could become commonly supported in just a few years.

Please advise how best to prosecute this issue and whether you would like to discuss it in our teleconference or some other venue.

ras52 · 2023-12-12T11:04:41Z

I hope it's okay for an outsider to post to this thread. It seems to me that one of the bigger barriers to adding entities is not merely that existing parsers will not recognise them, but more specifically the manner in which they fail. §13.1.2 of the current HTML 5 spec says ambiguous ampersands are invalid in most contexts. That means all bets are off, but in the various browsers I've tested, the entity is displayed literally in the text, which is pretty bad in this particular case. The rationale is probably to cope with HTML like <p>I ordered fish&chips; John had a pie.</p>, though I wonder how common this really is. (Are there languages where ampersands are commonly used without surrounding space?) If HTML5 starts adding new entities, this is probably no longer the best behaviour. Would it be better to display U+FFFD in place of the full entity-like thing when an ambiguous ampersand is encountered? At least that makes it clear to a reader that something is off, which the raw entity name may not. If so, might it be sensible to change the spec to mandate this behaviour in advance of actually adding new entities?

xfq · 2023-12-14T04:39:58Z

Are there languages where ampersands are commonly used without surrounding space?

I'm not sure, but note that in English, there are words like P&G, R&D, and AT&T that don't have the surrounding space.

annevk · 2023-12-14T08:18:58Z

Apologies for the lack of reply here. I just noticed @aphillips's request to discuss this in person. I'll mark it agenda+ and suggest we discuss it sometime in January at a time suitable for the US and Europe, given the locations of the relevant experts. January 11 looks to be the first available such slot, at 9AM PST.

Crissov · 2023-12-14T21:03:05Z

What exactly is the I18N proposal to be discussed?

Introduce named character references for …

… some specific non-spacing (control) characters?
… all existing non-spacing characters?
… all existing and future non-spacing characters?
… some specific non-spacing and whitespace characters?
… all existing non-spacing and whitespace characters?
… all existing and future non-spacing and whitespace characters?

aphillips · 2023-12-14T21:26:42Z

@annevk

Thanks! Let's look for a suitable time slot. I'm not familiar with HTML's call schedule. Would it be possible to do a week later (assuming you have calls weekly??), such as the 18th? That way we could include @r12a, who has previously contributed to this thread. We can also host you in our regular call (Thursdays at 7 AM Pacific).

@Crissov

We would like to discuss the possibility of additions of this type in general. We have specific existing non-spacing characters and, it appears, perhaps a few specific whitespace characters in mind. Obviously, if we "broke the dam" on additions, there is also the question of establishing criteria for any future additions. We do not propose to add named entities in a broad or general sense.

ygoe · 2023-12-14T22:40:47Z

Maybe what sets this one apart from others is that it's invisible. You could potentially use smart input methods to generate just about any visible character, and anybody else reading the document would see it. Of course you can also use smart input methods to generate special white-space characters (like I do with my modified keyboard layout), but the problem is that other people editing the document likely won't see them if they're not familiar with the various spaces and don't have the tools to see them. So to be safe, a good solution could be to use a presentation that makes the character visible. &nbsp; is already widely used in Wikipedia content, for example.

So if you're looking for criteria, this might be one. 🙂

annevk · 2023-12-15T10:05:26Z

@aphillips for that time slot the next one is Feb 22. There are two other meetings, but one is not useful for Europe and one is not useful for the US. Getting WHATNOT participants to join another meeting could maybe work, but it probably requires explicitly pinging some people and making sure they can all make it which is not work I can sign up for right now. Maybe next year.

aphillips · 2023-12-15T15:04:44Z

@annevk Thanks. This isn't urgent, so let's go for February? Thinking aloud, perhaps we (meaning me) should make a list of I18N issues that could use attention ahead of time and we can have a section of the call for I18N?

annevk · 2023-12-15T15:08:37Z

Sounds good to me!

annevk added the addition/proposal (New features or enhancements), needs implementer interest (Moving the issue forward requires implementers to express interest), and topic: parser labels Dec 3, 2019

r12a added the i18n-tracker (Group bringing to attention of Internationalization, or tracked by i18n but not needing response) and i18n-alreq (Notifies Arabic script experts of relevant issues) labels Aug 11, 2020

w3cbot mentioned this issue Aug 12, 2020

Add &nnbsp; entity for U+202F w3c/i18n-activity#955

Open

xfq mentioned this issue Aug 15, 2020

Add more i18n labels whatwg/meta#182

Closed

domenic mentioned this issue Oct 11, 2020

Add entity "&diameter;" for U+2300 #6049

Closed

hsivonen added the i18n-amlreq (Notifies experts in languages of the Americas of relevant issues) and i18n-mlreq (Notifies traditional Mongolian script experts of relevant issues) labels Oct 30, 2020

domenic added the agenda+ label (To be discussed at a triage meeting) Jul 22, 2021

past mentioned this issue Aug 5, 2021

Upcoming HTML standard issue triage meeting on 8/5/2021 #6884

Closed

past mentioned this issue Sep 2, 2021

Upcoming HTML standard issue triage meeting on 9/2/2021 #6936

Closed

past removed the agenda+ label (To be discussed at a triage meeting) Sep 2, 2021

domenic added a commit that referenced this issue Sep 14, 2021

Note that we will not add new named character references

1b37f8c

Closes #3655. Closes #5121. Closes #6049.

domenic mentioned this issue Sep 14, 2021

Note that we will not add new named character references #7071

Merged

domenic closed this as completed in #7071 Sep 20, 2021

domenic added a commit that referenced this issue Sep 20, 2021

Note that we will not add new named character references

ef29c39

Closes #3655. Closes #5121. Closes #6049.

past mentioned this issue Oct 11, 2021

Upcoming HTML standard issue triage meeting on 10/7/2021 #7014

Closed

mfreed7 pushed a commit to mfreed7/html that referenced this issue Jun 3, 2022

Note that we will not add new named character references

5662df8

Closes whatwg#3655. Closes whatwg#5121. Closes whatwg#6049.

aphillips reopened this Mar 4, 2023

annevk added the agenda+ label (To be discussed at a triage meeting) Dec 14, 2023

past mentioned this issue Dec 15, 2023

Upcoming WHATNOT meeting on 12/14/2023 #9966

Closed

aphillips mentioned this issue Dec 15, 2023

Make list of HTML issues for February 22 call with WHATWG w3c/i18n-actions#64

Closed

past mentioned this issue Jan 11, 2024

Upcoming WHATNOT meeting on 1/11/2024 #9993

Closed

past mentioned this issue Jan 26, 2024

Upcoming WHATNOT meeting on 1/25/2024 #10052

Closed

past mentioned this issue Feb 8, 2024

Upcoming WHATNOT meeting on 2/8/2024 #10094

Closed

past mentioned this issue Feb 22, 2024

Upcoming WHATNOT meeting on 2/22/2024 #10128

Closed

past removed the agenda+ label (To be discussed at a triage meeting) Feb 22, 2024

annevk mentioned this issue Apr 25, 2024

Provide named character entities for invisible and ambiguous Unicode characters #10297

Open

Crissov mentioned this issue Sep 7, 2024

[css-text] &ncsp; - Non-Collapsible Space w3c/csswg-drafts#10821

Open
