Provide named character entities for invisible and ambiguous Unicode characters #10297
Labels: addition/proposal, i18n-needs-resolution, needs implementer interest
What problem are you trying to solve?
It is much easier for content authors to spot and work with invisible Unicode characters if they are coded using named entities. Some users deal with many such characters on a regular basis (Arabic authors regularly work with 12 or more), and it is difficult to remember their Unicode code points. Others use these characters only occasionally, and find it equally difficult to recall the right code point when they need it. In addition, invisible characters typed directly into the source are problematic to work with, especially when they affect the display (such as paired directional embeddings in RTL scripts), because they are easily overlooked, duplicated, or miscopied.
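To illustrate how easily these characters slip past a reader, here is a minimal Python sketch that lists the invisible characters in a string with their code points and Unicode names. That is roughly the information an author has to reconstruct by hand today. The function name `find_invisibles` and the `EXTRA_INVISIBLES` set are illustrative assumptions, not part of the proposal. The check is a heuristic based on the general category `Cf` (format) plus non-ASCII spaces and separators. It adds a few characters from the list below that a category test alone would miss: CGJ and the variation selectors (category `Mn`) and the Hangul fillers (category `Lo`).

```python
import unicodedata

# Invisible characters from the proposal whose general category is not Cf or Z*:
# CGJ, Mongolian FVS1-FVS4 and VS15/VS16 are Mn; the Hangul fillers are Lo.
EXTRA_INVISIBLES = set(
    "\u034f\u180b\u180c\u180d\u180f\ufe0e\ufe0f\u115f\u1160\u3164\uffa0"
)


def find_invisibles(text: str) -> list[tuple[int, str, str]]:
    """Return (index, "U+XXXX", Unicode name) for each invisible character in text.

    Heuristic: format characters (Cf), any space or separator (Zs, Zl, Zp)
    other than the ASCII space, plus the hand-picked EXTRA_INVISIBLES.
    """
    hits = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        if (
            category == "Cf"
            or (category in ("Zs", "Zl", "Zp") and ch != " ")
            or ch in EXTRA_INVISIBLES
        ):
            # Older Unicode databases may not know newer characters (e.g. U+180F).
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits


if __name__ == "__main__":
    # A ZWNJ, an RLI ... PDI isolate and a no-break space, all invisible on screen.
    for hit in find_invisibles("abc\u200cdef\u2067xyz\u2069 \u00a0end"):
        print(hit)
```

Run as a script, this reports four characters: the ZWNJ at index 3, RLI at 7, PDI at 11 and the no-break space at 13. None of them is visible in the string as rendered.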
What solutions exist today?
Some of these characters have named character entities, but some of the more frequently used ones do not.
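Which names already exist can be checked against a copy of the WHATWG named character reference table. Python ships one as `html.entities.html5`. The sketch below uses illustrative helper names that are not part of any proposal. It shows that `html.unescape` expands existing references such as `&zwnj;`, but leaves names proposed here, such as `&lri;` or `&cgj;`, as literal text. It also lists every existing name for a given code point.

```python
from html import unescape
from html.entities import html5  # name -> replacement text; most keys end with ";"


def has_named_reference(name: str) -> bool:
    """True if HTML already defines the named character reference &name;."""
    return f"{name};" in html5


def names_for(chars: str) -> list[str]:
    """Every existing &name; whose replacement text is exactly chars."""
    return sorted(key[:-1] for key, value in html5.items()
                  if value == chars and key.endswith(";"))


if __name__ == "__main__":
    print(repr(unescape("&zwnj;|&lrm;|&lri;|&cgj;")))  # proposed names stay literal
    print(names_for("\u200b"))  # U+200B has several names, none of them short
```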
How would you solve it?
The W3C i18n WG proposes the following additions. For convenience, the list includes characters for which we already have named entities; these are indicated using ✅. Possible named entities are suggested for the new items; these are derived from standard Unicode abbreviations, where available.
Latin 1 Supplement — Latin-1 punctuation and symbols
U+00AD SOFT HYPHEN: &shy; ✅
Combining Diacritical Marks — Grapheme joiner
U+034F COMBINING GRAPHEME JOINER: &cgj;
Arabic — Format character
U+061C ARABIC LETTER MARK: &alm;
Ogham — Space
Mongolian — Format controls
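U+180B MONGOLIAN FREE VARIATION SELECTOR ONE: &fvs1;
U+180C MONGOLIAN FREE VARIATION SELECTOR TWO: &fvs2;
U+180D MONGOLIAN FREE VARIATION SELECTOR THREE: &fvs3;
U+180E MONGOLIAN VOWEL SEPARATOR: &mvs;
U+180F MONGOLIAN FREE VARIATION SELECTOR FOUR: &fvs4;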
General Punctuation — Spaces
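U+2000 EN QUAD: &nqsp;
U+2001 EM QUAD: &mqsp;
U+2002 EN SPACE: &ensp; ✅
U+2003 EM SPACE: &emsp; ✅
U+2004 THREE-PER-EM SPACE: &emsp13; ✅
U+2005 FOUR-PER-EM SPACE: &emsp14; ✅
U+2006 SIX-PER-EM SPACE: &6msp;
U+2007 FIGURE SPACE: &numsp; ✅
U+2008 PUNCTUATION SPACE: &puncsp; ✅
U+2009 THIN SPACE: &thinsp; ✅ AND &ThinSpace; ✅
U+200A HAIR SPACE: &hairsp; ✅ AND &VeryThinSpace; ✅ AND part of &ThickSpace; ✅ (U+205F U+200A)
General Punctuation — Format character
U+200B ZERO WIDTH SPACE: &ZeroWidthSpace; ✅ AND &NegativeVeryThinSpace; ✅ AND &NegativeThinSpace; ✅ AND &NegativeMediumSpace; ✅ AND &NegativeThickSpace; ✅
U+200C ZERO WIDTH NON-JOINER: &zwnj; ✅
U+200D ZERO WIDTH JOINER: &zwj; ✅
U+200E LEFT-TO-RIGHT MARK: &lrm; ✅
U+200F RIGHT-TO-LEFT MARK: &rlm; ✅
U+202A LEFT-TO-RIGHT EMBEDDING: &lre;
U+202B RIGHT-TO-LEFT EMBEDDING: &rle;
U+202C POP DIRECTIONAL FORMATTING: &pdf;
U+202D LEFT-TO-RIGHT OVERRIDE: &lro;
U+202E RIGHT-TO-LEFT OVERRIDE: &rlo;
U+2060 WORD JOINER: &NoBreak; ✅
U+2066 LEFT-TO-RIGHT ISOLATE: &lri;
U+2067 RIGHT-TO-LEFT ISOLATE: &rli;
U+2068 FIRST STRONG ISOLATE: &fsi;
U+2069 POP DIRECTIONAL ISOLATE: &pdi;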
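Many problems with these controls come from characters that get dropped or duplicated during copying. They come in pairs: LRE, RLE, LRO and RLO are closed by PDF; LRI, RLI and FSI are closed by PDI. As an illustration only, here is a lint-style Python check that reports unmatched controls; the function name is hypothetical. It is deliberately stricter than the Unicode Bidirectional Algorithm (UAX #9). UAX #9 ignores stray closers and implicitly terminates anything left open, and those are precisely the cases flagged here.

```python
EMBED_OPEN = {"\u202a", "\u202b", "\u202d", "\u202e"}  # LRE, RLE, LRO, RLO
ISOLATE_OPEN = {"\u2066", "\u2067", "\u2068"}          # LRI, RLI, FSI
PDF, PDI = "\u202c", "\u2069"


def unbalanced_bidi_controls(text: str) -> list[int]:
    """Return sorted indices of directional controls without a proper partner."""
    stack: list[tuple[str, int]] = []  # (kind, index), kind is "embed" or "isolate"
    bad: list[int] = []
    for i, ch in enumerate(text):
        if ch in EMBED_OPEN:
            stack.append(("embed", i))
        elif ch in ISOLATE_OPEN:
            stack.append(("isolate", i))
        elif ch == PDF:
            # PDF closes the innermost embedding/override; it cannot reach
            # past an open isolate, so in that case it is a stray closer.
            if stack and stack[-1][0] == "embed":
                stack.pop()
            else:
                bad.append(i)
        elif ch == PDI:
            if any(kind == "isolate" for kind, _ in stack):
                # Embeddings still open inside the isolate would be closed
                # implicitly by the PDI; report them as unmatched openers.
                while stack[-1][0] != "isolate":
                    bad.append(stack.pop()[1])
                stack.pop()
            else:
                bad.append(i)
    # Anything still open at the end of the text was never closed.
    bad.extend(index for _, index in stack)
    return sorted(bad)
```

For example, `unbalanced_bidi_controls("\u2067x\u2069\u2069")` returns `[3]`, pointing at the duplicated PDI.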
We would also like to coin a new &zwsp; entity name for U+200B, in addition to the overly long and complicated &ZeroWidthSpace;.
General Punctuation — Separators
U+2028 LINE SEPARATOR: &lsep;
U+2029 PARAGRAPH SEPARATOR: &psep;
General Punctuation — Space
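U+202F NARROW NO-BREAK SPACE: &nnbsp;
U+205F MEDIUM MATHEMATICAL SPACE: &MediumSpace; ✅ AND part of &ThickSpace; ✅ (U+205F U+200A)
General Punctuation — Invisible operators
U+2061 FUNCTION APPLICATION: &af; ✅ AND &ApplyFunction; ✅
U+2062 INVISIBLE TIMES: &it; ✅ AND &InvisibleTimes; ✅
U+2063 INVISIBLE SEPARATOR: &ic; ✅ AND &InvisibleComma; ✅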
CJK Symbols And Punctuation — CJK symbols and punctuation
U+3000 IDEOGRAPHIC SPACE: &idsp;
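Emoji Variation Selectors - switch between text (monochrome) and emoji (colour) presentation
U+FE0E VARIATION SELECTOR-15: &vs15;
U+FE0F VARIATION SELECTOR-16: &vs16;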
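For reference, a variation selector simply follows its base character. The Python sketch below uses a hypothetical helper, not part of the proposal, to force text or emoji presentation for a single character. In markup, the same result is written today as `&#x2764;&#xFE0F;`, and under this proposal it would become `&#x2764;&vs16;`. Only the base characters listed in Unicode's `emoji-variation-sequences.txt` are defined to respond to these two selectors.

```python
import unicodedata

VS15 = "\ufe0e"  # VARIATION SELECTOR-15: request text (monochrome) presentation
VS16 = "\ufe0f"  # VARIATION SELECTOR-16: request emoji (colour) presentation


def with_presentation(base: str, emoji: bool) -> str:
    """Return base followed by VS16 (emoji=True) or VS15 (emoji=False).

    Any VS15/VS16 already at the end of base is removed first, so the
    result carries exactly one presentation selector.
    """
    return base.rstrip(VS15 + VS16) + (VS16 if emoji else VS15)


if __name__ == "__main__":
    heart = "\u2764"  # HEAVY BLACK HEART, which has both text and emoji forms
    colour = with_presentation(heart, emoji=True)
    mono = with_presentation(colour, emoji=False)
    print([unicodedata.name(c) for c in colour])
    print([unicodedata.name(c) for c in mono])
```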
Potential additional candidates
Hangul Jamo — Old initial consonants
U+115F HANGUL CHOSEONG FILLER: &hcf;
Hangul Jamo — Medial vowels
U+1160 HANGUL JUNGSEONG FILLER: &hjf;
Hangul Compatibility Jamo — Special character
U+3164 HANGUL FILLER: &hf;
Halfwidth And Fullwidth Forms — Halfwidth Hangul variants
U+FFA0 HALFWIDTH HANGUL FILLER: &hwhf;
General Punctuation — Invisible operators
U+206D ACTIVATE ARABIC FORM SHAPING: &aafs;
Shorthand Format Controls — Shorthand format controls
Musical Symbols — Beams and slurs
Anything else?
There are other invisible characters which probably do not need entities; the list above selects those most likely to be useful. In particular, only 2 of the 256 variation selectors (VS1 to VS256) are listed here: the two that are regularly used with emoji.
There may also be a need to support Egyptian hieroglyph formatting controls, some of which will come out with Unicode 16 later this year.