Please consider adding the generated markdown directly in this repo #2
Hey @wez, all good. I now just don't know what to do with this ticket. For certain tasks I think LaTeX is simply better suited, especially since it can render to Markdown as well. I am always open to improving the publishing format, and therefore absolutely open to suggestions. Apart from that, feedback on the spec is also very welcome. In the end, there are not yet many terminals that actually try to move forward on the grapheme cluster front, and I think TUI/CLI apps need this kind of discoverability to gain trust and to start relying on the modern way of laying out complex graphemes in the terminal. I think we can have the markdown uploaded to a github.io page upon push/merge to the master branch, so that it is easy to read from there as well (that should suit you for sure).
re: the spec, I think it sounds fine. FWIW, wezterm reports permanently-enabled for this setting and doesn't allow disabling it. wez/wezterm#4223 was a request to offer application-level control, but as part of looking into it, I decided that it was a lot of effort to undo what I was already doing :-p re: the markdown and this issue, I don't have a strong preference on the implementation details, but I think the goal should be to make it as quick and easy as possible for someone to view it, without having extra steps to download or open a helper application. Personally, I would probably just check it in directly, but deploying it to GH pages is also OK.
I don't read tex natively and it's super inconvenient to download and read the markdown outside of just allowing github to render it here. The PDF format doesn't respect my dark mode preference either!
I'm going to cheekily paste in the current version of the markdown here so that I can read the spec in the meantime!
author:
date: '2021-09-04 (draft, revision 1)'
title: |
  Unicode in Terminals
  a proposal for standardizing basic Unicode features
History and current state
Historically, terminals supported only 7-bit characters with C0 control codes, and different languages were supported by selecting their respective code pages.
Later on, this was extended to 8-bit character sets along with C1 control codes.
With the introduction of Unicode there was no longer any need for code pages, but the Unicode spec was not explicitly designed to cover terminals as well, except that the C0 and C1 codepoints were preserved.
With UTF-8 it became possible to at least pass Unicode characters to the terminal, but the rendering of some characters, and the resulting cursor placement, is not defined by the Unicode standard.
Unicode also introduced codepoint sequences that map to a single user-perceived character, so-called grapheme clusters. Terminals have never attempted to formalize how to deal with grapheme clusters, variation selectors, their East Asian Width, or emoji and emoji presentation handling.
This spec tries to address some of the problems that terminals have with Unicode today.
Backwards Compatibility
The basic points are: everything is disabled by default, so legacy applications break no more than they already did.
Backwards compatibility is retained by leaving all behavior exactly as undefined as it is without this specification.
An application can test for the availability of this feature and has to explicitly enable it in order to have the properties defined in this document guaranteed.
Future Compatibility and Stability
Unicode itself had a major breakage between versions 8 and 9, when some codepoints had their East Asian Width changed. While this may happen again at any time, we do not expect it to happen soon or frequently enough to warrant addressing future incompatibilities in this spec, and leave this for a later point.
Feature and Mode State Detection
[CSI ? 2027 $ p]{style="background-color: light-gray"} ([ref:DECRQM]{reference-type="ref" reference="ref:DECRQM"}) can be used to test both the availability of this feature and the terminal's current mode with regard to this specification. The reply to [CSI ? 2027 $ p]{style="background-color: light-gray"} indicates each state accurately enough that no new VT sequence needs to be introduced.
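A minimal Python sketch of how an application might interpret that reply. Per the VT510 manual, the terminal answers DECRQM for a DEC private mode with DECRPM, `CSI ? Ps ; Pm $ y`, where `Pm` is 0 (not recognized), 1 (set), 2 (reset), 3 (permanently set) or 4 (permanently reset). `ModeState`, `QUERY_2027` and `parse_decrpm` are hypothetical names, and reading the reply from the TTY (raw mode, timeouts) is deliberately left out.

```python
import re
from enum import Enum


class ModeState(Enum):
    """Pm values of a DECRPM reply, as described in the VT510 manual."""
    NOT_RECOGNIZED = 0
    SET = 1
    RESET = 2
    PERMANENTLY_SET = 3
    PERMANENTLY_RESET = 4


# Query written by the application: CSI ? 2027 $ p
QUERY_2027 = "\x1b[?2027$p"

# Reply sent by the terminal: CSI ? <mode> ; <state> $ y
_DECRPM = re.compile(r"\x1b\[\?(\d+);(\d+)\$y")


def parse_decrpm(reply: str, mode: int = 2027) -> ModeState:
    """Return the state reported for `mode` in a (hypothetical) DECRPM reply.

    Raises ValueError if no reply for `mode` is found or the state is unknown.
    """
    for match in _DECRPM.finditer(reply):
        if int(match.group(1)) == mode:
            return ModeState(int(match.group(2)))
    raise ValueError(f"no DECRPM reply for mode {mode} in {reply!r}")
```

For instance, `parse_decrpm("\x1b[?2027;3$y")` returns `ModeState.PERMANENTLY_SET`, the state the wezterm comment in this thread describes; a state of 0 tells the application that the guarantees of this spec are not available.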
Mode Switching
[CSI ? 2027 h]{style="background-color: light-gray"} ([ref:DECSM]{reference-type="ref" reference="ref:DECSM"}) enables the mode, ensuring conformance to all rules defined by this specification.
[CSI ? 2027 l]{style="background-color: light-gray"} ([ref:DECRM]{reference-type="ref" reference="ref:DECRM"}) disables it again, leaving the behavior undefined.
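Building on such a query, the following Python sketch shows one way an application could decide whether to write the enable sequence, and what to write back on exit. The function names and the `ModeState` enum are hypothetical; only the two escape sequences come from the text above.

```python
from enum import IntEnum
from typing import Optional

ENABLE_2027 = "\x1b[?2027h"   # CSI ? 2027 h: guarantee the rules of this spec
DISABLE_2027 = "\x1b[?2027l"  # CSI ? 2027 l: back to undefined behavior


class ModeState(IntEnum):
    """DECRPM state values, as returned for a DECRQM query."""
    NOT_RECOGNIZED = 0
    SET = 1
    RESET = 2
    PERMANENTLY_SET = 3
    PERMANENTLY_RESET = 4


def enable_sequence(state: ModeState) -> Optional[str]:
    """Sequence to write so that mode 2027 is on, or None if it already is.

    Raises RuntimeError if the terminal cannot provide the guarantees.
    """
    if state in (ModeState.SET, ModeState.PERMANENTLY_SET):
        return None  # already on, nothing to write
    if state == ModeState.RESET:
        return ENABLE_2027
    # NOT_RECOGNIZED or PERMANENTLY_RESET: the mode cannot be turned on
    raise RuntimeError(f"mode 2027 unavailable: {state.name}")


def restore_sequence(state: ModeState) -> Optional[str]:
    """On exit, undo only what enable_sequence() changed."""
    return DISABLE_2027 if state == ModeState.RESET else None
```

Restoring only when the mode was previously reset keeps the application from switching off a mode that was already enabled before it started.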
Semantics
The following set of semantics MUST be adhered to if VT mode [2027]{style="background-color: light-gray"} is enabled. If VT mode [2027]{style="background-color: light-gray"} is not set, the behavior is exactly as undefined as if this specification were not implemented at all, in order to retain the behavior of current terminals and their legacy applications.
Grapheme Cluster
{#section .unnumbered}
With this mode enabled, the terminal MUST support grapheme clusters in conformance with the algorithm described in UTS 29 [ref:UTS-29]{reference-type="ref" reference="ref:UTS-29"}.
{#section-1 .unnumbered}
This implies that consecutively written codepoints on the terminal stream with no grapheme cluster boundary between them, as per UTS 29 [ref:UTS-29]{reference-type="ref" reference="ref:UTS-29"}, will always end up in the same grid cell of the terminal.
{#section-2 .unnumbered}
Therefore, extending a grapheme cluster with consecutively added codepoints will not move the cursor, except when variation selector 16 (VS16) changes the width of the grapheme cluster to wide (2 grid cells).
{#section-3 .unnumbered}
When the cursor moves to a grid cell that contains a complete or incomplete grapheme cluster, the contents of this grid cell will be erased and overwritten rather than textually concatenated.
{#section-4 .unnumbered}
Therefore cursor movement semantics of the terminal remain unchanged.
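To illustrate the cluster rules above, here is a toy model of a single terminal line in Python. It is a sketch under loud assumptions: `continues_cluster` covers only a rough subset of the UTS 29 rules (combining marks, ZWJ, variation selectors, and a crude approximation of the ZWJ emoji rule), widths come from `unicodedata.east_asian_width`, and there is no wrapping or bounds checking. `Line`, `continues_cluster` and `base_width` are hypothetical names.

```python
import unicodedata
from typing import Optional, Tuple

ZWJ, VS15, VS16 = "\u200d", "\ufe0e", "\ufe0f"


def continues_cluster(cluster: str, cp: str) -> bool:
    """Rough subset of the UTS 29 no-break rules, NOT a full implementation."""
    if cp in (ZWJ, VS15, VS16):
        return True  # GB9: Extend and ZWJ never start a new cluster
    if unicodedata.category(cp) in ("Mn", "Me", "Mc"):
        return True  # approximates GB9 (Extend) and GB9a (SpacingMark)
    return cluster.endswith(ZWJ)  # crude stand-in for GB11 (ZWJ emoji sequences)


def base_width(cp: str) -> int:
    """Approximate width from East Asian Width: Wide/Fullwidth take 2 cells."""
    return 2 if unicodedata.east_asian_width(cp) in ("W", "F") else 1


class Line:
    """One row of a hypothetical grid with mode 2027 enabled (no wrapping)."""

    def __init__(self, columns: int) -> None:
        self.cells = [""] * columns
        self.cursor = 0
        # (column, width) of the cluster written last; None after cursor movement
        self.last: Optional[Tuple[int, int]] = None

    def move_cursor(self, column: int) -> None:
        self.cursor = column
        self.last = None  # the next codepoint overwrites instead of extending

    def write(self, cp: str) -> None:
        if self.last is not None:
            col, width = self.last
            if continues_cluster(self.cells[col], cp):
                self.cells[col] += cp  # same grid cell, cursor stays put...
                if cp == VS16 and width == 1:
                    self.last = (col, 2)  # ...unless VS16 widens it to 2 cells
                    self.cursor += 1
                return
        width = base_width(cp)
        self.cells[self.cursor] = cp  # erase and overwrite, never concatenate
        self.last = (self.cursor, width)
        self.cursor += width
```

Writing `e` followed by U+0301 fills one cell and advances the cursor by one; writing U+2764 followed by VS16 advances it by two; calling `move_cursor(0)` and then writing `x` replaces the accented `e` instead of appending to it.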
Emoji
{#section-5 .unnumbered}
Emoji symbols are always rendered in square aspect ratio (as proposed by UTS 51 [ref:UTS-51]{reference-type="ref" reference="ref:UTS-51"}), implying an East Asian Width of Wide, i.e. 2 grid cells.
{#section-6 .unnumbered}
ZWJ emoji are required to be displayed as a single image with a width of
2 grid cells.
{#section-7 .unnumbered}
The alternate display of ZWJ emoji as a decomposed sequence of sub-images must not be used as a fallback, as it would break cursor movement guarantees.
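The following Python sketch quantifies why the decomposed fallback is ruled out: it compares the cells a three-person family emoji would take if drawn as separate sub-images (a wcwidth-style sum of per-codepoint widths, approximated with `unicodedata`) against the 2 cells this spec requires. `codepoint_width`, `decomposed_width` and `spec_width` are hypothetical names.

```python
import unicodedata

ZWJ = "\u200d"


def codepoint_width(cp: str) -> int:
    """wcwidth-style per-codepoint width (approximation via East Asian Width)."""
    if unicodedata.category(cp) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters such as ZWJ
    return 2 if unicodedata.east_asian_width(cp) in ("W", "F") else 1


def decomposed_width(sequence: str) -> int:
    """Cells taken if every emoji in a ZWJ sequence is drawn as its own sub-image."""
    return sum(codepoint_width(cp) for cp in sequence)


def spec_width(sequence: str) -> int:
    """Cells taken under this spec: a ZWJ emoji is one image, 2 grid cells."""
    if ZWJ not in sequence:
        raise ValueError("expected a ZWJ emoji sequence")
    return 2


family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"  # man, woman, girl
drift = decomposed_width(family) - spec_width(family)
```

Here `drift` is 4: after the decomposed rendering the cursor would sit four cells to the right of where an application following this spec expects it.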
{#section-8 .unnumbered}
If a ZWJ emoji cannot be rendered, the display behavior is undefined; for example, the Unicode replacement character [U+FFFD]{style="background-color: light-gray"} could be displayed instead.
{#section-9 .unnumbered}
In emoji presentation, the cursor will always move by 2 grid cells.
{#section-10 .unnumbered}
SGR attributes applied to a grid cell containing an emoji symbol are not strictly defined; it is left to the terminal emulator to give them sensible, meaningful semantics with regard to emoji symbols.
Variation Selector 16
VS16 promotes the grapheme cluster to emoji presentation, which forces the grapheme cluster's width to 2. If the cluster sits at the right margin and AutoWrap mode is set, this may cause the symbol to be reflowed to the next line.
Variation Selector 15
{#section-11 .unnumbered}
VS15 forces the grapheme cluster to text presentation. This will NOT change the underlying width; it only changes the display to prefer a textual, non-colored presentation.
{#section-12 .unnumbered}
This matches the behavior of today's web browsers and should thus feel most intuitive to users.
{#section-13 .unnumbered}
The cursor will still move by 2 columns if the symbol has emoji as its default presentation.
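A sketch of the width rules for the two variation selectors, in Python with hypothetical names. It assumes that a cluster's default width can be approximated by the East Asian Width of its first codepoint, which is a simplification; the point is only that VS16 forces width 2 while VS15 leaves the width untouched.

```python
import unicodedata

VS15, VS16 = "\ufe0e", "\ufe0f"


def presentation(cluster: str) -> str:
    """Which presentation a variation selector requests, if any."""
    if VS16 in cluster:
        return "emoji"
    if VS15 in cluster:
        return "text"
    return "default"


def cluster_width(cluster: str) -> int:
    """Cell width of one grapheme cluster with mode 2027 enabled (sketch).

    Assumption: the default width is approximated by the East Asian Width of
    the first codepoint; a real terminal would also consult Unicode emoji data.
    """
    if presentation(cluster) == "emoji":
        return 2  # VS16 forces emoji presentation and thus 2 cells
    # VS15 changes only how the cluster is drawn, never its width.
    return 2 if unicodedata.east_asian_width(cluster[0]) in ("W", "F") else 1
```

So U+2764 alone is 1 cell wide and 2 cells with VS16, while U+1F600 stays 2 cells wide even when VS15 requests text presentation.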
Margins and AutoWrap with Emoji
Emoji written at the right margin with AutoWrap mode disabled may be rendered in half or not be displayed at all. This behavior is left undefined to ease implementation and adoption of this specification.
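A Python sketch of placing a cluster near the right margin, covering both the VS16 reflow case and the AutoWrap-disabled case. `place_cluster` is a hypothetical helper; it ignores the pending-wrap state that real terminals keep at the last column, and for the undefined AutoWrap-off case it simply picks one of the permitted outcomes.

```python
from typing import Optional, Tuple


def place_cluster(
    column: int, width: int, columns: int, autowrap: bool
) -> Optional[Tuple[int, int]]:
    """Where a cluster of `width` cells starts, as (line offset, column).

    Columns are 0-based and `columns` is the line width. Returns None when
    this sketch decides not to display the cluster at all.
    """
    if column + width <= columns:
        return (0, column)  # fits on the current line
    if autowrap:
        return (1, 0)  # reflow the whole cluster to the start of the next line
    # AutoWrap disabled: undefined by the spec; this sketch drops the cluster,
    # though drawing only its left half would be equally permitted.
    return None
```

For example, a text-presentation heart at column 79 of an 80-column line fits (`place_cluster(79, 1, 80, True)` gives `(0, 79)`); once VS16 widens it to 2 cells, `place_cluster(79, 2, 80, True)` moves it to `(1, 0)`.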
References
[[ref:DECRQM]]{#ref:DECRQM label="ref:DECRQM"} DECRQM, https://vt100.net/docs/vt510-rm/DECRQM.html
[[ref:DECSM]]{#ref:DECSM label="ref:DECSM"} DECSM, https://vt100.net/docs/vt510-rm/SM.html
[[ref:DECRM]]{#ref:DECRM label="ref:DECRM"} DECRM, https://vt100.net/docs/vt510-rm/RM.html
[[ref:UTS-29]]{#ref:UTS-29 label="ref:UTS-29"} UTS 29, Grapheme segmentation algorithm, https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules
[[ref:UTS-51]]{#ref:UTS-51 label="ref:UTS-51"} UTS 51, Unicode Emoji, https://unicode.org/reports/tr51/#Display, paragraph 2