Unicode Character Geometry Modifiers

In the terminal world, character width detection is not well defined and is context dependent, which is the cause of the following issues:

No way to specify a custom width for displayed characters.
No way to simultaneously display the same characters in both narrow and wide variants.
No way to use triple-width and quadruple-width characters alongside narrow and wide ones.
Different assumptions about character widths in applications and terminal emulators.
No way to display wide characters partially.
No way to display characters taller than two cells.
No way to display subcell sized characters.
No way to rotate or mirror characters.

Character Matrix

If the graphical representation of a character is defined as a matrix of cells (a 1x1 matrix consists of a single fragment), the concept of "wide/narrow" can be avoided entirely.

[Figure: example character matrices of sizes 1x1, 2x2, and 3x1]

Each character is a sequence of one or more codepoints, the so-called grapheme cluster. Using a font, this sequence is translated into a glyph run. The glyph run is then scaled and rasterized into a rectangular matrix of terminal cells, defined either implicitly from the Unicode properties of the cluster codepoints, or explicitly by a modifier codepoint from the Unicode codepoint range 0xD0000-0xD02A2.

Matrix fragments up to 8x4 cells require at least four associated integer values, which can be packed into Unicode codepoint space by enumerating "wh_xy" values:

w: Character matrix width.
h: Character matrix height.
x: Horizontal fragment selector inside the matrix.
y: Vertical fragment selector inside the matrix.

For character matrices larger than 8x4, pixel graphics should be used.
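
The size of the modifier range follows from this enumeration. Below is a minimal sketch of the arithmetic, assuming the triangular-number packing used by the helper functions in this document. The names `tri`, `slots_x`, `slots_y`, and `offset` are illustrative only.

```cpp
// Triangular number n(n+1)/2; the slots for matrix size n start at offset tri(n).
constexpr int tri(int n) { return n * (n + 1) / 2; }

constexpr int max_w = 8; // Max width of the character matrix.
constexpr int max_h = 4; // Max height of the character matrix.

// A width w needs w + 1 selector values (x = 0..w), so widths 1..8
// occupy horizontal offsets tri(1)..tri(9) - 1, i.e. 1..44.
constexpr int slots_x = tri(max_w + 1); // 45
constexpr int slots_y = tri(max_h + 1); // 15

constexpr int block_base = 0xD0000;
constexpr int block_size = slots_x * slots_y;           // 675 values in the span
constexpr int block_last = block_base + block_size - 1; // 0xD02A2

// Offset of a wh_xy tuple inside the block.
constexpr int offset(int w, int h, int x, int y)
{
    return tri(w) + x + (tri(h) + y) * slots_x;
}

static_assert(block_last == 0xD02A2, "upper bound of the modifier range");
static_assert(block_base + offset(3, 1, 0, 0) == 0xD0033, "3x1, no fragment selected");
```

The two `static_assert` lines check the upper bound 0xD02A2 quoted earlier and the 3x1 modifier 0xD0033 at compile time.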

Terminals can annotate each scrollback cell with character matrix metadata and use it to display either the entire character image or a specific fragment within the cell.
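
One way to picture this annotation is a small per-cell record. The sketch below is an assumption made for illustration, not the layout of any particular terminal: the `cell` struct, its fields, and the `place_row` helper are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-cell record: the cluster text plus its matrix metadata.
struct cell
{
    std::string cluster;  // UTF-8 grapheme cluster shown in this cell (empty = blank).
    std::uint8_t w = 1;   // Matrix width, 1..8.
    std::uint8_t h = 1;   // Matrix height, 1..4.
    std::uint8_t x = 1;   // 1-based column of the fragment held by this cell.
    std::uint8_t y = 1;   // 1-based row of the fragment held by this cell.
};

// Writes row y of a w x h character matrix into consecutive cells of a line,
// starting at column col. Fragments that fall outside the line are dropped.
inline void place_row(std::vector<cell>& line, std::size_t col,
                      std::string const& cluster, int w, int h, int y)
{
    for (int x = 1; x <= w; x++)
    {
        std::size_t pos = col + static_cast<std::size_t>(x - 1);
        if (pos >= line.size()) break;
        cell c;
        c.cluster = cluster;
        c.w = static_cast<std::uint8_t>(w);
        c.h = static_cast<std::uint8_t>(h);
        c.x = static_cast<std::uint8_t>(x);
        c.y = static_cast<std::uint8_t>(y);
        line[pos] = c;
    }
}
```

With such a record, a renderer can draw each cell independently by clipping the scaled glyph run to fragment (x, y) of the w x h matrix.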

Users can explicitly specify the size of the character matrix (by zeroing _xy) or select any fragment of it (non-zero _xy) by placing a specific modifier character after the grapheme cluster.

Example 1. Output a 3x1 character:

pwsh
```
"👩‍👩‍👧‍👧`u{D0033}"
```

wsl/bash
```
printf "👩‍👩‍👧‍👧\UD0033\n"
```

Example 2. Output a 6x2 character (by stacking two 6x1 fragments on top of each other due to the linear nature of the terminal):

pwsh
```
"👩‍👩‍👧‍👧`u{D00C9}`n👩‍👩‍👧‍👧`u{D00F6}"
```

wsl/bash
```
printf "👩‍👩‍👧‍👧\UD00C9\n👩‍👩‍👧‍👧\UD00F6\n"
```

Screenshot:

Helper functions

Example functions for converting between modifier codepoints and character matrix parameter tuples wh_xy.

```
#include <vector>

struct wh_xy
{
    int w, h, x, y;
};
// Triangular number: the slots for matrix size n start at offset p(n).
static int p(int n) { return n * (n + 1) / 2; }
const int kx = 8; // Max width of the character matrix.
const int ky = 4; // Max height of the character matrix.
const int mx = p(kx + 1); // Lookup table boundaries.
const int my = p(ky + 1); //
const int unicode_block = 0xD0000; // Unicode codepoint block for geometry modifiers.

// Returns the modifier codepoint value for the specified tuple w, h, x, y.
static int modifier(int w, int h, int x, int y) { return unicode_block + p(w) + x + (p(h) + y) * mx; }

// Returns a tuple w, h, x, y for the specified codepoint modifier using a static lookup table.
// Codepoints outside the modifier block yield a zeroed tuple.
static wh_xy matrix(int codepoint)
{
    static auto lut = []
    {
        auto v = std::vector<wh_xy>(mx * my);
        for (auto w = 1; w <= kx; w++)
        for (auto h = 1; h <= ky; h++)
        for (auto y = 0; y <= h; y++)
        for (auto x = 0; x <= w; x++)
        {
            v[p(w) + x + (p(h) + y) * mx] = wh_xy{ w, h, x, y };
        }
        return v;
    }();
    auto index = codepoint - unicode_block;
    if (index < 0 || index >= mx * my) return wh_xy{}; // Not a geometry modifier.
    return lut[index];
}
```
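
A short usage sketch for the encoding direction follows. It is self-contained, so it repeats the packing arithmetic under different names (`tri`, `modifier_cp`). The `utf8` and `stacked_rows` helpers are hypothetical names introduced here. It assumes, as Example 2 suggests, that x = 0 combined with y = 1..h selects a whole row of the matrix.

```cpp
#include <string>
#include <vector>

inline int tri(int n) { return n * (n + 1) / 2; }

// Same packing as the helper functions: 45 = tri(8 + 1) horizontal slots per row.
inline int modifier_cp(int w, int h, int x, int y)
{
    return 0xD0000 + tri(w) + x + (tri(h) + y) * tri(8 + 1);
}

// Encodes a codepoint in the range 0x10000..0x10FFFF as four UTF-8 bytes.
// All geometry modifiers fall into that range.
inline std::string utf8(int cp)
{
    std::string s;
    s += static_cast<char>(0xF0 | (cp >> 18));
    s += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    s += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    s += static_cast<char>(0x80 | (cp & 0x3F));
    return s;
}

// Returns h strings, one per matrix row. Printing them on consecutive
// lines stacks the rows into a w x h character.
inline std::vector<std::string> stacked_rows(std::string const& cluster, int w, int h)
{
    std::vector<std::string> rows;
    for (int y = 1; y <= h; y++)
    {
        rows.push_back(cluster + utf8(modifier_cp(w, h, 0, y)));
    }
    return rows;
}
```

For a 6x2 matrix this produces the same two modifiers as Example 2, U+D00C9 for the first row and U+D00F6 for the second.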

Grapheme cluster boundaries

By default, grapheme clustering occurs according to Unicode UAX #29 https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules.

To set arbitrary boundaries, the C0 control character ASCII 0x02 STX is used, signaling the beginning of a grapheme cluster. The closing character of a grapheme cluster in that case is always a codepoint from the range 0xD0000-0xDFFFF, which sets the dimension of the character matrix. All codepoints between STX and the closing codepoint that sets the matrix size will be included in the grapheme cluster.
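
The explicit boundary rule can be sketched as a small splitter. This is a simplified illustration, not a full implementation: outside an STX span it treats every codepoint as its own cluster instead of applying UAX #29, it keeps the closing modifier attached to the cluster, and the names `is_closing` and `split_clusters` are hypothetical.

```cpp
#include <string>
#include <vector>

constexpr char32_t stx = 0x02; // ASCII STX opens an explicit cluster.

// Closing range as stated in the text above.
constexpr bool is_closing(char32_t cp) { return cp >= 0xD0000 && cp <= 0xDFFFF; }

inline std::vector<std::u32string> split_clusters(std::u32string const& text)
{
    std::vector<std::u32string> clusters;
    std::u32string current;
    bool explicit_mode = false;
    for (char32_t cp : text)
    {
        if (explicit_mode)
        {
            current += cp;          // Everything after STX joins the cluster,
            if (is_closing(cp))     // up to and including the closing modifier.
            {
                clusters.push_back(current);
                current.clear();
                explicit_mode = false;
            }
        }
        else if (cp == stx)
        {
            explicit_mode = true;   // STX itself is not stored.
        }
        else
        {
            // Simplification: a real implementation would apply UAX #29 here.
            clusters.push_back(std::u32string(1, cp));
        }
    }
    // An unterminated STX span is kept as a final cluster (arbitrary choice for this sketch).
    if (!current.empty()) clusters.push_back(current);
    return clusters;
}
```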

Another brick in the wall

At present only standardized variation sequences with VS1, VS2, VS3, VS15 and VS16 have been defined; VS15 and VS16 are reserved to request that a character should be displayed as text or as an emoji, respectively.

VS4–VS14 (U+FE03–U+FE0D) are not used for any variation sequences.

https://en.wikipedia.org/wiki/Variation_Selectors_(Unicode_block)
https://www.unicode.org/Public/UNIDATA/StandardizedVariants.txt
https://www.unicode.org/reports/tr51/tr51-16.html#Direction

So, let's try to play this way:

Glyph run alignment inside the matrix

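| VS  | Codepoint | Axis       | Alignment |
|-----|-----------|------------|-----------|
| VS4 | 0xFE03    | Horizontal | Left      |
| VS5 | 0xFE04    | Horizontal | Center    |
| VS6 | 0xFE05    | Horizontal | Right     |
| VS7 | 0xFE06    | Vertical   | Top       |
| VS8 | 0xFE07    | Vertical   | Middle    |
| VS9 | 0xFE08    | Vertical   | Bottom    |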

Notes:

We are not operating at a low enough level to support justified alignment.
By default, glyphs are aligned on the baseline at the writing origin.
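
The alignment table can be read as a mapping from selector codepoint to an axis and a position along it. The sketch below is one possible decoding under stated assumptions: the type and function names are hypothetical, extents are in arbitrary units such as pixels, and the vertical axis is assumed to grow downward so that Top corresponds to offset 0.

```cpp
enum class axis { horizontal, vertical };
enum class align { start, center, end }; // Left/Top, Center/Middle, Right/Bottom.

struct selector
{
    axis ax;
    align al;
};

// Maps VS4..VS9 (0xFE03..0xFE08) to an axis and alignment per the table.
// Returns false for any other codepoint and leaves `out` untouched.
inline bool decode_alignment(char32_t cp, selector& out)
{
    if (cp < 0xFE03 || cp > 0xFE08) return false;
    int i = static_cast<int>(cp - 0xFE03);
    out.ax = i < 3 ? axis::horizontal : axis::vertical;
    out.al = static_cast<align>(i % 3);
    return true;
}

// Offset of a glyph run of extent `glyph` inside a matrix of extent `matrix`
// along one axis.
inline int aligned_offset(align al, int matrix, int glyph)
{
    switch (al)
    {
        case align::start:  return 0;
        case align::center: return (matrix - glyph) / 2;
        case align::end:    return matrix - glyph;
    }
    return 0;
}
```

When no alignment selector is present, the default baseline placement described in the notes applies and none of these offsets is used.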

Matrix rotation and flips

| VS   | Codepoint | Effect          |
|------|-----------|-----------------|
| VS10 | 0xFE09    | Rotate 90° CCW  |
| VS11 | 0xFE0A    | Rotate 180° CCW |
| VS12 | 0xFE0B    | Rotate 270° CCW |
| VS13 | 0xFE0C    | Horizontal flip |
| VS14 | 0xFE0D    | Vertical flip   |

Example functions for applying a rotation or flip operation to the current three-bit integer state (bit 2 holds the flip flag, bits 0-1 hold the angle in units of 90°):

```
// Rotations keep the flip bit and add quarter turns to the angle bits.
void VS10(int& state) { state = (state & 0b100) | ((state + 0b001) & 0b011); }
void VS11(int& state) { state = (state & 0b100) | ((state + 0b010) & 0b011); }
void VS12(int& state) { state = (state & 0b100) | ((state + 0b011) & 0b011); }
// Flips toggle bit 2. The toggled value is masked so that the old angle bits
// do not leak into the result, and applying the same flip twice restores the state.
void VS13(int& state) { state = ((state ^ 0b100) & 0b100) | ((state + (state & 1 ? 0 : 0b010)) & 0b011); }
void VS14(int& state) { state = ((state ^ 0b100) & 0b100) | ((state + (state & 1 ? 0b010 : 0)) & 0b011); }

int get_angle(int state) { return 90 * (state & 0b011); }
int get_hflip(int state) { return state >> 2; }
```
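
The selectors can also be dispatched by codepoint while scanning a cluster. This is a minimal sketch, assuming the same three-bit state layout (bit 2 is the flip flag, bits 0-1 are the angle in 90° steps). The names `rotate`, `flip`, and `apply` are hypothetical, and the flip bit is masked explicitly so that applying the same flip twice restores the original state.

```cpp
// Adds quarter turns (CCW) to the angle bits and keeps the flip bit.
inline int rotate(int state, int quarter_turns)
{
    return (state & 0b100) | ((state + quarter_turns) & 0b011);
}

// Toggles the flip bit. Adds 180 degrees when the parity of the current
// angle (odd multiples of 90 or not) matches `add_when_odd`.
inline int flip(int state, bool add_when_odd)
{
    bool odd = (state & 1) != 0;
    int turns = (odd == add_when_odd) ? 2 : 0;
    return ((state ^ 0b100) & 0b100) | ((state + turns) & 0b011);
}

// Applies one variation selector to the state; other codepoints leave it unchanged.
inline int apply(int state, char32_t vs)
{
    switch (vs)
    {
        case 0xFE09: return rotate(state, 1);    // VS10: 90 CCW
        case 0xFE0A: return rotate(state, 2);    // VS11: 180 CCW
        case 0xFE0B: return rotate(state, 3);    // VS12: 270 CCW
        case 0xFE0C: return flip(state, false);  // VS13: adds 180 at 0 and 180
        case 0xFE0D: return flip(state, true);   // VS14: adds 180 at 90 and 270
        default:     return state;
    }
}
```

A quick sanity check on the composition: VS13 followed by VS14 from the initial state leaves the flip bit clear and the angle at 180°, which matches the fact that a horizontal flip combined with a vertical flip is a half turn.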
