48 changes: 48 additions & 0 deletions .claude/rules/unicode-graphemes.md
@@ -0,0 +1,48 @@
# Unicode and Grapheme Handling

**Source:** [CONTRIBUTING.md - Unicode and Grapheme Handling](../../CONTRIBUTING.md#unicode-and-grapheme-handling)

## Core Principle

**Think in graphemes, not runes.** A grapheme cluster is what the user perceives as a single character, but it may consist of multiple `Rune` values (e.g., base character + combining marks, or ZWJ emoji sequences).

## Width Measurement

**Always use `string.GetColumns()`** to measure display width.

```csharp
// ✅ CORRECT — grapheme-aware, handles ZWJ emoji, combining marks, CJK
int width = text.GetColumns ();

// ❌ WRONG — sums individual rune widths, inflates multi-rune grapheme clusters
int width = text.EnumerateRunes ().Sum (r => r.GetColumns ());

// ❌ WRONG — counts chars, not terminal cells (wrong for CJK, emoji, surrogates)
int width = text.Length;
```

**Why:** A ZWJ family emoji like 👨‍👩‍👦‍👦 occupies 2 terminal cells, but `EnumerateRunes().Sum(GetColumns)` returns 8 and `string.Length` returns 11. `string.GetColumns()` correctly returns 2 by iterating grapheme clusters and clamping each to max 2 columns.
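The char and rune counts quoted above can be reproduced without Terminal.Gui. The following is a minimal sketch, assuming .NET 5 or later (where `StringInfo` follows extended grapheme cluster rules) and C# 9 top-level statements; `GraphemeCount` is a hypothetical helper introduced only for this illustration, and it counts clusters rather than terminal columns.

```csharp
using System;
using System.Globalization;
using System.Linq;

// U+1F468 ZWJ U+1F469 ZWJ U+1F466 ZWJ U+1F466 (man, woman, boy, boy)
string family = "\U0001F468\u200D\U0001F469\u200D\U0001F466\u200D\U0001F466";

Console.WriteLine (family.Length);                      // 11 UTF-16 code units
Console.WriteLine (family.EnumerateRunes ().Count ());  // 7 Unicode scalar values
Console.WriteLine (GraphemeCount (family));             // 1 grapheme cluster on .NET 5+

// Hypothetical helper (not part of Terminal.Gui): counts text elements with the BCL.
static int GraphemeCount (string value)
{
    return new StringInfo (value).LengthInTextElements;
}
```

Four emoji runes and three zero-width joiners make up the seven runes, which is why summing per-rune widths overshoots while a per-cluster measurement does not.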

## Text Iteration and Rendering

**Iterate by grapheme cluster** using `GraphemeHelper.GetGraphemes()` and render with `AddStr`.

```csharp
// ✅ CORRECT — preserves grapheme cluster integrity
foreach (string grapheme in GraphemeHelper.GetGraphemes (text))
{
    AddStr (grapheme);
}

// ❌ WRONG — breaks combining marks, ZWJ sequences
foreach (Rune rune in text.EnumerateRunes ())
{
    AddRune (rune);
}
```

**Why:** Combining marks (e.g., `e` + U+0301 combining acute = `é`) must be sent together. `AddRune` sends them separately, preventing composition. `AddStr` with a complete grapheme string preserves the cluster.
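To see what "a complete grapheme string" looks like for the combining-mark case, here is a small sketch that uses only the .NET base class library. `TextElements` is a hypothetical stand-in for `GraphemeHelper.GetGraphemes()`, not its actual implementation, and the example prints lengths instead of calling `AddStr`.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// "é" written as "e" + U+0301 COMBINING ACUTE ACCENT, followed by a plain "a".
string sample = "e\u0301a";

foreach (string grapheme in TextElements (sample))
{
    // Each string holds one whole cluster, so "e" and U+0301 stay together.
    Console.WriteLine ($"{grapheme.Length} char(s)");
}

// Hypothetical stand-in for GraphemeHelper.GetGraphemes, built on StringInfo.
static IEnumerable<string> TextElements (string value)
{
    TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator (value);

    while (enumerator.MoveNext ())
    {
        yield return enumerator.GetTextElement ();
    }
}
```

The loop yields two strings: a two-char cluster for the accented letter and a one-char cluster for `a`. A rune loop over the same input would yield three separate values.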

## When Rune Iteration Is Appropriate

Rune-level iteration is correct when you need to inspect individual Unicode scalar values rather than render text — for example, counting zero-width runes for vertical text layout, or checking character properties. The key distinction: **rendering and measurement should use graphemes; analysis may use runes.**
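As an illustration of the analysis case, the sketch below inspects each rune's Unicode category. It assumes C# 9 or later, and `CountMarkAndFormatRunes` is a hypothetical name; treating nonspacing marks, enclosing marks, and format characters as "zero-width" is an approximation chosen for this example, not necessarily the definition Terminal.Gui uses.

```csharp
using System;
using System.Globalization;
using System.Text;

string family = "\U0001F468\u200D\U0001F469\u200D\U0001F466\u200D\U0001F466";

Console.WriteLine (CountMarkAndFormatRunes (family));     // 3 (the three ZWJ runes)
Console.WriteLine (CountMarkAndFormatRunes ("e\u0301"));  // 1 (the combining acute accent)

// Hypothetical analysis helper: rune iteration is fine here because nothing is rendered.
static int CountMarkAndFormatRunes (string value)
{
    int count = 0;

    foreach (Rune rune in value.EnumerateRunes ())
    {
        UnicodeCategory category = Rune.GetUnicodeCategory (rune);

        // Assumption: these categories approximate "occupies no column by itself".
        if (category is UnicodeCategory.NonSpacingMark
            or UnicodeCategory.EnclosingMark
            or UnicodeCategory.Format)
        {
            count++;
        }
    }

    return count;
}
```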
32 changes: 32 additions & 0 deletions CONTRIBUTING.md
@@ -169,6 +169,36 @@ Welcome! This guide provides everything you need to know to contribute effective

**⚠️ CRITICAL - These conventions apply to ALL code - production code, test code, examples, documentation, and samples.**

### Unicode and Grapheme Handling

**Think in graphemes, not runes.** A grapheme cluster is what the user perceives as a single character, but it may consist of multiple `Rune` values (e.g., base character + combining marks, or ZWJ emoji sequences).

- **Always use `string.GetColumns()`** to measure display width — never `EnumerateRunes().Sum(r => r.GetColumns())` (inflates multi-rune clusters) or `string.Length` (counts chars, not terminal cells)
- **Iterate by grapheme** using `GraphemeHelper.GetGraphemes()` when rendering text — never iterate by `Rune` and call `AddRune` for each (breaks combining marks and ZWJ sequences)
- **Render with `AddStr`** passing complete grapheme strings — `AddRune` with individual runes from a cluster will not compose correctly

```csharp
// ✅ CORRECT — grapheme-aware width measurement
int width = text.GetColumns ();

// ❌ WRONG — inflates width for ZWJ emoji (e.g., 👨‍👩‍👦‍👦 → 8 instead of 2)
int width = text.EnumerateRunes ().Sum (r => r.GetColumns ());

// ✅ CORRECT — grapheme-aware rendering
foreach (string grapheme in GraphemeHelper.GetGraphemes (text))
{
    AddStr (grapheme);
}

// ❌ WRONG — breaks combining marks (é rendered as e + ́ separately)
foreach (Rune rune in text.EnumerateRunes ())
{
    AddRune (rune);
}
```

**Exception:** Rune-level iteration is appropriate when inspecting individual Unicode scalar values (e.g., counting zero-width runes for vertical text layout), not for rendering or measurement.

## Testing Requirements

### Code Coverage
@@ -324,5 +354,7 @@ The workflow will:
- ❌ **Don't use `var` for anything but built-in simple types** (use explicit types)
- ❌ **Don't use redundant type names with `new`** (**ALWAYS PREFER** target-typed `new ()`)
- ❌ **Don't introduce new warnings** (fix warnings in files you modify; exception: `[Obsolete]` warnings)
- ❌ **Don't use `EnumerateRunes().Sum(GetColumns)`** for display width — use `string.GetColumns()`
- ❌ **Don't use `AddRune` in a rune loop** to render text — iterate by grapheme with `GraphemeHelper.GetGraphemes()` and use `AddStr`

**Thank you for contributing to Terminal.Gui!** 🎉