diff --git a/.claude/rules/unicode-graphemes.md b/.claude/rules/unicode-graphemes.md new file mode 100644 index 0000000000..c1b7ec5b11 --- /dev/null +++ b/.claude/rules/unicode-graphemes.md @@ -0,0 +1,48 @@ +# Unicode and Grapheme Handling + +**Source:** [CONTRIBUTING.md - Unicode and Grapheme Handling](../../CONTRIBUTING.md#unicode-and-grapheme-handling) + +## Core Principle + +**Think in graphemes, not runes.** A grapheme cluster is what the user perceives as a single character, but it may consist of multiple `Rune` values (e.g., base character + combining marks, or ZWJ emoji sequences). + +## Width Measurement + +**Always use `string.GetColumns()`** to measure display width. + +```csharp +// ✅ CORRECT — grapheme-aware, handles ZWJ emoji, combining marks, CJK +int width = text.GetColumns (); + +// ❌ WRONG — sums individual rune widths, inflates multi-rune grapheme clusters +int width = text.EnumerateRunes ().Sum (r => r.GetColumns ()); + +// ❌ WRONG — counts chars, not terminal cells (wrong for CJK, emoji, surrogates) +int width = text.Length; +``` + +**Why:** A ZWJ family emoji like 👨‍👩‍👦‍👦 occupies 2 terminal cells, but `EnumerateRunes().Sum(GetColumns)` returns 8 and `string.Length` returns 11. `string.GetColumns()` correctly returns 2 by iterating grapheme clusters and clamping each to max 2 columns. + +## Text Iteration and Rendering + +**Iterate by grapheme cluster** using `GraphemeHelper.GetGraphemes()` and render with `AddStr`. + +```csharp +// ✅ CORRECT — preserves grapheme cluster integrity +foreach (string grapheme in GraphemeHelper.GetGraphemes (text)) +{ + AddStr (grapheme); +} + +// ❌ WRONG — breaks combining marks, ZWJ sequences +foreach (Rune rune in text.EnumerateRunes ()) +{ + AddRune (rune); +} +``` + +**Why:** Combining marks (e.g., `e` + U+0301 combining acute = `é`) must be sent together. `AddRune` sends them separately, preventing composition. `AddStr` with a complete grapheme string preserves the cluster. + +## When Rune Iteration Is Appropriate + +Rune-level iteration is correct when you need to inspect individual Unicode scalar values rather than render text — for example, counting zero-width runes for vertical text layout, or checking character properties. The key distinction: **rendering and measurement should use graphemes; analysis may use runes.** diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 0b1abf6c57..f7bfc635b6 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -169,6 +169,36 @@ Welcome! This guide provides everything you need to know to contribute effective **⚠️ CRITICAL - These conventions apply to ALL code - production code, test code, examples, documentation, and samples.** +### Unicode and Grapheme Handling + +**Think in graphemes, not runes.** A grapheme cluster is what the user perceives as a single character, but it may consist of multiple `Rune` values (e.g., base character + combining marks, or ZWJ emoji sequences). + +- **Always use `string.GetColumns()`** to measure display width — never `EnumerateRunes().Sum(r => r.GetColumns())` (inflates multi-rune clusters) or `string.Length` (counts chars, not terminal cells) +- **Iterate by grapheme** using `GraphemeHelper.GetGraphemes()` when rendering text — never iterate by `Rune` and call `AddRune` for each (breaks combining marks and ZWJ sequences) +- **Render with `AddStr`** passing complete grapheme strings — `AddRune` with individual runes from a cluster will not compose correctly + +```csharp +// ✅ CORRECT — grapheme-aware width measurement +int width = text.GetColumns (); + +// ❌ WRONG — inflates width for ZWJ emoji (e.g., 👨‍👩‍👦‍👦 → 8 instead of 2) +int width = text.EnumerateRunes ().Sum (r => r.GetColumns ()); + +// ✅ CORRECT — grapheme-aware rendering +foreach (string grapheme in GraphemeHelper.GetGraphemes (text)) +{ + AddStr (grapheme); +} + +// ❌ WRONG — breaks combining marks (é rendered as e + ́ separately) +foreach (Rune rune in text.EnumerateRunes ()) +{ + AddRune (rune); +} +``` + +**Exception:** Rune-level iteration is appropriate when inspecting individual Unicode scalar values (e.g., counting zero-width runes for vertical text layout), not for rendering or measurement. + ## Testing Requirements ### Code Coverage @@ -324,5 +354,7 @@ The workflow will: - ❌ **Don't use `var` for anything but built-in simple types** (use explicit types) - ❌ **Don't use redundant type names with `new`** (**ALWAYS PREFER** target-typed `new ()`) - ❌ **Don't introduce new warnings** (fix warnings in files you modify; exception: `[Obsolete]` warnings) +- ❌ **Don't use `EnumerateRunes().Sum(GetColumns)`** for display width — use `string.GetColumns()` +- ❌ **Don't use `AddRune` in a rune loop** to render text — iterate by grapheme with `GraphemeHelper.GetGraphemes()` and use `AddStr` **Thank you for contributing to Terminal.Gui!** 🎉