Skip to content

Remove resource data from repo, #81#85

Open
paulirwin wants to merge 1 commit into
NightOwl888:mainfrom
paulirwin:issue/81
Open

Remove resource data from repo, #81#85
paulirwin wants to merge 1 commit into
NightOwl888:mainfrom
paulirwin:issue/81

Conversation

@paulirwin

Copy link
Copy Markdown
Collaborator

Fixes #81

@paulirwin

Copy link
Copy Markdown
Collaborator Author

GitHub's UI does not take kindly to deleting ~2500 files, so if you want an easier way to just list the files in the commit that are removed to verify them (after pulling down my commit so it has that hash):

git diff-tree --no-commit-id --name-status 1712233 -r | more

paulirwin added a commit to paulirwin/ICU4N that referenced this pull request Apr 20, 2026
ReorderingBuffer:
- Rewrote <remarks> so the pool-backing mention reflects that
  ValueStringBuilder is only conditionally backed by an ArrayPool rental
  (either via the initialCapacity ctor or by outgrowing a stackalloc'd
  initialBuffer), not unconditionally.
- Removed the stale "ICU4N TODO: Evaluate whether this approach makes
  sense and if not, remove" comment above the
  (ReadOnlySpan<char>, int) ctor — this PR's fix relies on that ctor
  shape, so the design question is settled.
- Added internal ReorderingBuffer(Normalizer2Impl, StringBuilder?,
  Span<char>) and (..., int) ctors that Append the StringBuilder
  directly into the inner ValueStringBuilder, eliminating the outer
  stackalloc/pool dance that the previous fix used to materialize
  `first` into a span before handing it to ReorderingBuffer.

Normalizer2Impl.Decompose(ReadOnlySpan<char>, StringBuilder, int):
- Restored the "ICU4N TODO: Make public TryDecompose()..." reminder
  above the method (it previously lived above the removed
  (ref ValueStringBuilder, int) overload; the reviewer wants it kept
  as a standing note).

Norm2AllModes.NormalizeSecondAndAppend(StringBuilder, ...):
- Collapsed to the single-allocation form enabled by the new
  StringBuilder? ctors. No more intermediate char[]/stackalloc to
  copy `first` into before building the ReorderingBuffer.

FCDUTF16CollationIterator.Normalize:
- Reworked per reviewer's suggestion: hoist estimatedLength/value
  above the OpenStringBuilder allocation, size OpenStringBuilder with
  capacity up front when creating it fresh (avoids internal growths),
  and restore the "ICU4N: Corrected 2nd parameter" translation marker
  on the slice expression.

Restored ICU4N translation-marker comments on call sites I rewrote:
- FCDUTF16CollationIterator.cs:496 — "Corrected 2nd parameter"
- RuleBasedCollator.cs:1208 — "value is sliced prior to passing to
  this method, nfdQCYesLimit is based on this slice"
- RuleBasedCollator.cs:1226 — "Corrected 2nd parameter"
- RuleBasedCollator.cs:1485 — "Corrected 2nd parameter" (was "3rd",
  now matches the revised slice)
- RuleBasedCollator.cs:1502 — new "Corrected 2nd parameter" on the
  initial-value slice
- RuleBasedCollator.cs:1508 — "Corrected 2nd parameter" (was "3rd")

Regression test TestNormalizeSecondAndAppendConcurrency:
- Dropped the nfkc_cf.nrm embedded resource from ICU4N.Tests.csproj
  (it was duplicating production data; will break once NightOwl888#85 removes
  the source file from the repo).
- Switched to Normalizer2.GetInstance(null, "nfkc_cf", Compose), the
  same null-stream pattern every other Normalizer2 test uses. The
  test now inherits the same runtime dependency on the
  ICU4N.resources satellite as the rest of the suite.
- Tried the reviewer's suggestion of basing the test on testnorm.nrm
  but confirmed via TDD it cannot reproduce the bug: testnorm's
  maximum expansion is ~17% (one case goes 6→7 UTF-16 units), well
  under the pool-bucket rounding (~2×) that determines whether Grow
  fires. Without Grow the aliasing hazard never manifests. Documented
  this rationale inline in the test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NightOwl888 pushed a commit that referenced this pull request Apr 20, 2026
…2 output (#122)

* Fixed concurrency issue in ReorderingBuffer that corrupted Normalizer2 output under contention

ReorderingBuffer had an internal constructor that took a `ref ValueStringBuilder` and
assigned it by value into the `str` field. Because ValueStringBuilder is a ref struct
that rents a char[] from ArrayPool<char>.Shared (stored in _arrayToReturnToPool), the
struct copy aliased the rental between the outer caller and the ReorderingBuffer. When
Grow() fired on `str`, the old array was returned to the pool while the outer caller's
ValueStringBuilder still held a reference to it and would return it again on Dispose().
Under concurrency two threads could rent the same array simultaneously and corrupt each
other's normalization output — reproducible at >10% mismatch rates with 8 threads running
NormalizeSecondAndAppend on NFKC_CF input that forces Grow() via ligature expansion.

Removed the unsafe ReorderingBuffer(Normalizer2Impl, ref ValueStringBuilder, int) ctor
and rewrote the three call sites to use the existing safe constructors that take an
initial ReadOnlySpan<char> value plus an owned buffer:

  - Normalizer2Impl.Decompose(scoped ReadOnlySpan<char>, scoped ref ValueStringBuilder, int)
  - Norm2AllModes.NormalizeSecondAndAppend(StringBuilder, ReadOnlySpan<char>, bool)
  - Norm2AllModes.NormalizeSecondAndAppend(scoped ref ValueStringBuilder, ReadOnlySpan<char>, bool)
  - RuleBasedCollator.MakeFCD call site in the NFD identical-level-run branch

Also removes a now-unsound "HACK" line in Decompose that manually transferred the length
from the ReorderingBuffer's aliased ValueStringBuilder back to `dest`; the fixed version
owns its own buffer and copies the result into `dest` explicitly.

Added a regression test (TestNormalizeSecondAndAppendConcurrency) that runs 8 threads x
2000 iterations of NormalizeSecondAndAppend against NFKC_CF with an input that forces
Grow() via FB03 "ffi" ligature expansion, and fails loudly if any thread's output
diverges from the single-threaded expected result. Before this fix: ~2200/16000
mismatches. After: 0. The test embeds nfkc_cf.nrm as a resource in the test assembly so
it runs without requiring the ICU4N.resources satellite assembly (which depends on
tooling that isn't available on every dev machine).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address PR #122 review feedback

ReorderingBuffer:
- Rewrote <remarks> so the pool-backing mention reflects that
  ValueStringBuilder is only conditionally backed by an ArrayPool rental
  (either via the initialCapacity ctor or by outgrowing a stackalloc'd
  initialBuffer), not unconditionally.
- Removed the stale "ICU4N TODO: Evaluate whether this approach makes
  sense and if not, remove" comment above the
  (ReadOnlySpan<char>, int) ctor — this PR's fix relies on that ctor
  shape, so the design question is settled.
- Added internal ReorderingBuffer(Normalizer2Impl, StringBuilder?,
  Span<char>) and (..., int) ctors that Append the StringBuilder
  directly into the inner ValueStringBuilder, eliminating the outer
  stackalloc/pool dance that the previous fix used to materialize
  `first` into a span before handing it to ReorderingBuffer.

Normalizer2Impl.Decompose(ReadOnlySpan<char>, StringBuilder, int):
- Restored the "ICU4N TODO: Make public TryDecompose()..." reminder
  above the method (it previously lived above the removed
  (ref ValueStringBuilder, int) overload; the reviewer wants it kept
  as a standing note).

Norm2AllModes.NormalizeSecondAndAppend(StringBuilder, ...):
- Collapsed to the single-allocation form enabled by the new
  StringBuilder? ctors. No more intermediate char[]/stackalloc to
  copy `first` into before building the ReorderingBuffer.

FCDUTF16CollationIterator.Normalize:
- Reworked per reviewer's suggestion: hoist estimatedLength/value
  above the OpenStringBuilder allocation, size OpenStringBuilder with
  capacity up front when creating it fresh (avoids internal growths),
  and restore the "ICU4N: Corrected 2nd parameter" translation marker
  on the slice expression.

Restored ICU4N translation-marker comments on call sites I rewrote:
- FCDUTF16CollationIterator.cs:496 — "Corrected 2nd parameter"
- RuleBasedCollator.cs:1208 — "value is sliced prior to passing to
  this method, nfdQCYesLimit is based on this slice"
- RuleBasedCollator.cs:1226 — "Corrected 2nd parameter"
- RuleBasedCollator.cs:1485 — "Corrected 2nd parameter" (was "3rd",
  now matches the revised slice)
- RuleBasedCollator.cs:1502 — new "Corrected 2nd parameter" on the
  initial-value slice
- RuleBasedCollator.cs:1508 — "Corrected 2nd parameter" (was "3rd")

Regression test TestNormalizeSecondAndAppendConcurrency:
- Dropped the nfkc_cf.nrm embedded resource from ICU4N.Tests.csproj
  (it was duplicating production data; will break once #85 removes
  the source file from the repo).
- Switched to Normalizer2.GetInstance(null, "nfkc_cf", Compose), the
  same null-stream pattern every other Normalizer2 test uses. The
  test now inherits the same runtime dependency on the
  ICU4N.resources satellite as the rest of the suite.
- Tried the reviewer's suggestion of basing the test on testnorm.nrm
  but confirmed via TDD it cannot reproduce the bug: testnorm's
  maximum expansion is ~17% (one case goes 6→7 UTF-16 units), well
  under the pool-bucket rounding (~2×) that determines whether Grow
  fires. Without Grow the aliasing hazard never manifests. Documented
  this rationale inline in the test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Task: Remove resource data from the ICU4N repository

1 participant