Remove resource data from repo, #81#85
Open
paulirwin wants to merge 1 commit into
Open
Conversation
Collaborator
Author
|
GitHub's UI does not take kindly to deleting ~2500 files, so if you want an easier way to just list the files in the commit that are removed to verify them (after pulling down my commit so it has that hash): git diff-tree --no-commit-id --name-status 1712233 -r | more |
3 tasks
paulirwin
added a commit
to paulirwin/ICU4N
that referenced
this pull request
Apr 20, 2026
ReorderingBuffer: - Rewrote <remarks> so the pool-backing mention reflects that ValueStringBuilder is only conditionally backed by an ArrayPool rental (either via the initialCapacity ctor or by outgrowing a stackalloc'd initialBuffer), not unconditionally. - Removed the stale "ICU4N TODO: Evaluate whether this approach makes sense and if not, remove" comment above the (ReadOnlySpan<char>, int) ctor — this PR's fix relies on that ctor shape, so the design question is settled. - Added internal ReorderingBuffer(Normalizer2Impl, StringBuilder?, Span<char>) and (..., int) ctors that Append the StringBuilder directly into the inner ValueStringBuilder, eliminating the outer stackalloc/pool dance that the previous fix used to materialize `first` into a span before handing it to ReorderingBuffer. Normalizer2Impl.Decompose(ReadOnlySpan<char>, StringBuilder, int): - Restored the "ICU4N TODO: Make public TryDecompose()..." reminder above the method (it previously lived above the removed (ref ValueStringBuilder, int) overload; the reviewer wants it kept as a standing note). Norm2AllModes.NormalizeSecondAndAppend(StringBuilder, ...): - Collapsed to the single-allocation form enabled by the new StringBuilder? ctors. No more intermediate char[]/stackalloc to copy `first` into before building the ReorderingBuffer. FCDUTF16CollationIterator.Normalize: - Reworked per reviewer's suggestion: hoist estimatedLength/value above the OpenStringBuilder allocation, size OpenStringBuilder with capacity up front when creating it fresh (avoids internal growths), and restore the "ICU4N: Corrected 2nd parameter" translation marker on the slice expression. Restored ICU4N translation-marker comments on call sites I rewrote: - FCDUTF16CollationIterator.cs:496 — "Corrected 2nd parameter" - RuleBasedCollator.cs:1208 — "value is sliced prior to passing to this method, nfdQCYesLimit is based on this slice" - RuleBasedCollator.cs:1226 — "Corrected 2nd parameter" - RuleBasedCollator.cs:1485 — "Corrected 2nd parameter" (was "3rd", now matches the revised slice) - RuleBasedCollator.cs:1502 — new "Corrected 2nd parameter" on the initial-value slice - RuleBasedCollator.cs:1508 — "Corrected 2nd parameter" (was "3rd") Regression test TestNormalizeSecondAndAppendConcurrency: - Dropped the nfkc_cf.nrm embedded resource from ICU4N.Tests.csproj (it was duplicating production data; will break once NightOwl888#85 removes the source file from the repo). - Switched to Normalizer2.GetInstance(null, "nfkc_cf", Compose), the same null-stream pattern every other Normalizer2 test uses. The test now inherits the same runtime dependency on the ICU4N.resources satellite as the rest of the suite. - Tried the reviewer's suggestion of basing the test on testnorm.nrm but confirmed via TDD it cannot reproduce the bug: testnorm's maximum expansion is ~17% (one case goes 6→7 UTF-16 units), well under the pool-bucket rounding (~2×) that determines whether Grow fires. Without Grow the aliasing hazard never manifests. Documented this rationale inline in the test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NightOwl888
pushed a commit
that referenced
this pull request
Apr 20, 2026
…2 output (#122) * Fixed concurrency issue in ReorderingBuffer that corrupted Normalizer2 output under contention ReorderingBuffer had an internal constructor that took a `ref ValueStringBuilder` and assigned it by value into the `str` field. Because ValueStringBuilder is a ref struct that rents a char[] from ArrayPool<char>.Shared (stored in _arrayToReturnToPool), the struct copy aliased the rental between the outer caller and the ReorderingBuffer. When Grow() fired on `str`, the old array was returned to the pool while the outer caller's ValueStringBuilder still held a reference to it and would return it again on Dispose(). Under concurrency two threads could rent the same array simultaneously and corrupt each other's normalization output — reproducible at >10% mismatch rates with 8 threads running NormalizeSecondAndAppend on NFKC_CF input that forces Grow() via ligature expansion. Removed the unsafe ReorderingBuffer(Normalizer2Impl, ref ValueStringBuilder, int) ctor and rewrote the three call sites to use the existing safe constructors that take an initial ReadOnlySpan<char> value plus an owned buffer: - Normalizer2Impl.Decompose(scoped ReadOnlySpan<char>, scoped ref ValueStringBuilder, int) - Norm2AllModes.NormalizeSecondAndAppend(StringBuilder, ReadOnlySpan<char>, bool) - Norm2AllModes.NormalizeSecondAndAppend(scoped ref ValueStringBuilder, ReadOnlySpan<char>, bool) - RuleBasedCollator.MakeFCD call site in the NFD identical-level-run branch Also removes a now-unsound "HACK" line in Decompose that manually transferred the length from the ReorderingBuffer's aliased ValueStringBuilder back to `dest`; the fixed version owns its own buffer and copies the result into `dest` explicitly. Added a regression test (TestNormalizeSecondAndAppendConcurrency) that runs 8 threads x 2000 iterations of NormalizeSecondAndAppend against NFKC_CF with an input that forces Grow() via FB03 "ffi" ligature expansion, and fails loudly if any thread's output diverges from the single-threaded expected result. Before this fix: ~2200/16000 mismatches. After: 0. The test embeds nfkc_cf.nrm as a resource in the test assembly so it runs without requiring the ICU4N.resources satellite assembly (which depends on tooling that isn't available on every dev machine). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Address PR #122 review feedback ReorderingBuffer: - Rewrote <remarks> so the pool-backing mention reflects that ValueStringBuilder is only conditionally backed by an ArrayPool rental (either via the initialCapacity ctor or by outgrowing a stackalloc'd initialBuffer), not unconditionally. - Removed the stale "ICU4N TODO: Evaluate whether this approach makes sense and if not, remove" comment above the (ReadOnlySpan<char>, int) ctor — this PR's fix relies on that ctor shape, so the design question is settled. - Added internal ReorderingBuffer(Normalizer2Impl, StringBuilder?, Span<char>) and (..., int) ctors that Append the StringBuilder directly into the inner ValueStringBuilder, eliminating the outer stackalloc/pool dance that the previous fix used to materialize `first` into a span before handing it to ReorderingBuffer. Normalizer2Impl.Decompose(ReadOnlySpan<char>, StringBuilder, int): - Restored the "ICU4N TODO: Make public TryDecompose()..." reminder above the method (it previously lived above the removed (ref ValueStringBuilder, int) overload; the reviewer wants it kept as a standing note). Norm2AllModes.NormalizeSecondAndAppend(StringBuilder, ...): - Collapsed to the single-allocation form enabled by the new StringBuilder? ctors. No more intermediate char[]/stackalloc to copy `first` into before building the ReorderingBuffer. FCDUTF16CollationIterator.Normalize: - Reworked per reviewer's suggestion: hoist estimatedLength/value above the OpenStringBuilder allocation, size OpenStringBuilder with capacity up front when creating it fresh (avoids internal growths), and restore the "ICU4N: Corrected 2nd parameter" translation marker on the slice expression. Restored ICU4N translation-marker comments on call sites I rewrote: - FCDUTF16CollationIterator.cs:496 — "Corrected 2nd parameter" - RuleBasedCollator.cs:1208 — "value is sliced prior to passing to this method, nfdQCYesLimit is based on this slice" - RuleBasedCollator.cs:1226 — "Corrected 2nd parameter" - RuleBasedCollator.cs:1485 — "Corrected 2nd parameter" (was "3rd", now matches the revised slice) - RuleBasedCollator.cs:1502 — new "Corrected 2nd parameter" on the initial-value slice - RuleBasedCollator.cs:1508 — "Corrected 2nd parameter" (was "3rd") Regression test TestNormalizeSecondAndAppendConcurrency: - Dropped the nfkc_cf.nrm embedded resource from ICU4N.Tests.csproj (it was duplicating production data; will break once #85 removes the source file from the repo). - Switched to Normalizer2.GetInstance(null, "nfkc_cf", Compose), the same null-stream pattern every other Normalizer2 test uses. The test now inherits the same runtime dependency on the ICU4N.resources satellite as the rest of the suite. - Tried the reviewer's suggestion of basing the test on testnorm.nrm but confirmed via TDD it cannot reproduce the bug: testnorm's maximum expansion is ~17% (one case goes 6→7 UTF-16 units), well under the pool-bucket rounding (~2×) that determines whether Grow fires. Without Grow the aliasing hazard never manifests. Documented this rationale inline in the test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #81