Fix issue with cloned RuleBasedBreakIterator failing under heavy parallel load, #95 #96

Merged: NightOwl888 merged 2 commits into NightOwl888:main from paulirwin:issue/95 on Nov 25, 2024
Conversation

@paulirwin paulirwin commented Nov 24, 2024

Collaborator

Fixes #95

This fixes an issue where RuleBasedBreakIterator fails under heavy parallel load. It was originally found via Lucene.NET's TestRandomStrings methods, which spawn multiple threads and run randomly generated Unicode strings through analysis. See apache/lucenenet#269 for some examples.

This PR adds a test that isolates the problem; the failure is easy to reproduce if you revert the change to DequeI. The problem was that cloned RuleBasedBreakIterator instances did not properly clone all of the data in their object graph: DequeI modified itself rather than the clone. This left instances in an invalid state, so the modified integer array could leak across threads and end up shared by multiple instances.
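To make the failure mode concrete, here is a minimal sketch of this kind of clone bug. It is written in Java purely for illustration (the ICU4N code itself is C#), and `CloneSketch`, `IntDeque`, `buggyClone`, and `fixedClone` are hypothetical names, not the actual ICU4N types or the exact change in this PR. The assumption is that the array copy was assigned to the source object instead of the clone:

```java
import java.util.Arrays;

// Hypothetical stand-in for an array-backed int deque; not the ICU4N type.
final class CloneSketch {
    static final class IntDeque implements Cloneable {
        int[] data = new int[4];
        int size;

        void push(int value) {
            if (size == data.length) {
                data = Arrays.copyOf(data, size * 2);
            }
            data[size++] = value;
        }

        // Buggy: the fresh array copy is assigned to `this`, not to the clone.
        // The source object is mutated during cloning, and the clone keeps the
        // array the source held when the shallow copy was taken. If two threads
        // clone the same source at once, both clones can end up with that array.
        IntDeque buggyClone() {
            IntDeque result = shallowCopy();
            this.data = this.data.clone();
            return result;
        }

        // Fixed: the source is left untouched; only the clone gets a new array.
        IntDeque fixedClone() {
            IntDeque result = shallowCopy();
            result.data = this.data.clone();
            return result;
        }

        private IntDeque shallowCopy() {
            try {
                return (IntDeque) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // unreachable: class is Cloneable
            }
        }
    }

    public static void main(String[] args) {
        IntDeque source = new IntDeque();
        source.push(1);
        int[] before = source.data;

        IntDeque bad = source.buggyClone();
        System.out.println("source array replaced: " + (source.data != before));
        System.out.println("clone holds old array: " + (bad.data == before));
    }
}
```

In a single thread the buggy version can look harmless, because the source and the clone still end up with different arrays. Cloning should not write to the source at all, though, and that write is what makes the behavior depend on thread timing.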

The failure surfaced reliably when new CjkBreakEngine(korean: false) was in the set of break engines, and was generally not a problem when it was absent. I suspect this is because this version of CjkBreakEngine covers a large set of characters, making it more likely to "hit" and handle the random characters, rather than because of anything particularly problematic about CjkBreakEngine itself.

Comment thread tests/ICU4N.Tests/Text/RuleBasedBreakIteratorTest.cs Outdated
Labels

notes:bug-fix Contains a fix for a bug

Development

Successfully merging this pull request may close these issues.

Random failures in cloned RuleBasedBreakIterator under parallel heavy load with CJK strings