ThaiTokenizer + ICUTokenizer: Fixed ICU4N BreakIterator issues to address concurrency problems (Fixes #1044, #1135, and #1159) #1161
Merged
…tation.ICUTokenizer: Removed static locks that were used to prevent concurrent access to BreakIterator. The concurrency issue with BreakIterator has been addressed in ICU4N 60.1.0-alpha.437. Fixes apache#1044. Fixes apache#1135.
…the ICU4N satellite assemblies to the build output.
…atellite assemblies are copied to the build output.
…nsitions array when resetting the text, since there may be state from a previous use.
…(): AssertTokenStreamContents already consumes the token stream and calls Close(), so removed duplicate Close() call in finally block. See apache#1159.
…s(): reset the token stream upon error so Close() doesn't throw an invalid state exception that would obscure our real test error message.
…nts() + CheckAnalysisConsistency()): When using MockTokenizer, disable the state checks upon exception so our call to Close() won't throw another exception that may obscure the original test failure.
… MockTokenizer, disable the state checks upon exception so our call to Close() won't throw another exception that may obscure the original test failure.
…Log any secondary exceptions from TokenStream.Close(), since we don't want them to obscure our original test failure message. Throw only in the case where there was no other failure and Close() throws.
…erloads that only existed to synchronize access to the BreakIterator instances
paulirwin approved these changes on May 29, 2025
ThaiTokenizer + ICUTokenizer: Fixed ICU4N BreakIterator issues to address concurrency problems
Fixes #1044, Fixes #1135, Fixes #1159
Description
This bumps ICU4N to 60.1.0-alpha.438 which contains patches to address:
This also fixes:
- `Close()` in `BaseTokenStreamTestCase` when there is already an exception being thrown by the test
- `ThaiWordBreaker`: Clear the state of the `transitions` queue when `SetText()` is called. This was causing random `CheckRandomData()` test failures when there was a failure in the middle of a set of transitions.
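
The `Close()` handling described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `FakeStream`, `CloseGuard`, and `RunAndClose` are hypothetical names used only for illustration, and the real logic lives in `BaseTokenStreamTestCase`. The sketch assumes the intent stated in the commits: a secondary exception from `Close()` is logged when the test body has already failed, and is thrown only when there was no other failure.

```csharp
using System;

// Hypothetical stand-in for a token stream whose Close() may throw,
// for example because of a state check after a failed test.
public sealed class FakeStream
{
    private readonly bool failOnClose;

    public FakeStream(bool failOnClose) => this.failOnClose = failOnClose;

    public void Close()
    {
        if (failOnClose)
            throw new InvalidOperationException("Close() called in an invalid state");
    }
}

// Hypothetical helper, not part of Lucene.NET.
public static class CloseGuard
{
    // Runs the test body and then always calls Close().
    // If the body failed, a Close() failure is only logged, so the
    // original exception is the one the caller sees.
    // If the body succeeded, a Close() failure propagates normally.
    public static void RunAndClose(FakeStream stream, Action<FakeStream> body, Action<string> log)
    {
        bool success = false;
        try
        {
            body(stream);
            success = true;
        }
        finally
        {
            try
            {
                stream.Close();
            }
            catch (Exception closeError) when (!success)
            {
                // The filter only matches when the body already threw.
                log("Secondary exception from Close(): " + closeError.Message);
            }
        }
    }
}
```

With this shape, a test that fails and then hits a failing `Close()` still reports the test's own exception, while a test that passes but cannot close the stream cleanly fails with the `Close()` exception.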