
ThaiTokenizer + ICUTokenizer: Fixed ICU4N BreakIterator issues to address concurrency problems (Fixes #1035, #1135, and #1159) #1161

Merged
NightOwl888 merged 12 commits into apache:master from NightOwl888:fix/GH-1044
May 29, 2025

NightOwl888 commented May 29, 2025

  • You've read the Contributor Guide and Code of Conduct.
  • You've included unit or integration tests for your change, where applicable.
  • You've included inline docs for your change, where applicable.
  • There's an open issue for the PR that you are making. If you'd like to propose a change, please open an issue to discuss the change or find an existing issue.

ThaiTokenizer + ICUTokenizer: Fixed ICU4N BreakIterator issues to address concurrency problems

Fixes #1044, Fixes #1135, Fixes #1159

Description

This bumps ICU4N to 60.1.0-alpha.438, which contains patches to address:

This also fixes:

…tation.ICUTokenizer: Removed static locks that were used to prevent concurrent access to BreakIterator. The concurrency issue with BreakIterator has been addressed in ICU4N 60.1.0-alpha.437. Fixes apache#1044. Fixes apache#1135.
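The general idea behind dropping a static lock can be sketched as giving each consumer its own iterator instead of sharing one guarded instance. This is a minimal illustration only, assuming the JDK's `java.text.BreakIterator` as a stand-in for ICU4N's `BreakIterator` (both are stateful and not meant to be shared across threads); the `WordSplitter` class is hypothetical and is not the Lucene.NET or ICU4N code.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: each instance owns its own BreakIterator, so no static lock is needed.
// Uses the JDK's java.text.BreakIterator, not ICU4N.
public class WordSplitter {
    // BreakIterator holds per-text state (current position, text), so it must
    // not be shared between threads without synchronization.
    private final BreakIterator iterator = BreakIterator.getWordInstance(Locale.ROOT);

    public List<String> split(String text) {
        List<String> words = new ArrayList<>();
        iterator.setText(text); // also resets the position to the start of the new text
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            String piece = text.substring(start, end);
            // Keep only segments that start with a letter or digit (skips spaces/punctuation).
            if (Character.isLetterOrDigit(piece.codePointAt(0))) {
                words.add(piece);
            }
        }
        return words;
    }

    public static void main(String[] args) throws InterruptedException {
        // Each thread creates its own splitter; nothing is shared, so nothing is locked.
        Thread a = new Thread(() -> System.out.println(new WordSplitter().split("hello concurrent world")));
        Thread b = new Thread(() -> System.out.println(new WordSplitter().split("another thread here")));
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

Per-instance ownership trades a small allocation cost for the removal of lock contention, which is the same trade-off a tokenizer makes when it holds its own iterator rather than a shared static one.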
…the ICU4N satellite assemblies to the build output.
…atellite assemblies are copied to the build output.
…nsitions array when resetting the text, since there may be state from a previous use.
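The commit message above is truncated, but the pattern it describes (clearing a reused array when new text is set, because a previous use may have left state behind) can be sketched generically. Everything here is hypothetical: `ReusableSegmenter`, its `state` buffer, and the toy "boundary at every space" rule are illustrative names and logic, not the actual ICU4N fields or algorithm.

```java
import java.util.Arrays;

// Hypothetical reusable segmenter illustrating "reset reused buffers on setText".
// Names and the space-based boundary rule are illustrative only.
public class ReusableSegmenter {
    // Scratch buffer reused across texts to avoid reallocating on every call.
    private int[] state = new int[16];
    // Number of valid entries in state for the current text.
    private int count;

    public void setText(String text) {
        // Clear anything left over from a previous use before reusing the buffer.
        count = 0;
        if (state.length < text.length()) {
            state = new int[text.length()]; // fresh array is already zeroed
        } else {
            Arrays.fill(state, 0);
        }
        // Toy rule: record the index of every space as a boundary.
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == ' ') {
                state[count++] = i;
            }
        }
    }

    // Returns only the boundaries computed for the most recent setText call.
    public int[] boundaries() {
        return Arrays.copyOf(state, count);
    }
}
```

If `count` were not reset in `setText`, a second text would append to the first text's boundaries, which is the kind of stale-state bug the commit guards against.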
…(): AssertTokenStreamContents already consumes the token stream and calls Close(), so removed duplicate Close() call in finally block. See apache#1159.
…s(): reset the token stream upon error so Close() doesn't throw an invalid state exception that would obscure our real test error message.
…nts() + CheckAnalysisConsistency()): When using MockTokenizer, disable the state checks upon exception so our call to Close() won't throw another exception that may obscure the original test failure.
… MockTokenizer, disable the state checks upon exception so our call to Close() won't throw another exception that may obscure the original test failure.
…Log any secondary exceptions from TokenStream.Close(), since we don't want them to obscure our original test failure message. Throw only in the case where there was no other failure and Close() throws.
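The close-handling rule in the commits above (log a secondary exception from `Close()` when a test has already failed, and throw it only when there was no earlier failure) can be sketched as follows. This is a minimal Java illustration using `java.io.Closeable` as a stand-in for a token stream; `CloseQuietlyOnFailure`, `runThenClose`, and `Check` are hypothetical names, not the Lucene.NET test framework's API.

```java
import java.io.Closeable;

// Hypothetical helper: run a check against a stream, then close it without
// letting a secondary close() failure hide the original failure.
public class CloseQuietlyOnFailure {
    // Stand-in for the body of an assertion such as a token stream check.
    public interface Check {
        void run() throws Exception;
    }

    public static void runThenClose(Closeable stream, Check check) throws Exception {
        Throwable primary = null;
        try {
            check.run();
        } catch (Throwable t) { // catch Throwable so AssertionError counts as a failure
            primary = t;
            throw t; // precise rethrow: still only Exception or unchecked types
        } finally {
            try {
                stream.close();
            } catch (Exception closeError) {
                if (primary != null) {
                    // Log only: the original failure is the message that matters.
                    System.err.println("Secondary exception from close(): " + closeError);
                } else {
                    throw closeError; // close() is the only failure, so surface it
                }
            }
        }
    }
}
```

Logging instead of using `addSuppressed` mirrors the behavior the commit message describes; attaching the close error as a suppressed exception would be an alternative design.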
NightOwl888 added the notes:bug-fix (Contains a fix for a bug) label May 29, 2025
…erloads that only existed to synchronize access to the BreakIterator instances
NightOwl888 requested a review from paulirwin May 29, 2025 13:52
NightOwl888 merged commit 5a9516c into apache:master May 29, 2025
275 checks passed

2 participants