Skip to content

Conversation

@FrankYFTang
Copy link
Contributor

@FrankYFTang FrankYFTang commented Sep 26, 2023

To avoid very slow return from the constructor, we return an error while the Collation rule expand too big.
Add a limit constant to limit to the number of loop needed for 8 Hanguls.
Necessary number of loops: H(0)=0; H(i)=3H(i-1)+2.
Where i is the length of Hangul in the rule.
H(1) = 2, H(2) = 8, H(3)=26, H(4)=80, H(5) = 242 ...

Checklist
  • Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22517
  • Required: The PR title must be prefixed with a JIRA Issue number.
  • Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
  • Required: Each commit message must be prefixed with a JIRA Issue number.
  • Issue accepted (done by Technical Committee after discussion)
  • Tests included, if applicable
  • API docs and/or User Guide docs changed or added, if applicable

@markusicu markusicu added the incomplete Needs work; do not approve/merge as is. label Sep 28, 2023
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/collationbuilder.cpp is now changed in the branch
  • icu4c/source/test/intltest/regcoll.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@FrankYFTang FrankYFTang changed the title ICU-22517 CollationBuilder::addRelation very slow with multiple Hangul ICU-22517 Limit the clouser expansion loop and return error Sep 29, 2023
@FrankYFTang FrankYFTang removed the incomplete Needs work; do not approve/merge as is. label Sep 29, 2023
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/i18n/collationbuilder.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@FrankYFTang
Copy link
Contributor Author

@markusicu I try to set the limit to smaller value, such as 242 (which allow 5 hanguls) , but that break collate/CollationTest/TestBuilderContextsOverflow so I have to change it to 6560 which could allow 8 hanguls on the righthand side and also passing collate/CollationTest/TestBuilderContextsOverflow

Copy link
Member

@markusicu markusicu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remember to port the test and the fix to Java as well.

IcuTestErrorCode errorCode(*this, "TestICU22517");
char16_t data[] = u"&a=b쫊쫊쫊쫊쫊쫊쫊쫊";
icu::UnicodeString rule(true, data, -1);
for (int i = 4; i <= rule.length(); i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long does this test run? Should we test only up to rule.length() - 1 or - 2 unless in exhaustive mode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

55.5s

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost a minute... is too long I think for normal testing.
Please test only up to rule.length() - 1 or - 2 unless in exhaustive mode.

@markusicu
Copy link
Member

Please also fix the typo in the commit message and in the PR title: s/clouser/closure/

FrankYFTang added a commit to FrankYFTang/icu that referenced this pull request Sep 30, 2023
@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

@FrankYFTang FrankYFTang changed the title ICU-22517 Limit the clouser expansion loop and return error ICU-22517 Limit the closure expansion loop and return error Oct 1, 2023
@FrankYFTang
Copy link
Contributor Author

I have some problem to run Java test w/ mvn now (could be my disk full issue). I will try to change Java in a different PR

@FrankYFTang FrankYFTang closed this Oct 1, 2023
@FrankYFTang FrankYFTang reopened this Oct 1, 2023
@FrankYFTang FrankYFTang requested a review from markusicu October 2, 2023 17:36
@markusicu
Copy link
Member

Please also fix the typo in the commit message and in the PR title: s/clouser/closure/

The PR title is fixed but the commit message still has this typo.

@markusicu
Copy link
Member

I have some problem to run Java test w/ mvn now (could be my disk full issue). I will try to change Java in a different PR

ok

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4c/source/test/intltest/regcoll.cpp is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@FrankYFTang
Copy link
Contributor Author

Please also fix the typo in the commit message and in the PR title: s/clouser/closure/

The PR title is fixed but the commit message still has this typo.

done

@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4j/main/collate/src/main/java/com/ibm/icu/impl/coll/CollationBuilder.java is now changed in the branch
  • icu4j/main/collate/src/test/java/com/ibm/icu/dev/test/collator/CollationRegressionTest.java is now changed in the branch

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

@FrankYFTang
Copy link
Contributor Author

I have some problem to run Java test w/ mvn now (could be my disk full issue). I will try to change Java in a different PR

I added them now.

To avoid very slow return from the constructor, we return
error while the Collation rule expand too big.
Add a soft limit to limit to the number of loop needed for 8 Hanguls
  Necessary number of loop: H(0)=0; H(i)=3H(i-1)+2.
  Where i is the length of Hangul in the rule.
  H(1) = 2, H(2) = 8, H(3)=26, H(4)=80, H(5) = 242 ...
@jira-pull-request-webhook
Copy link

Notice: the branch changed across the force-push!

  • icu4j/main/collate/src/main/java/com/ibm/icu/impl/coll/CollationBuilder.java is different

View Diff Across Force-Push

~ Your Friendly Jira-GitHub PR Checker Bot

Copy link
Member

@markusicu markusicu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm tnx

@FrankYFTang FrankYFTang merged commit 05b0e7a into unicode-org:main Oct 3, 2023
@FrankYFTang FrankYFTang deleted the ICU-22517-ICU-22517 branch October 3, 2023 02:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants