[wasm] Enable ICU sharding #51665
Tagging subscribers to this area: @tarekgh, @safern
Issue Details
Initial work to enable ICU sharding according to locale and features.
We see size savings in the EFIGS and no_CJK shards, but a size increase in CJK due to larger collation data.
@@ -146,7 +154,7 @@ int32_t GlobalizationNative_LoadICUData(const char* path)
     fclose(fp);

-    if (load_icu_data(icu_data) == 0) {
+    if (load_icu_data(icu_data, strcasecmp("icudt.dat", path)) == 0) {
Will this always be named icudt.dat? I think in some cases the platform native .dat file might have a version number in its path.
Should we rename all the shards to something like .app.dat and assume that anything that doesn't match that pattern should use setCommonData?
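The naming idea in this thread can be sketched as a small classifier. This is only an illustration, not code from the PR: the `.app.dat` suffix comes from the comment above, while `SHARD_SUFFIX`, `base_name`, `is_shard_file`, and the example file names are hypothetical. It inspects only the file name, so a directory prefix or a versioned platform file name does not affect the result.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical suffix marking a shard, following the ".app.dat" suggestion. */
#define SHARD_SUFFIX ".app.dat"

/* Return the file-name part of a path (the text after the last '/'). */
const char *base_name(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Return 1 if the file name ends in SHARD_SUFFIX (case-insensitive), else 0.
 * A caller could route 1 to udata_setAppData and 0 to udata_setCommonData;
 * that routing is an assumption and is not shown here. */
int is_shard_file(const char *path)
{
    const char *name = base_name(path);
    size_t name_len = strlen(name);
    size_t suffix_len = strlen(SHARD_SUFFIX);

    if (name_len < suffix_len)
        return 0;
    return strcasecmp(name + (name_len - suffix_len), SHARD_SUFFIX) == 0;
}
```

With this shape, a path such as `/usr/share/icu/icudt68l.dat` (a made-up versioned name) would be treated as common data, while `icudt_efigs.app.dat` would be treated as a shard.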
Could you add test(s) that the feature works as expected?
Do you know why the size of the collation data is increasing? Also, how hard would it be to split up locales further as a followup (and how much potential size would we save by doing so)?
{
    UErrorCode status = 0;
    udata_setCommonData(pData, &status);
    if (type == 0) {
Can we give this enum some names? That way it reads better.
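One way to read this request is sketched below. The names `IcuDataType`, `ICU_DATA_COMMON`, `ICU_DATA_APP`, `icu_data_type_for_path`, and `icu_data_type_name` are hypothetical. The sketch assumes that `type == 0` selects `udata_setCommonData` and that any other value is meant to select `udata_setAppData`, which the visible hunk does not confirm.

```c
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical names for the integer "type" passed to load_icu_data. */
typedef enum {
    ICU_DATA_COMMON = 0, /* assumed: whole data file, udata_setCommonData */
    ICU_DATA_APP = 1     /* assumed: individual shard, udata_setAppData */
} IcuDataType;

/* strcasecmp returns 0 on a match and an arbitrary non-zero value otherwise,
 * so the result is normalized to one of the two named values. */
IcuDataType icu_data_type_for_path(const char *path)
{
    return strcasecmp("icudt.dat", path) == 0 ? ICU_DATA_COMMON : ICU_DATA_APP;
}

/* Human-readable name, e.g. for log messages. */
const char *icu_data_type_name(IcuDataType type)
{
    switch (type) {
    case ICU_DATA_COMMON: return "common";
    case ICU_DATA_APP:    return "app";
    }
    return "unknown";
}
```

Compared with passing the raw `strcasecmp` result through, the call site could then read `if (type == ICU_DATA_COMMON)` instead of `if (type == 0)`.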
Not completely sure. Splitting up the collation data by locale within EFIGS does not make a huge difference, but it might make a difference in CJK.
Could we move the specifics of the file names and the shard resolution javascript to dotnet/icu (see my comment there) and then just import the props and .js here?
Tagging subscribers to 'arch-wasm': @lewing
…et/icu, load ICU.DataFiles.props file generated by the icu
@@ -178,7 +178,7 @@ run-tests-%:
 	EMSDK_PATH=$(EMSDK_PATH) PATH="$(JSVU):$(PATH)" $(DOTNET) build $(TOP)/src/libraries/$*/tests/ /t:Test $(_MSBUILD_WASM_BUILD_ARGS) $(MSBUILD_ARGS)

 run-build-tests:
-	PATH="$(JSVU):$(PATH)" $(DOTNET) build $(TOP)/src/tests/BuildWasmApps/Wasm.Build.Tests/ /t:Test $(_MSBUILD_WASM_BUILD_ARGS) $(MSBUILD_ARGS)
+	PATH="$(JSVU):$(PATH)" $(DOTNET) build $(TOP)/src/tests/BuildWasmApps/Wasm.Build.Tests/ /t:Test $(_MSBUILD_WASM_BUILD_ARGS) $(MSBUILD_ARGS) /p:XUnitMethodName=Wasm.Build.Tests.InvariantGlobalizationTests.Invariant_WithSharding
Is this an intentional change to check in?
@tqiu8 What is needed to move this PR forward?
@lewing the PR seems to have conflicts, I'll close the PR.
Initial work to enable ICU sharding according to locale and features.
- --enable-sharding flag in runtime-test.js to toggle the feature on and off.
- setAppData and setCommonData, which allow shards like icudt_CJK.dat and icudt_EFIGS.dat to be loaded simultaneously (needs more investigating).
Current size differences:
icudt_currency.dat, icudt_normalization.dat, icudt_coll.dat, icudt_locales.dat
icudt_efigs_locales.dat, icudt_currency.dat, icudt_normalization.dat, icudt_efigs_coll.dat
icudt_cjk_locales.dat, icudt_currency.dat, icudt_normalization.dat, icudt_cjk_coll.dat
icudt_no_cjk_locales.dat, icudt_currency.dat, icudt_normalization.dat, icudt_no_cjk_coll.dat
We see size savings in the EFIGS and no_CJK shards, but a size increase in CJK due to larger collation data.
Currently references icu_dictionary.json, which is a mapping between culture and relevant files (generated in dotnet/icu#104).
Related to #49220 and #49221.
Additional documentation can be found here.
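To make the culture-to-files mapping concrete, here is a rough sketch of such a lookup. It is not the format of icu_dictionary.json, which is not shown in this PR: the table is hard-coded, `ShardEntry`, `SHARDS`, `NO_CJK_FILES`, and `files_for_culture` are hypothetical names, and the language grouping is an assumption (EFIGS taken as English, French, Italian, German, Spanish; CJK as Chinese, Japanese, Korean; everything else falling back to the no_CJK files). Only the file names come from the lists above.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

#define FILES_PER_SHARD 4

typedef struct {
    const char *languages[6];           /* NULL-terminated language codes */
    const char *files[FILES_PER_SHARD]; /* data files listed in the PR */
} ShardEntry;

/* Assumed grouping of languages into shards. */
static const ShardEntry SHARDS[] = {
    { { "en", "fr", "it", "de", "es", NULL },
      { "icudt_efigs_locales.dat", "icudt_currency.dat",
        "icudt_normalization.dat", "icudt_efigs_coll.dat" } },
    { { "zh", "ja", "ko", NULL },
      { "icudt_cjk_locales.dat", "icudt_currency.dat",
        "icudt_normalization.dat", "icudt_cjk_coll.dat" } },
};

/* Assumed fallback for every culture not matched above. */
static const char *const NO_CJK_FILES[FILES_PER_SHARD] = {
    "icudt_no_cjk_locales.dat", "icudt_currency.dat",
    "icudt_normalization.dat", "icudt_no_cjk_coll.dat"
};

/* Return the FILES_PER_SHARD file names for a culture such as "fr-FR". */
const char *const *files_for_culture(const char *culture)
{
    /* The language code is everything before the first '-' or '_'. */
    size_t lang_len = strcspn(culture, "-_");

    for (size_t i = 0; i < sizeof SHARDS / sizeof SHARDS[0]; i++) {
        for (const char *const *lang = SHARDS[i].languages; *lang != NULL; lang++) {
            if (strlen(*lang) == lang_len &&
                strncasecmp(*lang, culture, lang_len) == 0)
                return SHARDS[i].files;
        }
    }
    return NO_CJK_FILES;
}
```

A generated dictionary would replace the hard-coded table; the lookup by language prefix is just one plausible way to resolve a culture name to its files.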