Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wasm] Enable Icu sharding #104

Closed
wants to merge 22 commits into from
Closed

Conversation

tqiu8
Copy link

@tqiu8 tqiu8 commented Apr 22, 2021

Initial work for enabling ICU sharding.

Summary

PR Checklist

  • I have verified that my change is specific to this fork and cannot be made upstream.
  • I am making a maintenance related change.
  • I am making a change that is related to usage internal to Microsoft.
  • I am making a change that is related to the Windows OS build of ICU.
  • CLA signed. If not, please see here to sign the CLA.

Detailed Description

This PR will enable sharding to work on WASM. Also added functionality to auto-generate icu_dictionary.json which maps relevant files to features and locales. Runtime functionalities included in dotnet/runtime#51665

Additional documentation for sharding can be found in here.
Related to dotnet/runtime#49220 and dotnet/runtime#49221

@marek-safar
Copy link

Could you add readme.md file with some details about the reasoning and steps on how to regenerate the data on ICU update?

Copy link
Member

@lewing lewing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider a dictionary structure more like (just an example not correct)

{
   "packs": {
	"base": {
		"core": [
			"icudt_base.dat",
			"icudt_normalization.app.dat",
		],
		"currency": [ "icudt_currency.app.dat" ]
	},
	"efigs": {
		"extends": "base",
		"core": [
			"icudt_efigs_locales.app.dat",
			"icudt_efigs_coll.app.dat"
		],
		"zones": [ "icudt_efigs_zones.app.dat" ]
	},
	"cjk": {
		"extends": "base",
		"core": [
                      "icudt_cjk_locales.app.dat",
		      "icudt_cjk_coll.app.dat"
		]
	},
	"nocjk": {
		"extends": "base",
	},
	"full": {
		"core": [ "icudt_all.dat" ]
	},
   },
  "locales": {
      "efigs": [ "en", "fr", "de", "se" ],
      "cjk": [ "cn", "jp", "kr" ],
       "nocjk": [ "*" ],
       "full": "full",
       "invariant": null
    }
}
That would reduce the dictionary size a lot it would also potentially simplify the loading js (which could also be produced here).  Ultimately the goal is to load the right pack from the browser not at build, we could (and probably should) select the optional features at build time. Also given a robust enough dictionary format we could probably use it to generate the filters instead of building them by hand.

@tqiu8
Copy link
Author

tqiu8 commented Jun 2, 2021

Current dictionary looks like this:

dictionary = {
  "packs": {
    "base": {
      "core": [
        "icudt_base.dat",
        "icudt_normalization.dat"
      ],
      "currency": [
        "icudt_currency.dat"
      ]
    },
    "efigs": {
      "extends": "base",
      "full": [
        "icudt_efigs_full.dat"
      ],
      "core": [
        "icudt_efigs_locales.dat",
        "icudt_efigs_coll.dat"
      ],
      "zones": [
        "icudt_efigs_zones.dat"
      ]
    },
    "cjk": {
      "extends": "base",
      "full": [
        "icudt_cjk_full.dat"
      ],
      "core": [
        "icudt_cjk_locales.dat",
        "icudt_cjk_coll.dat"
      ],
      "zones": [
        "icudt_cjk_zones.dat"
      ]
    },
    "no_cjk": {
      "extends": "base",
      "full": [
        "icudt_no_cjk_full.dat"
      ],
      "core": [
        "icudt_no_cjk_locales.dat",
        "icudt_no_cjk_coll.dat"
      ],
      "zones": [
        "icudt_no_cjk_zones.dat"
      ]
    },
    "full": "icudt_full_full.dat"
  },
  "shards": {
    "efigs": "(?:en|fr|it|de|es)",
    "cjk": "(?:zh|ja|ko)",
    "no_cjk": "^(?!.*(zh|ja|ko))",
    "full": "full"
  }
}

ilonatommy added a commit that referenced this pull request Jul 19, 2022
ilonatommy added a commit to ilonatommy/icu that referenced this pull request Jul 19, 2022
ilonatommy added a commit to ilonatommy/icu that referenced this pull request Jul 19, 2022
@akoeplinger akoeplinger deleted the branch dotnet:maint/maint-67 August 5, 2022 09:44
@akoeplinger akoeplinger closed this Aug 5, 2022
@ilonatommy ilonatommy mentioned this pull request Aug 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants