From 6fec5205ec961b6d9e3e999e2846235db16e0a95 Mon Sep 17 00:00:00 2001
From: Mark Davis
Date: Wed, 25 Sep 2024 17:45:09 -0700
Subject: [PATCH 1/3] CLDR-17830 Update modifications section

---
 docs/ldml/tr35.md | 89 +++++++++++++++++++----------------------------
 1 file changed, 36 insertions(+), 53 deletions(-)

diff --git a/docs/ldml/tr35.md b/docs/ldml/tr35.md
index c164fb4f8f6..87055b7bb18 100644
--- a/docs/ldml/tr35.md
+++ b/docs/ldml/tr35.md
@@ -4287,59 +4287,42 @@ Other contributors to CLDR are listed on the [CLDR Project Page](https://www.uni
 
 **Differences from LDML Version 45**
 
-* Part 4: [Dates](tr35-dates.md#Contents)
-  * In [Element dayPeriods](tr35-dates.md#dayPeriods), added a note on special formatting usable with
-    dayPeriods `noon` and `midnight`.
-
-**Differences from LDML Version 44.1**
-
-* Part 1: [Core](tr35.md#Contents)
-  * In [Parent Locales](#Parent_Locales), made substantial changes to the way that parentLocales work,
-    including a new attribute for algorithmic handling of inheritance
-    that avoids needing a long (and fragile) list of language-script codes
-    to skip when falling back to root.
-    That list was retained for migration, but will be withdrawn in the future.
-  * In [Special Script Codes](#special-script-codes), added a description of special script codes,
-    such as Jpan and Aran.
-  * In [Lateral Inheritance](#Lateral_Inheritance), improved the formatting for clarity.
-  * In [LocaleId Canonicalization:Preprocessing](#preprocessing), restructured the steps for clarity, added more examples.
-  * In [Likely Subtags](#Likely_Subtags), clarified that language subtags iw, in, and yi are treated specially in the data,
-    to allow for applications that use them as canonical language subtags.
-    Also removed the substitution for macroregions,
-    and noted that some elements could be NOOPs in customized data, but could be misleading.
-  * In [EBNF](#ebnf), added more differences from W3C EBNF,
-    and documented use of wfc: and vc: for wellformedness and validity constraints.
-    Marked clauses with that format where appropriate, and grouped constraints after the relevant EBNF.
-  * In [Element alias](#Alias_Elements), clarified the location to which the path in the `alias` element is relative.
-
-* Part 3: [Numbers](tr35-numbers.md#Contents)
-  * In [Supplemental Currency Data](tr35-numbers.md#Supplemental_Currency_Data), for the `currency` element,
-    added attributes `tz` and `to-tz` to clarify the `from` and `to` dates.
-
-* Part 4: [Dates](tr35-dates.md#Contents)
-  * In [Date Format Patterns](tr35-dates.md#Date_Format_Patterns), reserved date Pattern field lengths of greater than 16
-    as private use.
-
-* Part 6: [Supplemental](tr35-info.md#Contents)
-  * In [Mixed Units](tr35-info.md#mixed-units), clarified many aspects of mixed units (such as foot-and-inch),
-    including how to handle rounding and precision.
-  * In [Testing](tr35-info.md#testing), listed the additional test files.
-  * In [Unit Preferences Overrides](tr35-info.md#Unit_Preferences_Overrides), made substantial changes including
-    handling of edge cases, such as where there is no quantity for a unit, or no preference data for a quantity;
-    how to handle invalid subtags;
-    negative unit amounts;
-    the usage of each of the subtags that affect unit preferences, and others.
-  * In [Conversion Data](tr35-info.md#conversion-data), added the `special` attribute for `convertUnit`, used for handling beaufort.
-  * In [Unit Prefixes](tr35-info.md#unit-prefixes), added the SI unit prefixes and the power of 10
-    (or 2, for binary prefixes) that they represent.
-
-* Part 7: [Keyboards](tr35-keyboards.md#Contents)
-  * Added substantial changes from v44 to bring the Keyboard 3.0 specification out of Tech Preview, including:
-    * New sections for Definitions, Notation, and Normalization.
-    * Many clarifications and modifications in other sections.
-
-* Part 9: [MessageFormat](tr35-messageFormat.md#Contents)
-  * Added the completely new specification for MessageFormat 2.0 (in Tech Preview)
+### Conformance Modifications
+
+Updates to [LDML Conformance](#Conformance) including
+- clarification of conformance requirements
+- an expanded list of major sections
+- details about customization
+- a summary of conformance data files
+
+### Locale Identifiers and Inheritance Modifications
+- Clarified definitions of _Unicode BCP 47 locale identifier_ and _Unicode CLDR locale identifier_, moving them to [Unicode CLDR locale identifier](#unicode-locale-identifier)
+- Clarified usage of [Special Script Codes](#unicode-locale-identifier).
+- Added definition of [Ordered Elements](#definitions), replacing the obsolete definition of _blocking_ elements.
+- Clarified the usage of the `path` attribute with [aliases](#element-alias).
+
+### Message Format Modifications
+
+Significant updates to [Message Format](tr35-messageFormat.html#Contents)
+- Removed all of the reserved and private use syntax constructs, simplifying the grammar.
+- Changed the structure of the .match (selector) to require use of local or input declarations. This is a breaking change for existing messages.
+- Added support for bidirectional isolates and marks and clarified whitespace handling to better enable messages that contain right-to-left identifiers and text.
+
+### Date Modifications
+- Added a tech preview section on [semantic skeletons](tr35-dates.html#Semantic_Skeletons), allowing for less data and faster performance in formatting dates.
+- Clarified that if [dayPeriods](tr35-collation.html#grouping_classes_of_characters) are specified for `noon` and `midnight`, they can often be formatted without also specifying the numeric time
+- In [Element dayPeriods](tr35-dates.html#dayPeriods), added a note on special formatting usable with dayPeriods `noon` and `midnight`.
+
+### Units Modifications
+- Changed the EBNF for [`unit_identifier`](tr35-general.html#Annotations):
+  - Replacing `number_prefix` by [unit_constant](tr35-general.html#syntax) to generalize expressions like liter-per-100-kilometers, and provide a compact form for longer constants (such as 1e9).
+  - Adding EBNF constraints on `si_prefix` and `binary_prefix`, and adding links to more named components.
+- Clarified the use of -rg for [computing regions](tr35-info.html#compute-regions) in user preferences
+
+### Misc. Modifications
+- Clarified the usage model for [emoji search keywords](tr35-general.html#Annotations).
+- In collation, modified [Grouping classes of characters](tr35-collation.html#grouping_classes_of_characters) to reflect the changes in collation listed below.
+
 Note that small changes such as typos and link fixes are not listed above.
 
 Modifications in previous versions are listed in those respective versions.

From dcf9e5443cb1ab5f1af33a1e37f749ecfe378423 Mon Sep 17 00:00:00 2001
From: Mark Davis
Date: Wed, 25 Sep 2024 18:01:35 -0700
Subject: [PATCH 2/3] CLDR-17830 Cleanup the collation modifications

---
 docs/ldml/tr35.md | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/docs/ldml/tr35.md b/docs/ldml/tr35.md
index 87055b7bb18..1b2f04c939d 100644
--- a/docs/ldml/tr35.md
+++ b/docs/ldml/tr35.md
@@ -4319,9 +4319,26 @@ Significant updates to [Message Format](tr35-messageFormat.html#Contents)
   - Adding EBNF constraints on `si_prefix` and `binary_prefix`, and adding links to more named components.
 - Clarified the use of -rg for [computing regions](tr35-info.html#compute-regions) in user preferences
 
+### Collation Data Changes
+- Modified [Grouping classes of characters](tr35-collation.html#grouping_classes_of_characters) to reflect two major changes.
+  - The [CLDR root collation](tr35-collation.html#Root_Collation) is a tailoring of the [DUCET](https://www.unicode.org/reports/tr10/#Default_Unicode_Collation_Element_Table).
+Changes have been made to both to align them better:
+    - non-decimal-digit numeric characters now sort after decimal digits
+    - the CLDR root collation no longer tailors any currency symbols (making some of them sort like letter sequences, as in the DUCET).
+  - Starting with CLDR 46, the CLDR radical-stroke order matches that of the [Unicode Radical-Stroke Index (large PDF)](https://www.unicode.org/Public/UCD/latest/charts/RSIndex.pdf).
+[Its sorting algorithm is defined in UAX #38](https://www.unicode.org/reports/tr38/#SortingAlgorithm).
+
+#### Improved Han Radical-Stroke Order
+CLDR includes [data for sorting Han (CJK) characters in radical-stroke order](https://cldr-smoke.unicode.org/spec/main/ldml/tr35-collation.md#File_Format_FractionalUCA_txt).
+It used to distinguish traditional and simplified forms of radicals on a higher level than sorting by the number of residual strokes.
+
+Traditional vs. simplified forms of radicals are distinguished on a lower level than the number of residual strokes.
+This also has an effect on [alphabetic indexes](tr35-collation.md#Collation_Indexes) for radical-stroke sort orders,
+where only the traditional forms of radicals are now available as index characters.
+
+
 ### Misc. Modifications
 - Clarified the usage model for [emoji search keywords](tr35-general.html#Annotations).
-- In collation, modified [Grouping classes of characters](tr35-collation.html#grouping_classes_of_characters) to reflect the changes in collation listed below.
 
 Note that small changes such as typos and link fixes are not listed above.
 

From 78e8e94239fdd92ff77acf98588ac2b4444d35f3 Mon Sep 17 00:00:00 2001
From: Mark Davis
Date: Wed, 25 Sep 2024 18:08:20 -0700
Subject: [PATCH 3/3] CLDR-17830 Cleanup the collation modifications2

---
 docs/ldml/tr35.md | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/docs/ldml/tr35.md b/docs/ldml/tr35.md
index 1b2f04c939d..1557d15e338 100644
--- a/docs/ldml/tr35.md
+++ b/docs/ldml/tr35.md
@@ -4328,15 +4328,6 @@ Changes have been made to both to align them better:
   - Starting with CLDR 46, the CLDR radical-stroke order matches that of the [Unicode Radical-Stroke Index (large PDF)](https://www.unicode.org/Public/UCD/latest/charts/RSIndex.pdf).
 [Its sorting algorithm is defined in UAX #38](https://www.unicode.org/reports/tr38/#SortingAlgorithm).
 
-#### Improved Han Radical-Stroke Order
-CLDR includes [data for sorting Han (CJK) characters in radical-stroke order](https://cldr-smoke.unicode.org/spec/main/ldml/tr35-collation.md#File_Format_FractionalUCA_txt).
-It used to distinguish traditional and simplified forms of radicals on a higher level than sorting by the number of residual strokes.
-
-Traditional vs. simplified forms of radicals are distinguished on a lower level than the number of residual strokes.
-This also has an effect on [alphabetic indexes](tr35-collation.md#Collation_Indexes) for radical-stroke sort orders,
-where only the traditional forms of radicals are now available as index characters.
-
-
 ### Misc. Modifications
 - Clarified the usage model for [emoji search keywords](tr35-general.html#Annotations).
 