@@ -902,157 +902,160 @@ As with other identifiers in CLDR, the American English spelling is used for uni
 > In keeping with U.S. and International practice (see Sec. C.2), this Guide uses the dot on the line as the decimal marker. In addition this Guide utilizes the American spellings “meter,” “liter,” and “deka” rather than “metre,” “litre,” and “deca,” and the name “metric ton” rather than “tonne.”
-#### Syntax
-
-The formal syntax for identifiers is provided below.
-Some of the constraints reference data from the unitIdComponents in [Unit_Conversion](tr35-info.md#Unit_Conversion).
-<li><em>Constraint:</em> The numeric value of the unit constant must be an integer greater than one.</li>
-<li><em>Note:</em> The normal interpretation of <code>e</code> is used, where 2e6 = 2×10⁶.</li>
-<li><em>Note:</em> The <code>e</code> notation is optional: per-100-kilometer and per-1e2-kilometer are equivalent unit_identifiers.</li>
-<li><em>Note:</em> When constructing identifiers, exponents should be greater than 3 and multiples of 3, even though parsers must accept the wider range.</li>
-<li><em>Note:</em> Three simple units are currently allowed as legacy usage, for tokens that wouldn’t otherwise be a base_component due to length (eg, "<strong>g</strong>-force").
-We will likely deprecate those and add conformant aliases in the future: the "hg" and "of" are already only in deprecated simple_units.</li>
-<li><em>Constraint:</em> The first part of the currency_unit is a standard prefix; the second part of the currency unit must be a valid <a href="tr35.md#UnicodeCurrencyIdentifier">Unicode currency identifier</a>.</li>
-</ul>
-<ul>
-<li><em>Examples:</em> <b>curr-eur</b>-per-square-meter, or pound-per-<b>curr-usd</b></li>
-<li><em>Note:</em> CLDR does not provide conversions for currencies; this is only intended for formatting.
-The locale data for currencies is supplied in the <code>currencies</code> element, not in the <code>units</code> element.</li>
-</ul>
-</td></tr>
-
-</tbody></table>
+<a name="syntax"></a>
+#### Unit Syntax
+
+The formal syntax for identifiers is provided below, in [EBNF](tr35.md#ebnf).
+Some of the constraints reference data from various elements in the unit conversion data [units.xml](https://github.com/unicode-org/cldr/blob/main/common/supplemental/units.xml).
+These may be either element values or element attribute values.
+See [Unit_Conversion](tr35-info.md#Unit_Conversion).
+* Five simple units are currently allowed as legacy usage, for tokens that wouldn’t otherwise be a base\_component due to length (e.g., "g-force"). Those are likely to be deprecated in the future, with conformant aliases added: the "hg" and "of" are already only in deprecated simple\_units.
+* [ vc: must be value in: \<unitIdComponent type="prefix"\>. ]
+* *Notes:*
+* The set of prefix components often expands in new releases, so the requirement to be one of these attribute values is a validity constraint, not a well-formedness constraint.
+* [ wfc: must not have a prefix as an initial segment. ]
+* [ wfc: must not be a value in \<unitIdComponent type="X"\> for X in \{prefix, suffix, power, and, per} ]
+* [ vc: Must be an attribute value of the `source` in: \<convertUnit source='…' …\> or the `type` in \<unitAlias type="…" replacement="…" …\>]
+* *Notes:*
+* The set of base components typically expands in new releases, so the requirement to be one of these attribute values is a validity constraint, not a well-formedness constraint.
+* The base-components in unitAlias `type` are deprecated and should be converted to their replacement values.
+* No two different base\_components will share the first 8 letters; see [Unit Identifier Uniqueness](https://github.com/unicode-org/cldr/edit/main/docs/ldml/tr35-general.md#Unit_Identifier_Uniqueness).) ]
+* [ vc: must be value in: \<unitIdComponent type="suffix"\>]
+* *Notes:*
+* The set of suffix components often expands in new releases, so the requirement to be one of these attribute values is a validity constraint, not a well-formedness constraint.
+* [ wfc: The first part of the currency\_unit is a standard prefix; the second part of the currency unit must be a valid [Unicode currency identifier](https://github.com/unicode-org/cldr/blob/main/docs/ldml/tr35.md#UnicodeCurrencyIdentifier). ]
+* *Examples:*
+* curr-eur-per-square-meter
+* pound-per-curr-usd
+* *Notes:*
+* CLDR does not provide conversions for currencies; this is only intended for formatting.
+* The locale data for currency display names is supplied in the `currencies` element, not in the `units` element.
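
To illustrate the currency\_unit constraint above, here is a minimal Python sketch that checks only the shape of such a component. It assumes that the standard prefix is the literal `curr-` (as in the two examples) and that a Unicode currency identifier is three lowercase ASCII letters. The function names are hypothetical and not part of CLDR. A real validity check would also have to look the code up in CLDR's currency data, which this sketch does not do.

```python
import re

# Assumption: a currency_unit is "curr-" followed by a three-letter code.
_CURRENCY_UNIT = re.compile(r"curr-[a-z]{3}")


def is_wellformed_currency_unit(component: str) -> bool:
    """Shape check only: True for strings such as 'curr-eur'."""
    return _CURRENCY_UNIT.fullmatch(component) is not None


def currency_units_in(identifier: str) -> list[str]:
    """Return the curr-XXX components found inside a larger unit identifier.

    The lookarounds stop a match from starting or ending in the middle
    of a longer alphanumeric token.
    """
    return re.findall(r"(?<![a-z0-9])curr-[a-z]{3}(?![a-z0-9])", identifier)
```

Applied to the two examples in the hunk, `currency_units_in` picks out `curr-eur` and `curr-usd` respectively; the rest of each identifier would still need to be parsed by the ordinary unit grammar.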
 Note that while the syntax allows for unit_constants in multiple places, the typical use case is only one instance, after a "-per-".
 The normalized form of a unit identifier has at most one unit_constant in the numerator and one in the denominator.
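
The unit\_constant rules quoted in this hunk (an integer greater than one, with optional `e` notation where 2e6 = 2×10⁶) can be sketched in Python as follows. The token shape used here (digits, optionally followed by `e` and more digits) is an assumption made for illustration, and `parse_unit_constant` is a hypothetical helper, not a CLDR API.

```python
import re

# Assumed token shape: a positive integer, optionally followed by "e" and
# a decimal exponent, e.g. "100", "1e2", "2e6".
_UNIT_CONSTANT = re.compile(r"([1-9][0-9]*)(?:e([0-9]+))?")


def parse_unit_constant(token: str) -> int:
    """Return the integer value of a unit_constant token.

    Raises ValueError if the token does not match the assumed shape or if
    its value is not greater than one.
    """
    match = _UNIT_CONSTANT.fullmatch(token)
    if match is None:
        raise ValueError(f"not a unit constant: {token!r}")
    mantissa = int(match.group(1))
    exponent = int(match.group(2)) if match.group(2) else 0
    value = mantissa * 10 ** exponent
    if value <= 1:
        raise ValueError(f"unit constant must be greater than one: {token!r}")
    return value
```

Under these assumptions `100` and `1e2` parse to the same value, which is why per-100-kilometer and per-1e2-kilometer are described as equivalent identifiers.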
@@ -3143,4 +3146,4 @@ The authors, contributors, and publishers have taken care in the preparation of
 but make no express or implied representation or warranty of any kind and assume no responsibility or liability for errors or omissions or for consequential or incidental damages that may arise therefrom.
 This publication is provided “AS-IS” without charge as a convenience to users.

-Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.
+Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.
docs/ldml/tr35-info.md (+12 -2)
@@ -1208,9 +1208,19 @@ Instructions for use are supplied in the header of the file.
 Different locales have different preferences for which unit or combination of units is used for a particular usage, such as measuring a person’s height. This is more fine-grained than merely a preference for metric versus US or UK measurement systems. For example, one locale may use meters alone, while another may use centimeters alone or a combination of meters and centimeters; a third may use inches alone, or (informally) a combination of feet and inches.
+The determination of preferred units uses the user preference data in [units.xml](https://github.com/unicode-org/cldr/blob/main/common/supplemental/units.xml) together with the **input unit**, the **input unit usage**, and the **input locale identifier**.
+* The _well-formed_ and _valid_ **units** are defined according to [Unit Syntax](tr35-general.html#unit-syntax).
+* The _well-formed_ **unit usages** are of the form [a-z0-9]{3-8}("-" [a-z0-9]{3-8})*.
+The _valid_ **unit usages** are the union of the sets of `NMTOKENS` in the `usage` attribute values of the `unitPreferences` elements in [units.xml](https://github.com/unicode-org/cldr/blob/main/common/supplemental/units.xml).
+For example, the following `unitPreferences` elements produce the set {default, floor, geograph, land}.
The determination of preferred units uses the user preference data together with the **input unit**, the **input usage**, and the **input locale identifier**.
 Within the locale identifier, the subtags that can affect the result are:
 * the value of the keys mu, ms, and rg
 * the region in the locale identifier (if there is one)
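
The well-formedness and validity rules for unit usages added earlier in this hunk could be checked roughly as in the Python sketch below. It reads `{3-8}` as "three to eight characters" and treats the `usage` attribute as a whitespace-separated token list; both readings are assumptions. The XML fragment is a hypothetical stand-in built for this example, not an excerpt from units.xml, and the function names are illustrative only.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: "{3-8}" in the text means 3 to 8 characters per segment.
_USAGE = re.compile(r"[a-z0-9]{3,8}(?:-[a-z0-9]{3,8})*")


def is_wellformed_usage(usage: str) -> bool:
    """True if `usage` is a hyphen-separated run of 3..8-character segments."""
    return _USAGE.fullmatch(usage) is not None


def valid_usages(units_xml: str) -> set[str]:
    """Union of the tokens in every unitPreferences/@usage attribute."""
    root = ET.fromstring(units_xml)
    usages: set[str] = set()
    for element in root.iter("unitPreferences"):
        # NMTOKENS values are whitespace-separated token lists.
        usages.update(element.get("usage", "").split())
    return usages


# Hypothetical fragment for illustration; not copied from units.xml.
SAMPLE = """
<supplementalData>
  <unitPreferenceData>
    <unitPreferences category="area" usage="default"/>
    <unitPreferences category="area" usage="floor geograph"/>
    <unitPreferences category="area" usage="land"/>
  </unitPreferenceData>
</supplementalData>
"""
```

With this fragment, `valid_usages(SAMPLE)` collects the four tokens default, floor, geograph, and land, each of which also passes `is_wellformed_usage`.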
@@ -1473,4 +1483,4 @@ The authors, contributors, and publishers have taken care in the preparation of
 but make no express or implied representation or warranty of any kind and assume no responsibility or liability for errors or omissions or for consequential or incidental damages that may arise therefrom.
 This publication is provided “AS-IS” without charge as a convenience to users.

-Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.
+Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.