-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Add numerical ordering option for string comparison operations #109861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 8 commits
87a8d03
2c51664
28b74a4
45f0b4e
48b3c6e
3d29499
8034de5
215c1d6
74150f7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3,17 +3,75 @@ | |
|
|
||
| namespace System.Globalization | ||
| { | ||
| /// <summary> | ||
| /// Defines the string comparison options to use with <see cref="CompareInfo"/>. | ||
| /// </summary> | ||
| [Flags] | ||
| public enum CompareOptions | ||
| { | ||
| /// <summary> | ||
| /// Indicates the default option settings for string comparisons | ||
| /// </summary> | ||
| None = 0x00000000, | ||
|
|
||
| /// <summary> | ||
| /// Indicates that the string comparison must ignore case. | ||
| /// </summary> | ||
| IgnoreCase = 0x00000001, | ||
|
|
||
| /// <summary> | ||
| /// Indicates that the string comparison must ignore nonspacing combining characters, such as diacritics. | ||
| /// The <see href="https://go.microsoft.com/fwlink/?linkid=37123">Unicode Standard</see> defines combining characters as | ||
| /// characters that are combined with base characters to produce a new character. Nonspacing combining characters do not | ||
| /// occupy a spacing position by themselves when rendered. | ||
| /// </summary> | ||
| IgnoreNonSpace = 0x00000002, | ||
|
|
||
| /// <summary> | ||
| /// Indicates that the string comparison must ignore symbols, such as white-space characters, punctuation, currency symbols, | ||
| /// the percent sign, mathematical symbols, the ampersand, and so on. | ||
| /// </summary> | ||
| IgnoreSymbols = 0x00000004, | ||
|
|
||
| /// <summary> | ||
| /// Indicates that the string comparison must ignore the Kana type. Kana type refers to Japanese hiragana and katakana characters, which represent phonetic sounds in the Japanese language. | ||
| /// Hiragana is used for native Japanese expressions and words, while katakana is used for words borrowed from other languages, such as "computer" or "Internet". | ||
| /// A phonetic sound can be expressed in both hiragana and katakana. If this value is selected, the hiragana character for one sound is considered equal to the katakana character for the same sound. | ||
| /// </summary> | ||
| IgnoreKanaType = 0x00000008, | ||
|
|
||
| /// <summary> | ||
| /// Indicates that the string comparison must ignore the character width. For example, Japanese katakana characters can be written as full-width or half-width. | ||
| /// If this value is selected, the katakana characters written as full-width are considered equal to the same characters written as half-width. | ||
| /// </summary> | ||
| IgnoreWidth = 0x00000010, | ||
|
|
||
| /// <summary> | ||
| /// Indicates that the string comparison must sort sequences of digits (Unicode general category "Nd") based on their numeric value. | ||
| /// For example, "2" comes before "10". Non-digit characters such as decimal points, minus or plus signs, etc. | ||
| /// are not considered as part of the sequence and will terminate it. This flag is not valid for indexing | ||
| /// (such as <see cref="CompareInfo.IndexOf(string, string, CompareOptions)"/>, <see cref="CompareInfo.IsPrefix(string, string, CompareOptions)"/>, etc.). | ||
| /// </summary> | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. will be good to add a remarks to this one giving more information like this option can be used in comparisons but not for search (IndexOf/StartsWith/EndsWith). Will be good to hint the behavior difference too when ICU is used against NLS. And last tell this option cannot be combined with the ordinal operations,
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The comment was getting long so I just included just the part about indexing. Mentioning search might be confusing because people might consider IndexOf (which isn't supported) as search and GetHashCode (which is supported) as not. I think it's easier to just say that NumericOrdering works in all cases except for indexing. I prefer to keep the combination behavior with Ordinal and OrdinalIgnoreCase on those members since this is really a property of theirs instead of numeric ordering. This is also consistent with the other options as well which don't mention Ordinal even though they can't combine with them either. I think the ICU and NLS differences should probably go in docs rather than in doc comments since NLS usage is not going to be high. There are already docs about NLS/ICU differences that we can append to (https://learn.microsoft.com/en-us/dotnet/core/extensions/globalization-icu#behavioral-differences).
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, if you prefer that, it will be better to edit the docs of the indexing APIs and add the remark there. I am trying to make it easy for the API users to understand when this new enum value is not allowed. I guess users can be puzzled if they get exceptions and do not understand what is wrong. You don't have to block the PR on that, but it will be good to open a doc issue/PR to add the info as needed.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
| NumericOrdering = 0x00000020, | ||
|
|
||
| /// <summary> | ||
| /// String comparison must ignore case, then perform an ordinal comparison. This technique is equivalent to | ||
| /// converting the string to uppercase using the invariant culture and then performing an ordinal comparison on the result. | ||
| /// This value cannot be combined with other <see cref="CompareOptions" /> values and must be used alone. | ||
| /// </summary> | ||
| OrdinalIgnoreCase = 0x10000000, // This flag can not be used with other flags. | ||
|
|
||
| /// <summary> | ||
| /// Indicates that the string comparison must use the string sort algorithm. In a string sort, the hyphen and the apostrophe, | ||
| /// as well as other nonalphanumeric symbols, come before alphanumeric characters. | ||
| /// </summary> | ||
| StringSort = 0x20000000, | ||
|
|
||
| /// <summary> | ||
| /// Indicates that the string comparison must use successive Unicode UTF-16 encoded values of the string (code unit by code unit comparison), | ||
| /// leading to a fast comparison but one that is culture-insensitive. A string starting with a code unit XXXX16 comes before a string starting with YYYY16, | ||
| /// if XXXX16 is less than YYYY16. This value cannot be combined with other <see cref="CompareOptions" /> values and must be used alone. | ||
| /// </summary> | ||
| Ordinal = 0x40000000, // This flag can not be used with other flags. | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -9186,6 +9186,7 @@ public enum CompareOptions | |
| IgnoreSymbols = 4, | ||
| IgnoreKanaType = 8, | ||
| IgnoreWidth = 16, | ||
| NumericOrdering = 32, | ||
| OrdinalIgnoreCase = 268435456, | ||
| StringSort = 536870912, | ||
| Ordinal = 1073741824, | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: should we change this to use hex numbers for the sake of the readability?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is how the GenAPI tool generates it and the guidance I've heard is to make as few diffs from that tool as possible.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, but still can make diffs :-)
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The ref assembly isn’t really designed for readability, it’s designed to be correct and automatically generated Deviations and manual diffs just cause later downstream pain and hinder the ability to rerun the tool, as people have to fight against it The better option would be to submit a bug or better a patch such that the output produced by the tool for flags enabled enums is “better” such as using hex or logical shifts to represent the bits instead (
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've created an issue: dotnet/sdk#44999. GenAPI seems to be in a kind of limbo state where there is a new Roslyn based version we want to switch to (tracked by dotnet/sdk#31088) and we don't want to maintain the current CCI-based one (https://github.com/dotnet/arcade/blob/main/src/Microsoft.DotNet.GenAPI/README.md). |
||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.