RegexUtility: miscellaneous string manipulation and regex operations
The static RegexUtility class provides a number of useful methods. A summary of the categories and methods appears below. See each section for further details and examples.
Split Methods
- Split
- SplitRemoveEmptyEntries
- SplitIncludeDelimiters
- SplitMatchWholeWords
- SplitTrimWhitespace
Formatting Methods
- TrimWhitespace
- FormatCamelCase
Named Groups Conversion Methods
- MatchesToNamedGroupsDictionaries
- MatchesToNamedGroupsLookup
Split: performs a split on the given delimiters and accepts flag enum options that can be combined to perform specific actions:
- SplitOptions.IncludeDelimiters: includes the delimiters in the split result
- SplitOptions.MatchWholeWords: splits the input only where a delimiter matches a whole word
- SplitOptions.TrimWhitespace: trims leading and trailing whitespace from each split result
- SplitOptions.RemoveEmptyEntries: removes empty split result entries
- SplitOptions.All: splits using all of the above SplitOptions
SplitRemoveEmptyEntries: this method has 2 overloads:
- a convenience method for Split with SplitOptions.RemoveEmptyEntries
- an overload that accepts a regex pattern, performs a Regex.Split, then removes any empty split result entries
SplitIncludeDelimiters: convenience method for Split with SplitOptions.IncludeDelimiters
SplitMatchWholeWords: convenience method for Split with SplitOptions.MatchWholeWords
SplitTrimWhitespace: convenience method for Split with SplitOptions.TrimWhitespace
The Split method's signature is:
string[] Split(string input, string[] delimiters, RegexOptions regexOptions = RegexOptions.None, SplitOptions splitOptions = SplitOptions.None)
Note: all delimiters are escaped, so regex metacharacters in them are matched literally rather than interpreted as regex syntax.
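A minimal sketch of what this escaping means in practice, assuming the delimiters are combined into a single alternation pattern. The helper BuildDelimiterPattern is hypothetical and not part of RegexUtility; it relies on the real Regex.Escape method, which escapes metacharacters such as ".", "?", "(" and ")".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not RegexUtility's code): escape each delimiter so regex
// metacharacters are matched literally, then join them into one alternation.
static string BuildDelimiterPattern(string[] delimiters) =>
    string.Join("|", delimiters.Select(Regex.Escape));

string[] delimiters = { "()", ".", "?" };
string pattern = BuildDelimiterPattern(delimiters);
Console.WriteLine(pattern); // \(\)|\.|\?

// The escaped "." matches only a literal period, so "a.b" is split but "ab" is not.
string[] dotted = Regex.Split("a.b", BuildDelimiterPattern(new[] { "." }));
string[] plain = Regex.Split("ab", BuildDelimiterPattern(new[] { "." }));
Console.WriteLine(dotted.Length); // 2
Console.WriteLine(plain.Length);  // 1
```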
The following examples focus on the various SplitOptions.
SplitOptions.IncludeDelimiters
Splitting usually excludes the delimiters from the result. This option includes them.
string input = "123xx456yy789";
string[] delimiters = { "xx", "yy" };
var result = RegexUtility.Split(input, delimiters, splitOptions: SplitOptions.IncludeDelimiters);
// { "123", "xx", "456", "yy", "789" }
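In .NET, Regex.Split adds the text captured by any capturing groups to its result. The sketch below assumes IncludeDelimiters can be modeled that way, by wrapping the escaped delimiters in a capturing group. SplitIncludeDelimitersSketch is a hypothetical name, and this is not necessarily how RegexUtility implements the option.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch: the capturing group "( ... )" makes Regex.Split return
// each matched delimiter alongside the text on either side of it.
static string[] SplitIncludeDelimitersSketch(string input, string[] delimiters)
{
    string alternation = string.Join("|", delimiters.Select(Regex.Escape));
    return Regex.Split(input, "(" + alternation + ")");
}

string[] result = SplitIncludeDelimitersSketch("123xx456yy789", new[] { "xx", "yy" });
Console.WriteLine(string.Join(", ", result)); // 123, xx, 456, yy, 789
```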
SplitOptions.MatchWholeWords
With this option, the input is split only where a delimiter appears as a whole word. Occurrences inside longer words, such as "Stack" in "StackOverflow" and "OverStack", are left intact.
string input = "StackOverflow Stack OverStack";
string[] delimiters = { "Stack" };
var result = RegexUtility.Split(input, delimiters, splitOptions: SplitOptions.MatchWholeWords);
// { "StackOverflow ", " OverStack" }
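One way to get this behavior, sketched under the assumption that the delimiters consist of word characters, is to surround the escaped alternation with \b word-boundary anchors. SplitWholeWordsSketch is a hypothetical helper; note that \b behaves differently for delimiters that start or end with non-word characters such as "." or "?".

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch: \b anchors only match at word boundaries, so "Stack"
// embedded in "StackOverflow" or "OverStack" is not treated as a split point.
static string[] SplitWholeWordsSketch(string input, string[] delimiters)
{
    string alternation = string.Join("|", delimiters.Select(Regex.Escape));
    return Regex.Split(input, @"\b(?:" + alternation + @")\b");
}

string[] result = SplitWholeWordsSketch("StackOverflow Stack OverStack", new[] { "Stack" });
// Quote each entry to make the surrounding spaces visible.
Console.WriteLine(string.Join(", ", result.Select(s => $"\"{s}\""))); // "StackOverflow ", " OverStack"
```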
SplitOptions.TrimWhitespace
Without TrimWhitespace, the following example would return { "Hello ", " World" } (note the trailing and leading whitespace). TrimWhitespace removes that whitespace.
string input = "Hello . World";
string[] delimiters = { "." };
var result = RegexUtility.Split(input, delimiters, splitOptions: SplitOptions.TrimWhitespace);
// { "Hello", "World" }
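A minimal sketch of this option, assuming it simply trims each entry after a regular split. SplitTrimSketch is a hypothetical helper, not RegexUtility's implementation.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch: split on the escaped delimiters, then trim every entry.
static string[] SplitTrimSketch(string input, string[] delimiters)
{
    string alternation = string.Join("|", delimiters.Select(Regex.Escape));
    return Regex.Split(input, alternation).Select(s => s.Trim()).ToArray();
}

string[] result = SplitTrimSketch("Hello . World", new[] { "." });
Console.WriteLine(string.Join(", ", result)); // Hello, World
```

Folding \s* into the pattern instead (for example \s*\.\s*) would remove the spaces around each delimiter but not whitespace at the very start or end of the input, which is why this sketch trims each entry.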
SplitOptions.RemoveEmptyEntries
Splitting can produce empty entries (""), for example when a delimiter appears at the very start or end of the input. This option removes them. Without it, the following example would return { "", " Hello ", " World", "" }.
string input = "() Hello . World?";
string[] delimiters = { "()", ".", "?" };
var result = RegexUtility.Split(input, delimiters, splitOptions: SplitOptions.RemoveEmptyEntries);
// { " Hello ", " World" }
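A minimal sketch, assuming the option filters out zero-length strings after the split. SplitRemoveEmptySketch is hypothetical. Whitespace-only content is not trimmed here, so entries like " Hello " keep their spaces, consistent with the result shown above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch: split on the escaped delimiters, then drop "" entries.
static string[] SplitRemoveEmptySketch(string input, string[] delimiters)
{
    string alternation = string.Join("|", delimiters.Select(Regex.Escape));
    return Regex.Split(input, alternation).Where(s => s.Length > 0).ToArray();
}

string[] result = SplitRemoveEmptySketch("() Hello . World?", new[] { "()", ".", "?" });
Console.WriteLine(string.Join(", ", result.Select(s => $"\"{s}\""))); // " Hello ", " World"
```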
SplitOptions.All
SplitOptions values can be combined using the bitwise OR operator (|). SplitOptions.All combines all of the options: IncludeDelimiters | MatchWholeWords | TrimWhitespace | RemoveEmptyEntries.
string input = "Stack StackOverflow Stack OverStack Stack";
string[] delimiters = { "Stack" };
var result = RegexUtility.Split(input, delimiters, splitOptions: SplitOptions.All);
// { "Stack", "StackOverflow", "Stack", "OverStack", "Stack" }
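The sketch below shows one plausible way the four options could compose into a single pipeline. It is an assumption, not RegexUtility's implementation: the hypothetical SplitSketch takes one bool per option instead of the library's SplitOptions flags (so the example needs no type declarations), and it chooses to trim before removing empty entries.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch of the pipeline implied by SplitOptions.All.
static string[] SplitSketch(string input, string[] delimiters,
    bool includeDelimiters, bool matchWholeWords, bool trimWhitespace, bool removeEmptyEntries)
{
    string pattern = string.Join("|", delimiters.Select(Regex.Escape));
    if (matchWholeWords)
        pattern = @"\b(?:" + pattern + @")\b"; // split only on whole-word matches
    if (includeDelimiters)
        pattern = "(" + pattern + ")";         // Regex.Split keeps captured text

    var parts = Regex.Split(input, pattern).AsEnumerable();
    if (trimWhitespace)
        parts = parts.Select(s => s.Trim());
    if (removeEmptyEntries)
        parts = parts.Where(s => s.Length > 0); // runs after trimming in this sketch
    return parts.ToArray();
}

string[] result = SplitSketch("Stack StackOverflow Stack OverStack Stack", new[] { "Stack" },
    includeDelimiters: true, matchWholeWords: true, trimWhitespace: true, removeEmptyEntries: true);
Console.WriteLine(string.Join(", ", result)); // Stack, StackOverflow, Stack, OverStack, Stack
```

Before trimming and filtering, the raw split here is "", "Stack", " StackOverflow ", "Stack", " OverStack ", "Stack", "": the empty strings come from the delimiters at the very start and end of the input.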
SplitRemoveEmptyEntries
This overload takes a regex pattern, splits the input, and removes empty entries. For comparison, calling Regex.Split directly would return: { "", "hello", "world", "goodbye", "", "world", "" }
var input = "x hello x world x goodbye !x world!";
var pattern = @"\s*[x!]\s*";
var result = RegexUtility.SplitRemoveEmptyEntries(input, pattern);
// { "hello", "world", "goodbye", "world" }
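The pattern-based overload can be approximated with Regex.Split followed by a filter. A minimal sketch, using the hypothetical name SplitRemoveEmptyEntriesSketch, that also prints the raw Regex.Split result for comparison:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch: a plain Regex.Split, then drop the empty strings it produces.
static string[] SplitRemoveEmptyEntriesSketch(string input, string pattern) =>
    Regex.Split(input, pattern).Where(s => !string.IsNullOrEmpty(s)).ToArray();

var input = "x hello x world x goodbye !x world!";
var pattern = @"\s*[x!]\s*";

// Adjacent matches (" !" then "x ") and matches at either end yield empty entries.
string[] raw = Regex.Split(input, pattern);
string[] cleaned = SplitRemoveEmptyEntriesSketch(input, pattern);
Console.WriteLine(raw.Length);                 // 7
Console.WriteLine(string.Join(", ", cleaned)); // hello, world, goodbye, world
```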
TrimWhitespace
Removes leading and trailing whitespace, as well as duplicate whitespace (runs of consecutive whitespace in the middle of the input).
var result = RegexUtility.TrimWhitespace(" Hello World ");
// "Hello World"
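A minimal sketch using a single Regex.Replace after trimming. TrimWhitespaceSketch is hypothetical, and it assumes every run of whitespace, including tabs and newlines, should become one space; the library may treat non-space whitespace differently.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch: trim both ends, then collapse each interior run of
// whitespace characters into a single space.
static string TrimWhitespaceSketch(string input) =>
    Regex.Replace(input.Trim(), @"\s+", " ");

string spaced = TrimWhitespaceSketch("   Hello     World   ");
string mixed = TrimWhitespaceSketch("a\t\tb\n c");
Console.WriteLine($"[{spaced}]"); // [Hello World]
Console.WriteLine($"[{mixed}]");  // [a b c]
```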
FormatCamelCase
Formats PascalCase (upper CamelCase) and (lower) camelCase words into a friendly format separated by the given delimiter (a space by default). It also accepts a CamelCaseOptions enum.
It handles acronyms too. For example, given the input "PickUpXMLInFiveDays", the acronym "XML" is preserved and the result is "Pick Up XML In Five Days".
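A sketch of one common approach to this kind of formatting: inserting the delimiter at zero-width lookaround positions with Regex.Replace. The helper FormatCamelCaseSketch and its three boundary rules are assumptions chosen to reproduce this section's examples, not RegexUtility's actual pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sketch. The delimiter is inserted at three kinds of positions:
//   lowercase -> uppercase           "camelCase" -> "camel Case"
//   acronym   -> capitalized word    "XMLIn"     -> "XML In" (before the last capital
//                                    of a run when that capital starts a lowercase word)
//   letter    -> digit               "Case42"    -> "Case 42"
static string FormatCamelCaseSketch(string input, string delimiter = " ")
{
    const string boundaries =
        "(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])|(?<=[A-Za-z])(?=[0-9])";
    // "$" is special in replacement strings, so escape it to insert the delimiter literally.
    return Regex.Replace(input, boundaries, delimiter.Replace("$", "$$"));
}

string acronym = FormatCamelCaseSketch("PickUpXMLInFiveDays");
string pascal = FormatCamelCaseSketch("PascalCase");
string underscored = FormatCamelCaseSketch("camelCase42", "_");
Console.WriteLine(acronym);     // Pick Up XML In Five Days
Console.WriteLine(pascal);      // Pascal Case
Console.WriteLine(underscored); // camel_Case_42
```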
Available CamelCaseOptions values include:
- CamelCaseOptions.CapitalizeFirstCharacter: capitalizes the first character of camelCase inputs
- CamelCaseOptions.CapitalizeFirstCharacterInvariantCulture: same as above, but uses the invariant culture
RegexUtility.FormatCamelCase("PascalCase"); // "Pascal Case"
RegexUtility.FormatCamelCase("camelCase42", "_"); // "camel_Case_42"
// Returns "Camel Case" (the first character is now uppercase)
RegexUtility.FormatCamelCase("camelCase", camelCaseOptions: CamelCaseOptions.CapitalizeFirstCharacter);
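The two options differ in which culture's casing rules apply to the first character. In the Turkish culture, for instance, "i" uppercases to "İ" rather than "I"; the invariant culture avoids such culture-specific results. A minimal sketch with hypothetical helper names (CapitalizeFirst, CapitalizeFirstInvariant), not the library's code:

```csharp
using System;
using System.Globalization;

// Hypothetical helper: uppercase the first character using the given culture.
static string CapitalizeFirst(string s, CultureInfo culture) =>
    string.IsNullOrEmpty(s) ? s : char.ToUpper(s[0], culture) + s.Substring(1);

// Hypothetical helper: same idea, but with culture-independent casing rules.
static string CapitalizeFirstInvariant(string s) =>
    string.IsNullOrEmpty(s) ? s : char.ToUpperInvariant(s[0]) + s.Substring(1);

string current = CapitalizeFirst("camel Case", CultureInfo.CurrentCulture);
string invariant = CapitalizeFirstInvariant("camel Case");
Console.WriteLine(current);   // Camel Case
Console.WriteLine(invariant); // Camel Case
```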
These methods expect a pattern with named groups and convert the named groups of each match into a specific collection type.
MatchesToNamedGroupsDictionaries
Returns an array of Dictionary<string, string>, one per match, with the named groups as the keys and each group's matched text as the value.
var input = "123-456-7890 hello 098-765-4321";
var pattern = @"(?<AreaCode>\d{3})-(?<First>\d{3})-(?<Last>\d{4})";
var results = RegexUtility.MatchesToNamedGroupsDictionaries(input, pattern);
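This code returns the following result (one dictionary per match):
// { "AreaCode": "123", "First": "456", "Last": "7890" }
// { "AreaCode": "098", "First": "765", "Last": "4321" }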
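A minimal sketch of how such dictionaries could be built with Regex.GetGroupNames, assuming numbered groups (such as "0", the whole match) are excluded because only named groups appear as keys. ToNamedGroupDictionariesSketch is a hypothetical name, not part of RegexUtility.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch: one dictionary per match, keyed by the named groups only.
static Dictionary<string, string>[] ToNamedGroupDictionariesSketch(string input, string pattern)
{
    var regex = new Regex(pattern);
    // GetGroupNames also returns numbered groups such as "0"; skip those.
    string[] names = regex.GetGroupNames()
        .Where(name => !int.TryParse(name, out _))
        .ToArray();

    return regex.Matches(input)
        .Cast<Match>()
        .Select(m => names.ToDictionary(name => name, name => m.Groups[name].Value))
        .ToArray();
}

var results = ToNamedGroupDictionariesSketch(
    "123-456-7890 hello 098-765-4321",
    @"(?<AreaCode>\d{3})-(?<First>\d{3})-(?<Last>\d{4})");

foreach (var d in results)
    Console.WriteLine($"AreaCode={d["AreaCode"]}, First={d["First"]}, Last={d["Last"]}");
// AreaCode=123, First=456, Last=7890
// AreaCode=098, First=765, Last=4321
```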
MatchesToNamedGroupsLookup
Returns an ILookup<string, string> keyed by the named groups, where each key maps to that group's values across all matches.
var input = "123-456-7890 hello 098-765-4321";
var pattern = @"(?<AreaCode>\d{3})-(?<First>\d{3})-(?<Last>\d{4})";
var result = RegexUtility.MatchesToNamedGroupsLookup(input, pattern);
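This code returns the following result (each key maps to the values from both matches):
// AreaCode: { "123", "098" }
// First: { "456", "765" }
// Last: { "7890", "4321" }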
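A sketch of the lookup version using Enumerable.ToLookup over (group name, value) pairs collected from every match. ToNamedGroupsLookupSketch is hypothetical, and it assumes numbered groups such as "0" are excluded so that only named groups become keys.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch: flatten (name, value) pairs across all matches,
// then group the values by group name.
static ILookup<string, string> ToNamedGroupsLookupSketch(string input, string pattern)
{
    var regex = new Regex(pattern);
    string[] names = regex.GetGroupNames()
        .Where(name => !int.TryParse(name, out _)) // skip numbered groups like "0"
        .ToArray();

    return regex.Matches(input)
        .Cast<Match>()
        .SelectMany(m => names.Select(name => (Name: name, Value: m.Groups[name].Value)))
        .ToLookup(pair => pair.Name, pair => pair.Value);
}

var result = ToNamedGroupsLookupSketch(
    "123-456-7890 hello 098-765-4321",
    @"(?<AreaCode>\d{3})-(?<First>\d{3})-(?<Last>\d{4})");

foreach (var name in new[] { "AreaCode", "First", "Last" })
    Console.WriteLine($"{name}: {string.Join(", ", result[name])}");
// AreaCode: 123, 098
// First: 456, 765
// Last: 7890, 4321
```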