-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Description
Background and motivation
Earlier in .NET 8, we added an IndexOfAnyValues<T> type that represents an immutable set of values optimized for efficient {Last}IndexOfAny{Except} searching: #68328 (comment).
You obtain an instance of IndexOfAnyValues by passing the full set of values to Create, which picks the most efficient algorithm for that set of values and capabilities of the current platform.
We've already exposed Create methods for searching for any byte or char in a given set.
This proposal would allow you to create IndexOfAnyValues<string> instances for searching for multiple substrings in a given text, instead of just individual characters.
Like with bytes and chars, Create for IndexOfAnyValues<string> can analyze the values in advance and pick the most optimal algorithm (e.g., Aho-Corasick, Rabin-Karp, Teddy, ...).
As we're working with strings, we're proposing adding a StringComparison parameter to Create.
This method would only work for Ordinal/OrdinalIgnoreCase. It's unlikely that we can meaningfully accelerate culture-aware multi-substring searching, and the proposed APIs assume that matches are of equal lengths.
Semantically, IndexOfAny would behave the same as doing an IndexOf of every value, and taking the minimum index.
While a *Last*IndexOfAny variant is possible, we are currently only proposing the left-to-right IndexOfAny.
Regex is likely to be the main consumer of this API.
cc: @stephentoub
cc: @tarekgh, @GrabYourPitchforks as we're working with strings
API Proposal
namespace System.Buffers;
public static class IndexOfAnyValues
{
// Existing
public static IndexOfAnyValues<byte> Create(ReadOnlySpan<byte> values);
public static IndexOfAnyValues<char> Create(ReadOnlySpan<char> values);
// Proposed
public static IndexOfAnyValues<string> Create(ReadOnlySpan<string> values, StringComparison comparisonType);
}namespace System;
public static class MemoryExtensions
{
// Existing
public static int IndexOfAny<T>(this ReadOnlySpan<T> span, IndexOfAnyValues<T> values);
public static int IndexOfAnyExcept<T>(this ReadOnlySpan<T> span, IndexOfAnyValues<T> values);
public static int LastIndexOfAny<T>(this ReadOnlySpan<T> span, IndexOfAnyValues<T> values);
public static int LastIndexOfAnyExcept<T>(this ReadOnlySpan<T> span, IndexOfAnyValues<T> values);
// Proposed
public static int IndexOfAny(this ReadOnlySpan<char> span, IndexOfAnyValues<string> values);
}API Usage
private static readonly IndexOfAnyValues<string> s_names = IndexOfAnyValues.Create(
new[] { "Sherlock", "Holmes", "Watson" }, StringComparison.Ordinal);
public static int CountNames(ReadOnlySpan<char> text)
{
int count = 0;
while (!text.IsEmpty)
{
int matchOffset = text.IndexOfAny(s_names);
if ((uint)matchOffset >= (uint)text.Length) break;
int matchLength = text[matchOffset] == 'S' ? 8 : 6;
text = text.Slice(matchOffset + matchLength);
count++;
}
return count;
}Alternative Designs
Emphasize that only Ordinal{IgnoreCase} works:
public static IndexOfAnyValues<string> CreateOrdinal(ReadOnlySpan<string> values, StringComparison comparisonType);
// or
public static IndexOfAnyValues<string> CreateOrdinal(ReadOnlySpan<string> values, bool ignoreCase);An API that returns both a match offset and length (instead of just the offset).
So far we've rejected this variant as it adds a policy to IndexOfAny semantics -- are we returning the leftmost-first vs leftmost-longest match.
public static (int MatchOffset, int MatchLength) IndexOfAny(this ReadOnlySpan<char> span, IndexOfAnyValues<string> values);