Replace string creation with static strings for Regex options conversion to string #1504

Open
wants to merge 2 commits into
base: main
from

Conversation

paulomorgado

The BsonRegularExpression(Regex) constructor uses string concatenation to create the options string.

If there are no options or only one, the result is a static interned string. Otherwise, it creates new strings: one fewer than the number of options.

Using a buffer to create the string only beats concatenation in both time and memory when there are 4 options.

However, using precomputed strings (there aren't that many of them) beats every other option in both time and memory consumption.
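
A minimal sketch of why zero or one option does not allocate with concatenation, while each further option does. It assumes the behavior of current .NET runtimes, where `string.Concat` returns the other operand unchanged when one side is empty; this is an implementation detail rather than a documented guarantee. The class and method names are hypothetical and not part of the PR.

```csharp
using System;

public static class ConcatShortCircuitSketch
{
    // Mirrors one "options += fragment" step; the C# compiler emits
    // string.Concat(options, fragment) for this expression.
    public static string Append(string options, string fragment) => options + fragment;

    // True when appending to an empty string hands back the fragment itself,
    // i.e. no new string is created (assumed runtime behavior, see lead-in).
    public static bool EmptyPrefixReusesFragment(string fragment) =>
        object.ReferenceEquals(Append("", fragment), fragment);

    // True when appending to a non-empty string yields a distinct new instance.
    public static bool NonEmptyPrefixAllocates(string prefix, string fragment)
    {
        var result = Append(prefix, fragment);
        return !object.ReferenceEquals(result, prefix)
            && !object.ReferenceEquals(result, fragment);
    }
}
```

With this in mind, the first `+=` is free and every later one creates a string, which matches the "one fewer than the number of options" count above.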

Benchmarks
[HideColumns("Error", "StdDev", "RatioSD")]
public class RegexOptionsEnumToBsonRegularExpressionStringBenchmark
{
    [Benchmark(Baseline = true)]
    [Arguments(RegexOptions.None)]
    [Arguments(RegexOptions.IgnoreCase)]
    [Arguments(RegexOptions.IgnoreCase | RegexOptions.Multiline)]
    [Arguments(RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline)]
    [Arguments(RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace)]
    public string WithStringConcatenation(RegexOptions regexOptions)
    {
        var options = "";
        if ((regexOptions & RegexOptions.IgnoreCase) != 0)
        {
            options += "i";
        }
        if ((regexOptions & RegexOptions.Multiline) != 0)
        {
            options += "m";
        }
        if ((regexOptions & RegexOptions.Singleline) != 0)
        {
            options += "s";
        }
        if ((regexOptions & RegexOptions.IgnorePatternWhitespace) != 0)
        {
            options += "x";
        }

        return options;
    }

    [Benchmark]
    [Arguments(RegexOptions.None)]
    [Arguments(RegexOptions.IgnoreCase)]
    [Arguments(RegexOptions.IgnoreCase | RegexOptions.Multiline)]
    [Arguments(RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline)]
    [Arguments(RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace)]
    public string WithoutStringConcatenation(RegexOptions regexOptions)
    {
        if (regexOptions == RegexOptions.None)
        {
            return string.Empty;
        }
        var options = new char[4];
        var o = 0;
        if ((regexOptions & RegexOptions.IgnoreCase) != 0)
        {
            options[o++] = 'i';
        }
        if ((regexOptions & RegexOptions.Multiline) != 0)
        {
            options[o++] = 'm';
        }
        if ((regexOptions & RegexOptions.Singleline) != 0)
        {
            options[o++] = 's';
        }
        if ((regexOptions & RegexOptions.IgnorePatternWhitespace) != 0)
        {
            options[o++] = 'x';
        }

        return new string(options, 0, o);
    }

    [Benchmark]
    [Arguments(RegexOptions.None)]
    [Arguments(RegexOptions.IgnoreCase)]
    [Arguments(RegexOptions.IgnoreCase | RegexOptions.Multiline)]
    [Arguments(RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline)]
    [Arguments(RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace)]
    public string PreComputed(RegexOptions regexOptions)
    {
        switch (regexOptions & (RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace))
        {
            case RegexOptions.None:
                return string.Empty;
            case RegexOptions.IgnoreCase:
                return "i";
            case RegexOptions.Multiline:
                return "m";
            case RegexOptions.Singleline:
                return "s";
            case RegexOptions.IgnorePatternWhitespace:
                return "x";
            case RegexOptions.IgnoreCase | RegexOptions.Multiline:
                return "im";
            case RegexOptions.IgnoreCase | RegexOptions.Singleline:
                return "is";
            case RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace:
                return "ix";
            case RegexOptions.Multiline | RegexOptions.Singleline:
                return "ms";
            case RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace:
                return "mx";
            case RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace:
                return "sx";
            case RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline:
                return "ims";
            case RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace:
                return "imx";
            case RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace:
                return "isx";
            case RegexOptions.Multiline | RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace:
                return "msx";
            case RegexOptions.IgnoreCase | RegexOptions.Multiline | RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace:
                return "imsx";
            default:
                return string.Empty;
        }
    }
}

BenchmarkDotNet v0.13.8, Windows 11 (10.0.26100.2152)
13th Gen Intel Core i9-13900K, 1 CPU, 32 logical and 24 physical cores
.NET SDK 9.0.100-rc.2.24474.11
[Host] : .NET 8.0.10 (8.0.1024.46610), X64 RyuJIT AVX2

| Method | regexOptions | Mean | Ratio | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---|---:|---:|---:|---:|---:|---:|
| WithStringConcatenation | None | 0.1918 ns | 1.00 | - | - | - | NA |
| WithoutStringConcatenation | None | 0.0331 ns | 0.17 | - | - | - | NA |
| PreComputed | None | 0.0155 ns | 0.08 | - | - | - | NA |
| WithStringConcatenation | IgnoreCase | 2.1721 ns | 1.00 | - | - | - | NA |
| WithoutStringConcatenation | IgnoreCase | 9.8270 ns | 4.53 | 0.0030 | 0.0000 | 56 B | NA |
| PreComputed | IgnoreCase | 1.1709 ns | 0.54 | - | - | - | NA |
| WithStringConcatenation | IgnoreCase,Multiline | 9.0000 ns | 1.00 | 0.0017 | 0.0000 | 32 B | 1.00 |
| WithoutStringConcatenation | IgnoreCase,Multiline | 9.6087 ns | 1.07 | 0.0034 | 0.0000 | 64 B | 2.00 |
| PreComputed | IgnoreCase,Multiline | 0.9576 ns | 0.11 | - | - | - | 0.00 |
| WithStringConcatenation | IgnoreCase,Multiline,Singleline | 15.3435 ns | 1.00 | 0.0034 | 0.0000 | 64 B | 1.00 |
| WithoutStringConcatenation | IgnoreCase,Multiline,Singleline | 9.3241 ns | 0.59 | 0.0034 | 0.0000 | 64 B | 1.00 |
| PreComputed | IgnoreCase,Multiline,Singleline | 0.9865 ns | 0.06 | - | - | - | 0.00 |
| WithStringConcatenation | IgnoreCase,Multiline,Singleline,IgnorePatternWhitespace | 20.9242 ns | 1.00 | 0.0051 | 0.0000 | 96 B | 1.00 |
| WithoutStringConcatenation | IgnoreCase,Multiline,Singleline,IgnorePatternWhitespace | 9.0079 ns | 0.43 | 0.0034 | 0.0000 | 64 B | 0.67 |
| PreComputed | IgnoreCase,Multiline,Singleline,IgnorePatternWhitespace | 0.9218 ns | 0.04 | - | - | - | 0.00 |

@paulomorgado paulomorgado requested a review from a team as a code owner October 18, 2024 17:59
@paulomorgado paulomorgado requested review from adelinowona and removed request for a team October 18, 2024 17:59
Contributor

@adelinowona adelinowona left a comment

@paulomorgado Thank you for this creative optimization PR! While the switch-based lookup achieves impressive performance improvements (from about 20 ns to 0.9 ns), we have concerns about its long-term maintainability. After reviewing your implementation, we're proposing an alternative that keeps the performance benefits while being easier to maintain.

Here's the proposed solution:

private static readonly string[] RegexOptionStrings = new[]
{
    "",    // 0000
    "i",   // 0001
    "m",   // 0010
    "im",  // 0011
    "s",   // 0100
    "is",  // 0101
    "ms",  // 0110
    "ims", // 0111
    "x",   // 1000
    "ix",  // 1001
    "mx",  // 1010
    "imx", // 1011
    "sx",  // 1100
    "isx", // 1101
    "msx", // 1110
    "imsx" // 1111
};

public string GetRegexOptions(RegexOptions options)
{
    int index = ((options & RegexOptions.IgnoreCase) != 0 ? 1 : 0) |
                ((options & RegexOptions.Multiline) != 0 ? 1 : 0) << 1 |
                ((options & RegexOptions.Singleline) != 0 ? 1 : 0) << 2 |
                ((options & RegexOptions.IgnorePatternWhitespace) != 0 ? 1 : 0) << 3;
    
    return RegexOptionStrings[index];
}

This approach:

  • Should maintain the near-zero performance overhead of your original PR
  • Provides much easier maintainability
  • Uses a clear, readable bit manipulation strategy

Would you be willing to adapt your PR to this implementation?
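
A minimal sketch of why the index is assembled one flag at a time, and a check that the lookup agrees with the original concatenation. In `System.Text.RegularExpressions.RegexOptions`, `IgnoreCase` and `Multiline` have the values 1 and 2, but `Singleline` and `IgnorePatternWhitespace` are 16 and 32, so the raw enum value cannot index a 16-entry array directly. The class name and the `Lookup`, `Concatenate`, and `AllCombinationsAgree` helpers are hypothetical and not part of the PR.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexOptionsLookupSketch
{
    // Entry n holds the letters for the bits set in n (i = bit 0 ... x = bit 3).
    private static readonly string[] RegexOptionStrings =
    {
        "", "i", "m", "im", "s", "is", "ms", "ims",
        "x", "ix", "mx", "imx", "sx", "isx", "msx", "imsx"
    };

    // Table lookup as proposed in the review comment above.
    public static string Lookup(RegexOptions options)
    {
        int index = ((options & RegexOptions.IgnoreCase) != 0 ? 1 : 0) |
                    ((options & RegexOptions.Multiline) != 0 ? 1 : 0) << 1 |
                    ((options & RegexOptions.Singleline) != 0 ? 1 : 0) << 2 |
                    ((options & RegexOptions.IgnorePatternWhitespace) != 0 ? 1 : 0) << 3;
        return RegexOptionStrings[index];
    }

    // Original concatenation logic, used here as the reference result.
    public static string Concatenate(RegexOptions options)
    {
        var result = "";
        if ((options & RegexOptions.IgnoreCase) != 0) result += "i";
        if ((options & RegexOptions.Multiline) != 0) result += "m";
        if ((options & RegexOptions.Singleline) != 0) result += "s";
        if ((options & RegexOptions.IgnorePatternWhitespace) != 0) result += "x";
        return result;
    }

    // Compares both methods for every value 0..63, which covers all
    // combinations of the four mapped flags plus ExplicitCapture (4)
    // and Compiled (8), two flags that must be ignored.
    public static bool AllCombinationsAgree()
    {
        for (int bits = 0; bits < 64; bits++)
        {
            var options = (RegexOptions)bits;
            if (Lookup(options) != Concatenate(options)) return false;
        }
        return true;
    }
}
```

Because each flag is tested individually, unrelated flags such as `RegexOptions.Compiled` never affect the index, so no masking step is needed before the lookup.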

@paulomorgado
Author

@adelinowona, I'll have a look into it as soon as I have the time.

Why do you find that what you are proposing provides much easier maintainability?

@adelinowona
Contributor

@paulomorgado To be fair, our proposed approach doesn't necessarily provide better functional maintainability, but it looks better visually. We would rather have that than the switch-based lookup. Visually, it transforms what could be a verbose switch or repeated conditionals into a concise, mathematical-looking operation.

@paulomorgado
Author

@adelinowona,

Sure. No problem.

Can you create a review with a suggestion?

@adelinowona
Contributor

@paulomorgado Isn't this comment requesting changes enough?

@paulomorgado
Author

No problem for me.

I'm just used to using the suggested changes in a review:

image

@paulomorgado paulomorgado force-pushed the performance/BsonRegularExpression/1 branch from 3a03958 to c22b878 Compare December 24, 2024 15:40
@paulomorgado
Author

Sorry for taking so long, @adelinowona. I was waiting for you to submit a suggestion through a review. 😄

Contributor

@adelinowona adelinowona left a comment

LGTM @paulomorgado, thanks again for your contribution! Happy New Year!
I'll merge your PR after some CI checks.

2 participants