Improve WebUtility.HtmlEncode / UrlEncode performance #103737

Merged
stephentoub merged 2 commits into dotnet:main from the webutilperf branch on Jun 21, 2024


stephentoub (Member)

  • For HtmlEncode, vectorize IndexOfHtmlEncodingChars. Using SearchValues, we can efficiently search for the first ASCII char that needs encoding or the first non-ASCII char, and only then fall back to a scalar loop.
  • For HtmlEncode, reduce branching by using a more efficient check to determine whether an ASCII character needs to be encoded.
  • For UrlEncode, rather than UTF-8-encoding into a new byte[], %-encoding in place within that array, and then creating a string from it, we can use string.Create and do all of the encoding in the new string's buffer.
  • For UrlEncode, use SearchValues to vectorize the search for the first non-safe char. Also move the check for ' ' inside the if block for non-safe chars.
  • For UrlEncode, use SearchValues to optimize the check for whether an individual character is part of the safe set (via Contains).
  • Simplify IsUrlSafeChar. Rather than multiple checks, one of which is a bitmap, just use a bitmap.
  • Remove some leading IsNullOrEmpty checks. Null/empty inputs should be rare, and they're now handled implicitly by the subsequent loops.
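A minimal sketch of the first HtmlEncode idea, assuming .NET 8+ `SearchValues`: `IndexOfAnyExcept` skips the run of ASCII characters that never need HTML encoding and stops at the first entity character or the first non-ASCII character, where a scalar loop takes over. `IndexOfHtmlEncodingChars` here is a hypothetical stand-in, not the PR's code, and its scalar rules (the five entity characters, U+00A0 to U+00FF, and surrogates) are an assumption about what `WebUtility.HtmlEncode` encodes.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical set: every ASCII char except the five that WebUtility.HtmlEncode
// turns into entities (" & ' < >). Anything outside this set needs a closer look.
SearchValues<char> asciiNoEncode = SearchValues.Create(AsciiExcept("\"&'<>"));

static string AsciiExcept(string excluded)
{
    var sb = new StringBuilder();
    for (int c = 0; c < 128; c++)
    {
        if (excluded.IndexOf((char)c) < 0) sb.Append((char)c);
    }
    return sb.ToString();
}

// Returns the index of the first char that needs HTML encoding, or -1.
int IndexOfHtmlEncodingChars(ReadOnlySpan<char> input)
{
    // Vectorized: find the first entity char OR the first non-ASCII char.
    int i = input.IndexOfAnyExcept(asciiNoEncode);
    if (i < 0) return -1;

    // Scalar fallback from there, since non-ASCII chars only sometimes need encoding.
    for (; i < input.Length; i++)
    {
        char c = input[i];
        if (c is '"' or '&' or '\'' or '<' or '>') return i; // named/numeric entities
        if (c >= '\u00A0' && c <= '\u00FF') return i;        // encoded as &#NNN;
        if (char.IsSurrogate(c)) return i;                   // surrogate pairs become &#NNNNN;
    }
    return -1;
}
```

In this sketch, text that starts with non-ASCII characters (like the Hebrew benchmark input) drops into the scalar loop immediately, so the vectorized search adds a call without skipping anything; all-ASCII inputs that need no encoding never leave `IndexOfAnyExcept`.") == 0);
System.Diagnostics.Trace.Assert(IndexOfHtmlEncodingChars("a&b") == 1);
System.Diagnostics.Trace.Assert(IndexOfHtmlEncodingChars("caf\u00E9") == 3);
System.Diagnostics.Trace.Assert(IndexOfHtmlEncodingChars("\u05DC\u05D9\u05DC\u05D4 \u05D8\u05D5\u05D1") == -1);
System.Diagnostics.Trace.Assert(IndexOfHtmlEncodingChars("x\uD83D\uDE00") == 1);
</test>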
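The PR description does not say which cheaper ASCII check HtmlEncode now uses. One branch-light option, shown here as an assumption rather than the PR's code, relies on the fact that all five characters `WebUtility.HtmlEncode` turns into entities have code points below 64, so a single `ulong` can act as the lookup table. `HtmlEncodeMask` and `NeedsHtmlEncoding` are hypothetical names.

```csharp
using System;

// Hypothetical lookup table: bit N is set when ASCII char N must be HTML-encoded.
// All five such chars (" & ' < >) sit below 64, so one ulong is enough.
const ulong HtmlEncodeMask =
    (1UL << '"') | (1UL << '&') | (1UL << '\'') | (1UL << '<') | (1UL << '>');

// One comparison plus one bit test instead of a switch over five cases.
// The c < 64 guard is required: C# masks a ulong shift count to its low
// 6 bits, so without it 'b' (98) would test bit 34, which is '"'.
bool NeedsHtmlEncoding(char c) => c < 64 && ((HtmlEncodeMask >> c) & 1) != 0;

Console.WriteLine(NeedsHtmlEncoding('<')); // True
Console.WriteLine(NeedsHtmlEncoding('b')); // False
```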
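The three UrlEncode bullets and the IsNullOrEmpty bullet fit together in one sketch, assuming .NET 8+. `UrlEncodeSketch` and `urlSafe` are hypothetical names, and the safe set (ASCII letters, digits, `-_.!*()`) plus uppercase `%XX` output are assumptions about `WebUtility.UrlEncode`'s behavior. The `string.Create` callback UTF-8-encodes into the front of the new string's own buffer (viewed as bytes through `MemoryMarshal.AsBytes`) and then expands it back to front, so no intermediate `byte[]` is needed; the PR text does not say it uses this exact in-place trick, so treat it as one possible way to do all of the encoding in that buffer.

```csharp
using System;
using System.Buffers;
using System.Runtime.InteropServices;
using System.Text;

// Assumed URL-safe set (what WebUtility.UrlEncode leaves untouched).
SearchValues<char> urlSafe = SearchValues.Create(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.!*()");

string? UrlEncodeSketch(string? value)
{
    // No IsNullOrEmpty check: AsSpan() maps null to an empty span, and
    // IndexOfAnyExcept over an empty span returns -1.
    ReadOnlySpan<char> input = value.AsSpan();

    // Vectorized search for the first char that is not URL-safe.
    int i = input.IndexOfAnyExcept(urlSafe);
    if (i < 0) return value; // nothing to encode (also covers null and "")

    // Size the output. The ' ' test only runs for chars already known to be non-safe.
    int nonAsciiChars = 0, asciiToEscape = 0;
    for (; i < input.Length; i++)
    {
        char c = input[i];
        if (!urlSafe.Contains(c))
        {
            if (c == ' ') continue;               // becomes '+', same length
            if (char.IsAscii(c)) asciiToEscape++; // becomes %XX
            else nonAsciiChars++;                 // each of its UTF-8 bytes becomes %XX
        }
    }
    int byteCount = Encoding.UTF8.GetByteCount(input);
    int nonAsciiBytes = byteCount - (input.Length - nonAsciiChars);
    int resultLength = byteCount + 2 * (nonAsciiBytes + asciiToEscape);

    return string.Create(resultLength, (Text: value!, Safe: urlSafe), static (chars, state) =>
    {
        // UTF-8-encode into the front of the new string's own buffer, viewed as
        // bytes; it holds 2 * resultLength bytes, which is at least byteCount.
        Span<byte> bytes = MemoryMarshal.AsBytes(chars);
        int written = Encoding.UTF8.GetBytes(state.Text.AsSpan(), bytes);

        // Expand back to front. Every byte yields at least one char, so the output
        // for byte k starts at char index >= k (byte offset >= 2k) and never
        // overwrites a byte that has not been read yet.
        const string Hex = "0123456789ABCDEF";
        int j = chars.Length;
        for (int k = written - 1; k >= 0; k--)
        {
            byte b = bytes[k];
            if (b < 0x80 && state.Safe.Contains((char)b))
            {
                chars[--j] = (char)b;
            }
            else if (b == (byte)' ')
            {
                chars[--j] = '+';
            }
            else
            {
                chars[--j] = Hex[b & 0xF];
                chars[--j] = Hex[b >> 4];
                chars[--j] = '%';
            }
        }
    });
}
```

Passing `urlSafe` through the `TState` tuple lets the callback be a `static` lambda, so the only allocation on the encoding path is the result string itself.") == "%3Ctest%3Ehello%2C+world%3C%2Ftest%3E");
System.Diagnostics.Trace.Assert(UrlEncodeSketch("https://www.example.com") == "https%3A%2F%2Fwww.example.com");
System.Diagnostics.Trace.Assert(UrlEncodeSketch("caf\u00E9") == "caf%C3%A9");
System.Diagnostics.Trace.Assert(UrlEncodeSketch("a b") == "a+b");
System.Diagnostics.Trace.Assert(UrlEncodeSketch("   ") == "+++");
System.Diagnostics.Trace.Assert(UrlEncodeSketch("\uD83D\uDE00") == "%F0%9F%98%80");
string hebrew = "\u05DC\u05D9\u05DC\u05D4 \u05D8\u05D5\u05D1";
System.Diagnostics.Trace.Assert(UrlEncodeSketch(hebrew) == "%D7%9C%D7%99%D7%9C%D7%94+%D7%98%D7%95%D7%91");
</test>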
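A sketch of the IsUrlSafeChar simplification, assuming the safe set is ASCII letters, digits, and `-_.!*()` (the set `WebUtility.UrlEncode` leaves untouched). The PR's actual table layout is not shown, so splitting the 128-bit bitmap across two `ulong` values, and the names `low` and `high`, are assumptions made for illustration.

```csharp
using System;

// Hypothetical 128-bit bitmap (two ulongs) of the chars WebUtility.UrlEncode
// leaves unchanged: ASCII letters, digits, and - _ . ! * ( )
ulong low = 0, high = 0; // bit N of low = char N (0-63); bit N of high = char N+64
foreach (char c in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.!*()")
{
    if (c < 64) low |= 1UL << c;
    else high |= 1UL << (c - 64);
}

// A single bounds check and a single bit test, with no separate
// range comparisons for letters and digits.
bool IsUrlSafeChar(char ch) =>
    ch < 128 && (((ch < 64 ? low : high) >> (ch & 63)) & 1) != 0;
```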
```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Collections.Generic;
using System.Net;

BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);

[MemoryDiagnoser]
public class Tests
{
    [Benchmark]
    [ArgumentsSource(nameof(Inputs))]
    public string HtmlEncode(string input) => WebUtility.HtmlEncode(input);

    [Benchmark]
    [ArgumentsSource(nameof(Inputs))]
    public string UrlEncode(string input) => WebUtility.UrlEncode(input);

    // Mix of ASCII-only inputs that need no encoding, inputs with
    // characters to encode, and non-ASCII (Hebrew) text.
    public IEnumerable<object[]> Inputs() =>
    [
        ["this-is-a-very-long-filename-for-an-image-that-should-not-need-encoding.jpg"],
        ["short_name.txt"],
        ["https://www.example.com"],
        ["<test>hello, world</test>"],
        ["לילה טוב"],
        ["""
         How much wood could a woodchuck chuck
         If a woodchuck could chuck wood?
         A woodchuck would chuck as much wood
         As much wood as a woodchuck could chuck,
         If a woodchuck could chuck wood.
         """]
    ];
}
```
| Method | Toolchain | input | Mean | Ratio | Allocated | Alloc Ratio |
|---|---|---|---:|---:|---:|---:|
| HtmlEncode | \main\corerun.exe | `<test(...)test> [25]` | 81.576 ns | 1.00 | 96 B | 1.00 |
| HtmlEncode | \pr\corerun.exe | `<test(...)test> [25]` | 80.366 ns | 0.97 | 96 B | 1.00 |
| UrlEncode | \main\corerun.exe | `<test(...)test> [25]` | 118.515 ns | 1.00 | 160 B | 1.00 |
| UrlEncode | \pr\corerun.exe | `<test(...)test> [25]` | 103.039 ns | 0.87 | 96 B | 0.60 |
| HtmlEncode | \main\corerun.exe | `How (...)ood. [185]` | 164.359 ns | 1.00 | - | NA |
| HtmlEncode | \pr\corerun.exe | `How (...)ood. [185]` | 10.142 ns | 0.06 | - | NA |
| UrlEncode | \main\corerun.exe | `How (...)ood. [185]` | 537.945 ns | 1.00 | 664 B | 1.00 |
| UrlEncode | \pr\corerun.exe | `How (...)ood. [185]` | 466.832 ns | 0.86 | 432 B | 0.65 |
| HtmlEncode | \main\corerun.exe | `https(...)e.com [23]` | 21.442 ns | 1.00 | - | NA |
| HtmlEncode | \pr\corerun.exe | `https(...)e.com [23]` | 4.732 ns | 0.22 | - | NA |
| UrlEncode | \main\corerun.exe | `https(...)e.com [23]` | 103.053 ns | 1.00 | 136 B | 1.00 |
| UrlEncode | \pr\corerun.exe | `https(...)e.com [23]` | 76.013 ns | 0.73 | 80 B | 0.59 |
| HtmlEncode | \main\corerun.exe | `short_name.txt` | 13.068 ns | 1.00 | - | NA |
| HtmlEncode | \pr\corerun.exe | `short_name.txt` | 4.876 ns | 0.38 | - | NA |
| UrlEncode | \main\corerun.exe | `short_name.txt` | 14.324 ns | 1.00 | - | NA |
| UrlEncode | \pr\corerun.exe | `short_name.txt` | 3.600 ns | 0.26 | - | NA |
| HtmlEncode | \main\corerun.exe | `this-(...)g.jpg [75]` | 62.800 ns | 1.00 | - | NA |
| HtmlEncode | \pr\corerun.exe | `this-(...)g.jpg [75]` | 6.975 ns | 0.11 | - | NA |
| UrlEncode | \main\corerun.exe | `this-(...)g.jpg [75]` | 70.709 ns | 1.00 | - | NA |
| UrlEncode | \pr\corerun.exe | `this-(...)g.jpg [75]` | 5.611 ns | 0.08 | - | NA |
| HtmlEncode | \main\corerun.exe | `לילה טוב` | 8.436 ns | 1.00 | - | NA |
| HtmlEncode | \pr\corerun.exe | `לילה טוב` | 11.427 ns | 1.36 | - | NA |
| UrlEncode | \main\corerun.exe | `לילה טוב` | 100.852 ns | 1.00 | 184 B | 1.00 |
| UrlEncode | \pr\corerun.exe | `לילה טוב` | 74.544 ns | 0.74 | 112 B | 0.61 |

Contributor

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

MihaZupan (Member) left a comment:


Always love to see SearchValues ratios like these

| Method | Toolchain | input | Mean | Ratio | Allocated | Alloc Ratio |
|---|---|---|---:|---:|---:|---:|
| HtmlEncode | \main\corerun.exe | `How (...)ood. [185]` | 164.359 ns | 1.00 | - | NA |
| HtmlEncode | \pr\corerun.exe | `How (...)ood. [185]` | 10.142 ns | 0.06 | - | NA |

stephentoub merged commit b1bb871 into dotnet:main on Jun 21, 2024
142 of 146 checks passed
stephentoub deleted the webutilperf branch on June 21, 2024 at 13:57
rzikm pushed a commit to rzikm/dotnet-runtime that referenced this pull request on Jun 24, 2024
* Improve WebUtility.HtmlEncode / UrlEncode performance

* Update src/libraries/System.Private.CoreLib/src/System/Net/WebUtility.cs

Co-authored-by: Miha Zupan

---------

Co-authored-by: Miha Zupan
github-actions (bot) locked and limited conversation to collaborators on Jul 23, 2024

3 participants