-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend Equals/StartsWith auto-vectorization for OrdinalIgnoreCase #66095
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsThis PR adds
if the constant value contains at least one non-ASCII char (>127) - we give up on optimizing it. Codegen examplebool EqualsCoreLib(string s) =>
s.Equals("System.Private.CoreLib.dll", StringComparison.OrdinalIgnoreCase); New codegen: ; Method EqualsCoreLib(System.String):bool:this
G_M43376_IG01:
vzeroupper
G_M43376_IG02:
cmp dword ptr [rdx+8], 26
jne SHORT G_M43376_IG04
G_M43376_IG03:
vmovupd ymm0, ymmword ptr[rdx+12]
vpor ymm0, ymm0, ymmword ptr[reloc @RWD00] ;; ToLower
vpxor ymm0, ymm0, ymmword ptr[reloc @RWD32]
vmovupd ymm1, ymmword ptr[rdx+32]
vpor ymm1, ymm1, ymmword ptr[reloc @RWD64] ;; ToLower
vpxor ymm1, ymm1, ymmword ptr[reloc @RWD96]
vpor ymm0, ymm0, ymm1
vptest ymm0, ymm0
sete al
movzx rax, al
jmp SHORT G_M43376_IG05
G_M43376_IG04:
xor eax, eax
G_M43376_IG05:
movzx rax, al
G_M43376_IG06:
vzeroupper
ret
RWD00 dq 0020002000200020h, 0020000000200020h, 0020002000200020h, 0020000000200020h
RWD32 dq 0074007300790073h, 0070002E006D0065h, 0061007600690072h, 0063002E00650074h
RWD64 dq 0020002000200020h, 0020002000200000h, 0020002000200020h, 0020002000200000h
RWD96 dq 0065007400610076h, 0072006F0063002Eh, 00620069006C0065h, 006C006C0064002Eh
; Total bytes of code: 77 Benchmarkusing System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
public class Benchmarks
{
const StringComparison cmp = StringComparison.OrdinalIgnoreCase;
[Benchmark]
[Arguments("https://bing.com")]
public bool StartsWithBing(string s) => s.StartsWith("https://bing", cmp);
[Benchmark]
[Arguments("XMLFILE")]
public bool StartsWithXml(string s) => s.StartsWith("xml", cmp);
[Benchmark]
[Arguments("system.private.corelib.dll")]
public bool EqualsCoreLib(string s) => s.Equals("System.Private.CoreLib.dll", cmp );
}
10-20x faster for these cases. #65288 added tests which also cover
|
cc @dotnet/jit-contrib @AndyAyersMS |
|
||
// We can use e.g. UINT here to support SIMD for 32bit as well, | ||
// but it significantly complicates code, so 32bit support is left up-for-grabs | ||
assert(sizeof(ssize_t) == 8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use static_assert(...)
here, so the code would never be enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea, will change in a follow up PR
vec1 = gtNewSimdBinOpNode(GT_OR, simdType, vec1, toLowerVec1, baseType, simdSize, false); | ||
vec2 = gtNewSimdBinOpNode(GT_OR, simdType, vec2, toLowerVec2, baseType, simdSize, false); | ||
} | ||
|
||
// ((v1 ^ cns1) | (v2 ^ cns2)) == zero | ||
GenTree* xor1 = gtNewSimdBinOpNode(GT_XOR, simdType, vec1, cnsVec1, baseType, simdSize, false); | ||
GenTree* xor2 = gtNewSimdBinOpNode(GT_XOR, simdType, vec2, cnsVec2, baseType, simdSize, false); | ||
GenTree* orr = gtNewSimdBinOpNode(GT_OR, simdType, xor1, xor2, baseType, simdSize, false); | ||
return gtNewSimdHWIntrinsicNode(TYP_BOOL, useSingleVector ? xor1 : orr, zero, niEquals, baseType, simdSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens when useSingleVector
is true
. xor2
and orr
trees are just abandoned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I assume it's a rare case, basically, 6.25% probability 😄 I was not sure it was worth it to add more control flow here or DEBUG_DESTROY
This PR adds
OrdinalIgnoreCase
support to #65288 by injectingx | ToLowerMask
operation where ToLowerMask looks like this for e.g for"ab1-C"
constant value:if the constant value contains at least one non-ASCII char (>127) - we give up on optimizing it.
Codegen example
New codegen:
Benchmark
10-20x faster for these cases.
#65288 added tests which also cover
OrdinalIgnoreCase
but I might add more.jit-diff -f --crossgen
132 methods improved, I know that aspnetcore also uses it a lot.