-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Update xml char type #41756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update xml char type #41756
Conversation
|
I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label. |
nit: could you please have the numbers inside the comment in hex instead as the table is going with 16 elements in row? and if you can add a column header like // 0 1 2 3 ... A B C D E F just to make it easy to lookup any character Refers to: src/libraries/System.Private.Xml/src/System/Xml/XmlCharType.cs:233 in bfd86e5. [](commit_id = bfd86e5, deletion_comment = False) |
|
@Wraith2 thanks for making this change. LGTM. I assume you have converted the bin file directly to the span without changing any value. I looked at some random rows and I am not seeing any change. @krwq @buyaa-n maybe the next step here is to review https://www.w3.org/TR/xml/ and ensure this data we carry is conforming to the latest Xml specs. |
Yes. I wrote a small program to read the file 16 bytes at a time and output the bytes in code format. No space for human error. |
|
reformatted, example: private static ReadOnlySpan<byte> s_charProperties => new byte[]
{
// 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
/* 0000 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x11, 0x11, 0x00, 0x00, 0x11, 0x00, 0x00,
/* 0010 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,Trying to anything with the file that involves a format change will max the compiler for a while so be careful. |
Maybe later we can try to convert this table into a double lookup table (if not affecting the perf much) which may make the tables much smaller. This will need some time to do perf measurement and generate the optimized tables. |
@Wraith2 is that something you're interested in doing? Or maybe @tarekgh did not mean that should hold up this PR. |
This should not be a blocker for this PR. @Wraith2 is not changing anything from what we already have. my comment was about to review the data we have and ensure it is conforming to the latest Xml spec just to ensure we are up to date with the spec. |
Not really no... Reviewing 64K of byte values individually while the compiler struggles to parse the data sounds like a punishment for both my computer and I. 😁 |
| @@ -283,46 +228,4106 @@ private static bool InRange(int value, int start, int end) | |||
| return (uint)(value - start) <= (uint)(end - start); | |||
| } | |||
|
|
|||
| #if XMLCHARTYPE_GEN_RESOURCE | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you leave the generating code in DEBUG and perhaps add a Debug.Assert comparing this array with what is generated? (just make sure it runs only once) In case we ever need to change this for some reason this won't be some magic array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps might be better to just always generate this array, not sure if we prefer initialization time vs size. what do you think @tarekgh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The generation code doesn't work. I started trying to use it but it's leftover from when the author of the current format was working with an older version of the data so it isn't useful. I ended up dumping the contents of the binary file rather than trying to work from that snippet. If it had been useful I would have kept it but it isn't useful currently.
I asked in discord if people thought this was a good use for a source generator and the answer was that it's static data so it doesn't need generating each time. In either case static vs generated the time is going to be spent parsing it and that's unavoidable without going back to the resource load.
@Wraith2 you can review everyday just 1k and we'll be done in couple of months :-) just kidding. thanks again for getting this change. CC @jkotas @davidwrighton if they have any concern having 64k readonly span data like that. |
|
This is modifying perf critical XML parsing code. Do we have any XML parsing microbenchmark to check the impact? |
|
aside from the perf, I like the change because it allow checking any character properties by just looking at the array value. but in general, I agree, having perf numbers is better here. |
|
I looked at this, and I have no concerns about having a large There are two possibilities for reduced performance that I can see here.
Now, given our system, I don't expect this to be slower, I actually expect the lack of an extra memory read to make this all slightly faster than the previous behavior. |
|
Here is a small targeted test: It runs about ~900ms per iteration before this change and ~1200ms per iteration after this change. |
|
I was going to work on a test this morning but thanks for doing that. I don't know how to determine what the speed change is caused by to see if it can be mitigated. With those results I don't believe it would be an acceptable regression so either I close or look at flowing the instance/static changes mentioned above out to see if the ability to call and possible inline static methods recovers the perf. I'd like to be guided by knowledge on that one though so if you can give me some info on what tools and methods you'd use to diagnose the speed difference that would be good. |
You can use profiler such as perfview to get the idea what methods to look at. This type of change will require looking at the disassembly to identify the bottlenecks. If it is not your expertise, there are number of people with this expertise hanging around and they may be able to help you if you ask. I took a cursory look when I wrote the test and the first problem I have noticed was a lack of inlining for |
|
I've got a local build of runtime that includes the change but I can't get BDN to use it at the moment, it seems to be ignoring the --coreRun argument entirely. I used sharplab to have a look at the il and asm that is generated for the IsWhitespace call and it was surprising and given the length I'm not surprised it won't inline. If I can get to a position where I can replicate your bench I can try making it a ref struct and passing the span is as a ctor arg the way it was done previously which should cut down iteration cost. |
|
I managed to replicate the results. The ref struct idea won't work because multiple uses of it are made as class fields and changing those to be fetched each time will further increase overhead. The only alternative I can see is to somehow keep a reference to the ROS which will require unsafe shenanigans. |
|
Just commenting here what I replied with in the Discord .NET Evolution #runtime channel... It looks like the JIT currently fails to elide the bounds checks for public static unsafe bool IsWhiteSpace2(char c) {
return (Unsafe.AddByteOffset(ref MemoryMarshal.GetReference(s_charProperties), (nuint)c) & 0x01) == 0;
} L0000: movzx eax, cx
L0003: mov rdx, 0x2196faa09e8
L000d: test byte ptr [rax+rdx], 1
L0011: sete al
L0014: movzx eax, al
L0017: ret |
|
Thanks @tannergooding, I couldn't find the
|
extract unsafe code to single function and force inline for perf
Are these numbers for the test above? They look too good to be true to me. I am not able to replicate these results on my machine. |
|
Refactoring makes only a small difference. Benchmark: using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.IO;
using System.Xml;
namespace BenchmarkXmlChartype
{
class Program
{
static void Main(string[] args)
{
BenchmarkSwitcher.FromAssembly(typeof(Benchmarks).Assembly).Run(args);
}
}
public class Benchmarks
{
[Benchmark]
public int Simple()
{
var ms = new MemoryStream();
using (var w = XmlWriter.Create(ms))
{
w.WriteStartDocument();
w.WriteStartElement("OuterElement");
for (int i = 0; i < 10000; i++)
{
w.WriteStartElement("InnerElement");
w.WriteEndElement();
}
w.WriteEndElement();
w.WriteEndDocument();
}
int readCount = 0;
ms.Position = 0;
using (var r = XmlReader.Create(ms))
{
while (r.Read())
{
readCount += 1;
}
}
return readCount;
}
}
}Results: BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.450 (2004/?/20H1)
Intel Core i9-9900K CPU 3.60GHz (Coffee Lake), 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.100-preview.8.20362.3
[Host] : .NET Core 5.0.0 (CoreCLR 5.0.20.36102, CoreFX 5.0.20.36102), X64 RyuJIT
Job-YRJNPI : .NET Core 5.0 (CoreCLR 42.42.42.42424, CoreFX 42.42.42.42424), X64 RyuJIT
Toolchain=CoreRun
|
|
It looks good to me otherwise - as long as we convince ourselves that we are happy with the performance. |
|
@jkotas I'll give it a try on my machine later today. I am going to use the same test used by @Wraith2 he posted in #41756 (comment) |
|
I have tried with the existing benchmark we have. We don't have benchmarks exactly for reader/writer,
|
Ah, I have not looked carefully at this. You may want to look at the benchmark profile to see whether it is dominated by code paths affected by this change. The test that I have used measured just the |
|
We may write the perf test to split the read and the write and get actual numbers for both operations. Of course we'll need to validate the affected code path is used in both cases or not. |
|
I wrote a test to test the read and write operation separately and I am seeing improvement in the reading cases and around 2% regression in the writing. Base
Change
I confirmed both reading and writing scenario hitting the code path we changed. Here is examples of the call stacks from the ETW logs: both Here is test code I used: [Benchmark] public int TestWriter()
{
s_msWriter.Position = 0;
using (var w = XmlWriter.Create(s_msWriter))
{
w.WriteStartDocument();
w.WriteStartElement("OutterElement");
for (int i = 0; i < 1000; i++)
{
w.WriteStartElement("InnerElement");
w.WriteEndElement();
}
w.WriteEndElement();
w.WriteEndDocument();
}
return 10;
}
[Benchmark] public int TestReader()
{
s_msReader.Position = 0;
using (var r = XmlReader.Create(s_msReader))
{
while (r.Read())
{
}
}
return 20;
}
@jkotas I believe the changes is reasonable to accept from perf point of view too. what you think? |
Agree. |
|
@Wraith2 thanks a lot for doing this change and accommodating the feedback too. |
fixes #41572
This removes an embedded resource binary file along with the support code needed to load it safely and atomically. It is replaced with an enormous readonly span which is inlined into the data segment by the compiler. Doing this allows the struct to be safe, not need a ctor and for the static Instance method to be default(T), the entire thing could probably be static
The changes are split into two commits. The first has the code changes, review this. The second only contains the span which was added to the bottom of the file so the contents can be seen casually without forcing a scroll through almost 4000 lines of code. The span code was generated from the binary in the project, review it only if you like hex.
/cc @RUSshy