Update xml char type #41756

Wraith2 · 2020-09-02T20:41:06Z

This removes an embedded resource binary file along with the support code needed to load it safely and atomically. It is replaced with an enormous readonly span which is inlined into the data segment by the compiler. Doing this allows the struct to be safe, not need a ctor and for the static Instance method to be default(T), the entire thing could probably be static

The changes are split into two commits. The first has the code changes, review this. The second only contains the span which was added to the bottom of the file so the contents can be seen casually without forcing a scroll through almost 4000 lines of code. The span code was generated from the binary in the project, review it only if you like hex.

/cc @RUSshy

Dotnet-GitSync-Bot · 2020-09-02T20:41:10Z

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

ghost · 2020-09-02T20:55:47Z

Tagging subscribers to this area: @buyaa-n, @krwq
See info in area-owners.md if you want to be subscribed.

tarekgh · 2020-09-02T21:59:56Z

        /* 000 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x11, 0x11, 0x00, 0x00, 0x11, 0x00, 0x00,

nit: could you please have the numbers inside the comment in hex instead as the table is going with 16 elements in row? and if you can add a column header like

// 0 1 2 3 ... A B C D E F

just to make it easy to lookup any character

Refers to: src/libraries/System.Private.Xml/src/System/Xml/XmlCharType.cs:233 in bfd86e5. [](commit_id = bfd86e5, deletion_comment = False)

tarekgh · 2020-09-02T22:03:23Z

@Wraith2 thanks for making this change. LGTM. I assume you have converted the bin file directly to the span without changing any value. I looked at some random rows and I am not seeing any change.

@krwq @buyaa-n maybe the next step here is to review https://www.w3.org/TR/xml/ and ensure this data we carry is conforming to the latest Xml specs.

Wraith2 · 2020-09-02T22:52:28Z

I assume you have converted the bin file directly to the span without changing any value. I looked at some random rows and I am not seeing any change

Yes. I wrote a small program to read the file 16 bytes at a time and output the bytes in code format. No space for human error.

Wraith2 · 2020-09-02T23:17:40Z

reformatted, example:

        private static ReadOnlySpan<byte> s_charProperties => new byte[]
        {
            //            0,    1,    2,    3,    4,    5,    6,    7,    8,    9,    A,    B,    C,    D,    E,    F
            /* 0000 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x11, 0x11, 0x00, 0x00, 0x11, 0x00, 0x00,
            /* 0010 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,

Trying to anything with the file that involves a format change will max the compiler for a while so be careful.

tarekgh · 2020-09-02T23:27:01Z

Trying to anything with the file that involves a format change will max the compiler for a while so be careful.

Maybe later we can try to convert this table into a double lookup table (if not affecting the perf much) which may make the tables much smaller. This will need some time to do perf measurement and generate the optimized tables.
currently having a 64k table is really big and I admire the compiler it can handle it.

danmoseley · 2020-09-03T00:19:15Z

maybe the next step here is to review https://www.w3.org/TR/xml/ and ensure this data we carry is conforming to the latest Xml specs.

@Wraith2 is that something you're interested in doing? Or maybe @tarekgh did not mean that should hold up this PR.

tarekgh · 2020-09-03T00:40:30Z

Or maybe @tarekgh did not mean that should hold up this PR.

This should not be a blocker for this PR. @Wraith2 is not changing anything from what we already have. my comment was about to review the data we have and ensure it is conforming to the latest Xml spec just to ensure we are up to date with the spec.

Wraith2 · 2020-09-03T07:25:05Z

@Wraith2 is that something you're interested in doing?

Not really no... Reviewing 64K of byte values individually while the compiler struggles to parse the data sounds like a punishment for both my computer and I. 😁

krwq · 2020-09-03T10:13:24Z

src/libraries/System.Private.Xml/src/System/Xml/XmlCharType.cs

@@ -283,46 +228,4106 @@ private static bool InRange(int value, int start, int end)
            return (uint)(value - start) <= (uint)(end - start);
        }

-#if XMLCHARTYPE_GEN_RESOURCE


can you leave the generating code in DEBUG and perhaps add a Debug.Assert comparing this array with what is generated? (just make sure it runs only once) In case we ever need to change this for some reason this won't be some magic array

perhaps might be better to just always generate this array, not sure if we prefer initialization time vs size. what do you think @tarekgh?

The generation code doesn't work. I started trying to use it but it's leftover from when the author of the current format was working with an older version of the data so it isn't useful. I ended up dumping the contents of the binary file rather than trying to work from that snippet. If it had been useful I would have kept it but it isn't useful currently.

I asked in discord if people thought this was a good use for a source generator and the answer was that it's static data so it doesn't need generating each time. In either case static vs generated the time is going to be spent parsing it and that's unavoidable without going back to the resource load.

tarekgh · 2020-09-03T17:16:55Z

Not really no... Reviewing 64K of byte values individually while the compiler struggles to parse the data sounds like a punishment for both my computer and I. 😁

@Wraith2 you can review everyday just 1k and we'll be done in couple of months :-) just kidding. thanks again for getting this change.

CC @jkotas @davidwrighton if they have any concern having 64k readonly span data like that.

src/libraries/System.Private.Xml/src/System/Xml/XmlCharType.cs

jkotas · 2020-09-03T17:49:29Z

This is modifying perf critical XML parsing code. Do we have any XML parsing microbenchmark to check the impact?

tarekgh · 2020-09-03T18:02:10Z

aside from the perf, I like the change because it allow checking any character properties by just looking at the array value. but in general, I agree, having perf numbers is better here.

davidwrighton · 2020-09-04T18:05:45Z

I looked at this, and I have no concerns about having a large ReadOnlySpan<byte> used here. However, I do think we should measure the performance of this, as this might be slower than the previous implementation. I doubt it, but I've been unpleasantly surprised before.

There are two possibilities for reduced performance that I can see here.

Accessing the span may be injecting a bounds check. In this case bounds check isn't actually needed, as the data is sufficiently large, but we might be missing that optimization in this case.
Accessing the span as embedded data requires embedding a pointer into the code stream of the jit. Reading that pointer from the code stream may be slower than reading a field from the struct. I don't expect that this is true, but on some platforms (notably arm64) it requires 2 instructions to embed a pointer where the field reading logic is a single instruction, and on intel architecture platforms, the field reading instruction is slightly bigger. OTOH, reading a field is an extra memory load, which is generally considered to be more expensive to run than a typical instruction.

Now, given our system, I don't expect this to be slower, I actually expect the lack of an extra memory read to make this all slightly faster than the previous behavior.

jkotas · 2020-09-06T06:36:30Z

Here is a small targeted test:

using System;
using System.IO;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        var ms = new MemoryStream();

        using (var w = XmlWriter.Create(ms))
        {
            w.WriteStartDocument();
            w.WriteStartElement("OutterElement");
            for (int i = 0; i < 10000; i++)
            {
                w.WriteStartElement("InnerElement");
                w.WriteEndElement();
            }
            w.WriteEndElement();
            w.WriteEndDocument();
        }

        for (; ; )
        {
            int start = Environment.TickCount;
            for (int i = 0; i < 1000; i++)
            {
                ms.Position = 0;
                using (var r = XmlReader.Create(ms))
                {
                    while (r.Read())
                    {
                    }
                }
            }
            int end = Environment.TickCount;
            Console.WriteLine(end - start);
        }
    }
}

It runs about ~900ms per iteration before this change and ~1200ms per iteration after this change.

Wraith2 · 2020-09-06T08:29:03Z

I was going to work on a test this morning but thanks for doing that.

I don't know how to determine what the speed change is caused by to see if it can be mitigated. With those results I don't believe it would be an acceptable regression so either I close or look at flowing the instance/static changes mentioned above out to see if the ability to call and possible inline static methods recovers the perf. I'd like to be guided by knowledge on that one though so if you can give me some info on what tools and methods you'd use to diagnose the speed difference that would be good.

jkotas · 2020-09-06T15:11:27Z

I don't know how to determine what the speed change is caused by

You can use profiler such as perfview to get the idea what methods to look at. This type of change will require looking at the disassembly to identify the bottlenecks. If it is not your expertise, there are number of people with this expertise hanging around and they may be able to help you if you ask.

I took a cursory look when I wrote the test and the first problem I have noticed was a lack of inlining for XmlCharType.IsNCNameSingleChar call from XmlTextReaderImpl.ParseElement, so part of the fix is likely going to be to add AggressiveInlining on the XmlCharType accessors. It will fix some of the regression, but I suspect that it won't be enough.

Wraith2 · 2020-09-06T17:36:37Z

I've got a local build of runtime that includes the change but I can't get BDN to use it at the moment, it seems to be ignoring the --coreRun argument entirely.

I used sharplab to have a look at the il and asm that is generated for the IsWhitespace call and it was surprising and given the length I'm not surprised it won't inline. If I can get to a position where I can replicate your bench I can try making it a ref struct and passing the span is as a ctor arg the way it was done previously which should cut down iteration cost.

Wraith2 · 2020-09-07T09:21:29Z

I managed to replicate the results. The ref struct idea won't work because multiple uses of it are made as class fields and changing those to be fetched each time will further increase overhead. The only alternative I can see is to somehow keep a reference to the ROS which will require unsafe shenanigans.

tannergooding · 2020-09-07T19:09:59Z

Just commenting here what I replied with in the Discord .NET Evolution #runtime channel...

It looks like the JIT currently fails to elide the bounds checks for SomeSpan[ushort] or SomeSpan[char] when the length of the span is known to be >= ushort.MaxValue (as is the case here).
Utilizing Unsafe.AddByteOffset(ref MemoryMarshal.GetReference(ROSpan), nuint) will elide the bounds check and gives codegen similar to the following:

public static unsafe bool IsWhiteSpace2(char c) {
    return (Unsafe.AddByteOffset(ref MemoryMarshal.GetReference(s_charProperties), (nuint)c) & 0x01) == 0;
}

    L0000: movzx eax, cx
    L0003: mov rdx, 0x2196faa09e8
    L000d: test byte ptr [rax+rdx], 1
    L0011: sete al
    L0014: movzx eax, al
    L0017: ret

Wraith2 · 2020-09-07T21:30:31Z

Thanks @tannergooding, I couldn't find the AddByteOffset method overload with nuint so I went with the original suggestion of Unsafe.Add(ref MemoryMarshal.GetReference(CharProperties), ch). I also added readonly to the various instance methods and marked them with aggressive inline. The results of that are:

Method	Mean	Error	StdDev
Master	1.406 ms	0.0052 ms	0.0048 ms
Branch	1.192 ms	0.0038 ms	0.0034 ms

@krwq and @tarekgh can you re-review please?

src/libraries/System.Private.Xml/src/System/Xml/XmlCharType.cs

extract unsafe code to single function and force inline for perf

jkotas · 2020-09-08T09:09:17Z

Method	Mean	Error	StdDev
Master	1.406 ms	0.0052 ms	0.0048 ms
Branch	1.192 ms	0.0038 ms	0.0034 ms

Are these numbers for the test above? They look too good to be true to me. I am not able to replicate these results on my machine.

Wraith2 · 2020-09-08T09:25:57Z

Refactoring makes only a small difference.

Benchmark:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.IO;
using System.Xml;

namespace BenchmarkXmlChartype
{
    class Program
    {
        static void Main(string[] args)
        {
            BenchmarkSwitcher.FromAssembly(typeof(Benchmarks).Assembly).Run(args);
        }
	}

    public class Benchmarks
    {
        [Benchmark]
        public int Simple()
        {
            var ms = new MemoryStream();

            using (var w = XmlWriter.Create(ms))
            {
                w.WriteStartDocument();
                w.WriteStartElement("OuterElement");
                for (int i = 0; i < 10000; i++)
                {
                    w.WriteStartElement("InnerElement");
                    w.WriteEndElement();
                }
                w.WriteEndElement();
                w.WriteEndDocument();
            }

            int readCount = 0;

            ms.Position = 0;
            using (var r = XmlReader.Create(ms))
            {
                while (r.Read())
                {
                    readCount += 1;
                }
            }
            return readCount;
        }
    }
}

Results:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.450 (2004/?/20H1)
Intel Core i9-9900K CPU 3.60GHz (Coffee Lake), 1 CPU, 16 logical and 8 physical cores
.NET Core SDK=5.0.100-preview.8.20362.3
  [Host]     : .NET Core 5.0.0 (CoreCLR 5.0.20.36102, CoreFX 5.0.20.36102), X64 RyuJIT
  Job-YRJNPI : .NET Core 5.0 (CoreCLR 42.42.42.42424, CoreFX 42.42.42.42424), X64 RyuJIT

Toolchain=CoreRun

Method	Mean	Error	StdDev
Simple	1.198 ms	0.0085 ms	0.0079 ms

jkotas · 2020-09-08T17:42:59Z

@tarekgh @krwq @buyaa-n Could you please validate the performance impact of this change?

I see about 5% regression on my machine. It would be useful to get numbers for different environments.

jkotas · 2020-09-08T17:44:15Z

It looks good to me otherwise - as long as we convince ourselves that we are happy with the performance.

tarekgh · 2020-09-08T17:46:06Z

@jkotas I'll give it a try on my machine later today. I am going to use the same test used by @Wraith2 he posted in #41756 (comment)

buyaa-n · 2020-09-08T17:52:09Z

I have tried with the existing benchmark we have. We don't have benchmarks exactly for reader/writer, XmlDocument.LoadXml could show the reader perf some level. But I don't understand why XmlNode.GetAttributes having so high regression. I might add some benchmarks for reader/writer and check the result.

Slower	diff/base	Base Median (ns)	Diff Median (ns)
XmlDocumentTests.XmlNodeTests.Perf_XmlNode.GetAttributes	1.66	0.64	1.07
XmlDocumentTests.XmlDocumentTests.Perf_XmlDocument.LoadXml	1.07	2997.90	3203.79
XmlDocumentTests.XmlNodeListTests.Perf_XmlNodeList.GetCount	1.07	10.22	10.92

jkotas · 2020-09-08T17:54:41Z

I am going to use the same test used by @Wraith2 he posted in #41756

Ah, I have not looked carefully at this. You may want to look at the benchmark profile to see whether it is dominated by code paths affected by this change. The test that I have used measured just the Read loop and I have verified that the Read loop is dominated by paths affected by this change.

tarekgh · 2020-09-08T18:05:33Z

We may write the perf test to split the read and the write and get actual numbers for both operations. Of course we'll need to validate the affected code path is used in both cases or not.

tarekgh · 2020-09-09T00:26:21Z

I wrote a test to test the read and write operation separately and I am seeing improvement in the reading cases and around 2% regression in the writing.

Base

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
TestWriter	99,978.386 ns	2,052.2271 ns	5,855.1195 ns	1.7090	-	-	7344 B
TestReader	105,172.380 ns	3,134.4095 ns	9,043.4880 ns	3.5400	-	-	15232 B

Change

Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
TestWriter	101,930.709 ns	2,010.0744 ns	4,618.4802 ns	1.7090	-	-	7328 B
TestReader	102,153.794 ns	2,040.5844 ns	4,928.2406 ns	3.5400	-	-	15224 B

I confirmed both reading and writing scenario hitting the code path we changed. Here is examples of the call stacks from the ETW logs:

// Writing stack

 system.private.xml.il!XmlWellFormedWriter.CheckNCName
+ system.private.xml.il!XmlWellFormedWriter.WriteStartElement
 + baseline!Baseline.Program.TestWriter()
  + 669d4deb-b062-4341-8b68-dbdb19ad7888!Runnable_0.WorkloadActionUnroll
   + benchmarkdotnet!Engine.RunIteration
   + benchmarkdotnet!Engine.GetExtraStats


// Reading stack
system.private.xml.il!System.Xml.XmlTextReaderImpl.ParseElement()
+ system.private.xml.il!XmlTextReaderImpl.ParseElementContent
\|+ baseline!Baseline.Program.TestReader()
\|+ system.private.xml.il!System.Xml.XmlTextReaderImpl.Read()
+ system.private.xml.il!XmlTextReaderImpl.ParseDocumentContent
+ baseline!Baseline.Program.TestReader()
+ system.private.xml.il!System.Xml.XmlTextReaderImpl.Read()

both CheckNCName and ParseElement calls XmlCharType we are changing here.

Here is test code I used:

        [Benchmark] public int TestWriter()
        {
            s_msWriter.Position = 0;
            using (var w = XmlWriter.Create(s_msWriter))
            {
                w.WriteStartDocument();
                w.WriteStartElement("OutterElement");
                for (int i = 0; i < 1000; i++)
                {
                    w.WriteStartElement("InnerElement");
                    w.WriteEndElement();
                }
                w.WriteEndElement();
                w.WriteEndDocument();
            }
            return 10;
        }

        [Benchmark] public int TestReader()
        {
            s_msReader.Position = 0;
            using (var r = XmlReader.Create(s_msReader))
            {
                while (r.Read())
                {
                }
            }
            return 20;
        }

s_msWriter and s_msReader are pre-initialized streams. The reader memory is also initialized with the Xml contents (similar to what we wrote in the writer case).

@jkotas I believe the changes is reasonable to accept from perf point of view too. what you think?

jkotas · 2020-09-09T00:42:58Z

@jkotas I believe the changes is reasonable to accept from perf point of view too. what you think?

Agree.

tarekgh · 2020-09-09T00:53:36Z

@Wraith2 thanks a lot for doing this change and accommodating the feedback too.

Wraith2 added 2 commits September 2, 2020 21:06

remove binary, remove resource load and init code

7859432

add massive readonly span

bfd86e5

danmoseley requested review from krwq and tarekgh September 2, 2020 20:55

danmoseley added the area-System.Xml label Sep 2, 2020

tarekgh requested a review from buyaa-n September 2, 2020 20:59

tarekgh approved these changes Sep 2, 2020

View reviewed changes

reformat span with header and hex row numbers

d4ad145

tarekgh approved these changes Sep 2, 2020

View reviewed changes

krwq reviewed Sep 3, 2020

View reviewed changes

krwq approved these changes Sep 3, 2020

View reviewed changes

jkotas reviewed Sep 3, 2020

View reviewed changes

src/libraries/System.Private.Xml/src/System/Xml/XmlCharType.cs Outdated Show resolved Hide resolved

jkotas reviewed Sep 3, 2020

View reviewed changes

src/libraries/System.Private.Xml/src/System/Xml/XmlCharType.cs Outdated Show resolved Hide resolved

rename to follow property name convention

513da05

Merge remote-tracking branch 'dotnet/master' into update-XmlCharType

777cee0

use unsafe to access ROS and add aggressive inlining.

d6b1288

jkotas reviewed Sep 7, 2020

View reviewed changes

src/libraries/System.Private.Xml/src/System/Xml/XmlCharType.cs Outdated Show resolved Hide resolved

jkotas reviewed Sep 7, 2020

View reviewed changes

src/libraries/System.Private.Xml/src/System/Xml/XmlCharType.cs Outdated Show resolved Hide resolved

factor out instances of XmlCharType

1df6248

extract unsafe code to single function and force inline for perf

tarekgh merged commit f1f7681 into dotnet:master Sep 9, 2020

Wraith2 deleted the update-XmlCharType branch September 9, 2020 08:13

ghost locked as resolved and limited conversation to collaborators Dec 7, 2020

Update xml char type #41756

Update xml char type #41756

Uh oh!

Conversation

Wraith2 commented Sep 2, 2020

Uh oh!

Dotnet-GitSync-Bot commented Sep 2, 2020

Uh oh!

ghost commented Sep 2, 2020

Uh oh!

tarekgh commented Sep 2, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tarekgh commented Sep 2, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Wraith2 commented Sep 2, 2020

Uh oh!

Wraith2 commented Sep 2, 2020

Uh oh!

tarekgh commented Sep 2, 2020

Uh oh!

danmoseley commented Sep 3, 2020

Uh oh!

tarekgh commented Sep 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Wraith2 commented Sep 3, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

krwq Sep 3, 2020

Choose a reason for hiding this comment

Uh oh!

krwq Sep 3, 2020

Choose a reason for hiding this comment

Uh oh!

Wraith2 Sep 3, 2020

Choose a reason for hiding this comment

Uh oh!

tarekgh commented Sep 3, 2020

Uh oh!

Uh oh!

Uh oh!

jkotas commented Sep 3, 2020

Uh oh!

tarekgh commented Sep 3, 2020

Uh oh!

davidwrighton commented Sep 4, 2020

Uh oh!

jkotas commented Sep 6, 2020

Uh oh!

Wraith2 commented Sep 6, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jkotas commented Sep 6, 2020

Uh oh!

Wraith2 commented Sep 6, 2020

Uh oh!

Wraith2 commented Sep 7, 2020

Uh oh!

tannergooding commented Sep 7, 2020

Uh oh!

Wraith2 commented Sep 7, 2020

Uh oh!

Uh oh!

Uh oh!

jkotas commented Sep 8, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Wraith2 commented Sep 8, 2020

Uh oh!

jkotas commented Sep 8, 2020

Uh oh!

jkotas commented Sep 8, 2020

Uh oh!

tarekgh commented Sep 8, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

buyaa-n commented Sep 8, 2020

Uh oh!

tarekgh commented Sep 2, 2020 •

edited

Loading

tarekgh commented Sep 2, 2020 •

edited

Loading

tarekgh commented Sep 3, 2020 •

edited

Loading

Wraith2 commented Sep 3, 2020 •

edited

Loading

Wraith2 commented Sep 6, 2020 •

edited

Loading

jkotas commented Sep 8, 2020 •

edited

Loading

tarekgh commented Sep 8, 2020 •

edited

Loading

tarekgh commented Sep 9, 2020 •

edited

Loading