Extend mstat format with information that maps to dependency nodes #83578

MichalStrehovsky · 2023-03-17T09:20:51Z

This will allow cross-referencing MSTAT with DGML files. The ultimate goal is to allow folding information from DGML into MSTAT (so that we don't need two files) but... baby steps.

This is a breaking change to the file format. I don't anticipate another breaking change after this; the DGML folding would be backwards compatible. I considered making this change non-breaking, but I think it's better to just bite the bullet now. To fix existing readers, just skip over the new entry, as in this diff: https://gist.github.com/MichalStrehovsky/2c7cb3d623c7f8901541914dab04238d/revisions

This adds an extra instruction to each entry of the Methods and Types streams, encoding the name of the entry's node.
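The thread does not spell out the record layout, so the following is only a minimal C# sketch under stated assumptions: each entry is a fixed run of `ldtoken`/`ldc.i4` instructions (each an opcode byte followed by a 4-byte little-endian operand, per ECMA-335), and version 2 appends one more `ldc.i4` per entry. Under those assumptions, a reader that only understands the older fields can skip the trailing instruction, which is the "skip over the new entry" fix described earlier. `MstatRecordReader`, `ReadRecords`, and the field counts are hypothetical, not ILC or perf-repo APIs.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Hypothetical sketch: decodes a method body made of fixed-size IL
// instructions (opcode byte + 4-byte little-endian operand) into records.
// The per-record instruction counts are assumptions, not the documented format.
static class MstatRecordReader
{
    const byte LdToken = 0xD0; // ECMA-335: ldtoken <token>
    const byte LdcI4 = 0x20;   // ECMA-335: ldc.i4 <int32>
    const int InstructionSize = 5;

    // fieldsPerRecord: instructions per record actually present in the file.
    // fieldsRead: how many leading instructions this reader understands.
    // Extra trailing instructions (e.g. an appended name index) are skipped.
    public static List<int[]> ReadRecords(byte[] il, int fieldsPerRecord, int fieldsRead)
    {
        if (fieldsRead > fieldsPerRecord)
            throw new ArgumentException("Cannot read more fields than a record contains.");

        var records = new List<int[]>();
        int recordSize = fieldsPerRecord * InstructionSize;
        int pos = 0;

        // Stop when a full record no longer fits (e.g. only a trailing 'ret' is left).
        while (pos + recordSize <= il.Length)
        {
            var record = new int[fieldsRead];
            for (int i = 0; i < fieldsPerRecord; i++)
            {
                byte opcode = il[pos];
                if (opcode != LdToken && opcode != LdcI4)
                    throw new FormatException($"Unexpected opcode 0x{opcode:X2} at offset {pos}.");

                int operand = BinaryPrimitives.ReadInt32LittleEndian(il.AsSpan(pos + 1, 4));
                if (i < fieldsRead)
                    record[i] = operand; // known field: keep it
                pos += InstructionSize;  // unknown field: skip it
            }
            records.Add(record);
        }
        return records;
    }
}
```

A reader could pick `fieldsPerRecord` from the assembly's major version (2 or later meaning one extra instruction, under the assumption above) while leaving `fieldsRead` at the count it already handles, so old logic keeps working unchanged.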

We avoid the problem in #75328 (where the strings ran over the 16 MB total length limit) by emitting the name strings into a separate PE section. Each record contains an integer index into this section.
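Why a separate section helps: `ldstr` operands are user-string tokens whose heap offset is only 24 bits wide, which is presumably where the 16 MB ceiling comes from; an `ldc.i4` index into a custom section is not bound by that limit. The sketch below resolves such an index to a string, assuming (this is not confirmed in the thread) that the section holds null-terminated UTF-8 strings and the index is a byte offset. With S.R.Metadata, the raw section bytes could be obtained via `PEReader.GetSectionData(sectionName).GetContent()`; the section name is not stated here, and `MstatNameSection.GetName` is a hypothetical helper.

```csharp
using System;
using System.Text;

// Hypothetical sketch: assumes the names section is a sequence of
// null-terminated UTF-8 strings and that a record's name index is a
// byte offset into it. Both are assumptions, not the documented format.
static class MstatNameSection
{
    public static string GetName(ReadOnlySpan<byte> section, int offset)
    {
        // The unsigned comparison also rejects negative offsets.
        if ((uint)offset >= (uint)section.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        ReadOnlySpan<byte> rest = section.Slice(offset);
        int length = rest.IndexOf((byte)0);
        if (length < 0)
            throw new FormatException("Name is not null-terminated.");

        return Encoding.UTF8.GetString(rest.Slice(0, length));
    }
}
```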

If you'd like to read the names, that's not possible with Cecil, but the superior metadata reader in S.R.Metadata (System.Reflection.Metadata) comes to the rescue. This is roughly the diff you'd want to apply (make a similar adjustment for methods): https://gist.github.com/MichalStrehovsky/195967f3a117663b6340cd828a52dfc7/revisions. If you're worried about loading the file twice, I'd suggest creating a memory-mapped view of the file and sharing the same view between Cecil and S.R.Metadata. S.R.Metadata has practically no overhead on its own.

Cc @dotnet/ilc-contrib @Suchiman @kant2002 @ShreyasJejurkar @eerhardt @amcasey


ghost · 2023-03-17T09:21:05Z

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Author:	MichalStrehovsky
Assignees:	MichalStrehovsky
Labels:	`area-NativeAOT-coreclr`
Milestone:	-

ghost · 2023-03-17T09:21:23Z

Author:	MichalStrehovsky
Assignees:	MichalStrehovsky
Labels:	`NO-MERGE`, `area-NativeAOT-coreclr`
Milestone:	-

MichalStrehovsky · 2023-03-17T09:22:44Z

I'm marking this no-merge. If we agree about this format change, we need to update the reader in the performance repo before this merges: https://github.com/dotnet/performance/blob/7eddd9da5bdfbba7809e91846bbb6e6d851f19cb/src/tools/ScenarioMeasurement/SizeOnDisk/MStatProcessor.cs#L8

MichalStrehovsky · 2023-03-17T09:31:32Z

src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/MstatObjectDumper.cs

        {
-            string mangledName = null;


This part of the diff is about which name to emit. We didn't actually emit it, because that part was rolled back in #75328, but the code computing the name was left in. I now realized it's the wrong name.

We have two names for a node: the name of the symbol in the object file, and the name of the node in DGML. They're the same 95% of the time. The remaining 5% are types: we distinguish between two forms of a type, "constructed" (with a vtable) and "unconstructed" (e.g. a type used in a cast). At object-writing time the two get collapsed into one symbol, but for analysis purposes they're distinct. This change switches to the name that can be looked up in DGML. The object name could be useful too, but if it's ever needed, we can probably do something hacky like removing the trailing "constructed" suffix to compute it.
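As a sketch of the "hacky" fallback mentioned above, here is one way to derive an object-file symbol name from a DGML node name. The thread does not say what the constructed suffix looks like, so it is passed in as a parameter; `NodeNames.ToObjectSymbolName` and any suffix value used with it are hypothetical.

```csharp
using System;

// Hypothetical sketch: maps a DGML node name to the object-file symbol name
// by stripping a trailing "constructed" marker. The marker text is unknown,
// so the caller supplies it.
static class NodeNames
{
    public static string ToObjectSymbolName(string dgmlNodeName, string constructedSuffix)
    {
        // Constructed and unconstructed forms of a type collapse into one
        // symbol at object-writing time, so both inputs map to the same result.
        return dgmlNodeName.EndsWith(constructedSuffix, StringComparison.Ordinal)
            ? dgmlNodeName.Substring(0, dgmlNodeName.Length - constructedSuffix.Length)
            : dgmlNodeName;
    }
}
```

Ordinal comparison is used because symbol names are identifiers, not culture-sensitive text; names without the suffix (the 95% case) pass through unchanged.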

vitek-karas · 2023-03-17T11:15:33Z

In general I don't see a problem with this.

I would rev the version for this (1.2)

I think it would be a good idea to also write the version of the ILC compiler which produced the file into it somewhere.

vitek-karas · 2023-03-17T11:22:09Z

Actually one more thing - would it also make sense to write the node ID for the DGML node? I don't know if the node names are guaranteed unique...

Suchiman · 2023-03-17T11:43:27Z

> Actually one more thing - would it also make sense to write the node ID for the DGML node? I don't know if the node names are guaranteed unique...

Would we even need the names if we just printed the ID? Given that the ID is just an int, we could even skip the SRM magic needed to find the string.

vitek-karas · 2023-03-17T11:58:34Z

I just realized that the IDs only make sense when coupled with the DGML file. If the goal is to get rid of the DGML file in the future, it doesn't make sense to store its ID. It might make sense to add a new MSTAT-specific ID, but that will probably come once we want to write the dependency graph into it (the way DGML does).

eerhardt · 2023-03-17T16:10:35Z

Does this PR mean we are committing to the .mstat file format as the data file that will be emitted by ILC for .NET 8?

See also the discussion in #78671 and sbomer/linker@961474c?short_path=bcecbad#diff-bcecbad3970ec34a98732056ca2a36fdd2c7cfb1f7c285be0329d5116f6cf56c. The latter seems to indicate that, under that design, the format would be an .xml file.

MichalStrehovsky · 2023-03-17T20:18:19Z

> I would rev the version for this (1.2)

I revved it. It's 2.0 because it's breaking.

> Actually one more thing - would it also make sense to write the node ID for the DGML node? I don't know if the node names are guaranteed unique...

They are guaranteed to be unique; they're symbol names. Once we embed the graph, the new node ID will simply be the name offset. We don't have access to the ID the DGML writer made up.

> Does this PR mean we are committing to the .mstat file format as the data file that will be emitted by ILC for .NET 8?

We don't have a better format than IL for capturing things like overloaded methods or instantiated types and methods. Any kind of textual format would end up being harder to parse, unless we settle for a lossy format. A lossy textual format can always be generated from IL; the other way around is impossible.


MichalStrehovsky · 2023-03-19T07:01:47Z

I've submitted a PR to the performance repo with an update to read this format: dotnet/performance#2932

sbomer

LGTM, thank you!

MichalStrehovsky · 2023-03-28T03:45:25Z

> I think it would be a good idea to also write the version of the ILC compiler which produced the file into it somewhere.

Realized I didn't respond to this: we do store the version of CoreLib, and the CoreLib version is the ILC version.

Since the perf repo already merged the update without waiting for this, this is good to merge now. Removed NO-MERGE.

MichalStrehovsky requested a review from sbomer March 17, 2023 09:20

dotnet-issue-labeler bot added the area-NativeAOT-coreclr label Mar 17, 2023

ghost assigned MichalStrehovsky Mar 17, 2023

MichalStrehovsky added the NO-MERGE label ("The PR is not ready for merge yet (see discussion for detailed reasons)") and removed the area-NativeAOT-coreclr label Mar 17, 2023

MichalStrehovsky added the area-NativeAOT-coreclr label Mar 17, 2023

MichalStrehovsky commented Mar 17, 2023


Update MstatObjectDumper.cs

6e19289

MichalStrehovsky added a commit to dotnet/performance that referenced this pull request Mar 19, 2023

Add support for reading MSTAT v2

6ddb7bf

Should merge together with dotnet/runtime#83578.

MichalStrehovsky mentioned this pull request Mar 19, 2023

Add support for reading MSTAT v2 dotnet/performance#2932

Merged

sbomer approved these changes Mar 27, 2023


MichalStrehovsky removed the NO-MERGE label Mar 28, 2023

MichalStrehovsky merged commit 56ee31f into dotnet:main Mar 28, 2023

MichalStrehovsky deleted the mstat2 branch March 28, 2023 03:45

ghost locked as resolved and limited conversation to collaborators Apr 27, 2023
