Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend mstat format with information that maps to dependency nodes #83578

Merged
merged 2 commits into from
Mar 28, 2023

Conversation

MichalStrehovsky
Copy link
Member

This will allow crossreferencing MSTAT with DGML files. The ultimate goal is to allow folding information from DGML into MSTAT (so that we don't need two files) but... baby steps.

This is a breaking change to the file format. I don't anticipate a breaking change after this. The DGML folding would be backwards compatible. I considered making it non-breaking, but I think it's just better to bite the bullet now. To fix existing readers, just skip over the new entry. Like in this diff: https://gist.github.com/MichalStrehovsky/2c7cb3d623c7f8901541914dab04238d/revisions

This adds an extra instruction encoding the name into each entry of the Method and Types stream.

We avoid the problem in #75328 (where the strings ran over the 16 MB total length limit) by generating the textual strings into a separate PE section. The record contains an integer index into this section.

If you would like to read it, it's not possible with Cecil, but the superior metadata reader in S.R.Metadata comes to the rescue. This is the ~diff you'd want to apply (and make a similar adjustment for methods): https://gist.github.com/MichalStrehovsky/195967f3a117663b6340cd828a52dfc7/revisions. If you're worried about duplicate loading, I'd suggest creating a memory mapped view of the file and sharing the same view between Cecil and S.R.Metadata. S.R.Metadata is pretty much no overhead on it's own.

Cc @dotnet/ilc-contrib @Suchiman @kant2002 @ShreyasJejurkar @eerhardt @amcasey

This will allow crossreferencing MSTAT with DGML files. The ultimate goal is to allow folding information from DGML into MSTAT (so that we don't need two files) but... baby steps.

This is a breaking change to the file format. I don't anticipate a breaking change after this. The DGML folding would be backwards compatible. I considered making it non-breaking, but I think it's just better to bite the bullet now. To fix existing readers, just skip over the new entry. Like in this diff: https://gist.github.com/MichalStrehovsky/2c7cb3d623c7f8901541914dab04238d/revisions

This adds an extra instruction encoding the name into each entry of the Method and Types stream.

We avoid the problem in dotnet#75328 (where the strings ran over the 16 MB total length limit) by generating the textual strings into a separate PE section. The record contains an integer index into this section.

If you would like to read it, it's not possible with Cecil, but the superior metadata reader in S.R.Metadata comes to the rescue. This is the ~diff you'd want to apply (and make a similar adjustment for methods): https://gist.github.com/MichalStrehovsky/195967f3a117663b6340cd828a52dfc7/revisions. If you're worried about duplicate loading, I'd suggest creating a memory mapped view of the file and sharing the same view between Cecil and S.R.Metadata. S.R.Metadata is pretty much no overhead on it's own.
@MichalStrehovsky MichalStrehovsky added NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) and removed area-NativeAOT-coreclr labels Mar 17, 2023
@ghost
Copy link

ghost commented Mar 17, 2023

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

This will allow crossreferencing MSTAT with DGML files. The ultimate goal is to allow folding information from DGML into MSTAT (so that we don't need two files) but... baby steps.

This is a breaking change to the file format. I don't anticipate a breaking change after this. The DGML folding would be backwards compatible. I considered making it non-breaking, but I think it's just better to bite the bullet now. To fix existing readers, just skip over the new entry. Like in this diff: https://gist.github.com/MichalStrehovsky/2c7cb3d623c7f8901541914dab04238d/revisions

This adds an extra instruction encoding the name into each entry of the Method and Types stream.

We avoid the problem in #75328 (where the strings ran over the 16 MB total length limit) by generating the textual strings into a separate PE section. The record contains an integer index into this section.

If you would like to read it, it's not possible with Cecil, but the superior metadata reader in S.R.Metadata comes to the rescue. This is the ~diff you'd want to apply (and make a similar adjustment for methods): https://gist.github.com/MichalStrehovsky/195967f3a117663b6340cd828a52dfc7/revisions. If you're worried about duplicate loading, I'd suggest creating a memory mapped view of the file and sharing the same view between Cecil and S.R.Metadata. S.R.Metadata is pretty much no overhead on it's own.

Cc @dotnet/ilc-contrib @Suchiman @kant2002 @ShreyasJejurkar @eerhardt @amcasey

Author: MichalStrehovsky
Assignees: MichalStrehovsky
Labels:

area-NativeAOT-coreclr

Milestone: -

@ghost
Copy link

ghost commented Mar 17, 2023

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

This will allow crossreferencing MSTAT with DGML files. The ultimate goal is to allow folding information from DGML into MSTAT (so that we don't need two files) but... baby steps.

This is a breaking change to the file format. I don't anticipate a breaking change after this. The DGML folding would be backwards compatible. I considered making it non-breaking, but I think it's just better to bite the bullet now. To fix existing readers, just skip over the new entry. Like in this diff: https://gist.github.com/MichalStrehovsky/2c7cb3d623c7f8901541914dab04238d/revisions

This adds an extra instruction encoding the name into each entry of the Method and Types stream.

We avoid the problem in #75328 (where the strings ran over the 16 MB total length limit) by generating the textual strings into a separate PE section. The record contains an integer index into this section.

If you would like to read it, it's not possible with Cecil, but the superior metadata reader in S.R.Metadata comes to the rescue. This is the ~diff you'd want to apply (and make a similar adjustment for methods): https://gist.github.com/MichalStrehovsky/195967f3a117663b6340cd828a52dfc7/revisions. If you're worried about duplicate loading, I'd suggest creating a memory mapped view of the file and sharing the same view between Cecil and S.R.Metadata. S.R.Metadata is pretty much no overhead on it's own.

Cc @dotnet/ilc-contrib @Suchiman @kant2002 @ShreyasJejurkar @eerhardt @amcasey

Author: MichalStrehovsky
Assignees: MichalStrehovsky
Labels:

NO-MERGE, area-NativeAOT-coreclr

Milestone: -

@MichalStrehovsky
Copy link
Member Author

I'm marking this no-merge. If we agree about this format change, we need to update the reader in the performance repo before this merges: https://github.com/dotnet/performance/blob/7eddd9da5bdfbba7809e91846bbb6e6d851f19cb/src/tools/ScenarioMeasurement/SizeOnDisk/MStatProcessor.cs#L8

{
string mangledName = null;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the diff is about which name to emit. We didn't actually emit it because that part was rolled back in #75328, but computing the name was left. I now realized it's the wrong name.

We have two names for a node: one name is the name of the symbol in the object file. The second name is the name of the node in DGML. They're the same 95% of time. The remaining 5% are types - we distinguish between two forms of a type, "constructed" (with a vtable) and "unconstructed" (e.g. type used in a cast). At object writing time they get collapsed into one. But for analysis purposes they're two distinct things. This switches the name to use the one that can be looked up in DGML. The object name can be useful too, but maybe we can just do something hacky like remove trailing constructed suffix to compute it if it's needed in the future.

@vitek-karas
Copy link
Member

In general I don't see a problem with this.

I would rev the version for this (1.2)

I think it would be a good idea to also write the version of the ILC compiler which produced the file into it somewhere.

@vitek-karas
Copy link
Member

Actually one more thing - would it also make sense to write the node ID for the DGML node? I don't know if the node names are guaranteed unique...

@Suchiman
Copy link
Contributor

Actually one more thing - would it also make sense to write the node ID for the DGML node? I don't know if the node names are guaranteed unique...

Would we even need the names when we just print the Id? Given that the Id is just an int we could even skip out on doing SRM magic to find the string

@vitek-karas
Copy link
Member

I just realized that the IDs only make sense when coupled with the DGML file - but if the goal is to get rid of the DGML file in the future, it doesn't make sense to have its ID there. It might make sense to add new mstat specific ID, but that will probably come once we want to write the dependency graph into it (the way DGML does).

@eerhardt
Copy link
Member

Does this PR mean we are committing to the .mstat file format as the data file that will be emitted by ILC for .NET 8?

See also the discussion in #78671 and sbomer/linker@961474c?short_path=bcecbad#diff-bcecbad3970ec34a98732056ca2a36fdd2c7cfb1f7c285be0329d5116f6cf56c. In the latter, it seems to indicate an .xml file would be the format given that design.

@MichalStrehovsky
Copy link
Member Author

I would rev the version for this (1.2)

I revved it. It's 2.0 because it's breaking.

Actually one more thing - would it also make sense to write the node ID for the DGML node? I don't know if the node names are guaranteed unique...

They are guaranteed unique. They're symbol names. New node id once we embed the graph will be simply the name offset. We don't have access to the id dgml writer made up.

Does this PR mean we are committing to the .mstat file format as the data file that will be emitted by ILC for .NET 8?

We don't have a better format to capture things like overloaded methods, or instantiated type/method than IL. Any kind of textual format will end up being harder to parse. Either that, or we settle with a lossy format. A textual lossy format can always be generated from IL. It's impossible the other way around.

MichalStrehovsky added a commit to dotnet/performance that referenced this pull request Mar 19, 2023
@MichalStrehovsky
Copy link
Member Author

I've submitted a PR to the performance repo with an update to read this format: dotnet/performance#2932

Copy link
Member

@sbomer sbomer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@MichalStrehovsky MichalStrehovsky removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Mar 28, 2023
@MichalStrehovsky
Copy link
Member Author

I think it would be a good idea to also write the version of the ILC compiler which produced the file into it somewhere.

Realized I didn't respond to this - we do store version of CoreLib and version of CoreLib = version of ILC.

Since perf repo already merged the update without waiting for this, this is good to merge now, removed NO MERGE.

@MichalStrehovsky MichalStrehovsky merged commit 56ee31f into dotnet:main Mar 28, 2023
@MichalStrehovsky MichalStrehovsky deleted the mstat2 branch March 28, 2023 03:45
@ghost ghost locked as resolved and limited conversation to collaborators Apr 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants