-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend mstat format with information that maps to dependency nodes #83578
Conversation
This will allow crossreferencing MSTAT with DGML files. The ultimate goal is to allow folding information from DGML into MSTAT (so that we don't need two files) but... baby steps. This is a breaking change to the file format. I don't anticipate a breaking change after this. The DGML folding would be backwards compatible. I considered making it non-breaking, but I think it's just better to bite the bullet now. To fix existing readers, just skip over the new entry. Like in this diff: https://gist.github.com/MichalStrehovsky/2c7cb3d623c7f8901541914dab04238d/revisions This adds an extra instruction encoding the name into each entry of the Method and Types stream. We avoid the problem in dotnet#75328 (where the strings ran over the 16 MB total length limit) by generating the textual strings into a separate PE section. The record contains an integer index into this section. If you would like to read it, it's not possible with Cecil, but the superior metadata reader in S.R.Metadata comes to the rescue. This is the ~diff you'd want to apply (and make a similar adjustment for methods): https://gist.github.com/MichalStrehovsky/195967f3a117663b6340cd828a52dfc7/revisions. If you're worried about duplicate loading, I'd suggest creating a memory mapped view of the file and sharing the same view between Cecil and S.R.Metadata. S.R.Metadata is pretty much no overhead on it's own.
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas Issue DetailsThis will allow crossreferencing MSTAT with DGML files. The ultimate goal is to allow folding information from DGML into MSTAT (so that we don't need two files) but... baby steps. This is a breaking change to the file format. I don't anticipate a breaking change after this. The DGML folding would be backwards compatible. I considered making it non-breaking, but I think it's just better to bite the bullet now. To fix existing readers, just skip over the new entry. Like in this diff: https://gist.github.com/MichalStrehovsky/2c7cb3d623c7f8901541914dab04238d/revisions This adds an extra instruction encoding the name into each entry of the Method and Types stream. We avoid the problem in #75328 (where the strings ran over the 16 MB total length limit) by generating the textual strings into a separate PE section. The record contains an integer index into this section. If you would like to read it, it's not possible with Cecil, but the superior metadata reader in S.R.Metadata comes to the rescue. This is the ~diff you'd want to apply (and make a similar adjustment for methods): https://gist.github.com/MichalStrehovsky/195967f3a117663b6340cd828a52dfc7/revisions. If you're worried about duplicate loading, I'd suggest creating a memory mapped view of the file and sharing the same view between Cecil and S.R.Metadata. S.R.Metadata is pretty much no overhead on it's own. Cc @dotnet/ilc-contrib @Suchiman @kant2002 @ShreyasJejurkar @eerhardt @amcasey
|
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas Issue DetailsThis will allow crossreferencing MSTAT with DGML files. The ultimate goal is to allow folding information from DGML into MSTAT (so that we don't need two files) but... baby steps. This is a breaking change to the file format. I don't anticipate a breaking change after this. The DGML folding would be backwards compatible. I considered making it non-breaking, but I think it's just better to bite the bullet now. To fix existing readers, just skip over the new entry. Like in this diff: https://gist.github.com/MichalStrehovsky/2c7cb3d623c7f8901541914dab04238d/revisions This adds an extra instruction encoding the name into each entry of the Method and Types stream. We avoid the problem in #75328 (where the strings ran over the 16 MB total length limit) by generating the textual strings into a separate PE section. The record contains an integer index into this section. If you would like to read it, it's not possible with Cecil, but the superior metadata reader in S.R.Metadata comes to the rescue. This is the ~diff you'd want to apply (and make a similar adjustment for methods): https://gist.github.com/MichalStrehovsky/195967f3a117663b6340cd828a52dfc7/revisions. If you're worried about duplicate loading, I'd suggest creating a memory mapped view of the file and sharing the same view between Cecil and S.R.Metadata. S.R.Metadata is pretty much no overhead on it's own. Cc @dotnet/ilc-contrib @Suchiman @kant2002 @ShreyasJejurkar @eerhardt @amcasey
|
I'm marking this no-merge. If we agree about this format change, we need to update the reader in the performance repo before this merges: https://github.com/dotnet/performance/blob/7eddd9da5bdfbba7809e91846bbb6e6d851f19cb/src/tools/ScenarioMeasurement/SizeOnDisk/MStatProcessor.cs#L8 |
{ | ||
string mangledName = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part of the diff is about which name to emit. We didn't actually emit it because that part was rolled back in #75328, but computing the name was left. I now realized it's the wrong name.
We have two names for a node: one name is the name of the symbol in the object file. The second name is the name of the node in DGML. They're the same 95% of time. The remaining 5% are types - we distinguish between two forms of a type, "constructed" (with a vtable) and "unconstructed" (e.g. type used in a cast). At object writing time they get collapsed into one. But for analysis purposes they're two distinct things. This switches the name to use the one that can be looked up in DGML. The object name can be useful too, but maybe we can just do something hacky like remove trailing constructed
suffix to compute it if it's needed in the future.
In general I don't see a problem with this. I would rev the version for this (1.2) I think it would be a good idea to also write the version of the ILC compiler which produced the file into it somewhere. |
Actually one more thing - would it also make sense to write the node ID for the DGML node? I don't know if the node names are guaranteed unique... |
Would we even need the names when we just print the Id? Given that the Id is just an int we could even skip out on doing SRM magic to find the string |
I just realized that the IDs only make sense when coupled with the DGML file - but if the goal is to get rid of the DGML file in the future, it doesn't make sense to have its ID there. It might make sense to add new mstat specific ID, but that will probably come once we want to write the dependency graph into it (the way DGML does). |
Does this PR mean we are committing to the See also the discussion in #78671 and sbomer/linker@961474c?short_path=bcecbad#diff-bcecbad3970ec34a98732056ca2a36fdd2c7cfb1f7c285be0329d5116f6cf56c. In the latter, it seems to indicate an |
I revved it. It's 2.0 because it's breaking.
They are guaranteed unique. They're symbol names. New node id once we embed the graph will be simply the name offset. We don't have access to the id dgml writer made up.
We don't have a better format to capture things like overloaded methods, or instantiated type/method than IL. Any kind of textual format will end up being harder to parse. Either that, or we settle with a lossy format. A textual lossy format can always be generated from IL. It's impossible the other way around. |
Should merge together with dotnet/runtime#83578.
I've submitted a PR to the performance repo with an update to read this format: dotnet/performance#2932 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
Realized I didn't respond to this - we do store version of CoreLib and version of CoreLib = version of ILC. Since perf repo already merged the update without waiting for this, this is good to merge now, removed NO MERGE. |
This will allow crossreferencing MSTAT with DGML files. The ultimate goal is to allow folding information from DGML into MSTAT (so that we don't need two files) but... baby steps.
This is a breaking change to the file format. I don't anticipate a breaking change after this. The DGML folding would be backwards compatible. I considered making it non-breaking, but I think it's just better to bite the bullet now. To fix existing readers, just skip over the new entry. Like in this diff: https://gist.github.com/MichalStrehovsky/2c7cb3d623c7f8901541914dab04238d/revisions
This adds an extra instruction encoding the name into each entry of the Method and Types stream.
We avoid the problem in #75328 (where the strings ran over the 16 MB total length limit) by generating the textual strings into a separate PE section. The record contains an integer index into this section.
If you would like to read it, it's not possible with Cecil, but the superior metadata reader in S.R.Metadata comes to the rescue. This is the ~diff you'd want to apply (and make a similar adjustment for methods): https://gist.github.com/MichalStrehovsky/195967f3a117663b6340cd828a52dfc7/revisions. If you're worried about duplicate loading, I'd suggest creating a memory mapped view of the file and sharing the same view between Cecil and S.R.Metadata. S.R.Metadata is pretty much no overhead on it's own.
Cc @dotnet/ilc-contrib @Suchiman @kant2002 @ShreyasJejurkar @eerhardt @amcasey