[NFC] Walk module to update names in binary parser #6959

Open · wants to merge 9 commits into main
Conversation

@tlively (Member) commented Sep 19, 2024

The binary parser generally does not know the final names of module
elements when it parses them, or even when it parses instructions that
refer to them, since the name section comes at the end of a binary. The
parser previously kept a list of pointers to locations where each module
element's name would have to be used, then patched those locations
after parsing the name section and discovering the final names.

When the binary parser starts using IRBuilder, the parsed expressions
will be constructed and managed by IRBuilder rather than by the parser
itself. This means that the parser will no longer be able to collect
pointers to places where module element names are used; it won't have
access to the instructions at all.

Since the strategy of collecting locations to patch will no longer work,
switch to a strategy of traversing the module to find and update names
instead. This is generally less efficient because the locations have to
be found before they can be updated, but on the other hand it only
happens when preserving debug info and it is parallelizable anyway.
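
As a rough illustration of the traversal strategy (a minimal sketch, not the exact code in this PR, assuming a hypothetical `renames` map from the parser's placeholder names to the final names from the name section), a Binaryen `PostWalker` can rewrite the references in each function body:

```cpp
#include <map>

#include "wasm.h"
#include "wasm-traversal.h"

namespace wasm {

// Sketch: rewrite references to renamed functions inside expressions.
// The real parser would also handle other kinds of module elements
// (globals, tables, memories, data segments) the same way.
struct NameUpdater : public PostWalker<NameUpdater> {
  const std::map<Name, Name>& renames;

  NameUpdater(const std::map<Name, Name>& renames) : renames(renames) {}

  void visitCall(Call* curr) {
    if (auto it = renames.find(curr->target); it != renames.end()) {
      curr->target = it->second;
    }
  }

  void visitRefFunc(RefFunc* curr) {
    if (auto it = renames.find(curr->func); it != renames.end()) {
      curr->func = it->second;
    }
  }
};

} // namespace wasm
```

Because each function body can be walked independently (e.g. `updater.walkFunction(func.get())` per function), the traversal parallelizes across functions, which is why the overhead discussed below stays modest.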

In preparation for using IRBuilder in the binary parser, eagerly create
Functions when parsing the function section so that they are already
created once we parse the code section. IRBuilder will require the
functions to exist when parsing calls so it can figure out what type
each call should have, even when there is a call to a function whose
body has not been parsed yet.

NFC except that some error messages change to include the new empty
functions.
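
The eager creation could look roughly like this (a sketch, not the PR's exact code; `numDefinedFuncs`, `types`, `readU32LEB`, and `makePlaceholderName` stand in for parser state and hypothetical helpers, since the function section lists only type indices):

```cpp
// Sketch: while parsing the function section, create each Function
// immediately with its final type, leaving the body empty until the
// code section is parsed.
for (uint32_t i = 0; i < numDefinedFuncs; ++i) {
  uint32_t typeIndex = readU32LEB();   // hypothetical LEB128 reader
  auto func = std::make_unique<wasm::Function>();
  func->name = makePlaceholderName(i); // hypothetical; may be renamed
                                       // later by the name section
  func->type = types[typeIndex];       // HeapType from the type section
  wasm.addFunction(std::move(func));   // body filled in later
}
```

With the `Function`s in place, parsing a `call` can look up the callee's signature even before the callee's body has been parsed.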
The purpose of the datacount section is to pre-declare how many data
segments there will be so that engines can allocate space for them
and not have to backpatch subsequent instructions in the code section
that refer to them. Once we use IRBuilder in the binary parser, we will
have to have the data segments available by the time we parse
instructions that use them, so eagerly construct the data segments when
parsing the datacount section.
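
A sketch of that eager construction (with hypothetical `readU32LEB` and `makeDataName` helpers; the segments' contents and offsets are filled in later by the data section):

```cpp
// Sketch: the datacount section gives only the number of segments, so
// create empty DataSegments now; memory.init / data.drop parsed in the
// code section can then refer to them by index.
uint32_t dataCount = readU32LEB();
for (uint32_t i = 0; i < dataCount; ++i) {
  auto seg = std::make_unique<wasm::DataSegment>();
  seg->setName(makeDataName(i), /*hasExplicitName=*/false);
  wasm.addDataSegment(std::move(seg));
}
```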
@kripken (Member) commented Sep 19, 2024

This extra traversal of all the code worries me. Can IRBuilder not collect those pointers as it creates things?

@tlively (Member, Author) commented Sep 19, 2024

It could in principle, but that would be an expansion of its responsibilities and would make the abstraction fuzzier. I think this separation of concerns is nicer, but I can measure the performance impact.

@tlively (Member, Author) commented Sep 19, 2024

Ok, I looked into the performance overhead of this change and I think it is acceptable. In the very worst case where --debuginfo is used, but there are not actually any new names to apply, the slowdown to binary parsing is 6%. When there are names to apply, the slowdown is about 0.2%, measured with both 128 threads and 2 threads.

@kripken (Member) commented Sep 19, 2024

Can you explain the difference between the two cases? I don't follow that. Isn't this extra pass over the module always done when there are names, so the overhead is fixed?

@tlively (Member, Author) commented Sep 19, 2024

Right, we currently do the walk whenever --debuginfo is used, and if there are no names to apply, it simply won't make any changes. I had been thinking that the current code path can avoid doing any work in this case, but I don't think that's right, since there are locations where we simply don't set the names before backpatching.

@kripken (Member) commented Sep 19, 2024

Yeah, maybe we can avoid it somehow, but I think atm we do figure out names late for some things.

So IIUC, is 6% the relevant overhead, then?

Overall my concern is that adding an extra pass to the entire module can be pretty costly. It's not like adding another ReFinalize at the end of a function, since that function is in the cache already - this is another loading and processing of the entire module from the start, which can be slow on large modules.

@tlively (Member, Author) commented Sep 20, 2024

I guess 6% is the relevant worst-case overhead we've seen, but it's also notable that on real modules we've seen 0.2%. This shouldn't be any different in terms of caching because all the function bodies have just been parsed and should also be in caches. If we parallelize parsing at some point, the bodies will even be in the caches on the correct threads.

Base automatically changed from binary-parser-eager-data to main September 20, 2024 00:08
@kripken (Member) commented Sep 20, 2024

6% seems high to me. Maybe it's because I've been working on a series of such optimizations recently (moderate speedups, some of which involve avoiding an extra pass on the IR), but I think it's worth trying to avoid this regression. Some ideas:

  1. We aren't in a streaming situation like browsers: we have the entire file up front. We could skip to the name section and read it first, then go back and build the IR using those names, perhaps? The skipping should be fast, as the binary format makes that simple and the number of sections is small. (See the sketch after this list.)
  2. Or, could we subclass IRBuilder to add more logic, to add the code that takes the address of names? We just need to be able to hook in there, and could use wasm-delegations-fields to find the fields with names, in just a few lines of code.
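
For reference, idea (1) is cheap because of the section framing: each section is an id byte followed by an LEB128 payload size. A sketch with hypothetical buffer helpers (`readU32LEB`, `isNameSection`, `parseNameSection`):

```cpp
// Sketch: hop over sections by their declared sizes until we find the
// "name" custom section, and parse it before parsing the code section.
size_t pos = 8; // skip the 4-byte magic and 4-byte version
while (pos < buffer.size()) {
  uint8_t id = buffer[pos++];
  uint32_t size = readU32LEB(buffer, pos); // hypothetical; advances pos
  if (id == 0 && isNameSection(buffer, pos, size)) { // custom section
    parseNameSection(buffer, pos, size);             // hypothetical
    break;
  }
  pos += size; // skip this section's payload
}
```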

> This shouldn't be any different in terms of caching because all the function bodies have just been parsed and should also be in caches.

Difference compared to what? Sorry, I'm not following this. My concern is that a big module simply cannot fit in cache - large wasm modules expand into hundreds of MB of IR - so doing another pass over that IR means cache misses for it all. I am guessing that is the 6%?

@tlively (Member, Author) commented Sep 20, 2024

I'll look into skipping ahead 👍
