-
Notifications
You must be signed in to change notification settings - Fork 746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid emitting a block in the binary format when it has no name #4912
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to keep visitPossibleBlockContents
and visitBlock
separate? The only difference here is sometimes we want to be more aggressive (use BranchSeeker
) and sometimes we don't. Can we have a single function with a parameter? Probably we can't modify the signature of PostWalker::visitBlock
, but maybe we can add a seekBranch
parameter to visitPossibleBlockContents
and make visitBlock
just call it with the parameter false
?
src/wasm-stack.h
Outdated
// uses, it is equivalent to not having one). This is potentially quadratic, | ||
// but it is extremely rare to have recursion on this function, since it is | ||
// limited by the number of non-block control flow structures (the places that | ||
// call here). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is potentially quadratic,
Isn't this always quadratic, because we traverse all children?
it is extremely rare to have recursion on this function, since it is limited by the number of non-block control flow structures (the places that call here).
How do we have a recursion here? Can BranchSeeker::has
call this BinaryenIRWriter::VisitPossibleBlockContents
back or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, good points, my text was not accurate. Yes, it's always quadratic, just on smaller numbers. By "recursion" I really meant larger numbers (where quadratic time gets bad). I improved the comment now.
Interesting, maybe... but thinking about it, I'm not sure it's worth it. |
// uses, it is equivalent to not having one). Scanning the children of the | ||
// block means that this takes quadratic time, but it will be N^2 for a fairly | ||
// small N since the number of nested non-block control flow structures tends | ||
// to be very reasonable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure... Doesn't BranchSeeker::has
mostly only check branches, or more precisely, name uses defined by DELEGATE_FIELD_SCOPE_NAME_USE
? (Currently br
, br_table
, br_on
, rethrow
, and try
(due to delegate
) have this field)
but it will be N^2 for a fairly small N since the number of nested non-block control flow structures tends to be very reasonable.
I'm not sure what do you mean by "non-block control flow structures". Do you mean branches here?
Also, it is N^2 because we need to check every child, and that N includes all children. We don't know whether it is a branch or not before checking it. So I'm also not sure what you mean by "a fairly small N".
I'm not suggesting the quadratic behavior is bad; it is preexisting anyway and it is for a reason and as you said it doesn't cause unreasonable slowdown. I'm just not very good at understanding the comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-block control flow structures are: If, Try, Loop. In each of their arms they call visitPossibleBlockContents
(to try to avoid emitting a block there).
By small N I mean that there are few nested non-block control flow structures. This would be bad:
(if
..
(if
..
(if
..N such ifs..
..
..innermost child..
Those nested ifs lead to the innermost child being scanned N times where N is the number of those ifs (since each if scans all children). So the total time is O(N*M)
if M is the total number of children of the first if. But, N is small in the real world, such if stacks are very rare (unlike block stacks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, maybe... but thinking about it, I'm not sure it's worth it.
visitBlock
andvisitPossibleBlockContents
already had some code duplication, because both iterate on the children, but in slightly different ways. And while we could share more code now after this PR, it's really just one line - to check if the block has a name. Adding a parameterseekBranch
whether to scan the contents would already make it about the same size, I think, and not necessarily more readable.
Hmm, yeah, I thought they are similar but as you said visitBlock
has additional routines handling deeply nested blocks... Nevermind then.
// block means that this takes quadratic time, but it will be N^2 for a fairly | ||
// small N since the number of nested non-block control flow structures tends | ||
// to be very reasonable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could still make this cheaper and also improve the later case by running a pass to remove unused names in linear time, then assuming we've already taken care of it here. We already have a pass that does this, don't we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, RemoveUnusedNames is run when optimizing. So this only matters for the unoptimized case. But it's still nice to handle that since that includes a simple roundtrip with wasm-opt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since that includes a simple roundtrip with wasm-opt
I don't understand what you mean here. I see that this quadratic checking will only produce useful results if we haven't already optimized, but it would still be cheaper and produce better results to run the RemoveUnusedNames
pass unconditionally here no matter what other optimizations we are doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, running the pass might be fine. We'd need to do some more work to be careful to avoid noticeable side effects, as right now binary writing does not modify the Module data, but running that pass would. But I think it could be done probably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at RemoveUnusedBrs, it looks like it does more optimizations than we would actually want for this use case. What Module data are you thinking about, though? That pass is function parallel, so I don't think it should change any Module-level state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant any changes to the Module or its contents. Right now, writing a module does not modify the input in any way, but this would make changes to blocks etc. Still, it might be ok...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. Yeah, I think that's ok since there shouldn't really be anything happening after writing. This PR lgtm if you'd rather look at that as a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let's leave that as a separate topic. It's the existing behavior before this PR and I'd rather not change it here so each PR is focused.
Previously the wat parser would turn this input: (block (nop) ) into something like this: (block $block17 (nop) ) It just added a name all the time, in case the block is referred to by an index later even though it doesn't have a name. This PR makes us rountrip more precisely by not adding such names: if there was no name before, and there is no break by index, then do not add a name. In addition, this will be useful for non-nullable locals since whether a block has a name or not matters there. Like #4912, this makes us more regular in our usage of block names.
We already did this if the block was a child of a control flow structure, which is
the common case (see the new added comment around that code, which clarifies
why). This does the same for all other blocks. This is simple to do and a minor
optimization, but the main benefit from this is just to make our handling of blocks
uniform: after this, we never emit a block with no name. This will make 1a non-
nullable locals easier to handle (since they will be able to assume that property;
and not emitting such blocks avoids some work to handle non-nullable locals
in them).