Do we need While/For? #27


lukewagner
Member

This is more a question than a proposal:

I know the current polyfill has while/for/do-while (b/c I was lazy and it makes it easier to generate valid asm.js) but I remember Ben had mentioned thinking all we needed was unconditional loops. The counter-argument would be if there was some win from having the higher-level while/for/do-while ops (e.g. size or avoiding depending on backend control-flow optimizations). Does the inclusion of these reflect more recent thinking?

It's fine if we leave this open until we can do real experiments. Perhaps we'll find different control flow primitives (ones that don't simply reflect source-level primitives) that better capture compile-time control flow optimizations.

@sunfishcode
Member

The case for DoWhile:

LLVM prefers to put all loops that it can in "if (x) do { ... } while (x);" form, because there are a bunch of optimizations it simplifies, and because that form is usually what corresponds to the most efficient code in hardware. LLVM calls this loop rotation, GCC calls it loop header copying, and both do it pretty aggressively. WebAssembly programs will likely use this sequence a lot.

If we don't have DoWhile, we can synthesize it with an infinite loop with a break, but it'll be bigger in a common case, and it'll be more work to make fast-compilation backends generate the desired single-branch backedge. Alternatively, we could make WebAssembly producers unrotate their loops back to plain while(x) form, but this isn't always easy to do, since code can be hoisted out of the do-while inside the if, and it'll also be more work for fast-compilation backends to emit single-branch backedges.
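
To make the rotation concrete, here is a minimal JavaScript sketch. The function names `sumWhile` and `sumRotated` are made up for illustration and are not from LLVM or the polyfill. The only point is that both forms compute the same result, while the rotated form tests the condition once as a guard and then once per iteration at the bottom of the loop.

```javascript
// Plain while form: the condition is tested at the top of every iteration.
function sumWhile(values) {
  let i = 0;
  let total = 0;
  while (i < values.length) {
    total += values[i];
    i++;
  }
  return total;
}

// Rotated form: "if (x) do { ... } while (x);".
// The guard runs once; the back edge is a single conditional branch
// at the bottom of the loop body.
function sumRotated(values) {
  let i = 0;
  let total = 0;
  if (i < values.length) {
    do {
      total += values[i];
      i++;
    } while (i < values.length);
  }
  return total;
}
```

Both functions return the same value for any array, including the empty one, which is the case the `if` guard exists to handle.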

@titzer

titzer commented May 4, 2015

I was originally thinking just Block = "{S*}", If = " if(E) S else S", and Loop = "while(1) S". From these you can build all the rest. I guess it's a question whether we want to go minimalist or for structures that mirror what an AST for a high-level language looks like.

@lukewagner
Member Author

Generally, it seems like there are two kinds of wins to be had:

  1. reduce the binary size
  2. encode some control-flow information that would take extra work in the backend and might be missed (e.g., in a baseline compiler that didn't even build a CFG)

For 1, there is the question of whether we want a macro layer which could easily define things like DoWhile()/IfDoWhile() as macros. Also, in the case of control flow nodes, in the polyfill I find that control flow nodes are a small % of total bytes, so the win here will probably be <2% (uncompressed), but I'm going off memory so I'd need to remeasure.

OTOH, I think argument 2 can be a lot more compelling especially from the baseline compiler pov.

@titzer

titzer commented May 4, 2015

One reason for having a special For construct is that the standard three-part for loop becomes easier to encode if there are continues in the loop.

for(X,Y,Z) B

=>

X;
For(Y,Z) B

So any continues inside of B go to the update clause Z, and breaks break out normally. Otherwise you end up trying to nest multiple blocks inside to deal with the update clause:

=>

X;
While(Y) Block { Block[update] { B; } Z }

and continues in B actually become Break[update].

Of course it gets worse with just Loop, then you have:

=>

X;
Loop[outer] Block { if (!Y) break; Block[update] { B; } Z }
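
As a rough illustration of the While-based lowering, the sketch below writes the same loop twice in JavaScript, using a labeled block for Block[update]. The function names and the loop body are invented for the example; only the shape (X, then While(Y) wrapping a labeled block followed by Z) comes from the discussion above.

```javascript
// Source form: a three-part for loop with a continue.
function sumOddsFor(n) {
  let total = 0;
  for (let i = 0; i < n; i++) {
    if (i % 2 === 0) continue; // jumps to the update clause i++
    total += i;
  }
  return total;
}

// Lowered form: X; While(Y) Block { Block[update] { B } Z }.
// The continue becomes a break out of the labeled "update" block,
// so control still reaches the update clause Z.
function sumOddsLowered(n) {
  let total = 0;
  let i = 0;                          // X
  while (i < n) {                     // Y
    update: {
      if (i % 2 === 0) break update;  // was: continue
      total += i;                     // rest of B
    }
    i++;                              // Z
  }
  return total;
}
```

The extra labeled block is the nesting cost that a dedicated For construct would avoid.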

@lukewagner
Member Author

As you've illustrated, "Avoid unnecessary code duplication for common patterns" seems like a 3rd general justification since, even ignoring binary size, there are machine-code-size and icache-locality issues.

@titzer

titzer commented May 4, 2015

I think you end up with basically the same CFG structure if you eagerly fold branches, but my original idea seems increasingly awkward for expressing basic syntactic control flow patterns.

@lukewagner
Member Author

It'd be nice to have the property that, assuming the input to the relooper is reducible, the relooper never had to duplicate code (e.g., this continue-in-for-loop case). I haven't studied the relooper at all; do we have this property or can we?

@kripken
Member

kripken commented May 4, 2015

The relooper never has to duplicate code. It introduces a control flow helper variable, and adds code to use it to move around. This is true even for horribly irreducible code: nothing is duplicated, but code execution starts to look like it's emulated (i.e., loop-in-a-switch type thing).

In nicely reducible control flow, typically there is little helper variable usage, but it does still happen.

The relooper does optionally duplicate code in some cases, when it sees that doing so reduces code size (small block + avoids use of the helper variable) or is likely to speed things up.

Side note, another very minor reason for having do-while is "one-time loops":

do {
  ...
  if (check()) break; // jumps to NEXT
  ...
} while(0)
// NEXT

The relooper emits these. They could be done with whiles, of course,

while (1) {
  ...
  if (check()) break; // jumps to NEXT
  ...
  break;
}
// NEXT

at a minor cost in code size, so maybe this doesn't matter.

@lukewagner
Member Author

Well, then I guess I'd like to refine my query to: could we get a guarantee that the relooper never has to emit the helper variable usage (and start to act like a while-switch loop) except in the case of irreducible control flow?

Also, for the one-time loop, I think you can just use a Block and break out of it.
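
For comparison, here is a small JavaScript sketch of the one-time loop next to the labeled-block version. The function names and the `n < 10` check are placeholders for illustration; they stand in for `check()` and the elided code in the snippets above.

```javascript
// One-time loop, in the do { ... } while (0) shape the relooper emits.
function classifyDoWhile(n) {
  let result = "small";
  do {
    if (n < 10) break;      // jumps to NEXT
    result = "large";
  } while (0);
  // NEXT
  return result;
}

// Same control flow using a labeled block and a break out of it.
function classifyBlock(n) {
  let result = "small";
  next: {
    if (n < 10) break next; // jumps to NEXT
    result = "large";
  }
  // NEXT
  return result;
}
```

The labeled block needs neither the loop condition nor a trailing break, which is why it can replace the one-time loop.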

@kripken
Member

kripken commented May 4, 2015

Right, yeah, there were js engine bugs in the block implementation, way back, that's why the relooper avoided it. Could probably be fixed now ;)

No, there isn't a guarantee of avoiding the helper variable given reducible control flow. The relooper design and its proof aren't amenable to that; they focus on showing that any control flow is implementable, then use average results to show that on typical cases, helper variable usage is minimal. It would take a quite different approach to actually prove that reducible control flow can be perfectly reconstructed - also a much more complicated one than the relooper, which defines just 3 basic "shapes" in order to make the implementation and proof reasonable in complexity.

Note that this is indeed a hard problem. For example, this often happens

while (1) {
  ..
  if (...) {
    label = 1;
    break;
  }
  ..
  if (...) {
    label = 1;
    break;
  }
  ..
  if (...) {
    break;
  }
}
if (label === 1) {
  ...
}

In this case, it seems like the obvious solution is to hoist that code back into the loop, even though it logically speaking isn't part of it (those blocks can't return to the loop start, unlike the blocks actually in the loop). But it would need to be hoisted into two places, so it might not be a good idea.

I don't know offhand what C code can lead to that pattern, but it is quite common.

@lukewagner
Member Author

Well, my question is more: are there new still-structured-but-not-JS-source-syntax control flow primitives we could add so that you never needed label for reducible control flow? The example you gave is a perfect example: common and suboptimal without some seriously non-trivial control flow analysis.

@jfbastien
Member

I would like us to err towards having fewer operations, and do size tuning on real-world big codebases. I'd also not do the tuning manually; I'd specify pretty generic operations and have the 3-level compression format we discussed (binary encoding, macros over them, and then generic compression).

@lukewagner
Member Author

As discussed above, I agree that size isn't a primary concern here but, rather, avoiding cases where reducible control flow is deoptimized (or requires significant work to re-optimize -- look at Alon's example).

@jfbastien
Member

OK, that I can get behind :)

@kripken
Member

kripken commented May 5, 2015

Well, my question is more: are there new still-structured-but-not-JS-source-syntax control flow primitives we could add so that you never needed label for reducible control flow?

Very interesting question! Ok, after thinking about this, I believe that in fact the relooper hints at one possible approach, as follows.

Currently we have break/continue X where X is an optional label. We could add a second optional parameter, break/continue X -> Y where Y is the label of the target. This is sort of like a limited, structured goto. We would also need to add a "multiple block", which is like a switch over labels. It's easiest to just write an example:

L1: {
    if (x) break L1 -> L2;
    if (y) break L1 -> L3;
}
multiple {
    L2: { ... }
    L3: { ... }
}

This works as follows: break X -> Y breaks on the label X, as it currently does. No change there. But, if the first thing we reach is a multiple, and if Y is one of the keys in the multiple, then execute the code in block Y, then exit the multiple. If Y isn't among the keys, or we reach the multiple by some other means (like a break without a target), then we skip the multiple.

(Minor detail, we also need to be able to write break -> Y, so that we can break to a target, even when we don't have a label to break on.) (Other minor detail: this isn't enough to optimize indirect branches, they would need Y to be a variable.)

Note how we can only reach places we could reach anyhow; and everything we do is 100% doable by setting a helper variable, and checking it in the multiple. So this still keeps things structured (at least to the extent that JS is structured; wikipedia says breaks are reducible, but not structured).
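
The helper-variable reading of this proposal can be sketched in today's JavaScript. This is only an emulation under assumptions: the constants `TARGET_NONE`, `TARGET_L2`, and `TARGET_L3`, the `target` variable, and the `trace` array are invented for the example, and a `switch` stands in for the proposed multiple block.

```javascript
// Hypothetical identifiers standing in for the target labels L2 and L3.
const TARGET_NONE = 0;
const TARGET_L2 = 2;
const TARGET_L3 = 3;

function run(x, y) {
  const trace = [];
  let target = TARGET_NONE;                  // the helper variable
  L1: {
    if (x) { target = TARGET_L2; break L1; } // break L1 -> L2
    if (y) { target = TARGET_L3; break L1; } // break L1 -> L3
    trace.push("fell through L1");
  }
  // "multiple": run the block whose key matches the target, then exit.
  // If no key matches (no targeted break happened), it is skipped.
  switch (target) {
    case TARGET_L2: trace.push("ran L2"); break;
    case TARGET_L3: trace.push("ran L3"); break;
  }
  trace.push("after multiple");
  return trace;
}
```

Calling `run(true, false)` takes the L2 path, `run(false, true)` takes the L3 path, and `run(false, false)` skips the multiple entirely.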

I believe the VM should be capable of emitting code for this that has no helper variable usage at all. By analyzing control flow, it can see where the breaks with targets go, and see the multiples they reach. It can then just emit a branch to that location. In other words, this proposal makes the relooper's helper variable explicit, and when it is in that form, the underlying control flow graph is reconstructible. What is nice is that you don't need to: if you didn't bother to write that optimization, it is trivial to implement breaks with targets and multiples the way the relooper does, add one helper variable, set it and test it, and you're done (but you have a little pesky overhead you could have avoided had you written the optimization).

But there's more! The relooper has a proof of being capable of reducing any control flow graph, even irreducible, into reducible control flow using a loop helper variable, without duplicating code. So all the nice stuff we just mentioned applies not only to reducible control flow, but irreducible as well. Where did the overhead of irreducible control flow go?

Consider a worst-case of irreducible control flow, a loop with two blocks, both loop entries, both able to branch to each other. The relooper will emit a loop and ifs,

label = checkA() ? 1 : 2;
while (1) {
  if (label == 1) {
    ...
    label = checkB() ? 1 : 2;
  } else if (label == 2) {
    ...
    label = checkC() ? 1 : 2;
  }
}

Irreducible control flow leads to overhead of much helper variable usage, and a loop, all of which we don't need. But it's clear that we can get rid of all that overhead as discussed above, helper variable and loop as well (it is not trivial, though). And this shows what's really going on here - the pattern above is basically the switch-in-a-loop form, which can represent any control flow graph (except here with just 2, it's ifs and not a switch). And we know that it is possible to pattern-match that form into the underlying basic blocks. So the "branch with a target" proposal is basically an intermediate form: It can also represent arbitrary control flow, but it represents it directly as nice ifs and loops when it can, and as switch-loop when it's too irreducible.
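
A runnable variant of that two-block pattern is sketched below. It is an assumption-laden illustration: the snippet above loops forever, so this version adds a label value of 0 meaning "leave the loop" and a step counter, neither of which is in the original. The names `runIrreducible`, `remaining`, and `trace` are hypothetical.

```javascript
// Two blocks, A and B. Either can be the loop entry, and each can
// branch to the other. The label helper variable selects the next block.
// Label 0 (an addition for this sketch) means "leave the loop".
function runIrreducible(startAtA, steps) {
  const trace = [];
  let remaining = steps;
  let label = startAtA ? 1 : 2;       // choose the entry block
  while (1) {
    if (label === 1) {
      trace.push("A");
      remaining--;
      label = remaining > 0 ? 2 : 0;  // branch to B, or exit
    } else if (label === 2) {
      trace.push("B");
      remaining--;
      label = remaining > 0 ? 1 : 0;  // branch to A, or exit
    } else {
      break;
    }
  }
  return trace.join("");
}
```

Every transfer between A and B goes through a store to `label` and a re-test at the top of the loop, which is the overhead a direct branch would remove.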

In summary, there are 3 options in this space:

  1. What we currently have: structured control flow, with implicit helper vars. Helper var overhead is present.
  2. The new proposal here: Structured control flow, multiples and break/continue with a target. The browser can either just implement them with a helper var - trivial, but then there is overhead - or it can do an analysis (probably quite fast, but not trivial) and find the underlying control flow graph, thereby eliminating the overhead.
  3. Switch-loop form or basic blocks and branches. The underlying control flow graph is explicit. But, the browser must reconstruct important aspects of the graph in order to do regalloc etc., which would have been easier on structured control flow.

In the proposal here (2), we don't have a guarantee, but in practice "well-structured" code tends to lead to few multiples, and the nicer the code is, the fewer there are. So in typical use the number of break-with-a-target and multiples that the browser will need to analyze would be quite small. Alternatively, this means that if the browser does the trivial thing of a helper var instead of an analysis, it wouldn't be too badly off - it would be where emscripten output is right now, in fact. So in either case the prospects look good.

@lukewagner
Member Author

Cool, that sounds promising! What about this reformulation (which I think is basically equivalent but avoids the need for any analysis to optimize). There is a new statement: WhileWithMultipleExit(cond, body, [exit-stmt-1, exit-stmt-2]) and a new break statement BreakToExit(#label, #exit), where #exit is the index into the array of exits for the WhileWithMultipleExit identified by #label. With this formulation, linking the BreakToExit's to their corresponding exit would be done analogous to how normal breaks are compiled now in a single pass. It also seems like we could have (DoWhile|For|Block)WithMultipleExit as simple variants.
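
To give a feel for the intended semantics, here is a runtime emulation in JavaScript. It is a sketch under stated assumptions, not the proposed encoding: the real construct would be an AST node whose exits are linked at compile time, whereas this version models BreakToExit by having the body return an index into the exits array. The function `whileWithMultipleExit` and the example `findFirstNegative` are hypothetical names.

```javascript
// Runtime emulation of the proposed statement, for illustration only.
// body() returns undefined to keep looping, or an index into `exits`
// to model BreakToExit(#label, #exit).
function whileWithMultipleExit(cond, body, exits) {
  while (cond()) {
    const exitIndex = body();
    if (exitIndex !== undefined) {
      exits[exitIndex]();  // run the selected exit statement, then leave
      return;
    }
  }
  // Condition became false: leave without running any exit statement.
}

function findFirstNegative(values) {
  let i = 0;
  let result = "none";
  whileWithMultipleExit(
    () => i < values.length,
    () => {
      if (values[i] < 0) return 0;            // BreakToExit(loop, 0)
      if (Number.isNaN(values[i])) return 1;  // BreakToExit(loop, 1)
      i++;
    },
    [
      () => { result = "negative at " + i; },
      () => { result = "NaN at " + i; },
    ]
  );
  return result;
}
```

In a real single-pass compiler the index would resolve to a direct branch to the exit's code, so no value would need to be returned or tested at run time.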

My more general question, though, is whether this is just a patch for a common problem or if this (and maybe 1 or 2 more primitives) could provide a proper 'basis' for compiling all reducible control flow without helper variables. (I get the feeling that there must have been some research in this area, and probably in the 70's.) If indirect control flow requires a helper variable, that seems fine (it is indirect, so some dynamism is expected) as long as we can efficiently capture all the indirect control flow that comes out of C++ (viz., switch).

@jfbastien
Member

At this point I think there's a lot of cool/crazy stuff, but I'm a bit afraid of committing sins to support irreducible control flow in V1 without actually being sure that what we're doing is actually OK. I don't want to wait for perfect, but I'm wary of this not even being good enough without some more experience.

Would it be acceptable to say that irreducible control flow won't perform well in V1, and have the relooper emit somewhat bad code using a basic loop? We can then have an unstable version of wasm where we experiment with other loop types, ideally paired with a good sample of benchmarks.

@titzer

titzer commented May 5, 2015

It seems like the major stumbling block here is that we haven't specified a switch node with fallthrough cases yet. If we do that, will Multiple become redundant? Switch with fallthrough, while, do-while, and if seem to suffice for the relooper targeting asm.js source, so maybe we should steer in that direction? Those constructs are easy to grasp and reason about for most people.


@lukewagner
Member Author

@jfbastien I'm only talking about reducible control flow above. I can close this PR though (the original question is answered so no change needed) and we can perhaps start a new issue to discuss new primitives to better support reducible control flow by avoiding helper variables.

@lukewagner
Member Author

@titzer We do have switch-with-fallthrough (unless I'm mistaking your meaning of this). How does this avoid the helper variable in Alon's example above?

@titzer

titzer commented May 5, 2015

It wasn't in the list above, but I can add it. The larger question is whether Multiple is an increase in expressive power over what we have, or just convenience?


@lukewagner
Member Author

It depends on how you define "expressiveness". With just while+switch we can "express" everything (ye olde argument of Knuth). However, if we include efficiency in our definition, then we can see why our current scheme with If/While/etc is superior b/c it replaces data-dependent switches with direct control flow edges. It's true (and we've talked about this since literally the beginning) that we could design optimizations around the label variable (the one we switch (label) over) so that we could erase the label variable entirely and connect breaks to their destination case in the switch. That's just goto, though, and it seems like, if we want goto, we should just add goto.

So the point of multiple is to remove another case where we have to have a label variable (one that Alon says is demonstrably common) and so it is increasing expressiveness given the latter definition.

@kripken
Member

kripken commented May 5, 2015

@lukewagner: I think WhileWithMultipleExit is not quite enough (need to be able to enter loops as well), and also my proposal was missing a little thing too, I realized over the night. It wouldn't be hard to extend them, though. But first, to address @jfbastien: I understand the hesitation to consider cool/crazy stuff - we could leave this discussion for later, assuming we are willing to add new control flow constructs at a later date. However, I actually think the proposal here is not crazy and not dependent on further research: The "break/continue with target" proposal is based off of

  1. The published proof in the relooper paper, showing that that construct (plus a tiny thing I forgot) is sufficient to represent arbitrary control flow.
  2. Years of experience with the relooper in practice, showing us that converting control flow to the relooper shapes - which the proposal here is based off of - tends to be very efficient in practice.

So I do think we have the theoretical guarantees plus experience that are necessary to make a call here.

I can write up a more formal proposal in a new issue. I would also be ok with deferring this for now, if we just prefer to focus on other stuff for V1, that's fine too (I'll probably still write it up, just to not forget the details for later). But my point in this comment is that I think we do have a solid combination of theoretical results and practical experience already.

@titzer: Yes, as @lukewagner said, the multiple construct would increase expressiveness. It is meant to be used only in conjunction with a break/continue with a target, and nothing else. Together, they can express a limited form of goto, still structured, and easily implemented on top of structured control flow, but by being expressed directly, easily optimized into a direct branch by the VM. So this can't replace regular switches, at least as currently proposed.

@lukewagner
Member Author

I'm happy if Alon starts a discussion in a new issue on the new proposal.

@lukewagner lukewagner closed this May 5, 2015
@lukewagner lukewagner deleted the while-for-in-semantics branch May 5, 2015 16:14
@jfbastien
Member

OK, I had misunderstood the proposal. I agree to fork the discussion, and if @kripken's approach could potentially be The One True Loop then I agree it may be preferable to use it instead of a simple while-based loop.

I'm cautiously enthusiastic for this :-)

@titzer

titzer commented May 5, 2015

I haven't read the relooper paper (I promise to, though!). It's an interesting idea, but is this extra support for (decidedly uncommon) irreducible control flow really necessary for v1? Try-catch-finally isn't on the table for v1 either, but I expect it to interact with this construct.


@lukewagner
Member Author

@titzer The example Alon gave is reducible.

@titzer

titzer commented May 5, 2015

Maybe I misunderstood. I thought he was proposing a reducible construct that is useful for implementing irreducible control flow more easily. Is the Multiple construct useful otherwise? Is it more fundamental than alternatives? More efficient?


@lukewagner
Member Author

It's easiest if you look at Alon's example (I tried to find the Markdown syntax for linking to an individual comment but failed; so just search for label = 1). That example is reducible and, Alon says, not uncommon.

@kripken
Member

kripken commented May 5, 2015

Yes, there are sort of 3 types of control flow,

  1. Really simple control flow. Simple ifs and whiles and switches.
  2. Slightly complex control flow. Ifs and whiles and switches, plus a few helper var usages. In practice, this is most code that we see. The helper var usages stem from 2 main causes: (1) odd basic block patterns, possibly irreducible - either from the source itself, or introduced by an LLVM optimization pass - and (2) limitations of the relooper's current optimizations. It is unclear how much is due to each of (1) and (2), and also unclear if (2) can ever be completely removed in tractable time. In practice, this has not mattered much, because (1) exists anyhow, and in practice the amount of such helper var usages is very small.
  3. Really spaghetti-like irreducible control flow, which we need a lot of helper var usage to deal with. This is very rare. Generally we might want proper tail calls for this, to just encode the basic blocks and branches directly.

@Nic30

Nic30 commented Aug 28, 2023

Was loop unrotate actually implemented in some form in WebAssembly or anywhere else?
