
Alternatives to let? #44

Closed
jakobkummerow opened this issue Jan 15, 2021 · 438 comments · Fixed by #63

@jakobkummerow
Contributor

We have heard from multiple directions (interpreters, toolchains, debuggers) that let is (surprisingly) hard to implement/use. As I am looking into implementing let in V8's non-optimizing compiler, add my name to that list. It's certainly all doable, but it's quite involved. And it would be sad if engine/interpreter implementors had to spend particular implementation effort (as well as runtime CPU+memory cost) on a feature that toolchains then barely use.

So I was wondering whether the let instruction as currently proposed really is the best solution we can collectively come up with.

IIUC, its primary purpose is to provide a way to have non-defaultable locals (in particular: non-nullable reference type locals) in a function. Please do point out if it has additional important use cases.

In the spirit of brainstorming, here are some alternatives that would solve that problem too:

(1) A "locals_initialized_barrier" instruction with semantics: before the barrier, locals may be uninitialized/null, and reading from them incurs the cost of a check; after the barrier, such checks are dropped as all locals are guaranteed to be initialized. Execution of the barrier checks that all non-defaultable locals have been initialized.

(2) A scope that indicates "within this scope, the given locals are non-null". Entering the scope performs null checks on the specified locals. Accessing these locals within the scope needs no null checks.

(3) Introduce "local initializers" modeled after the existing global initializers (which are solving the same problem). We'd need to figure out how exactly to encode these in the text and binary formats. Execution of a function would begin with evaluating these local initializers; afterwards all locals are guaranteed to be initialized. Similar to globals, the rules would be something like: only constant instructions are allowed as local initializers; they can read other locals but only those with lower indices.

(4) Require all locals to be pre-declared (at least by count, maybe also by type). Their initialization then still happens with let as currently proposed. That would prevent the size of the locals list/array/stack from changing dynamically, and would also keep each local's index constant throughout the entire function.

(5) Drop let entirely, at least for now. We can always add it later if we have enough evidence of a concrete need for it. In the meantime, a workaround is to factor out the body of what would have been a let-block as a function, and call that. A JIT might still decide to inline that function. (This would limit a Wasm module's ability to fine-tune for maximum performance; but based on Binaryen's feedback it's unclear to what extent toolchains would do that anyway. This is not my preferred solution, just mentioning it here for completeness.)
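
For reference, and to make the comparison concrete, here is a rough sketch of let as currently proposed (hypothetical type name; the exact text-format syntax is still in flux): it pops operands off the stack, binds them as fresh locals starting at index 0, and shifts all pre-existing locals up for the duration of the block, which is exactly the kind of re-indexing that makes the instruction involved to implement.

(func $f (param $p (ref $T))       ;; $p is local 0 outside the let
  local.get $p
  let (local (ref $T))             ;; pops the value; it becomes local 0,
                                   ;; and $p is temporarily renumbered to local 1
    local.get 0                    ;; the new non-nullable local: no null check needed
    drop
  end                              ;; the new local goes out of scope; $p is index 0 again
)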

(I realize that there is a lot of conceptual overlap between these ideas -- which is unsurprising given that they're all solving the same problem, just with slightly different approaches.)

I'm sure there are other possibilities, please suggest them!
If the ideas above have critical flaws making them unsuitable, please point them out! And if you have a favorite among them, please say so as well.
As I wrote above, this issue is meant to be a collective brainstorming (and maybe eventually decision-making) effort.
We don't have to change anything; I just wanted to make sure that this design has received sufficient contemplation before we set it in stone. Especially in light of the feedback we've been getting so far.

@ghost

ghost commented Jan 15, 2021

(not part of the Wasm team, just a consumer/user of Wasm compilers)
Regarding point three: I've seen Clang++ implement its own null-pointer function as a function that traps when called (e.g. via unreachable), instead of allowing a genuine out-of-bounds access. Declaring locals with a default function that exists, yet still traps when called, seems to preserve the same semantics as using a null function reference while removing the branching ref.is_null check; I can see some compilers easily following that route.

@kripken
Member

kripken commented Jan 15, 2021

About (3) (local initializers that are similar to global initializers), I think that could be simplified to just referring to an immutable global. That is, in global initializers we need a lot of generality, but here we just need to avoid a null value, and the default value will in most cases not be used and so not matter. So in a module each GC type could have a singleton immutable global containing a "dummy" instance of that type which locals would refer to.

A more compact encoding of that could be to define types with default values. Such types would be defined after the global section, and just pair a type with an immutable global. This would avoid any extra bytes in functions. And such types with custom defaults may have other uses than this.
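
A rough sketch of the first variant, with hypothetical type and global names (and only approximate GC constructor syntax):

(type $T (struct (field (mut i32))))

;; one immutable "dummy" instance per GC type, created at instantiation time
(global $dummy_T (ref $T) (struct.new_default $T))

(func $f (param $p (ref $T))
  ;; hypothetical local-initializer syntax from option (3): the local starts out
  ;; referring to the dummy instead of to null, so it is never observably uninitialized
  (local $x (ref $T) (global.get $dummy_T))
  ;; ...
)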

@ghost

ghost commented Jan 15, 2021

If we take Kripken's idea, why do null references even need to exist in the binary format at all?
Compilers could all implement their own nulls, defaults, and is_null checks.

local.get $func_ref
ref.is_null

vs

local.get $func_ref
ref.func $null_func
ref.eq

Contrary to what @rossberg said at WebAssembly/reference-types#71 (comment), Wasm could be a null-free language.

@tlively
Member

tlively commented Jan 15, 2021

I like (4) because it has the best ratio of simplification to change, but I would be happy to consider larger changes as well.

Taking @kripken's line of thought a step in the opposite direction from the one @00ff0000red took: if every non-nullable type had a dummy global default value that is never used in normal execution, then why waste code size and tool effort on defining those globals? Instead we could just have the engine generate a meaningless default value for each non-nullable type. The only difference between these dummy values and null values is that dereferencing the dummy values does not trap, but rather does nothing or yields another dummy value.

The benefit of non-nullable values is that you know that there is always meaningful data in them, so I think any solution that involves global default values for non-nullable types somewhat diminishes their benefit to the point where we might as well not have them.

@skuzmich

skuzmich commented Jan 15, 2021

Do we still want the let instruction if we were to go with (4)?

Instead, local.set could form a scope from the set to the end of the current block, within which you can safely local.get the non-nullable variable.

@tlively
Member

tlively commented Jan 15, 2021

I think engines would have to validate that every local.get from a non-nullable local is dominated by a local.set to the same local. It looks like there are efficient algorithms for this, but it would certainly add validation complexity.

@kripken
Member

kripken commented Jan 15, 2021

Speaking of domination, another option might be

(6) Trap at runtime if a non-nullable local is used before being assigned to. I know that sounds bad, but in an optimizing tier SSA analysis will in any case prove that uses are dominated by definitions (always the case unless the wasm emitter has a bug), so it adds no overhead to either compile time or runtime. Of course, in a baseline compiler this will mean null checks on local.get. But it's simple and has no code size overhead.

@skuzmich

@tlively we don't have to do a general dominance algorithm (at least initially). The validation algorithm could be equivalent to that of the let instruction. We would still need to check that struct.get is inside the let block if we were to use let.

@tlively
Member

tlively commented Jan 15, 2021

@skuzmich I don't think I understood your suggestion, then. Right now local.set instructions do not introduce new block scopes. Are you suggesting that we have a version of local.set that does introduce a new scope? If so, how would that be different from let?

@skuzmich

@tlively Please allow me to clarify. In my suggestion, local.set would not form a proper Wasm block and would not have a dedicated end instruction. Instead, it would merely create a variable scope from the local.set instruction to the end of the block it is in.

Benefits of this approach:

  • No let instruction in the spec: no need to support it in the decoder or during execution. The new rules for local.set apply only to the validation phase and are about as complex as the rules for let.
  • No code size overhead for blocktype and end.

As far as I remember, let was designed as it is to simplify function inlining. You would want a proper branch target for inlined returns, and localdef section with relative indexing for inlined locals. But since (4) removes relative indexing from locals defined in let, this is no longer the case, and we might not need this extra instruction.
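
To illustrate the rule with hypothetical type and function names (note there is no new syntax here, only a new validation rule for when a non-nullable local may be read):

(func $f (param $p (ref $T))
  (local $x (ref $T))                ;; non-defaultable local, uninitialized at entry
  (block
    (local.set $x (local.get $p))
    (drop (local.get $x))            ;; OK: a local.set occurred earlier in the same block
  )
  ;; a (local.get $x) here would be a validation error under this scheme:
  ;; the "initialized" status of $x ends with the block containing the local.set
)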

@tlively
Member

tlively commented Jan 15, 2021

Thanks @skuzmich, I understand your suggestion now 👍

@RossTate

In WebAssembly/design#1381, I did an analysis of initialization and of let. I found that let was not well-suited for initialization. One example I gave is where a local is initialized on separate paths that join together. Many of the suggestions above would also have problems with that case. At the time I had considered a variety of other fixes to locals/let, but they all made the type system more complicated and without solving more advanced cases like type refinement. The simplest and most expressive solution by far was to just use the stack and add these four simple stack instructions: WebAssembly/design#1381 (comment).

The parts of the discussion in WebAssembly/design#796 regarding register allocation and this article make me think that such an approach could be easier for engines, though others here are better equipped to speak to this. The fact that there's a straightforward way to translate from using locals to using stack instructions makes me suspect it could be easier to generate, but that translation does require tracking more information about the current stack state, so I defer that judgement to others as well.

@RossTate

Oh, I forgot, regarding dummy values, that approach might not always be possible. For example, in the presence of polymorphism, you might not be able to construct a dummy value of a type ahead of time because you might not know what that type is. That is, of course, unless all value types are defaultable, which is a viable option but one that brings us back to #40.

@tlively
Member

tlively commented Jan 16, 2021

Unfortunately stack manipulation instructions won't help in Binaryen because of its expression-based IR, but if they would be helpful in other contexts, I don't think that should be a big blocker. We're already hacking around things like multivalue results, block parameters, and let in Binaryen IR, so we would just hack around stack manipulation instructions as well. That being said, it would be nice if we had a solution that fit nicely into Binaryen IR.

@RossTate

Hmm, I feel like an expression-based IR shouldn't be a big problem. I would suspect this would only affect the serialization phase. That is, you're free to use (your own notion of) locals in your IR, but when you serialize your IR to WebAssembly, rather than emitting local.get/set $index_of_local_in_your_IR you emit get/set $index_of_local_on_stack. The complication is that $index_of_local_on_stack depends on $index_of_local_in_your_IR, which locals are currently initialized (as the uninitialized ones are not on the stack), and how many temporaries are currently on the stack, so your emitter would have to track that as it recurses through your expressions.

@kripken
Member

kripken commented Jan 16, 2021

@tlively

The benefit of non-nullable values is that you know that there is always meaningful data in them

The underlying point of them is speed, though, isn't it? (that's my understanding from this discussion) We still get that speed with default values that are never used. They are just lightweight ways to prove a null can't happen there.

Default values also solve @RossTate 's points about inits on joining paths, so I think they're worth considering. (However, other options do too.)

@tlively
Member

tlively commented Jan 16, 2021

@RossTate you're totally right, but another (former?) design goal for Binaryen IR is to have it mirror WebAssembly as closely as possible. Obviously, as we add more stacky things to WebAssembly this becomes impossible, though, and I don't think we should be going out of our way to be convenient for Binaryen as we design WebAssembly.

@kripken Yeah, that's true. With a dummy default value, there's always a value meaningful to the engine (i.e. dereferenceable) so the engine can elide any null checks. I noticed a problem with my previous suggestion in which I wrote, "The only difference between these dummy values and null values are that dereferencing the dummy values does not trap, but rather does nothing or yields another dummy value." Specifying hypothetical engine-generated defaults to "do nothing" on e.g. struct.set would require the engine to check whether the struct it is modifying is the default struct, which defeats the purpose of eliding such checks. It would be gross to have an implicit engine-generated default value be modified by user code, so I guess I would rather have user-provided default values after all.

For purposes of compositionality and decompositionality of functions and modules, I prefer a let-like construct (that doesn't change any indices) over function- or module-level initializers. I would be fine with stack instructions too.

@ghost

ghost commented Jan 16, 2021

There seems to be something being overlooked here when it comes to user-provided defaults, as shown by comments like this one:

Yeah, that's true. With a dummy default value, there's always a value meaningful to the engine (i.e. dereferenceable) so the engine can elide any null checks. I noticed a problem with my previous suggestion in which I wrote, "The only difference between these dummy values and null values are that dereferencing the dummy values does not trap, but rather does nothing or yields another dummy value."

I think this type of thinking was encouraged by my early comment,

...Declaring locals with a function that exists, yet will still trap upon being called seems to preserve the same semantics of using a null function reference, while removing the branching ref.is_null check...

The thing about defaults is that they don't necessarily have to be "dummy" values; in fact, they could very well be genuinely useful values that are just filled in by default.

Or maybe the language that was compiled was more of a "dynamic" language.
In some dynamic languages, e.g. JS, Python, calling something that isn't a function as one doesn't require an explicit type-check, but instead just throws a runtime exception that can be caught. In this case, let's say that this is a subset of a dynamic language, one that is typed, yet allows null function references.

(assuming the exceptions proposal) In this case, the compiler could define the default function as a function that doesn't trap, but instead throws a runtime exception, actually allowing the code to catch it at runtime.

(func $default
    ...
    throw
)

(func
    (local $function ..typed func.. (ref.func $default))
    ...
    try
        local.get $function
        call_ref
    catch
        ...
    end
)

Consider an author working on a WebGL web game: if a Wasm trap occurs, they want to inform the client that an error has occurred, abort the process, and clean up some resources.

They could call their function from JS, and wrap it in a large try...catch block, expecting a WebAssembly.RuntimeError due to calling a null function reference.

Alternatively, they could pass in a host-imported (JS) function that acts as the "default" function reference value, and if their code contains a logical error, leaving their function reference un-initialized, their JS function would end up being called (which may use the JS debugger, or use an error stack trace to help them debug their code).

And there are plenty of other use cases for user-provided defaults that haven't even crossed my mind; these are just a few off the top of my head.

@RossTate

While there are many types with easy-to-generate dummy values, that is not the case for all types. If I've imported a type, I don't necessarily have a dummy value for it. If I'm a polymorphic method and I'm not given an rtt for the type parameter (i.e. a type-erased function), I can't create a value for it. But both those examples involve type variables.

As an example not involving type variables, consider the following Kotlin class for the nodes of a doubly-linked list:

class Node() { var prev : Node; var elem : Int; var next : Node; ... }

This corresponds to the wasm type (ref (fix node. (struct (mut (ref node)) (mut i32) (mut (ref node))))). It is non-trivial to create a dummy value for this type, and more generally to procedurally create dummy values for types from surface languages.

@kripken
Member

kripken commented Jan 16, 2021

@00ff0000red

Yes, I agree. One reason I like the default values option is that defaults can be useful, separately from the issue of non-nullability. Another simple example is an i32 that defaults to 1 or another value, which can be nice for code size.

@RossTate

I agree it's not always easy to create such values, but that would be the responsibility of the language-specific compiler. I assume the Kotlin compiler knows how to create valid values for a doubly-linked list, and it would be practical for it to make a dummy value?

(In your example, btw, the references both forward and back are non-nullable - is this meant to be an infinite list or a cycle? Shouldn't they be nullable in a doubly-linked list?)

@RossTate

I should have clarified that this is for a circular doubly linked list, for which the standard is that prev and next are not null.

@RossTate

Also, it's not that it's not always easy to create dummy values—for some languages it's not always possible to create dummy values. For example, a cast-free, null-free, and memory-safe implementation of OCaml cannot create dummy values for its local variables—a type variable might represent the empty type.

So going with the current plan that we will eventually support non-null references and parametric polymorphism, we will need to provide a solution to this problem that does not assume the existence of dummy values. (There is the option of changing that plan, but that requires a group discussion.)

@kripken
Member

kripken commented Jan 16, 2021

@RossTate

I'm not 100% on board with default values, to be clear, but I'd like to understand your objection. How can OCaml have a type that it uses for local variables, but is incapable of creating a value for? As long as there is some way to create a value for that type, that value could be used for the default. And if there is no way to create any values ever, how/why would the type ever be used? Clearly I'm missing something...

@RossTate

Yeah, it's counterintuitive. Here's an example:

let reverse (elems : 'a array) : unit =
    let len = Array.length elems in
    for i = 0 to (len / 2) - 1 do
        let temp : 'a = elems.(i) in
        elems.(i) <- elems.(len - i - 1);
        elems.(len - i - 1) <- temp
    done

In this example, it's possible for 'a to denote a type with no values whatsoever (in which case the array is necessarily empty), yet it has a local variable of type 'a.

@kripken
Member

kripken commented Jan 16, 2021

@RossTate

I see, thanks. So this would be a problem if we add parametric polymorphism, and if such a function is used with a type with no values.

I think this should still be solvable: the wasm producer would need to avoid using such a type in such a way (e.g. by passing a different type for it, or by using a different function without such a local), but I agree the extra complexity for producers (and also maybe larger wasm sizes) would be a downside of the default values approach.

@rossberg
Member

rossberg commented Jan 21, 2021

Thanks for initiating the discussion! But to be honest, what I'm missing is a concrete problem statement. The OP links to a couple of issues, but the only tangible one is Ben's concern about the effect on calling conventions in an interpreter, which is primarily concerned with the indexing scheme (I believe Lars has raised a similar question somewhere else). The other links are questions rather than actual problems, and I think they have already been answered on the issues.

The essence of Ben's concern could be addressed, I think, by separating the namespace of let indices from that of the existing locals, and e.g. introduce a separate let.get instruction. Slightly less elegant, but not complicated. I'll need to write something up more carefully.

Beyond that, I'm not sure what concrete issues there are and how the suggestions are meant to address them. Note that most of these suggestions are fairly big hammers that would make matters significantly worse in terms of complexity.

(1) A "locals_initialized_barrier" instruction with semantics: before the barrier, locals may be uninitialized/null, and reading from them incurs the cost of a check; after the barrier, such checks are dropped as all locals are guaranteed to be initialized. Execution of the barrier checks that all non-defaultable locals have been initialized.

You'd need to define what exactly "after" means, which ultimately remains a flow-dependent notion. And it requires some form of affinity, capability, or typestate to propagate that information, which would be a significant extension to the type system. This sort of direction tends to be a rabbit hole of complexity.

Notably, introducing a separate barrier instruction does not actually simplify anything over treating local.set as such a "barrier" directly.

Let also increases locality and thereby conveys additional live range information to a VM, which it can take advantage of without more expensive static analysis.

(2) A scope that indicates "within this scope, the given locals are non-null". Entering the scope performs null checks on the specified locals. Accessing these locals within the scope needs no null checks.

This would be somewhat simpler to type than (1) (because it is more structured), but it still requires amending, changing and restoring typing environments with extra information in the algorithm, and it still loses all live range information.

Operationally, the runtime checks still strike me as highly undesirable. I think we should strive for a solution that does not require checks. Allowing access to virtual registers to trap is not a very fitting property for an assembly language.

(As an aside, nullability is not the right notion for expressing knowledge about initialisation, since it doesn't apply to non-defaultable types that aren't references -- RTTs would be one example, we might have others in the future.)

(3) Introduce "local initializers" modeled after the existing global initializers (which are solving the same problem). We'd need to figure out how exactly to encode these in the text and binary formats. Execution of a function would begin with evaluating these local initializers; afterwards all locals are guaranteed to be initialized. Similar to globals, the rules would be something like: only constant instructions are allowed as local initializers; they can read other locals but only those with lower indices.

That assumes that one can always construct a dummy value for any given type. That's not generally the case, as RossTate points out correctly. The main purpose of the let-construct is to handle types where you can't construct dummies (or it would be expensive to do so) -- another interesting example would be values of imported types, acquired only through a call to an imported function.

(4) Require all locals to be pre-declared (at least by count, maybe also by type). Their initialization then still happens with let as currently proposed. That would prevent the size of the locals list/array/stack from changing dynamically, and would also keep each local's index constant throughout the entire function.

Similar to (2), this induces extra bookkeeping and complications to environment handling. And questions, such as: would it be legal to reuse the same pre-declared local in multiple lets?

The big conceptual disadvantage is that this destroys composability of let instructions. The ability to locally name a value without affecting the context is one motivation for let. I have repeatedly heard complaints from compiler writers about Wasm's current lack of that ability, e.g., to emit ops that need a scratch variable or achieve macro-definability of certain pseudo instructions.

(5) Drop let entirely, at least for now. We can always add it later if we have enough evidence of a concrete need for it. In the meantime, a workaround is to factor out the body of what would have been a let-block as a function, and call that. A JIT might still decide to inline that function. (This would limit Wasm module's ability to fine-tune for maximum performance; but based on binaryen's feedback it's unclear to what extent they'd do that anyway. This is not my preferred solution, just mentioning it here for completeness.)

You cannot easily outline let bodies that use (let alone mutate) other locals of the function. In practical terms, this would create a glaring hole in the language design that makes non-null references almost unusable in practice. It would be a strong incentive to never define functions to return a non-null reference, for example, which pretty much defeats the purpose.

Hope that sheds some more light on the design space.

Just for completeness, a much simpler solution along the lines of (1) and (2) would be to say that every local.get may trap if not yet initialised, and simply leave it to the cleverness of the VM to eliminate the checks where possible. Sort of what JavaScript does with let. Still not a satisfactory property for an assembly language.

@jakobkummerow
Contributor Author

Thanks for your thorough reply, Andreas!

I'm aware that the design space is complex here, and it's not obvious what the best solution is. The let instruction as is does have certain nice properties (such as composability, and elegantly fitting in with the type system). The concrete problem statement is pretty much:

language design that makes non-null references almost unusable in practice

When the Binaryen folks started looking into using let in order to optimize away null checks for reference-type locals, their impression was that let was so cumbersome to use (for toolchain logic, and also binary size) that they went as far as seriously questioning why we wanted non-nullable types in Wasm at all!

On the engine side, the let instruction is the one that currently causes "extra bookkeeping and complications to environment handling". Having the number and indices of locals constant throughout a function is much more convenient; and this spreads pretty far (e.g. think of a debugger allowing inspection of locals). As I said, it's all solvable, but in terms of implementation complexity the current let sure seems like one of the heavier solutions.

Regarding (3)/"local initializers", that doesn't necessarily mean constructing dummy values. E.g. if you wanted to use an RTT in a local (maybe a "truly local" rtt.fresh-allocated RTT?), that could be created right there in the initializer, or it could be read from a global. Of course there would be some limitations; that's essentially where the (1)/"barrier" idea came from: one could then use the early part of the function for initializing locals, but without any restrictions on the allowed instructions. Rather than relying on the engine to infer when initialization has been completed for each local, the barrier would be a manual hint to indicate this.
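
For illustration, option (1) might look roughly like this (the barrier instruction and all names here are hypothetical):

(type $T (struct (field i32)))

(func $f (param $p (ref $T)) (result i32)
  (local $x (ref $T))                  ;; non-defaultable local, unset at entry
  (local.set $x (local.get $p))        ;; any instructions may run before the barrier,
                                       ;; but reads of possibly-unset locals are checked
  locals_initialized_barrier           ;; hypothetical instruction: traps unless every
                                       ;; non-defaultable local has been set by now
  (struct.get $T 0 (local.get $x))     ;; after the barrier, no checks remain
)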

I agree with the skepticism about dummies, especially autogenerated default values. My feeling is that such a concept would only shift the problem around: instead of checking for null (for a nullable reference type), code would have to check for the dummy instead. And there's no good way to apply that idea to non-defaultable types.

@kripken
Member

kripken commented Jan 21, 2021

@rossberg

Just for completeness, a much simpler solution along the lines of (1) and (2) would be to simply say that every local.get may trap if not yet initialised, and simply leave it to the cleverness of the VM to eliminate the checks where possible. Sort of what JavaScript does with let. Still not a satisfactory property for an assembly language.

That sounds like option (6)?

I think the key thing we want out of non-nullable types is performance, in at least three ways:

  1. Speed in the optimizing tier.
  2. Speed in the baseline tier.
  3. Download/parsing speed.

The downsides of let include binary size (a block etc. for each let) and toolchain complexity as @jakobkummerow mentioned. An additional form of toolchain complexity is @RossTate 's point about control flow merges: it is not trivial to create a phi with a non-nullable value (as locals cannot be used for it, which otherwise is the typical solution in wasm). All of these harm us on 3 (and possibly 2).

On the other hand option (6) will only harm us on 2, and it seems like the only option that will completely avoid any harm on 3 (as it adds literally no bytes over non-nullable types). I think it is actually a quite nice solution for an assembly language with wasm's constraints.

@tlively
Member

tlively commented Jan 21, 2021

About option (4):

Similar to (2), this induces extra bookkeeping and complications to environment handling. And questions, such as: would it be legal to reuse the same pre-declared local in multiple lets?

The big conceptual disadvantage is that this destroys composability of let instructions. The ability to locally name a value without affecting the context is one motivation for let. I have repeatedly heard complaints from compiler writers about Wasm's current lack of that ability, e.g., to emit ops that need a scratch variable or achieve macro-definability of certain pseudo instructions.

I can see that this change would add extra bookkeeping to the spec formalism because it uses a stack of local contexts, but it would remove extra bookkeeping from tools like Binaryen. In Binaryen, all local.get and local.set instructions identify the local they access by a simple index into a vector of locals owned by the function. The function is readily available when visiting an expression, but generally it is much more complicated to carry around any additional local context. The problem with let as currently specified is that it requires carrying extra local context to determine the identity of the local that is being accessed, which is a frequent operation. Essentially the complexity of carrying any local context at all outweighs the benefits of having composable local contexts for us.

I also understand that needing scratch locals would be annoying in many contexts, but for Binaryen specifically, we always just add scratch locals as necessary and optimize them out later.

@taralx

taralx commented Jan 21, 2021

Is it crazy to suggest a combination of (4) and a locals.add instruction? The function would declare a number of "dynamic locals" that are assigned numbers past the end of the static locals. The instruction stores a value from the stack in the next dynamic local (or a specified one? but that complicates validation) such that the set of accessible locals is always contiguous from 0. At control flow joins the minimum is taken, or maybe it's a validation error to have a mismatch. Might need locals.pop or similar.

(This starts to feel like the R stack in Forth.)

@RossTate

Sorry, the cost I'm referencing is the implementation complexity of producers.

On the one hand, we have a solution where we just add a shorthand instruction, engines do some simple analysis during baseline compilation (at whatever scale they find is actually beneficial), and all producers get nearly all of the potential benefit.

On the other hand, we have a solution where we add non-defaultable locals and rules for initialization, engines are forced to do specifically that validation, and producers must add patchwork for dealing with the limitations of those rules in order to receive any benefit.

Yes, there are a number of details and variations I am overlooking, but none of those seem to affect the big picture of the options we are facing.

Looking at this, I am having a hard time understanding why people would want the latter over the former. The former provides more performance benefits (because it applies to all producers), whereas the latter imposes more burdens on engines and producers.

@askeksa-google

My understanding is that monotonicity provides only a fraction of a percent of performance improvement to just the baseline compiler, and that's not taking into account the potential loss in performance caused by having more locals.

In the absence of null check elimination in the baseline compiler, the performance benefit of non-nullable locals is quite significant. One of the things we have seen in practice is a 9% overall peak performance improvement from non-nullable locals in a situation where V8 failed to optimize a single hot function. I have seen similar improvements for time-to-first-frame measurements (which cover a mixture of compilation, baseline code and optimized code), so the isolated benefit for raw baseline code performance is likely somewhat bigger than that.

@rossberg
Member

In case that hasn't been clear, I'd also be totally fine with settling on 1a! As far as I'm concerned, it's the best compromise given all the constraints and positions, while still being sufficiently conservative. And it would help to get more extensive practical experience with it before generalising to something more complex that we do not yet know to be sufficiently justified.

@RossTate

In the absence of null check elimination in the baseline compiler

This is the critical condition. Others' comments have indicated that we could efficiently add basic null-check elimination to the baseline compiler and it would recover the vast majority of the benefits of non-nullable locals. After all, WebAssembly burdens producers with generating semi-structured control flow for the explicit purpose of making it easy for engines to optimize. Here we have an opportunity to justify that burden.

@askeksa-google

askeksa-google commented Jul 19, 2022

We can efficiently add basic null-check elimination to the baseline compiler for non-nullable locals if initialization of non-nullable locals obeys monotonicity. The defer option (with or without the shorthand) does not fulfill this property. There has been no indication in the discussion that we can achieve such elimination without monotonicity, and there are fundamental limitations to one-pass forward analysis that suggest we can't.

@conrad-watt
Contributor

This is the critical condition. Others' comments have indicated that we could efficiently add basic null-check elimination to the baseline compiler and it would recover the vast majority of the benefits of non-nullable locals.

This is going backwards - either we're replaying the monotonicity argument over defer, or you're instead reverting to advocacy for 6.

Your remaining argument against 1a is based on concerns about producer complexity. We have the producers here in this discussion stating that they'd be willing to experiment with 1a. I trust their judgement, and unless we have any other objections I think we should fix our intention to experiment with 1a through a consensus vote so we can free up mental energy for other discussions.

@titzer
Contributor

titzer commented Jul 19, 2022

I'm fine moving forward with 1a as well.

@RossTate

We can efficiently add basic null-check elimination to the baseline compiler for non-nullable locals if initialization of non-nullable locals obeys monotonicity.

I forget where, but I remember @kripken made a comment somewhere that indicated monotonicity was not particularly necessary for the majority of benefits either. @kripken, is my understanding incorrect?

@kripken
Member

kripken commented Jul 19, 2022

I don't think it matters much, but yes, even without monotonicity a baseline compiler could do pretty well on removing non-null checks. See footnote 7 in the doc for a summary + links (starting from the second sentence in that footnote).

But, again, I don't think it matters. While a baseline compiler can do it, all the other options than defer can do it somewhat better - either less work (they know which locals to focus on), or they handle more cases (if they handle joins they can get 100% of cases and not 90%), or both. So it is a disadvantage of defer.

All options here have pluses and minuses. 1a does seem like a reasonable compromise overall. I think the summary doc table makes that clear: a column for 1a there would look the same as 1c, except that "100% of null checks removed" would turn into 90% (since no joins), while the complexity row would be improved (no quadratic behavior). And it's fully extensible later - I agree with @rossberg that we need more practical experience before doing any more.

@RossTate

Thanks for the clarification and the additional good point about how explicit NNLs help save time in analyses.

From what producers have described, 1a is no less of a burden on producers than 1c (or arguably 1b). And it sounds like it offers no benefits to engines over defer+NNLs. That is, the column for 1a would look very similar to 1c, and the table summary still strongly suggests that defer+NNLs (or possibly even defer, but that would require more explicit experiments to judge) has substantially more pragmatic advantages over 1a/1b/1c.

Your remaining argument against 1a is based on concerns about producer complexity.

My argument is that if you have the choice between a design that burdens producers and one that does not and you choose the former, then you should be able to provide a justification why. Since I posed the question of what the justifications would be for 1a over defer, a number of advantages of NNLs over just defer have been offered. Like defer, NNLs alone do not place a real burden on producers. But 1a does, and the only advantage given for 1a has been for simple interpreters that appear not to be used in practice. When producers push on why we require semi-structured control flow, we tell them it facilitates optimization and point them to the (I believe) roughly 2x speed-up in the optimizing tier. When producers push on why we require 1a over simple engine analysis (facilitated by semi-structured control flow), are we going to point them to speed-up in a class of interpreters that aren't deployed in browsers?

@conrad-watt
Contributor

Personally, I trust the stakeholders involved here to make their own concerns known. We can always investigate extensions to 1a that reduce the burden on producers in future, once we have more implementation experience.

I think it's important that we find a way to move on from this conversation as (IMO) it's disrupting the wider work of the group, and 1a provides a non-committal way for us to do so that has a realistic chance of gaining consensus.

@tlively
Member

tlively commented Jul 19, 2022

It looks like we have a consensus coalescing around 1a (resetting initialization state at the end of every block) for the MVP. Let's try to resolve this discussion right now.

Official online unanimous consensus vote

React to this comment with a 👎 to object to the proposed resolution. If we have any 👎 by the end of day on Thursday this week, we will discuss this at the next meeting and resolve this conversation then instead. If there are no 👎 at the end of day Thursday, we will adopt 1a as the solution for the MVP.

@tlively
Member

tlively commented Jul 22, 2022

Excellent, we will go with option 1a for the MVP.

@jakobkummerow
Contributor Author

It's great to finally have a solution here! @askeksa-google raised one detail a couple of days ago: we still have a choice to make regarding loop.

Unlike with block, there is no way to reach the end of a loop without executing all of its body at least once; so we could maximise the applicability of "option 1a" by not resetting initialization status at the end of loops. In other words, the rule would be "NNL initialization status is reset at the end of any block, if, or try, as well as at any else/catch/catch_all".

The alternative is to reset NNL initialization status at every end (and else/catch/catch_all). Looking at #63, this might make the formulation of the formal spec a bit easier. It provides no meaningful simplification for engines. In theory, this option is wasteful (we'd be throwing out perfectly good initialized-ness information); in practice the difference is probably insignificant (because the situation of initializing an NNL in a loop and then using it afterwards is likely very rare).
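
To make the difference concrete (hypothetical names, sketch only):

(func $f (param $p (ref $T))
  (local $x (ref $T))                ;; non-nullable local
  (loop $l
    (local.set $x (local.get $p))
    ;; ... loop body, possibly (br $l) ...
  )
  ;; the end of the loop is only reachable by falling through its body,
  ;; so $x has necessarily been set at this point.
  ;; Under "no reset at loop ends" the following local.get validates;
  ;; under "reset at every end" it does not.
  (drop (local.get $x))
)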

Regarding forward compatibility: we cannot change our minds and go from "not resetting" to "resetting". Changing behavior in the opposite direction is not a problem. This is independent of whether we eventually adopt "1b", "1c", or neither, or something else entirely. In particular, if we go with "not resetting" now, we can still do "1b" or "1c" later, we'll just have to maintain "not resetting" at that time (which is not a problem for "1b" and "1c").

I think either option is better than spending another two years debating this, but we should pin it down one way or the other.

I'm going to try an unofficial poll about this to see where we stand as a group. Please 👍 one of the next five posts:

@jakobkummerow
Contributor Author

"I strongly believe that we should not reset NNL initialization status at the end of a loop, and would rather argue about it than be outvoted."

@jakobkummerow
Contributor Author

"If it was up to me, I'd say we should not reset NNL initialization status at the end of a loop, but I'd rather be outvoted than spend time arguing about it."

@jakobkummerow
Contributor Author

"I totally don't care one way or the other."

@jakobkummerow
Contributor Author

"If it was up to me, I'd say we should reset NNL initialization status at the end of a loop, but I'd rather be outvoted than spend time arguing about it."

@jakobkummerow
Contributor Author

"I strongly believe that we should reset NNL initialization status at the end of a loop, and would rather argue about it than be outvoted."

@titzer
Contributor

titzer commented Jul 22, 2022

It sounds OK to me at first blush to not reset at loop ends, because the only way to get there is falling through.

@conrad-watt
Contributor

@titzer this would mean that we break forwards compatibility with the version of 1b that @rossberg and I advocated for (recall the whole "local reasoning" debate). As @jakobkummerow mentioned, we can go from resetting now to not resetting later, so resetting is the most conservative choice in terms of future plans.

@rossberg
Member

The current PR does "reset". Frankly, I don't think there is much of a choice here, since not resetting is incompatible with a uniform interpretation of block types under 1b and thus not really conservative, as we intended to be.

@titzer
Contributor

titzer commented Jul 22, 2022

Ok, fair points. I agree that resetting is the conservative choice.

@jakobkummerow
Contributor Author

FYI, this has been implemented in V8.

copybara-service bot pushed a commit to dart-lang/sdk that referenced this issue Jul 27, 2022
With the decision to support non-nullable locals in WasmGC as per
WebAssembly/function-references#44 the
support in dart2wasm for forcing all locals to be nullable is no
longer needed.

This CL removes that support and cleans up some related nullability
issues. Specifically:

- Remove the `--local-nullability` and `--parameter-nullability`
  commandline options. These are now always enabled.
- Clean out special cases around forced nullable locals throughout the
  compiler.
- Make `thisLocal` and `preciseThisLocal` always non-nullable.
- Make `returnValueLocal` (for storing the return value of `return`
  statements inside `try` blocks with `finally`) always defaultable,
  since its initialization flow crosses control constructs.
- Make type argument parameters non-nullable.
- Make non-nullable `FutureOr` translate to a non-nullable Wasm type.
- Implement the "initialized until end of block" validation scheme
  in the Wasm instruction validator.
- Run tests with the `--experimental-wasm-nn-locals` option. This
  is likely going away soon, but for now we need it.

Change-Id: I05873dd70510af5944d86d37cb5765c7bdef73a9
Cq-Include-Trybots: luci.dart.try:dart2wasm-linux-x64-d8-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/252600
Commit-Queue: Aske Simon Christensen <[email protected]>
Reviewed-by: Joshua Litt <[email protected]>
Reviewed-by: William Hesse <[email protected]>