Alternatives to `let`? #44
(not part of the Wasm team, just a consumer/user of Wasm compilers) |
About (3) (local initializers that are similar to global initializers), I think that could be simplified to just referring to an immutable global. That is, in global initializers we need a lot of generality, but here we just need to avoid a null value, and the default value will in most cases not be used and so not matter. So in a module each GC type could have a singleton immutable global containing a "dummy" instance of that type which locals would refer to. A more compact encoding of that could be to define types with default values. Such types would be defined after the global section, and just pair a type with an immutable global. This would avoid any extra bytes in functions. And such types with custom defaults may have other uses than this. |
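For illustration, here is a rough wat sketch of that idea; the type, global, and function names are made up, the struct syntax follows the GC proposal, and the "local initializer" form in the function is hypothetical syntax that does not exist today:

```wat
(module
  ;; A GC struct type used by the module.
  (type $Point (struct (field i32) (field i32)))

  ;; Singleton immutable global holding a "dummy" instance of $Point.
  ;; Its only job is to give non-nullable locals something to start from.
  (global $dummy-point (ref $Point)
    (struct.new $Point (i32.const 0) (i32.const 0)))

  (func $use-point
    ;; Hypothetical "local initializer": the non-nullable local starts out
    ;; referring to the dummy global, so it can never be observed as null
    ;; and needs no null checks.
    (local $p (ref $Point) (global.get $dummy-point))
    ;; ... real code overwrites $p before using it for anything meaningful ...
  )
)
```

The more compact "types with default values" encoding would move the pairing of $Point with $dummy-point out of function bodies entirely.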
If we take Kripken's idea, why do null references even need to exist in the binary format at all?

```wat
local.get $func_ref
ref.is_null
```

vs.

```wat
local.get $func_ref
ref.func $null_func
ref.eq
```

Contrary to what @rossberg said at WebAssembly/reference-types#71 (comment), Wasm could be a null-free language. |
I like (4) because it has the best ratio of simplification to change, but I would be happy to consider larger changes as well. Taking @kripken's line of thought a step in the opposite direction from the one @00ff0000red took, if every non-nullable type had a dummy global default value that is never used in normal execution, then why waste code size and tool effort on defining those globals? Instead we could just have the engine generate a meaningless default value for each non-nullable type. The only difference between these dummy values and null values is that dereferencing the dummy values does not trap, but rather does nothing or yields another dummy value. The benefit of non-nullable values is that you know that there is always meaningful data in them, so I think any solution that involves global default values for non-nullable types somewhat diminishes their benefit, to the point where we might as well not have them. |
Do we still want Instead, |
I think engines would have to validate that every |
Speaking of domination, another option might be (6): trap at runtime if a non-nullable local is used before being assigned to. I know that sounds bad, but in an optimizing tier SSA analysis will anyhow prove that uses are dominated by values (always the case unless the wasm emitter has a bug), so it adds no overhead to either compile time or runtime. Of course, in a baseline compiler this will mean null checks on |
@tlively we don't have to do a general dominance algorithm (at least initially). The validation algorithm could be equivalent to that of |
@skuzmich I don't think I understood your suggestion, then. Right now |
@tlively Please, allow me to clarify. In my suggestion Benefits of this approach:
As far as I remember, |
Thanks @skuzmich, I understand your suggestion now 👍 |
In WebAssembly/design#1381, I did an analysis of initialization and of

The parts of the discussion in WebAssembly/design#796 regarding register allocation and this article make me think that such an approach could be easier for engines, though others here are better equipped to speak to this. The fact that there's a straightforward way to translate from using locals to using stack instructions makes me suspect it could be easier to generate, but that translation does require tracking more information about the current stack state, so I defer that judgement to others as well. |
Oh, I forgot, regarding dummy values, that approach might not always be possible. For example, in the presence of polymorphism, you might not be able to construct a dummy value of a type ahead of time because you might not know what that type is. That is, of course, unless all value types are defaultable, which is a viable option but one that brings us back to #40. |
Unfortunately stack manipulation instructions won't help in Binaryen because of its expression-based IR, but if they would be helpful in other contexts, I don't think that should be a big blocker. We're already hacking around things like multivalue results, block parameters, and |
Hmm, I feel like an expression-based IR shouldn't be a big problem. I would suspect this would only affect the serialization phase. That is, you're free to use (your own notion of) locals in your IR, but when you serialize your IR to WebAssembly, rather than emitting |
The underlying point of them is speed, though, isn't it? (that's my understanding from this discussion) We still get that speed with default values that are never used. They are just lightweight ways to prove a null can't happen there. Default values also solve @RossTate 's points about inits on joining paths, so I think they're worth considering. (However, other options do too.) |
@RossTate you're totally right, but another (former?) design goal for Binaryen IR is to have it mirror WebAssembly as closely as possible. Obviously, as we add more stacky things to WebAssembly this becomes impossible, though, and I don't think we should be going out of our way to be convenient for Binaryen as we design WebAssembly.

@kripken Yeah, that's true. With a dummy default value, there's always a value meaningful to the engine (i.e. dereferenceable) so the engine can elide any null checks. I noticed a problem with my previous suggestion in which I wrote, "The only difference between these dummy values and null values is that dereferencing the dummy values does not trap, but rather does nothing or yields another dummy value." Specifying hypothetical engine-generated defaults to "do nothing" on e.g.

For purposes of compositionality and decompositionality of functions and modules, I prefer a |
There seems to be something that is being overlooked when it comes to user-provided defaults here, shown by comments like this,
I think this type of thinking was encouraged by my early comment,
The thing about defaults is that they don't necessarily have to be a "dummy" value; in fact, they could very well be genuinely useful values that are just filled in by default. Or maybe the language that was compiled was more of a "dynamic" language. (assuming exception proposal) In this case, the compiler could define the default function as a function that doesn't trap, but instead throws a runtime exception, actually allowing the code to catch it at runtime.

```wat
(func $default
  ...
  throw
)
(func
  (local $function ..typed func.. (ref.func $default))
  ...
  try
    local.get $function
    ref.call func
  catch
    ...
  end
)
```

Consider an author working on a WebGL web game. They could call their function from JS, and wrap it in a large try...catch block, expecting a

Alternatively, they could pass in a host-imported (JS) function that acts as the "default" function reference value, and if their code contains a logical error, leaving their function un-initialized, their JS function would end up being called (which may use the JS

And there are plenty of other use cases that haven't even crossed my mind for user-provided defaults, these are just a few right off the top of my head. |
While there are many types with easy to generate dummy values, that is not the case for all types. If I've imported a type, I don't necessarily have a dummy value for it. If I'm a polymorphic method and I'm not given an As an example not involving type variables, consider the following Kotlin class for the nodes of a doubly-linked list:
This corresponds to the wasm type |
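As a rough sketch of the wasm side (the field names and the anyref payload are my own guesses), such a node type might look something like this in the GC proposal's notation:

```wat
;; Hypothetical node type for a circular doubly-linked list: both link
;; fields are mutable, non-nullable references to the same type, so a
;; standalone "dummy" node cannot be built without already having a node
;; to point at -- a fresh node has to point at itself.
(type $Node (struct
  (field $prev (mut (ref $Node)))
  (field $next (mut (ref $Node)))
  (field $value anyref)))
```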
Yes, I agree. One reason I like the default values option is that defaults can be useful, separately from the issue of non-nullability. Another simple example is an

I agree it's not always easy to create such values, but that would be the responsibility of the language-specific compiler. I assume the Kotlin compiler knows how to create valid values for a doubly-linked list, and it would be practical for it to make a dummy value? (In your example, btw, the references both forward and back are non-nullable - is this meant to be an infinite list or a cycle? Shouldn't they be nullable in a doubly-linked list?) |
I should have clarified that this is for a circular doubly linked list, for which the standard is that |
Also, it's not that it's not always easy to create dummy values—for some languages it's not always possible to create dummy values. For example, a cast-free, null-free, and memory-safe implementation of OCaml cannot create dummy values for its local variables—a type variable might represent the empty type. So going with the current plan that we will eventually support non-null references and parametric polymorphism, we will need to provide a solution to this problem that does not assume the existence of dummy values. (There is the option of changing that plan, but that requires a group discussion.) |
I'm not 100% on board with default values, to be clear, but I'd like to understand your objection. How can OCaml have a type that it uses for local variables, but is incapable of creating a value for? As long as there is some way to create a value for that type, that value could be used for the default. And if there is no way to create any values ever, how/why would the type ever be used? Clearly I'm missing something... |
Yeah, it's counterintuitive. Here's an example:
In this example, it's possible for |
I see, thanks. So this would be a problem if we add parametric polymorphism, and if such a function is used with a type with no values. I think this should still be solvable: the wasm producer would need to avoid using such a type in such a way (e.g. to pass a different type for it, or to use a different function without a local), but I agree the extra complexity for producers (and also maybe larger wasm sizes) would be a downside of the default values approach. |
Thanks for initiating the discussion! But to be honest, what I'm missing is a concrete problem statement. The OP links to a couple of issues, but the only tangible one is Ben's concern about the effect on calling conventions in an interpreter, which is primarily concerned with the indexing scheme (I believe Lars has raised a similar question somewhere else). The other links are questions rather than actual problems, and I think they have already been answered on the issues. The essence of Ben's concern could be addressed, I think, by separating the namespace of let indices from that of the existing locals, and e.g. introducing a separate

Beyond that, I'm not sure what concrete issues there are and how the suggestions are meant to address them. Note that most of these suggestions are fairly big hammers that would make matters significantly worse in terms of complexity.
You'd need to define what exactly "after" means, which ultimately remains a flow-dependent notion. And it requires some form of affinity, capability, or typestate to propagate that information, which would be a significant extension to the type system. This sort of direction tends to be a rabbit hole of complexity. Notably, introducing a separate barrier instruction does not actually simplify anything over treating

Let also increases locality and thereby conveys additional live range information to a VM, which it can take advantage of without more expensive static analysis.
This would be somewhat simpler to type than (1) (because more structured), but still requires amending, changing and restoring typing environments with extra information in the algorithm, and still loses all live range information. Operationally, the runtime checks still strike me as highly undesirable. I think we should strive for a solution that does not require checks. Allowing access to virtual registers to trap is not a very fitting property for an assembly language. (As an aside, nullability is not the right notion for expressing knowledge about initialisation, since it doesn't apply to non-defaultable types that aren't references -- RTTs would be one example, we might have others in the future.)
That assumes that one can always construct a dummy value for any given type. That's not generally the case, as RossTate points out correctly. The main purpose of the let-construct is to handle types where you can't construct dummies (or it would be expensive to do so) -- another interesting example would be values of imported types, acquired only through a call to an imported function.
Similar to (2), this induces extra bookkeeping and complications to environment handling. And questions, such as: would it be legal to reuse the same pre-declared local in multiple lets? The big conceptual disadvantage is that this destroys composability of let instructions. The ability to locally name a value without affecting the context is one motivation for let. I have repeatedly heard complaints from compiler writers about Wasm's current lack of that ability, e.g., to emit ops that need a scratch variable or achieve macro-definability of certain pseudo instructions.
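To illustrate that last point, here is a rough sketch, using the approximate text syntax proposed for `let`, of how a producer could macro-expand a hypothetical `dup` pseudo-instruction for a reference value, using `let` as a scratch binding that leaves the function's own locals untouched:

```wat
;; Hypothetical expansion of "dup" for a value of type (ref $T):
;; `let` pops the operand off the stack into a fresh local $tmp that is
;; visible only inside the block, then the body pushes it back twice.
(let (result (ref $T) (ref $T)) (local $tmp (ref $T))
  (local.get $tmp)
  (local.get $tmp)
)
```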
You cannot easily outline let bodies that use (let alone mutate) other locals of the function. In practical terms, this would create a glaring hole in the language design that makes non-null references almost unusable in practice. It would be a strong incentive to never define functions to return a non-null reference, for example, which pretty much defeats the purpose. Hope that sheds some more light on the design space. Just for completeness, a much simpler solution along the lines of (1) and (2) would be to simply say that every |
Thanks for your thorough reply, Andreas! I'm aware that the design space is complex here, and it's not obvious what the best solution is. The
When the Binaryen folks started looking into using On the engine side, the Regarding (3)/"local initializers", that doesn't necessarily mean constructing dummy values. E.g. if you wanted to use an RTT in a local (maybe a "truly local" I agree with the skepticism about dummies, especially autogenerated default values. My feeling is that such a concept would only shift the problem around: instead of checking for null (for a nullable reference type), code would have to check for the dummy instead. And there's no good way to apply that idea to non-defaultable types. |
That sounds like option (6)? I think the key thing we want out of non-nullable types is performance, in at least three ways:
The downsides of

On the other hand, option (6) will only harm us on 2, and it seems like the only option that will completely avoid any harm on 3 (as it adds literally no bytes over non-nullable types). I think it is actually quite a nice solution for an assembly language with wasm's constraints. |
About option (4):
I can see that this change would add extra bookkeeping to the spec formalism because it uses a stack of local contexts, but it would remove extra bookkeeping from tools like Binaryen. In Binaryen, all local.get and local.set instructions identify the local they access by a simple index into a vector of locals owned by the function. The function is readily available when visiting an expression, but generally it is much more complicated to carry around any additional local context. The problem with

I also understand that needing scratch locals would be annoying in many contexts, but for Binaryen specifically, we always just add scratch locals as necessary and optimize them out later. |
Is it crazy to suggest a combination of (4) and a (This starts to feel like the R stack in Forth.) |
Sorry, the cost I'm referencing is the implementation complexity of producers. On the one hand, we have a solution where we just add a shorthand instruction, engines do some simple analysis during baseline compilation (at whatever scale they find is actually beneficial), and all producers get nearly all of the potential benefit. On the other hand, we have a solution where we add non-defaultable locals and rules for initialization, engines are forced to do specifically that validation, and producers must add patchwork for dealing with the limitations of those rules in order to receive any benefit. Yes, there are a number of details and variations I am overlooking, but none of those seem to affect the big picture of the options we are facing. Looking at this, I am having a hard time understanding why people would want the latter over the former. The former provides more performance benefits (because it applies to all producers), whereas the latter imposes more burdens on engines and producers. |
In the absence of null check elimination in the baseline compiler, the performance benefit of non-nullable locals is quite significant. One of the things we have seen in practice is a 9% overall peak performance improvement from non-nullable locals in a situation where V8 failed to optimize a single hot function. I have seen similar improvements for time-to-first-frame measurements (which cover a mixture of compilation, baseline code and optimized code), so the isolated benefit for raw baseline code performance is likely somewhat bigger than that. |
In case that hasn't been clear, I'd also be totally fine to settle on 1a! As far as I'm concerned, it's the best compromise given all the constraints and positions, while still being sufficiently conservative. And it would help to get more extensive practical experience with it before generalising to something more complex that we do not know yet is sufficiently justified. |
This is the critical condition. Others' comments have indicated that we could efficiently add basic null-check elimination to the baseline compiler and it would recover the vast majority of the benefits of non-nullable locals. After all, WebAssembly burdens producers with generating semi-structured control flow for the explicit purpose of making it easy for engines to optimize. Here we have an opportunity to justify that burden. |
We can efficiently add basic null-check elimination to the baseline compiler for non-nullable locals if initialization of non-nullable locals obeys monotonicity. The |
This is going backwards - either we're replaying the monotonicity argument over Your remaining argument against |
I'm fine moving forward with |
I forget where, but I remember @kripken made a comment somewhere that indicated monotonicity was not particularly necessary for the majority of benefits either. @kripken, is my understanding incorrect? |
I don't think it matters much, but yes, even without monotonicity a baseline compiler could do pretty well on removing non-null checks. See footnote 7 in the doc for a summary + links (starting from the second sentence in that footnote). But, again, I don't think it matters. While a baseline compiler can do it, all the other options than All options here have pluses and minuses. |
Thanks for the clarification and the additional good point about how explicit NNLs help save time in analyses. From what producers have described, 1a is no less of a burden on producers than 1c (or arguably 1b). And it sounds like it offers no benefits to engines over defer+NNLs. That is, the column for 1a would look very similar to 1c, and the table summary still strongly suggests that defer+NNLs (or possibly even defer, but that would require more explicit experiments to judge) has substantially more pragmatic advantages over 1a/1b/1c.
My argument is that if you have the choice between a design that burdens producers and one that does not and you choose the former, then you should be able to provide a justification why. Since I posed the question of what the justifications would be for 1a over defer, a number of advantages of NNLs over just defer have been offered. Like defer, NNLs alone do not place a real burden on producers. But 1a does, and the only advantage given for 1a has been for simple interpreters that appear not to be used in practice. When producers push on why we require semi-structured control flow, we tell them it facilitates optimization and point them to the (I believe) roughly 2x speed-up in the optimizing tier. When producers push on why we require 1a over simple engine analysis (facilitated by semi-structured control flow), are we going to point them to speed-up in a class of interpreters that aren't deployed in browsers? |
Personally, I trust the stakeholders involved here to make their own concerns known. We can always investigate extensions to

I think it's important that we find a way to move on from this conversation as (IMO) it's disrupting the wider work of the group, and |
It looks like we have a consensus coalescing around 1a (resetting initialization state at the end of every block) for the MVP. Let's try to resolve this discussion right now.

**Official online unanimous consensus vote**

React to this comment with a 👎 to object to the proposed resolution. If we have any 👎 by the end of day on Thursday this week, we will discuss this at the next meeting and resolve this conversation then instead. If there are no 👎 at the end of day Thursday, we will adopt 1a as the solution for the MVP. |
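For concreteness, here is a rough sketch of how 1a would validate, assuming a non-nullable local counts as initialized after a `local.set` and that this status is reset at the end of the enclosing block (the type and names are hypothetical):

```wat
(func $f (param $p (ref $T))
  (local $x (ref $T))            ;; non-defaultable local, initially uninitialized
  (block
    (local.set $x (local.get $p))
    (drop (local.get $x))        ;; ok: $x is initialized within this block
  )
  ;; (drop (local.get $x)) here would NOT validate under 1a:
  ;; $x's initialization status was reset at the end of the block above.
  (local.set $x (local.get $p))
  (drop (local.get $x))          ;; ok again after re-initializing $x
)
```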
Excellent, we will go with option 1a for the MVP. |
It's great to finally have a solution here! @askeksa-google raised one detail a couple of days ago: we still have a choice to make regarding Contrary to The alternative is to reset NNL initialization status at every Regarding forward compatibility: we cannot change our minds and go from "not resetting" to "resetting". Changing behavior in the opposite direction is not a problem. This is independent of whether we eventually adopt "1b", "1c", or neither, or something else entirely. In particular, if we go with "not resetting" now, we can still do "1b" or "1c" later, we'll just have to maintain "not resetting" at that time (which is not a problem for "1b" and "1c"). I think either option is better than spending another two years debating this, but we should pin it down one way or the other. I'm going to try an unofficial poll about this to see where we stand as a group. Please 👍 one of the next five posts: |
"I strongly believe that we should not reset NNL initialization status at the |
"If it was up to me, I'd say we should not reset NNL initialization status at the |
"I totally don't care one way or the other." |
"If it was up to me, I'd say we should reset NNL initialization status at the |
"I strongly believe that we should reset NNL initialization status at the |
It sounds OK to me at first blush to not reset at loop ends, because the only way to get there is falling through. |
@titzer this would mean that we break forwards compatibility with the version of |
The current PR does "reset". Frankly, I don't think there is much of a choice here, since not resetting is incompatible with a uniform interpretation of block types under 1b and thus not really conservative, as we intended to be. |
Ok, fair points. I agree that resetting is the conservative choice. |
FYI, this has been implemented in V8. |
With the decision to support non-nullable locals in WasmGC as per WebAssembly/function-references#44 the support in dart2wasm for forcing all locals to be nullable is no longer needed. This CL removes that support and cleans up some related nullability issues. Specifically:

- Remove the `--local-nullability` and `--parameter-nullability` commandline options. These are now always enabled.
- Clean out special cases around forced nullable locals throughout the compiler.
- Make `thisLocal` and `preciseThisLocal` always non-nullable.
- Make `returnValueLocal` (for storing the return value of `return` statements inside `try` blocks with `finally`) always defaultable, since its initialization flow crosses control constructs.
- Make type argument parameters non-nullable.
- Make non-nullable `FutureOr` translate to a non-nullable Wasm type.
- Implement the "initialized until end of block" validation scheme in the Wasm instruction validator.
- Run tests with the `--experimental-wasm-nn-locals` option. This is likely going away soon, but for now we need it.

Change-Id: I05873dd70510af5944d86d37cb5765c7bdef73a9
Cq-Include-Trybots: luci.dart.try:dart2wasm-linux-x64-d8-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/252600
Commit-Queue: Aske Simon Christensen <[email protected]>
Reviewed-by: Joshua Litt <[email protected]>
Reviewed-by: William Hesse <[email protected]>
We have heard from multiple directions (interpreters, toolchains, debuggers) that `let` is (surprisingly) hard to implement/use. As I am looking into implementing `let` in V8's non-optimizing compiler, add my name to that list. It's certainly all doable, but it's quite involved. And it would be sad if engine/interpreter implementors had to spend particular implementation effort (as well as runtime CPU+memory cost) on a feature that toolchains then barely use.

So I was wondering whether the `let` instruction as currently proposed really is the best solution we can collectively come up with. IIUC, its primary purpose is to provide a way to have non-defaultable locals (in particular: non-nullable reference type locals) in a function. Please do point out if it has additional important use cases.
In the spirit of brainstorming, here are some alternatives that would solve that problem too:
(1) A "locals_initialized_barrier" instruction with semantics: before the barrier, locals may be uninitialized/null, and reading from them incurs the cost of a check; after the barrier, such checks are dropped as all locals are guaranteed to be initialized. Execution of the barrier checks that all non-defaultable locals have been initialized.
(2) A scope that indicates "within this scope, the given locals are non-null". Entering the scope performs null checks on the specified locals. Accessing these locals within the scope needs no null checks.
(3) Introduce "local initializers" modeled after the existing global initializers (which are solving the same problem). We'd need to figure out how exactly to encode these in the text and binary formats. Execution of a function would begin with evaluating these local initializers; afterwards all locals are guaranteed to be initialized. Similar to globals, the rules would be something like: only constant instructions are allowed as local initializers; they can read other locals but only those with lower indices.
(4) Require all locals to be pre-declared (at least by count, maybe also by type). Their initialization then still happens with `let` as currently proposed. That would prevent the size of the locals list/array/stack from changing dynamically, and would also keep each local's index constant throughout the entire function.

(5) Drop `let` entirely, at least for now. We can always add it later if we have enough evidence of a concrete need for it. In the meantime, a workaround is to factor out the body of what would have been a `let`-block as a function, and call that; see the sketch after this list. A JIT might still decide to inline that function. (This would limit a Wasm module's ability to fine-tune for maximum performance; but based on binaryen's feedback it's unclear to what extent they'd do that anyway. This is not my preferred solution, just mentioning it here for completeness.)

(I realize that there is a lot of conceptual overlap between these ideas -- which is unsurprising given that they're all solving the same problem, just with slightly different approaches.)
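A rough sketch of that outlining workaround, with hypothetical type and function names (the `struct.get` merely stands in for whatever the `let` body would have done):

```wat
;; Instead of binding a value as a new non-nullable local via `let`,
;; pass it as a parameter to an outlined helper function. Parameters
;; are always initialized, so they may have non-nullable types.
(func $outlined-let-body (param $x (ref $T)) (result i32)
  (struct.get $T 0 (local.get $x)))  ;; no null check needed on $x

(func $caller (param $r (ref $T)) (result i32)
  (call $outlined-let-body (local.get $r)))
```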
I'm sure there are other possibilities, please suggest them!
If the ideas above have critical flaws making them unsuitable, please point them out! And if you have a favorite among them, please say so as well.
As I wrote above, this issue is meant to be a collective brainstorming (and maybe eventually decision-making) effort.
We don't have to change anything; I just wanted to make sure that this design has received sufficient contemplation before we set it in stone. Especially in light of the feedback we've been getting so far.