[js-api] JS API exposes function identity #1351

Open
RossTate opened this issue Jun 8, 2020 · 21 comments

Comments

@RossTate

RossTate commented Jun 8, 2020

In many discussions it has been suggested that funcref values intentionally do not have a notion of identity (i.e. you cannot ask if two funcref values are equal) in order to enable a number of common optimizations with functions. However, the JS API exposes the identity of funcref values, meaning none of those optimizations would be valid in browsers. As two trivial examples just to help clarify the issue, if you have two identical function definitions, you cannot merge them, or if you have a function that just calls an imported function, you cannot simply use the imported function in place of that function.

Was this intentional?

(More generally speaking, any time the JS API can do more with a wasm value than wasm itself can, then that will typically mean that observationally equivalent wasm programs will not be observationally equivalent in browsers, giving wasm effectively two semantics.)
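A minimal sketch of the first example above, assuming a runtime that provides the standard `WebAssembly` global (a browser or Node.js). The module bytes are hand-assembled for illustration and the export names `f`, `g`, and `f2` are hypothetical. It shows that two byte-identical function definitions are distinguishable from JS, while two exports of one definition are not, so a tool that merged `f` and `g` would change what JS observes.

```javascript
// Hand-assembled module (an assumption for this sketch, not from the thread):
//   (func $f (result i32) i32.const 42)
//   (func $g (result i32) i32.const 42)   ;; identical body to $f
//   (export "f" (func $f)) (export "g" (func $g)) (export "f2" (func $f))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic and version
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,       // type section: () -> i32
  0x03, 0x03, 0x02, 0x00, 0x00,                   // function section: two funcs of type 0
  0x07, 0x0e, 0x03,                               // export section: three exports
  0x01, 0x66, 0x00, 0x00,                         //   "f"  -> func 0
  0x01, 0x67, 0x00, 0x01,                         //   "g"  -> func 1
  0x02, 0x66, 0x32, 0x00, 0x00,                   //   "f2" -> func 0
  0x0a, 0x0b, 0x02,                               // code section: two bodies
  0x04, 0x00, 0x41, 0x2a, 0x0b,                   //   i32.const 42; end
  0x04, 0x00, 0x41, 0x2a, 0x0b,                   //   i32.const 42; end
]);

const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes));
const { f, g, f2 } = instance.exports;

// Inside wasm the two definitions behave identically.
const sameResult = f() === g();

// From JS, the same definition yields the same function object...
const sameDefinitionSameObject = f === f2;

// ...but identical definitions yield distinct function objects.
const identicalDefinitionsSameObject = f === g;

console.log({ sameResult, sameDefinitionSameObject, identicalDefinitionsSameObject });
```

If an optimizer merged the two definitions, `f === g` would flip to `true`, which is the kind of JS-visible difference the issue is about.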

@rossberg
Member

rossberg commented Jun 9, 2020

The spec mentions this. It is unfortunate, but generally seems unavoidable, given the interop constraints. What alternative would you suggest?

@RossTate
Author

RossTate commented Jun 9, 2020

Ah, thanks for the pointer!

What alternative would you suggest?

I would not have used an Exported Function cache. Of course, in theory this would mean executing all the steps of "create a new Exported Function" every time, but in practice all those steps could be cached, conceptually as an object's prototype. So the only practical difference is that the change would mean allocating a new object with the given (hidden) prototype each time.

@rossberg
Member

rossberg commented Jun 9, 2020

Hm, "every time" would mean an allocation on every JS-side access to a table, or a function global, or a higher-order call. Besides the cost, wouldn't it be semantically dubious if you were to get a different value each time you access an immutable global?

@RossTate
Author

RossTate commented Jun 9, 2020

Not really; JS code just wouldn't rely on object identity for these values, which is what we want (and which I suspect is pretty common anyways). Plus, if it's an immutable global, you can always cache the value on the JS side anyways if you really want to maintain identity. That cache could be built into the standard JS API, or it could be left to the user, but either way that doesn't affect the semantics of WebAssembly.
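A rough sketch of the idea in the paragraph above, written as a user-level emulation rather than actual JS API behavior. `freshGet` and `cachedGet` are hypothetical helpers, and the module bytes are hand-assembled for illustration (one function placed in an exported table `tbl`, which bumps an exported counter `count`). The cache is keyed by table index, which is only sound while that slot is never overwritten.

```javascript
// Hand-assembled module (an assumption for this sketch):
//   (global $count (mut i32) (i32.const 0))
//   (func $bump (global.set $count (i32.add (global.get $count) (i32.const 1))))
//   (table (export "tbl") 1 funcref) (elem (i32.const 0) $bump)
//   (export "count" (global $count))
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic and version
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,                   // type section: () -> ()
  0x03, 0x02, 0x01, 0x00,                               // function section: one func
  0x04, 0x04, 0x01, 0x70, 0x00, 0x01,                   // table section: funcref, min 1
  0x06, 0x06, 0x01, 0x7f, 0x01, 0x41, 0x00, 0x0b,       // global section: mut i32 = 0
  0x07, 0x0f, 0x02,                                     // export section: two exports
  0x03, 0x74, 0x62, 0x6c, 0x01, 0x00,                   //   "tbl"   -> table 0
  0x05, 0x63, 0x6f, 0x75, 0x6e, 0x74, 0x03, 0x00,       //   "count" -> global 0
  0x09, 0x07, 0x01, 0x00, 0x41, 0x00, 0x0b, 0x01, 0x00, // element section: tbl[0] = func 0
  0x0a, 0x0b, 0x01, 0x09, 0x00,                         // code section: one body, no locals
  0x23, 0x00, 0x41, 0x01, 0x6a, 0x24, 0x00, 0x0b,       //   count = count + 1; end
]);
const { tbl, count } = new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports;

// Current behavior: repeated fetches return the same JS function object.
const sameToday = tbl.get(0) === tbl.get(0);

// Emulation of "allocate a new object each time" (hypothetical helper).
function freshGet(table, index) {
  const fn = table.get(index);
  return fn === null ? null : (...args) => fn(...args);
}
const sameWhenFresh = freshGet(tbl, 0) === freshGet(tbl, 0);

// Identity restored on the JS side by whoever wants it (hypothetical helper).
const cache = new Map();
function cachedGet(table, index) {
  if (!cache.has(index)) cache.set(index, freshGet(table, index));
  return cache.get(index);
}
const sameWhenCached = cachedGet(tbl, 0) === cachedGet(tbl, 0);

// The wrappers still call through to the wasm function.
cachedGet(tbl, 0)();
const callsObserved = count.value;

console.log({ sameToday, sameWhenFresh, sameWhenCached, callsObserved });
```

The point of the sketch is that the cache lives entirely in JS, so whether it exists does not change what wasm itself can observe.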

This problem will become more and more prominent. exnref is running into the same thing (which is what prompted me to think of this). With the GC proposal, there will be a number of implementation strategies and optimizations made impossible if everything implicitly has an identity. It also exposes implementation details of wasm modules that their maintainers might come to wish were kept concealed.

So while I see how the tradeoffs led to this design decision for functions, I am dubious that those tradeoffs will extend to other values in the same way, at which point we have to address these issues anyways. And then we'd be very close to having one universal semantics for wasm.

@lukewagner
Member

It seems unfortunate for JS/wasm interop perf if each time wasm passes a funcref to JS a new JS function object must necessarily be created. I do like the "always generate a new object" semantics for new things we introduce that are not equality-comparable, though (we were talking about this recently for module/instance exports, which also shouldn't have identity). If equality-comparability is intrinsically part of the definition (instead of being exclusively derived from the static type at the wasm-to-JS boundary), then new kinds of functions in the future could be defined to be non-equality-comparable and those new functions could get the "new object each time" semantics.

@RossTate
Author

I wouldn't be surprised if the performance cost of allocation would be negligible in the vast majority of cases (likely none of which exist currently). Plus, there are costs to not concealing function identities; they're much more indirect costs, but they might be more significant. For the cases where the performance cost is notable, I suspect the better way to address them is to provide some stronger type.

@taralx

taralx commented Jun 10, 2020

Is the idea here to require a new object, or to permit a new object? The latter basically says "don't count on function identity in the embedding", but permits the embedder to reuse the object.

@RossTate
Author

In other words, if ever the JS API gets a funcref from a WebAssembly instance in some context where it can't guarantee it's the same as some other previously-retrieved funcref without looking at the identity of the funcrefs, then the idea is to require a new object. Otherwise the embedder is necessarily exhibiting behavior that exposes (i.e. depends upon) the identity of the funcref.

@Ms2ger

Ms2ger commented Jun 15, 2020

Is the idea here to require a new object, or to permit a new object? The latter basically says "don't count on function identity in the embedding", but permits the embedder to reuse the object.

I imagine "permit" would lead to observably different behavior between browsers, which is extremely undesirable.

Separately, any proposed changes to the JS API that are not backwards compatible need very strong supporting evidence that no code will be broken and that the significant effort to make the change is justified.

@RossTate
Author

Separately, any proposed changes to the JS API that are not backwards compatible need very strong supporting evidence that no code will be broken and that the significant effort to make the change is justified.

Yep. What I'm trying to gauge here is whether there's interest and, even if there isn't in doing so for funcref, whether we should try to do this for future types without identity (and possibly add identity to funcref so that there's no longer a disparity in semantics).

@RossTate
Author

For next meeting's agenda (i.e. not tomorrow), I'm planning on suggesting a discussion on what to do about the disparity between wasm and JS's notions of identity. Dunno if it will come to any decision, or even if that'd be a goal for that specific day, but it seems worth getting broader understanding of. Of course, more offline discussion ahead of time will help make the online discussion more fruitful.

To that end, here are two thoughts on the costs of exposing identity.

  1. If funcref has exposed identity in JS, and it's a subtype of anyref (should we add it back), then should anyref also have exposed identity in JS? If so, then all references have exposed identities, ruling out a number of optimizations and implementation strategies (especially regarding GC of references to immutable content...like funcrefs). If not, then to be well-behaved with respect to subtyping, every conversion from anyref to JS would likely need to check if the anyref is a funcref and then either expose its identity or allocate a new object based on that, making casting more expensive. That is, at a high level I'm concerned that having one type with a leaky abstraction will leak into many types with a leaky abstraction.

  2. Especially given that the JS API is the most prominent embedder for WebAssembly, tools for producing, analyzing, and manipulating (e.g. optimizing) WebAssembly will have to worry about identity. So the compilers generating WebAssembly from other languages will have to worry about identities that otherwise do not exist in the source language. And then tools like binaryen will have to make sure to preserve various identities even if they don't exist within WebAssembly's own spec. Maintainers of wasm modules will have to consider what to do about users of their modules developing dependencies on identities that ostensibly should not exist. That is, I'm concerned that discrepancies between the JS API and WebAssembly will impede the various infrastructures around WebAssembly.

Hoping to hear more thoughts!

(Also, given that there are limited ways in which to get a funcref from a module right now, I suspect the likelihood of anything useful truly depending on funcref identities in JS is very low at the moment. But with the upcoming release of the Reference Types proposal, there will be more ways to get a funcref, which increases this likelihood. So if we want to do a breaking change to address these problems, the time for that is as soon as possible.)

@rossberg
Member

  1. If funcref has exposed identity in JS, and it's a subtype of anyref (should we add it back), then should anyref also have exposed identity in JS?

The plan of record is to introduce an eqref type that would be a subtype of anyref and only includes those reference types that allow equality. See e.g. the GC proposal (this even used to be part of the reference types proposal but was deferred).

If not, then to be well-behaved with respect to subtyping, every conversion from anyref to JS would likely need to check if the anyref is a funcref and then either expose its identity or allocate a new object based on that, making casting more expensive.

This problem doesn't come from subtyping specifically. For performance reasons, an engine may often want to represent certain reference types differently on both sides (a wrapper object in JS, a direct handle to the thing in Wasm), so for some reference types it has to perform a mapping at the boundaries. Consequently, it may be unavoidable in general that the conversions ToJSValue and ToWAValue have to dispatch on the type of the object.

For anything other than anyref the engine has to inspect the value's type anyway (at least in the JS->Wasm direction), since it has to check that it matches the static type.

So in general, I believe this is an unavoidable cost. We can merely tweak the potential for optimisation.

  2. Especially given that the JS API is the most prominent embedder for WebAssembly, tools for producing, analyzing, and manipulating (e.g. optimizing) WebAssembly will have to worry about identity.

I agree this is unfortunate. The simplest (and lamest) solution would be to say that the identity of JS objects returned by ToJSValue is not specified for certain types. Or we do what you suggested elsewhere and specify that it's always fresh.

However, on second thought, such a semantics would probably only be applicable to a minority of reference types. For functions, we can't afford to break backwards compatibility. For GC objects, you usually want to preserve identity (at least when they are mutable). For externref it naturally has to be preserved. For exotic types like exnref there hardly ever is a reason to pass them out anyway, so the overhead would be irrelevant.

The one relevant case I can think of is immutable structs, possibly.

Also, given that there are limited ways in which to get a funcref from a module right now, I suspect the likelihood of anything useful truly depending on funcref identities in JS is very low at the moment.

Based on my experience with JS usage in the wild, I would bet quite a fortune that this assumption is false and that we'd break the web if we changed the semantics. :( The JS ecosystem and some of its APIs rely on function identity, cf. examples like removeEventListener.

@RossTate
Author

However, on second thought, such a semantics would probably only be applicable to a minority of reference types.

Even if it's a minority, not having identity for that minority can be hugely useful.

For GC objects, you usually want to preserve identity (at least when they are mutable).

When they are mutable, identity must be preserved. When they're immutable, whether or not they have identity varies a bunch across languages. Whether "usually" is the right word just depends on which languages are more prominent. If you bake decisions into the design based on expectations of which languages will be more prominent, then you help make those languages more prominent because they're better served by the design.

Based on my experience with JS usage in the wild, I would bet quite a fortune that this assumption is false and that we'd break the web if we changed the semantics. :( The JS ecosystem and some of its APIs rely on function identity, cf. examples like removeEventListener.

There is no reasonable comparison that can be made between JS usage of functions and of funcrefs in the wild. I don't know how big the difference is between the number of occurrences of functions and the number of occurrences of funcrefs in JS, but I imagine it is many orders of magnitude.

Also, the suggestion is not to have JS values representing funcrefs not have an identity, since every JS value has identity; the suggestion is to not have repeated fetches of funcrefs from wasm produce the same JS identity. So you can still use removeEventListener with the proposed change to JS funcrefs. What you can't do is rely on table.get returning the same JS funcref in order to remove the event listener you added from table.get. So it's not even occurrences of funcrefs in JS that we need to worry about, which is already likely very low; it's just repeated accesses to table.get returning the same value, which is likely even much lower than the occurrences of funcrefs in JS.
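A sketch of the distinction drawn above, assuming a runtime with the standard `WebAssembly`, `EventTarget`, and `Event` globals (a browser or a recent Node.js). The module bytes are hand-assembled for illustration, and `freshGet` is a hypothetical user-level emulation of "a new object on each fetch", not actual JS API behavior. Each scenario reports how many times the listener still fires after the removal attempt.

```javascript
// Hand-assembled module (an assumption for this sketch): one function that
// increments an exported mutable global "count", stored at index 0 of an
// exported funcref table "tbl".
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic and version
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,                   // type section: () -> ()
  0x03, 0x02, 0x01, 0x00,                               // function section: one func
  0x04, 0x04, 0x01, 0x70, 0x00, 0x01,                   // table section: funcref, min 1
  0x06, 0x06, 0x01, 0x7f, 0x01, 0x41, 0x00, 0x0b,       // global section: mut i32 = 0
  0x07, 0x0f, 0x02,                                     // export section: two exports
  0x03, 0x74, 0x62, 0x6c, 0x01, 0x00,                   //   "tbl"   -> table 0
  0x05, 0x63, 0x6f, 0x75, 0x6e, 0x74, 0x03, 0x00,       //   "count" -> global 0
  0x09, 0x07, 0x01, 0x00, 0x41, 0x00, 0x0b, 0x01, 0x00, // element section: tbl[0] = func 0
  0x0a, 0x0b, 0x01, 0x09, 0x00,                         // code section: one body, no locals
  0x23, 0x00, 0x41, 0x01, 0x6a, 0x24, 0x00, 0x0b,       //   count = count + 1; end
]);
const { tbl, count } = new WebAssembly.Instance(new WebAssembly.Module(bytes)).exports;

// Hypothetical emulation of the proposed semantics: a fresh callable per fetch.
function freshGet(table, index) {
  const fn = table.get(index);
  return (...args) => fn(...args);
}

// Adds a listener, tries to remove it, dispatches once, and returns how many
// times the wasm function ran (0 means the removal worked).
function firesAfterRemoval(getForAdd, getForRemove) {
  const before = count.value;
  const target = new EventTarget();
  target.addEventListener("tick", getForAdd());
  target.removeEventListener("tick", getForRemove());
  target.dispatchEvent(new Event("tick"));
  return count.value - before;
}

// Current semantics: re-fetching from the table happens to work.
const refetchToday = firesAfterRemoval(() => tbl.get(0), () => tbl.get(0));

// Emulated proposal: re-fetching no longer removes the listener.
const refetchFresh = firesAfterRemoval(() => freshGet(tbl, 0), () => freshGet(tbl, 0));

// Emulated proposal: holding on to the fetched value still works.
const handler = freshGet(tbl, 0);
const keptReference = firesAfterRemoval(() => handler, () => handler);

console.log({ refetchToday, refetchFresh, keptReference });
```

Only the middle pattern, which fetches the function a second time in order to remove it, would change behavior.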

For functions, we can't afford to break backwards compatibility.

According to this study, WebAssembly usage is still pretty low (2000 out of the top million sites). Plus websites that are using WebAssembly are likely very actively maintained (as it's only a few years old) or not critical to anything (i.e. experiments to set up). So even if one of those 2000 sites happened to be sensitive to specifically table.get returning the same or distinct values each time (which on its own I'm doubtful of), that sensitivity would either be quickly patched by its active maintainer or not noticed. I think stating we can't afford to break backwards compatibility is an overstatement both of the current observability of the change and of the current prevalence of WebAssembly.

@kripken
Member

kripken commented Jun 23, 2020

Plus websites that are using WebAssembly are likely very actively maintained (as it's only a few years old) or not critical to anything (i.e. experiments to set up).

I think there is far too much uncertainty to make that guess. As a counter-example I am familiar with, people ship games on the web using wasm which are not constantly maintained. They are published and left alone. (I'm not claiming that's the common use case, of course.)

@RossTate
Author

I think you’re right that that is a use case. More generally, there are likely sites that are essentially WebAssembly programs wrapped by some JS code to integrate it into the ecosystem (e.g. providing imports for sound and graphics). But these sites are unlikely to be relying upon repeated calls to table.get returning the exact same JS object each time.

@rossberg
Member

rossberg commented Jun 25, 2020

@RossTate:

Even if it's a minority, not having identity for that minority can be hugely useful.

Agreed in the abstract, yet such a hypothetical would need somewhat more concrete evidence to apply to specific cases. I think we are worrying about premature optimisation way too often lately.

I think stating we can't afford to break backwards compatibility is an overstatement both of the current observability of the change and of the current prevalence of WebAssembly.

TC39's experience with hoping to get away with a seemingly innocent change and then running into years of trouble could probably fill a little book by now. Sometimes even a single breaking line in a sufficiently relevant web page (or a library, which is the bigger problem) can practically kill a language change.

In this particular case it's not difficult to imagine a plausible usage pattern that would break. We would need to have sufficient confidence that such a pattern does not exist, or if it does, that it (and all downstream dependencies) are still actively maintained, and the CG can identify and convince all responsible devs to change their code beforehand. TC39 has done that successfully on a couple of limited occasions.

TC39 has sometimes tried to gauge the likelihood of breakage beforehand by having browsers implement telemetry usage counters that survey the wild for a specific pattern. But that approach produces an increasingly weak signal as the amount of off-web usage of JS (and Wasm) grows.

@RossTate
Author

I think we are worrying about premature optimisation way too often lately.

Leaving the opportunity to optimize open is different from prematurely optimizing.

We would need to have sufficient confidence that such a pattern does not exist

Yes, and I'm trying to have us discuss that topic now, while there's still a reasonable possibility, rather than just have us give up prematurely.

In this particular case it's not difficult to imagine a plausible usage pattern that would break.

Then demonstrate it. It's much easier to find a program that would break than to prove that no programs would break. At the moment, after filtering out all the testing code I'm having a hard time finding programs on GitHub that even use Table.get. Anyone know of any examples that would help inform the discussion?

@taralx

taralx commented Jun 26, 2020

I'm working on assembling a corpus of wasm found "in the wild", so to speak. I'll let you know what I find.

@RossTate
Author

RossTate commented Jul 1, 2020

@taralx, that's a very cool undertaking! Do you think you'll have some data in time for Tuesday's meeting? If not, would it make sense to push this back to the following meeting?

@taralx

taralx commented Jul 2, 2020

Mmm, I do not want to commit anything yet, but I am also not sure anyone should wait for me because I don't know what I'll find. I'll do my best, but regardless you probably should go ahead on Tuesday.

@RossTate
Author

RossTate commented Jul 6, 2020

Thanks for the assessment! I'll stick with the timeline then.
