Initial addition of builtin steps #39

ajklein · 2024-08-09T20:55:07Z

Adds steps for all string builtins.

Several different approaches are taken for referencing JS
operations, depending on what the JS spec exposes:

- Where there's already an abstract operation of the right form, reference that directly
- Where there's only a JS function, use the Call operation and a reference to the function

Also clean up some of the underlying infra:

- Make UnwrapString an abstract op
- Try to use the same names for things in more places
- Fix a few Bikeshed warnings

Major TODOs include:

- How to treat errors as traps for Wasm callers
- Whether to do something more formal for the builtins which operate on i16 arrays

Needs more detail to properly integrate with GC array ops.

This avoids having to refer to actual Wasm instructions, since after all this is a host function.

ajklein · 2024-08-16T18:50:58Z

@eqrion this is now ready for review

eqrion

This looks good! There are probably some small improvements to the formalization that we could make, but this is much better than the JS we had before.

Two high level questions, that can get addressed later:

- Are we always using the 'original' value of the JS builtin functions (like String.charCodeAt)? Or do we need to add extra language to this effect?
- How do we handle the cases where we 'trap'? I looked through the wasm3.0 branch (which has EH), and it's not clear to me how wasm traps are uncatchable by wasm. But I think it's useful for performance to keep that the case here for these new builtins.

eqrion · 2024-08-20T15:25:07Z

document/js-api/index.bs

+Note: This function only takes a mutable i16 array defined in its own recursion group.
+If this is an issue for toolchains, we can look into how to relax the function type
+while still maintaining good performance.


Let's drop the second sentence of the note ("If this is an issue...good performance"). It's not clear how we could do that at this point, and no one seems to have complained.

eqrion · 2024-08-20T16:21:45Z

document/js-api/index.bs

+The |funcType| of this builtin is `(func (param externref (ref null (array (mut i16))) i32) (result i32))`.
+
+Note: This function only takes a mutable i16 array defined in its own recursion group.
+If this is an issue for toolchains, we can look into how to relax the function type


same as above.

eqrion · 2024-08-20T16:22:41Z

document/js-api/index.bs

+1. Let |string| be [=?=] [$UnwrapString$](|string|).
+1. Let |stringLength| be the [=string/length=] of |string|.
+1. Let |arrayLength| be the number of elements in |array|.
+1. If |start| + |length| > |arrayLength|


I think this needs to be |start| + |stringLength| here.
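For illustration, the bounds check with the suggested fix could be sketched in TypeScript as follows. This is a rough polyfill-style sketch, not the spec text: a `Uint16Array` stands in for the Wasm `(array (mut i16))`, `unwrapString` and `intoCharCodeArray` are hypothetical helper names, ordinary `TypeError`/`RangeError` exceptions stand in for traps, and both the unsigned treatment of `start` and the returned count of written code units are assumptions.

```typescript
// Hypothetical stand-in for the UnwrapString abstract op: reject non-strings.
function unwrapString(value: unknown): string {
  if (typeof value !== "string") {
    throw new TypeError("expected a JS string"); // stands in for a trap
  }
  return value;
}

// Sketch of intoCharCodeArray. A Uint16Array stands in for (array (mut i16)).
function intoCharCodeArray(
  value: unknown,
  array: Uint16Array | null,
  start: number
): number {
  if (array === null) {
    throw new TypeError("null array"); // stands in for a trap
  }
  const str = unwrapString(value);
  const stringLength = str.length;
  const arrayLength = array.length;
  const unsignedStart = start >>> 0; // assumption: treat the i32 as unsigned

  // The check under review: compare against the string's length.
  if (unsignedStart + stringLength > arrayLength) {
    throw new RangeError("string does not fit in array"); // stands in for a trap
  }
  for (let i = 0; i < stringLength; i++) {
    array[unsignedStart + i] = str.charCodeAt(i);
  }
  return stringLength; // assumption: number of code units written
}
```

With a five-element array, writing `"abc"` at `start` 2 fits exactly, while `start` 3 fails the check because 3 + 3 exceeds 5.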

eqrion · 2024-08-21T13:56:09Z

I'm just going to merge this and then do the fixes myself, plus some extra stuff.

ajklein · 2024-08-21T19:53:00Z

> This looks good! There are probably some small improvements to the formalization that we could make, but this is much better than the JS we had before.

Thanks for taking a look and merging.

> Two high level questions, that can get addressed later:
>
> Are we always using the 'original' value of the JS builtin functions (like String.charCodeAt)? Or do we need to add extra language to this effect?

I talked to @syg about this, and I think what we have will be clear enough for now. But some improvements we could make include:

- Using the %String.prototype.charCodeAt% nomenclature in the text, which is how the ES spec refers to originals.
- Asking the ES spec to refactor more of the internals of its string operations into abstract ops, to avoid even going through the Call operation.
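To make the "original value" question concrete, here is a minimal TypeScript sketch of how a polyfill might pin the original function. It is only an illustration of the concern, not what the spec text says: `originalCharCodeAt`, `originalApply`, and `charCodeAtBuiltin` are hypothetical names.

```typescript
// Capture the originals once, before any user code can replace them.
const originalCharCodeAt = String.prototype.charCodeAt;
const originalApply = Reflect.apply;

// Hypothetical builtin sketch: invoke the captured original with `str` as the
// receiver, roughly in the spirit of Call(original charCodeAt, string, « index »).
function charCodeAtBuiltin(str: string, index: number): number {
  return originalApply(originalCharCodeAt, str, [index]);
}
```

If a page later overwrites `String.prototype.charCodeAt`, a direct `str.charCodeAt(i)` call observes the replacement, while `charCodeAtBuiltin` still reaches the captured original.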

> How do we handle the cases where we 'trap'? I looked through the wasm3.0 branch (which has EH), and it's not clear to me how wasm traps are uncatchable by wasm. But I think it's useful for performance to keep that the case here for these new builtins.

Agreed that the trap behavior continues to be the oddest part of this. I think we probably want to avoid the "throw" language altogether and specify that these trap "as if" they were implemented in Wasm. But I don't think there's any precedent for this today, so we may have to make something up. Curious if @rossberg, @conrad-watt, or @tlively have thoughts here.

tlively · 2024-08-21T20:45:41Z

How bad would it be to have the builtin operations throw normal JS errors instead of trapping? I hope the performance impact would be negligible if the erroring path could be placed with other cold code. Throwing a normal JS error would also make polyfills simpler, and would even allow implementing the error path by calling out to JS to redo the operation using the canonical implementation.

Looking at the JS Wasm spec updated for EH, it does contain this line:

Execute the WebAssembly instructions (ref.exn address) (throw_ref).

If we can just arbitrarily execute WebAssembly instructions from the JS spec (which is more than a little questionable IMO; instructions only have semantics within a context), then we can get the trap by writing "Execute the WebAssembly instruction (unreachable)."

rossberg · 2024-08-28T12:15:46Z

@eqrion:

> I looked through the wasm3.0 branch (which has EH), it's not clear to me how wasm traps are uncatchable by wasm.

Technically, by not being exceptions at all. In the Wasm semantics, they are a completely separate form of result that exception handlers don't recognise.

They are only converted to (JavaScript) exceptions at the JS API boundary, by means of hand-wavy words. And as far as I can tell, they are never converted the other direction. That is, in a sandwich scenario, if a Wasm trap reaches JS, it converts to a JS exception, and if that reaches Wasm again, then it just materialises as a random wrapped JS exception with the JS exception tag.

At least that's what the JS API currently seems to imply. To be honest, I'm not sure whether that was intended, whether implementations actually agree, or whether we have any tests for that behaviour. Before EH, that was not observable, but now it is.
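The first half of that conversion (a trap surfacing in JS as an exception) can be observed with a small TypeScript sketch. It assumes a runtime that exposes the `WebAssembly` global, such as Node or a browser; the names `trapModuleBytes`, `trapInstance`, and `callAndDescribe` are made up for the example, and the module is hand-assembled. It does not exercise the full sandwich scenario, which would also need a JS import that Wasm calls back into.

```typescript
// `WebAssembly` is reached through `globalThis` with a loose type so the
// sketch does not depend on DOM or Node type declarations.
const WA = (globalThis as any).WebAssembly;

// Hand-assembled binary equivalent to:
//   (module (func (export "trap") unreachable))
const trapModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,             // type section: () -> ()
  0x03, 0x02, 0x01, 0x00,                         // function section: func 0 uses type 0
  0x07, 0x08, 0x01, 0x04, 0x74, 0x72, 0x61, 0x70, 0x00, 0x00, // export "trap" = func 0
  0x0a, 0x05, 0x01, 0x03, 0x00, 0x00, 0x0b,       // code: no locals, unreachable, end
]);

const trapInstance = new WA.Instance(new WA.Module(trapModuleBytes));

// Call the trapping export from JS and report what kind of error came out.
function callAndDescribe(): string {
  try {
    trapInstance.exports.trap();
    return "no error";
  } catch (e) {
    return e instanceof WA.RuntimeError ? "WebAssembly.RuntimeError" : "other error";
  }
}
```

Inside Wasm the `unreachable` is a trap rather than an exception; it only becomes a catchable `WebAssembly.RuntimeError` once it crosses into the JS caller.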

@tlively:

> If we can just arbitrarily execute WebAssembly instructions from the JS spec (which is more than a little questionable IMO; instructions only have semantics within a context), then we can get the trap by writing "Execute the WebAssembly instruction (unreachable)."

Agreed on both counts. :)

Maybe dealing with traps generally needs to be made more precise in the JS API at this point.

dschuff · 2024-08-28T18:45:05Z

+1 on all of these:
You are right that traps never get converted back (that would happen in section 3.10 of create a host function).
And I think it makes sense to make the trap handling more precise, and probably keep traps "uncatchable" rather than letting them stay as JS exceptions when they propagate back into wasm from JS in the sandwich scenario.
I'm not 100% sure what would be the best way to do that.
We could have traps come out to JS as a WebAssembly.Trap, and add another case where we just unwind instead of throwing something into wasm (or just execute unreachable, but see below), but that would be a breaking change. Otherwise we'd need some other way to know that a particular JS value was originally created by propagating a trap out of wasm.
Probably we would want to have some notion of preserving a trap's "identity" the way we do with exceptions (to keep a stack trace for the original trap location instead of creating a new one).

And yes, executing random wasm instructions from the JS API (or this proposed "just unwind instead of throwing" idea, which is equally hand-wavy) would be good to improve. Probably we'd want to augment the embedder API, but what we're talking about isn't exactly an API call, so it seems nontrivial.

ajklein added 16 commits August 9, 2024 11:42

Mark up options appropriately

5b629af

Make UnwrapString an abstract-op and use it to implement cast

970e9f6

Consistently use "steps" instead of "algorithm" for builtins

8ae777d

Add test

60ae9af

Add stubs for the rest of the operations

24b7a6a

Add fromCharCode

c85900b

Switch to using the Call abstract op, and add fromCodePoint

da36f03

Add length

7410c0d

Add charCodeAt and codePointAt

a25d2ae

Add concat

b2bd766

Add substring

08353e3

Use angle quotes appropriately

ebd3d5f

Add basic support for fromCharCodeArray

f2e5080

Needs more detail to properly integrate with GC array ops.

Make fromCharCodeArray slightly less formal

76b4db2

This avoids having to refer to actual Wasm instructions, since after all this is a host function.

Add intoCharCodeArray

c94a261

Add equals and compare

2631599

ajklein marked this pull request as ready for review August 16, 2024 18:50

eqrion approved these changes Aug 20, 2024


eqrion merged commit a3c7562 into WebAssembly:main Aug 21, 2024
7 checks passed

ajklein deleted the add-builtins branch September 4, 2024 17:40
