Initial addition of builtin steps #39

Merged: 16 commits, Aug 21, 2024


ajklein (Contributor) commented Aug 9, 2024

Adds steps for all string builtins.

Several different approaches are taken for referencing JS
operations, depending on what the JS spec exposes:

  • Where there's already an abstract operation of the right form, reference that directly
  • Where there's only a JS function, use the Call operation and a reference to the function
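
The second approach can be sketched roughly as follows. This is only an illustration, not the spec text: `call` and `charCodeAtViaCall` are hypothetical names, and `call` is a loose analogue of the spec's Call operation. The function reference is captured once so that later changes to `String.prototype` do not affect it.

```javascript
// Capture the functions once, before any other code can replace them.
const originalCharCodeAt = String.prototype.charCodeAt;
const reflectApply = Reflect.apply;

// Hypothetical helper: a loose analogue of Call(F, V, argumentsList).
function call(fn, thisValue, args) {
  return reflectApply(fn, thisValue, args);
}

// Hypothetical builtin body that goes through the captured reference.
function charCodeAtViaCall(string, index) {
  return call(originalCharCodeAt, string, [index]);
}

// Simulate user code replacing the method after the capture.
String.prototype.charCodeAt = function () {
  return -1;
};

const viaCall = charCodeAtViaCall("abc", 1); // uses the captured function
const viaLookup = "abc".charCodeAt(1);       // uses the replaced method

// Restore the method so the rest of the program is unaffected.
String.prototype.charCodeAt = originalCharCodeAt;

console.log(viaCall, viaLookup);
```

The captured reference keeps returning the code unit for `"b"`, while the ordinary property lookup sees the replacement.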

Also clean up some of the underlying infra:

  • Make UnwrapString an abstract op
  • Try to use the same names for things in more places
  • Fix a few Bikeshed warnings

Major TODOs include:

  • How to treat errors as traps for Wasm callers
  • Whether to do something more formal for the builtins which operate on i16 arrays

ajklein marked this pull request as ready for review on August 16, 2024 at 18:50.

ajklein (Contributor, Author) commented Aug 16, 2024

@eqrion this is now ready for review

eqrion (Collaborator) left a comment

This looks good! There are probably some small improvements to the formalization that we could make, but this is much better than the JS we had before.

Two high level questions, that can get addressed later:

  1. Are we always using the 'original' value of the JS builtin functions (like String.charCodeAt)? Or do we need to add extra language to this effect?
  2. How do we handle the cases where we 'trap'? I looked through the wasm3.0 branch (which has EH), and it's not clear to me how wasm traps are uncatchable by wasm. But I think it's useful for performance to keep that the case here for these new builtins.

Comment on lines +1865 to +1867
Note: This function only takes a mutable i16 array defined in its own recursion group.
If this is an issue for toolchains, we can look into how to relax the function type
while still maintaining good performance.

Review comment (Collaborator):

Let's drop the second line of "If this is an issue...good performance". It's not clear how we could do that at this point, and no one has seemed to complain.

The |funcType| of this builtin is `(func (param externref (ref null (array (mut i16))) i32) (result i32))`.

Note: This function only takes a mutable i16 array defined in its own recursion group.
If this is an issue for toolchains, we can look into how to relax the function type

Review comment (Collaborator):

same as above.

1. Let |string| be [=?=] [$UnwrapString$](|string|).
1. Let |stringLength| be the [=string/length=] of |string|.
1. Let |arrayLength| be the number of elements in |array|.
1. If |start| + |length| > |arrayLength|

Review comment (Collaborator):

I think this needs to be |start| + |stringLength| here.
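
A minimal sketch of the corrected bounds check, assuming a `Uint16Array` as a stand-in for the Wasm `(array (mut i16))` and a hypothetical `trap` helper that throws a `WebAssembly.RuntimeError`. The null check, its ordering, and the unsigned treatment of `start` are assumptions; only the string unwrapping, the two lengths, and the comparison come from the steps quoted above. The builtin has no `length` parameter, so the check has to compare against the string's own length.

```javascript
// Hypothetical stand-in for trapping; the real trap semantics are still a TODO.
function trap(message) {
  throw new WebAssembly.RuntimeError(message);
}

// Sketch of the builtin: copies the string's code units into `array` at `start`.
function intoCharCodeArray(string, array, start) {
  if (array === null) trap("null array");                // assumed null check
  if (typeof string !== "string") trap("not a string");  // rough UnwrapString
  start = start >>> 0;                                   // treat the i32 as unsigned (assumption)
  const stringLength = string.length;
  const arrayLength = array.length;
  // The corrected check: start + stringLength, not start + length.
  if (start + stringLength > arrayLength) trap("array too small");
  for (let i = 0; i < stringLength; i++) {
    array[start + i] = string.charCodeAt(i);
  }
  return stringLength;
}

const target = new Uint16Array(4);
const written = intoCharCodeArray("hi", target, 1);
console.log(written, Array.from(target));
```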

eqrion (Collaborator) commented Aug 21, 2024

I'm just going to merge this and then do the fixes myself, plus some extra stuff.

eqrion merged commit a3c7562 into WebAssembly:main on Aug 21, 2024 (7 checks passed)
ajklein (Contributor, Author) commented Aug 21, 2024

> This looks good! There are probably some small improvements to the formalization that we could make, but this is much better than the JS we had before.

Thanks for taking a look and merging.

> Two high level questions, that can get addressed later:
>
>   1. Are we always using the 'original' value of the JS builtin functions (like String.charCodeAt)? Or do we need to add extra language to this effect?

I talked to @syg about this, and I think what we have will be clear enough for now. But some improvements we could make include:

  1. Using the %String.prototype.charCodeAt% nomenclature in the text, which is how the ES spec refers to originals.
  2. Asking the ES spec to refactor more of the internals of its string operations into abstract ops, to avoid even going through the Call operation.

>   2. How do we handle the cases where we 'trap'? I looked through the wasm3.0 branch (which has EH), and it's not clear to me how wasm traps are uncatchable by wasm. But I think it's useful for performance to keep that the case here for these new builtins.

Agreed that the trap behavior continues to be the oddest part of this. I think we probably want to avoid the "throw" language altogether and specify that these trap "as if" they were implemented in Wasm. But I don't think there's any precedent for this today, so we may have to make something up. Curious if @rossberg, @conrad-watt, or @tlively have thoughts here.

tlively (Member) commented Aug 21, 2024

How bad would it be to have the builtin operations throw normal JS errors instead of trapping? I hope the performance impact would be negligible if the erroring path could be placed with other cold code. Throwing a normal JS error would also make polyfills simpler, and would even allow implementing the error path by calling out to JS to redo the operation using the canonical implementation.
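
As a rough sketch of that idea (all names here are hypothetical, and the choice of `TypeError` and `RangeError` is an assumption, not something the proposal specifies): the fast path handles the common case inline, and every failing case falls through to a separate "canonical" function that redoes the checks and throws.

```javascript
// Hypothetical canonical implementation: does every check and throws ordinary JS errors.
function canonicalCharCodeAt(string, index) {
  if (typeof string !== "string") {
    throw new TypeError("expected a string");
  }
  const i = index >>> 0; // treat the i32 index as unsigned (assumption)
  if (i >= string.length) {
    throw new RangeError("index out of bounds");
  }
  return string.charCodeAt(i);
}

// Hypothetical fast path: only the success case is handled inline.
function fastCharCodeAt(string, index) {
  const i = index >>> 0;
  if (typeof string === "string" && i < string.length) {
    return string.charCodeAt(i);
  }
  // Cold path: redo the operation with the canonical implementation,
  // which produces the appropriate error.
  return canonicalCharCodeAt(string, index);
}

console.log(fastCharCodeAt("abc", 0));
```

Keeping the error construction out of the fast path is what would let an engine place it with other cold code.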

Looking at the JS Wasm spec updated for EH, it does contain this line:

> Execute the WebAssembly instructions (ref.exn address) (throw_ref).

If we can just arbitrarily execute WebAssembly instructions from the JS spec (which is more than a little questionable IMO; instructions only have semantics within a context), then we can get the trap by writing "Execute the WebAssembly instruction (unreachable)."

rossberg (Member) commented

@eqrion:

> I looked through the wasm3.0 branch (which has EH), it's not clear to me how wasm traps are uncatchable by wasm.

Technically, by not being exceptions at all. In the Wasm semantics, they are a completely separate form of result that exception handlers don't recognise.

They are only converted to (JavaScript) exceptions at the JS API boundary, by means of hand-wavy words. And as far as I can tell, they are never converted the other direction. That is, in a sandwich scenario, if a Wasm trap reaches JS, it converts to a JS exception, and if that reaches Wasm again, then it just materialises as a random wrapped JS exception with the JS exception tag.

At least that's what the JS API currently seems to imply. To be honest, I'm not sure whether that was intended, whether implementations actually agree, or whether we have any tests for that behaviour. Before EH, that was not observable, but now it is.
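
The first half of that sandwich can be observed from JS today. The sketch below hand-encodes a tiny module (an assumption made for illustration; it is not taken from the proposal) whose only exported function, here named `trap`, executes `unreachable`. At the JS API boundary the trap surfaces as a `WebAssembly.RuntimeError`.

```javascript
// A minimal Wasm module: (func (export "trap") unreachable)
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // magic + version
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,                         // type section: () -> ()
  0x03, 0x02, 0x01, 0x00,                                     // function section: func 0 has type 0
  0x07, 0x08, 0x01, 0x04, 0x74, 0x72, 0x61, 0x70, 0x00, 0x00, // export section: "trap" = func 0
  0x0a, 0x05, 0x01, 0x03, 0x00, 0x00, 0x0b,                   // code section: no locals; unreachable; end
]);

const instance = new WebAssembly.Instance(new WebAssembly.Module(bytes));

let caught;
try {
  instance.exports.trap(); // the Wasm trap crosses the JS API boundary here
} catch (e) {
  caught = e;              // in JS it is an ordinary, catchable exception
}

const isRuntimeError = caught instanceof WebAssembly.RuntimeError;
console.log(isRuntimeError);
```

If JS code rethrew `caught` into an outer Wasm frame, nothing in this object marks it as having started life as a trap, which is the gap described above.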

@tlively:

> If we can just arbitrarily execute WebAssembly instructions from the JS spec (which is more than a little questionable IMO; instructions only have semantics within a context), then we can get the trap by writing "Execute the WebAssembly instruction (unreachable)."

Agreed on both counts. :)

Maybe dealing with traps generally needs to be made more precise in the JS API at this point.

dschuff (Member) commented Aug 28, 2024

+1 on all of these:
You are right that traps never get converted back (that would happen in section 3.10 of "create a host function").
And I think it makes sense to make the trap handling more precise, and probably keep traps "uncatchable" rather than letting them stay as JS exceptions when they propagate back into wasm from JS in the sandwich scenario.
I'm not 100% sure what would be the best way to do that.
We could have traps come out to JS as a WebAssembly.Trap, and we'd add another case where we'd just unwind instead of throwing something into wasm (or just executing unreachable, but see below), but that would be a breaking change. Otherwise we'd need some other way to know that a particular JS value was originally created by propagating a trap out of wasm.
Probably we would want to have some notion of preserving a trap's "identity" the way we do with exceptions (to keep a stack trace for the original trap location instead of creating a new one).

And yes, executing random wasm instructions from the JS API (or this proposed "just unwind instead of throwing" idea, which is equally hand-wavy) would be good to improve. Probably we'd want to augment the embedder API, but what we're talking about isn't exactly an API call, so it seems nontrivial.

ajklein deleted the add-builtins branch on September 4, 2024 at 17:40.