Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Final opcodes #372

Merged
merged 13 commits into from
Jul 31, 2023
Merged

Final opcodes #372

merged 13 commits into from
Jul 31, 2023

Conversation

rossberg
Copy link
Member

@rossberg rossberg commented May 9, 2023

Here is my proposal for the final opcode reordering (fixes #337 and #370).

Trying to group related instructions together and leave space for future extensions in respective opcode ranges.

Also reordering type opcodes to group nullary constructors together and before non-nullary ones (c.f. #337).

@jakobkummerow
Copy link
Contributor

Looks OK.

The bulk array instructions (from #363) appear to be missing.

Personally, I would have compacted the opcode encoding space a lot more, essentially to assign a contiguous range. My reasoning is that whatever gaps we choose to leave now have an exceedingly small likelihood to turn out to have appropriate size, so as we add instructions in future proposals we'll likely end up with a wildly fragmented mix of remaining gaps that were too large and others that were too small where we then had to simply append new opcodes at the end of the range or squeeze them into "non-fitting" gaps. OTOH, it's not a big issue because (1) at the end of the day we're just choosing between different kinds of ugly for the encoding space a couple of proposals down the line, and (2) prettiness or ugliness of the binary encoding doesn't actually matter.

@tlively
Copy link
Member

tlively commented May 9, 2023

FWIW, we tried to leave useful gaps in the SIMD proposal opcode space, and they did not turn out to be useful at all. If we had instead used a compact opcode space, we might not have needed to use an extra LEB byte for the relaxed SIMD opcodes.

I don't feel strongly either way, though.

@titzer
Copy link
Contributor

titzer commented May 9, 2023

+1 to just compacting the opcode space without gaps. The 1-byte opcode space is important, but IMO there is marginal benefit, if any, in leaving holes in the prefixed opcode spaces.

@askeksa-google
Copy link

Nit: I'd suggest to keep different encodings for the same instruction together, i.e. put ref.test (ref null ht) at 0xfb41and ref.cast (ref ht) at 0xfb42.

@rossberg
Copy link
Member Author

I agree that we should fill up gaps before going to two bytes, but currently we are far away from hitting that boundary, and never may. Is there any particular advantage in compacting before that? I thought that leaving some gaps actually worked reasonably well for some of the Wasm 1.0 code space.

@askeksa-google, good point, fixed.

@jakobkummerow
Copy link
Contributor

Is there any particular advantage to leaving gaps?

FWIW, here's the summary:
This PR currently lists 27 0xfb-prefixed instructions (still missing 4 array bulk instructions), and pads them with 53 gaps. These gaps are located in the following ranges of opcodes:

  • 6 in "struct allocators"
  • 4 in "struct accessors"
  • 11 in "array allocators" (4 of which might be used for the bulk instructions)
  • 11 in "array accessors" (4 of which might be used for the bulk instructions)
  • 5 in "i31 instructions"
  • 6 in "extern conversions"
  • 4 in "cast instructions", non-branching group
  • 6 in "cast instructions", branching group (assuming 0x4f is seen as the end of the range)

One data point to illustrate the low likelihood of gap sizes being lucky guesses: Aske has previously suggested a set of 8 branching i31 instructions, which (if we end up deciding to adopt them in a future proposal) wouldn't have an obvious fit in any of these gaps.

@jakobkummerow jakobkummerow mentioned this pull request Jun 1, 2023
@titzer
Copy link
Contributor

titzer commented Jun 1, 2023

I don't think we should leave gaps in the prefixed opcode spaces. First, as mentioned above, I think it's unlikely we'll predict the size of gaps properly, meaning there will likely always be vestigial gaps. Second, leaving gaps reduces the number of two-byte encodings being utilized; in the future, will new instructions start filling in gaps if they'd otherwise overflow into the 3 byte space? Third, engines may way to use some of the unused opcode space for internal things, like quickened versions of bytecodes, which benefits interpreters.

@rossberg rossberg mentioned this pull request Jun 22, 2023
53 tasks
@rossberg
Copy link
Member Author

Coming back to this:

  • We left gaps in the MVP opcodes, and that proved relatively useful, both for new types and for new control instructions.
  • The proposed opcodes do not just leave gaps, they also provide some additional prefixing hints. For example,
    • 0xfb0X struct instructions
    • 0xfb1X/0xfb2X array instructions
    • 0xfb3X other reference instructions
    • 0xfb4X cast instructions
      I occasionally found such a thing useful in the past when I had to stare at binary code.
  • I happily believe this approach failed for SIMD, but that's because SIMD is extremely large and irregular.
  • I'm not sure I see the disadvantages. Note that we still have plenty of room in the single-byte space. We can still start filling up gaps or flow over into other prefixes once the need arises.

@titzer, I didn't follow your point about custom opcodes. Isn't that orthogonal?

@titzer
Copy link
Contributor

titzer commented Jun 26, 2023

@titzer, I didn't follow your point about custom opcodes. Isn't that orthogonal?

It's a minor point, but it's often useful to have a few opcodes left over for the VM, in each of the opcode spaces. This is easier to do if the opcode space fills up from the lowest numbers first. It doesn't work very well to try to use an opcode that is currently a hole and later filled up.

We left gaps in the MVP opcodes, and that proved relatively useful, both for new types and for new control instructions.

The 1-byte opcode space is a much more important space, but in retrospect, I think we would have been fine to have not left holes in it.

@rossberg
Copy link
Member Author

This is easier to do if the opcode space fills up from the lowest numbers first.

I see, but this is mostly a question of leaving sufficient space at the end, right? Which I would think we have even now.

@tlively
Copy link
Member

tlively commented Jun 27, 2023

Let's discuss this at the subgroup meeting tomorrow. If we don't come to an agreement through discussion, I propose that we take a popularity vote to settle the issue. It doesn't seem important enough to be worth spending much additional energy or time on.

@titzer
Copy link
Contributor

titzer commented Jun 27, 2023

I won't be able to attend tomorrow because of a conflict, but I would generally prefer that we pack the prefixed opcode spaces rather than leaving holes. Is a wider CG discussion warranted, as this could be a potentially precedent-setting decision?

@askeksa-google
Copy link

If the purpose is to have some nice groupings, we could still do that while leaving significantly smaller holes, e.g.:

  • 0x00+: Struct instructions
  • 0x10+: Array instructions (including bulk)
  • 0x20+: Conversions (extern conversions from 0x24)
  • 0x28+: Casts

@tlively
Copy link
Member

tlively commented Jun 29, 2023

Is a wider CG discussion warranted, as this could be a potentially precedent-setting decision?

Maybe, but it seems extremely low priority and CG time is hard to come by these days.


We didn't have a critical mass of people to resolve this at the meeting yesterday, but I'd like to take the temperature here. We are only adding 31 prefixed opcodes, so even if we leave generous holes, we will not be close to overflowing to the second LEB byte, which would require 128 opcodes. Given that, do you:

🚀 : prefer leaving holes
❤️ : prefer not leaving holes
👀 : not care at all

@askeksa-google
Copy link

Whatever direction this goes in, I think it will have significant aesthetic value, and probably some practical value as well, if encodings that are obvious pairs (last 8 in the currently proposed ordering in particular) are aligned to only differ in the lsb.

@tlively
Copy link
Member

tlively commented Jul 18, 2023

As discussed at the last subgroup meeting, I've pushed a commit that simply removes all of the holes in the instruction opcode space but preserves the order of instructions.

Here's a sheet to help visualize the difference: https://docs.google.com/spreadsheets/d/1eaMdpL-MwLOwWgj-mWgv_tNq5crP_nemQ-PW54ss-Q8/edit?usp=sharing

Unfortunately, when you just close the holes like that, the obvious pairs of instructions all start on odd numbers, so they differ in their low two bits rather than just their lowest bit. I added a third column to the spreadsheet skipping a single opcode to "fix" that, but I don't think it's worth it.

| 0xfb43 | `ref.cast (ref null ht)` | `ht : heaptype` |
| 0xfb48 | `br_on_cast $l (ref null1? ht1) (ref null2? ht2)` | `flags : u8`, $l : labelidx`, `ht1 : heaptype`, `ht2 : heaptype` |
| 0xfb49 | `br_on_cast_fail $l (ref null1? ht1) (ref null2? ht2)` | `flags : u8`, $l : labelidx`, `ht1 : heaptype`, `ht2 : heaptype` |
| 0xfb02 | `struct.get $t i` | `$t : typeidx`, `i : fieldidx` | struct accessors (0x08+) |
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The notes about group prefixes are no longer accurate.

| 0xfb1a | `ref.test (ref null ht)` | `ht : heaptype` |
| 0xfb1b | `ref.cast (ref ht)` | `ht : heaptype` |
| 0xfb1c | `ref.cast (ref null ht)` | `ht : heaptype` |
| 0xfb1d | `br_on_cast $l (ref null1? ht1) (ref null2? ht2)` | `flags : u8`, $l : labelidx`, `ht1 : heaptype`, `ht2 : heaptype` |
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| 0xfb1d | `br_on_cast $l (ref null1? ht1) (ref null2? ht2)` | `flags : u8`, $l : labelidx`, `ht1 : heaptype`, `ht2 : heaptype` |
| 0xfb1d | `br_on_cast $l (ref null1? ht1) (ref null2? ht2)` | `flags : u8`,` $l : labelidx`, `ht1 : heaptype`, `ht2 : heaptype` |

| 0xfb1b | `ref.cast (ref ht)` | `ht : heaptype` |
| 0xfb1c | `ref.cast (ref null ht)` | `ht : heaptype` |
| 0xfb1d | `br_on_cast $l (ref null1? ht1) (ref null2? ht2)` | `flags : u8`, $l : labelidx`, `ht1 : heaptype`, `ht2 : heaptype` |
| 0xfb1e | `br_on_cast_fail $l (ref null1? ht1) (ref null2? ht2)` | `flags : u8`, $l : labelidx`, `ht1 : heaptype`, `ht2 : heaptype` |
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| 0xfb1e | `br_on_cast_fail $l (ref null1? ht1) (ref null2? ht2)` | `flags : u8`, $l : labelidx`, `ht1 : heaptype`, `ht2 : heaptype` |
| 0xfb1e | `br_on_cast_fail $l (ref null1? ht1) (ref null2? ht2)` | `flags : u8`, `$l : labelidx`, `ht1 : heaptype`, `ht2 : heaptype` |

| -0x31 | `rec dt*` | `dt* : vec(subtype)` | |
| -0x32 | `sub final $t* st` | `$t* : vec(typeidx)`, `st : strtype` | shorthand |
| -0x31 | `sub final $t* st` | `$t* : vec(typeidx)`, `st : strtype` | shorthand |
| -0x34 | `rec dt*` | `dt* : vec(subtype)` | |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the interpreter wasn't changed accordingly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, thanks. Fixed. Also changed the rec opcode to -0x32, not sure why I picked something else before.

@askeksa-google
Copy link

I do think it's worth skipping an opcode to align the pairs, especially since for ref.test and ref.cast, the lsb then becomes a null flag with the same meaning as the null flags in br_on_cast[_fail].

We still stay within the first 32 opcodes.

@rossberg
Copy link
Member Author

FWIW, if we care for bits that way, shouldn't we also make it so that any _u/_s pairs of instructions only differ by LSB?

@tlively
Copy link
Member

tlively commented Jul 22, 2023

Perhaps, although I would say that struct.get{,_u,_s} and array.get(,_u,_s} are triples, so it doesn't matter so much there, leaving only i31.get_u and i31.get_s. @askeksa-google, do you have an opinion here?

@rossberg
Copy link
Member Author

I notice that this is already not the case for some MVP instructions, so I think we can leave it as is.

@tlively
Copy link
Member

tlively commented Jul 25, 2023

Cool, let's put this into something of a "final comment period." If you are ok with the currently proposed encoding, please give this comment a 🚀 react. Otherwise, please give it an 👀 react and comment with a specific change you would like to see. After a week or so, if we have consensus, we can go ahead and merge this.

@tlively
Copy link
Member

tlively commented Jul 31, 2023

Thanks, everyone. I'll go ahead and merge this now since there have been no objections.

@tlively tlively merged commit 9acb6e2 into main Jul 31, 2023
5 checks passed
@tlively tlively deleted the opcodes branch July 31, 2023 15:59
tlively added a commit to WebAssembly/function-references that referenced this pull request Aug 6, 2023
Update the encodings for ref.as_non_null, br_on_null, (ref ht), and (ref null
ht) for consistency with the final encodings chosen in
WebAssembly/gc#372.
tlively added a commit to WebAssembly/function-references that referenced this pull request Aug 6, 2023
Update the encodings for ref.as_non_null, br_on_null, (ref ht), and (ref null
ht) for consistency with the final encodings chosen in
WebAssembly/gc#372.

Fixes #103.
tlively added a commit that referenced this pull request Aug 6, 2023
Fix various places in the spec text and interpreter where binary encodings were
inconsistent with those we decided on in #372.
@tlively tlively mentioned this pull request Aug 7, 2023
tlively added a commit that referenced this pull request Aug 28, 2023
Fix various places in the spec text and interpreter where binary encodings were
inconsistent with those we decided on in #372.
tlively added a commit that referenced this pull request Aug 28, 2023
tlively added a commit that referenced this pull request Aug 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Binary encoding of value type codes with heap type indices
6 participants