Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change witx unions to be tagged #220

Merged
merged 17 commits into from
Feb 18, 2020
Merged

Change witx unions to be tagged #220

merged 17 commits into from
Feb 18, 2020

Conversation

pchickey
Copy link
Contributor

@pchickey pchickey commented Feb 5, 2020

Prior to this PR, witx's union was exactly like a C union - described a set of variants that occupy the same location in memory, but gave no information about which variant was present/valid. All uses of union in the WASI spec were found in structs directly following an enum that tagged the union (indicated which variant was present/valid). Doc comments described the mapping of enum tags to union variants.

This implicit specification of tagged unions is hard for our growing suite of automated code-generation tools to deal with automatically. Many programming languages do not have a concept of an untagged union, and even in those that do, its typically safer and easier to use a tagged union concept. Additionally, the concept of an untagged union is not compatible with the interface types proposal, where we expect to have ADTs available (a generalized form of tagged unions) but not untagged unions.

This PR changes the way witx union is defined to require a tag (enum) type identifier to be specified as the first argument:

(typename $t (enum $a $b))
(typename $u (union $t (field $a u8) (field $b f32)))

Each member of the tag enum must be given a corresponding field in the union. A new empty field syntax (empty $c) is provided for tags that do not have data associated with them.

This change to the witx syntax and semantics allows us to replace all of the implicit tagged union definitions in the WASI witx specs to be explicit tagged unions. All of the uses in the WASI specs are completely ABI-compatible with the new form. In other words, the changes in this PR do not change the functional specification of WASI, they just change the way in which we specify it. Old implementations of WASI snapshot 0 or 1 should continue to work against new implementations that use the new syntax.

This PR makes updates to all of the witx specs describing WASI phases in this repository, so that the existing $event_u, $subscription_u, and $prestat_u unions are annotated with the enums that tag them. In event_u and subscription_u, I had to make a separate variant (with the same type) for fd_read and fd_write rather than use the same fd_readwrite variant for each. This is a semantic difference that code generators can present to the programmer (i.e. the C union generated by wasi-headers will be slightly different) but at the ABI level, all fd_read and fd_write variants are represented in-memory exactly as before.

The use sites of unions were simple to translate. The prestat_u type was able to be renamed to prestat, and the existing prestat struct erased, because it was just a pair of tag and union. For event and subscription, I was able to erase the tag member from the struct and leave just the union. The memory layout specified for tagged unions in src/layout.rs is equivelant to the tag, followed by the untagged union, as two struct members in sequence.

Because ABI compatibility for existing WASI snapshots is critical to accepting this PR, I'm going to stage PRs on all of the witx-consuming tools in the ecosystem presently (wasi-headers in https://github.com/cranestation/wasi-libc, https://github.com/bytecodealliance/wasi, wig in https://github.com/bytecodealliance/wasmtime- please ping me if I'm missing any) that update them to use these changes, and test all of the new implementations to make sure they are indeed ABI compatible with the existing WASI ecosystem.

I will add comments and links to this PR as I complete work on those tools, and wait to merge this PR until all feedback has been addressed and everything is ready to go.

@pchickey
Copy link
Contributor Author

pchickey commented Feb 8, 2020

Unfortunately I discovered that I had overlooked the way the struct layout of event would change with this scheme. I don't believe there is any way to rescue this approach without making all the downstream consumers special-case C layout rules (which are complicated enough without special cases!) just for the sake of this one struct working across witx versions.

Instead, I'm going to rework this to make tagged unions a new sort of datatype, use those in ephemeral & snapshot 2 onwards, and set us on a path to deprecate untagged unions in witx once snapshot 1 is no longer in use.

@pchickey pchickey closed this Feb 8, 2020
@pchickey pchickey reopened this Feb 12, 2020
@pchickey
Copy link
Contributor Author

I found a way to maintain ABI compatibility and move forward with this approach:

The event struct originally contained a union event_u with just one variant fd_readwrite. This means we can eliminate the event_u union, replacing its use with the type of the fd_readwrite variant, and leave a note to users to ignore this field for the clock variants.

We can upgrade to using a tagged union for event_u in ephemeral, because we expect to extend event types in the future. We don't have to worry about ABI stability until that becomes a snapshot, so its fine to change the layout with this PR.

prior to this PR, event_u was a union with only one variant
fd_readwrite. the change into tagged union ended up changing the
memory layout of the event struct. so, rather than use tagged
unions in the event struct, we just use a struct that you ignore
in the case of the clock event, which has the same ABI as previously.

in ephemeral, we use a tagged union for events, because we dont have
any ABI stability guarantees to maintain there.
this is much more convenient for consumers, and it reflects the
requirement of the syntax to provide an Id
@pchickey
Copy link
Contributor Author

I now believe this work is complete. The PR comment on WebAssembly/wasi-libc#165 has been updated with a comparison of the layout of each union-related type before and after this change. The static assertions indicate that clang agrees with the layout calculations made in this PR, and manual inspection indicates that the layout is the same as prior to this PR.

Copy link
Member

@sunfishcode sunfishcode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@pchickey pchickey merged commit 85df508 into master Feb 18, 2020
@pchickey pchickey deleted the pch/tagged_unions branch February 18, 2020 20:31
@pchickey pchickey mentioned this pull request Dec 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants