Spec tests that conflict with mature proposals #1788

tlively · 2024-08-21T00:01:50Z

There are several spec tests that test that a given module is malformed, where that same module would be considered well-formed under some mature-but-not-yet-standard proposal. For example, there is a test that modules with multiple memories are malformed, but this test is invalidated by the multi-memory proposal, and indeed the multi-memory proposal adds tests in the other direction, testing that modules with multiple memories are well-formed.

These contradictory tests are a source of complexity for engine and tool developers when they implement mature proposals like multi-memory because they either must change their decoding behavior based on enabled features or must selectively skip tests to resolve conflicts. WebAssembly/tool-conventions#233 discusses an instance of real-world toil caused by the former strategy, and of course the latter strategy is not ideal, either.

I propose that we reduce the complexity for implementers here by updating the spec tests to avoid conflicts with mature proposals. Note that we already have to update the spec tests this way once a proposal is finished and merged into the spec; I'm simply proposing that we remove the conflicts earlier.

Concretely, I propose that we add the following item to phase 3, where it is expected that engines and tools will implement the proposal and run into these conflicts:

During this phase, the following proceeds in parallel:
...

Upstream spec tests are updated to avoid conflicts with the feature

Does this seem like a good idea?

tlively · 2024-08-21T00:02:37Z

cc @dschuff @alexcrichton @fitzgen

alexcrichton · 2024-08-21T14:55:54Z

Thanks for writing this up! I might expand upon the proposed wording to say:

Upstream spec tests are updated so an implementation supporting this proposal passes all upstream spec tests with this proposal enabled.

Basically clarifying that when the proposal-in-question is enabled that the upstream spec tests still pass which would require changing binary formats/etc if necessary and encapsulates the goal of being able to run all the spec tests with a bleeding-edge implementation.

alexcrichton · 2024-08-21T16:20:44Z

I commented elsewhere that if we ignored enabled/disabled proposals in binary decoding there was only a few failures, but turns out I didn't investigate that enough and forgot some other tests, specifically tests like this which assert multi-memory is not implemented.

This makes me realize that fixing this is not going to be that easy because it also involves sending PRs to a lot of other proposals which have additionally copied this new test too. For example if an upstream spec test is updated then any proposal created before that fix will also need to be updated which can end up in a lot of PRs to a lot of places and can delay running spec tests.

Ideally I think this could establish a few guidelines both for proposals and engine/tooling authors, along the lines of:

For proposal authors all new tests are placed in a well-known directory in the forked repo. That way tooling like https://github.com/WebAssembly/testsuite can only pick up new tests across proposals.
If proposal authors change an upstream spec test in their fork they're recommended to upstream the change as well to modify the test. This part I think is somewhat tricky since the proposed wording here says that "mature proposals" should do this but engines/toolilng still want to run tests for new proposals too.
Engine/tooling authors are recommended to support proposal flags in their validators but not elsewhere (e.g. binary decoders). This can help create alignment between proposal authors and engine/tooling implementors that validation is the "choke point" for accepting or rejecting a new proposal.
Engine/tooling authors are recommended to ignore the error messages in (assert_malformed ...) and (assert_invalid ...) (this is counter-intuitive to me because it seems like it could paper over real issues but I don't think alternatives are better)

This is pretty similar to what's proposed here but a major change would be to segregate new tests into a specific directory rather than adding them to existing tests. Upon merging to the main specification they could get merged into the main tests, but it would be extremely helpful if test conflicts only required updating the main upstream spec rather than all active proposals that have forked the specification too.

tlively · 2024-08-21T16:32:48Z

How is this different from the current practice of putting new tests in test/core/current_proposal/new_test.wast? e.g. https://github.com/WebAssembly/gc/tree/main/test/core/gc.

I do think it would be good to avoid updating upstream spec tests to accommodate phase 1 proposals, but maybe it would be ok for phase 2 proposals. Either way, if an engine is implementing a very early phase proposal, I think there will still be some extra testing complexity no matter what.

alexcrichton · 2024-08-21T18:19:22Z

Oh that's perfect, but it's just not applied across all proposal repositories right now. That means that the testsuite repository isn't able to avoid duplication of upstream tests across all of the proposal repos.

fitzgen · 2024-08-21T18:49:07Z

How is this different from the current practice of putting new tests in test/core/current_proposal/new_test.wast? e.g. https://github.com/WebAssembly/gc/tree/main/test/core/gc.

I think we should make this a requirement rather than a recommendation, and be strict about it for every single little test.

The multi-memory proposal, for example, would not fix/update/remove the existing no-multi-memory test that Alex linked above. That test would be left as-is.

Instead, it would create new multi-memory tests in new files inside the proposal-specific directory (the way that most tests are done).

This allows engines/tools to run the tests and use just a test's filename/path to determine which features to enable or disable, which means that both the old no-multi-memory test can pass when multi-memory is disabled and the multi-memory tests can pass when it is enabled.

And then we would have the final step where, when we merge a proposal into the main spec, we remove the old spec tests that were asserting against things that are now always allowed, like the old no-multi-memory test.

fitzgen · 2024-08-21T18:51:23Z

We could implement this by making the testsuite aggregator script ignore all the tests outside of a proposal's test subdirectory — even if the proposal repo ignored the above requirement/recommendation and made changes to tests outside that subdirectory.

tlively · 2024-08-21T19:01:00Z

How is this different from the current practice of putting new tests in test/core/current_proposal/new_test.wast? e.g. https://github.com/WebAssembly/gc/tree/main/test/core/gc.

I think we should make this a requirement rather than a recommendation, and be strict about it for every single little test.

That sounds good to me.

This allows engines/tools to run the tests and use just a test's filename/path to determine which features to enable or disable, which means that both the old no-multi-memory test can pass when multi-memory is disabled and the multi-memory tests can pass when it is enabled.

I still think it's worth updating the upstream tests to avoid conflicts in phase 3, since otherwise this requires using feature flags in the decoder, which as we've seen, can be a source of inflexibility with real-world consequences.

fitzgen · 2024-08-21T19:02:36Z

I still think it's worth updating the upstream tests to avoid conflicts in phase 3, since otherwise this requires using feature flags in the decoder, which as we've seen, can be a source of inflexibility with real-world consequences.

This sounds good to me 👍

rossberg · 2024-08-28T11:55:41Z

@tlively:

I do think it would be good to avoid updating upstream spec tests to accommodate phase 1 proposals, but maybe it would be ok for phase 2 proposals.

That would be somewhat in conflict with the process. In theory, a feature is not considered standardised before phase 4, and may still be redesigned (or even rejected) in a way that obsoletes such a change. So it may be a bit problematic to upstream corresponding changes before that.

@fitzgen:

The multi-memory proposal, for example, would not fix/update/remove the existing no-multi-memory test that Alex linked above. That test would be left as-is.
Instead, it would create new multi-memory tests in new files inside the proposal-specific directory (the way that most tests are done).

How would this work? Even with an implementation that has a feature flag (which not all will have), how does it recognise the outdated tests when the flag is turned on and distinguishes them from unexpected failures?

A clean way for dealing with such obsolete negative(*) tests would be to somehow annotate such tests. For example, imagine test infras recognise an annotation (@obsolete-by <proposal-name>) in wast that indicates to ignore a test if the named proposal is meant to be implemented/active. Then (a) an implementation implementing proposal(s) early can identify tests to ignore and (b) it is sufficient to have this annotation in the proposal repo, with little risk of conflicts when combining multiple such repos. The test is physically removed only when the proposal is actually merged.

(*) I believe our backwards-compat requirements imply that this problem only exists for negative tests.

tlively · 2024-08-28T14:50:43Z

I would be happy with an obsoleted-by annotation.

fitzgen · 2024-08-28T17:16:51Z

How would this work? Even with an implementation that has a feature flag (which not all will have), how does it recognise the outdated tests when the flag is turned on and distinguishes them from unexpected failures?

The way that Wasmtime does this is based on the file path. If the test is under proposals/multi-memory then our test runner enables the multi-memory flag, for example. If a test is in the main test directory then it will not enable the multi-memory flag (until the multi-memory proposal merges into the main spec).

(@obsolete-by <proposal-name>)

Is the idea that a proposal reaches phase 3 and then, as described by @tlively in the OP, the champions would add this annotation to the upstream spec for any tests that get obsoleted by their proposal? Are they also responsible for adding it to all other proposals that have forked the spec tests? I think this annotation would solve the issue in the context of a single test file or a single copy of the test suite in either the main repo or a proposal repo, but doesn't address the reality that engines and toolchains are testing against what are effectively multiple copies of the test suite, at various states of lagging-behind-main, when they are supporting multiple proposals.

I think #1788 (comment) is still the best way to avoid O(proposals) work and have something that is automated with low effort.

rossberg · 2024-08-28T19:13:37Z

If a test is in the main test directory then it will not enable the multi-memory flag (until the multi-memory proposal merges into the main spec).

Are you saying that none of the pre-existing tests would run with the new feature turned on? That's very sub-ideal, isn't it? Many old tests might potentially be affected by the implementation of a new feature.

Is the idea that a proposal reaches phase 3 and then, as described by @tlively in the OP, the champions would add this annotation to the upstream spec for any tests that get obsoleted by their proposal?

No, no upstream changes at all. You only annotate (individual test assertions) as obsolete in the respective proposal repo. And since it's just an annotation, it should usually merge effortlessly when combining multiple proposal repos. This addresses exactly that scenario, but without losing all the existing test coverage for new features.

rossberg mentioned this issue Aug 29, 2024

Alignment test failure on the latest testsuite WebAssembly/multi-memory#55

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spec tests that conflict with mature proposals #1788

Spec tests that conflict with mature proposals #1788

tlively commented Aug 21, 2024

tlively commented Aug 21, 2024

alexcrichton commented Aug 21, 2024

alexcrichton commented Aug 21, 2024

tlively commented Aug 21, 2024

alexcrichton commented Aug 21, 2024

fitzgen commented Aug 21, 2024

fitzgen commented Aug 21, 2024 •

edited

Loading

tlively commented Aug 21, 2024

fitzgen commented Aug 21, 2024

rossberg commented Aug 28, 2024

tlively commented Aug 28, 2024

fitzgen commented Aug 28, 2024

rossberg commented Aug 28, 2024

Spec tests that conflict with mature proposals #1788

Spec tests that conflict with mature proposals #1788

Comments

tlively commented Aug 21, 2024

tlively commented Aug 21, 2024

alexcrichton commented Aug 21, 2024

alexcrichton commented Aug 21, 2024

tlively commented Aug 21, 2024

alexcrichton commented Aug 21, 2024

fitzgen commented Aug 21, 2024

fitzgen commented Aug 21, 2024 • edited Loading

tlively commented Aug 21, 2024

fitzgen commented Aug 21, 2024

rossberg commented Aug 28, 2024

tlively commented Aug 28, 2024

fitzgen commented Aug 28, 2024

rossberg commented Aug 28, 2024

fitzgen commented Aug 21, 2024 •

edited

Loading