
Soft Freeze on new operation additions post Phase 3 #203

Closed
dtig opened this issue Mar 9, 2020 · 16 comments

@dtig
Member

dtig commented Mar 9, 2020

As the SIMD proposal is now at Phase 3 and getting closer to standardization, I'd like to propose a soft freeze on operation additions so that implementations and toolchains have the opportunity to implement and optimize the existing operations without too much churn.

The freeze does not include existing issues/PRs; it applies only to new proposals. The rationale is that, due to the nature of the SIMD proposal, it is always possible to keep adding new operations, and given that there is reasonable consensus (barring open issues) on the current proposal, it will be hard to make progress toward standardization without some sort of freeze.

This is a soft freeze: as more implementations get up to speed and more applications use the current SIMD proposal, it is possible that must-have operations will surface or that semantics will need to be tweaked. The bar for these, however, would be higher.

Some open questions -

  1. Is a deadline for the soft freeze useful (as opposed to it going into effect immediately)? I suspect not, as the proposal has already been under discussion for some time now, and since this is not a hard freeze, deadlines may be arbitrary and not useful.
  2. What should the bar be? This seems hard to codify in very concrete terms, but my thoughts are that a new operation should be very hard to emulate without an addition to the spec, and that the alternatives, if any, would cause performance cliffs (see the sketch below for what such emulation looks like). Additions of operations would be discussed on a case-by-case basis.
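For illustration, here is a minimal hypothetical sketch of the kind of emulation cost at issue, using Emscripten's wasm_simd128.h intrinsics. Wasm SIMD has no lane-wise 64-bit integer division, so a kernel that needs one must round-trip every lane through scalar code; the operation is chosen only to illustrate the cost profile, not as a proposal.

```c
#include <wasm_simd128.h>

/* Hypothetical: with no i64x2 division instruction, a lane-wise
 * divide falls off the vector unit entirely. Each lane is extracted,
 * divided in scalar code, and the vector is rebuilt. */
static inline v128_t i64x2_div_emulated(v128_t a, v128_t b) {
  return wasm_i64x2_make(
      wasm_i64x2_extract_lane(a, 0) / wasm_i64x2_extract_lane(b, 0),
      wasm_i64x2_extract_lane(a, 1) / wasm_i64x2_extract_lane(b, 1));
}
```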

I'll be pinning this issue as suggested in a previous meeting for visibility. Any thoughts/concerns about this approach?

dtig pinned this issue Mar 9, 2020
@dtig
Member Author

dtig commented Mar 9, 2020

This would also make it easier to maintain a consistent set of opcode numbers, as proposed in #129, going forward.

@jan-wassenberg

@dtig makes sense. I see no pressing need for a deadline/extension. Hard/expensive to emulate without a spec change sounds like a reasonable bar to me.

@zeux
Contributor

zeux commented Mar 11, 2020

I would propose something like:

  • hard/expensive to emulate in absence of spec change
  • well supported on 2+ major archs
  • useful in 2+ important use cases

Obviously all of this is vague (what are important use cases? what's the threshold for emulation complexity? etc.), but I think the goal is to think very carefully about any new instructions and to default to maintaining the current proposal, since solidifying it is (arguably) more important than adding a new instruction. Agreed that a deadline doesn't seem necessary.

@tlively
Member

tlively commented Mar 11, 2020

I like @zeux's idea for the threshold because it shares its axes of criteria with the current threshold and is simply stricter.

@dtig
Member Author

dtig commented Mar 17, 2020

Thanks all, I also like @zeux's idea of introducing an explicit threshold, and we all seem to be in agreement that an explicit deadline isn't necessary. Are there other ideas or concerns about this? If not, my next steps will be to add some explicit wording about this into the README, or somewhere that's easy to find.

@arunetm
Collaborator

arunetm commented Mar 17, 2020

Thanks @dtig for putting this together, and thanks @zeux for suggesting the explicit thresholds. General agreement on these will help ensure consistent evaluation of the current outstanding issues/proposals. The criteria and thresholds are reasonable imo; any disagreements?

Agree that no further explicit deadline is necessary, as this issue has been pinned and open for feedback. @abrown brought up a potential need to consider new instructions in rare cases where current instruction semantics cannot be calibrated effectively to address the issues. Do we need a provision for these? @abrown, anything to add?

@penzn
Contributor

penzn commented Oct 10, 2020

I think there is general consensus that we need to stick to some variant of the criteria @zeux proposed above (#203 (comment)):

  • hard/expensive to emulate in absence of spec change
  • well supported on 2+ major archs
  • useful in 2+ important use cases

However, I think we need a bit more clarity on those, especially on what we mean by important use cases. It is very easy to find examples of native instructions or matching intrinsics for most PRs (no harder than demonstrating that proposed instructions would perform well in microbenchmarks). I would like to propose the following as minimum criteria for a use case:

  • code (snippet from a software project)
  • the project should compile to Wasm or be in the process of porting
  • addition of proposed instruction(s) estimated to have measurable positive impact on performance

The first bullet might sound obvious, but we do have PRs that either cite no use cases or refer only to publications. With our current load of PRs approaching one per day, we need to think about how realistic it would be to evaluate each one on actual code. I don't think references to matching native intrinsics should be considered sufficient unless that code is in the process of being ported to Wasm.

It is important that the project have a Wasm port, even an experimental one (even one done by the person proposing the PR), that everybody can build. I think we should actually go a step further and track the application in the proposal. This is nothing out of the ordinary; performance work is done like this in the "native compilation" world.
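As a hypothetical illustration of what such a use-case snippet could look like (the kernel and function name are invented for this example, and the intrinsic names follow recent Emscripten wasm_simd128.h headers): a hot loop from an image-processing port whose vector path depends on one specific operation, here saturating unsigned 8-bit addition.

```c
#include <stddef.h>
#include <stdint.h>
#include <wasm_simd128.h>

/* Brighten an 8-bit grayscale image. The hot loop leans on
 * wasm_u8x16_add_sat; without it, each vector would need an extra
 * compare-and-select, or each pixel a scalar clamp. */
void brighten(uint8_t *pixels, size_t n, uint8_t delta) {
  v128_t vdelta = wasm_u8x16_splat(delta);
  size_t i = 0;
  for (; i + 16 <= n; i += 16) {
    v128_t px = wasm_v128_load(&pixels[i]);
    wasm_v128_store(&pixels[i], wasm_u8x16_add_sat(px, vdelta));
  }
  for (; i < n; i++) { /* scalar tail */
    unsigned v = pixels[i] + delta;
    pixels[i] = v > 255 ? 255 : (uint8_t)v;
  }
}
```

A PR citing a snippet like this would also point at the project it comes from and at before/after numbers, per the criteria above.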

@omnisip

omnisip commented Oct 10, 2020

@penzn

If I understand your suggestion correctly, it would be advantageous to support every intrinsic in the Intel library that proves to be cost-effective to implement and doesn't have an effective scalar fallback. Intel not ARM because porting with emscripten only supports or emulates Intel intrinsics today.

@Maratyszcza
Contributor

Maratyszcza commented Oct 10, 2020

@omnisip According to Emscripten documentation, ARM NEON intrinsics are supported too.
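For example, standard NEON code like the following compiles under Emscripten via its arm_neon.h compatibility header (a small invented snippet; exact build flags may vary by Emscripten version, with -msimd128 being the flag that enables Wasm SIMD):

```c
#include <arm_neon.h> /* mapped onto Wasm SIMD by Emscripten */

/* Add two float vectors of length 4 using NEON intrinsics. */
void add4(float *out, const float *a, const float *b) {
  vst1q_f32(out, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
}
```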

@penzn
Contributor

penzn commented Oct 10, 2020

Intel not ARM because porting with emscripten only supports or emulates Intel intrinsics today.

@omnisip please maintain professional tone.

CC @dtig @arunetm

@omnisip

omnisip commented Oct 10, 2020 via email

@arunetm
Collaborator

arunetm commented Oct 10, 2020

There might have been some miscommunication here; inferring tone in comments can get tricky. I hope we can move past that and focus on the criteria for inclusion, which is a very timely topic.

Thanks @penzn for adding more clarity to the criteria we had; these are very helpful. It will be great to see if we have general consensus around these clarifications as well. Performance figures need to be captured along with new proposals as we try to stabilize the MVP and move it forward.

If I understand your suggestion correctly, it would be advantageous to support every intrinsic in the Intel library that proves to be cost-effective to implement and doesn't have an effective scalar fallback.

@omnisip The SIMD instruction space is vast, and it's a non-goal for the MVP to support all of it. At a high level, it's ideal for the minimum viable product to include operations with good architecture support that are necessary for real-world Wasm SIMD use cases (this is further clarified through the criteria above by @zeux and @penzn).

It's great to see the new op proposals and associated use cases! Once instructions are standardized and in use, we cannot modify or drop them if they prove to be hidden perf cliffs for developers/tools, so I am in favor of being conservative about instruction inclusion for the MVP. As instruction sets and use cases will keep evolving even after the MVP, a process for proposing and standardizing instructions post-MVP is already in the works, so operations that have non-trivial implementation costs or use cases can be vetted as needed without holding the MVP back. We already have a sizable set of operations in this post-MVP bucket. If we can get to a general consensus on the criteria, we can use it as the guideline to triage the newer proposals. Great topic for the next SIMD sync. :)

(a few edits for clarity)

@lars-t-hansen
Contributor

I like the expanded criteria put forward by @penzn, but I think we should be even tougher:

  • The proposer of a new instruction commits to doing timely benchmarking work to validate the effectiveness of the instruction in a real application or set of applications (which should be identified in the proposal), cross-platform (or at least a not-just-Intel subset we can agree on), when provided with a prototype implementation in a VM and compiler

This hasn't been a problem so far (thanks to @Maratyszcza in large part), quite the contrary, so let's make it explicit.
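As a minimal sketch of what that benchmarking commitment could look like in practice (assuming an Emscripten build; kernel() is a placeholder for the identified application kernel, built once with and once without the prototyped instruction and timed in each engine):

```c
#include <stdio.h>
#include <emscripten/emscripten.h>

void kernel(void); /* placeholder: the use-case kernel under test */

int main(void) {
  /* Warm up so the engine's optimizing tier compiles the kernel. */
  for (int i = 0; i < 1000; i++) kernel();
  double t0 = emscripten_get_now(); /* milliseconds */
  for (int i = 0; i < 100000; i++) kernel();
  double t1 = emscripten_get_now();
  printf("%.3f ms for 100k iterations\n", t1 - t0);
  return 0;
}
```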

That puts the burden on the implementers to implement... which means the V8 team for both the VM and the tools. I don't know how that's working out. To share the burden, we could take on the VM work in SpiderMonkey, even if that has a number of disadvantages: a new VM brings new sources of noise to the benchmarking; there is no 32-bit ARM support; 64-bit ARM support is emerging and immature; and we are still working on the finer points of optimization on Intel.

Re making progress and iterating to a fixed point, would it be meaningful to adopt @omnisip's suggestion (from a different thread) to have a regular cadence of "releases", but for determining the MVP instruction set? Every month or so, look at the proposed instructions, and discuss whether there is a set that could be made available for the next round of experimentation and possible inclusion?

@tlively
Member

tlively commented Oct 15, 2020

That puts the burden on the implementers to implement... which means the V8 team for the VM and the tools both. I don't know how that's working out.

It's working out fairly well so far. @ngzhian and I have been working to catch up with all the new instructions. We should have plenty of new performance results, not for tomorrow, but for the next meeting. The recent hold-up is mostly on my end with the tools.

I think that we should continue evaluating instructions on a rolling basis at our syncs as they are implemented and benchmarked. Formalizing that doesn't seem very useful.

@penzn
Contributor

penzn commented Oct 17, 2020

The point I am trying to argue is that we should probably prioritize PRs with a clear path to Wasm compilation and avoid PRs where we have to produce the kernels ourselves. However, I don't think we should consider anything without a clear Wasm use case completely invalid; we should just defer it to "post-MVP" or at least to the bottom of the review queue.

Also see #379 (comment)

@dtig
Member Author

dtig commented Jan 8, 2021

Closing this issue, as #389 supersedes it.

dtig closed this as completed Jan 8, 2021