
SIMD #41

Closed
jfbastien opened this issue May 6, 2015 · 12 comments

@jfbastien
Member

We currently suggest that we'll support SIMD.js (RFC).

The C++ standard committee is currently discussing adding explicit SIMD support as well as auto-vectorization hints to the language, and vector execution policies to executors. C++ isn't the only language that we want wasm to support, but we should make sure that what we implement can support C++! There may need to be some reconciliation between SIMD.js and C++ for the sake of wasm and not JavaScript.

Here are the recent relevant papers (older ones may also be relevant):

@sunfishcode
Member

Initially, many of those things will be able to compile/etc. down to SIMD.js 128-bit SIMD types and operations. This will work pretty well for the short term future.

Beyond that, there's obviously a desire to support longer vectors, and fancy things like predication. We might think of SIMD.js as a "short vector" API that doesn't rule out addition of a distinct and complementary "long vector" API. However, it's not yet clear what a "long vector" API should look like.

The alternative to a proper "long vector" API most likely will be explicit 256-bit and 512-bit types and operations, and platform feature tests to put all the burden of using everything properly on applications. If no other compelling "long vector" API emerges, this will probably be what we end up with, and will be what all of the C++ SIMD constructs mentioned above will compile to.

Anyone want to propose a "long vector" API?

@jfbastien
Member Author

I don't think all the C++ constructs will map to SIMD.js :-)

I think they'll be polyfill-able though. I'm not 100% sure, so that's why we need to keep track. I happen to currently be in the room with the authors of these papers, so it shouldn't be too hard!

@sunfishcode
Member

What constructs won't? Keeping in mind that we're talking about compilers doing the mapping, so there will be significant lowering in many cases.

@jfbastien
Member Author

The current fixed-width SIMD proposal allows implementations to define algorithms and data structures for different-sized architectures, plus a generic vector size that maps onto one of these at compile time using a simple typedef, so ABIs work. We'll lose this capability and just expose the lowest common denominator, which is OK.

The proposal for fixed-width SIMD will keep to simple arithmetic for now (OK), but shuffle and scatter/gather are possibilities that may or may not map well (though I think they'll polyfill).

The auto-vectorization stuff exposes a wavefront model. That won't work for us right now unless we just go scalar, or we add some primitives. Intel will bring a new proposal at the next meeting, so it's still far from standard.

There's also the parallelism TS which supports vectorization with STL-like algorithms. This relies heavily on individual runtimes, and can also just be scalar for us.

@kg
Contributor

kg commented May 7, 2015

If we're dead-set on having macros, maybe the right way to polyfill SIMD is with macros that expand into scalar operations?

@jfbastien
Member Author

That may solve only a subset of the issue: SIMD doesn't just affect the load/store and operation width; it may also affect the entire algorithm and the data structures used. This will be pretty tricky to expose portably while retaining performance.

I don't think the main issue is in the encoding, I think it's between exposing many widths and what we do (hard fail, gracefully split with bad perf). There's also the question of how much ISA-specificity we expose, e.g. SIMD.js exposes a function that gives the signbit from a 4xfloat vector, and I think that's just silly.

Then there's the question of which model to expose. C++ is pursuing 2 or 3 of them depending on how you look at it, and they're all complementary.

SIMD is a performance feature. If it doesn't perform, what's the point? :-)

@kg
Contributor

kg commented May 7, 2015

This also raises an issue that's been troubling me, which is how we do feature detection/fallback reliably and efficiently.

For example, here's a strawman: let's say this wasm executable more or less depends on the existence of 4-wide SIMD operations that currently are only available on x64. And let's say that only chrome canary implements that feature set right now, so it needs to fallback everywhere else.

What steps do we follow to make sure that executable loads in a single pass on all supported runtimes? I.e.

  1. On a browser without wasm support, the polyfill kicks in and loads it immediately. Good.
  2. On a browser with modern wasm support, on a platform with the necessary SIMD support, it loads natively and compiles down to native code. Good.
  3. On a browser with a slightly dated wasm implementation that doesn't understand the SIMD operations... we have a table somewhere that lists the ops it uses, the implementation sees unrecognized SIMD, and it punts to the polyfill... somehow? How do we make sure the polyfill isn't having to load the whole blob twice, and how do we make sure it's invoked efficiently?
  4. Worse: on a browser with a modern wasm implementation but on an architecture that can't express all those operations natively... what happens? Do we punt to the polyfill as if we don't understand the opcodes? Do we expand the operations into slower emulation routines?

On a related note: If our fallback is to punt to a polyfill, do we do that on a whole-executable level? Or do we do it on a per-function level, and have mixed modules where arbitrary function calls are crossing the FFI and others aren't?

(This whole topic probably belongs in its own issue, but SIMD is the one place where I feel like we're most likely to hit the above issues)

@sunfishcode
Member

@jf: I guess what you call "polyfill" here is the same as what I'm calling "lowered by compilers/etc." :-). And as far as I'm aware, falling back to scalar is only needed when operations are missing, not due to fundamental programming model differences.

@jf: The function which exposes the signbit of a 4xfloat vector has recently been removed from SIMD.js. Is there anything else you think that's silly? ;-)

@jf and @kg: You may be interested in The SIMD.js Extended API Proposal which has a decent amount of consensus as the way forward for adding new operations (though not types or programming models) to SIMD.js after the initial release. This isn't a "long vector" API proposal, of course.

@lukewagner
Member

I think there are two levels of polyfilling here:

  1. There is an engine straight-up not knowing about the feature. This case is described in BinaryEncoding.md#backwards-compatibility including the (admittedly incomplete) idea of how one could specify a polyfill at this level.
  2. There is the engine knowing about the op but not having hardware support. This case only applies to a subset of opcodes (which we can enumerate in the spec); mostly SIMD and perhaps some future general math ops. For this, the SIMD.js group has talked about having an isFast query for any SIMD op that lets you ask this question. With that, one should be able to either inline branch (on a constant-expression, so foldable/DCEable) or use the query to choose between different modules to load; whichever is the better fit.

The main question in 2 is what happens if you try to run a SIMD op that is known to the browser but not hardware optimized. The two obvious options are "throw" or "the engine implements the op as best it can". The latter seems better to me for the case where developers forget to test on a rare configuration that lacks common hardware support and so accidentally assume it. In this case, a user will still be able to run the app on the uncommon hardware (perhaps not even unduly impacted; SIMD is important, but often doesn't dominate the entire computation), which seems like what everyone would want in this case of oversight. Devtools can make it easy for developers to find and fix these issues.

sunfishcode referenced this issue May 12, 2015
This is meant to address the original concern in
WebAssembly/spec#41
@sunfishcode
Member

I created WebAssembly/spec@04daaa3 to attempt to address the original concern here.

@sunfishcode
Member

Which is now WebAssembly/spec#57

sunfishcode referenced this issue May 14, 2015
This is meant to address the original concern in
WebAssembly/spec#41
@sunfishcode
Member

Which is now merged. If anyone has any concerns not addressed in #57, feel free to file a new issue. Of course, if anyone wants to propose a new SIMD API, feel free to file a new issue for that too :).
