-
Thanks for sharing this @krzysz00! Just to understand the interface to/from MIOpen: is the expectation that we will give them back a suite of kernels and a host-side runtime selection function?
Should MIOpen own the selection logic?
-
@manupak Good question. I'd say it makes sense for us to own the selection logic, since we'll own the "generate a pile of kernels" logic. MIOpen will be calling our API though.
-
Motivation
MIOpen has two main categories of solvers. One is static solvers, where a new kernel is generated for each distinct convolution problem. The other is dynamic solvers, where the solver can pick from a suite of usually pre-generated kernels that are capable of handling multiple problem sizes.
MIOpen currently doesn't have many high-performance dynamic solvers, and they have asked if we would be willing to act as a dynamic solver as well as a static one. This is partly the result of the ease with which we have enabled data type and platform coverage compared to other solvers.
What this isn't
This isn't a plan to generate one fully dynamic kernel. Because our code generation process relies on a lot of static analysis, we can't expect high performance from a single megakernel, and between the large binary size and the substantial rewrite of our codegen that would be needed to accommodate this, having a whole lot of in-kernel branches isn't an option.
What this is
When generating kernels, instead of specifying a fixed size for some parameters in all cases, allow specifying an object I'm calling `divisible(N)`, where `N` is a constant. So, in general, we'll be replacing a lot of `int64_t` with `dynamicInt`, which is either an `int64_t` or a `divisible(N)`.
The exact place we put the `divisible(N)` annotations for memrefs and such is a decision we can make as we go, but I do know that coordinate transforms will start having sizes of `dynamicInt`.
The following convolution patterns can't, preliminarily, be dynamic
We'd use our divisible(N) hints during things like vectorization analysis and xdlops selection to get good performance.
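To make the hint idea concrete, here is a minimal Python sketch of how a `dynamicInt` could feed a vectorization decision. Everything in it is hypothetical: the `Divisible` class, the `DynamicInt` alias, and `max_vector_width` are illustrative names, not the real codegen types, and the hardware limit of 4 elements is an assumption.

```python
from dataclasses import dataclass
from math import gcd
from typing import Union


@dataclass(frozen=True)
class Divisible:
    """A length unknown at codegen time but promised to be a multiple of n."""
    n: int


# Hypothetical stand-in for the proposed dynamicInt: a fixed size or a hint.
DynamicInt = Union[int, Divisible]


def known_divisor(size: DynamicInt) -> int:
    """Largest factor the compiler may assume divides the runtime length."""
    return size.n if isinstance(size, Divisible) else size


def max_vector_width(size: DynamicInt, hw_max: int = 4) -> int:
    """Widest vector access (in elements) that is safe for every legal length.

    Assumes hw_max is a power of two, so the gcd is the largest power of two
    that both fits the hardware and divides every permitted length.
    """
    return gcd(known_divisor(size), hw_max)
```

With these definitions, `max_vector_width(Divisible(8))` gives 4, while `max_vector_width(Divisible(1))` falls back to scalar accesses, which is the kind of performance difference the hints are meant to unlock.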
We would, from time to time, loop over all reasonable dynamic configs (dilation, stride, padding, operation, and the divisibility values for each parameter). Each kernel would get a unique name that encodes the values of each parameter. MIOpen's dynamic MLIR solver would then assemble the best name for a given problem size and call that kernel.
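The generate-then-select flow described above could look roughly like the sketch below. The naming scheme, the parameter lists, and the single dynamic dimension `k` are all assumptions made for illustration; the real encoding and the set of configs are still to be decided.

```python
from itertools import product

# Hypothetical parameter space; real values would come from the generator.
STRIDES = (1, 2)
DILATIONS = (1,)
PADDINGS = (0, 1)
K_DIVISORS = (1, 4, 16)  # divisible(N) choices for one dynamic dimension


def kernel_name(op, stride, dilation, padding, k_div):
    """Encode every generation-time parameter into a unique kernel name."""
    return f"{op}_s{stride}_d{dilation}_p{padding}_kdiv{k_div}"


def generate_names(op="conv2d_fwd"):
    """Enumerate the names of the full suite of kernels to be generated."""
    return [
        kernel_name(op, s, d, p, k)
        for s, d, p, k in product(STRIDES, DILATIONS, PADDINGS, K_DIVISORS)
    ]


def select_kernel(op, stride, dilation, padding, k):
    """Assemble the name of the most specialized kernel valid for length k."""
    # K_DIVISORS contains 1, so at least one candidate always matches.
    best = max(div for div in K_DIVISORS if k % div == 0)
    return kernel_name(op, stride, dilation, padding, best)
```

For example, `select_kernel("conv2d_fwd", 2, 1, 0, 48)` assembles `conv2d_fwd_s2_d1_p0_kdiv16`, since 16 is the largest listed divisor of 48. Picking the largest matching divisor is one plausible reading of "best"; a tuned lookup table could replace it.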
I think we'll also need to change the semantics of `Pad{}`, or introduce a new operation, to specify a target length. We'll need this for (implicit) GEMM padding when dealing with large-size perf configs. So probably a `PadTo{}` operator that pads up to the target length, raising divisibility, and imposes a bounds check unconditionally.
We should be able to tune these kernels similarly to how we tuned static ones, though we might need to try a few problem sizes, or at least a representative one. The tuning is likely to be part of the kernel generation process and may not be something that MIOpen needs to handle.
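As a rough illustration of the `PadTo{}` idea mentioned above, the sketch below rounds a length up to a target multiple and guards every read with a bounds check. This is only one reading of the proposed semantics; `pad_to` and `padded_load` are hypothetical helpers, not existing operations.

```python
def pad_to(length: int, multiple: int) -> int:
    """Round length up to the next multiple (the padded target length)."""
    return -(-length // multiple) * multiple


def padded_load(data, index, fill=0):
    """Bounds-checked read: indices in the padded region yield the fill value."""
    return data[index] if 0 <= index < len(data) else fill
```

A row of 5 elements padded to a multiple of 4 has length `pad_to(5, 4) == 8`, and reading indices 5 through 7 via `padded_load` returns the fill value instead of touching memory past the end.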
Because of said tuning, or even if we don't have tuning, these kernels will have a defined `blockSize` but won't have a defined `gridSize`. Instead, we'll have some math the client needs to do to work out the appropriate grid length for the kernel.
We will want to include runtime asserts of some flavor (or at least abort early if the asserted divisibilities are violated). If nothing else, they'll help LLVM do the right thing.
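The client-side math and the early abort could be as small as the following sketch. The tile parameters `m_per_block` and `n_per_block` are assumed names for whatever the tuned config fixes per workgroup, and the actual grid formula will depend on the kernel's tiling scheme.

```python
def check_divisible(name: str, value: int, n: int) -> None:
    """Abort before launch if a divisibility promised at codegen time is violated."""
    if value % n != 0:
        raise ValueError(f"{name}={value} is not divisible by {n}")


def grid_size(m: int, n: int, m_per_block: int, n_per_block: int) -> int:
    """Number of workgroups needed to tile an M x N output, rounding up."""
    blocks_m = -(-m // m_per_block)  # ceiling division
    blocks_n = -(-n // n_per_block)
    return blocks_m * blocks_n
```

For a 256 x 100 output with 64 x 64 tiles, `grid_size(256, 100, 64, 64)` is 8 workgroups, and `check_divisible("k", 30, 8)` raises before any launch would happen.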
Why?
Gets us into a new niche that's only currently served by a bunch of assembly kernels and some very slow solvers, and it should work for gemm too.
We could also maybe generate dynamic fusion kernels if we can come up with a reasonable number of ops that we'd want to fuse in.
Why not?
Potentially substantial refactoring, but most of it should be mechanical.
The infrastructure will be a whole second layer of stuff and we're historically not the best at doing infrastructure tickets.