
Enable benchmarking dispatches with dynamic shapes #19518

Open
kuhar opened this issue Dec 18, 2024 · 1 comment
Assignees
kuhar

Labels
enhancement ➕ New feature or request, performance ⚡ Performance/optimization related work across the compiler and runtime, tuner

Comments


kuhar commented Dec 18, 2024

Request description

For dispatches with static shapes, --iree-hal-dump-executable-benchmarks-to generates MLIR files that can be compiled and benchmarked in isolation. We should enable something similar for dispatches with dynamic shapes, to be used both in manual perf work and with the tuner.

What component(s) does this issue relate to?

No response

Additional context

No response

kuhar added the enhancement ➕ New feature or request, performance ⚡ Performance/optimization related work across the compiler and runtime, and tuner labels on Dec 18, 2024
kuhar self-assigned this on Dec 18, 2024

kuhar commented Dec 18, 2024

Conversation log with @benvanik:

Ben Vanik — Today at 1:46 PM
there are a few options - I think the issue is that the current executable benchmark pass needs to die and be reworked - if rebuilt, I'd do it totally differently
in nearly all programs without data-dependent shapes (which no real models we have feature) we can just tree shake to leave the shape math
I do that on the tflite bindings to create their shape calculation function already, works well there

Jakub Kuderski — Today at 1:47 PM
E.g. get those from something like util.assume.int?

Ben Vanik — Today at 1:47 PM
that could be useful, but just from the program inputs

Jakub Kuderski — Today at 1:48 PM
ooh, I see what you mean

Ben Vanik — Today at 1:48 PM
if you pass tensor<42x1024xf32> as an input and the 42 is used to calculate all the shapes in the program, we can just leave that math
it's a different flow closer to "how fast does this run for the original problem size" than "how fast does this particular executable run for arbitrary sizes"
so may need others as well, but since most of our perf analysis starts with "here's my model and its input sizes" and then we slice out the executables to microbenchmark it'd probably make it easier
but yeah, generating spreads from the assume ops would be cool too - the pass could add those calculations for the benchmarking tool to use - my eventual goal was to have benchmarks driven by a custom module so we could have the compiler spit them out - but to start stripping everything but the dependent shape calculation math is easiest :P
the tflite WrapEntryPoints createShapeCalculationFunc just calls the original function, adds tensor.dim ops for the results, ignores the result values, and lets DCE/folding/etc strip everything - pretty simple but it works :P
similarly we can have the compiler create per-benchmark-function query functions that return a list of shapes to try, and the benchmark tool can call that to setup the benchmark parameters
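
To make the wrapper idea concrete, here is a minimal MLIR sketch in the spirit of the tflite approach described above; the entry point name, shapes, and dialect details are made up for illustration and are not taken from the actual WrapEntryPoints output:

```mlir
// Original entry point, shown as an external declaration here; in the real
// flow its body is present and gets inlined/folded away.
func.func private @main(tensor<?x1024xf32>) -> tensor<?x1024xf32>

// Hypothetical shape-calculation wrapper: call the original function, query
// the dims of the result, ignore the tensor contents, and return only the
// index values. DCE/folding can then strip everything that does not feed the
// returned dims, leaving just the shape math.
func.func @main_calculate_shapes(%arg0: tensor<?x1024xf32>) -> index {
  %c0 = arith.constant 0 : index
  %result = call @main(%arg0) : (tensor<?x1024xf32>) -> tensor<?x1024xf32>
  %d0 = tensor.dim %result, %c0 : tensor<?x1024xf32>
  return %d0 : index
}
```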

Ben Vanik — Today at 1:57 PM
as for instrumentation from running user programs we do something similar and just need a new set of trace point ops - the instrumentation pass creates a global that it accumulates information into and then the tools call the magic __query_instruments function that the pass inserts to get that information - the only instruments we have now are for the HAL but it's meant to be able to get anything we want back (it's a set of binary blobs that we determine the format of)
each instrument blob has some metadata produced by the compiler that gets embedded (e.g. iree/schemas/instruments/dispatch_def.fbs that describes each dispatch site) and then the compiler inserts code to produce the binary blob (e.g. iree/schemas/instruments/dispatch.h)
lots of fun things we can do with that :)
(it was all built to enable PGO - which capturing dispatch shapes effectively is - so it'd be neat to finally connect it all)
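
For the per-benchmark query functions mentioned above, here is a sketch of what the compiler could emit next to each benchmark entry point; the function name and the candidate sizes are placeholders, and the real interface (including how util.assume.int ranges would feed into it) is still to be designed:

```mlir
// Hypothetical query function: returns a set of candidate values for the
// dynamic dimension so the benchmark tool can iterate over them when setting
// up runs. The values here are placeholders, not derived from anything.
func.func @main_dispatch_0_query_shapes() -> tensor<3xindex> {
  %sizes = arith.constant dense<[128, 1024, 4096]> : tensor<3xindex>
  return %sizes : tensor<3xindex>
}
```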
