[LV][AArch64] Prefer Fixed over Scalable if cost-model is equal (Neoverse V2)
For the Neoverse V2, we would like to prefer fixed-width over scalable
vectorisation when the cost-model assigns an equal cost to both for a loop.
This improves 7 kernels from TSVC-2 by about 2x, and does not affect
SPEC2017 INT and FP. This also adds a new TTI hook, which can be set per
CPU, that steers the loop vectoriser towards fixed-width vectorisation on
such a cost tie. For now, this is only enabled for the Neoverse V2.
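To illustrate the tie-break, here is a minimal, standalone C++ model of comparing two vectorisation-factor candidates by cost per lane. It is a sketch under stated assumptions, not LLVM's actual LoopVectorize code: `VFCandidate`, `isMoreProfitable`, `TunedVScale` and `PreferFixedIfEqual` are hypothetical names, where `PreferFixedIfEqual` stands in for the per-CPU TTI hook and `TunedVScale` is an assumed tuning value used to estimate the lane count of a scalable candidate (e.g. 1 for a 128-bit SVE implementation). We also assume that, without the hook, a tie is resolved in favour of the scalable candidate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one vectorisation-factor candidate.
struct VFCandidate {
  unsigned KnownMinLanes; // lanes for fixed width; multiplied by vscale if scalable
  bool Scalable;          // true for an SVE-style scalable VF
  uint64_t Cost;          // estimated cost of one vector loop iteration
};

// Returns true if A is strictly preferred over B.
// Compares cost per lane without division:
//   A.Cost / LanesA < B.Cost / LanesB  <=>  A.Cost * LanesB < B.Cost * LanesA
bool isMoreProfitable(const VFCandidate &A, const VFCandidate &B,
                      unsigned TunedVScale, bool PreferFixedIfEqual) {
  assert(TunedVScale > 0 && "assumed vscale must be positive");
  uint64_t LanesA = uint64_t(A.KnownMinLanes) * (A.Scalable ? TunedVScale : 1);
  uint64_t LanesB = uint64_t(B.KnownMinLanes) * (B.Scalable ? TunedVScale : 1);
  uint64_t CostA = A.Cost * LanesB;
  uint64_t CostB = B.Cost * LanesA;

  // Tie between a fixed-width and a scalable candidate: the hypothetical
  // per-CPU flag decides; otherwise the scalable candidate wins (assumed default).
  if (CostA == CostB && A.Scalable != B.Scalable)
    return PreferFixedIfEqual ? !A.Scalable : A.Scalable;

  return CostA < CostB;
}
```

With `TunedVScale = 1`, a fixed 4-lane candidate and a scalable 4-lane candidate of equal cost tie, and the flag selects the fixed-width (NEON) one. A scalable candidate that is strictly cheaper per lane still wins regardless of the flag, so the hook only changes the outcome when the cost-model cannot tell the two apart.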
This tends to benefit small kernels, like the ones in TSVC, for a
number of reasons: processing the predicates does not come entirely
for free, NEON tends to generate slightly less code, which can have a
big impact on these small kernels, and there are second-order effects
where SVE codegen is slightly less optimal in some areas.
This strategy of generating more NEON is in line with GCC's codegen
strategy, which is actually even more aggressive in generating NEON when
no predication is required. We could be smarter and more aggressive
about generating NEON too (and improve performance further), but this
seems to be a good and straightforward first step.