[Relay] Introduce arguments limit to FuseOps pass (#15137)
* [Relay] Introduce arguments limit to FuseOps pass
In PR #8313 a parameter `max_function_args` was introduced. It limits
the number of function arguments: when this value is exceeded, a
concatenation layer is split into several concat operations.
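The splitting idea can be illustrated with a minimal Python sketch. It only chunks a list of concat inputs. The function name `split_concat_inputs` and the choice to reserve one argument per concat for its output buffer are assumptions made for illustration, not the actual code from #8313.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_concat_inputs(inputs: Sequence[T], max_function_args: int) -> List[List[T]]:
    """Chunk concat inputs so that every smaller concat stays within the limit.

    Hypothetical helper: assumes each concat kernel needs one extra
    argument for its output buffer, so at most max_function_args - 1
    inputs go into each chunk.
    """
    if max_function_args < 2:
        raise ValueError("max_function_args must leave room for an input and the output")
    chunk = max_function_args - 1
    return [list(inputs[i:i + chunk]) for i in range(0, len(inputs), chunk)]


print(split_concat_inputs(list(range(10)), max_function_args=4))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
```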
I faced a problem on Adreno GPU: for a kernel with a large number of
arguments, enqueueNDRange crashed without reporting any errors. The
problem was caused by the huge number of arguments. However, the concat
layer was not the only root cause: after several operations were fused,
the resulting functions also had a large number of arguments.
As was discussed in #8313, adding a limit on the number of function
arguments to the FuseOps pass might be a good improvement. In this PR I
introduced such a mechanism for limiting the number of function
arguments in the FuseOps pass and added an argument limit of 128
parameters for OpenCL devices.
The idea of the current approach is to calculate the number of
arguments for each node in the fusing algorithm and to stop fusing when
the number of function arguments exceeds the limit specified by
`max_function_args`. When a node has several inputs and the number of
arguments has not yet been computed for some of them, fusing of this
node is postponed, and we try to fuse it later, once the number of
arguments has been computed for all of its inputs. Postponing fusion in
this way helps to avoid additional computations during compilation.
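The argument counting with postponed fusion can be sketched in Python. This is a simplified greedy model, not TVM's dominator-tree-based FuseOps: the names `fuse_with_arg_limit`, `graph` and `params` are hypothetical, a node may only join the group of its first producer, and a group's argument count is assumed to be its distinct external inputs plus one output buffer.

```python
from collections import deque


def fuse_with_arg_limit(graph, params, max_function_args):
    """Greedily fuse nodes while the group's argument count fits the limit.

    graph: dict mapping node name -> list of input names (any order).
    params: set of names that are external function parameters.
    Assumption: a group's argument count is its distinct external inputs
    plus one output. Returns a dict mapping node -> group id.
    """
    group_of = {}    # node -> id of the group it belongs to
    group_args = {}  # group id -> set of external input names
    pending = deque(graph)
    waited = 0       # consecutive postponements without progress
    while pending:
        node = pending.popleft()
        inputs = graph[node]
        # Postpone: the argument count of some producer is not known yet.
        if any(i not in params and i not in group_of for i in inputs):
            if waited > len(pending):
                raise ValueError("graph has a cycle or an undefined input")
            pending.append(node)
            waited += 1
            continue
        waited = 0
        producers = [i for i in inputs if i not in params]
        if producers:
            gid = group_of[producers[0]]
            # Inputs produced outside the candidate group become arguments.
            merged = group_args[gid] | {i for i in inputs if group_of.get(i) != gid}
            if len(merged) + 1 <= max_function_args:  # +1 for the output
                group_of[node] = gid
                group_args[gid] = merged
                continue
        # Limit exceeded (or no producer): this node starts a new group.
        group_of[node] = node
        group_args[node] = set(inputs)
    return group_of


# "add" is listed before its producer "mul", so it is postponed once.
graph = {"add": ["mul", "c"], "mul": ["a", "b"], "relu": ["add"]}
print(fuse_with_arg_limit(graph, {"a", "b", "c"}, max_function_args=3))
# {'mul': 'mul', 'add': 'add', 'relu': 'add'}
```

With a limit of 4, all three nodes fit into one group (inputs `a`, `b`, `c` plus one output); with a limit of 3, fusing `add` would need four arguments, so it starts a new group.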
Additionally, the case of dynamic shapes should be handled. With
dynamic shapes, the function arguments also include the sizes of
dynamic dimensions and the strides. The number of strides can be
computed from the number of tensor dimensions (the number of strides
equals the rank of the tensor). The number of additional parameters
holding the sizes of dynamic dimensions equals the number of dynamic
dimensions.
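The per-tensor count for dynamic shapes can be sketched as follows. This is a hypothetical helper, not TVM code: `None` marks a dynamic dimension, and each tensor is assumed to contribute one data pointer, plus one stride per dimension and one size argument per dynamic dimension when its shape is dynamic.

```python
from typing import Optional, Sequence, Tuple

Shape = Tuple[Optional[int], ...]


def count_tensor_args(shape: Shape) -> int:
    """Arguments contributed by one tensor (None marks a dynamic dim)."""
    num_dynamic = sum(1 for dim in shape if dim is None)
    if num_dynamic == 0:
        return 1  # static shape: only the data buffer
    # data buffer + one stride per dimension + one size per dynamic dim
    return 1 + len(shape) + num_dynamic


def count_function_args(shapes: Sequence[Shape]) -> int:
    """Total arguments of a function taking tensors with these shapes."""
    return sum(count_tensor_args(s) for s in shapes)


print(count_tensor_args((1, None, 224, None)))  # 7
print(count_function_args([(1, 3, 224, 224), (1, None, 224, None)]))  # 8
```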
* Fix memory_scope order in test
* Apply code review comments
* Apply comments