Vectorize Ramp in OpenGLCompute backend #6372
Merged
Currently, ramps are generated as one independent scalar expression per
lane, and these scalars are then gathered into a vector. For instance,
indexing in vectorized code is filled with ramps like the following:
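The scalar-per-lane pattern can be sketched with a small, hypothetical Python emitter. The function `emit_ramp_scalar` and the temporary names `_t0`, `_t1`, ..., `_tv` are illustrative assumptions, not Halide's actual code or generated output. The emitter produces one `int` statement per lane and then gathers the results with an `ivecN` constructor. Because GLSL only provides `ivec2` through `ivec4`, it rejects any other width.

```python
from typing import List


def emit_ramp_scalar(base: str, stride: str, lanes: int, prefix: str = "_t") -> List[str]:
    """Hypothetical emitter: one scalar GLSL statement per lane, then a gather."""
    if lanes not in (2, 3, 4):
        raise ValueError("GLSL integer vectors are ivec2, ivec3 or ivec4")
    lines = []
    names = []
    for i in range(lanes):
        name = f"{prefix}{i}"
        # Each lane is computed independently as base + i * stride.
        lines.append(f"int {name} = {base} + {i} * {stride};")
        names.append(name)
    # Gather the independent scalars into a single integer vector.
    lines.append(f"ivec{lanes} {prefix}v = ivec{lanes}({', '.join(names)});")
    return lines


if __name__ == "__main__":
    for line in emit_ramp_scalar("_b", "_s", 4):
        print(line)
```

Calling `emit_ramp_scalar("_b", "_s", 4)` yields five lines. The first four are per-lane statements such as `int _t0 = _b + 0 * _s;`, and the last is `ivec4 _tv = ivec4(_t0, _t1, _t2, _t3);`.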
This patch simplifies the generated code by using a multiply-add expression
on a vector containing an arithmetic sequence, so that the code is
as follows:
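A comparable sketch of the multiply-add form follows. It again uses hypothetical names (`emit_ramp_vector`, `multiply_add_lanes`, `_v`) that are not part of Halide. Here the whole ramp becomes a single expression: the constant vector holds the sequence 0, 1, ..., lanes - 1, and GLSL applies the scalar base and stride to it component-wise. The helper `multiply_add_lanes` evaluates the same expression in Python to show that lane `i` receives `base + i * stride`, which is the same value the scalar form computes.

```python
from typing import List


def emit_ramp_vector(base: str, stride: str, lanes: int, name: str = "_v") -> str:
    """Hypothetical emitter: the whole ramp as one GLSL multiply-add."""
    if lanes not in (2, 3, 4):
        raise ValueError("GLSL integer vectors are ivec2, ivec3 or ivec4")
    # Constant arithmetic sequence 0, 1, ..., lanes - 1.
    seq = ", ".join(str(i) for i in range(lanes))
    # GLSL applies the scalar stride and base component-wise to the vector.
    return f"ivec{lanes} {name} = {base} + {stride} * ivec{lanes}({seq});"


def multiply_add_lanes(base: int, stride: int, lanes: int) -> List[int]:
    """Evaluate base + stride * (0, 1, ..., lanes - 1) component-wise."""
    return [base + stride * i for i in range(lanes)]


if __name__ == "__main__":
    print(emit_ramp_vector("_b", "_s", 4))
    print(multiply_add_lanes(10, 3, 4))
```

Here `emit_ramp_vector("_b", "_s", 4)` returns `ivec4 _v = _b + _s * ivec4(0, 1, 2, 3);`, and `multiply_add_lanes(10, 3, 4)` returns `[10, 13, 16, 19]`. The base and the stride each appear exactly once in the emitted expression.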
This is faster because the ramp is computed with vector operations instead
of one scalar expression per lane. It is also more compact, and more readable
because the base and the stride are easy to identify.