@hariharans29 hariharans29 commented Feb 4, 2021

Description:

#6376 introduced an optimization to the Tile kernels to process inputs where the net tiling effect is just multiple copies of the input buffer.

For example:
input shape = [1, 1, 256 * 50]
repeats = [1, 200, 1]
output shape = [1, 200, 256 * 50]
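The condition under which tiling collapses to whole-buffer copies can be sketched as follows. This is a minimal Python illustration, not the actual onnxruntime C++ kernel; the helper names are assumptions. Roughly, the fast path applies when every axis with repeats > 1 has an input dimension of 1 and precedes every axis that actually carries data:

```python
from math import prod

def tile_is_pure_copy(shape, repeats):
    # Hypothetical check: the tile collapses to contiguous copies of the
    # whole input buffer when each repeated axis has input dim 1 and sits
    # before every data-carrying axis (dim > 1).
    first_data_axis = next(
        (i for i, d in enumerate(shape) if d > 1), len(shape))
    return all(shape[i] == 1 and i < first_data_axis
               for i, r in enumerate(repeats) if r > 1)

def tile_pure_copy(data, repeats):
    # When the check passes, the output is prod(repeats) contiguous
    # copies of the flat input buffer.
    return data * prod(repeats)

# Mirrors the example above: shape [1, 1, M], repeats [1, 200, 1]
assert tile_is_pure_copy([1, 1, 4], [1, 200, 1])
# A counterexample: repeating a trailing axis interleaves elements,
# so it is not a pure buffer copy.
assert not tile_is_pure_copy([4, 1], [1, 200])
```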

This worked well when no batching was involved, but the optimization didn't kick in once batching was introduced.
As a slight extension, this change handles batching in the optimization as well.

For example:
input shape = [5, 1, 256 * 50]
repeats = [1, 200, 1]
output shape = [5, 200, 256 * 50]

In this case, each of the 5 sub-tensors in the batch is copied 200 times.
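The batched copy loop can be sketched like this. Again a hypothetical Python illustration of the idea, not the actual onnxruntime C++ kernel (which would use `memcpy` on contiguous buffers); the function and parameter names are assumptions:

```python
def batched_tile_copy(data, batch, sub_size, num_copies):
    """data: flat row-major buffer of length batch * sub_size.

    For input [batch, 1, sub_size] tiled with repeats [1, num_copies, 1],
    each batch's sub-tensor is copied num_copies times contiguously,
    yielding a flat output of length batch * num_copies * sub_size.
    """
    out = []
    for b in range(batch):
        sub = data[b * sub_size:(b + 1) * sub_size]
        out.extend(sub * num_copies)  # contiguous copies of this sub-tensor
    return out

# Small analogue of the example above: batch=5, repeats=3, sub_size=2
data = [b * 10 + i for b in range(5) for i in range(2)]
out = batched_tile_copy(data, batch=5, sub_size=2, num_copies=3)
```

The unbatched case from the earlier example is recovered with `batch=1`, where the whole input buffer is copied `num_copies` times.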

Improves the performance of a 1PP model by ~30% (95th-percentile latency) when the batch size is 5.

Motivation and Context
Performance

@hariharans29 hariharans29 requested a review from a team as a code owner February 4, 2021 05:00
@hariharans29 hariharans29 merged commit f14c621 into master Feb 5, 2021
@hariharans29 hariharans29 deleted the tilePerfEnhancementV2 branch February 5, 2021 04:14