Tile perf enhancements - continued #6561

hariharans29 · 2021-02-04T05:00:47Z

Description:

#6376 introduced an optimization to the Tile kernels to process inputs where the net tiling effect is just multiple copies of the input buffer.

For example:
input shape = [1, 1, 256 * 50]
repeats = [1, 200, 1]
output shape = [1, 200, 256 * 50]

This worked well when no batching was involved, but the optimization did not kick in once batching was introduced.
As a slight extension, this PR handles batching in the optimization.

For example:
input shape = [5, 1, 256 * 50]
repeats = [1, 200, 1]
output shape = [5, 200, 256 * 50]

In this case, we would copy each of the 5 sub-tensors in the batch 200 times.
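The batched copy described above can be sketched roughly as follows. This is an illustrative pure-Python model over a flat row-major buffer, not the actual kernel code; the function name `tile_single_axis` and its parameters are hypothetical, and the sketch assumes exactly one entry of `repeats` differs from 1.

```python
from math import prod


def tile_single_axis(flat_input, input_shape, repeats):
    """Tile a flat row-major buffer when exactly one axis is repeated.

    Hypothetical sketch: each batch sub-tensor is copied `repeats[axis]`
    times back to back, mirroring the "multiple copies of the input
    buffer" idea from the description.
    """
    repeated_axes = [i for i, r in enumerate(repeats) if r != 1]
    if len(repeated_axes) != 1:
        raise ValueError("sketch only handles exactly one repeated axis")
    axis = repeated_axes[0]

    # Everything before the repeated axis is treated as the batch.
    num_batches = prod(input_shape[:axis])
    # Number of elements in one batch sub-tensor (copied as one block).
    block_size = prod(input_shape[axis:])

    output = []
    for b in range(num_batches):
        sub_tensor = flat_input[b * block_size:(b + 1) * block_size]
        for _ in range(repeats[axis]):
            output.extend(sub_tensor)  # one contiguous copy per repeat

    output_shape = [d * r for d, r in zip(input_shape, repeats)]
    return output, output_shape
```

With `input_shape = [5, 1, 256 * 50]` and `repeats = [1, 200, 1]`, the outer loop runs 5 times and each 12800-element sub-tensor is copied 200 times. In a real kernel each inner copy would presumably be a single contiguous memory copy rather than a list extend.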

Improves the perf of a 1PP model by ~30% (95th percentile) when the batch size is 5.

Motivation and Context
Performance

hariharans29 added 2 commits February 3, 2021 05:49

Initial commit

8a8073b

a

e320830

hariharans29 requested a review from a team as a code owner February 4, 2021 05:00

b

e0cccbd

hariharans29 requested a review from hanbitmyths February 4, 2021 07:55

hanbitmyths approved these changes Feb 4, 2021

hariharans29 merged commit f14c621 into master Feb 5, 2021

hariharans29 deleted the tilePerfEnhancementV2 branch February 5, 2021 04:14
