Conversation

@swolchok (Contributor) commented Mar 8, 2025

I attempted to port `at::parallel_reduce` to ExecuTorch and use it
in `reduce_util.h`, but it turned out to be much trickier than expected.

(In brief: parallel reduction requires two steps: (1) split the input
range into chunks and reduce over each chunk (easily done, in the same
manner as `parallel_for`), and then (2) combine the per-chunk
sub-results. The reduction function accepted by `reduce_over_dim` is
not well-suited to step (2).)

Instead, I ported the parallelization strategy used by
`binary_kernel_reduce_lastdim`: just parallelize over the non-reduced
dimensions of the tensor. Since each output element depends on only
one slice of the input, each task can reduce its slice sequentially
and no combine step is needed. I don't see why this strategy wouldn't
be generally applicable, or why we would otherwise be unable to
parallelize reductions, so I haven't gated it to the case where we are
reducing over a contiguous last dimension.

I will send a follow-up that packages this strategy up nicely and uses
it in our portable reduction ops.

swolchok added 30 commits March 4, 2025 11:35
[ghstack-poisoned]
Base automatically changed from gh/swolchok/317/head to main March 12, 2025 23:16
@swolchok swolchok merged commit 243c885 into main Mar 12, 2025
50 of 51 checks passed
@swolchok swolchok deleted the gh/swolchok/329/head branch March 12, 2025 23:17
kedarnath03 pushed a commit to kedarnath03/executorch that referenced this pull request Jun 25, 2025

ghstack-source-id: e4efdab
ghstack-comment-id: 2707761383
Pull Request resolved: pytorch/executorch#9066
