Conversation

@swolchok (Contributor) commented Mar 8, 2025

I attempted to port `at::parallel_reduce` to ExecuTorch and use it
in `reduce_util.h`, but it turned out to be much trickier than expected.

(In brief: parallel reduction requires two steps: (1) split the input
range into chunks and reduce over each chunk (easily done, in the same
manner as `parallel_for`), and then (2) combine the per-chunk
sub-results. The reduction function accepted by `reduce_over_dim` is
not well-suited to step (2).)

Instead, I ported the parallelization strategy used by
`binary_kernel_reduce_lastdim`: just parallelize over the non-reduced
dimensions of the tensor. Since each output element depends on only
one slice of the input, each task can reduce its slice sequentially
and no combine step is needed. I don't see why this strategy wouldn't
be generally applicable, or why we would otherwise be unable to
parallelize reductions, so I haven't gated it to the case where we are
reducing over a contiguous last dimension.

I will send a follow-up that packages this strategy up nicely and uses
it in our portable reduction ops.

swolchok added 30 commits March 4, 2025 11:35
[ghstack-poisoned]
Base automatically changed from gh/swolchok/317/head to main March 12, 2025 23:16
@swolchok swolchok merged commit 243c885 into main Mar 12, 2025
50 of 51 checks passed
@swolchok swolchok deleted the gh/swolchok/329/head branch March 12, 2025 23:17
kedarnath03 pushed a commit to kedarnath03/executorch that referenced this pull request Jun 25, 2025

ghstack-source-id: e4efdab
ghstack-comment-id: 2707761383
Pull Request resolved: pytorch/executorch#9066
