CollectivePermute support #8815

rpsilva-aws · 2025-03-11T17:51:24Z

🐛 Bug

We currently discourage the use of CollectivePermute in #2384. There is no context behind why this is the case, including whether the motivation was hardware specific. It seems that we have enabled it for All-to-All (#2472) - do we have the same guidance/information from the XLA team?

I cannot find any relevant reference to 'all_to_all_emitter'.

cc: @miladm @ManfeiBai @JackCaoG

ManfeiBai · 2025-03-11T20:58:38Z

Hi, @rpsilva-aws, thanks,

IIUC, all_to_all_emitter is defined internally for all_to_all, which helped fix the all_to_all instability issue.

For collective_permute, we would need to sync with the XLA team, cc @ddunl

Btw, what failure are you currently hitting with collective_permute? Any context or repro material?

rpsilva-aws · 2025-03-11T21:33:10Z

Thanks @ManfeiBai. I have not encountered any failure yet - at least with Neuron's TRN1, but this is a generally concerning docstring/call-out when trying to productionize with this collective on XLA. We need to use this instead of P2P send/recv for other HW specific reasons.

> IIUC, all_to_all_emitter is defined internally for all_to_all, which helped fix the all_to_all instability issue.

This is what I understood as well from Jack's PR above, but it would be nice if we had a reference point that we can use to cross check with other collectives (particularly CollectivePermute). I'll wait on the XLA team's comment.
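For reference, here is a minimal host-side sketch of the CollectivePermute semantics under discussion, assuming the standard XLA definition: each (source, target) pair copies the source replica's value to the target replica, and replicas that are not the target of any pair receive zeros. The function name `simulate_collective_permute` and the list-of-lists representation are hypothetical and only for illustration; this runs in plain Python and does not call torch_xla or touch any device.

```python
from typing import List, Sequence, Tuple


def simulate_collective_permute(
    per_replica_values: Sequence[List[float]],
    pairs: Sequence[Tuple[int, int]],
) -> List[List[float]]:
    """Hypothetical host-side model of CollectivePermute (not the torch_xla API)."""
    num_replicas = len(per_replica_values)
    sources = [source for source, _ in pairs]
    targets = [target for _, target in pairs]

    # XLA requires that no replica appears twice as a source or twice as a target.
    if len(set(sources)) != len(sources) or len(set(targets)) != len(targets):
        raise ValueError("each replica may appear at most once as source and once as target")
    if any(not 0 <= idx < num_replicas for idx in sources + targets):
        raise ValueError("replica index out of range")

    # Replicas that are not a target of any pair end up with zeros.
    outputs = [[0.0] * len(value) for value in per_replica_values]
    for source, target in pairs:
        outputs[target] = list(per_replica_values[source])
    return outputs


# Ring shift across 4 replicas: replica i sends to replica (i + 1) % 4.
values = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
ring_pairs = [(i, (i + 1) % 4) for i in range(4)]
shifted = simulate_collective_permute(values, ring_pairs)

# A single pair: only replica 1 receives data, everyone else gets zeros.
partial = simulate_collective_permute(values, [(0, 1)])
```

This ring pattern is the kind of neighbor exchange that would otherwise be written with P2P send/recv. On device, the corresponding call in torch_xla is `xm.collective_permute(value, pairs)`; the sketch above only models its data movement and says nothing about the stability concern raised in #2384.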

miladm · 2025-03-13T17:01:39Z

@yaochengji - can you please share the latest technical updates on support for this op outside of the SPMD path?

rpsilva-aws · 2025-03-24T20:57:45Z

Any updates on whether there are still known XLA limitations with this op? cc: @yaochengji @ddunl

yaochengji · 2025-03-27T16:56:34Z

Hi @rpsilva-aws, thanks for asking.

Currently I have a PoC script, https://github.com/pytorch/xla/blob/chengji/cm/test/torch_distributed/cm_perf.py, to demonstrate the collective matmul (CM) support.

The current main blocker is that real workloads sometimes need pin_layout set to True, and the CM optimization in the XLA compiler cannot handle this at the moment.

rpsilva-aws · 2025-03-27T17:43:33Z

Thanks for the context @yaochengji. Is this (ENABLE_COLLECTIVE_MATMUL_IN_MP) TPU specific? The #2384 PR was not clear on whether this was the case, and it hinted at a more fundamental issue with XLA that is hardware agnostic. If it's TPU specific, I'll reword the title, but I wanted to confirm first. We do have similar einsum optimizations for Neuron, but iirc, that is separate from functionalizing CollectivePermute - perhaps we could add more context to the docstring here.

yaochengji · 2025-03-27T18:31:11Z

Yes, ENABLE_COLLECTIVE_MATMUL_IN_MP is TPU specific; it will turn on some optimization flag in the TPU compiler.

rpsilva-aws changed the title ~~CollectivePermute ambiguous support~~ CollectivePermute support Mar 11, 2025

ysiraichi added the enhancement (New feature or request) and distributed (SPMD and other distributed things) labels Mar 12, 2025
