Enable multi-device computation in runtime #1716
Conversation
Force-pushed from 90ce1bf to 2a566ed.
Changes look good from a dialect perspective. Just a few minor comments inline.
@@ -275,7 +275,8 @@ class TTNNLayoutDPSOperandsRewriter
   LogicalResult matchAndRewrite(DestinationStyleOpInterface op,
                                 PatternRewriter &rewriter) const final {
     // To layout op is a special case, we don't want to rewrite it
-    if (mlir::isa<ttir::ToLayoutOp>(op.getOperation())) {
+    if (mlir::isa<ttir::ToLayoutOp>(op.getOperation()) ||
+        mlir::isa<ttir::MeshShardOp>(op.getOperation())) {
Any comment on why MeshShardOp is a special one in this regard?
TTNN mesh shard APIs are currently CPU-only operations. So, by enforcing tensors to be located in system memory, we can ensure that (1) a tensor can be sharded into multi-device storage on the CPU side, and (2) the shards can later be tiled and transferred to the individual devices.
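For context, a minimal sketch of what keeping mesh-shard tensors on host can look like in a layout rewrite pattern. The variadic isa check mirrors the diff above; the mustStayInSystemMemory helper and the usage comment are purely illustrative and not part of the actual tt-mlir code (the ttir op headers are assumed to be available).

```cpp
// Illustrative sketch only, not the actual tt-mlir implementation.
// MeshShardOp is executed by a CPU-only TTNN API, so (like ToLayoutOp)
// the layout rewriter should skip it and leave its tensors in system memory.
#include "mlir/IR/Operation.h"

// Hypothetical helper centralizing the "must stay on host" check.
static bool mustStayInSystemMemory(mlir::Operation *op) {
  // ToLayoutOp handles host<->device transfers itself; MeshShardOp shards
  // the tensor on the CPU side before the shards are tiled and moved to
  // the individual devices.
  return mlir::isa<ttir::ToLayoutOp, ttir::MeshShardOp>(op);
}

// Possible use inside TTNNLayoutDPSOperandsRewriter::matchAndRewrite:
//   if (mustStayInSystemMemory(op.getOperation()))
//     return failure();
```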
Force-pushed from 2fd39a1 to 992997d.
This looks great! Thanks
Force-pushed from efddf68 to f5bec77.
* Allow ttnn runtime operations including reduce_scatter, mesh_shard, and all_gather
* Force mesh_shard ops to use system memory because they are host-side operations
* Use strongly-typed sharding options for mesh_shard ops (see the sketch below)
* Add Silicon multi-device test cases
* Fix a bug in determining the axis of all_reduce when converting from StableHLO
* Fix a typo in the TTNN workaround pass
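To illustrate the "strongly-typed sharding options" bullet: instead of passing shard settings as free-form strings or untyped integers, the shard direction, shard type, and shard shape are carried as enums and a small struct, so invalid configurations are caught at compile time. All names below are hypothetical and do not correspond to the actual ttnn runtime API.

```cpp
// Hypothetical sketch; names are illustrative, not the ttnn runtime's API.
#include <array>
#include <cstdint>

enum class MeshShardDirection : std::uint8_t { FullToShard, ShardToFull };
enum class MeshShardType : std::uint8_t { Replicate, Devices };

struct MeshShardOptions {
  MeshShardDirection direction; // host -> devices or devices -> host
  MeshShardType type;           // replicate the tensor or split it
  // Split factors per mesh dimension, e.g. {2, 4} shards across a 2x4 mesh.
  std::array<std::int64_t, 2> shardShape;
};

// Example: shard a full host tensor across a 2x4 device mesh.
constexpr MeshShardOptions kShardAcross2x4{
    MeshShardDirection::FullToShard, MeshShardType::Devices, {2, 4}};
```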
Force-pushed from f5bec77 to 3e57fd9.
Enable multi-device computation in runtime