-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use tilized dram-interleaved as default input-output layout #1744
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -224,6 +224,22 @@ class TTNNOptimizer : public impl::TTNNOptimizerBase<TTNNOptimizer> { | |
// If schedule is set, apply order of operations to func. | ||
// | ||
if (opSchedule[func].size() > 1) { | ||
// TODO (#0000): This is a temporary solution - when defaulting to dram | ||
// tile input/output layout, GetDeviceOp can randomly appear as the last | ||
// op in the graph instead of the first. This workaround ensures | ||
// getDeviceOp is always in the beginning of the schedule. | ||
// To reproduce, remove this workaround and run | ||
// Silicon/TTNN/optimizer/mnist_sharding.mlir multiple times (as it is | ||
// non-deterministic). | ||
Operation **it = | ||
std::find_if(opSchedule[func].begin(), opSchedule[func].end(), | ||
[](Operation *op) { return isa<GetDeviceOp>(op); }); | ||
if (it != opSchedule[func].end()) { | ||
GetDeviceOp deviceOp = mlir::cast<GetDeviceOp>(*it); | ||
opSchedule[func].erase(it); | ||
opSchedule[func].insert(opSchedule[func].begin(), deviceOp); | ||
} | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @odjuricicTT, can you review this file? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jnie-TT The proper fix for this would be to add it here: Try changing the if to check for GetDeviceOp as well as ToLayoutOp. |
||
for (size_t i = 0; i < opSchedule[func].size() - 1; i++) { | ||
Operation *op = opSchedule[func][i]; | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fbajraktariTT, can you review this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI @odjuricicTT, as @fbajraktariTT completed internship recently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jnie-TT I'm not sure that this extra logic is needed. Was a test failing without this temp fix? If so, can you provide more details?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@odjuricicTT there's an assert below that checks if the
currentL1Usage
is 0. This error only surfaces with my changes - it's fine without my changes because we always untilize (to_layout) before returning. However it's possible now with my changes that we will return directly without any intermediate ops between the current op and the return op, and this causes issues because we wouldn't have zeroed outcurrentL1Usage
.Since this function doesn't decrement l1 usage on return op, the assert will fire and say that the l1 usage is non 0. My change basically adds a check that if the consumer op is a return op, we decrement the l1 usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jnie-TT Thanks! Your solution is fine for now. Just please file the followup issue and reference it in the comment.