#13621: enable default fp32 acc for reduce (#15665)
### Ticket
Link to GitHub Issue #13621

### Problem description
Reduce sum loses accuracy because fp32 accumulation (fp32 acc) was not
enabled by default for reduce.
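
To illustrate why the accumulator precision matters for a sum, here is a minimal host-side sketch. It assumes a bfloat16 accumulator emulated in software with round-to-nearest-even; the helper names (`round_to_bf16`, `sum_with_bf16_accumulator`, `sum_with_fp32_accumulator`) are hypothetical, and the device's actual accumulation format and rounding may differ.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: round a float to bfloat16 precision (round-to-nearest-even)
// and return it as a float. Software emulation for illustration only; NaN and
// overflow handling are intentionally omitted.
float round_to_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    // Add 0x7FFF plus the lowest kept bit so that ties round to even.
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    bits &= 0xFFFF0000u;  // keep sign, exponent, and the top 7 mantissa bits
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}

// Sum n copies of v, rounding the running total to bfloat16 after every add.
float sum_with_bf16_accumulator(int n, float v) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc = round_to_bf16(acc + v);
    }
    return acc;
}

// Sum n copies of v, keeping the running total in fp32.
float sum_with_fp32_accumulator(int n, float v) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += v;
    }
    return acc;
}
```

Summing 1024 ones, the emulated bfloat16 accumulator stalls at 256 (257 is not representable and rounds back to 256), while the fp32 accumulator reaches 1024.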

### What's changed
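Enable fp32 acc for reduce by default: when the caller passes no
`compute_kernel_config`, the default config is now built with
`default_approx_mode=false` and `default_fp32_acc=true`.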
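
The `value_or` call in the diff means the new default only applies when no config is supplied; an explicit config still takes precedence. A minimal sketch of that pattern, using a hypothetical stand-in struct and factory (`KernelConfigSketch`, `make_default_config`, `resolve_config`) rather than the real `ttnn` types:

```cpp
#include <optional>

// Hypothetical stand-in for a compute kernel config; the field names are
// illustrative and are not the real ttnn::DeviceComputeKernelConfig members.
struct KernelConfigSketch {
    bool approx_mode;
    bool fp32_acc;
};

// Hypothetical factory playing the role of the init helper used in the diff.
KernelConfigSketch make_default_config(bool default_approx_mode, bool default_fp32_acc) {
    return KernelConfigSketch{default_approx_mode, default_fp32_acc};
}

// Use the caller's config if present; otherwise fall back to defaults with
// approximation off and fp32 accumulation on.
KernelConfigSketch resolve_config(const std::optional<KernelConfigSketch>& user_config) {
    return user_config.value_or(
        make_default_config(/*default_approx_mode=*/false, /*default_fp32_acc=*/true));
}
```

With this shape, `resolve_config(std::nullopt)` yields fp32 accumulation enabled, while a caller that passes a config with `fp32_acc` set to `false` keeps that setting.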

### Checklist
- [x] Post commit CI passes. Between the following two runs, all jobs passed:
https://github.com/tenstorrent/tt-metal/actions/runs/12147218201 and
https://github.com/tenstorrent/tt-metal/actions/runs/12160961521
- [x] Blackhole Post commit (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12187630112
- [x] Model regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12187623632/job/33999246179
fails the same way as main, apart from one additional random failure where
a tt-smi reset did not work. Main run for comparison:
https://github.com/tenstorrent/tt-metal/actions/runs/12189517366
- [x] Device performance regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12187626769 passes
for WH. GS is not affected and fails, as it also does on main:
https://github.com/tenstorrent/tt-metal/actions/runs/12189542166
- [x] New/Existing tests provide coverage for changes
bbradelTT authored Dec 6, 2024
1 parent dc6d684 commit a8a044a
Showing 1 changed file with 6 additions and 2 deletions.
@@ -191,8 +191,12 @@ Tensor reduce(
     auto is_multicore_hw = parallelization_strategy == ReduceOpParallelizationStrategy::MULTI_CORE_HW;
     float pad_value = reduce_math == ReduceOpMath::MAX ? -std::numeric_limits<float>::infinity() : 0;
 
-    ttnn::DeviceComputeKernelConfig config = compute_kernel_config.value_or(
-        ttnn::init_device_compute_kernel_config(input_tensor.device()->arch(), std::nullopt, MathFidelity::HiFi4));
+    ttnn::DeviceComputeKernelConfig config = compute_kernel_config.value_or(ttnn::init_device_compute_kernel_config(
+        input_tensor.device()->arch(),
+        std::nullopt,
+        MathFidelity::HiFi4,
+        /*default_approx_mode=*/false,
+        /*default_fp32_acc=*/true));
 
     std::vector<Tensor> output_tensors = {Tensor(operation::get_workers_for_op_output({input_tensor}))};
     if (is_multicore_hw) {