#13621: enable default fp32 acc for reduce (#15665)

### Ticket
Link to GitHub Issue #13621

### Problem description
Reduce sum is not very accurate because fp32 accumulation for reduce was not enabled by default.

### What's changed
Enable fp32 accumulation for reduce by default.

### Checklist
- [x] Post commit CI passes
  Between two runs, all jobs passed: https://github.com/tenstorrent/tt-metal/actions/runs/12147218201 and https://github.com/tenstorrent/tt-metal/actions/runs/12160961521
- [x] Blackhole Post commit (if applicable)
  https://github.com/tenstorrent/tt-metal/actions/runs/12187630112
- [x] Model regression CI testing passes (if applicable)
  https://github.com/tenstorrent/tt-metal/actions/runs/12187623632/job/33999246179 fails the same way as main, apart from another random tt-smi reset not working. main: https://github.com/tenstorrent/tt-metal/actions/runs/12189517366
- [x] Device performance regression CI testing passes (if applicable)
  https://github.com/tenstorrent/tt-metal/actions/runs/12187626769 passes for WH; GS is not affected and fails as it does on main: https://github.com/tenstorrent/tt-metal/actions/runs/12189542166
- [x] New/Existing tests provide coverage for changes
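
The accuracy problem in the description can be illustrated on the host. The sketch below is not tt-metal code: it emulates a bfloat16 accumulator in software as a stand-in for the non-fp32 path and compares it with an fp32 accumulator. The helper names (`round_to_bf16`, `sum_bf16_acc`, `sum_fp32_acc`) are hypothetical, and the device's actual accumulator format and rounding may differ.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: round an fp32 value to the nearest bfloat16-representable
// value (round-to-nearest-even) and return it as fp32.
// NaN and overflow handling are omitted for brevity.
inline float round_to_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    // Add 0x7FFF plus the lowest kept bit so that ties round to even.
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    bits &= 0xFFFF0000u;  // keep sign, exponent, and the top 7 mantissa bits
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}

// Assumption: models a reduce whose running sum is stored in bfloat16.
inline float sum_bf16_acc(const std::vector<float>& values) {
    float acc = 0.0f;
    for (float v : values) {
        acc = round_to_bf16(acc + round_to_bf16(v));  // accumulator loses bits each step
    }
    return acc;
}

// Same bfloat16 inputs, but the running sum is kept in fp32.
inline float sum_fp32_acc(const std::vector<float>& values) {
    float acc = 0.0f;
    for (float v : values) {
        acc += round_to_bf16(v);
    }
    return acc;
}
```

With 1024 inputs equal to 1.0, `sum_fp32_acc` returns 1024, while `sum_bf16_acc` stalls at 256: above 256 the bfloat16 spacing is 2, so 256 + 1 rounds back to 256 on every step.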
tenstorrent · Dec 6, 2024 · a8a044a
1 parent dc6d684
commit a8a044a
Showing 1 changed file with 6 additions and 2 deletions.
diff --git a/ttnn/cpp/ttnn/operations/reduction/generic/device/reduce_op.cpp b/ttnn/cpp/ttnn/operations/reduction/generic/device/reduce_op.cpp
@@ -191,8 +191,12 @@ Tensor reduce(
     auto is_multicore_hw = parallelization_strategy == ReduceOpParallelizationStrategy::MULTI_CORE_HW;
     float pad_value = reduce_math == ReduceOpMath::MAX ? -std::numeric_limits<float>::infinity() : 0;
 
-    ttnn::DeviceComputeKernelConfig config = compute_kernel_config.value_or(
-        ttnn::init_device_compute_kernel_config(input_tensor.device()->arch(), std::nullopt, MathFidelity::HiFi4));
+    ttnn::DeviceComputeKernelConfig config = compute_kernel_config.value_or(ttnn::init_device_compute_kernel_config(
+        input_tensor.device()->arch(),
+        std::nullopt,
+        MathFidelity::HiFi4,
+        /*default_approx_mode=*/false,
+        /*default_fp32_acc=*/true));
 
     std::vector<Tensor> output_tensors = {Tensor(operation::get_workers_for_op_output({input_tensor}))};
     if (is_multicore_hw) {
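
The hunk above only changes the fallback passed to `value_or`, so a caller that supplies its own `compute_kernel_config` should see no change. A minimal sketch of that defaulting pattern follows. `KernelConfigSketch` and `resolve_config` are hypothetical stand-ins for illustration, not the ttnn types or functions.

```cpp
#include <optional>

// Hypothetical stand-in for a compute kernel config; not ttnn::DeviceComputeKernelConfig.
struct KernelConfigSketch {
    bool approx_mode;
    bool fp32_acc;
};

// Hypothetical helper mirroring the pattern in the diff:
// use the caller's config if present, otherwise fall back to defaults
// with approximation off and fp32 accumulation on.
inline KernelConfigSketch resolve_config(const std::optional<KernelConfigSketch>& user_config) {
    const KernelConfigSketch fallback{/*approx_mode=*/false, /*fp32_acc=*/true};
    return user_config.value_or(fallback);
}
```

Calling `resolve_config(std::nullopt)` yields the fp32-accumulating default, while an explicit `KernelConfigSketch{false, false}` passes through unchanged. Under that reading, callers who want the previous behaviour would still need to opt out explicitly.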