What happened?

Compilation fails with --iree-dispatch-creation-enable-aggressive-fusion=true due to:
quantize_dynamic.mlir:16:10: error: 'memref.alloca' op expected no unbounded stack allocations
    %4 = tensor.empty(%0) : tensor<?x128xf32>
         ^
quantize_dynamic.mlir:5:1: note: called from
func.func @quantize_tensor(%arg0: tensor<?x128x32xf32>) -> (tensor<?x128x32xi8>, tensor<?x128xf32>, tensor<?x128xf32>) {
^
quantize_dynamic.mlir:16:10: note: see current operation: %51 = "memref.alloca"(%46) <{alignment = 64 : i64, operandSegmentSizes = array<i32: 1, 0>}> : (index) -> memref<?x128xf32>
    %4 = tensor.empty(%0) : tensor<?x128xf32>
IR dump: ir_dump.txt
The input IR below (quantize_dynamic.mlir) compiles without the --iree-dispatch-creation-enable-aggressive-fusion=true flag.
#map = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
#map1 = affine_map<(d0, d1, d2) -> (d0, d1)>
#map2 = affine_map<(d0, d1) -> (d0, d1)>
func.func @quantize_tensor(%arg0: tensor<?x128x32xf32>) -> (tensor<?x128x32xi8>, tensor<?x128xf32>, tensor<?x128xf32>) {
  %c0 = arith.constant 0 : index
  %c127_i32 = arith.constant 127 : i32
  %c-128_i32 = arith.constant -128 : i32
  %c0_i32 = arith.constant 0 : i32
  %cst = arith.constant 0.000000e+00 : f32
  %cst_0 = arith.constant 1.280000e+02 : f32
  %cst_1 = arith.constant -3.40282347E+38 : f32
  %cst_2 = arith.constant 1.000000e+00 : f32
  %0 = tensor.dim %arg0, %c0 : tensor<?x128x32xf32>
  %4 = tensor.empty(%0) : tensor<?x128xf32>
  %5 = linalg.fill ins(%cst_1 : f32) outs(%4 : tensor<?x128xf32>) -> tensor<?x128xf32>
  %6 = linalg.generic { indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "reduction"] } ins(%arg0 : tensor<?x128x32xf32>) outs(%5 : tensor<?x128xf32>) {
  ^bb0(%in: f32, %out: f32):
    %16 = math.absf %in : f32
    %17 = arith.maximumf %16, %out : f32
    linalg.yield %17 : f32
  } -> tensor<?x128xf32>
  %7:2 = linalg.generic { indexing_maps = [#map2, #map2, #map2], iterator_types = ["parallel", "parallel"] } ins(%6 : tensor<?x128xf32>) outs(%4, %4 : tensor<?x128xf32>, tensor<?x128xf32>) {
  ^bb0(%in: f32, %out: f32, %out_5: f32):
    %16 = arith.divf %cst_0, %in : f32
    %17 = arith.divf %cst_2, %16 : f32
    linalg.yield %17, %16 : f32, f32
  } -> (tensor<?x128xf32>, tensor<?x128xf32>)
  %8 = tensor.empty(%0) : tensor<?x128x32xi8>
  %9 = linalg.generic { indexing_maps = [#map, #map1, #map], iterator_types = ["parallel", "parallel", "parallel"] } ins(%arg0, %7#1 : tensor<?x128x32xf32>, tensor<?x128xf32>) outs(%8 : tensor<?x128x32xi8>) {
  ^bb0(%in: f32, %in_5: f32, %out: i8):
    %16 = arith.mulf %in_5, %in : f32
    %17 = math.round %16 : f32
    %18 = arith.fptosi %17 : f32 to i32
    %19 = arith.maxsi %18, %c-128_i32 : i32
    %20 = arith.minsi %19, %c127_i32 : i32
    %21 = arith.trunci %20 : i32 to i8
    linalg.yield %21 : i8
  } -> tensor<?x128x32xi8>
  return %9, %7#0, %7#1 : tensor<?x128x32xi8>, tensor<?x128xf32>, tensor<?x128xf32>
}
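For context, the reproducer implements a per-row dynamic int8 quantization: an absmax reduction over the innermost dimension, a scale (absmax / 128) and its reciprocal, and a round/clamp/truncate to i8. The NumPy sketch below is only meant to clarify what the IR computes; the function and variable names are illustrative, and np.round rounds half to even whereas math.round rounds half away from zero (irrelevant to the compile failure).

import numpy as np

def quantize_tensor(x: np.ndarray):
    """NumPy model of @quantize_tensor for an input of shape (N, 128, 32)."""
    # Per-(d0, d1) absmax reduction over the innermost dimension (d2),
    # matching the first linalg.generic (math.absf + arith.maximumf).
    absmax = np.abs(x).max(axis=-1)          # shape (N, 128)
    rscale = 128.0 / absmax                  # %7#1: reciprocal scale
    scale = 1.0 / rscale                     # %7#0: absmax / 128
    # Multiply by the reciprocal scale, round, clamp to [-128, 127],
    # truncate to i8 (the last linalg.generic).
    q = np.clip(np.round(x * rscale[:, :, None]), -128, 127).astype(np.int8)
    return q, scale, rscale

The ?x128 intermediate flagged by the diagnostic corresponds to the per-row absmax/scale buffer, whose leading dimension is the dynamic batch size.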
Steps to reproduce your issue

iree-compile --iree-hal-target-device=llvm-cpu --iree-llvmcpu-target-cpu=znver4 --iree-dispatch-creation-enable-aggressive-fusion=true
What component(s) does this issue relate to?

Compiler
Version information

7fc8a99
Additional context

No response
Related pass: PadDynamicAllocPass