[Vulkan] Lightning model failing with failed to legalize operation 'memref.alloc' #8741

Closed
mariecwhite opened this issue Apr 1, 2022 · 6 comments
Labels: bug 🐞, codegen/spirv, integrations/tosa

@mariecwhite

To reproduce in iree-samples, run:

./setup_venv.sh
source iree-samples.venv/bin/activate

python tflitehub/lightning_test.py --config=vulkan

MLIR files:
tflite.mlir.zip
tosa.mlir.zip

mariecwhite added the bug 🐞, codegen/spirv, and integrations/tosa labels Apr 1, 2022
@hanhanW commented Apr 4, 2022

I'm seeing the same error in #8712 (comment); they could be the same underlying problem.

repro.mlir:13:26: error: failed to legalize operation 'arith.constant'
    %c548682072064_i64 = arith.constant 548682072064 : i64

I think the Vulkan side doesn't work when there are i64 constants.
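
A minimal repro sketch of the pattern (a reduction, not from the issue; assuming the failure comes from SPIR-V type conversion, which has no legal lowering for 64-bit integers when the target environment lacks the Int64 capability):

func @i64_const() -> i64 {
  %c = arith.constant 548682072064 : i64
  return %c : i64
}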

@benvanik commented Apr 4, 2022

Correct - you'll need #8738 and the -iree-flow-demote-i64-to-i32 flag to run this on Vulkan (at least).
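
For reference, an invocation along these lines (a sketch only; it assumes iree-compile with a TOSA input and the vulkan-spirv backend, and exact flag spellings may differ by release):

iree-compile --iree-input-type=tosa \
  --iree-hal-target-backends=vulkan-spirv \
  --iree-flow-demote-i64-to-i32 \
  tosa.mlir -o lightning_vulkan.vmfb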

@benvanik commented Apr 4, 2022

(Though if the model actually needs i64 values then it's going to have problems - that constant makes me think it's doing something weird. Does this model really need integers as large as 548682072064?)
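
To make the concern concrete (arithmetic only, not from the thread): 548682072064 = 0x7FC0000000, a 39-bit value, so demoting it to i32 would silently wrap:

%c548682072064_i64 = arith.constant 548682072064 : i64
// Hypothetical result after -iree-flow-demote-i64-to-i32: the low 32 bits
// 0xC0000000 read as a signed i32 give -1073741824, not the original value.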

@antiagainst

This seems unrelated to i64; at least I'm seeing different issues at the current tip of tree (4c2140c):

1. arith.remui fails during index type conversion. I've uploaded https://reviews.llvm.org/D124451 to fix it.
2. Then there is a genuine memref.alloc issue: -iree-codegen-linalg-bufferize generates a memref.alloc for the following input function:
func @main_dispatch_107() {
  %c128 = arith.constant 128 : index
  %c0_i32 = arith.constant 0 : i32
  %cst = arith.constant -3.40282347E+38 : f32
  %c0 = arith.constant 0 : index
  %c17 = arith.constant 17 : index
  %c1 = arith.constant 1 : index
  %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) offset(%c128) alignment(64) : !flow.dispatch.tensor<readonly:2304x17xf32>
  %1 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) offset(%c0) alignment(64) : !flow.dispatch.tensor<writeonly:17xi32>
  %workgroup_id_x = hal.interface.workgroup.id[0] : index
  %workgroup_count_x = hal.interface.workgroup.count[0] : index
  %2 = affine.apply affine_map<()[s0] -> (s0 * 4)>()[%workgroup_id_x]
  %3 = affine.apply affine_map<()[s0] -> (s0 * 4)>()[%workgroup_count_x]
  scf.for %arg0 = %2 to %c17 step %3 {
    %4 = affine.min affine_map<(d0) -> (-d0 + 17, 4)>(%arg0)
    %5 = flow.dispatch.tensor.load %0, offsets = [0, %arg0], sizes = [2304, %4], strides = [1, 1] : !flow.dispatch.tensor<readonly:2304x17xf32> -> tensor<2304x?xf32>
    %6 = linalg.init_tensor [%4] : tensor<?xi32>
    %7 = linalg.fill {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[4], [1]]>} ins(%c0_i32 : i32) outs(%6 : tensor<?xi32>) -> tensor<?xi32>
    %8 = linalg.init_tensor [%4] : tensor<?xf32>
    %9 = linalg.fill {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[4], [1]]>} ins(%cst : f32) outs(%8 : tensor<?xf32>) -> tensor<?xf32>
    %10:2 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>, affine_map<(d0, d1) -> (d0)>, affine_map<(d0, d1) -> (d0)>], iterator_types = ["parallel", "reduction"]} ins(%5 : tensor<2304x?xf32>) outs(%7, %9 : tensor<?xi32>, tensor<?xf32>) attrs =  {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[4], [1]]>} {
    ^bb0(%arg1: f32, %arg2: i32, %arg3: f32):
      %11 = linalg.index 1 : index
      %12 = arith.index_cast %11 : index to i32
      %13 = arith.cmpf ogt, %arg1, %arg3 : f32
      %14 = arith.select %13, %arg1, %arg3 : f32
      %15 = arith.select %13, %12, %arg2 : i32
      linalg.yield %15, %14 : i32, f32
    } -> (tensor<?xi32>, tensor<?xf32>)
    flow.dispatch.tensor.store %10#0, %1, offsets = [%arg0], sizes = [%4], strides = [%c1] : tensor<?xi32> -> !flow.dispatch.tensor<writeonly:17xi32>
  }
  return
}

%9 triggers a new memref.alloc in the above: it holds the running f32 maximum, which is used while computing the output but discarded afterwards, so bufferization has to materialize a fresh buffer for it.
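
Roughly what bufferization must produce for %9 (a sketch, not the actual pass output): the second linalg.generic result has no output binding to write into, so a temporary buffer of the dynamic tile size gets allocated, and that dynamically sized memref.alloc is what SPIR-V conversion later fails to legalize:

%max_buf = memref.alloc(%4) : memref<?xf32>
linalg.fill ins(%cst : f32) outs(%max_buf : memref<?xf32>)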

The original flow-level input:

func @main_dispatch_107(%arg0: !flow.dispatch.tensor<readonly:2304x17xf32>, %arg1: !flow.dispatch.tensor<writeonly:17xi32>) {
  %c0_i32 = arith.constant 0 : i32
  %cst = arith.constant -3.40282347E+38 : f32
  %0 = flow.dispatch.tensor.load %arg0, offsets = [0, 0], sizes = [2304, 17], strides = [1, 1] : !flow.dispatch.tensor<readonly:2304x17xf32> -> tensor<2304x17xf32>
  %1 = linalg.init_tensor [17] : tensor<17xi32>
  %2 = linalg.init_tensor [17] : tensor<17xf32>
  %3 = linalg.fill ins(%cst : f32) outs(%2 : tensor<17xf32>) -> tensor<17xf32>
  %4 = linalg.fill ins(%c0_i32 : i32) outs(%1 : tensor<17xi32>) -> tensor<17xi32>
  %5:2 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>, affine_map<(d0, d1) -> (d0)>, affine_map<(d0, d1) -> (d0)>], iterator_types = ["parallel", "reduction"]} ins(%0 : tensor<2304x17xf32>) outs(%4, %3 : tensor<17xi32>, tensor<17xf32>) {
  ^bb0(%arg2: f32, %arg3: i32, %arg4: f32):
    %6 = linalg.index 1 : index
    %7 = arith.index_cast %6 : index to i32
    %8 = arith.cmpf ogt, %arg2, %arg4 : f32
    %9 = arith.select %8, %arg2, %arg4 : f32
    %10 = arith.select %8, %7, %arg3 : i32
    linalg.yield %10, %9 : i32, f32
  } -> (tensor<17xi32>, tensor<17xf32>)
  flow.dispatch.tensor.store %5#0, %arg1, offsets = [0], sizes = [17], strides = [1] : tensor<17xi32> -> !flow.dispatch.tensor<writeonly:17xi32>
  return
}

Looking further up, this is generated by TOSA:

%310 = "tosa.argmax"(%309) {axis = 1 : i64} : (tensor<1x2304x17xf32>) -> tensor<1x17xi32>

tosa.argmax lowers to the two-result linalg.generic above (the argmax indices plus the running maximum values); only the indices are stored, so the max tensor is a dead temporary that still needs an allocation. So I think it's the same issue as #8899.

antiagainst self-assigned this and unassigned rsuderman Apr 26, 2022
@MaheshRavishankar

For now there is no way around it: we need a stack allocation here, but it is bounded. Hanhan recently added a pass that checks that all allocations are statically bounded. You could use a similar approach in the allocation callback to allocate a statically sized buffer.
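
A sketch of that idea applied to the dispatch above (hypothetical, not actual compiler output): %4 = affine.min(-d0 + 17, 4) is statically bounded by the tile size 4, so the allocation callback can allocate the upper bound and take a view of the dynamic portion:

%static = memref.alloc() : memref<4xf32>
%view = memref.subview %static[0] [%4] [1] : memref<4xf32> to memref<?xf32>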

@antiagainst

This has been fixed; the test now passes with the candidate-20220609.164 release.
