[Vulkan] Lightning model failing with failed to legalize operation 'memref.alloc' #8741

Closed
mariecwhite opened this issue Apr 1, 2022 · 6 comments
Labels: bug 🐞, codegen/spirv, integrations/tosa

@mariecwhite

To reproduce in iree-samples, run:

./setup_venv.sh
source iree-samples.venv/bin/activate

python tflitehub/lightning_test.py --config=vulkan

MLIR files:
tflite.mlir.zip
tosa.mlir.zip

mariecwhite added the bug 🐞, codegen/spirv, and integrations/tosa labels Apr 1, 2022
@hanhanW commented Apr 4, 2022

I'm seeing the same error in #8712 (comment); they could be the same underlying problem.

repro.mlir:13:26: error: failed to legalize operation 'arith.constant'
    %c548682072064_i64 = arith.constant 548682072064 : i64

I think the Vulkan side doesn't work when there are i64 constants.
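
A minimal repro sketch of the pattern (a reduction, not from the issue; assuming the failure comes from SPIR-V type conversion, which has no legal lowering for 64-bit integers when the target environment lacks the Int64 capability):

func @i64_const() -> i64 {
  %c = arith.constant 548682072064 : i64
  return %c : i64
}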

@benvanik commented Apr 4, 2022

Correct - you'll need #8738 and the -iree-flow-demote-i64-to-i32 flag to run this on Vulkan (at least).
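
For reference, an invocation along these lines (a sketch only; it assumes iree-compile with a TOSA input and the vulkan-spirv backend, and exact flag spellings may differ by release):

iree-compile --iree-input-type=tosa \
  --iree-hal-target-backends=vulkan-spirv \
  --iree-flow-demote-i64-to-i32 \
  tosa.mlir -o lightning_vulkan.vmfb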

@benvanik commented Apr 4, 2022

(Though if the model actually needs i64 values then it's going to have problems - that constant makes me think it's doing something weird. Does this model really need integers as large as 548682072064?)
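
To make the concern concrete (arithmetic only, not from the thread): 548682072064 = 0x7FC0000000, a 39-bit value, so demoting it to i32 would silently wrap:

%c548682072064_i64 = arith.constant 548682072064 : i64
// Hypothetical result after -iree-flow-demote-i64-to-i32: the low 32 bits
// 0xC0000000 read as a signed i32 give -1073741824, not the original value.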

@antiagainst

This seems unrelated to i64; at least I'm seeing different issues at the current tip of tree (4c2140c):

1. arith.remui fails during index type conversion. I've uploaded https://reviews.llvm.org/D124451 to fix it.
2. Then there is a genuine memref.alloc issue: -iree-codegen-linalg-bufferize generates a memref.alloc for the following input function:
func @main_dispatch_107() {
  %c128 = arith.constant 128 : index
  %c0_i32 = arith.constant 0 : i32
  %cst = arith.constant -3.40282347E+38 : f32
  %c0 = arith.constant 0 : index
  %c17 = arith.constant 17 : index
  %c1 = arith.constant 1 : index
  %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) offset(%c128) alignment(64) : !flow.dispatch.tensor<readonly:2304x17xf32>
  %1 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) offset(%c0) alignment(64) : !flow.dispatch.tensor<writeonly:17xi32>
  %workgroup_id_x = hal.interface.workgroup.id[0] : index
  %workgroup_count_x = hal.interface.workgroup.count[0] : index
  %2 = affine.apply affine_map<()[s0] -> (s0 * 4)>()[%workgroup_id_x]
  %3 = affine.apply affine_map<()[s0] -> (s0 * 4)>()[%workgroup_count_x]
  scf.for %arg0 = %2 to %c17 step %3 {
    %4 = affine.min affine_map<(d0) -> (-d0 + 17, 4)>(%arg0)
    %5 = flow.dispatch.tensor.load %0, offsets = [0, %arg0], sizes = [2304, %4], strides = [1, 1] : !flow.dispatch.tensor<readonly:2304x17xf32> -> tensor<2304x?xf32>
    %6 = linalg.init_tensor [%4] : tensor<?xi32>
    %7 = linalg.fill {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[4], [1]]>} ins(%c0_i32 : i32) outs(%6 : tensor<?xi32>) -> tensor<?xi32>
    %8 = linalg.init_tensor [%4] : tensor<?xf32>
    %9 = linalg.fill {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[4], [1]]>} ins(%cst : f32) outs(%8 : tensor<?xf32>) -> tensor<?xf32>
    %10:2 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>, affine_map<(d0, d1) -> (d0)>, affine_map<(d0, d1) -> (d0)>], iterator_types = ["parallel", "reduction"]} ins(%5 : tensor<2304x?xf32>) outs(%7, %9 : tensor<?xi32>, tensor<?xf32>) attrs =  {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[4], [1]]>} {
    ^bb0(%arg1: f32, %arg2: i32, %arg3: f32):
      %11 = linalg.index 1 : index
      %12 = arith.index_cast %11 : index to i32
      %13 = arith.cmpf ogt, %arg1, %arg3 : f32
      %14 = arith.select %13, %arg1, %arg3 : f32
      %15 = arith.select %13, %12, %arg2 : i32
      linalg.yield %15, %14 : i32, f32
    } -> (tensor<?xi32>, tensor<?xf32>)
    flow.dispatch.tensor.store %10#0, %1, offsets = [%arg0], sizes = [%4], strides = [%c1] : tensor<?xi32> -> !flow.dispatch.tensor<writeonly:17xi32>
  }
  return
}

%9 triggers a new memref.alloc in the above: it holds the running f32 maximum, which is used while computing the output but discarded afterwards, so bufferization has to materialize a fresh buffer for it.
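
Roughly what bufferization must produce for %9 (a sketch, not the actual pass output): the second linalg.generic result has no output binding to write into, so a temporary buffer of the dynamic tile size gets allocated, and that dynamically sized memref.alloc is what SPIR-V conversion later fails to legalize:

%max_buf = memref.alloc(%4) : memref<?xf32>
linalg.fill ins(%cst : f32) outs(%max_buf : memref<?xf32>)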

The original flow-level input:

func @main_dispatch_107(%arg0: !flow.dispatch.tensor<readonly:2304x17xf32>, %arg1: !flow.dispatch.tensor<writeonly:17xi32>) {
  %c0_i32 = arith.constant 0 : i32
  %cst = arith.constant -3.40282347E+38 : f32
  %0 = flow.dispatch.tensor.load %arg0, offsets = [0, 0], sizes = [2304, 17], strides = [1, 1] : !flow.dispatch.tensor<readonly:2304x17xf32> -> tensor<2304x17xf32>
  %1 = linalg.init_tensor [17] : tensor<17xi32>
  %2 = linalg.init_tensor [17] : tensor<17xf32>
  %3 = linalg.fill ins(%cst : f32) outs(%2 : tensor<17xf32>) -> tensor<17xf32>
  %4 = linalg.fill ins(%c0_i32 : i32) outs(%1 : tensor<17xi32>) -> tensor<17xi32>
  %5:2 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>, affine_map<(d0, d1) -> (d0)>, affine_map<(d0, d1) -> (d0)>], iterator_types = ["parallel", "reduction"]} ins(%0 : tensor<2304x17xf32>) outs(%4, %3 : tensor<17xi32>, tensor<17xf32>) {
  ^bb0(%arg2: f32, %arg3: i32, %arg4: f32):
    %6 = linalg.index 1 : index
    %7 = arith.index_cast %6 : index to i32
    %8 = arith.cmpf ogt, %arg2, %arg4 : f32
    %9 = arith.select %8, %arg2, %arg4 : f32
    %10 = arith.select %8, %7, %arg3 : i32
    linalg.yield %10, %9 : i32, f32
  } -> (tensor<17xi32>, tensor<17xf32>)
  flow.dispatch.tensor.store %5#0, %arg1, offsets = [0], sizes = [17], strides = [1] : tensor<17xi32> -> !flow.dispatch.tensor<writeonly:17xi32>
  return
}

Looking further up, this is generated by TOSA:

%310 = "tosa.argmax"(%309) {axis = 1 : i64} : (tensor<1x2304x17xf32>) -> tensor<1x17xi32>

tosa.argmax lowers to the two-result linalg.generic above (the argmax indices plus the running maximum values); only the indices are stored, so the max tensor is a dead temporary that still needs an allocation. So I think it's the same issue as #8899.

antiagainst self-assigned this and unassigned rsuderman Apr 26, 2022
@MaheshRavishankar

For now there is no way around it: we need a stack allocation here, but it is bounded. Hanhan recently added a pass that checks that all allocations are statically bounded. You could use a similar approach in the allocation callback to allocate a statically sized buffer.
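
A sketch of that idea applied to the dispatch above (hypothetical, not actual compiler output): %4 = affine.min(-d0 + 17, 4) is statically bounded by the tile size 4, so the allocation callback can allocate the upper bound and take a view of the dynamic portion:

%static = memref.alloc() : memref<4xf32>
%view = memref.subview %static[0] [%4] [1] : memref<4xf32> to memref<?xf32>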

@antiagainst

This has been fixed; the test now passes with the candidate-20220609.164 release.
