[Vulkan] Lightning model failing with failed to legalize operation 'memref.alloc'
#8741
Seeing the same issue in #8712 (comment)
I think it does not work on the Vulkan side if there are i64 constants.
Correct - you'll need #8738 and the
(Though if the model is actually trying to use i64 values, then it's going to have problems - that constant makes me think it's doing something weird. Does this model really need integers as large as 548682072064?)
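To illustrate the i64 concern above, here is a minimal hypothetical sketch (not from the model itself) of the kind of IR that trips up the Vulkan/SPIR-V path when the target environment lacks 64-bit integer support:

```mlir
// Illustrative only: an i64 constant like the one quoted above.
// When the SPIR-V target environment does not advertise the Int64
// capability, conversion of ops like this fails to legalize.
func @needs_i64() -> i64 {
  %c = arith.constant 548682072064 : i64
  return %c : i64
}
```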
Seems unrelated to i64; at least I'm seeing different issues with the current tip of the tree 4c2140c:

```mlir
func @main_dispatch_107() {
  %c128 = arith.constant 128 : index
  %c0_i32 = arith.constant 0 : i32
  %cst = arith.constant -3.40282347E+38 : f32
  %c0 = arith.constant 0 : index
  %c17 = arith.constant 17 : index
  %c1 = arith.constant 1 : index
  %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) offset(%c128) alignment(64) : !flow.dispatch.tensor<readonly:2304x17xf32>
  %1 = hal.interface.binding.subspan set(0) binding(1) type(storage_buffer) offset(%c0) alignment(64) : !flow.dispatch.tensor<writeonly:17xi32>
  %workgroup_id_x = hal.interface.workgroup.id[0] : index
  %workgroup_count_x = hal.interface.workgroup.count[0] : index
  %2 = affine.apply affine_map<()[s0] -> (s0 * 4)>()[%workgroup_id_x]
  %3 = affine.apply affine_map<()[s0] -> (s0 * 4)>()[%workgroup_count_x]
  scf.for %arg0 = %2 to %c17 step %3 {
    %4 = affine.min affine_map<(d0) -> (-d0 + 17, 4)>(%arg0)
    %5 = flow.dispatch.tensor.load %0, offsets = [0, %arg0], sizes = [2304, %4], strides = [1, 1] : !flow.dispatch.tensor<readonly:2304x17xf32> -> tensor<2304x?xf32>
    %6 = linalg.init_tensor [%4] : tensor<?xi32>
    %7 = linalg.fill {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[4], [1]]>} ins(%c0_i32 : i32) outs(%6 : tensor<?xi32>) -> tensor<?xi32>
    %8 = linalg.init_tensor [%4] : tensor<?xf32>
    %9 = linalg.fill {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[4], [1]]>} ins(%cst : f32) outs(%8 : tensor<?xf32>) -> tensor<?xf32>
    %10:2 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>, affine_map<(d0, d1) -> (d0)>, affine_map<(d0, d1) -> (d0)>], iterator_types = ["parallel", "reduction"]} ins(%5 : tensor<2304x?xf32>) outs(%7, %9 : tensor<?xi32>, tensor<?xf32>) attrs = {lowering_config = #iree_codegen.lowering_config<tile_sizes = [[4], [1]]>} {
    ^bb0(%arg1: f32, %arg2: i32, %arg3: f32):
      %11 = linalg.index 1 : index
      %12 = arith.index_cast %11 : index to i32
      %13 = arith.cmpf ogt, %arg1, %arg3 : f32
      %14 = arith.select %13, %arg1, %arg3 : f32
      %15 = arith.select %13, %12, %arg2 : i32
      linalg.yield %15, %14 : i32, f32
    } -> (tensor<?xi32>, tensor<?xf32>)
    flow.dispatch.tensor.store %10#0, %1, offsets = [%arg0], sizes = [%4], strides = [%c1] : tensor<?xi32> -> !flow.dispatch.tensor<writeonly:17xi32>
  }
  return
}
```
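One plausible reading of the `memref.alloc` legalization failure (a sketch, not confirmed in the thread): the dynamically sized temporaries above (`tensor<?xi32>` and `tensor<?xf32>`, sized by the `affine.min` result `%4`) bufferize into dynamically sized allocations, roughly of this shape, which the SPIR-V conversion has no pattern for:

```mlir
// Hypothetical bufferization output (illustrative): dynamic-size
// allocations that the SPIR-V lowering cannot legalize.
%buf_i32 = memref.alloc(%4) : memref<?xi32>
%buf_f32 = memref.alloc(%4) : memref<?xf32>
```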
The original flow-level input:

```mlir
func @main_dispatch_107(%arg0: !flow.dispatch.tensor<readonly:2304x17xf32>, %arg1: !flow.dispatch.tensor<writeonly:17xi32>) {
  %c0_i32 = arith.constant 0 : i32
  %cst = arith.constant -3.40282347E+38 : f32
  %0 = flow.dispatch.tensor.load %arg0, offsets = [0, 0], sizes = [2304, 17], strides = [1, 1] : !flow.dispatch.tensor<readonly:2304x17xf32> -> tensor<2304x17xf32>
  %1 = linalg.init_tensor [17] : tensor<17xi32>
  %2 = linalg.init_tensor [17] : tensor<17xf32>
  %3 = linalg.fill ins(%cst : f32) outs(%2 : tensor<17xf32>) -> tensor<17xf32>
  %4 = linalg.fill ins(%c0_i32 : i32) outs(%1 : tensor<17xi32>) -> tensor<17xi32>
  %5:2 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d1, d0)>, affine_map<(d0, d1) -> (d0)>, affine_map<(d0, d1) -> (d0)>], iterator_types = ["parallel", "reduction"]} ins(%0 : tensor<2304x17xf32>) outs(%4, %3 : tensor<17xi32>, tensor<17xf32>) {
  ^bb0(%arg2: f32, %arg3: i32, %arg4: f32):
    %6 = linalg.index 1 : index
    %7 = arith.index_cast %6 : index to i32
    %8 = arith.cmpf ogt, %arg2, %arg4 : f32
    %9 = arith.select %8, %arg2, %arg4 : f32
    %10 = arith.select %8, %7, %arg3 : i32
    linalg.yield %10, %9 : i32, f32
  } -> (tensor<17xi32>, tensor<17xf32>)
  flow.dispatch.tensor.store %5#0, %arg1, offsets = [0], sizes = [17], strides = [1] : tensor<17xi32> -> !flow.dispatch.tensor<writeonly:17xi32>
  return
}
```

Looking further up, this is generated by TOSA:

```mlir
%310 = "tosa.argmax"(%309) {axis = 1 : i64} : (tensor<1x2304x17xf32>) -> tensor<1x17xi32>
```

So I think it's the same issue as #8899.
For now there is no way around it: we will need a stack allocation, but it is bounded. Hanhan recently added a pass that checks that all allocations are statically bounded. You could use a similar approach in the allocation callback to allocate a statically sized buffer.
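A minimal sketch of that suggestion, assuming the tile size of 4 bounds the dynamic dimension (the `affine.min` above never yields more than 4); names are illustrative and the exact `subview` syntax varies across MLIR versions:

```mlir
// Illustrative sketch: replace the dynamic allocation with a
// statically sized stack buffer, since %4 is bounded by the tile
// size 4, and take a subview for the actual runtime size.
%buf = memref.alloca() : memref<4xi32>
%view = memref.subview %buf[0] [%4] [1] : memref<4xi32> to memref<?xi32, strided<[1]>>
```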
This was fixed, and the test is now passing with the
To reproduce in iree-samples, run:
MLIR files:
tflite.mlir.zip
tosa.mlir.zip