[HAL][Metal] Runtime returns incorrect tensor values #19530

Open
chrsmcgrr opened this issue Dec 19, 2024 · 2 comments
Labels
bug 🐞 (Something isn't working), codegen/spirv (SPIR-V code generation compiler backend), hal/metal (Runtime Apple Metal HAL backend)

Comments

@chrsmcgrr
Contributor

What happened?

Compiling a simple MatMul example with IREE for the Metal backend and executing it with the runtime yields incorrect results: every element of the output tensor is zero.

iree-run-module --module=model.vmfb --input=10x10xf32=0.42 --function="main" --device=metal
EXEC @main
result[0]: hal.buffer_view
10x10xf32=[0 0 0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0 0 0][0 0 0 0 0 0 0 0 0 0]

Steps to reproduce your issue

  1. Given the following linalg module (saved as iree.mlir):
module @module {
  util.func public @main$async(%arg0: !hal.buffer_view, %arg1: !hal.fence, %arg2: !hal.fence) -> !hal.buffer_view attributes {inlining_policy = #util.inline.never, iree.abi.model = "coarse-fences", iree.abi.stub} {
    %cst = arith.constant 0.000000e+00 : f32
    %0 = hal.tensor.import wait(%arg1) => %arg0 : !hal.buffer_view -> tensor<10x10xf32>
    %1 = tensor.empty() : tensor<10x10xf32>
    %2 = linalg.fill ins(%cst : f32) outs(%1 : tensor<10x10xf32>) -> tensor<10x10xf32>
    %3 = linalg.matmul ins(%0, %0 : tensor<10x10xf32>, tensor<10x10xf32>) outs(%2 : tensor<10x10xf32>) -> tensor<10x10xf32>
    %4 = hal.tensor.barrier join(%3 : tensor<10x10xf32>) => %arg2 : !hal.fence
    %5 = hal.tensor.export %4 : tensor<10x10xf32> -> !hal.buffer_view
    util.return %5 : !hal.buffer_view
  }
  util.func public @main(%arg0: !hal.buffer_view) -> !hal.buffer_view attributes {iree.abi.stub} {
    %0 = util.null : !hal.fence
    %c-1_i32 = arith.constant -1 : i32
    %c0 = arith.constant 0 : index
    %device_0 = hal.devices.get %c0 : !hal.device
    %fence = hal.fence.create device(%device_0 : !hal.device) flags("None") : !hal.fence
    %1 = util.call @main$async(%arg0, %0, %fence) : (!hal.buffer_view, !hal.fence, !hal.fence) -> !hal.buffer_view
    %status = hal.fence.await until([%fence]) timeout_millis(%c-1_i32) : i32
    util.return %1 : !hal.buffer_view
  }
}
  2. Compile the module with:
iree-compile --iree-input-type=auto --iree-hal-target-backends=metal-spirv iree.mlir -o model.vmfb
  3. Execute it with iree-run-module:
iree-run-module --module=model.vmfb --input=10x10xf32=0.42 --function="main" --device=metal
  4. Check whether the returned tensor contains nonzero elements. With the Metal backend, every element comes back as 0.
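
For reference, the expected result can be worked out by hand: the input is a 10x10 tensor filled with 0.42 and it is multiplied by itself, so every output element should be about 10 * 0.42 * 0.42 = 1.764. The sketch below is only an illustration in plain Python, not part of IREE. The helpers `matmul` and `parse_rows` are hypothetical names, and the parser assumes the printed format shown in the output above.

```python
import re

N = 10       # tensor is N x N, matching --input=10x10xf32=0.42
FILL = 0.42  # splat value passed on the command line


def matmul(x, y):
    """Naive square matrix multiply on lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def parse_rows(printed):
    """Parse text like '2x2xf32=[1 2][3 4]' into a list of float rows.

    Hypothetical helper: assumes the bracketed row format seen in the
    iree-run-module output quoted in this issue.
    """
    body = printed.split("=", 1)[1]
    return [[float(v) for v in row.split()]
            for row in re.findall(r"\[([^\]]*)\]", body)]


a = [[FILL] * N for _ in range(N)]
expected = matmul(a, a)

# The all-zero result reported for --device=metal.
reported = parse_rows("10x10xf32=" + "[0 0 0 0 0 0 0 0 0 0]" * N)

# Count elements that differ from the reference by more than a loose tolerance.
mismatches = sum(
    abs(got - want) > 1e-3
    for got_row, want_row in zip(reported, expected)
    for got, want in zip(got_row, want_row)
)
print(f"expected element ~ {expected[0][0]:.3f}, mismatching elements: {mismatches}")
```

With the all-zero output above, all 100 elements mismatch the reference value.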

What component(s) does this issue relate to?

Runtime

Version information

iree-base-compiler 3.0.0
iree-base-runtime 3.0.0

Additional context

If I run on latest main: ed9a028d3f3bfb0ab32004881c87539577048aa8

I get the following error in the runtime:

Assertion failed: (placement.device), function iree_hal_metal_buffer_wrap, file metal_buffer.m, line 52.
@chrsmcgrr chrsmcgrr added the bug 🐞 Something isn't working label Dec 19, 2024
@ScottTodd ScottTodd added codegen/spirv SPIR-V code generation compiler backend hal/metal Runtime Apple Metal HAL backend labels Dec 19, 2024
@ScottTodd
Member

Thanks for the report! Two clarifying questions:

  1. Do your repro steps work as expected on other backends, like --iree-hal-target-backends=llvm-cpu and --device=local-sync?

  2. When you tried with

    If I run on latest main: ed9a028d3f3bfb0ab32004881c87539577048aa8

    I get the following error in the runtime:

    Assertion failed: (placement.device), function iree_hal_metal_buffer_wrap, file metal_buffer.m, line 52.
    

    Did you also recompile the .vmfb file? Compatibility with the prior release may have changed, and that looks like a different error than the output being incorrect. Could either be a bug in the runtime code or a mismatch between the compiler and the runtime, which should be fixed by compiling with a recent version of iree-compile (e.g. from the nightly releases: https://iree.dev/reference/bindings/python/#__tabbed_2_2)

@chrsmcgrr
Contributor Author

Thanks for the quick response!

Do your repro steps work as expected on other backends, like --iree-hal-target-backends=llvm-cpu and --device=local-sync?

Yes, compiling for CPU works with llvm-cpu and local-sync.

Did you also recompile the .vmfb file?

I did, with locally built binaries of iree-compile and iree-run-module. I will debug a little further.
