
[regression][GPU]: 'func.func' op uses 81920 bytes of shared memory; exceeded the limit of 65536 bytes post 6ff00a8a008d06b604d4ca4e0ae6e601ae810b4f #19511

Open · pdhirajkumarprasad opened this issue Dec 18, 2024 · 5 comments
Labels: bug 🐞 Something isn't working

@pdhirajkumarprasad

What happened?

We have 100+ models failing on GPU that were previously passing numerics. The failures started after 6ff00a8. Reduced reproducer:

module {
  func.func @torch_jit(%arg1: !torch.vtensor<[1,128,4,256],f32>) -> !torch.vtensor<[1,257,4,256],f32> attributes {torch.onnx_meta.ir_version = 7 : si64, torch.onnx_meta.opset_version = 21 : si64, torch.onnx_meta.producer_name = "pytorch", torch.onnx_meta.producer_version = "1.12.1"} {
    %1 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<_stages.2.1.transformer.0.attn.qkv_proj.weight> : tensor<257x128x1x1xf32>} : () -> !torch.vtensor<[257,128,1,1],f32> 
    %2 = torch.operator "onnx.Constant"() {torch.onnx.value = dense_resource<_stages.2.1.transformer.0.attn.qkv_proj.bias> : tensor<257xf32>} : () -> !torch.vtensor<[257],f32> 
    %3 = torch.operator "onnx.Conv"(%arg1, %1, %2) {torch.onnx.dilations = [1 : si64, 1 : si64], torch.onnx.group = 1 : si64, torch.onnx.kernel_shape = [1 : si64, 1 : si64], torch.onnx.pads = [0 : si64, 0 : si64, 0 : si64, 0 : si64], torch.onnx.strides = [1 : si64, 1 : si64]} : (!torch.vtensor<[1,128,4,256],f32>, !torch.vtensor<[257,128,1,1],f32>, !torch.vtensor<[257],f32>) -> !torch.vtensor<[1,257,4,256],f32> 
    return %3 : !torch.vtensor<[1,257,4,256],f32>
  }
}

Command:

iree-compile model.torch_onnx.mlir --iree-hal-target-backends=rocm --iree-hip-target=gfx942 -o abc.vmfb
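
Compiling with that command fails with the diagnostic in the issue title: the compiled function requests 81920 bytes (80 KiB) of shared memory, exceeding the 65536-byte (64 KiB) limit:

'func.func' op uses 81920 bytes of shared memory; exceeded the limit of 65536 bytes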

Attached: model.torch_onnx.mlir.txt

Steps to reproduce your issue

  1. Save the module above as model.torch_onnx.mlir
  2. Run the iree-compile command shown under "Command"
  3. Compilation fails with the shared-memory error

What component(s) does this issue relate to?

Compiler

Version information

No response

Additional context

No response

pdhirajkumarprasad added the bug 🐞 Something isn't working label on Dec 18, 2024
@ScottTodd (Member)

Which 100 models? Please be specific in bug reports.

@ScottTodd (Member)

Good news: with the existing tests in https://github.com/iree-org/iree-test-suites/tree/main/onnx_models, I can reproduce a regression in a few models like mobilenet and resnet50 using --iree-hal-target-backends=rocm --iree-hip-target=gfx1100. So once I land PRs to get those tests running on multiple backends, we should have earlier signal for these sorts of regressions.

@ScottTodd (Member) commented on Dec 19, 2024

Repro using resnet from iree-org/iree-test-suites#65:

# Setup
cd onnx_models
.\.venv\Scripts\activate.bat
pip install --upgrade -r requirements-iree.txt

# Run test that passes
pytest --log-cli-level=info -rA --durations=0 -k resnet --test-config-file=./configs/onnx_models_gpu_rocm_rdna3.json

# Switch to broken release, run again to see failure
pip install --find-links https://iree.dev/pip-release-links.html iree-base-compiler==3.1.0rc20241217
pytest --log-cli-level=info -rA --durations=0 -k resnet --test-config-file=./configs/onnx_models_gpu_rocm_rdna3.json
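
To return to the passing state for comparison (assuming requirements-iree.txt pins a build from before the regression, which the first passing run above suggests), reinstall from the requirements file and re-run:

# Switch back to the pinned (passing) compiler, then re-run the test
pip install --upgrade -r requirements-iree.txt
pytest --log-cli-level=info -rA --durations=0 -k resnet --test-config-file=./configs/onnx_models_gpu_rocm_rdna3.json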

Once that PR and #19524 land, we'll have that coverage on all IREE PRs.

@ScottTodd (Member)

I also noticed that this error spews a looooot of logs (multiple thousands of lines):

return funcOp.emitOpError("uses ")
       << cumSize << " bytes of shared memory; exceeded the limit of "
       << limit << " bytes";

Maybe we don't need the entire context?
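
One possible trim, sketched here as an assumption rather than an existing IREE change: emit the error at the function's location instead of through emitOpError, so the diagnostic machinery has no operation to attach as a note. This assumes funcOp, cumSize, and limit are in scope as in the quoted snippet:

// Sketch only: emitting at the location avoids attaching the full func
// body as a note when -mlir-print-op-on-diagnostic is enabled.
return mlir::emitError(funcOp.getLoc())
       << "'" << funcOp->getName() << "' op uses " << cumSize
       << " bytes of shared memory; exceeded the limit of " << limit
       << " bytes";

Alternatively, if iree-compile exposes the standard MLIR context options, --mlir-print-op-on-diagnostic=false should suppress the attached op dump with no code change.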
