
Runtime hangs on DG2 (and Gen12 iGPU maybe?) #706

Open
tazz4843 opened this issue Feb 9, 2024 · 3 comments

Comments

@tazz4843

tazz4843 commented Feb 9, 2024

I'm running into random hangs during normal use of my app. They began several months ago, around September 2023. A stack trace is attached at the end. While digging, I found a related comment with the exact same stack trace, although that report was only on DG2 and on an unsupported kernel, whereas I can occasionally reproduce this on Gen12 iGPUs as well, and on a much newer kernel.
I'm using whisper.cpp with its OpenCL backend to run arbitrary speech-to-text. If one thread hangs, all the other runtime threads hang too, spinning multiple cores at 100%.

I'm very new to all of this so please let me know if there's any information I can supply :)

Host details:
GPU: Arc A770
Arch Linux w/ kernel 6.7.3-arch1-1.1
intel-compute-runtime-23.48.27912.11-1

backtrace.txt

@eero-t

eero-t commented Feb 13, 2024

Looking at the backtrace:

  • 8 "tokio-runtime-w" threads have yielded their execution in NEO::CommandStreamReceiver::baseWaitFunction()
  • 1 "scripty_stt_ser" thread is futex-waiting for the worker to close in NEO::DrmGemCloseWorker::worker()
  • 1 "scripty_stt_ser" thread is Tokio Rust code hanging directly in the futex_wait() syscall
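Eight threads that have yielded inside `baseWaitFunction()` fit the "multiple cores at 100%" symptom: a wait loop that re-checks a completion value and then yields never actually sleeps, so if the value never advances, the core stays busy. The sketch below is a hypothetical Rust model of such a yield-based wait, not NEO's real code; `wait_for_tag`, the `tag` counter, and the optional `max_spins` limit are illustrative assumptions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical model of a yield-based completion wait. `tag` stands in for a
/// completion counter the GPU would advance; it is NOT NEO's data structure.
/// Returns true if `tag` reached `target`, false if `max_spins` ran out first.
fn wait_for_tag(tag: &AtomicU64, target: u64, max_spins: Option<u64>) -> bool {
    let mut spins: u64 = 0;
    // Re-check the tag, then give up the time slice. Yielding keeps the
    // thread runnable, so a wait that never completes still pegs a core.
    while tag.load(Ordering::Acquire) < target {
        if let Some(limit) = max_spins {
            if spins >= limit {
                return false; // bounded variant: give up instead of spinning forever
            }
        }
        spins += 1;
        thread::yield_now();
    }
    true
}

fn main() {
    let tag = Arc::new(AtomicU64::new(0));

    // Healthy case: another thread advances the tag after a short delay.
    let producer = {
        let tag = Arc::clone(&tag);
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(10));
            tag.store(1, Ordering::Release);
        })
    };
    assert!(wait_for_tag(&tag, 1, None));
    producer.join().unwrap();

    // Hung case: nothing ever advances the tag to 2. Unbounded, this would
    // spin forever; the spin limit lets the demo terminate.
    assert!(!wait_for_tag(&tag, 2, Some(100_000)));
    println!("healthy wait completed; hung wait gave up after the spin limit");
}
```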

@geekboood

I have the same issue when running OpenVINO Model Server.

@tazz4843
Author

Sorry it took me so long to get back to this.

Looking at the backtrace:

  • 1 "scripty_stt_ser" thread is Tokio Rust code hanging directly in the futex_wait() syscall

From what I've seen in the code, this runtime worker is waiting for compute-runtime code to return, which makes me think the compute runtime is the cause. Disabling the OpenCL runtime and falling back to the CPU makes the issue disappear completely, even after weeks of uptime. With OpenCL enabled, the app usually locks up within a week at most and starts spinning on the CPU.
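Since the CPU path avoids the hang, one way an application might contain it while the root cause is investigated is a watchdog: run the GPU-backed call on its own thread, wait with a timeout, and fall back to the CPU path if no result arrives. A minimal std-only Rust sketch, assuming hypothetical `transcribe_gpu` and `transcribe_cpu` functions (these are not whisper.cpp's API):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for a GPU-backed (e.g. OpenCL) transcription call.
/// `hang` simulates the driver wait never returning.
fn transcribe_gpu(audio: Vec<f32>, hang: bool) -> String {
    if hang {
        loop {
            thread::sleep(Duration::from_secs(3600));
        }
    }
    format!("gpu: {} samples", audio.len())
}

/// Hypothetical CPU fallback path.
fn transcribe_cpu(audio: &[f32]) -> String {
    format!("cpu: {} samples", audio.len())
}

/// Runs the GPU call on its own OS thread and waits at most `limit`.
/// On timeout (or if the worker thread dies) it uses the CPU path instead.
/// The stuck thread cannot be cancelled; it is simply abandoned.
fn transcribe_with_watchdog(audio: Vec<f32>, hang: bool, limit: Duration) -> String {
    let (tx, rx) = mpsc::channel();
    let audio_for_gpu = audio.clone();
    thread::spawn(move || {
        let text = transcribe_gpu(audio_for_gpu, hang);
        let _ = tx.send(text); // receiver may already be gone after a timeout
    });
    match rx.recv_timeout(limit) {
        Ok(text) => text,
        Err(_) => transcribe_cpu(&audio),
    }
}

fn main() {
    let audio = vec![0.0f32; 16_000];
    // GPU call returns in time: prints "gpu: 16000 samples"
    println!("{}", transcribe_with_watchdog(audio.clone(), false, Duration::from_secs(5)));
    // GPU call "hangs": after 50 ms prints "cpu: 16000 samples"
    println!("{}", transcribe_with_watchdog(audio, true, Duration::from_millis(50)));
}
```

The abandoned thread keeps whatever it holds, including any core it is spinning, so this only limits the blast radius; it does not fix the underlying hang. In a Tokio application the same idea could combine `tokio::task::spawn_blocking` with `tokio::time::timeout`, which this sketch does not show.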
