check-suspend-resume-with-playback failed with workqueue lockup error #5118

Open
fredoh9 opened this issue Jul 22, 2024 · 3 comments
Labels
Intel Daily tests: This issue can be found in Intel internal daily tests
LNL: Applies to Lunar Lake platform
suspend resume: Issues related to suspend resume (e.g. rtcwake)

Comments

@fredoh9
Collaborator

fredoh9 commented Jul 22, 2024

Found this error during the weekend stress test on LNLM_RVP_HDA.

The command to reproduce:
TPLG=/lib/firmware/intel/sof-ipc4-tplg/sof-hda-generic-ace1-4ch.tplg MODEL=LNLM_RVP_HDA SOF_TEST_INTERVAL=5 ~/sof-test/test-case/check-suspend-resume-with-audio.sh -l 100 -m playback

The error message is:

===========================>>
[ 1843.893004] kernel: BUG: workqueue lockup - pool cpus=0-7 node=0 flags=0x4 nice=0 stuck for 54s!
<<===========================
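
For triage across many stress-test logs, here is a minimal sketch that pulls the pool CPUs and the stuck duration out of lockup reports like the one above. `find_wq_lockups` is a hypothetical helper name, not part of sof-test, and the pattern assumes the message layout shown in this log.

```shell
# Hypothetical helper (not part of sof-test): read kernel log lines on stdin
# and print "<pool cpus> <stuck seconds>" for every workqueue lockup report.
# Assumes the message layout seen in this issue.
find_wq_lockups() {
    sed -n 's/.*BUG: workqueue lockup - pool cpus=\([0-9,-]*\) .* stuck for \([0-9]*\)s!.*/\1 \2/p'
}

# Example with the line reported above.
printf '%s\n' \
    '[ 1843.893004] kernel: BUG: workqueue lockup - pool cpus=0-7 node=0 flags=0x4 nice=0 stuck for 54s!' |
    find_wq_lockups
# prints: 0-7 54
```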

An Input/output error had already occurred in the 4th loop:

2024-07-20 16:15:50 UTC Sub-Test: [REMOTE_INFO] ===== Round(4/100) =====
2024-07-20 16:15:50 UTC Sub-Test: [REMOTE_COMMAND] Run the command: rtcwake -m mem -s 5
rtcwake: assuming RTC uses UTC ...
rtcwake: wakeup from "mem" using /dev/rtc0 at Sat Jul 20 16:15:56 2024
2024-07-20 16:15:56 UTC Sub-Test: [REMOTE_COMMAND] sleep for 5
aplay: pcm_write:2127: write error: Input/output error

The kernel lockup happened in the 25th loop:

2024-07-20 16:20:07 UTC Sub-Test: [REMOTE_INFO] ===== Round(25/100) =====
2024-07-20 16:20:07 UTC Sub-Test: [REMOTE_COMMAND] Run the command: rtcwake -m mem -s 5
rtcwake: assuming RTC uses UTC ...
rtcwake: wakeup from "mem" using /dev/rtc0 at Sat Jul 20 16:20:13 2024
2024-07-20 16:21:58 UTC Sub-Test: [REMOTE_COMMAND] sleep for 5
2024-07-20 16:22:06 UTC Sub-Test: [REMOTE_INFO] Check for the kernel log status
declare -- cmd="journalctl_cmd --since=@1721492406"
journalctl_cmd is a function
journalctl_cmd () 
{ 
    sudo journalctl -k -q --no-pager --utc --output=short-monotonic --no-hostname "$@"
}
2024-07-20 16:22:06 UTC [ERROR] Caught kernel log error
===========================>>
[ 1843.893004] kernel: BUG: workqueue lockup - pool cpus=0-7 node=0 flags=0x4 nice=0 stuck for 54s!
<<===========================
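
Comparing the two rounds above: in round 4 the harness logs its next step at the programmed wake time (16:15:56), while in round 25 the next step is logged at 16:21:58, well after the programmed wake time of 16:20:13. The sketch below quantifies that gap. `hms_delta` is a hypothetical helper and assumes both timestamps fall on the same day. Whether the delay is caused by the lockup is an assumption, not something these logs prove.

```shell
# Hypothetical helper: seconds elapsed between two HH:MM:SS timestamps
# taken on the same day (no midnight wrap-around handling).
hms_delta() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, s, ":"); split(b, e, ":")
        print (e[1] * 3600 + e[2] * 60 + e[3]) - (s[1] * 3600 + s[2] * 60 + s[3])
    }'
}

hms_delta 16:15:56 16:15:56   # round 4:  prints 0
hms_delta 16:20:13 16:21:58   # round 25: prints 105
```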

I don't recall seeing this error recently, so the reproduction rate is probably very low.

Intel Internal test result link:
planresultdetail/44120?model=LNLM_RVP_HDA&testcase=check-suspend-resume-with-playback-100

fredoh9 added the suspend resume, Intel Daily tests, and LNL labels Jul 22, 2024
@plbossart
Member

Is dmesg not available, @fredoh9?

@fredoh9
Collaborator Author

fredoh9 commented Jul 22, 2024

Right, dmesg was not captured in this case. I'm also looking into the issue where logs are not collected in some failure cases.

I will use journalctl and upload the equivalent of dmesg once I have access to the device.
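
One possible way to do that, as a sketch: `journalctl -k` restricted to one boot with `-b` gives a dmesg-like dump even after a reboot, provided the journal is persistent on the device. `dump_kernel_log` is a hypothetical wrapper name, and the offset of the failing boot has to be looked up first with `journalctl --list-boots`.

```shell
# Hypothetical wrapper: dump the kernel messages of one boot in a dmesg-like form.
# $1 is a journalctl boot offset (0 = current boot, -1 = previous boot, ...);
# it defaults to the current boot.
dump_kernel_log() {
    boot="${1:-0}"
    sudo journalctl -k -b "$boot" --no-pager --output=short-precise
}

# Usage (output file name is only an example):
#   dump_kernel_log -1 > dmesg_previous_boot.txt
```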

@fredoh9
Collaborator Author

fredoh9 commented Jul 22, 2024

dmesg for the failing boot on ba-lnlm-rvp-hda-02:
dmesg_20240720_stress_test.txt

Jul 20 16:21:58.904948 kernel: snd_sof_intel_hda_common:hda_dsp_state_log: sof-audio-pci-intel-lnl 0000:00:1f.3: Current DSP power state: D3
Jul 20 16:21:58.905276 kernel: snd_sof:sof_set_fw_state: sof-audio-pci-intel-lnl 0000:00:1f.3: fw_state change: 7 -> 0
Jul 20 16:21:58.905612 kernel: ACPI: EC: interrupt blocked
Jul 20 16:21:58.905671 kernel: ACPI: EC: interrupt unblocked
Jul 20 16:21:58.905713 kernel: BUG: workqueue lockup - pool cpus=0-7 node=0 flags=0x4 nice=0 stuck for 54s!
Jul 20 16:21:58.905922 kernel: Showing busy workqueues and worker pools:
Jul 20 16:21:58.952292 kernel: workqueue events: flags=0x0
Jul 20 16:21:58.952375 kernel:   pwq 2: cpus=0 node=0 flags=0x0 nice=0 active=1 refcnt=2
Jul 20 16:21:58.952417 kernel:     pending: vmstat_shepherd
Jul 20 16:21:58.952457 kernel: workqueue events_unbound: flags=0x2
Jul 20 16:21:58.952496 kernel:   pwq 33: cpus=0-7 node=0 flags=0x4 nice=0 active=1 refcnt=2
Jul 20 16:21:58.952542 kernel:     pending: crng_reseed
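
To summarize a longer "Showing busy workqueues" dump, a rough sketch that lists the pending work items per workqueue. `list_pending_work` is a hypothetical helper, and it assumes the journal line layout shown in the excerpt above.

```shell
# Hypothetical helper: read a kernel log on stdin and print
# "<workqueue>: <pending items>" for each "pending:" line in a
# busy-workqueue dump. Assumes lines shaped like the excerpt above.
list_pending_work() {
    awk '
        / kernel: workqueue / {
            # Remember the workqueue name, e.g. "events" or "events_unbound".
            wq = $0
            sub(/.* kernel: workqueue /, "", wq)
            sub(/:.*/, "", wq)
        }
        / pending: / {
            items = $0
            sub(/.* pending: /, "", items)
            print wq ": " items
        }
    '
}

# Usage (file name as attached above):
#   list_pending_work < dmesg_20240720_stress_test.txt
```

For the excerpt above this yields `events: vmstat_shepherd` and `events_unbound: crng_reseed`.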
