[LNL] Multiple Pipeline Playback/capture stress test fails. #5086

Open
ssavati opened this issue Jun 26, 2024 · 7 comments
Labels: bug (Something isn't working), LNL (Applies to Lunar Lake platform)


ssavati commented Jun 26, 2024

Multiple Pipeline Playback/capture stress test fails for HDA/NOCODEC/SDW. The reproduction rate is low, but the failure happens sporadically during CI stress tests.

Steps
HDA test command
TPLG=/lib/firmware/intel/sof-ipc4-tplg/sof-hda-generic-ace1-4ch.tplg MODEL=LNLM_RVP_HDA SOF_TEST_INTERVAL=5 ~/sof-test/test-case/multiple-pipeline.sh -f p -l 200

NOCODEC test command
TPLG=/lib/firmware/intel/development/sof-lnl-nocodec.tplg MODEL=LNLM_RVP_NOCODEC SOF_TEST_INTERVAL=5 ~/sof-test/test-case/multiple-pipeline.sh -f p -l 200

Config
Linux branch: topic/sof-dev
Linux commit: c75a95d34885
SOF branch: stable-v2.10
SOF commit: b15f1f1a3238

Issue reproduction: sporadic and low

2024-06-25 22:24:06 UTC [REMOTE_INFO] checking pipeline status again
2024-06-25 22:24:06 UTC [REMOTE_COMMAND] pkill -9 aplay arecord
/home/ubuntu/sof-test/test-case/../case-lib/lib.sh: line 750: 156308 Killed                  aplay $SOF_ALSA_OPTS $SOF_APLAY_OPTS "$@"
/home/ubuntu/sof-test/test-case/../case-lib/lib.sh: line 750: 156301 Killed                  aplay $SOF_ALSA_OPTS $SOF_APLAY_OPTS "$@"
declare -- cmd="journalctl_cmd --since=@1719354239"
2024-06-25 22:24:07 UTC [REMOTE_INFO] ===== Testing: (Loop: 31/200) =====
2024-06-25 22:24:07 UTC [REMOTE_INFO] /home/ubuntu/sof-test/test-case/multiple-pipeline.sh will use topology /usr/lib/firmware/intel/sof-ipc4-tplg/sof-hda-generic-ace1-4ch.tplg to run the test case
2024-06-25 22:24:07 UTC [REMOTE_INFO] Pipeline list to ignore is specified, will ignore 'pcm=HDA Digital' in test case
2024-06-25 22:24:07 UTC [REMOTE_INFO] Run command to get pipeline parameters
2024-06-25 22:24:07 UTC [REMOTE_COMMAND] sof-tplgreader.py /usr/lib/firmware/intel/sof-ipc4-tplg/sof-hda-generic-ace1-4ch.tplg -f 'type:playback  & ~pcm:HDMI' -b ' pcm:HDA Digital' -s 0 -e
2024-06-25 22:24:07 UTC [REMOTE_INFO] Testing: HDA Analog [hw:0,0]
2024-06-25 22:24:07 UTC [REMOTE_COMMAND] aplay   -D hw:0,0 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-06-25 22:24:07 UTC [REMOTE_INFO] Testing: Deepbuffer HDA Analog [hw:0,31]
2024-06-25 22:24:07 UTC [REMOTE_INFO] wait 0.5s for aplay_opts()
2024-06-25 22:24:07 UTC [REMOTE_COMMAND] aplay   -D hw:0,31 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-06-25 22:24:08 UTC [REMOTE_INFO] checking pipeline status
2024-06-25 22:24:08 UTC [REMOTE_INFO] Letting playback/capture run for 5s
2024-06-25 22:24:13 UTC [REMOTE_INFO] checking pipeline status again
2024-06-25 22:24:13 UTC [REMOTE_COMMAND] pkill -9 aplay arecord
/home/ubuntu/sof-test/test-case/../case-lib/lib.sh: line 750: 156408 Killed                  aplay $SOF_ALSA_OPTS $SOF_APLAY_OPTS "$@"
/home/ubuntu/sof-test/test-case/../case-lib/lib.sh: line 750: 156400 Killed                  aplay $SOF_ALSA_OPTS $SOF_APLAY_OPTS "$@"
declare -- cmd="journalctl_cmd --since=@1719354246"
2024-06-25 22:24:14 UTC [REMOTE_INFO] ===== Testing: (Loop: 32/200) =====
2024-06-25 22:24:14 UTC [REMOTE_INFO] /home/ubuntu/sof-test/test-case/multiple-pipeline.sh will use topology /usr/lib/firmware/intel/sof-ipc4-tplg/sof-hda-generic-ace1-4ch.tplg to run the test case
2024-06-25 22:24:14 UTC [REMOTE_INFO] Pipeline list to ignore is specified, will ignore 'pcm=HDA Digital' in test case
2024-06-25 22:24:14 UTC [REMOTE_INFO] Run command to get pipeline parameters
2024-06-25 22:24:14 UTC [REMOTE_COMMAND] sof-tplgreader.py /usr/lib/firmware/intel/sof-ipc4-tplg/sof-hda-generic-ace1-4ch.tplg -f 'type:playback  & ~pcm:HDMI' -b ' pcm:HDA Digital' -s 0 -e
2024-06-25 22:24:14 UTC [REMOTE_INFO] Testing: HDA Analog [hw:0,0]
2024-06-25 22:24:14 UTC [REMOTE_COMMAND] aplay   -D hw:0,0 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-06-25 22:24:14 UTC [REMOTE_INFO] Testing: Deepbuffer HDA Analog [hw:0,31]
2024-06-25 22:24:14 UTC [REMOTE_COMMAND] aplay   -D hw:0,31 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-06-25 22:24:14 UTC [REMOTE_INFO] wait 0.5s for aplay_opts()
2024-06-25 22:24:14 UTC [REMOTE_INFO] checking pipeline status
2024-06-25 22:24:14 UTC [REMOTE_INFO] Letting playback/capture run for 5s
2024-06-25 22:24:19 UTC [REMOTE_INFO] checking pipeline status again
2024-06-25 22:24:19 UTC [REMOTE_COMMAND] pkill -9 aplay arecord
/home/ubuntu/sof-test/test-case/../case-lib/lib.sh: line 750: 156499 Killed                  aplay $SOF_ALSA_OPTS $SOF_APLAY_OPTS "$@"
declare -- cmd="journalctl_cmd --since=@1719354253"
2024-06-25 22:24:21 UTC [REMOTE_INFO] ===== Testing: (Loop: 33/200) =====
2024-06-25 22:24:21 UTC [REMOTE_INFO] /home/ubuntu/sof-test/test-case/multiple-pipeline.sh will use topology /usr/lib/firmware/intel/sof-ipc4-tplg/sof-hda-generic-ace1-4ch.tplg to run the test case
2024-06-25 22:24:21 UTC [REMOTE_INFO] Pipeline list to ignore is specified, will ignore 'pcm=HDA Digital' in test case
2024-06-25 22:24:21 UTC [REMOTE_INFO] Run command to get pipeline parameters
2024-06-25 22:24:21 UTC [REMOTE_COMMAND] sof-tplgreader.py /usr/lib/firmware/intel/sof-ipc4-tplg/sof-hda-generic-ace1-4ch.tplg -f 'type:playback  & ~pcm:HDMI' -b ' pcm:HDA Digital' -s 0 -e
2024-06-25 22:24:21 UTC [REMOTE_INFO] Testing: HDA Analog [hw:0,0]
2024-06-25 22:24:21 UTC [REMOTE_COMMAND] aplay   -D hw:0,0 -c 2 -r 48000 -f S16_LE /dev/zero -q
2024-06-25 22:24:21 UTC [REMOTE_INFO] Testing: Deepbuffer HDA Analog [hw:0,31]
2024-06-25 22:24:21 UTC [REMOTE_INFO] wait 0.5s for aplay_opts()
2024-06-25 22:24:21 UTC [REMOTE_COMMAND] aplay   -D hw:0,31 -c 2 -r 48000 -f S16_LE /dev/zero -q
aplay: main:831: audio open error: Device or resource busy
2024-06-25 22:24:21 UTC [REMOTE_INFO] checking pipeline status
2024-06-25 22:24:21 UTC [REMOTE_INFO] Letting playback/capture run for 5s
/home/ubuntu/sof-test/test-case/../case-lib/lib.sh: line 750: 156506 Killed                  aplay $SOF_ALSA_OPTS $SOF_APLAY_OPTS "$@"
2024-06-25 22:24:26 UTC [REMOTE_INFO] checking pipeline status again
2024-06-25 22:24:26 UTC [REMOTE_ERROR] Running process count is 1, but 2 is expected
156597 aplay -D hw:0,0 -c 2 -r 48000 -f S16_LE /dev/zero -q
/home/ubuntu/sof-test/test-case/../case-lib/lib.sh: line 750: 156597 Killed                  aplay $SOF_ALSA_OPTS $SOF_APLAY_OPTS "$@"
2024-06-25 22:24:26 UTC [REMOTE_INFO] Starting func_exit_handler(1)
2024-06-25 22:24:26 UTC [REMOTE_ERROR] Starting func_exit_handler(), exit status=1, FUNCNAME stack:
2024-06-25 22:24:26 UTC [REMOTE_ERROR]  func_error_exit()  @  /home/ubuntu/sof-test/test-case/multiple-pipeline.sh
2024-06-25 22:24:26 UTC [REMOTE_ERROR]  ps_checks()  @  /home/ubuntu/sof-test/test-case/multiple-pipeline.sh:148
2024-06-25 22:24:26 UTC [REMOTE_ERROR]  main()  @  /home/ubuntu/sof-test/test-case/multiple-pipeline.sh:202
2024-06-25 22:24:31 UTC [REMOTE_INFO] pkill -TERM -f mtrace-reader.py
2024-06-25 22:24:31 UTC [REMOTE_INFO] nlines=4914 /home/ubuntu/sof-test/logs/multiple-pipeline/2024-06-25-22:20:33-2957/mtrace.txt
+ grep -B 2 -A 1 -i --word-regexp -e ERR -e ERROR -e '' -e OSError /home/ubuntu/sof-test/logs/multiple-pipeline/2024-06-25-22:20:33-2957/mtrace.txt
2024-06-25 22:24:37 UTC [REMOTE_INFO] ktime=14535 sof-test PID=153125: ending
2024-06-25 22:24:37 UTC [REMOTE_INFO] Test Result: FAIL!

ssavati commented Jun 26, 2024

The `pkill -9 aplay arecord` command is not able to terminate the processes because one of the aplay/arecord instances is stuck/hung/blocked, so the next iteration fails to start new playback/capture instances.
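One quick way to confirm this theory on a wedged board (a hypothetical helper, not part of sof-test): a task blocked in uninterruptible sleep shows state `D` in procfs and ignores SIGKILL until the kernel call it is blocked in returns.

```shell
# Hypothetical diagnostic helper (not part of sof-test): print the kernel
# state letter for every process matching a name. A task that survives
# `pkill -9` is typically stuck in state D (uninterruptible sleep).
proc_states() {
    local name="$1" pid state
    for pid in $(pgrep -x "$name"); do
        # Field 3 of /proc/<pid>/stat is the state: R, S, D, Z, T, ...
        state=$(awk '{print $3}' "/proc/$pid/stat" 2>/dev/null)
        echo "$name pid=$pid state=$state"
    done
}
```

On a failing run, `proc_states aplay` reporting `state=D` after the `pkill -9` would confirm the SIGKILL is pending behind an uninterruptible kernel wait.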

@kv2019i kv2019i added bug Something isn't working LNL Applies to Lunar Lake platform labels Jun 26, 2024

lyakh commented Jun 28, 2024

If thesofproject/sof-test#1213 is merged, testing with it should tell us whether any aplay/arecord processes are blocked in the kernel. As suggested by @lgirdwood, enabling some kernel soft-lockup debugging options could help too.
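For reference, the stock hung-task detector can flag a task stuck in uninterruptible sleep without any test-suite changes. A sketch of the standard knobs, assuming the kernel was built with CONFIG_DETECT_HUNG_TASK=y and run as root (exact availability depends on the kernel config):

```shell
# Sketch, assuming CONFIG_DETECT_HUNG_TASK=y and root access.
# Log a stack trace for any task stuck in uninterruptible sleep for
# more than 30 s, instead of the default 120 s:
sysctl kernel.hung_task_timeout_secs=30
# Optionally arm the soft-lockup watchdog to panic, for crash dumps:
# sysctl kernel.softlockup_panic=1
```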


jsarha commented Jul 4, 2024

I've now tried to reproduce this on both the HDA and nocodec LNL setups. On HDA I only have 5 successful test runs so far, but on nocodec I just had 17 successful runs (about 7 hours) in a row, so the reproduction rate is very low. I still intend to match the nocodec numbers on HDA, but it's starting to look like this issue does not show itself in a reasonable time.


jsarha commented Jul 8, 2024

Now I have 19 successful runs (about 8 hours) on the LNL HDA RVP, so that case does not appear to reproduce either. @lgirdwood, @kv2019i, should we close this one?

BTW, both tests were run with relatively recent daily build.


kv2019i commented Jul 8, 2024

Ack @jsarha. @ssavati, OK to close, or is this still seen in larger test plans? Either close, or keep open as P3.


marc-hb commented Aug 5, 2024

> if thesofproject/sof-test#1213 is merged, testing with it should tell us whether any aplay / arecord processes are blocked in the kernel.

This just happened in daily test run /44499?model=LNLM_RVP_NOCODEC&testcase=multiple-pipeline-all-50

[ 5141.981465] kernel: task:aplay           state:R stack:10456 pid:60957 tgid:60957 ppid:60953  flags:0x00000006
[ 5141.981468] kernel: Call Trace:
[ 5141.981470] kernel:  <TASK>
[ 5141.981470] kernel:  __schedule+0x311/0xa80
[ 5141.981473] kernel:  schedule+0x3e/0x130
[ 5141.981474] kernel:  schedule_timeout+0xb1/0x130
[ 5141.981476] kernel:  ? __pfx_process_timeout+0x10/0x10
[ 5141.981480] kernel:  __snd_pcm_lib_xfer+0x6ab/0x900 [snd_pcm]
[ 5141.981488] kernel:  ? __pfx_interleaved_copy+0x10/0x10 [snd_pcm]
[ 5141.981492] kernel:  ? __snd_pcm_lib_xfer+0x1a2/0x900 [snd_pcm]
[ 5141.981496] kernel:  ? __pfx_default_write_copy+0x10/0x10 [snd_pcm]
[ 5141.981500] kernel:  ? __pfx_default_wake_function+0x10/0x10
[ 5141.981503] kernel:  snd_pcm_common_ioctl+0xf7a/0x1600 [snd_pcm]
[ 5141.981507] kernel:  ? ioctl_has_perm.constprop.0.isra.0+0xba/0x110
[ 5141.981511] kernel:  snd_pcm_ioctl+0x27/0x40 [snd_pcm]
[ 5141.981515] kernel:  __x64_sys_ioctl+0x8e/0xd0
[ 5141.981517] kernel:  do_syscall_64+0x9e/0x1a0
[ 5141.981519] kernel:  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 5141.981520] kernel: RIP: 0033:0x7f42f861794f
[ 5141.981521] kernel: RSP: 002b:00007ffe8f159ed0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 5141.981522] kernel: RAX: ffffffffffffffda RBX: 000055b4ddb26010 RCX: 00007f42f861794f
[ 5141.981523] kernel: RDX: 00007ffe8f159f30 RSI: 0000000040184150 RDI: 0000000000000004
[ 5141.981524] kernel: RBP: 000055b4ddb25f90 R08: 0000000000000000 R09: 000055b4ddb288c0
[ 5141.981524] kernel: R10: 000055b4ddb2e000 R11: 0000000000000246 R12: 0000000000000000
[ 5141.981525] kernel: R13: 000055b4ddb288c0 R14: 7fffffffffffffff R15: 0000000000000010
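For anyone trying to capture a trace like the one above on demand, without waiting for sof-test#1213: the magic-sysrq interface can dump task stacks into the kernel log (a sketch, requires root; not something the CI does today):

```shell
# Sketch (root required): dump kernel task stacks into the kernel log.
echo 1 > /proc/sys/kernel/sysrq   # enable all sysrq functions
echo w > /proc/sysrq-trigger      # 'w' = show blocked (D state) tasks
# 't' dumps every task, which also catches an aplay looping in state R:
echo t > /proc/sysrq-trigger
journalctl -k | grep -A 25 'task:aplay'
```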
