Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LNL HDA alsabat capture failed, "Peak freq too low" #9164

Open
marc-hb opened this issue May 23, 2024 · 47 comments
Open

LNL HDA alsabat capture failed, "Peak freq too low" #9164

marc-hb opened this issue May 23, 2024 · 47 comments
Assignees
Labels
bug Something isn't working as expected HDA Applies to HD-Audio bus for codec connection LNL Applies to Lunar Lake platform P2 Critical bugs or normal features
Milestone

Comments

@marc-hb
Copy link
Collaborator

marc-hb commented May 23, 2024

Originally posted by @marc-hb in #9123 (comment)

https://sof-ci.01.org/sofpr/PR9151/build4789/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-alsabat-headset-capture-997 ( ba-lnlm-rvp-hda-02)

2024-05-22 10:38:13 UTC [REMOTE_COMMAND] alsabat -Phw:CODEC,0 --standalone -n 240000 -r 48000 -c 2 -f S16_LE -F 997 -k 2.1
2024-05-22 10:38:14 UTC [REMOTE_COMMAND] alsabat -Chw:sofhdadsp,0 -c 2 -r 48000 -f S16_LE -F 997 -k 2.1
 FAIL: Peak freq too low 973.39 Hz
 FAIL: Peak freq too low 974.85 Hz
 FAIL: Peak freq too low 977.05 Hz
 FAIL: Peak freq too low 978.52 Hz
 FAIL: Peak freq too low 979.98 Hz
 FAIL: Peak freq too low 982.18 Hz
 FAIL: Peak freq too low 983.64 Hz
 FAIL: Peak freq too low 987.30 Hz
 FAIL: Peak freq too low 988.77 Hz
alsa-utils version 1.2.6

Entering capture thread (ALSA).
Get period size: 3000  buffer size: 24000
Recording ...
Capture completed.

BAT analysis: signal has 65536 frames at 48000 Hz, 2 channels, 2 bytes per sample.

Channel 1 - Checking for target frequency 997.00 Hz
Amplitude: 9878.5; Percentage: [30]
Detected peak at 973.39 Hz of 17.57 dB
 Total 17.6 dB from 973.39 to 973.39 Hz
Detected peak at 974.85 Hz of 17.64 dB
 Total 20.6 dB from 974.85 to 974.85 Hz
Detected peak at 977.05 Hz of 17.74 dB
 Total 22.4 dB from 977.05 to 977.05 Hz
Detected peak at 978.52 Hz of 18.62 dB
 Total 23.9 dB from 978.52 to 978.52 Hz
Detected peak at 979.98 Hz of 18.92 dB
 Total 25.1 dB from 979.98 to 979.98 Hz
Detected peak at 982.18 Hz of 18.84 dB
 Total 26.7 dB from 981.45 to 982.18 Hz
Detected peak at 983.64 Hz of 19.88 dB
 Total 27.5 dB from 983.64 to 983.64 Hz
Detected peak at 987.30 Hz of 20.82 dB
 Total 29.8 dB from 985.11 to 987.30 Hz
Detected peak at 988.77 Hz of 22.17 dB
 Total 30.5 dB from 988.77 to 988.77 Hz
Detected peak at 996.83 Hz of 34.94 dB
 Total 41.0 dB from 990.23 to 1004.15 Hz
 PASS: Peak detected at target frequency
Detected at least 10 signal(s) in total

Return value is -1003
@marc-hb marc-hb added the bug Something isn't working as expected label May 23, 2024
@plbossart
Copy link
Member

@fredoh9 how many jack connectors do we have on the RVP? is the same 3.5mm jack connector for HDaudio and SoundWire reworks?

@marc-hb
Copy link
Collaborator Author

marc-hb commented May 23, 2024

@fredoh9 how many jack connectors do we have on the RVP?

Only one jack. I don't remember any RVP with more than one jack.

is the same 3.5mm jack connector for HDaudio and SoundWire reworks?

I don't know. I looked at jf-lnlm-rvp-hda-1 and the USB loopback goes through the AIOC board, the RVP jack is not connected. On the other hand, this device is not enabled in CI right now :-( afraid we have to wait for @fredoh9 for this one.

@marc-hb
Copy link
Collaborator Author

marc-hb commented May 23, 2024

I just noticed someone disabled ba-lnlm-rvp-hda-02, on which the two failures were found...

@abonislawski abonislawski added the P2 Critical bugs or normal features label May 28, 2024
@fredoh9
Copy link
Contributor

fredoh9 commented May 28, 2024

@plbossart We have only one jack in the RVP, i don't remember any RVP with multiple JACK.

@marc-hb For LNLM_RVP_HDA, the RVP doesn't have HDA codec, so we attached external AIOC. Hence the jack for the HDA codec is in external AIOC is being used.

@marc-hb
Copy link
Collaborator Author

marc-hb commented May 29, 2024

I just noticed someone disabled ba-lnlm-rvp-hda-02

From @ssavati : silicon being upgraded.

@ssavati
Copy link

ssavati commented May 29, 2024

@marc-hb now ba-lnlm-rvp-hda-02 board is up with silicon upgrade. Its uses community firmware

@lgirdwood lgirdwood added this to the v2.10 milestone Jun 10, 2024
@marc-hb
Copy link
Collaborator Author

marc-hb commented Jun 13, 2024

Very recent one today:
https://sof-ci.01.org/softestpr/PR1180/build512/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-alsabat-headset-playback-821

Underrun: Broken pipe(-32)
 FAIL: Peak freq too low 799.07 Hz
 FAIL: Peak freq too low 802.73 Hz
 FAIL: Peak freq too low 806.40 Hz
 FAIL: Peak freq too low 807.86 Hz
 FAIL: Peak freq too high 834.96 Hz
 FAIL: Peak freq too high 838.62 Hz
 FAIL: Peak freq too high 842.29 Hz

@kv2019i
Copy link
Collaborator

kv2019i commented Jun 13, 2024

Got one reproduction of this, had to run test 200+ times to hit the problem, so occurence rate would seem to be below 1%. In the one failing case, captured bat.wav looks good but analysis fails, so not clear yet what is happening, but the error looks similar as in original report. If you can hit @jsarha please add data. @marc-hb is this roughly in line with occurence rate you've seen?

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jun 13, 2024

I would guess the reproduction rate is somewhere between 0.1% and 5% so, yes: 1% falls in that range :-)

@jsarha
Copy link
Contributor

jsarha commented Jun 13, 2024

Did 249 runs more on ba-lnlm-rvp-hda-02, no failures. I'll pick a machine tomorrow again, run couple of hours more. It bothers me a bit that there is an obvious glitch in the audio in the both referred occurrences, but they are not at all similar. Its very hard to imagine a common cause for both of them.

@jsarha
Copy link
Contributor

jsarha commented Jun 14, 2024

occurrence_2024-06-14-07.zip

Here is one more occurrence, very much like the one in #9164 (comment) . Gap af around 0.2s seconds (one bit longer another shorter) in the middle of the capture, then a little glitch ~ 0.015s before the signal starts to come back in a ramp.

@jsarha
Copy link
Contributor

jsarha commented Jun 14, 2024

That was bit under 300 test runs this morning, and one occurrence.

@jsarha
Copy link
Contributor

jsarha commented Jun 14, 2024

occurrence_2024-06-14-10.zip
occurrence_2024-06-14-11-25.zip
occurrence_2024-06-14-11-46.zip

Here is three more occurrences, but I do not think its the same issue. They look more like test setup failures to me. The bat.wav files look perfectly Ok, but for some reason the validation fails.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jun 14, 2024

The bat.wav files look perfectly Ok, but for some reason the validation fails.

I'm not familiar with alsabat but I've been told that it is much more sensitive than the human eye or even ear.

@jsarha
Copy link
Contributor

jsarha commented Jun 17, 2024

The bat.wav files look perfectly Ok, but for some reason the validation fails.

I'm not familiar with alsabat but I've been told that it is much more sensitive than the human eye or even ear.

I did not try to analyze it with my ears, but did frequency analysis with Audacity, and the frequency peak was there exactly in the right place, with no other local peaks.

@kv2019i
Copy link
Collaborator

kv2019i commented Jun 17, 2024

There's a small glitch in occurrence_2024-06-14-11-46.zip around 0.6740, which explains the alsabat fail.

@jsarha
Copy link
Contributor

jsarha commented Jun 17, 2024

occurrence-2024-06-17-11-04.zip
occurrence-gap-2024-06-17-12-02.zip

Two more cases. One without any immediately obvious fault, but probably some subtle discontinuation in the sine-wave somewhere. The other has the obvious gap-pattern. The gap is a bit wider this time, about 700ms.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jun 18, 2024

There's a small glitch in occurrence_2024-06-14-11-46.zip around 0.6740, which explains the alsabat fail.

Can you please summarize how you found (with Audacity?) what @jsarha didn't?

@kv2019i
Copy link
Collaborator

kv2019i commented Jun 18, 2024

Can you please summarize how you found (with Audacity?) what @jsarha didn't?

I think the same tools can be used. I missed this as well at first as expectation was a big visible gap or a repeating glitch pattern. I noticed this when listening to the file and noted the glitch. Then freq analysis in Audacity in small segments to limit the search space further and final bits by manual analysis of sample values (zooming into waveform display and/or exporting sample data values to text file) to find the exact point.

@jsarha
Copy link
Contributor

jsarha commented Jun 20, 2024

Not sure how much light this sheds the issue, but I first run 673 successful round of alsabat test [1] using sof-hda-benchmark-gain32.tplg, and then quit the test script without a single failure. Then I restored the original sof-hda-generic-ace1-4ch.tplg and was able to run 263 rounds when I hit the error. BUT, the error I hit is of a completely new class. The signal is simply cut off in the middle of sample, and it does not resume. This was all with 3da8e64 FW commit. The test logs and the failed test-case is in the attached zip.

testlogs-and-failed-testcase.zip

[1] TPLG=/lib/firmware/intel/sof-ipc4-tplg/sof-hda-generic-ace1-4ch.tplg MODEL=LNLM_RVP_HDA SOF_TEST_INTERVAL=5 ~/sof-test/test-case/check-alsabat.sh -p hw:sofhdadsp,0 -c hw:CODEC,0 -C 2 -F 821

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jun 20, 2024

@jsarha
Copy link
Contributor

jsarha commented Jun 24, 2024

June 20th: https://sof-ci.01.org/softestpr/PR1180/build552/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-alsabat-headset-capture-997

Oh, this is yet a new type of failure. There is a gap of only ~6ms and - bit suspiciously - the signal continues from exactly the same phase after the gap. Buffer under-run somewhere?

There is couple of FW log messages like these in the middle of the test log (so not at the setup or tear-down time):

[  771.127966] <inf> host_comp: host_get_copy_bytes_normal: comp:1 0x10004 no bytes to copy, available samples: 0, free_samples: 384
[  771.127991] <wrn> dai_comp: dai_common_copy: comp:0 0x4 dai_zephyr_copy(): nothing to copy
[  771.128913] <inf> host_comp: host_get_copy_bytes_normal: comp:1 0x10004 no bytes to copy, available samples: 0, free_samples: 384

@jsarha
Copy link
Contributor

jsarha commented Jun 24, 2024

I run today another 680 round of alsabat test with sof-hda-benchmark-gain32.tplg to be sure that I just did not get lucky last friday. E.g. run again this test:

#9164 (comment)

To complete this test, I ran another 230 rounds with the same daily build (20240623/sof-28a5265568a8-1) this time with standard generic-ace1-4ch topology, to hit the "phase shift"-error again [1]. Starts to look like the issue does not show with gain widget only. I'll try some other benchmark topologies next, to see if I can find the problematic widget that way.

[1]
generic-ace1-4ch-2024-06-24-22.zip

@lgirdwood
Copy link
Member

lgirdwood commented Jun 25, 2024

I hacked alsa-utils alsabat to allow 100M frames in the test instead of just 10M and increased MAX_PEAK also 10 fold to 100. Then I needed also fix sof-test check-alsabat.sh to pass -n parameter also to capture part of the test. This allowed me to run bit over 36min alsabat test, which was successful. I do not have full understanding of alsabat code or how my hacks may affect it, so the result should be taken with a grain of salt, but in any case the test log is here:

alsabat-test-36mis.txt

Thanks @jsarha - results look good to me. We should upstream the alsa-bat update.

@kv2019i
Copy link
Collaborator

kv2019i commented Jun 25, 2024

One case today in PR testing, see Intel test run 43099. This has a 3000 sample (one host period?) gap like #9164 (comment) but in this case, audio is corrupt after the gap (dominant 599Hz tone present but multiple other tones). The left-right signals show different values.

@jsarha
Copy link
Contributor

jsarha commented Jun 25, 2024

It took 724 rounds but then I got a glitch with sof-hda-benchmark-drc32.tplg. Not sure if this is directly pointing at drc widget, but anyway here is the eveidence:

benchmark-drc32-2024-06-25-11.zip

@kv2019i
Copy link
Collaborator

kv2019i commented Jun 28, 2024

Reproduction rate low in past week, continue with P2 and assign to v2.11.

@kv2019i kv2019i modified the milestones: v2.10, v2.11 Jun 28, 2024
@kv2019i kv2019i added HDA Applies to HD-Audio bus for codec connection LNL Applies to Lunar Lake platform labels Jun 28, 2024
@jsarha
Copy link
Contributor

jsarha commented Jul 1, 2024

I became suspicious about the last weeks findings and decided to try them again with enough cycles to know with reasonable certainty in what configurations the issue happens and in what it does not. So I run the test again using following configurationns:

  • benchmark-gain32 topology, issue occurred after 583 cycles [1]
  • DSPless mode, issue occurred after 852 cycles [2]
  • nocodec, playback and capture from hw:0,0, no issue found after 1214 cycles [3]

This starts to point now either to the HDA interface or to the USB audio device that is used for capture in DUT. It would be nice to be able to test this with a loop-back cable from RVP line-out to line-in.

[1] alsabat-benchmark-gain32.zip
[2] alsabat-DSPless.zip
[3] alsabat-nocodec-log.txt

@plbossart
Copy link
Member

@jsarha when I look at the captured waveform for DSPless I don't see anything suspicious, there's one nice sinewave with a 0.08 FS value?

@marc-hb

This comment was marked as off-topic.

@jsarha
Copy link
Contributor

jsarha commented Jul 2, 2024

@jsarha when I look at the captured waveform for DSPless I don't see anything suspicious, there's one nice sinewave with a 0.08 FS value?

@plbossart there is an obvious glitch at 1.313s:
Screenshot at 2024-07-02 17-17-33

@plbossart
Copy link
Member

@jsarha indeed this looks like a 0.5ms loss of data

Screenshot from 2024-07-03 10-55-05

@jsarha
Copy link
Contributor

jsarha commented Jul 4, 2024

To pull together all findings the phase shift glitch is seen on all these configurations, but one. The reproduction rate varies between 1/200 and 1/1000. The nocodec test was running more than 1200 cycles (about 8 hours) without a glitch.

Topology			Playback	Capture	Glitch seen		
sof-hda-generic-ace1-4ch.tplg	sofhdadsp,0	USB	Yes
sof-hda-benchmark-gain32.tplg	sofhdadsp,0	USB	Yes
sof-hda-benchmark-drc32.tplg	sofhdadsp,0	USB	Yes
DSPLess mode			sofhdadsp,0	USB	Yes
sof-lnl-nocodec.tplg			0,0	0,0	No

One common thing with all failed setups is the USB audio device doing the capture. We should still do one more test with analog loop-back cable connected back to SOF HDA audio device to rule out a USB audio failure.

The other failure type is the long gap (hundreds of milliseconds) and a ramp when the test signal resumes. My theory there is that the test script, that starts the playback and capture parts asynchronously, somehow fails with the timings, and the capture catches the gap between the tests.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jul 9, 2024

@kv2019i
Copy link
Collaborator

kv2019i commented Jul 10, 2024

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jul 17, 2024

@wszypelt
Copy link

wszypelt commented Jul 19, 2024

@marc-hb Is it possible to get logs from IPC payloads?

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jul 20, 2024

IPC logs should be enabled on LNL. Where are they missing?

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jul 23, 2024

@wszypelt
Copy link

@marc-hb these logs will be good, thanks!

@marc-hb
Copy link
Collaborator Author

marc-hb commented Aug 14, 2024

@lgirdwood
Copy link
Member

Still failing regularly: https://sof-ci.01.org/sofpr/PR9351/build6997/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-alsabat-headset-playback-821

We can stress this after upgrading to new loopback dongles.

@wszypelt
Copy link

@lgirdwood do you know when we can expect the upgrade?

@lgirdwood
Copy link
Member

@lgirdwood do you know when we can expect the upgrade?

Probably Q4.

@kv2019i
Copy link
Collaborator

kv2019i commented Sep 9, 2024

Keeping this open although likely same rootcause as #9449

Let's see another week of daily plan results and conclude then.

@kv2019i kv2019i modified the milestones: v2.11, v2.12 Sep 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected HDA Applies to HD-Audio bus for codec connection LNL Applies to Lunar Lake platform P2 Critical bugs or normal features
Projects
None yet
Development

No branches or pull requests

9 participants