Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression: kernel call trace error when loading eq topology #203

Closed
keqiaozhang opened this issue Oct 23, 2018 · 4 comments
Closed

Regression: kernel call trace error when loading eq topology #203

keqiaozhang opened this issue Oct 23, 2018 · 4 comments
Labels
APL Applies to ApolloLake platform bug Something isn't working P1 Blocker bugs or important features

Comments

@keqiaozhang
Copy link
Collaborator

This is a kernel regression issue. Tested with sof-apl-eq-pcm512x.tplg, there're lot of kernel call trace log in dmesg.

Call trace:
[ 8.555639] sof-audio sof-audio: error: ipc error for 0x50030000 size 0x180
[ 8.555648] sof-audio sof-audio: error: failed to set control -1107147773 values
[ 8.555651] sof-audio sof-audio: error: failed kcontrol value set for widget: 3
[ 8.555653] sof-audio sof-audio: error: failed to restore pipeline after resume -22
[ 8.573460] pcm512x i2c-104C5122:00: No SCLK, using BCLK: -2
[ 8.576238] sof-audio sof-audio: pcm: open stream 0 dir 0
[ 8.576242] sof-audio sof-audio: error: pcm open failed to resume -16
[ 8.576247] sof-audio sof-audio: ASoC: can't open component sof-audio: -16
[ 8.576251] sof-audio sof-audio: pcm: close stream 0 dir 0
[ 8.576264] BUG: unable to handle kernel NULL pointer dereference at 0000000000000074
[ 8.576266] PGD 0 P4D 0
[ 8.576271] Oops: 0000 [#1] SMP NOPTI
[ 8.576274] CPU: 0 PID: 804 Comm: pulseaudio Not tainted 4.18.0+ #1
[ 8.576276] Hardware name: AAEON UP-APL01/UP-APL01, BIOS UPA1AM36 04/10/2018
[ 8.576284] RIP: 0010:hda_dsp_pcm_close+0x1b/0x80 [snd_sof_intel_hda_common]
[ 8.576285] Code: 92 c0 e8 58 0d ca f3 b8 ed ff ff ff eb df 90 0f 1f 44 00 00 55 53 48 89 f3 48 8b 86 10 01 00 00 48 89 fd 48 8b 80 50 01 00 00 <0f> b6 70 8b 43 3c 85 c0 75 31 e8 d5 d8 ff ff 85 c0 74 31 0f 1f
[ 8.576321] RSP: 0018:ffffa73c82133a68 EFLAGS: 00010286
[ 8.576323] RAX: 0000000000000000 RBX: ffff9a0b6f930800 RCX: 0000000000000006
[ 8.576349] RDX: 0000000000000000 RSI: ffff9a0b6f930800 RDI: ffff9a0b77ebc818
[ 8.576351] RBP: ffff9a0b77ebc818 R08: 000000000000043e R09: 0000000000000004
[ 8.576353] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9a0b6f930800
[ 8.576355] R13: 00000000fffffff0 R14: ffff9a0b763e9b80 R15: ffff9a0b763cdea8
[ 8.576358] FS: 00007f7a48de4500(0000) GS:ffff9a0b7fc00000(0000) knlGS:0000000000000000
[ 8.576360] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.576362] CR2: 0000000000000074 CR3: 0000000178528000 CR4: 00000000003406f0
[ 8.576364] Call Trace:
[ 8.576411] sof_pcm_close+0x89/0xe0 [snd_sof]
[ 8.576427] soc_pcm_open+0x497/0x970 [snd_soc_core]
[ 8.576439] dpcm_fe_dai_open+0x10f/0x760 [snd_soc_core]
[ 8.576449] snd_pcm_open_substream+0x7f/0x140 [snd_pcm]
[ 8.576457] snd_pcm_open+0xb7/0x230 [snd_pcm]
[ 8.576464] ? wake_up_q+0x70/0x70
[ 8.576470] snd_pcm_playback_open+0x3d/0x70 [snd_pcm]
[ 8.576476] chrdev_open+0xc4/0x1a0
[ 8.576480] ? cdev_put.part.3+0x20/0x20
[ 8.576483] do_dentry_open+0x1b3/0x2f0
[ 8.576486] path_openat+0x675/0x16b0
[ 8.576489] ? filename_lookup+0xf2/0x190
[ 8.576492] do_filp_open+0x9b/0x110
[ 8.576500] ? snd_card_file_remove+0x97/0x170 [snd]
[ 8.576503] ? do_sys_open+0x1bd/0x250
[ 8.576506] do_sys_open+0x1bd/0x250
[ 8.576510] do_syscall_64+0x55/0x110
[ 8.576516] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 8.576520] RIP: 0033:0x7f7a46eddc8e
[ 8.576521] Code: 25 00 00 41 00 3d 00 00 41 00 74 48 48 8d 05 81 0c 2e 00 8b 00 85 c0 75 69 89 f2 b8 01 01 00 00 48 89 fe bf 9c ff ff ff 0f 05 <48> 3d 00 ff ff 0f 87 a6 00 00 00 48 8b 4c 24 28 64 48 33 0c 25
[ 8.576556] RSP: 002b:00007ffefe0be960 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 8.576559] RAX: ffffffffffffffda RBX: 00007ffefe0be9f0 RCX: 00007f7a46eddc8e
[ 8.576561] RDX: 0000000000080802 RSI: 00007ffefe0beb10 RDI: 00000000ffffff9c
[ 8.576562] RBP: 00000000ffffffff R08: 0000205000000000 R09: 0000120000000000
[ 8.576564] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 8.576566] R13: 0000000000000004 R14: 00007ffefe0beb10 R15: 00007ffefe0bee10
[ 8.576568] Modules linked in: snd_soc_sst_bxt_pcm512x nls_iso8859_1 snd_soc_hdac_hdmi spi_pxa2xx_platform 8250_dw intel_rapl sof_pci_dev intel_telemetry_prv intel_pmc_ipc snd_sof_intel_hda_common intel_punit_ipc intel_telemetry_core x86_pkg_temp_thermal coretemp snd_soc_hdac_hda kvm_intel snd_hda_codec snd_sof_el_hda snd_sof_intel_byt kvm snd_sof_nocodec snd_sof snd_sof_xtensa_dsp irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_soc_acpi_intel_match ssoc_acpi pcbc snd_hda_ext_core snd_soc_pcm512x_i2c snd_soc_wm8804_i2c snd_hda_core snd_soc_pcm512x snd_soc_wm8804 aesni_intel snd_soc_core aes_x86_64 crypto_s snd_compress cryptd glue_helper snd_pcm_dmaengine ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi ax88179_178a usbnet lpc_ich pwm_lpss_pci pwm_l input_leds
[ 8.576611] snd_seq idma64 virt_dma snd_seq_device intel_lpss_pci intel_lpss snd_timer snd soundcore mei_me mac_hid mei squashfs sch_fq_codel parport_pc pv lp parport ip_tables x_tables autofs4 hid_generic usbhid hid mmc_block dwc3 udc_core ulpi sdhci_pci cqhci i915 sdhci r8169 mii dwc3_pci i2c_algo_bit drm_kmslper syscopyarea sysfillrect sysimgblt pinctrl_broxton fb_sys_fops video pinctrl_intel drm ahci libahci
[ 8.576652] CR2: 0000000000000074
[ 8.576655] ---[ end trace 4f44d88b60412c06 ]---
[ 8.576660] RIP: 0010:hda_dsp_pcm_close+0x1b/0x80 [snd_sof_intel_hda_common]
[ 8.576661] Code: 92 c0 e8 58 0d ca f3 b8 ed ff ff ff eb df 90 0f 1f 44 00 00 55 53 48 89 f3 48 8b 86 10 01 00 00 48 89 fd 48 8b 80 50 01 00 00 <0f> b6 70 8b 43 3c 85 c0 75 31 e8 d5 d8 ff ff 85 c0 74 31 0f 1f
[ 8.576696] RSP: 0018:ffffa73c82133a68 EFLAGS: 00010286
[ 8.576698] RAX: 0000000000000000 RBX: ffff9a0b6f930800 RCX: 0000000000000006
[ 8.576699] RDX: 0000000000000000 RSI: ffff9a0b6f930800 RDI: ffff9a0b77ebc818
[ 8.576701] RBP: ffff9a0b77ebc818 R08: 000000000000043e R09: 0000000000000004
[ 8.576703] R10: 0000000000000000 R11: 0000000000000001 R12: ffff9a0b6f930800
[ 8.576705] R13: 00000000fffffff0 R14: ffff9a0b763e9b80 R15: ffff9a0b763cdea8
[ 8.576707] FS: 00007f7a48de4500(0000) GS:ffff9a0b7fc00000(0000) knlGS:0000000000000000
[ 8.576709] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.576710] CR2: 0000000000000074 CR3: 0000000178528000 CR4: 00000000003406f0
[ 8.881248] ax88179_178a 2-4:1.0 enx000ec6dc813d: ax88179 - Link status is: 1
[ 8.894228] IPv6: ADDRCONF(NETDEV_CHANGE): enx000ec6dc813d: link becomes ready
[ 9.553365] sof-audio sof-audio: ipc rx: 0x90020000: GLB_TRACE_MSG
[ 9.553395] sof-audio sof-audio: ipc rx done: 0x90020000: GLB_TRACE_MSG

Test configure:
sof-dev:7ef809d
sof-master:6baff99f
soft-master:a1dfcb7f4
topology:sof-apl-eq-pcm512x.tplg
eq-dmesg.log

@keqiaozhang keqiaozhang added bug Something isn't working P1 Blocker bugs or important features APL Applies to ApolloLake platform br-sof-dev labels Oct 23, 2018
@keqiaozhang
Copy link
Collaborator Author

The first bad commit is between bf67a28 to 93853cb. we will bisect it.

@singalsu
Copy link

It seems that this key EQ related SOF PR has not been merged thesofproject/sof#483 . I wonder if it impacts. Though the kernel should not crash.

@keqiaozhang
Copy link
Collaborator Author

@singalsu
This issue is gone after applying PR483 and eq topology can be loaded.

@mengdonglin
Copy link
Collaborator

@keqiaozhang I closed this issue based on your comments. You can directly close it after your verified it works.

plbossart pushed a commit that referenced this issue Mar 22, 2021
…before setting skb ownership

There are two ref count variables controlling the free()ing of a socket:
- struct sock::sk_refcnt - which is changed by sock_hold()/sock_put()
- struct sock::sk_wmem_alloc - which accounts the memory allocated by
  the skbs in the send path.

In case there are still TX skbs on the fly and the socket() is closed,
the struct sock::sk_refcnt reaches 0. In the TX-path the CAN stack
clones an "echo" skb, calls sock_hold() on the original socket and
references it. This produces the following back trace:

| WARNING: CPU: 0 PID: 280 at lib/refcount.c:25 refcount_warn_saturate+0x114/0x134
| refcount_t: addition on 0; use-after-free.
| Modules linked in: coda_vpu(E) v4l2_jpeg(E) videobuf2_vmalloc(E) imx_vdoa(E)
| CPU: 0 PID: 280 Comm: test_can.sh Tainted: G            E     5.11.0-04577-gf8ff6603c617 #203
| Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
| Backtrace:
| [<80bafea4>] (dump_backtrace) from [<80bb0280>] (show_stack+0x20/0x24) r7:00000000 r6:600f0113 r5:00000000 r4:81441220
| [<80bb0260>] (show_stack) from [<80bb593c>] (dump_stack+0xa0/0xc8)
| [<80bb589c>] (dump_stack) from [<8012b268>] (__warn+0xd4/0x114) r9:00000019 r8:80f4a8c2 r7:83e4150c r6:00000000 r5:00000009 r4:80528f90
| [<8012b194>] (__warn) from [<80bb09c4>] (warn_slowpath_fmt+0x88/0xc8) r9:83f26400 r8:80f4a8d1 r7:00000009 r6:80528f90 r5:00000019 r4:80f4a8c2
| [<80bb0940>] (warn_slowpath_fmt) from [<80528f90>] (refcount_warn_saturate+0x114/0x134) r8:00000000 r7:00000000 r6:82b44000 r5:834e5600 r4:83f4d540
| [<80528e7c>] (refcount_warn_saturate) from [<8079a4c8>] (__refcount_add.constprop.0+0x4c/0x50)
| [<8079a47c>] (__refcount_add.constprop.0) from [<8079a57c>] (can_put_echo_skb+0xb0/0x13c)
| [<8079a4cc>] (can_put_echo_skb) from [<8079ba98>] (flexcan_start_xmit+0x1c4/0x230) r9:00000010 r8:83f48610 r7:0fdc0000 r6:0c080000 r5:82b44000 r4:834e5600
| [<8079b8d4>] (flexcan_start_xmit) from [<80969078>] (netdev_start_xmit+0x44/0x70) r9:814c0ba0 r8:80c8790c r7:00000000 r6:834e5600 r5:82b44000 r4:82ab1f00
| [<80969034>] (netdev_start_xmit) from [<809725a4>] (dev_hard_start_xmit+0x19c/0x318) r9:814c0ba0 r8:00000000 r7:82ab1f00 r6:82b44000 r5:00000000 r4:834e5600
| [<80972408>] (dev_hard_start_xmit) from [<809c6584>] (sch_direct_xmit+0xcc/0x264) r10:834e5600 r9:00000000 r8:00000000 r7:82b44000 r6:82ab1f00 r5:834e5600 r4:83f27400
| [<809c64b8>] (sch_direct_xmit) from [<809c6c0c>] (__qdisc_run+0x4f0/0x534)

To fix this problem, only set skb ownership to sockets which have still
a ref count > 0.

Fixes: 0ae89be ("can: add destructor for self generated skbs")
Cc: Oliver Hartkopp <[email protected]>
Cc: Andre Naujoks <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Oleksij Rempel <[email protected]>
Reviewed-by: Oliver Hartkopp <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
APL Applies to ApolloLake platform bug Something isn't working P1 Blocker bugs or important features
Projects
None yet
Development

No branches or pull requests

3 participants