ath11k kernel panic on Pi 5 with 8GB RAM, but not on 2GB (DMA/PCI-E kernel panic) #6424

Open
omerk opened this issue Oct 17, 2024 · 12 comments


omerk commented Oct 17, 2024

Describe the bug

The ath11k kernel module works as expected on a Raspberry Pi 5 board with 2GB RAM, but the same image (same boot media) fails with a DMA/memory-related kernel panic on the 8GB unit.

Limiting the memory of the 8GB unit to 2GB (mem=2G in cmdline.txt) fixes the issue. Detailed logs below.
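For anyone wanting to reproduce the workaround, here is a minimal sketch of adding a parameter to a single-line kernel command line file. The helper name `add_cmdline_param` is made up for illustration, GNU `sed -i` is assumed, and the demo runs on a scratch file rather than the real `/boot/firmware/cmdline.txt`.

```shell
#!/bin/sh
# add_cmdline_param: hypothetical helper that appends PARAM to the single
# line of FILE unless a parameter with the same key is already present.
add_cmdline_param() {
    file=$1
    param=$2
    key=${param%%=*}
    if grep -qE "(^| )${key}(=| |\$)" "$file"; then
        echo "$key already present in $file" >&2
        return 0
    fi
    # cmdline.txt must stay a single line, so only line 1 is edited
    sed -i "1s|\$| ${param}|" "$file"
}

# Demonstrated on a scratch copy, not on the real boot partition
tmp=$(mktemp)
echo "console=tty1 root=PARTUUID=8c5b1cb2-02 rootwait" > "$tmp"
add_cmdline_param "$tmp" mem=2G
cat "$tmp"
rm -f "$tmp"
```

The position of `mem=2G` within the line should not matter; a reboot is needed for it to take effect.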

Test setup consists of two units:

Unit 1: Raspberry Pi 5, 2GB RAM, Official M.2 Hat, QCN9074 WiFi module (PCI-E)
Unit 2: Raspberry Pi 5, 8GB RAM, Official M.2 Hat, QCN9074 WiFi module (PCI-E)

WiFi modules used are identical brand/model and from the same batch, boot media is shared across the two units to ensure there are no config-related issues.

I'm not entirely sure whether this is specifically an issue with the ath11k driver, as it seems to work on other platforms; perhaps this is a BCM2712 DMA / PCI-E restriction? Speculating, of course. Thanks in advance for your assistance.

Steps to reproduce the behaviour

(On a fresh device, with no WiFi configuration)

  • Compile, install and boot up the custom kernel
  • Use nmcli to connect to a WiFi network
  • kernel panic

(With a valid WiFi configuration set up)

  • Boot device with custom kernel
  • kernel panic

Device(s)

Raspberry Pi 5

System

Kernel version:

$ git rev-parse HEAD
84ab77459e61c648299d32464127b89ca65de40a

$ uname -a
Linux raspberrypi 6.6.56-v8-16k-x+ #1 SMP PREEMPT Thu Oct 17 13:34:10 BST 2024 aarch64 GNU/Linux

.config used to compile the kernel, which is essentially the standard 2712 config with ath11k enabled, attached: kernel-config.zip

config.txt used on device:

dtoverlay=disable-wifi
dtoverlay=disable-bt

# For QCN9074
dtparam=pciex1
dtparam=pciex1_gen=3

# Force PCIe config to support 32bit DMA addresses at the expense of having to bounce buffers.
# https://github.com/raspberrypi/firmware/blob/b154632e320b87ea95c6ce8b59f96dbbe523ecf1/boot/overlays/README#L3597
dtoverlay=pcie-32bit-dma

# Compatibility features
# https://github.com/raspberrypi/firmware/blob/b154632e320b87ea95c6ce8b59f96dbbe523ecf1/boot/overlays/README#L3611
# no-mip: Use if a) more than 8 interrupt vectors are required or b) the EP requires DMA and MSI addresses to be 32bit.
dtoverlay=pciex1-compat-pi5,no-mip

# Uncomment some or all of these to enable the optional hardware interfaces
#dtparam=i2c_arm=on
#dtparam=i2s=on
#dtparam=spi=on

# Enable audio (loads snd_bcm2835)
dtparam=audio=on

# Additional overlays and parameters are documented
# /boot/firmware/overlays/README

# Automatically load overlays for detected cameras
camera_auto_detect=1

# Automatically load overlays for detected DSI displays
display_auto_detect=1

# Automatically load initramfs files, if found
auto_initramfs=1

# Enable DRM VC4 V3D driver
dtoverlay=vc4-kms-v3d
max_framebuffers=2

# Don't have the firmware create an initial video= setting in cmdline.txt.
# Use the kernel's default instead.
disable_fw_kms_setup=1

# Run in 64-bit mode
arm_64bit=1

# Disable compensation for displays with overscan
disable_overscan=1

# Run as fast as firmware / board allows
arm_boost=1

[cm4]
# Enable host mode on the 2711 built-in XHCI USB controller.
# This line should be removed if the legacy DWC2 controller is required
# (e.g. for USB device mode) or if USB support is not required.
otg_mode=1

[cm5]
dtoverlay=dwc2,dr_mode=host

Logs

Working kit (Unit with 2GB RAM)

$ cat /proc/cpuinfo | grep "Model"
Model           : Raspberry Pi 5 Model B Rev 1.0

$ free -m
               total        used        free      shared  buff/cache   available
Mem:            2009         254        1638           5         172        1754
Swap:            199           0         199

$ vcgencmd get_mem arm && vcgencmd get_mem gpu
arm=1020M
gpu=4M

ath11k is loaded on boot:

$ dmesg | grep ath11k
[    6.801102] ath11k_pci 0000:01:00.0: BAR 0: assigned [mem 0x1b80000000-0x1b801fffff 64bit]
[    6.801137] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    6.820708] ath11k_pci 0000:01:00.0: MSI vectors: 16
[    6.820724] ath11k_pci 0000:01:00.0: qcn9074 hw1.0
[    7.329153] ath11k_pci 0000:01:00.0: chip_id 0x0 chip_family 0x0 board_id 0xa0 soc_id 0xffffffff
[    7.329165] ath11k_pci 0000:01:00.0: fw_version 0x270206d0 fw_build_timestamp 2022-08-04 12:48 fw_build_id WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1

WiFi networks are listed:

$ nmcli dev wifi list
IN-USE  BSSID              SSID            MODE   CHAN  RATE        SIGNAL  BAR>
        5A:09:D4:FA:34:89  BTWi-fi         Infra  40    405 Mbit/s  44      ▂▄_>
        5A:09:D4:FA:34:8A  BTWifi-X        Infra  40    405 Mbit/s  40      ▂▄_>
        4C:09:D4:FA:34:88  BTHub5-CMCS     Infra  40    405 Mbit/s  37      ▂▄_>
        EC:6C:9A:4A:61:54  BT-JWAKQR       Infra  40    540 Mbit/s  27      ▂__>
...

nmcli used to connect to WiFi network:

$ sudo nmcli dev wifi connect <ap> password <password>
Device 'wlan0' successfully activated with '7a1e9176-f639-4ccf-8b19-c656fc9a1150'.

$ ip -c a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 2c:cf:67:83:eb:b8 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether c4:93:00:3a:34:a2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.194/24 brd 192.168.1.255 scope global dynamic noprefixroute wlan0
       valid_lft 86377sec preferred_lft 86377sec
    inet6 fe80::a7c7:324f:bc91:522a/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

$ ping raspberrypi.com
PING raspberrypi.com (172.67.154.53) 56(84) bytes of data.
64 bytes from 172.67.154.53 (172.67.154.53): icmp_seq=1 ttl=58 time=9.50 ms
64 bytes from 172.67.154.53 (172.67.154.53): icmp_seq=2 ttl=58 time=12.3 ms
64 bytes from 172.67.154.53 (172.67.154.53): icmp_seq=3 ttl=58 time=13.5 ms
^C
--- raspberrypi.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 9.502/11.780/13.507/1.680 ms

Works as expected, no issue to report.

Non-working kit (Unit with 8GB RAM)

$ cat /proc/cpuinfo | grep "Model"
Model           : Raspberry Pi 5 Model B Rev 1.0

$ free -m
               total        used        free      shared  buff/cache   available
Mem:            8052         300        7691           5         168        7752
Swap:            199           0         199

$ vcgencmd get_mem arm && vcgencmd get_mem gpu
arm=1020M
gpu=4M

$ dmesg | grep ath11k
[    7.140417] ath11k_pci 0000:01:00.0: BAR 0: assigned [mem 0x1b80000000-0x1b801fffff 64bit]
[    7.140444] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    7.140717] ath11k_pci 0000:01:00.0: MSI vectors: 16
[    7.140728] ath11k_pci 0000:01:00.0: qcn9074 hw1.0
[    7.590439] ath11k_pci 0000:01:00.0: chip_id 0x0 chip_family 0x0 board_id 0xa0 soc_id 0xffffffff
[    7.590449] ath11k_pci 0000:01:00.0: fw_version 0x270206d0 fw_build_timestamp 2022-08-04 12:48 fw_build_id WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1

$ nmcli dev wifi list
IN-USE  BSSID              SSID                      MODE   CHAN  RATE        S>
        4C:09:D4:FA:34:88  BTHub5-CMCS               Infra  40    405 Mbit/s  3>
        5A:09:D4:FA:34:8A  BTWifi-X                  Infra  40    405 Mbit/s  3>
        5A:09:D4:FA:34:89  BTWi-fi                   Infra  40    405 Mbit/s  3>
        EC:6C:9A:4A:61:54  BT-JWAKQR                 Infra  40    540 Mbit/s  2>
        62:6C:9A:4A:61:56  EE WiFi-X                 Infra  40    540 Mbit/s  2>
...

Trying to connect to a WiFi network results in a kernel panic:

$ sudo nmcli dev wifi connect <ap> password <password>
[  123.832476] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  123.841313] Mem abort info:
[  123.844114]   ESR = 0x0000000096000145
[  123.847909]   EC = 0x25: DABT (current EL), IL = 32 bits
[  123.853243]   SET = 0, FnV = 0
[  123.856304]   EA = 0, S1PTW = 0
[  123.859452]   FSC = 0x05: level 1 translation fault
[  123.864348] Data abort info:
[  123.867234]   ISV = 0, ISS = 0x00000145, ISS2 = 0x00000000
[  123.872742]   CM = 1, WnR = 1, TnD = 0, TagAccess = 0
[  123.877811]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  123.883141] user pgtable: 16k pages, 47-bit VAs, pgdp=0000000101bcc000
[  123.889694] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[  123.898432] Internal error: Oops: 0000000096000145 [#1] PREEMPT SMP
[  123.904722] Modules linked in: michael_mic qrtr_mhi binfmt_misc qrtr ath11k_pci mhi ath11k qmi_helpers spidev mac80211 vc4 snd_soc_hdmi_codec drm_display_helper libarc4 cec cfg80211 drm_dma_helper sg drm_kms_helper snd_soc_core rpivid_hevc(C) aes_ce_blk pisp_be v4l2_mem2mem aes_ce_cipher snd_compress ghash_ce videobuf2_dma_contig gf128mul snd_pcm_dmaengine libaes rfkill videobuf2_memops snd_pcm videobuf2_v4l2 sha2_ce sha256_arm64 sha1_ce videodev snd_timer raspberrypi_hwmon videobuf2_common snd mc v3d i2c_brcmstb gpio_keys spi_bcm2835 gpu_sched raspberrypi_gpiomem pwm_fan rp1_adc drm_shmem_helper nvmem_rmem uio_pdrv_genirq uio drm fuse drm_panel_orientation_quirks backlight dm_mod ip_tables x_tables ipv6
[  123.967462] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C         6.6.56-v8-16k-x+ #1
[  123.976108] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
[  123.981960] pstate: 00400009 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  123.988947] pc : dcache_inval_poc+0x28/0x58
[  123.993145] lr : arch_sync_dma_for_cpu+0x34/0x50
[  123.997776] sp : ffffc00080003c40
[  124.001095] x29: ffffc00080003c40 x28: ffff80010162c860 x27: ffffc00080003eb8
[  124.008257] x26: ffffc00080003ce4 x25: 0000000000000000 x24: 0000000000000005
[  124.015419] x23: 00000000000025f0 x22: 0000000000000040 x21: 0000000000000002
[  124.022581] x20: ffff800100fab0c0 x19: ffffffffffffffff x18: 0000000000000000
[  124.029743] x17: ffffb0017a7b8000 x16: ffffd000841375c8 x15: 00005555fa586b70
[  124.036905] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[  124.044067] x11: 00000000000000cf x10: 00000000000000c8 x9 : ffffd000841376c0
[  124.051229] x8 : ffffc00080003d38 x7 : 0000000000000000 x6 : 0000000000000000
[  124.058390] x5 : 00000001040c0000 x4 : ffff800104cc6820 x3 : 000000000000003f
[  124.065552] x2 : 0000000000000040 x1 : 0000000000000000 x0 : ffffffffffffffff
[  124.072714] Call trace:
[  124.075161]  dcache_inval_poc+0x28/0x58
[  124.079006]  dma_sync_single_for_cpu+0xf8/0x128
[  124.083549]  ath11k_hal_srng_prefetch_desc+0x6c/0xa0 [ath11k]
[  124.089341]  ath11k_hal_srng_access_begin+0x44/0x58 [ath11k]
[  124.095038]  ath11k_dp_process_rx+0xd0/0x3b8 [ath11k]
[  124.100124]  ath11k_dp_service_srng+0x32c/0x360 [ath11k]
[  124.105471]  ath11k_pcic_ext_grp_napi_poll+0x3c/0xd8 [ath11k]
[  124.111254]  __napi_poll+0x40/0x208
[  124.114751]  net_rx_action+0x2e0/0x338
[  124.118508]  handle_softirqs+0x118/0x360
[  124.122440]  __do_softirq+0x1c/0x28
[  124.125935]  ____do_softirq+0x18/0x30
[  124.129605]  call_on_irq_stack+0x24/0x58
[  124.133536]  do_softirq_own_stack+0x24/0x38
[  124.137730]  irq_exit_rcu+0x8c/0xd0
[  124.141225]  el1_interrupt+0x38/0x68
[  124.144810]  el1h_64_irq_handler+0x18/0x28
[  124.148917]  el1h_64_irq+0x64/0x68
[  124.152325]  default_idle_call+0x5c/0x170
[  124.156344]  do_idle+0x204/0x238
[  124.159579]  cpu_startup_entry+0x40/0x50
[  124.163512]  rest_init+0xec/0xf8
[  124.166745]  arch_call_rest_init+0x18/0x20
[  124.170853]  start_kernel+0x528/0x690
[  124.174523]  __primary_switched+0xbc/0xd0
[  124.178544] Code: d1000443 ea03003f 8a230021 54000040 (d50b7e21)
[  124.184658] ---[ end trace 0000000000000000 ]---
[  124.189287] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[  124.196186] SMP: stopping secondary CPUs
[  124.200118] Kernel Offset: 0x100004000000 from 0xffffc00080000000
[  124.206231] PHYS_OFFSET: 0x0
[  124.209114] CPU features: 0x1,00000001,70028143,0000720b
[  124.214442] Memory Limit: none
[  124.217501] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
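For readers decoding the abort by hand, the fields printed under "Mem abort info" and "Data abort info" can be recovered from the raw ESR value with shell arithmetic. This is a minimal sketch assuming the AArch64 ESR_EL1 layout for data aborts (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and within ISS: CM in bit 8, WnR in bit 6, FSC in bits 5:0). The function name `decode_esr` is made up for illustration.

```shell
#!/bin/sh
# decode_esr: hypothetical helper that splits an AArch64 ESR_EL1 value
# into the data-abort fields the kernel prints in its oops header.
decode_esr() {
    esr=$(($1))
    ec=$(( (esr >> 26) & 0x3f ))    # exception class, bits 31:26
    il=$(( (esr >> 25) & 0x1 ))     # instruction length flag, bit 25
    iss=$(( esr & 0x1ffffff ))      # syndrome, bits 24:0
    cm=$(( (iss >> 8) & 0x1 ))      # cache maintenance operation, bit 8
    wnr=$(( (iss >> 6) & 0x1 ))     # write-not-read, bit 6
    fsc=$(( iss & 0x3f ))           # fault status code, bits 5:0
    printf 'EC=0x%02x IL=%d ISS=0x%08x CM=%d WnR=%d FSC=0x%02x\n' \
        "$ec" "$il" "$iss" "$cm" "$wnr" "$fsc"
}

# ESR value taken from the panic log
decode_esr 0x96000145
```

The decoded values agree with the log (EC = 0x25, CM = 1, WnR = 1, FSC = 0x05). CM = 1 is consistent with the fault being raised by a cache maintenance instruction, which fits the `dcache_inval_poc` program counter.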

Non-working 8GB unit made to work with mem=2G in cmdline.txt

$ cat /boot/firmware/cmdline.txt
console=serial0,115200 console=tty1 root=PARTUUID=8c5b1cb2-02 rootfstype=ext4 fsck.repair=yes mem=2G rootwait

$ free -m
               total        used        free      shared  buff/cache   available
Mem:            1947         250        1582           5         171        1697
Swap:            199           0         199

$ dmesg | grep ath11k
[    7.862557] ath11k_pci 0000:01:00.0: BAR 0: assigned [mem 0x1b80000000-0x1b801fffff 64bit]
[    7.862603] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    7.863780] ath11k_pci 0000:01:00.0: MSI vectors: 16
[    7.863795] ath11k_pci 0000:01:00.0: qcn9074 hw1.0
[    8.310542] ath11k_pci 0000:01:00.0: chip_id 0x0 chip_family 0x0 board_id 0xa0 soc_id 0xffffffff
[    8.310551] ath11k_pci 0000:01:00.0: fw_version 0x270206d0 fw_build_timestamp 2022-08-04 12:48 fw_build_id WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1

$ nmcli dev wifi list
IN-USE  BSSID              SSID            MODE   CHAN  RATE        SIGNAL  BAR>
        EC:6C:9A:4A:61:54  BT-JWAKQR       Infra  40    540 Mbit/s  29      ▂__>
        62:6C:9A:4A:61:56  EE WiFi-X       Infra  40    540 Mbit/s  25      ▂__>
        62:6C:9A:4A:61:55  EE WiFi         Infra  40    540 Mbit/s  25      ▂__>
<...>

$ sudo nmcli dev wifi connect <ap> password <password>
Device 'wlan0' successfully activated with '6ca93d62-f17e-4580-aaa4-f1dbe64a902b'.

$ ip -c a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 2c:cf:67:67:8d:23 brd ff:ff:ff:ff:ff:ff
3: wlan0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether c4:93:00:3a:34:99 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.185/24 brd 192.168.1.255 scope global dynamic noprefixroute wlan0
       valid_lft 86384sec preferred_lft 86384sec
    inet6 fe80::5a4e:f962:55b2:ca18/64 scope link noprefixroute
       valid_lft forever preferred_lft forever

$ ping raspberrypi.com
PING raspberrypi.com (104.21.88.234) 56(84) bytes of data.
64 bytes from 104.21.88.234 (104.21.88.234): icmp_seq=1 ttl=58 time=7.68 ms
64 bytes from 104.21.88.234 (104.21.88.234): icmp_seq=2 ttl=58 time=7.94 ms
64 bytes from 104.21.88.234 (104.21.88.234): icmp_seq=3 ttl=58 time=8.53 ms
^C
--- raspberrypi.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 7.677/8.050/8.531/0.356 ms

By limiting the memory to 2GB, the 8GB unit works as expected.
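One way to see why the amount of RAM could matter: a device limited to 32-bit DMA can only address the first 4 GiB of bus address space, so a buffer placed above that boundary has to be bounced. The sketch below checks an address against that limit. `fits_dma32` is a hypothetical helper, and it assumes DMA addresses equal physical addresses with no offset or IOMMU translation, which may not hold for the BCM2712 PCIe setup.

```shell
#!/bin/sh
# fits_dma32: hypothetical check, succeeds if the address fits in 32 bits.
# Assumes a 1:1 mapping between physical and DMA addresses.
fits_dma32() {
    [ $(( $1 )) -le $(( 0xffffffff )) ]
}

# First value: pgdp from the panic log of the 8GB unit.
# Second value: the last byte of a 2 GiB address range.
for addr in 0x0000000101bcc000 0x000000007fffffff; do
    if fits_dma32 "$addr"; then
        echo "$addr: reachable with 32-bit DMA"
    else
        echo "$addr: above 4 GiB, needs a bounce buffer"
    fi
done
```

The `pgdp` value only shows that the 8GB unit has memory in use above 4 GiB; it is not the address of the buffer involved in the crash.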

Additional context

I have tried various permutations of the following config options in cmdline.txt with no success:

  • iommu=soft
  • iommu.strict=1
  • coherent_pool=1M
@P33M
Contributor

P33M commented Nov 13, 2024

Are you saying that using

dtoverlay=pcie-32bit-dma
dtoverlay=pciex1-compat-pi5,no-mip
dtparam=pciex1
dtparam=pciex1_gen=3

in config.txt on an 8GB device results in a kernel panic?
What happens if you remove dtparam=pciex1_gen=3?

@P33M
Contributor

P33M commented Nov 13, 2024

It's likely this bug - https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/#25662182

Usage of virt_to_phys() is bad. This is fixed in kernel 6.9 and onwards. Our rpi-6.12.y branch is mostly functional (and will be the next target for rpi-update releases), can you try building that?

@Couch-Potato

I went through this while trying to use a QCN9074 module. I was running the latest release RPi kernel compiled with the ath11k drivers, and the network interface initialised successfully. I then ran into the problems described here when trying to make a network connection. Following your instructions, I upgraded to the rpi-6.12.y kernel. After upgrading, the network interface no longer comes up:

[    6.027067] ath11k_pci 0000:01:00.0: swiotlb buffer is full (sz: 1048583 bytes), total 65536 (slots), used 992 (slots)
[    6.027102] ath11k_pci 0000:01:00.0: failed to set up tcl_comp ring (0) :-12
[    6.027355] ath11k_pci 0000:01:00.0: failed to init DP: -12

Any ideas?

@P33M
Contributor

P33M commented Nov 18, 2024

Remove dtoverlay=pcie-32bit-dma from config.txt - it should not be necessary.

@omerk
Author

omerk commented Nov 19, 2024

With 12856cc compiled and running, here are various combinations of config.txt options and the resulting ath11k output.

It appears that both dtoverlay=pcie-32bit-dma and dtoverlay=pciex1-compat-pi5,no-mip are still needed, and while there are no kernel panics on 6.12, I am seeing the same error @Couch-Potato gets: failed to set up tcl_comp ring (0) :-12

$ sed -n 1,4p /boot/firmware/config.txt
dtparam=pciex1
#dtparam=pciex1_gen=3
#dtoverlay=pcie-32bit-dma
#dtoverlay=pciex1-compat-pi5,no-mip

$ dmesg | grep ath11k
[    7.291863] ath11k_pci 0000:01:00.0: BAR 0 [mem 0x1b80000000-0x1b801fffff 64bit]: assigned
[    7.291899] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    7.291992] ath11k_pci 0000:01:00.0: arch assigned 64-bit MSI address 0xffffffe000 but device only supports 32 bits
[    7.292025] ath11k_pci 0000:01:00.0: failed to enable msi: -22
[    7.292071] ath11k_pci 0000:01:00.0: probe with driver ath11k_pci failed with error -22

-----------------------------

$ sed -n 1,4p /boot/firmware/config.txt
dtparam=pciex1
dtparam=pciex1_gen=3
#dtoverlay=pcie-32bit-dma
#dtoverlay=pciex1-compat-pi5,no-mip

$ dmesg | grep ath11k
[    7.259459] ath11k_pci 0000:01:00.0: BAR 0 [mem 0x1b80000000-0x1b801fffff 64bit]: assigned
[    7.259499] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    7.259608] ath11k_pci 0000:01:00.0: arch assigned 64-bit MSI address 0xffffffe000 but device only supports 32 bits
[    7.259651] ath11k_pci 0000:01:00.0: failed to enable msi: -22
[    7.259738] ath11k_pci 0000:01:00.0: probe with driver ath11k_pci failed with error -22

-----------------------------

$ sed -n 1,4p /boot/firmware/config.txt
dtparam=pciex1
dtparam=pciex1_gen=3
dtoverlay=pcie-32bit-dma
#dtoverlay=pciex1-compat-pi5,no-mip

$ dmesg | grep ath11k
[    7.543273] ath11k_pci 0000:01:00.0: BAR 0 [mem 0x1b80000000-0x1b801fffff 64bit]: assigned
[    7.543313] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    7.545935] ath11k_pci 0000:01:00.0: arch assigned 64-bit MSI address 0xffffffe000 but device only supports 32 bits
[    7.548551] ath11k_pci 0000:01:00.0: failed to enable msi: -22
[    7.548867] ath11k_pci 0000:01:00.0: probe with driver ath11k_pci failed with error -22

-----------------------------

$ sed -n 1,4p /boot/firmware/config.txt
dtparam=pciex1
dtparam=pciex1_gen=3
dtoverlay=pcie-32bit-dma
dtoverlay=pciex1-compat-pi5,no-mip

$ dmesg | grep ath11k
[    7.083124] ath11k_pci 0000:01:00.0: BAR 0 [mem 0x1b80000000-0x1b801fffff 64bit]: assigned
[    7.083153] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    7.083551] ath11k_pci 0000:01:00.0: MSI vectors: 16
[    7.083562] ath11k_pci 0000:01:00.0: qcn9074 hw1.0
[    7.559806] ath11k_pci 0000:01:00.0: chip_id 0x0 chip_family 0x0 board_id 0xa0 soc_id 0xffffffff
[    7.559815] ath11k_pci 0000:01:00.0: fw_version 0x270206d0 fw_build_timestamp 2022-08-04 12:48 fw_build_id WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1
[    8.924539] ath11k_pci 0000:01:00.0: swiotlb buffer is full (sz: 1048583 bytes), total 32768 (slots), used 992 (slots)
[    8.924553] ath11k_pci 0000:01:00.0: failed to set up tcl_comp ring (0) :-12
[    8.924604] ath11k_pci 0000:01:00.0: failed to init DP: -12

-----------------------------

$ sed -n 1,4p /boot/firmware/config.txt
dtparam=pciex1
dtparam=pciex1_gen=3
#dtoverlay=pcie-32bit-dma
dtoverlay=pciex1-compat-pi5,no-mip

$ dmesg | grep ath11k
[    7.241951] ath11k_pci 0000:01:00.0: BAR 0 [mem 0x1b80000000-0x1b801fffff 64bit]: assigned
[    7.241983] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    7.242252] ath11k_pci 0000:01:00.0: MSI vectors: 16
[    7.242264] ath11k_pci 0000:01:00.0: qcn9074 hw1.0
[    7.268070] ath11k_pci 0000:01:00.0: probe with driver ath11k_pci failed with error -12

-----------------------------

$ sed -n 1,4p /boot/firmware/config.txt
dtparam=pciex1
#dtparam=pciex1_gen=3
#dtoverlay=pcie-32bit-dma
dtoverlay=pciex1-compat-pi5,no-mip

$ dmesg | grep ath11k
[    7.627480] ath11k_pci 0000:01:00.0: BAR 0 [mem 0x1b80000000-0x1b801fffff 64bit]: assigned
[    7.627519] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    7.627817] ath11k_pci 0000:01:00.0: MSI vectors: 16
[    7.627837] ath11k_pci 0000:01:00.0: qcn9074 hw1.0
[    7.644143] ath11k_pci 0000:01:00.0: probe with driver ath11k_pci failed with error -12

This was on an 8GB board running the latest bootloader:

$ vcgencmd bootloader_version
2024/11/05 12:38:12
version 3c4fc886f1f3b34a794d7157e0f26ec5942408ca (release)
timestamp 1730810292
update-time 1731352560
capabilities 0x0000007f

Thanks in advance for the pointers, @P33M.
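The six runs above reduce to three distinct failure signatures: an MSI setup failure with -22 (EINVAL) whenever the no-mip overlay is absent, an early probe failure with -12 (ENOMEM) when only no-mip is set, and swiotlb exhaustion when both overlays are set. A small sketch that names the signature from `dmesg` text follows; `classify_ath11k` and its labels are hypothetical and based only on the messages quoted in this thread.

```shell
#!/bin/sh
# classify_ath11k: hypothetical helper; reads `dmesg` text on stdin and
# names the failure signature, based only on the messages quoted above.
classify_ath11k() {
    log=$(cat)
    case $log in
        *"failed to enable msi: -22"*)
            echo "msi-64bit: no-mip overlay missing" ;;
        *"swiotlb buffer is full"*)
            echo "swiotlb-full: both overlays present" ;;
        *"probe with driver ath11k_pci failed with error -12"*)
            echo "probe-enomem: pcie-32bit-dma overlay missing" ;;
        *)
            echo "no known failure signature" ;;
    esac
}

# On a live system: dmesg | classify_ath11k
classify_ath11k <<'EOF'
ath11k_pci 0000:01:00.0: MSI vectors: 16
ath11k_pci 0000:01:00.0: qcn9074 hw1.0
ath11k_pci 0000:01:00.0: probe with driver ath11k_pci failed with error -12
EOF
```

The mapping from signature to overlay reflects this one board and kernel build; other setups may produce the same messages for different reasons.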

@P33M
Contributor

P33M commented Nov 19, 2024

Hm. It really is a 32-bit device. No-mip is also necessary because the endpoint needs 16 vectors.
So the issue is the swiotlb exhaustion, which is a bit puzzling - what's the output of dmesg | grep -i cma, and what happens if you set cma=128M@128M in /boot/firmware/cmdline.txt?

@omerk
Author

omerk commented Nov 19, 2024

With the system running as it was:

$ dmesg | grep -i cma
[    0.000000] Reserved memory: created CMA memory pool at 0x0000000002000000, size 64 MiB
[    0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[    0.000000] OF: reserved mem: 0x0000000002000000..0x0000000005ffffff (65536 KiB) map reusable linux,cma
[    0.026104] Memory: 8166224K/8384512K available (13952K kernel code, 2294K rwdata, 4768K rodata, 5376K init, 837K bss, 145312K reserved, 65536K cma-reserved)

After adding cma=128M@128M to cmdline.txt:

$ dmesg | grep -i cma
[    0.000000] Reserved memory: bypass linux,cma node, using cmdline CMA params instead
[    0.000000] OF: reserved mem: node linux,cma compatible matching fail
[    0.000000] cma: Reserved 128 MiB at 0x0000000008000000 on node -1
[    0.000000] Kernel command line: reboot=w coherent_pool=1M 8250.nr_uarts=1 pci=pcie_bus_safe cgroup_disable=memory numa_policy=interleave  smsc95xx.macaddr=2C:CF:67:67:8D:23 vc_mem.mem_base=0x3fc00000 vc_mem.mem_size=0x40000000  console=ttyAMA10,115200 console=tty1 root=PARTUUID=8c5b1cb2-02 rootfstype=ext4 fsck.repair=yes cma=128M@128M rootwait
[    0.049943] Memory: 8100688K/8384512K available (13952K kernel code, 2294K rwdata, 4768K rodata, 5376K init, 837K bss, 145312K reserved, 131072K cma-reserved)

Config + ath11k output:

$ sed -n 1,4p /boot/firmware/config.txt
dtparam=pciex1
dtparam=pciex1_gen=3
dtoverlay=pcie-32bit-dma
dtoverlay=pciex1-compat-pi5,no-mip

$ dmesg | grep ath11k
[    7.250378] ath11k_pci 0000:01:00.0: BAR 0 [mem 0x1b80000000-0x1b801fffff 64bit]: assigned
[    7.250411] ath11k_pci 0000:01:00.0: enabling device (0000 -> 0002)
[    7.250682] ath11k_pci 0000:01:00.0: MSI vectors: 16
[    7.250694] ath11k_pci 0000:01:00.0: qcn9074 hw1.0
[    7.743254] ath11k_pci 0000:01:00.0: chip_id 0x0 chip_family 0x0 board_id 0xa0 soc_id 0xffffffff
[    7.743264] ath11k_pci 0000:01:00.0: fw_version 0x270206d0 fw_build_timestamp 2022-08-04 12:48 fw_build_id WLAN.HK.2.7.0.1-01744-QCAHKSWPL_SILICONZ-1
[    9.113350] ath11k_pci 0000:01:00.0: swiotlb buffer is full (sz: 1048583 bytes), total 32768 (slots), used 992 (slots)
[    9.113363] ath11k_pci 0000:01:00.0: failed to set up tcl_comp ring (0) :-12
[    9.113412] ath11k_pci 0000:01:00.0: failed to init DP: -12

No change, unfortunately.

@P33M
Contributor

P33M commented Nov 19, 2024

The driver must be allocating a huge amount of DMA coherent memory (and because it's forced to be DMA32, going via swiotlb) for the transfer ring in a single chunk. It's failing on the first tcl_comp setup where apparently we've already created HAL_SW2WBM_RELEASE, HAL_TCL_CMD, HAL_TCL_STATUS, and one HAL_TCL_DATA ring.

I don't know enough about the swiotlb internals to suggest how to expand the slots within the reserved memory segment.
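Some rough arithmetic on the numbers in the log may help here. This sketch assumes the constants from mainline `include/linux/swiotlb.h` (`IO_TLB_SHIFT` = 11, so 2 KiB per slot, and `IO_TLB_SEGSIZE` = 128 contiguous slots at most per mapping). The rpi-6.12.y tree may differ, so treat the result as indicative only.

```shell
#!/bin/sh
# Constants assumed from mainline include/linux/swiotlb.h
IO_TLB_SHIFT=11       # each slot is 1 << 11 = 2048 bytes
IO_TLB_SEGSIZE=128    # max contiguous slots for a single mapping

slot_size=$(( 1 << IO_TLB_SHIFT ))
max_mapping=$(( slot_size * IO_TLB_SEGSIZE ))

# Values from the "swiotlb buffer is full" message
request=1048583
total_slots=32768
used_slots=992

pool_mib=$(( total_slots * slot_size / 1024 / 1024 ))
slots_needed=$(( (request + slot_size - 1) / slot_size ))

echo "pool size:       ${pool_mib} MiB"
echo "free slots:      $(( total_slots - used_slots ))"
echo "slots needed:    ${slots_needed}"
echo "max per mapping: ${IO_TLB_SEGSIZE} slots (${max_mapping} bytes)"
```

If those constants apply, the 1 MiB + 7 byte request needs 513 contiguous slots against a 128-slot ceiling per mapping. That would make it fail however many slots are free, which would fit with the pool being reported as full at only 992 used slots.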

@P33M
Contributor

P33M commented Nov 20, 2024

Can you build a rpi-6.12.y kernel with CONFIG_SWIOTLB_DYNAMIC=y and test?
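A short sketch of how the option might be switched on and verified before building. It assumes a configured kernel tree with a `.config` in the current directory; `scripts/config` is the in-tree helper, while `config_has` is a hypothetical function added here for the check.

```shell
#!/bin/sh
# config_has: hypothetical helper, succeeds if SYMBOL is set to =y in FILE.
config_has() {
    grep -q "^CONFIG_$2=y\$" "$1"
}

# In a kernel tree, the option can be switched on with the in-tree helper
# (add ARCH= and CROSS_COMPILE= to make as needed when cross-compiling):
#   ./scripts/config --enable SWIOTLB_DYNAMIC && make olddefconfig
# and then verified; the .config location is an assumption:
if [ -f .config ]; then
    if config_has .config SWIOTLB_DYNAMIC; then
        echo "CONFIG_SWIOTLB_DYNAMIC is enabled"
    else
        echo "CONFIG_SWIOTLB_DYNAMIC is not enabled"
    fi
fi
```

Checking after `make olddefconfig` matters because an option whose dependencies are not met gets dropped silently.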

@omerk
Author

omerk commented Nov 20, 2024

Trying to figure out if this is a problem with my config, but so far no luck building 12856cc:

...
  CC [M]  drivers/media/dvb-frontends/itd1000.o
  CC [M]  drivers/media/dvb-frontends/ves1820.o
  CC [M]  drivers/media/dvb-frontends/ves1x93.o
  CC [M]  drivers/media/dvb-frontends/zd1301_demod.o
  CC [M]  drivers/media/dvb-frontends/zl10036.o
  CC [M]  drivers/media/dvb-frontends/zl10039.o
  CC [M]  drivers/media/dvb-frontends/zl10353.o
  LD [M]  drivers/media/dvb-frontends/cxd2820r.o
  LD [M]  drivers/media/dvb-frontends/drxd.o
  LD [M]  drivers/media/dvb-frontends/drxk.o
  LD [M]  drivers/media/dvb-frontends/stb0899.o
  LD [M]  drivers/media/dvb-frontends/stv0900.o
  AR      drivers/built-in.a
  AR      built-in.a
  AR      vmlinux.a
  LD      vmlinux.o
  OBJCOPY modules.builtin.modinfo
  GEN     modules.builtin
  GEN     .vmlinux.objs
  MODPOST Module.symvers
ERROR: modpost: "__swiotlb_find_pool" [drivers/gpu/drm/vc4/vc4.ko] undefined!
make[2]: *** [scripts/Makefile.modpost:145: Module.symvers] Error 1
make[1]: *** [/tmp/raspios-builder/linux/Makefile:1888: modpost] Error 2
make: *** [Makefile:224: __sub-make] Error 2

@P33M
Contributor

P33M commented Nov 20, 2024

Hmm, reliance on an unexported symbol. It builds for me with git revert 10da9e6868942 - shouldn't affect video on Pi 5 as it has IOMMUs.

@6by9
Contributor

6by9 commented Nov 20, 2024

10da9e6 is using is_swiotlb_buffer which I thought was valid.

However that then uses swiotlb_find_pool if CONFIG_SWIOTLB_DYNAMIC is defined.
The usage changes between 6.6 and 6.12, so that may solve some of those issues.
