Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gatt_security_server example crash (Cache disabled but cached memory region accessed) (IDFGH-7342) #8928

Closed
jw-redpanda opened this issue May 10, 2022 · 17 comments
Labels
Awaiting Response awaiting a response from the author Resolution: Done Issue is done internally Status: Done Issue is done internally

Comments

@jw-redpanda
Copy link

Environment

  • Development Kit: ESP32-Wrover-Kit
  • Module or chip used: ESP32-WROVER-B
  • IDF version (run git describe --tags to find it): v4.1.2-383-g23e8233506
  • Build System: idf.py
  • Compiler version (run xtensa-esp32-elf-gcc --version to find it): xtensa-esp32-elf-gcc (crosstool-NG esp-2020r3) 8.4.0
  • Operating System: Ubuntu 16.04
  • Power Supply: USB or battery

Problem Description

Repeatable crash (Cache disabled but cached memory region accessed) in example: gatt_security_server example.
It crashes during the pairing process if enabled a small value of SPIRAM_MALLOC_ALWAYSINTERNAL, e.g. 50

Enabled options via menuconfig

Component config > ESP32-specific > Support for external, SPI-connected RAM > SPI RAM config:
(50) Maximum malloc() size, in bytes, to always put in internal memory
Component config > ESP32-specific > Support for external, SPI-connected RAM > SPI RAM config > SPI RAM access method:
(x) Make RAM allocatable using malloc() as well

The following is not enabled to ensure BT stack does not use dynamic allocation

BT_ALLOCATION_FROM_SPIRAM_FIRST
BT_BLE_DYNAMIC_ENV_MEMORY

Expected Behavior

  1. When the app starts, it starts the BLE server and advertising.
  2. When the client connects to the server, pairing is initiated on the client side, e.g. Android/iOS smartphone.
  3. After the smartphone confirms the pairing, the server app is able to complete the pairing process and saves the bonded device information.

Actual Behavior

In step3, the BLE server app crashes and the crash is repeatable with the following message:
Guru Meditation Error: Core 0 panic'ed (Cache disabled but cached memory region accessed)

Steps to reproduce

Simply enable the sdkconfig flags as described in the problem description. It crashes every time when pairing is performed.
If the value of SPIRAM_MALLOC_ALWAYSINTERNAL is larger, say 100, the crash is gone.

Code to reproduce this issue

use the example gatt_security_server example without code modification.

Debug Logs

Trying out to locate where it crashes, by inserting some debug log in esp-idf/components/bt/host/bluedroid/btc/core/btc_dm.c

static void btc_dm_ble_auth_cmpl_evt (tBTA_DM_AUTH_CMPL *p_auth_cmpl)
{
...
    if (p_auth_cmpl->success) {
         ...
         printf(">>> before btc_storage_get_remote_addr_type()\n");
         if (btc_storage_get_remote_addr_type(&bdaddr, &addr_type) != BT_STATUS_SUCCESS) {
            printf(">>>before btc_storage_set_remote_addr_type()\n");
            btc_storage_set_remote_addr_type(&bdaddr, p_auth_cmpl->addr_type, true);
        }
        printf(">>> before btc_storage_set_ble_dev_auth_mode()\n");
        btc_storage_set_ble_dev_auth_mode(&bdaddr, p_auth_cmpl->auth_mode, true);
        printf(">>> before btc_dm_save_ble_bonding_keys()\n");
        btc_dm_save_ble_bonding_keys();
        printf(">>> after btc_dm_save_ble_bonding_keys()\n");
...

When the app crashes, the debug log is seen.

I (1869) SEC_GATTS_DEMO: app_main init bluetooth
I (2039) SEC_GATTS_DEMO: The number handle = 8
I (2079) SEC_GATTS_DEMO: advertising start success
I (12729) SEC_GATTS_DEMO: ESP_GATTS_CONNECT_EVT
W (15369) BT_SMP: FOR LE SC LTK IS USED INSTEAD OF STK
I (15559) SEC_GATTS_DEMO: key type = [ESP_LE_KEY_LENC]
I (15559) SEC_GATTS_DEMO: key type = ESP_LE_KEY_PENC
I (15559) SEC_GATTS_DEMO: key type = ESP_LE_KEY_LID
I (15789) SEC_GATTS_DEMO: key type = ESP_LE_KEY_PID
>>> before btc_storage_get_remote_addr_type()
>>> before btc_storage_set_ble_dev_auth_mode()
>>> before btc_dm_save_ble_bonding_keys()
Guru Meditation Error: Core  0 panic'ed (Cache disabled but cached memory region accessed)

Other items if possible

  • [x ] sdkconfig file (attach the sdkconfig file from your project folder)
    sdkconfig.txt

The issue is in fact found in a large ESP32 application that needs to reserve most IRAM/internal memory to FreeRTOS tasks' stack memory. So far, most components work perfectly with the memory optimization except for the crash issue found in Bluedroid.

@espressif-bot espressif-bot added the Status: Opened Issue is new label May 10, 2022
@github-actions github-actions bot changed the title gatt_security_server example crash (Cache disabled but cached memory region accessed) gatt_security_server example crash (Cache disabled but cached memory region accessed) (IDFGH-7342) May 10, 2022
@jw-redpanda
Copy link
Author

More info about the reason why moving most of the internal memory to PSRAM is critically required, since our company is developing an IoT device that makes use of a few major components, e.g. esp-adf opus encoder(rec sound), aac decoder (play sound), WiFi/tcp/socket for cloud connectivity, BLE for near-field communication, GPIOs and many other esp components.

The development got stuck since the Bluedroid BLE stack eats up 60KB from the already limited 160KB internal RAM, although the board has plenty of external 8MB PSRAM.

@Alson-tang
Copy link

The problem has been fixed. You can use the master branch or our latest 4.4 branch for testing.

@jw-redpanda
Copy link
Author

The problem has been fixed. You can use the master branch or our latest 4.4 branch for testing.

Thank you very much, Alson, for your prompt reply.
Need to migrate my existing code base to v4.4 and check.

@jw-redpanda
Copy link
Author

jw-redpanda commented May 12, 2022

@Alson-tang Unfortunately, upgrading to v4.4 does not completely resolve the BLE pairing crash issue when memory optimization is applied, say SPIRAM_MALLOC_ALWAYSINTERNAL=20 or 1.

Compare with v4.1.2 and v4.4, v4.4 still crashes randomly when pairing. The crash happens in function btc_dm_save_ble_bonding_keys(), where it saves paired key info on nvs. v4.1.2 crashes much more likely than v4.4

To repeat the crash, simply repeat the steps below:

  1. compile and start the BLE sample app on the esp32 module.
  2. Use the Android nRF Connect app. Scan and connect to esp32.
  3. Pairing dialog box is popped up in Android (the double popup issue still exists)
  4. When the second pairing dialog box is done, the esp32 app is either finished pairing or crash
  5. Delete the BLE paired profile in Android settings > Bluetooth
  6. Repeat the step#1

Again, memory optimization is critically required since there is only 160KB of internal RAM left for freeRTOS tasks' stack.

@AxelLin
Copy link
Contributor

AxelLin commented May 13, 2022

@Alson-tang Unfortunately, upgrading to v4.4 does not completely resolve the BLE pairing crash issue

What's your "git describe --tags" output? (Just to clarify if you are on v4.4 tag or v4.4 branch.)

@jw-redpanda
Copy link
Author

@Alson-tang Unfortunately, upgrading to v4.4 does not completely resolve the BLE pairing crash issue

What's your "git describe --tags" output? (Just to clarify if you are on v4.4 tag or v4.4 branch.)

v4.4.1

@AxelLin
Copy link
Contributor

AxelLin commented May 13, 2022

The problem has been fixed. You can use the master branch or our latest 4.4 branch for testing.

@Alson-tang

From #8928 (comment), v4.4.1 still has the issue.
And seems no fix related to this issue after v4.4.1 (in v4.4 branch).
Can you share the commit for the fix?

@Alson-tang
Copy link

@jw-redpanda @AxelLin You can try to verify on IDF v4.4 branch first. If you still have problems on ESP32, you can check whether the choice (High level interrupt) is enabled in menuconfig first.

Component config -> Bluetooth -> Bluetooth -> Bluetooth controller

@AxelLin
Copy link
Contributor

AxelLin commented May 16, 2022

The (High level interrupt) is default y and it is not a visible option on menuconfig, so it's unlikely users will disable it.
Given the fact the issue was reported against v4.4 and v4.4.1, it's clear this issue is NOT fix.

@jw-redpanda
Copy link
Author

@AxelLin Double confirm "CONFIG_BTDM_CTRL_HLI=y" is used in sdkconfig. In fact, this option is the default value.
When a small value, say "CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL=20" is applied, the crash will happen randomly in the pairing process. The crash happens when Bluedroid writes pairing info to the nvs store.
"git describe --tags" returns v4.4.1

Hope this can be fixed soon.

@jw-redpanda
Copy link
Author

@Alson-tang @AxelLin I am trying the same test on ESP32-WROVER-E (ESP32-D0WD-V3) and it seems the crash is gone, although I need more tests to confirm. At least, ESP32-WROVER-B is able to repeat the crash easily.
Is the issue possible due to my older ESP32-WROVER-B module (ESP32-D0WD)?

@blueMoods
Copy link
Contributor

blueMoods commented May 18, 2022

@jw-redpanda @AxelLin Thanks very much for your debug.

I have read your sdkconfig and logs and have some conjectures, can you help me verify it? Just disable
CONFIG_SPIRAM_USE_MALLOC and test again. You won't be able to reproduce the problem if my guess is right.

@Alvin1Zhang Alvin1Zhang added the Awaiting Response awaiting a response from the author label May 19, 2022
@jw-redpanda
Copy link
Author

@blueMoods Trying out the following configs but pairing still causes random crashes. The crash logs are slightly different but all crashes happen in the function nvs_set_blob().

esp-idf: esp-idf v4.4.1
Board: ESP32-WROVER-B (ESP32-D0WD) / ESP32-WROVER-E (ESP32-D0WD-V3)
Test tool: nRF Connect (Android)
How to repeat: Repeat the pairing process. Before pairing, remember to delete the paired info in Android Bluetooth settings.

Config#1

CONFIG_BT_ALLOCATION_FROM_SPIRAM_FIRST=y
CONFIG_BT_BLE_DYNAMIC_ENV_MEMORY=y
CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL=1
CONFIG_SPIRAM_USE_MALLOC=y
CONFIG_SPIRAM_ALLOW_BSS_SEG_EXTERNAL_MEMORY=y

Config#2 (CONFIG_SPIRAM_USE_MALLOC is disabled)

CONFIG_BT_ALLOCATION_FROM_SPIRAM_FIRST=y
CONFIG_BT_BLE_DYNAMIC_ENV_MEMORY=y
CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL=1
SPIRAM_USE_CAPS_ALLOC=y
CONFIG_SPIRAM_ALLOW_BSS_SEG_EXTERNAL_MEMORY=y

Crash log (config#1): Cache disabled but cached memory region accessed
Added some printf() to pinpoint where it crashes

I (52860) SEC_GATTS_DEMO: key type = ESP_LE_KEY_PID
>>>> case BTA_DM_BLE_AUTH_CMPL_EVT
>>>> entered btc_dm_ble_auth_cmpl_evt()
405 >>>> after nvs_open()
463 >>>> before nvs_set_blob()
465 >>>> keyname=bt_cfg_key0
467 >>>> after nvs_set_blob()
498 >>>> before nvs_commit()
500 >>>> after nvs_commit()
>>>> calling btc_storage_set_ble_dev_auth_mode()
405 >>>> after nvs_open()
463 >>>> before **nvs_set_blob**()
465 >>>> keyname=bt_cfg_key0
Guru Meditation Error: Core  0 panic'ed (Cache disabled but cached memory region accessed).

Core  0 register dump:
PC      : 0x4009542e  PS      : 0x00060a35  A0      : 0x800821a4  A1      : 0x3ffbe680  
A2      : 0x3f801318  A3      : 0x00000000  A4      : 0x00000001  A5      : 0xbad00bad  
A6      : 0x00000000  A7      : 0x00000001  A8      : 0x400857f8  A9      : 0x3ffbe790  
A10     : 0x00000000  A11     : 0x00000107  A12     : 0x800875c0  A13     : 0x400942c0  
A14     : 0x00000000  A15     : 0x00000002  SAR     : 0x00000004  EXCCAUSE: 0x00000007  
EXCVADDR: 0x00000000  LBEG    : 0x400923ec  LEND    : 0x40092408  LCOUNT  : 0x00000000  

Backtrace:0x4009542b:0x3ffbe6800x400821a1:0x3ffbe6b0 0x4008ec43:0x3ffbe6d0 0x4008ee97:0x3ffbe700 0x4008ebda:0x3ffbe730 0x40087a41:0x3ffbe750 0x40091781:0x3ffbe770 0x40085d22:0x3ffbe790 0x400857f5:0x3ffbe7c0 0x400875bf:0x3ffcdbd0 0x400917c5:0x3ffcdbf0 0x40087225:0x3ffcdc10 0x400dbcf6:0x3ffcdc50 0x400dd597:0x3ffcdc80 0x400de2cf:0x3ffcdca0 0x400de6ad:0x3ffcdcc0 0x400dc795:0x3ffcdd20 0x400dcde6:0x3ffcdd70 0x400dd3e5:0x3ffcdde0 0x400dc253:0x3ffcde00 0x40107263:0x3ffcde30 0x400e248c:0x3ffcde90 0x400e16ac:0x3ffcdeb0 0x400e1a8d:0x3ffcdee0 0x400e1de8:0x3ffcdf20 0x4010a8dc:0x3ffcdf40 0x4010ac59:0x3ffcdf70 0x40105c7f:0x3ffce010 0x40108039:0x3ffce030 0x40097071:0x3ffce050

ELF file SHA256: 49b5be26919b56b1

Rebooting...
ets Jul 29 2019 12:21:46

rst:0x3 (SW_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)

Crash log (config#2): sudden reboot (TG1WDT_SYS_RESET)

>>>> entered btc_storage_add_ble_bonding_key()
405 >>>> after nvs_open()
463 >>>> before **nvs_set_blob**()
465 >>>> keyname=bt_cfg_key0
ets Jul 29 2019 12:21:46

rst:0x8 (TG1WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:6632
load:0x40078000,len:14788
load:0x40080400,len:3792
entry 0x40080694

According to the trace log, the crash is caused by writing to nvs during the pairing process. Since the point where the crash happens is quite clear, will it be able to resolve it soon?

@blueMoods
Copy link
Contributor

@jw-redpanda Thanks very mush for you help.

The reason of this issues is not at nvs_set_blob(), the real reason is that: we try to access cache in interrupt, but cache is disabled. And nvs_set_blob() will write NVS and disable cache.

I have find some issues will cause assceeing cache in interrupt: queue, sempher and mutex will be placed into PSRAM if CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL is too small. I have fixed them in IDF, here is the patch for release/v4.1
fix.zip. And we are merging it to esp-idf.

Please make sure the queue, sempher and mutex you used in interrupt is mallocd in IRAM, you can refer to the function queue_create_wrapper, semphr_create_wrapperand mutex_create_wrapper in components/bt/controller/bt.c

If this issue re-produce after changes, please save the log and .elf file and send to me. thanks.

@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Opened Issue is new labels Jul 11, 2022
@espressif-bot espressif-bot added Resolution: NA Issue resolution is unavailable Status: Done Issue is done internally Resolution: Done Issue is done internally and removed Status: In Progress Work is in progress Resolution: NA Issue resolution is unavailable labels Jul 22, 2022
@Alvin1Zhang
Copy link
Collaborator

Thanks for reporting, will close due to short of feedback, feel free to reopen with more updates, thanks.

@jw-redpanda
Copy link
Author

jw-redpanda commented Aug 9, 2022

@blueMoods Thanks for the patch.
when using project gatt_security_server, BLE pairing crash is gone even though small CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL is used, e.g. 1-10,

Then, I tried patching to a larger project, which has Bluetooth BLE, Wi-Fi, and many other components like esp-adf. However, the crash is still seen. After many trials, I suspect Wi-Fi may cause the crash.

To help repeat the issue, I build a small project
gatt_security_server_wifi
. This project is able to demonstrate the crash in BLE pairing when Wi-Fi is connected to AP. When Wi-Fi is stopped, the crash will be gone. I hope that this project will help troubleshoot the issue.

@Alvin1Zhang would you please re-open the issue? The issue can be repeated easily.

@jw-redpanda
Copy link
Author

@AxelLin @blueMoods Can I re-open the issue? I have prepared a project that is able to repeat the issue. It seems to be a Wi-Fi / BLE co-existence issue. Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Awaiting Response awaiting a response from the author Resolution: Done Issue is done internally Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests

6 participants