-
Notifications
You must be signed in to change notification settings - Fork 7.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NimBLE - immediate disconnect when BLE key store fills and device changes address (IDFGH-3586) #5530
Comments
Thanks for reporting, we will look into. Thanks. |
Hi @bfriedkin , iPhone/Android devices use RPA addresses by default, which change periodically or after Bluetooth on-off. Now once the iPhone changes address, then it's treated as a new device and hence you are observing pairing event for each Bluetooth on-off cycle. However the RPA address can be resolved using IRKs (Identity Resolving keys). For that you may need to add, |
Thank you for the quick reply and patch. Unfortunately the patch seems to make the issue trigger right away. Perhaps that provides a clue? Here are my latest steps to reproduce:
That triggers the pairing sequence which succeeds. But then on my iPhone:
At this point LightBlue immediately displays a Disconnect alert. In the LightBlue log the error is "Peer removed pairing information". The attached I then reverted the patch and tried the same. Without the patch, I am able to back out of the service view, reconnect with the peripheral and read the characteristic. After the one-time bonding, each subsequent time I reconnect and read the characteristic there is no pairing dialog. This is what I would expect. Unfortunately if I then toggle Bluetooth off and on, I experience the same disconnect failure in LightBlue. The attached I've also attached the Looking forward to next steps. Regards, |
Hi @bfriedkin Can you please confirm if IDF commit: 04e3cf4 is present in your git repo ? If not Can you please pull in latest changes from
Thank you for pointing out, I will fix this. |
Thank you. I will confirm the commit and try the patches today. Regarding LE Secure Connections, should I still be running with the prior Regards, |
I merged commit 04e3cf4 into ESP-IDF v3.3.2. I also applied your NimBLE_RPA_not_resolved_Debug_logs.txt patch. Your patch didn't originally build, but I was able to fix the problems and run the bleprph app. I have the bleprph app configured to use the The good news is that PR!7 does seem to fix both the disconnect problem and subsequent failures in the ble key store. I was able to toggle Bluetooth off/on on my iPhone and the device reconnected without requiring another passkey entry. I was able to read the encrypted random number characteristic in LightBlue. The bad news is that with this patch applied, there is a new problem: Failure reading encrypted characteristic. Steps to reproduce:
At this point the iPhone fails to read the characteristic. The IDF log at this point shows the following:
I've attached the complete log showing the successful re-pair and reconnects. The failure above can be found at the end of the log. Regards, |
In your log just before the error:
status=1035 means: BLE_SM_ERR_DHKEY Indicates to the remote device that the DHKey Check value received doesn’t match the one calculated by the local device. Which happened because you deleted the bond from your phone, which would trigger the repeat pairing event on the peripheral and in the example it deletes the bond locally. After this you need to reconnect to re-pair the devices and it should work. Would be nice if that wasn't necessary but it seems to be at this point. |
@h2zero - Thank you for helping here and good catch on the encryption error. What you describe in terms of the DHKey Check value makes sense and is consistent with the error message I see in the LightBlue log on the phone. Unfortunately I still bump into problems even after trying to reconnect and re-pair from the phone. I was also able to confirm that the
That trace appears in the attached NimBLE log. I spent a little more time today trying to further isolate when the failure happens. With the 04e3cf4 commit merged into ESP-IDF 3.3.2, I found that the encryption failure always occurs after you remove the bond on the phone and toggle Bluetooth off and on. I believe this is because the iPhone changes it's BT address. In today's testing I started fresh, erasing the ESP32 flash and did the following:
The third try always fails with the I then built the same service on top of the ESP32/Bluedroid and repeated the same steps above. No failure was reproduced. When running on top of Bluedroid, I notice that the pairing dialog displays on the iPhone before I try to read the characteristic, as opposed to when I read the characteristic using NimBLE. I have seen this type of behavior before and assume it is unrelated to the issue at hand. Both the NimBLE and Bluedroid tests were run on the same ESP32 hardware connected to the same iPhone. I therefore believe the problem lies in the NimBLE implementation. A mobile client should not have to reconnect and/or delete bonds for this scenario to succeed. I've attached both NimBLE and Bluedroid logs for reference. The Bluedroid log is a little thin... I can increase the verbosity if that would help at all. @prasad-alatkar - I am hoping this additional information helps to better isolate the failure. Crossing my fingers for a fix... Regards, |
I can confirm and repro this, but also confirm that after the failure to pair/read, disconnecting and reconnecting will allow pairing to succeed. I'm unsure as to why this happens but it may be something that needs to be addressed upstream, @prasad-alatkar would know more about that though.
This is how it works with NimBLE and is more inline with the BLE core spec as the database is allowed to be inspected but values are what is protected. Just thought I'd share that while I'm posting. If you want to have similar behavior to bluedroid you can call |
Hi @bfriedkin Thank you for putting detailed description of the issue. Can you please try the attached patch and let me know if it resolves the issue ? Patch: NimBLE_RPA_fix_identity_change.txt The patch needs to be applied on NimBLE submodule ( |
Thank you for your continued focus here. I tried the additional patch. Unfortunately that leads to a crash when I reconnect the iPhone after deleting the bond. It looks like the crash occurs when the app calls
I've also attached the full log here as well as my copy of the Regards, |
@bfriedkin I am attaching modified patch to try, please try it out. I am a little afraid that this patch may not help you overcome the issue, but the crash. Please let me know your observations after trying out this patch, If the issue still persists I may need to try with iPhone instead of Android phone. Patch to try out: |
Today I built with the following:
I ran the following tests:
Tests 1-3 succeeded. Unfortunately test 4 failed. Turning Bluetooth off and back on on the iPhone but keeping the bond to the ESP32 device triggered the immediate disconnect failure. I've attached the log here. Note that I tried step 4 two times before terminating the device app and copying the log. Good idea to try with an iPhone. Hoping you can do that and more easily reproduce/resolve the issue. This issue will need to be resolved on both iOS and Android anyway... Regards, |
I can confirm the issue with the patch on my iPhone as described above. I made a small addition to the patch that seems to have resolved it. NimBLE_RPA_fix_identity_change_h2zero.txt Edit: wrong file 😢 |
@h2zero - Thank you for your continued support here. I removed the NimBLE_RPA_fix_identity_change_2.txt and applied the NimBLE_RPA_fix_identity_change_h2zero.txt patch. Unfortunately that didn't resolve issue (4) above. The connection fails on the phone. The LightBlue log shows Error: Peer removed pairing information. I've attached the log corresponding to tests (1)-(4) above. Regards, |
@bfriedkin You're welcome, I'd like to see this fixed as well. In your last 2 logs I see I just realized you're on idf v3.3.2, I tested that patch on v4.0.1. Thanks for testing and logging 😄 |
Hello - Just to underscore what @h2zero pointed out, we need a fix for ESP-IDF v3.3.2. We have products built on that SDK version that we cannot currently migrate to v4.x. Furthermore, v3.3.2 is a Long Term Support release that will be supported until March 2022. @h2zero - I appreciate you independently confirming the failures. Lots of moving parts here with the steps to reproduce and multiple patches. @prasad-alatkar - Hoping today's discussion helps to get us closer to a solution. Regards, |
Hi @bfriedkin , I was able to reproduce the issue with iPhone, however I do see the issue is not iPhone specific, iPhone just disconnects continuously whereas Android device deletes the saved bond and re-pairs again. There have been lots of patches and trials here, so I will try to put conclusive solution here:
Regarding
No need to migrate to v4.x, for that matter v4.x will also need this fix. The fix will be available in next Regards, |
I followed your steps (1) - (3) above, but I applied the
Unfortunately I see the same failure on iPhone. The connection fails on the iPhone when the bond is saved on the iPhone but Bluetooth is toggled off and then back on. I've attached a diff of my I've attached my copy of the
Regards, |
Hi @bfriedkin , I have confirmed the issue gets resolved with the patch for
|
I have mostly good news. :-) Following you steps above I was able to successfully repeat the test steps 1 to 4 on my iPhone without any failures! Apparently I previously didn't apply the Unfortunately when I tested our BLE peripheral application on Android (not the
At this point the ESP32 app asserts and drops into GDB. Here is the relevant part of the log output:
Our repeat pairing event handler case looks like this:
I've attached the complete log here. Hoping this is an easy fix. Perhaps there is a missing Regards, |
@bfriedkin yes, you are right regarding |
The Because we have products that require this fix, I'd appreciate any information you can provide related to how and when these patches will be publicly available. Will these fixes be back-ported to ESP-IDF v3.3.2 or do we need to wait for the next v3.3.x release? Can I be notified in either case? Regards, |
Hi @bfriedkin , glad that the issue is resolved. So the fix will be first merged into master branch and subsequently it will be backported till |
Unfortunately we've bumped into another issue that needs to be resolved. It seems possibly related to these bonding fixes, so I'm reporting the issue here. The problem is that when the CCCD store (NVS on ESP32) fills up, the functions that purges the store, starting from the Storage Status callback, eventually fails. This is triggered by the client subscribing for notifications, when the client/server are bonded and the CCCD store is full. It can be a little challenging to reproduce, but here are the basic steps:
I am using I've attached a log that shows the problem. To help pinpoint the failure, I've also added log output to a few of the NimBLE modules, which I've also attached here. Recall I am running ESP-IDF v3.3.2 with the patches provided. Looking at the log, the first failure occurs here at line 901:
We are using the stock At line 1447 we see a case that succeeds:
Finally at line 1588 there is another failure, but with a different trigger:
I hope these notes, log and test app help you more quickly isolate and resolve the problem. Regards, |
Hi @bfriedkin, unfortunately I could not reproduce the issue. Either I am missing something here or we are not at same NimBLE pointer. Can you please confirm if your NimBLE submodule pointer is updated to
Now nimble submodule pointer should point to |
Sorry to hear you are having trouble reproducing the issue. Git seems to think that I am pointing at the correct submodule:
In that same log, I see the two commits you reference above:
I am running with the While problem is difficult to reproduce, it does occur. There are two different failures, which I've already documented above. Filling up the CCCD store along with the iPhone changing its address (RPA) seem to trigger the failure. I spent some time today trying to find a simple set of steps. While I wasn't able to find a simple way to reproduce the failure, I did reproduce the failure by doing the following in the previously attached
(Note at this point two of the three CCCD slots are filled)
At this point, at line 3649 in the attached
Also at this point in LightBlue, the This is one of the two errors I pointed out previously. The logs don't lie. :-) Certainly the problem is difficult to reproduce. I'm hoping given the additional details you can find a quicker way to reproduce the failure. If you are still having trouble, I would recommend taking a look at the prior logs and modified sources I provided in my previous post. You can easily see where things are failing in the code. Regards, |
I haven't tested to confirm this but I was looking through the log and I can see the client takes up all 3 cccd storage slots and they are not all deleted with unpairing. It happens like this:
|
Hi @bfriedkin , extremely sorry to make you go through pain of listing all possible steps, couldn't help as I was not able to reproduce the issue despite trying multiple sequence of steps. After following the steps you mentioned, I was able to reproduce the issue with "iPhone + LightBlue" combination. I mostly have good news here, please try attached patch |
Glad to hear that you were ultimately able to reproduce a problem! Thank you for the patch and instructions. Based on your analysis thus far, do you anticipate the latest fixes address both of the error cases @h2zero and I outlined above? Unfortunately I might not be able to give this new patch a try until Thursday (U.S. time). Apologies for the delay. I'll let you know then how our testing goes. If things still fail I'll get you a log built against the CCCD_failure_RPA_log.txt patch. Regards, |
Hi @bfriedkin, @h2zero has not pointed out any other/different issue, rather has tried to summarized the issue. Yes, He is right regarding this comment: "should not happen, maybe called with ota address". This is exactly what I try to fix with the latest patch. So I believe the latest fix addresses the issue, however I will wait for confirmation from your end for the same. |
@prasad-alatkar btw - if it helps, I can confirm that 0003 patch worked nicely to fix RPA android clients connecting to my open-source project. |
I am having trouble applying the patch on top of the updated NimBLE submodule. I am applying these changes to ESP IDF v3.3.2 cloned today:
What am I missing? Thanks, |
@geeksville Thank you for confirming the fix independently. @bfriedkin, looks like you pulled in latest NimBLE submodule, so fix is already present :) Can you please confirm if NimBLE submodule pointer is this: |
This is what I see before updating to nimble-1.1.0-idf-v3.3 :
And then after updating to nimble-1.1.0-idf-v3.3 :
Do I need to add anything else? Please let me know. From what I can tell it looks like this is all I need. Thanks, |
Yes @bfriedkin , you don't need to do anything else. The fix is already present. |
I did some testing against the latest fix using LightBlue on iOS, Android and macOS. No problems found. :-) Thank you! We didn't get a chance to test the products today. Hoping to do that tomorrow. Sorry for the delay. I do have a few questions regarding the fixes. The app updates will be rolled out as a product update built against ESP-IDF v3.3.2 and the fixes. The products were previously built against Bluedroid. The ESP32 side will likely have bonds stored by Bluedroid and similarly the mobile client apps will have the associated bonds. Will the ESP32 device ignore the Bluedroid bond because the NimBLE store is different? Will the bond stored by the mobile client(s) be effectively ignored and replaced by a new bond after re-pairing completes? Thanks, |
Hi @bfriedkin
Yes.
This totally depends on design of mobile client, I have observed few Android phones deleting the old bond if the peer has deleted the bond. However in case of iPhone it may not be the case. So to answer your question, the mobile client may require to delete the previous bond and reinitialize pairing again. However this will be only one time activity. |
… (Backport v4.0) - Merges espressif/esp-nimble#12 - Fixes repeated pairing failure in RPA feature Closes #5530
… (Backport v4.2) - Merges espressif/esp-nimble#12 - Fixes repeated pairing failure in RPA feature Closes #5530
… (Backport v3.3) - Merges espressif/esp-nimble#12 - Fixes repeated pairing failure in RPA feature Closes #5530
… (Backport v4.1) - Merges espressif/esp-nimble#12 - Fixes repeated pairing failure in RPA feature Closes #5530
Environment
Problem Description
Using the
bleprph
NimBLE example app configured for secure connections, MITM, bonding and MAX_BONDS = 3, the device immediately disconnects with an iPhone running LightBlue after turning Bluetooth off and on the iPhone. The iPhone maintains the bond when toggling Bluetooth off and on, but the iPhone does change its BT address. This seems to be an iPhone BLE behavior. I believe that the failure might be related to the config store being full. When this happens, there are already three bond entries saved to NVS.Expected Behavior
The device should detect that the peer address has changes, call the store status callback to make room, and proceed without disconnecting.
Actual Behavior
The ESP32 disconnects and the iPhone is unable to establish a connection until the iPhone deletes the bond.
Steps to reproduce
bleprph
app with the options described aboveCode to reproduce this issue
The bleprph sample app without any changes.
Debug Logs
When I launch I see that three bonds are already stored:
When I run without turning Bluetooth off on the iPhone, the log shows a successful connection with bonding:
After I turn Bluetooth off and on, a subsequent connection shows the failure. Note that the peer_id_addr has changed:
I've attached my sdkconfig file to this case.
At this point we are looking for a workaround. Ideally it would be helpful to know in advance at the app level when this failure might trigger so we can hedge against any issues by deleting all bonds or something similar.
Thank you!
Regards,
Brian
The text was updated successfully, but these errors were encountered: