ESP32 NimBLE panic when closing connection #1172
Comments
Hello, can you please provide a stack trace with the failure? Can you please provide some BLE client code to reproduce the problem? |
Hi @mkellner Here's the stack trace:
About the BLE client, I don't have any example ready... I'm reproducing the issue with a special app I'm working on. Finally, I put together the idea of resetting the Let me know what you think! Thanks! |
Thanks for the stack trace. We found a way to reproduce the problem reliably by using the name-change-server example with a small change to have the onCharacteristicRead callback close the BLE server. Then, using nRF Connect on iOS, the crash triggers by connecting and waiting a second. Thanks for sharing the patch. The use of critical sections may help in places, but some of them don't seem likely to have an effect.
We have a fix that we've successfully tested that uses a reference count to ensure that the BLE server isn't disposed while in use. That allows |
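The reference-counting idea can be illustrated with a minimal sketch. This is not the code from the actual fix: `DemoServer`, `gDemo`, `demoRetain`, `demoRelease`, and `demoClose` are hypothetical names invented for illustration, and the sketch is single-threaded. On a real ESP32 the counter updates would presumably need a critical section or atomics, since the NimBLE task and the event loop run concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the server record; not the Moddable struct. */
typedef struct {
    int useCount;   /* number of holders, including the owner */
    bool closed;    /* set once close has been requested */
} DemoServer;

DemoServer *gDemo = NULL;   /* hypothetical global, playing the role of gBLE */
int gDisposeCount = 0;      /* counts real teardowns, for demonstration only */

/* Create the server with one reference held by the owner. */
void demoOpen(void) {
    gDemo = calloc(1, sizeof(DemoServer));
    assert(gDemo != NULL);
    gDemo->useCount = 1;
}

/* Take a reference before using the server; NULL if it is gone or closing. */
DemoServer *demoRetain(void) {
    DemoServer *s = gDemo;
    if (s == NULL || s->closed)
        return NULL;
    s->useCount++;
    return s;
}

/* Drop a reference; the last holder performs the teardown. */
void demoRelease(DemoServer *s) {
    assert(s->useCount > 0);
    if (--s->useCount == 0) {
        if (gDemo == s)
            gDemo = NULL;
        free(s);
        gDisposeCount++;
    }
}

/* Request close: mark the server closed and drop only the owner's reference. */
void demoClose(void) {
    DemoServer *s = gDemo;
    if (s == NULL || s->closed)
        return;
    s->closed = true;
    demoRelease(s);
}
```

In this sketch a callback that called `demoRetain()` before `demoClose()` keeps the record alive until it calls `demoRelease()`, while callbacks arriving after the close get NULL and can return early.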
Hi @phoddie, You're right, there are some cases unnecessarily protected. Let me know how we can get that. Appreciate your support once more! |
Please try: https://gist.github.com/mkellner/80a28e35da39d9d3f5e4f68db67b8182 Replace |
Hello @beckerzito, Did the |
I tested the version recently, and it seems to have solved the specific issue of communication during disconnection. But with these changes I'm now getting a new problem that I never faced before. It is not consistent, but it happens very frequently on Android 13 and sometimes on iOS and Android 11: the pairing PIN is not generated, and those clients end up closing the connection. I haven't been able to find the root cause yet, but I wanted to point out this issue, which I suppose is related to this update, since it did not occur before! I'll keep you posted on the findings. I was just wondering whether, with this information, you might have some insight into what is happening. Maybe some timeout protection on the BLE client, where the BLE server in the firmware is waiting for something before continuing the connection procedure? |
@beckerzito – thanks for the update. Glad to hear that the changes resolve the problems at close.
That's unexpected. Do you have a way to reproduce that with one of the example apps or another project you can share? We can take a look using the BLE security-server example but there may be differences in your configuration.
You might try enabling logging in the BLE server to see what is different between the success and failure cases. That is done by changing both
That would be on the mobile device side (Android or iOS). I can't say. Presumably they timeout eventually, but there's no obvious reason that the ESP32 should be particularly slow. |
@beckerzito - any update here? Pairing works in my testing. You indicate that this isn't 100% reproducible, so maybe there's some special condition needed. |
Hi @phoddie, sorry for the long delay. Unfortunately, I wasn't able to work out why the proposed change is not working properly. But it's clear that it doesn't work during the integration tests on some Android platforms or even iOS versions. As I described, the PINs are not generated and the clients end up closing the connection (at least this is the observed behavior). I apologize for not being able to go deeper on that and create an app to reproduce it; it's hard to reproduce in an example where the CPU usage is very low. My hypothesis is that it appears under heavy CPU load (a high number of threads, interrupts, and so on). My only point is that I don't suggest going forward with the proposed change. So far we have been working with the patch I sent on the issue, and all the tests have performed well. |
@beckerzito – Thank you for the update. As you will understand, we are most effective at fixing problems when we can reproduce them. Still, these changes do fix some real problems. We'll have to revisit that to decide whether it makes sense to integrate them. |
Build environment: macOS, Windows, or Linux
Moddable SDK version: OS220805b
Target device: ESP32 platforms
Description
When using the NimBLE library, in some conditions I can see that if some BLE communication happens while the BLE connection is being closed, a panic exception is raised, with LoadProhibited.
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.
Steps to Reproduce
`test-server` example), in which the BLE is closed after some time.
`close` is requested.
Expected behavior
The communication should close properly, without any exceptions raised.
Images
From an initial analysis, what I could observe is:
As soon as `close` is called, `modBLEServer.c` already sets the global instance `gBLE` to NULL. The same instance is used by the GATT callbacks when events happen on the BLE communication.
The module then posts a message to the machine to call the function `bleServerCloseEvent`, which in turn calls `xs_ble_server_destructor`, which deinitializes the NimBLE driver and closes all the current connections.
Since NimBLE has its own thread to receive BLE events, this looks like a concurrency problem. If a BLE message arrives right before the `close` event is executed by the event loop, the NimBLE task will be running the callback, waiting for the `requestPending` property to be set to false by the `readEvent`, which is also in the event queue. (see below).
As soon as `close` is executed in the event loop, `gBLE` is set to NULL and the `while` loop in the NimBLE callback crashes with LoadProhibited.
I imagine that the quick win here would be to set `gBLE` to NULL only after resetting the GATT server. On the other hand, I can also see some opportunities to improve the protection against `gBLE` usage by multiple threads in the long term.
Note: I checked that newer versions still have the same code.
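To make the suggested ordering concrete, here is a minimal single-threaded sketch, not the actual `modBLEServer.c` code. `SketchServer`, `gSketch`, `sketchClose`, and `sketchCallbackStillWaiting` are hypothetical names. It assumes the callback polls a flag in a loop, and it models "resetting the GATT server" as clearing a boolean. Real firmware would also need proper synchronization between the NimBLE task and the event loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the server state; not the Moddable struct. */
typedef struct {
    bool requestPending;   /* set by the BLE callback, cleared by the read event */
    bool gattRunning;      /* true while the stack may still invoke callbacks */
} SketchServer;

SketchServer sketchStorage;
SketchServer *gSketch = NULL;   /* hypothetical global, playing the role of gBLE */

void sketchOpen(void) {
    sketchStorage.requestPending = false;
    sketchStorage.gattRunning = true;
    gSketch = &sketchStorage;
}

/* One polling step of the callback's wait loop.
   Returns true while the callback should keep waiting. */
bool sketchCallbackStillWaiting(void) {
    SketchServer *s = gSketch;   /* read the global once per step */
    if (s == NULL || !s->gattRunning)
        return false;            /* server closing or gone: abandon the wait */
    return s->requestPending;
}

/* Close in the suggested order: stop the GATT side first, clear the pointer last. */
void sketchClose(void) {
    SketchServer *s = gSketch;
    if (s == NULL)
        return;
    s->gattRunning = false;      /* stands in for resetting the GATT server */
    s->requestPending = false;   /* release any callback that is still waiting */
    gSketch = NULL;              /* only now drop the global */
}
```

The callback's loop would then look like `while (sketchCallbackStillWaiting()) { /* yield */ }`, so it exits cleanly instead of dereferencing a NULL global once the close has run.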