[BUG] Unable to perform factory reset after removing fabric during ota #24329
Comments
What happens when you call the API? Is the lambda just not scheduled, or does it for some reason not do the right thing @fuxiaoming-lumi ?
Does the OTA stuff keep running even though the fabric has been removed or something?
I saw that some packets sent by OTAP were still received after removing the fabric (at least I saw some UDP message received logs), but I'm not certain if I remember correctly. Will have to test more to confirm.
Seems like a session is used after it's released (hence the assert in the stack trace above). Marked some entries with Edit: this behavior is seen on contact-sensor app, which is a SED (
@marius-alex-tache Is a fabric removing another fabric, or is a fabric self-removing itself in this case?
The use case is done using only one controller, so it's self-removing.
@marius-alex-tache Can you please attach, not paste, a full log, not one starting with the "[34112[D][IN]Expiring all sessions for fabric 0x1!!" bit? I need to understand what happened before that bit, because for self-removing we should not really be getting there at all.
Hi, sorry about the logs. I've attached one: contact-sensor-remove-fabric.log. You can see the number of OTAP retries (8 - max retries number) in the last lines of the log.
OK, so in that log at the point when we receive RemoveFabric we have three sessions around:
Then the following things happen:
Looking at the screenshot above, it's not 100% clear where we are aborting, because a lot of the line numbers on the right are cut off. Is it on this line in
or somewhere else? If it's on that line, then what would be really useful would be a complete log and from the same run the value of
I've added a log at the beginning of the lambda function. Please see this log: contact-sensor-remove-fabric-print-exchange-context.log. It seems the exchange context is
Thanks, I think I know what's going on now. In that log, we tear down the session here:
Now the exchange should have been removed from the MRP bits as part of that, but it's not because we end up doing the following:
The key part: Item 7 above is an API contract violation. You are NOT allowed to call
@Damian-Nordic @holbrookt @carol-apple what is the best way to fix this code to follow the expected contract for exchanges? Some thoughts:
Other options?
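To make the failure mode discussed above more concrete, here is a minimal, generic C++ sketch of deferred work that must not touch an exchange once it has been closed (for example, because its session was torn down). This is not the Matter SDK's code and not the change made in any PR: `Exchange`, `Requestor`, `OnExchangeClosing`, `DeferredCleanup` and `RunScenario` are hypothetical names invented for illustration. It only shows one possible pattern, where the holder drops its stored pointer when it is told the exchange is going away, so that a lambda running later finds nothing to act on.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for an exchange; NOT the Matter SDK's ExchangeContext.
struct Exchange
{
    bool closed = false;
    int aborts  = 0;

    // Acting on an already-closed exchange is treated as a contract violation.
    void Abort()
    {
        assert(!closed);
        closed = true;
        ++aborts;
    }
};

// Hypothetical holder that keeps a raw pointer to an exchange it is using.
class Requestor
{
public:
    void Attach(Exchange * ec) { mExchange = ec; }

    // Called when the exchange is going away (e.g. its session was released).
    // Dropping the pointer here is what keeps later deferred work safe.
    void OnExchangeClosing(Exchange * ec)
    {
        if (mExchange == ec)
        {
            mExchange = nullptr;
        }
    }

    // Deferred cleanup, analogous to a lambda scheduled to run later.
    // Returns true only if it actually touched an exchange.
    bool DeferredCleanup()
    {
        if (mExchange == nullptr)
        {
            return false;
        }
        mExchange->Abort();
        mExchange = nullptr;
        return true;
    }

    bool HasExchange() const { return mExchange != nullptr; }

private:
    Exchange * mExchange = nullptr;
};

// Simulates: cleanup is scheduled, the exchange is closed, then the cleanup runs.
inline bool RunScenario()
{
    Exchange ec;
    Requestor requestor;
    requestor.Attach(&ec);

    std::vector<std::function<bool()>> deferred;
    deferred.push_back([&requestor] { return requestor.DeferredCleanup(); });

    // Teardown closes the exchange and notifies the holder before the lambda runs.
    ec.closed = true;
    requestor.OnExchangeClosing(&ec);

    bool touched = false;
    for (auto & work : deferred)
    {
        touched = work() || touched;
    }
    return touched;
}
```

In this sketch, `RunScenario()` reports that the deferred cleanup did not touch the closed exchange. Without the `OnExchangeClosing` step, the same lambda would call `Abort()` on a closed exchange and trip the assertion.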
Nulling out
@marius-alex-tache Could you verify that the change in #24818 fixes the issue for you?
I confirm the fix works. Thank you! @fuxiaoming-lumi, could you also try to test on your side with this fix?
I can also confirm this bug and that the fix corrects the issue (on a Nordic/Thread platform).
I also confirm this bug is fixed with our product. Thanks for all your support!
Reproduction steps
Bug prevalence
100% reproduce
GitHub hash of the SDK that was being used
83c3b6b
Platform
efr32, k32w
Platform Version(s)
No response
Anything else?
With NXP K32W Platform:
chip-ota-provider-app-with-nxp-k32w.txt
chip-tool-nxp-k32w.txt
dut-nxp-k32w.txt
With Silicon Labs MG24 Platform:
chip-ota-provider-app-silicon-mg24.txt
chip-tool-silicon-mg24.txt
dut_silicon-mg24.txt
It still needs to be verified whether other platforms have the same problem.