-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-fix OTA exchange leaks #20632
Re-fix OTA exchange leaks #20632
Conversation
* [ota] Fix exchange context leak It turns out that Exchange Context allocated for BDX transfer is not released on completion or connection abort. It is not seen in a happy path that results in applying and rebooting into a new firmware, but may lead to the exchange leak when the transfer is interrupted. Furthermore, if an exchange is never released, a Sleepy End Device never returns to the idle mode, needlessly draining the battery. Signed-off-by: Damian Krolik <[email protected]> * Restyled by clang-format Co-authored-by: Restyled.io <[email protected]>
PR #20632: Size comparison from 57cb679 to 7fa3055 Increases (18 builds for bl602, cc13x2_26x2, cyw30739, efr32, linux, nrfconnect)
Decreases (5 builds for cc13x2_26x2, esp32)
Full report (43 builds for bl602, cc13x2_26x2, cyw30739, efr32, esp32, k32w, linux, mbed, nrfconnect, p6, telink)
|
Is there a unit-test, or YAML test we could add to re-create the failure, and validate the fix? |
|
There is one existing test that for a successful transfer. We could build on that but maybe force the provider to be stopped (via SystemCommands) to try to re-create this failure? The happy path test is at: https://github.com/project-chip/connectedhomeip/blob/master/src/app/tests/suites/OTA_SuccessfulTransfer.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested this change manually and it doesn't crash anymore but when I tried to inject faults in different places in OTA, I found two more places that need to be fixed:
diff --git a/src/app/clusters/ota-requestor/BDXDownloader.cpp b/src/app/clusters/ota-requestor/BDXDownloader.cpp
index 8a00a3820..d3b9f0398 100644
--- a/src/app/clusters/ota-requestor/BDXDownloader.cpp
+++ b/src/app/clusters/ota-requestor/BDXDownloader.cpp
@@ -201,10 +201,10 @@ void BDXDownloader::EndDownload(CHIP_ERROR reason)
{
mImageProcessor->Abort();
}
- SetState(State::kIdle, OTAChangeReasonEnum::kSuccess);
// Because AbortTransfer() will generate a StatusReport to send.
PollTransferSession();
+ SetState(State::kIdle, OTAChangeReasonEnum::kSuccess);
}
else
{
diff --git a/src/app/clusters/ota-requestor/DefaultOTARequestor.cpp b/src/app/clusters/ota-requestor/DefaultOTARequestor.cpp
index 3f4cf6782..fdc3b4364 100644
--- a/src/app/clusters/ota-requestor/DefaultOTARequestor.cpp
+++ b/src/app/clusters/ota-requestor/DefaultOTARequestor.cpp
@@ -832,8 +832,19 @@ CHIP_ERROR DefaultOTARequestor::StartDownload(OperationalDeviceProxy & devicePro
mBdxDownloader->SetMessageDelegate(&mBdxMessenger);
mBdxDownloader->SetStateDelegate(this);
- ReturnErrorOnFailure(mBdxDownloader->SetBDXParams(initOptions, kDownloadTimeoutSec));
- return mBdxDownloader->BeginPrepareDownload();
+ CHIP_ERROR error = mBdxDownloader->SetBDXParams(initOptions, kDownloadTimeoutSec);
+
+ if (error == CHIP_NO_ERROR)
+ {
+ error = mBdxDownloader->BeginPrepareDownload();
+ }
+
+ if (error != CHIP_NO_ERROR)
+ {
+ mBdxMessenger.Reset();
+ }
+
+ return error;
}
The first is needed because if we call SetState(State::kIdle, OTAChangeReasonEnum::kSuccess);
to early and close the exchange, we won't manage to send the status report message to inform the provider about the failure. The other one is just another exchange leak if preparation of the download fails in an early stage.
Problem
OTA exchanges can leak on error.
Change overview
BDXMessenger
to address the crashes initially observed with [ota] Fix exchange context leak in OTA requestor #20304.Fixes #20565
Testing
Ran the steps in #20563 (comment) and verified those do not crash.