-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement shadow fail-safe data in FabricTable #19819
Implement shadow fail-safe data in FabricTable #19819
Conversation
- FabricTable management during commissioning did not properly handle the fact that committing needs to only be done on CommissioningComplete, which prevented the AddTrustedRootCertificate, UpdateNOC and AddNOC command semantics to be implemented properly and prevented proper state observation of operational credential clusters server Fixes project-chip#7695 Issue project-chip#8905 Fixes project-chip#18633 Issue project-chip#17208 Fixes project-chip#15585 This PR: - Removes direct access to FabricInfo from everywhere, which caused possibly stale FabricInfo references during commissioning. - Remove immediate committing of fabric table on UpdateNOC. - Make Fabrics, NOCs and TrustedRootCertificates attributes reflect proper partial state during fail-safe, by using the shadow data capabilities of OperationalCertificateStore and by updates to FabricInfo - Make it possible to unit test fabric table by providing the necessary lifecycle public APIs to test every modality of the commissioning flow - Make Server and DeviceController use OperationalCertificateStore to allow proper external lifecycle management of the operational cert chain. - Update all examples/controller code to new API - Remove dangerous internal APIs from FabricTable and replace with direct accessors where needed - Add more of the necessary spec validations to the UpdateNOC and AddNOC flows Testing done: - Updated all unit tests, all pass - Cert tests still pass as before - Working on further integration tests and unit tests as a follow-up noting that current state has not regressed on existing test coverage, and that new usage of OperationalCertificateStore class in FabricTable gains a large amount of additional coverage transitively via some of the existing tests making use of FabricTable.
PR #19819: Size comparison from 776c2b8 to d8864e3 Increases above 0.2%:
Increases (32 builds for cc13x2_26x2, cyw30739, k32w, linux, mbed, p6, telink)
Decreases (12 builds for cc13x2_26x2, cyw30739, k32w, linux)
Full report (32 builds for cc13x2_26x2, cyw30739, k32w, linux, mbed, p6, telink)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All follow-ups are ready to go. @bzbarsky-apple asked me to make all follow-ups in a separate PR to avoid needing to-review such a large change.
PR #19819: Size comparison from 776c2b8 to 444cbd5 Increases above 0.2%:
Increases (49 builds for cc13x2_26x2, cyw30739, efr32, esp32, k32w, linux, mbed, nrfconnect, p6, telink)
Decreases (12 builds for cc13x2_26x2, cyw30739, k32w, linux)
Full report (49 builds for cc13x2_26x2, cyw30739, efr32, esp32, k32w, linux, mbed, nrfconnect, p6, telink)
|
* Don't notify attribute changes on commit * Apply follow-up comments from @bzbarsky-apple, all style/nits * Simplify AddOrUpdateInner from @msandstedt comment * Restyled * Fix comment editorials from @bzbarsky-apple * Add test for AddNOC fail-safe handling * Add test for UpdateNOC failsafe handling * Fix #19819 (comment) * Add Persistence Unit test * Restyled * Update src/credentials/FabricTable.cpp Co-authored-by: Boris Zbarsky <[email protected]> * Fix CI * Restyled Co-authored-by: Boris Zbarsky <[email protected]>
- Since project-chip#19819, commits are very small and safer. There is less surface to fail during commit. The previous large-scale fail-safe behavior stored too much data, for too long and could cause larger reverts even if nothing was committed yet. FabricTable data no longer is ever persisted without commit. - The existing code deleted fabrics unwittingly when not required, such as when powering off a light during a fail-safe for an update when there was nothing committed yet, assuming we still committed immediately. This change: - Detects failed commits - Only deletes data on failed commits - Fixes Thread driver to detect stale data where a backup was done (since we cannot prevent internal commits from OpenThread) Testing done: - Added unit test to FabricTable to generate the condition - Did manual testing of all-clusters-app/chip-tool Linux that aborted on second commissioning, during commit. Found that cleanup occurred as expected on restart - Integration/cert testing (including Cirque that validates fail-safe) still pass
- Since project-chip#19819, commits are very small and safer. There is less surface to fail during commit. The previous large-scale fail-safe behavior stored too much data, for too long and could cause larger reverts even if nothing was committed yet. FabricTable data no longer is ever persisted without commit. - The existing code deleted fabrics unwittingly when not required, such as when powering off a light during a fail-safe for an update when there was nothing committed yet, assuming we still committed immediately. This change: - Detects failed commits - Only deletes data on failed commits - Fixes Thread driver to detect stale data where a backup was done (since we cannot prevent internal commits from OpenThread) Testing done: - Added unit test to FabricTable to generate the condition - Did manual testing of all-clusters-app/chip-tool Linux that aborted on second commissioning, during commit. Found that cleanup occurred as expected on restart - Integration/cert testing (including Cirque that validates fail-safe) still pass
* Properly handle crashes/reboots during FabricTable commit - Since #19819, commits are very small and safer. There is less surface to fail during commit. The previous large-scale fail-safe behavior stored too much data, for too long and could cause larger reverts even if nothing was committed yet. FabricTable data no longer is ever persisted without commit. - The existing code deleted fabrics unwittingly when not required, such as when powering off a light during a fail-safe for an update when there was nothing committed yet, assuming we still committed immediately. This change: - Detects failed commits - Only deletes data on failed commits - Fixes Thread driver to detect stale data where a backup was done (since we cannot prevent internal commits from OpenThread) Testing done: - Added unit test to FabricTable to generate the condition - Did manual testing of all-clusters-app/chip-tool Linux that aborted on second commissioning, during commit. Found that cleanup occurred as expected on restart - Integration/cert testing (including Cirque that validates fail-safe) still pass * Restyled by clang-format * Restyled by gn * Clear commit token after deferred cleanup * Applied code review comments * Restyled by clang-format * Fix CI * Fix CI * Remove revert at restart of openthread after other changes * Apply review comments * Try to fix nRF CI * Another attempt to fix nrfconnect CI * Fix CI some more * Fix CI again Co-authored-by: Restyled.io <[email protected]>
- In project-chip#19819, TODOs were left for a couple tests of things "anecdotally" known to work, or tested outside of SDK, but not tested yet in official SDK. - This PR implements TestAddMultipleSameRootDifferentFabricId and TestInvalidChaining Fixes project-chip#21946 Fixes project-chip#21947 Testing done: - Added 2 new unit tests, both pass - Did some messing about to make assertions fail on purpose and validated the the wrong behavior was not passing
- In #19819, TODOs were left for a couple tests of things "anecdotally" known to work, or tested outside of SDK, but not tested yet in official SDK. - This PR implements TestAddMultipleSameRootDifferentFabricId and TestInvalidChaining Fixes #21946 Fixes #21947 Testing done: - Added 2 new unit tests, both pass - Did some messing about to make assertions fail on purpose and validated the the wrong behavior was not passing
- In project-chip#19819, TODOs were left for a couple tests of things "anecdotally" known to work, or tested outside of SDK, but not tested yet in official SDK. - This PR implements TestAddMultipleSameRootDifferentFabricId and TestInvalidChaining Fixes project-chip#21946 Fixes project-chip#21947 Testing done: - Added 2 new unit tests, both pass - Did some messing about to make assertions fail on purpose and validated the the wrong behavior was not passing
Problem
handle the fact that committing needs to only be done on
CommissioningComplete, which prevented the AddTrustedRootCertificate,
UpdateNOC and AddNOC command semantics to be implemented properly
and prevented proper state observation of operational credential
clusters server
Fixes #7695
Issue #8905
Fixes #18633
Issue #17208
Fixes #15585
Change overview
possibly stale FabricInfo references during commissioning.
proper partial state during fail-safe, by using the shadow data
capabilities of OperationalCertificateStore and by updates to
FabricInfo
lifecycle public APIs to test every modality of the commissioning flow
allow proper external lifecycle management of the operational cert
chain.
direct accessors where needed
flows
Testing
noting that current state has not regressed on existing test coverage,
and that new usage of OperationalCertificateStore class in FabricTable
gains a large amount of additional coverage transitively via some
of the existing tests making use of FabricTable.