Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[chip-tool] Chip tool event subscriptions times out #17764

Closed
isiu-apple opened this issue Apr 26, 2022 · 8 comments
Closed

[chip-tool] Chip tool event subscriptions times out #17764

isiu-apple opened this issue Apr 26, 2022 · 8 comments
Assignees
Labels

Comments

@isiu-apple
Copy link
Contributor

isiu-apple commented Apr 26, 2022

Problem

When subscribing to events using chip tool, the command times out.

Proposed Solution

Chip tool event subscriptions should work without timing out.

out/apps/chip-tool/chip-tool otasoftwareupdaterequestor subscribe-event state-transition 5 10 0x1234567890 0

[1650992689511] [32397:1415601] CHIP: [TOO] Run command failure: ../../../examples/chip-tool/commands/common/CHIPCommand.cpp:392: CHIP Error 0x00000032: Timeout
zsh: segmentation fault  out/apps/chip-tool/chip-tool otasoftwareupdaterequestor subscribe-event  5 10

Issue 17764 - Chiptool.txt
Issue 17764 - OTA-P.txt
Issue 17764 - OTA-R.txt

@yunhanw-google
Copy link
Contributor

Hi @isiu-apple could you also try to use gdb to run this command, and dump the backtrace, the crash may be another different issue.

from the log, the subscription has been established, looks good for a period, then server seems lose the connection to client for unknown reason, it go through mrp restransmission
[1650992689724] [32396:1415609] CHIP: [DMG] IM RH moving to [AwaitingReportResponse]
[1650992689724] [32396:1415609] CHIP: [IN] Prepared secure message 0x10bd75560 to 0x000000000001B669 (1) of type 0x5 and protocolId (0, 1) on exchange 62629i with MessageCounter:4939000.
[1650992689724] [32396:1415609] CHIP: [IN] Sending encrypted msg 0x10bd75560 with MessageCounter:4939000 to 0x000000000001B669 (1) at monotonic time: 00000000147D4A9B msec
[1650992689725] [32396:1415609] CHIP: [DMG] Refresh Subscribe Sync Timer with max 10 seconds
[1650992689725] [32396:1415609] CHIP: [DMG] ReportsInFlight = 1 with readHandler 0, RE has no more messages
[1650992689725] [32396:1415609] CHIP: [DMG] All ReadHandler-s are clean, clear GlobalDirtySet
[1650992694727] [32396:1415399] CHIP: [DMG] Unblock report hold after min 5 seconds
[1650992694769] [32396:1415609] CHIP: [EM] Retransmitting MessageCounter:4939000 on exchange 62629i Send Cnt 0
[1650992694769] [32396:1415609] CHIP: [IN] Sending encrypted msg 0x10bd75560 with MessageCounter:4939000 to 0x000000000001B669 (1) at monotonic time: 00000000147D5E4F msec
[1650992699727] [32396:1415399] CHIP: [DMG] Refresh subscribe timer sync after 5 seconds
[1650992699727] [32396:1415399] CHIP: [DMG] All ReadHandler-s are clean, clear GlobalDirtySet
[1650992700237] [32396:1415399] CHIP: [EM] Retransmitting MessageCounter:4939000 on exchange 62629i Send Cnt 1
[1650992700237] [32396:1415399] CHIP: [IN] Sending encrypted msg 0x10bd75560 with MessageCounter:4939000 to 0x000000000001B669 (1) at monotonic time: 00000000147D73AC msec
[1650992701727] [32396:1415609] CHIP: [DMG] Time out! failed to receive status response from Exchange: 62629i
[1650992701727] [32396:1415609] CHIP: [DMG] OnReportConfirm: NumReports = 0
[1650992701727] [32396:1415609] CHIP: [DMG] IM RH moving to [AwaitingDestruction]
[1650992706043] [32396:1415399] CHIP: [EM] Retransmitting MessageCounter:4939000 on exchange 62629i Send Cnt 2

@isiu-apple isiu-apple self-assigned this Apr 27, 2022
@isiu-apple
Copy link
Contributor Author

isiu-apple commented Apr 27, 2022

Discussed this issue with @bzbarsky-apple and found that Chiptool is timing out after the default of 30 seconds. If I pass in the max value for the optional --timeout chiptool param, then I do not see the timeout error shortly after event subscriptions anymore.

out/apps/chip-tool/chip-tool otasoftwareupdaterequestor subscribe-event state-transition 5 10 0x1234567890 0 --timeout 65535

@vivien-apple, any suggestions on what to do about this and whether event subscriptions should run forever with no default timeout?

@isiu-apple
Copy link
Contributor Author

isiu-apple commented Apr 27, 2022

Actually, even with --timeout 65535, I am seeing a liveness timeout in chip tool after a while:

[1651083054076] [2873:97118] CHIP: [DMG] Subscription Liveness timeout with SubscriptionID = 0x9E05A4691FEA3B5E, Peer = 01:0000001234567890
[1651083054076] [2873:97118] CHIP: [DMG] mResubscribePolicy is null
[1651083054439] [2873:97118] CHIP: [TOO] Error: ../../../examples/chip-tool/third_party/connectedhomeip/src/app/ReadClient.cpp:782: CHIP Error 0x00000032: Timeout

Issue 17764 - Chiptool with timeout.txt
Issue 17764 - OTA-P with timeout.txt
Issue 17764 - OTA-R with timeout.txt

@isiu-apple
Copy link
Contributor Author

Same subscription liveness timeout issue even in interactive mode:

out/apps/chip-tool/chip-tool interactive start
otasoftwareupdaterequestor subscribe-event state-transition 5 10 0x1234567890 0 --timeout 65535
[1651097185499] [4617:217786] CHIP: [DMG] Subscription Liveness timeout with SubscriptionID = 0x1071F570A57570AE, Peer = 01:0000001234567890
[1651097185499] [4617:217786] CHIP: [DMG] mResubscribePolicy is null
[1651097185865] [4617:217786] CHIP: [TOO] Error: ../../../examples/chip-tool/third_party/connectedhomeip/src/app/ReadClient.cpp:782: CHIP Error 0x00000032: Timeout

@bzbarsky-apple
Copy link
Contributor

Those new logs show us setting up a subscription with min-interval 5, max-interval 10. Then on the server we have, toward the end:

[1651081744936] [2860:78703] CHIP: [DMG] Building Reports for ReadHandler with LastReportGeneration = 7 DirtyGeneration = 0
...
[1651081744941] [2860:78703] CHIP: [DMG] StatusResponseMessage =
...
[1651081749939] [2860:78473] CHIP: [DMG] Unblock report hold after min 5 seconds
[1651083054076] [2860:78703] CHIP: [DMG] Refresh subscribe timer sync after 5 seconds
...
[1651083054076] [2860:78703] CHIP: [DMG] Building Reports for ReadHandler with LastReportGeneration = 7 DirtyGeneration = 0
...
[1651083054077] [2860:78703] CHIP: [IN] Prepared secure message 0x10dec1560 to 0x000000000001B669 (1)  of type 0x5 and protocolId (0, 1) on exchange 29601i with MessageCounter:4655140.
[1651083054077] [2860:78703] CHIP: [IN] Sending encrypted msg 0x10dec1560 with MessageCounter:4655140 to 0x000000000001B669 (1) at monotonic time: 00000000006FB80A msec
[1651083054077] [2860:78703] CHIP: [DMG] Refresh Subscribe Sync Timer with max 10 seconds
[1651083054077] [2860:78703] CHIP: [DMG] <RE> ReportsInFlight = 1 with readHandler 0, RE has no more messages
[1651083054110] [2860:78703] CHIP: [DMG] All ReadHandler-s are clean, clear GlobalDirtySet
[1651083054440] [2860:78703] CHIP: [EM] Received message of type 0x10 with protocolId (0, 0) and MessageCounter:11528012 on exchange 29601i
[1651083054440] [2860:78703] CHIP: [EM] Found matching exchange: 29601i, Delegate: 0x7fe7a4d04080
[1651083054440] [2860:78703] CHIP: [EM] Rxd Ack; Removing MessageCounter:4655140 from Retrans Table on exchange 29601i
[1651083054440] [2860:78703] CHIP: [EM] Removed CHIP MessageCounter:4655140 from RetransTable on exchange 29601i
[1651083059078] [2860:78473] CHIP: [DMG] Unblock report hold after min 5 seconds
[1651083064079] [2860:78703] CHIP: [DMG] Refresh subscribe timer sync after 5 seconds
[1651083064079] [2860:78703] CHIP: [DMG] All ReadHandler-s are clean, clear GlobalDirtySet
[1651083066078] [2860:78703] CHIP: [DMG] Time out! failed to receive status response from Exchange: 29601i

So we sent a report, got the expected status response. Then after 5 seconds the min interval has passed and we unblock reporting. Then 1304 (!) seconds are claimed to pass before our 10s timer fires? At that point we go head and send another data report, get an ack (that's the "Received message of type 0x10 with protocolId (0, 0)") but no status response. Then another 12 seconds later we decided that we have timed out and killed the subscription.

On the client side, we have:

[1651081744937] [2873:97118] CHIP: [DMG] ReportDataMessage =
...
[1651081744937] [2873:97118] CHIP: [IN] Prepared secure message 0x7fa2b9abaa68 to 0x0000001234567890 (1)  of type 0x1 and protocolId (0, 1) on exchange 29600r with MessageCounter:11528011.
...
[1651083054076] [2873:97118] CHIP: [DMG] Subscription Liveness timeout with SubscriptionID = 0x9E05A4691FEA3B5E, Peer = 01:0000001234567890
...
[1651083054439] [2873:97118] CHIP: [EM] Received message of type 0x5 with protocolId (0, 1) and MessageCounter:4655140 on exchange 29601r

So we got the report data that was the next-to-last one the server sent us. We sent a response. Then we got nothing for 1309 s (!) and decided we have timed out. Soon after that we get that last data report and ack it, but we are already shutting down at that point.

So did 20-some minutes really pass there? Or was there some sort of clock adjustment that happened (which should not matter, if we are using monotonic clocks), or the device(s) involved going to sleep or something?

@isiu-apple
Copy link
Contributor Author

Okay, yes it's likely the laptop going to sleep then. I left the test running and then came back to check on it after a while each time to see if there were any errors or was running fine. I tried again monitoring it the whole time and I don't see the liveness error anymore.

@vivien-apple
Copy link
Contributor

Discussed this issue with @bzbarsky-apple and found that Chiptool is timing out after the default of 30 seconds. If I pass in the max value for the optional --timeout chiptool param, then I do not see the timeout error shortly after event subscriptions anymore.

out/apps/chip-tool/chip-tool otasoftwareupdaterequestor subscribe-event state-transition 5 10 0x1234567890 0 --timeout 65535

@vivien-apple, any suggestions on what to do about this and whether event subscriptions should run forever with no default timeout?

The correct behaviour is to run chip-tool interactive start and then inside the interactive mode: otasoftwareupdaterequestor subscribe-event state-transition 5 10 0x1234567890 0
This way it will never times out and you can run other commands.

@isiu-apple
Copy link
Contributor Author

isiu-apple commented Apr 28, 2022

Closing this ticket because event subscriptions work without timeout crash when run in interactive mode.

Created #17847 to track updating documentation to reflect chiptool timeout behavior.

@isiu-apple isiu-apple changed the title Chip tool event subscriptions times out [chip-tool] Chip tool event subscriptions times out Apr 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants