Missed Declare from the collator #2362
Description
During my localnet tests I've observed this situation.
There are 4 validators and 4 parachains. Each parachain has 1 collator.
At the beginning, all four collators declare that they are collators. However, for some reason, one validator missed the declaration from one of the collators. Therefore, all advertisements from that collator were discarded by that validator.
The collator, on the other hand, was sure that it had sent the declaration, and there were no disconnects. It therefore kept sending advertisements without re-sending the declaration, and never realized that the validator was not listening to it.
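To make the failure mode concrete, here is a minimal sketch of the validator-side behaviour as I understand it from the observations above. It is a simplified model, not the actual collator protocol code: `PeerId`, `CollatorMessage`, `Outcome`, and `ValidatorSide` are hypothetical names chosen for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical peer identifier; the real protocol uses its own peer ID type.
type PeerId = u32;

/// Messages a collator sends to a validator in this simplified model.
#[derive(Debug, Clone, PartialEq)]
enum CollatorMessage {
    Declare,
    Advertise { relay_parent: u64 },
}

/// What the validator did with an incoming message.
#[derive(Debug, PartialEq)]
enum Outcome {
    Declared,
    AdvertisementAccepted,
    AdvertisementDiscarded,
}

/// Validator-side state: the set of peers that have declared themselves.
#[derive(Default)]
struct ValidatorSide {
    declared: HashSet<PeerId>,
}

impl ValidatorSide {
    fn handle(&mut self, peer: PeerId, msg: &CollatorMessage) -> Outcome {
        match msg {
            CollatorMessage::Declare => {
                self.declared.insert(peer);
                Outcome::Declared
            }
            // Advertisements only count if the peer declared beforehand.
            CollatorMessage::Advertise { .. } if self.declared.contains(&peer) => {
                Outcome::AdvertisementAccepted
            }
            // A lost declaration means every later advertisement is dropped.
            CollatorMessage::Advertise { .. } => Outcome::AdvertisementDiscarded,
        }
    }
}
```

In this model, if the `Declare` message never reaches `handle`, every subsequent `Advertise` from that peer is discarded, and nothing tells the collator about it.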
I am not well versed in the semantics of the network bridge so I am not sure if that is supposed to work.
- If the network bridge can drop a message/notification and still carry on without disconnecting that peer, then we should address this at a higher level (e.g., naively, by re-sending the declaration once in a while).
- If the network bridge guarantees that notifications to the same peer are delivered in order (e.g., if A and then B are sent to the same peer, then either neither message is delivered, only A is delivered, or both A and B are delivered; it is not possible that only B is delivered), then there is potentially a bug in the network code.
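The naive higher-level fix from the first bullet could look roughly like the sketch below. This is only an illustration under assumptions: `CollatorSide`, `Outgoing`, `on_advertise`, and `redeclare_interval` are hypothetical names, and it assumes the validator side treats a repeated declaration as harmless, which may not hold in the current implementation.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical peer identifier; the real protocol uses its own peer ID type.
type PeerId = u32;

/// Messages the collator decides to send in this simplified model.
#[derive(Debug, PartialEq)]
enum Outgoing {
    Declare,
    Advertise,
}

/// Collator-side state: when each peer was last sent a declaration.
struct CollatorSide {
    redeclare_interval: Duration,
    last_declared: HashMap<PeerId, Instant>,
}

impl CollatorSide {
    fn new(redeclare_interval: Duration) -> Self {
        Self {
            redeclare_interval,
            last_declared: HashMap::new(),
        }
    }

    /// Returns the messages to send to `peer` when advertising at time `now`.
    /// A declaration is prepended if none was sent yet, or if the last one
    /// is at least `redeclare_interval` old.
    fn on_advertise(&mut self, peer: PeerId, now: Instant) -> Vec<Outgoing> {
        let due = match self.last_declared.get(&peer) {
            Some(&sent_at) => now.duration_since(sent_at) >= self.redeclare_interval,
            None => true,
        };
        let mut out = Vec::new();
        if due {
            self.last_declared.insert(peer, now);
            out.push(Outgoing::Declare);
        }
        out.push(Outgoing::Advertise);
        out
    }
}
```

With this approach a lost declaration costs at most one `redeclare_interval` of discarded advertisements instead of silencing the collator until the next reconnect. It does not explain why the declaration was lost in the first place, so the second bullet would still need investigating.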
Details:
polkadot: 689baf2
cumulus: f920fed2c82c019ef3067b70a889b1bb4e5212b7
substrate: dab0178a5152ba75e00646251b6b99d32206ca21
The failing collator was phala-blockchain, built at 0b8ec3444809fa0dc6479b25fb2a8db3fcc8d27a, using:
cumulus: f920fed2c82c019ef3067b70a889b1bb4e5212b7
substrate: dab0178a5152ba75e00646251b6b99d32206ca21
polkadot: bba836b
Logs: https://drive.google.com/file/d/15C-WE0ebIvcspvoXy6kOdlQYXiJnib-H/view?usp=sharing