
Fix duplicate channel_updates in auditDb when restarting #1918

Closed

Conversation

thomash-acinq (Member)

Because the previous channel_update is never part of the channel state when restarting eclair, we always think it has changed and add it to the audit database again.
We now check whether the channel update has actually changed when restoring it.
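As a minimal standalone sketch of that idea (simplified stand-in types, not eclair's actual API; the real comparison is Announcements.areSame, which ignores signature and timestamp):

// Simplified stand-ins for illustration; eclair's real check is
// fr.acinq.eclair.router.Announcements.areSame.
final case class ChannelUpdate(feeBaseMsat: Long, feeProportionalMillionths: Long, cltvExpiryDelta: Int)

object RestoreCheck {
  // compare only the relayed parameters, like areSame does
  def areSame(u1: ChannelUpdate, u2: ChannelUpdate): Boolean = u1 == u2

  // should a channel_update seen at restore be treated as a change
  // (and therefore written to the audit db)?
  def changedOnRestore(restored: ChannelUpdate, previous_opt: Option[ChannelUpdate]): Boolean =
    previous_opt match {
      case Some(previous) => !areSame(restored, previous) // unchanged => skip the duplicate row
      case None           => true                         // nothing to compare against => record it
    }
}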

@t-bast (Member) left a comment

This is a dangerous change, because it impacts downstream components and how we re-emit channel_updates to our peers, which may lead to spec violations if we make a mistake.

(state, nextState, stateData, nextStateData) match {
  // ORDER MATTERS!
  case (WAIT_FOR_INIT_INTERNAL, OFFLINE, _, normal: DATA_NORMAL) =>
    // LocalChannelUpdate is already published when restoring the channel
@t-bast (Member) commented Aug 23, 2021
I preferred having it here; it was clearer when it was explicitly done in the transition phases. We've spent a lot of time ensuring the retransmit logic works properly, so I'd avoid changing that.

@thomash-acinq (Member, Author)

I understand, but the problem is that at this point we don't have enough information to know whether the channel update has changed.

@thomash-acinq (Member, Author)

I've added a channelUpdateBeforeRestore_opt in DATA_NORMAL so that we have access to it in the transition function and put everything back there.
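Roughly, the shape of that change (only the field name channelUpdateBeforeRestore_opt comes from the comment above; the types are simplified stand-ins, not eclair's real DATA_NORMAL):

final case class ChannelUpdate(feeBaseMsat: Long, feeProportionalMillionths: Long, cltvExpiryDelta: Int)

final case class DATA_NORMAL(
  channelUpdate: ChannelUpdate,
  // the update that was in the channel state before the restart, if any
  channelUpdateBeforeRestore_opt: Option[ChannelUpdate]
)

// in the transition function we can now tell whether the restored update
// actually differs from the pre-restore one before persisting/re-emitting it
def updateChangedOnRestore(d: DATA_NORMAL): Boolean =
  !d.channelUpdateBeforeRestore_opt.contains(d.channelUpdate)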

@pm47 (Member) commented Aug 23, 2021

I still don't get what caused the issue precisely. Didn't the following code ensure that we wouldn't emit duplicate updates?

val channelUpdate1 = if (Announcements.areSame(candidateChannelUpdate, normal.channelUpdate)) {
  // if there was no configuration change we keep the existing channel update
  normal.channelUpdate
} else {
  log.info("refreshing channel_update due to configuration changes old={} new={}", normal.channelUpdate, candidateChannelUpdate)
  candidateChannelUpdate
}

@thomash-acinq (Member, Author)

> I still don't get what caused the issue precisely. Didn't the following code ensure that we wouldn't emit duplicate updates?

This code only reuses the previous channel update if it is still valid. But after that, without knowing what the previous update was, we can't tell whether it has changed.

@codecov-commenter commented Aug 23, 2021

Codecov Report

Merging #1918 (ef31de1) into master (9a0fc14) will increase coverage by 0.22%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1918      +/-   ##
==========================================
+ Coverage   87.52%   87.74%   +0.22%     
==========================================
  Files         159      159              
  Lines       12104    12103       -1     
  Branches      502      506       +4     
==========================================
+ Hits        10594    10620      +26     
+ Misses       1510     1483      -27     
Impacted Files Coverage Δ
...in/scala/fr/acinq/eclair/channel/ChannelData.scala 100.00% <ø> (ø)
...ire/internal/channel/version0/ChannelCodecs0.scala 96.03% <ø> (ø)
...ire/internal/channel/version1/ChannelCodecs1.scala 98.94% <ø> (ø)
...ire/internal/channel/version2/ChannelCodecs2.scala 100.00% <ø> (ø)
...ire/internal/channel/version3/ChannelCodecs3.scala 100.00% <ø> (ø)
...c/main/scala/fr/acinq/eclair/channel/Channel.scala 88.13% <100.00%> (+2.01%) ⬆️
...main/scala/fr/acinq/eclair/db/DbEventHandler.scala 92.53% <100.00%> (-0.43%) ⬇️
...n/scala/fr/acinq/eclair/router/Announcements.scala 100.00% <100.00%> (ø)
...clair/channel/publish/ReplaceableTxPublisher.scala 87.05% <0.00%> (+0.07%) ⬆️
...lockchain/bitcoind/rpc/ExtendedBitcoinClient.scala 91.42% <0.00%> (+0.25%) ⬆️
... and 3 more

@pm47 (Member) commented Aug 24, 2021

Here is an idea: we move the "update channel_update according to conf" code (https://github.com/ACINQ/eclair/blob/master/eclair-core/src/main/scala/fr/acinq/eclair/channel/Channel.scala#L289-L306) to the SYNCING state (there is already related code there: https://github.com/ACINQ/eclair/blob/c5046582649832248bc858522e4ebc18502e5759/eclair-core/src/main/scala/fr/acinq/eclair/channel/Channel.scala#L1668-1676).

This has the following benefits:

  • the change in channel_update would be caught by the onTransition handler
  • we do not change the general logic of emitting channel_updates (I agree with @t-bast's concerns)
  • it prevents us from generating thousands of new channel_updates at startup when we update the conf, and will instead do it at the first reconnection.

OTOH, it also means that we would run this code at each reconnection.
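A rough sketch of that flow, with illustrative names and simplified types (not the actual eclair code):

// when (re)entering SYNCING, rebuild the channel_update from the current conf;
// if it differs, store the new one, and the existing onTransition handler
// picks up the state-data change
final case class ChannelUpdate(feeBaseMsat: Long, feeProportionalMillionths: Long, cltvExpiryDelta: Int)
final case class DATA_NORMAL(channelUpdate: ChannelUpdate)

def refreshOnSyncing(d: DATA_NORMAL, fromConf: ChannelUpdate): DATA_NORMAL =
  if (d.channelUpdate == fromConf) d // no conf change: nothing new for onTransition to see
  else d.copy(channelUpdate = fromConf) // changed state data is what onTransition reacts to

The downside noted above is visible here: refreshOnSyncing would run on every reconnection, not only after a restart.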

@thomash-acinq (Member, Author) commented Aug 24, 2021

> • the change in channel_update would be caught by the onTransition handler
> • we do not change the general logic of emitting channel_updates (I agree with @t-bast's concerns)

This would emit an additional LocalChannelUpdate compared to the current situation. Unless we stop emitting it on WAIT_FOR_INIT_INTERNAL -> OFFLINE, which would actually simplify things.

@pm47 (Member) commented Aug 24, 2021

> > the change in channel_update would be caught by the onTransition handler
> > we do not change the general logic of emitting channel_updates (I agree with @t-bast's concerns)
>
> This would emit an additional LocalChannelUpdate compared to the current situation.

Only if the conf changes, right? Alternatively, instead of updating the channel_update manually, we could have the channel send a CMD_UPDATE_RELAY_FEE to itself, which ensures we go to the normal flow, and is simpler to reason about. There are details though (e.g. need to add the cltv expiry delta to the command), and we still need to update channel_updates inline, e.g. in handleAddDisconnected.
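Sketched out with simplified, illustrative types (eclair's real CMD_UPDATE_RELAY_FEE is richer, and per the caveat above it would need to carry the cltv expiry delta):

final case class ChannelUpdate(feeBaseMsat: Long, feeProportionalMillionths: Long, cltvExpiryDelta: Int)
final case class CMD_UPDATE_RELAY_FEE(feeBaseMsat: Long, feeProportionalMillionths: Long, cltvExpiryDelta: Int)

// on restore, diff the stored update against the conf; if they differ, produce
// the same command a fee-update API call would, so the regular flow runs
def commandOnRestore(stored: ChannelUpdate, fromConf: ChannelUpdate): Option[CMD_UPDATE_RELAY_FEE] =
  if (stored == fromConf) None // conf unchanged: no command, no extra LocalChannelUpdate
  else Some(CMD_UPDATE_RELAY_FEE(fromConf.feeBaseMsat, fromConf.feeProportionalMillionths, fromConf.cltvExpiryDelta))
// the channel would then send the command to itself and reuse the normal handler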

> Unless we stop emitting it on WAIT_FOR_INIT_INTERNAL -> OFFLINE, which would actually simplify things.

That is a more risky change, even if sending a LocalChannelUpdate at this transition does feel hacky. At first glance, removing this shouldn't impact direct peers, since we are disconnected at this step. The router is another story though: LocalChannelUpdates are how it learns about private channels.

@thomash-acinq (Member, Author)

> > This would emit an additional LocalChannelUpdate compared to the current situation.
>
> Only if the conf changes, right?

Yes, only if it changes. And in that case it will first emit an outdated channel update and then the correct one.

If we don't want to change the current flow between states, I think adding channelUpdateBeforeRestore_opt to DATA_NORMAL is the best solution. Anything else will probably change some behavior of the system.

@thomash-acinq (Member, Author)

I've tried multiple approaches, and it doesn't seem possible to make it work by updating the channelUpdate in the SYNCING state.
The problem is the WAIT_FOR_INIT_INTERNAL -> OFFLINE transition. If we stop emitting the LocalChannelUpdate from there, then when nothing has changed no LocalChannelUpdate is emitted at all, which is bad for the router. If we keep emitting the LocalChannelUpdate from there, it may be an outdated one if there was a config change, which leads to useless work.
Also, we would do additional work every time we reconnect even though it's only needed after a restart. Here is how that would look: #1920

pm47 added a commit that referenced this pull request Aug 25, 2021
This is an alternative to #1918 and #1920. It's very close to the
latter, except that we check the conf only once, in the
`WAIT_FOR_INIT_INTERNAL` state, as opposed to at each reconnection in
`SYNCING`.

We do not change the `channel_update` in `WAIT_FOR_INIT_INTERNAL`, which
allows us to set `previousChannelUpdate_opt=Some(normal.channelUpdate)`
in the transition and fix the duplicate bug in the audit db.

If there is a change in conf, there will be an additional
`LocalChannelUpdate` emitted, but only at reconnection, and following
the regular update flow, which should protect us against regressions.

We do handle `CMD_UPDATE_RELAY_FEES` in both `OFFLINE` and `SYNCING`,
because there may be a race between `CMD_UPDATE_RELAY_FEES` and
`ChannelReestablish` if the conf changes at restore. And there was no good
reason to behave differently in those states anyway.
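The transition-side check described here could look roughly like this (previousChannelUpdate_opt is the name from the commit message; everything else is a simplified stand-in):

final case class ChannelUpdate(feeBaseMsat: Long, feeProportionalMillionths: Long, cltvExpiryDelta: Int)

// on the WAIT_FOR_INIT_INTERNAL -> OFFLINE transition, only treat the update
// as new if it differs from the one restored from disk
def shouldRecordInAuditDb(previousChannelUpdate_opt: Option[ChannelUpdate], current: ChannelUpdate): Boolean =
  !previousChannelUpdate_opt.contains(current) // Some(unchanged) => false => no duplicate row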
pm47 added a commit that referenced this pull request Sep 2, 2021
This is very similar to #1918 (ef31de1), but we use the internal
actor state instead of the `ChannelData`.

Pros:
- conceptually simple, low risk of regression
- `ChannelData` stays untouched
Cons:
- now there is a `var` in the channel

Those are the minimal changes to have the simplest diff, but we can
include some improvements made in ef31de1.
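For contrast, the shape of the var-based variant (illustrative names, simplified; not eclair's actual code):

final case class ChannelUpdate(feeBaseMsat: Long, feeProportionalMillionths: Long, cltvExpiryDelta: Int)

class ChannelSketch {
  // the var the "Cons" above refers to: mutable actor-local state instead of
  // a field in the persisted ChannelData
  private var updateAtRestore_opt: Option[ChannelUpdate] = None

  def onRestore(restored: ChannelUpdate): Unit =
    updateAtRestore_opt = Some(restored)

  // consulted in the transition handler; cleared after use so later
  // transitions behave exactly as before
  def changedSinceRestore(current: ChannelUpdate): Boolean = {
    val changed = !updateAtRestore_opt.contains(current)
    updateAtRestore_opt = None
    changed
  }
}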
@thomash-acinq (Member, Author)

#1934

thomash-acinq deleted the no-duplicate-channel-update-on-restart branch on September 9, 2021.