multi: resend shutdown on reestablish #8447
multi: resend shutdown on reestablish #8447ellemouton wants to merge 5 commits intolightningnetwork:masterfrom
Conversation
This commit adds an itest to demonstrate that the following bug exists: If channel Shutdown is initiated but then a re-connect is done before the shutdown is complete, then the initiating node currently does not properly resend the Shutdown message as required by the spec. This will be fixed in an upcoming commit.
This method updates the channel status to ChanStatusShutdownSent to indicate that shutdown of the channel has started.
In this commit, we start using the new MarkShutdownSent before we send a Shutdown message and we use the DeliveryScript method to check if there is a persisted delivery script that we should use if we do a reconnect. With this commit, we can also fix the added itest to show that shutdown now correctly continues after a reconnect.
WalkthroughThis update introduces critical enhancements to the Lightning Network Daemon (LND) aimed at improving the channel shutdown process, specifically ensuring compliance with BOLT2 requirements for retransmitting Changes
Assessment against linked issues
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
09ba12f to
f3be6c0
Compare
|
@coderabbitai review |
There was a problem hiding this comment.
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (11)
- channeldb/channel.go (6 hunks)
- channeldb/channel_test.go (1 hunks)
- docs/release-notes/release-notes-0.18.0.md (1 hunks)
- htlcswitch/interfaces.go (1 hunks)
- htlcswitch/link.go (1 hunks)
- itest/lnd_coop_close_with_htlcs_test.go (2 hunks)
- lnwallet/chancloser/chancloser.go (1 hunks)
- lnwallet/chancloser/chancloser_test.go (1 hunks)
- lnwallet/chancloser/interface.go (1 hunks)
- lnwallet/channel.go (2 hunks)
- peer/brontide.go (4 hunks)
Files not reviewed due to errors (2)
- peer/brontide.go (Error: unable to parse review)
- channeldb/channel.go (Error: unable to parse review)
Files skipped from review due to trivial changes (1)
- htlcswitch/link.go
Additional comments: 14
lnwallet/chancloser/interface.go (1)
- 38-40: The addition of the
MarkShutdownSentmethod to theChannelinterface is a necessary change to support the new functionality of persistently marking that the shutdown of a channel has been initiated. This aligns with the PR's objectives and the BOLT2 requirements.itest/lnd_coop_close_with_htlcs_test.go (2)
- 18-34: The restructuring of test functions and the addition of new test cases (
no restartandwith restart) to verify cooperative channel closure with pending HTLCs, including restart scenarios, are well-implemented. These changes effectively enhance the test coverage for the new functionality, ensuring that the system behaves as expected under conditions that previously led to spec non-compliance.- 125-253: The addition of the
coopCloseWithHTLCsWithRestarttest function, which tests the cooperative close flow when an HTLC is still pending on the channel and ensures that the shutdown process continues as expected even if a channel re-establish happens after one party has already initiated the shutdown, is a critical enhancement. This test directly addresses the PR's objectives by demonstrating compliance with the BOLT2 requirements regarding the retransmission ofshutdownmessages.docs/release-notes/release-notes-0.18.0.md (1)
- 76-79: The release notes accurately describe the changes made in the PR, specifically mentioning the handling of in-flight HTLCs during channel shutdown and the re-initialization of the shutdown exchange upon reconnection. This documentation is crucial for users and developers to understand the improvements and how they comply with BOLT2 requirements.
lnwallet/chancloser/chancloser_test.go (1)
- 157-159: The addition of the
MarkShutdownSentmethod to themockChannelstruct in the test file is a necessary change to support testing the new functionality of persistently marking that the shutdown of a channel has been initiated. This aligns with the PR's objectives and ensures that the tests can accurately simulate the behavior of the system with the new changes.htlcswitch/interfaces.go (1)
- 142-142: Corrected typo in method name from
DiableAddstoDisableAdds. This change improves readability and accuracy of the method name.lnwallet/chancloser/chancloser.go (1)
- 356-366: Persisting the delivery script before sending the shutdown message ensures that the same delivery script is used if a reconnect happens, aligning with the PR's objectives to comply with BOLT2 specifications for message retransmission. This change enhances the robustness of the channel shutdown process.
channeldb/channel_test.go (6)
- 1161-1196: The test function
TestDeliveryScriptcorrectly tests the functionality of setting and retrieving a channel's delivery script when its status is updated toChanStatusShutdownSent. The test follows a logical sequence: creating a test channel, verifying the absence of a delivery script initially, updating the channel status and setting the delivery script, and finally asserting that the status and script are correctly updated and retrieved. The use ofrequirestatements ensures that the test will halt on the first error, which is appropriate for this context.- 1167-1169: The error handling for
MakeTestDBis correctly implemented usingrequire.NoError, ensuring that the test halts immediately if the test database cannot be created. This is a best practice for handling setup errors in tests.- 1186-1186: The method
MarkShutdownSentis correctly called with the expected parameters, including the delivery script and the boolean indicating the initiator status. This is key to testing the behavior under review.- 1192-1195: The retrieval and assertion of the delivery script using
DeliveryScriptandrequire.Equalare correctly implemented, ensuring that the script set earlier in the test is accurately retrieved and matches the expected value.- 1176-1177: The use of
require.ErrorwithErrNoDeliveryScriptcorrectly tests the expected behavior when attempting to retrieve a delivery script before one has been set. This ensures the method behaves as expected in error conditions.- 1189-1190: The assertions using
require.Trueto check the channel status after callingMarkShutdownSentare correctly placed and validate that the channel status is updated as expected. This is crucial for ensuring the method's side effects are as intended.lnwallet/channel.go (1)
- 7139-7139: An empty line was added in the
newOutgoingHtlcResolutionfunction. This change seems to be for readability or formatting purposes.
|
|
||
| status := ChanStatusShutdownSent | ||
| if locallyInitiated { | ||
| status |= ChanStatusLocalCloseInitiator |
There was a problem hiding this comment.
We can't add another channel status here because on restart, execution will hit this if statement and the link won't be loaded:
Lines 881 to 882 in cf4f468
The reason the itest works is that coop close then proceeds via restartCoopClose, but when the coop close transaction is being made it will just delete the HTLC and burn it to fees. This can be seen if you log the coop close transaction and proposedFee here:
Lines 7821 to 7825 in cf4f468
It can also be observed because no UpdateFulfillHTLC is sent during the itest.
Instead, we should just be able to use the existence of a delivery script to determine whether or not we need to continue coop close
There was a problem hiding this comment.
suggestion: I think we should adjust the checks that you're referencing @Crypt-iQ as opposed to refraining from updating statuses according to the events that transpire.
There was a problem hiding this comment.
We'd still need the checks for older nodes, so this would be adding a special case and duplicating some logic. Ultimately, loading the link with a non-ChanStatusDefault state means refactoring the way. the channel status is used because the link performs the isBorked check before trying to update anything. The way the check is written currently means it would fail for ChanStatusShutdownSent. It's certainly doable, but I don't think changing it is worth the risk
There was a problem hiding this comment.
The issue is that it isn't default though. We're definitely in a different state that isn't normal operation and can't accept new HTLC adds. Can you clarify what "the risk" is? I'm trying to weigh the cost of having increasingly fragmented logic (where consequences of changing stuff leaks like it is doing now) vs the cost of fixing things so the code straightforwardly reflects what is supposed to happen.
There was a problem hiding this comment.
There's no issue with using ChanStatusDefault here, the link just needs to be aware that it should send shutdown. The risk is that we make a costly mistake when changing how the status field is used
There was a problem hiding this comment.
I think there's a way to navigate this. For one, it's not really functioning as a "status" as it's a bit-vector. Instead it's a collection of flags. So really what the ChanStatusDefault check is doing is trying to check if specific flags are unset, and act accordingly. I think perhaps inverting the check and rather than checking "is it default", we should check "is it not have any one of these conditions". We can always push it to an extra field but I do think that the way we handle ChannelStatus right now is very fragile and should be reworked to be more structurally correct. Maybe an issue for a less tactical PR though.
ProofOfKeags
left a comment
There was a problem hiding this comment.
suggestion: Seems straightforward enough. I am glad you put in the ChanStatusShutdownSent, but in order to reconcile it with some more architectural shifts I'm trying to make, I'd recommend renaming it to ChanStatusTerminal or something like that. The reasons for this are a few. The spec merely states that we MUST do it if we've sent shutdown. It doesn't say we can't do it if we've received but not yet sent shutdown. By marking the channel as terminal the moment a valid shutdown is sent either direction, we can make the stored state more representative of channel state as opposed to link state. Having the message passing details leak into our persistent state without being symmetric (having state for each half of the duplex connection) is sus to me. So if we don't want to separate the persistence from the link state details we should probably also include state that tracks whether we've received shutdown as well.
Other than that, I think the implementation seems fine except for the two blocking comments I have which should be very straightforward to address.
| // Show that the channel is seen as active again by Alice and Bob. | ||
| // | ||
| // NOTE: This is a bug and will be fixed in an upcoming commit. | ||
| ht.AssertChannelActive(alice, chanPoint) | ||
| ht.AssertChannelActive(bob, chanPoint) |
There was a problem hiding this comment.
Is it normal for us to include "wrong" tests in intermediate commits? My gut tells me we want to assert the channel is inactive here knowing full well that the test fails. I'm not insisting we change it here but I probably wouldn't do things this way if it were me.
There was a problem hiding this comment.
I really like this way of telling a story of a PR. So it's mostly for review & to make every commit "correct" meaning that any commit can be reverted to & it wont break the build of the main code or the pass rate of the tests
There was a problem hiding this comment.
I agree with the storyline thing 100%, I just usually start with the correct-but-failing test. What I'm concerned about is that you don't want every commit to be right, do you? I think including a test that tests the wrong semantics is worse than no test at all and if you happen to bisect your way to this commit without realizing it then you have an "anti-test" lurking. Maybe there's no feasible way this would happen IRL and it's not a concern but I just want to state that if a wrong test were to enter the codebase it is far worse than no test at all.
fwiw TDD recommends writing the test so that it fails first.
There was a problem hiding this comment.
continuing this discussion here: #8464 (comment)
|
|
||
| status := ChanStatusShutdownSent | ||
| if locallyInitiated { | ||
| status |= ChanStatusLocalCloseInitiator |
There was a problem hiding this comment.
suggestion: I think we should adjust the checks that you're referencing @Crypt-iQ as opposed to refraining from updating statuses according to the events that transpire.
There was a problem hiding this comment.
The link needs to be put in shutdown mode (see usage of OnCommitOnce). The restartCoopClose logic should change here to accommodate that(or maybe there's another way of accomplishing the same thing). Additionally, if we send shutdown before the link is loaded and we owe update_add_htlc + commit_sig and then send those after, we've violated the spec. I like the approach of adding a channel status since it would make some link processing easier, but I am very hesitant about changing the semantics of the status field since if we mess up, channels get borked
Summary
In this PR we ensure that if a re-establish happens after
shutdownissent but before fee negotiation starts, then
shutdownis correctlyre-sent and the coop closure continues. This includes ensuring that
the delivery address sent in the first
shutdownis the one that isused in the final co-op close tx.
The issue
The issue is that we only mark a channel as
ChanStatusCoopBroadcastedat the time of actually doing the broadcast. This means that if there is a
re-connect between the
shutdownmessage being sent and the coopclose tx being finalised then we will forget that we were in the middle of a
shutdown and will continue with normal operation on restart. This is not
compliant with the spec which says that on re-connect, if we previously
sent a
shutdownthen we MUST resend it again.Fix overview
The fix is done by adding a new channel status:
ChanStatusShutdownSentwhich we activate when we are about to send
shutdownto our peer. Wealso persist our delivery address with this status so that we use the same
delivery address on restart.
PR flow
a test for these.
behaviour
Fixes #8397