multi: resend shutdown on reestablish by ellemouton · Pull Request #8447 · lightningnetwork/lnd

ellemouton · 2024-01-31T12:50:57Z

Summary

In this PR we ensure that if a re-establish happens after shutdown is
sent but before fee negotiation starts, then shutdown is correctly
re-sent and the coop closure continues. This includes ensuring that
the delivery address sent in the first shutdown is the one that is
used in the final co-op close tx.

The issue

The issue is that we only mark a channel as ChanStatusCoopBroadcasted
at the time of actually doing the broadcast. This means that if there is a
re-connect between the shutdown message being sent and the coop
close tx being finalised then we will forget that we were in the middle of a
shutdown and will continue with normal operation on restart. This is not
compliant with the spec, which says that on re-connect, if we previously
sent a shutdown then we MUST resend it.

Fix overview

The fix is done by adding a new channel status: ChanStatusShutdownSent
which we activate when we are about to send shutdown to our peer. We
also persist our delivery address with this status so that we use the same
delivery address on restart.
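
As a rough illustration of the idea above, the following sketch stores the delivery script together with a shutdown flag in a single step. It is not lnd's implementation: `channelRecord`, the lowercase method names, the flag names and the bit positions are all hypothetical stand-ins for the persisted channel state.

```go
package main

import (
	"errors"
	"fmt"
)

// channelStatus is a bit vector of flags. Names and bit positions here are
// illustrative assumptions, not lnd's actual definitions.
type channelStatus uint64

const (
	statusShutdownSent   channelStatus = 1 << 0
	statusLocalInitiator channelStatus = 1 << 1
)

// errNoDeliveryScript signals that no shutdown script has been persisted.
var errNoDeliveryScript = errors.New("no delivery script persisted")

// channelRecord is a hypothetical stand-in for the persisted channel state.
type channelRecord struct {
	status         channelStatus
	deliveryScript []byte
}

// markShutdownSent stores the delivery script and sets the shutdown flag
// together, before any Shutdown message would go out on the wire.
func (c *channelRecord) markShutdownSent(script []byte, locallyInitiated bool) {
	c.deliveryScript = append([]byte(nil), script...) // defensive copy

	status := statusShutdownSent
	if locallyInitiated {
		status |= statusLocalInitiator
	}
	c.status |= status
}

// getDeliveryScript returns the persisted script, or errNoDeliveryScript if
// no non-empty script has been stored yet.
func (c *channelRecord) getDeliveryScript() ([]byte, error) {
	if len(c.deliveryScript) == 0 {
		return nil, errNoDeliveryScript
	}
	return c.deliveryScript, nil
}

// hasStatus reports whether all bits in s are set.
func (c *channelRecord) hasStatus(s channelStatus) bool {
	return c.status&s == s
}

func main() {
	ch := &channelRecord{}

	_, err := ch.getDeliveryScript()
	fmt.Println("before shutdown, no script:", errors.Is(err, errNoDeliveryScript))

	ch.markShutdownSent([]byte{0x00, 0x14}, true)
	script, _ := ch.getDeliveryScript()
	fmt.Printf("after shutdown: script=%x sent=%v\n",
		script, ch.hasStatus(statusShutdownSent))
}
```

Writing the script and the flag in the same call is what lets a restarted node recover both pieces of information at once.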

PR flow

First, the bug is recreated in an itest.
Then, we add the new status along with DB write and read methods &
a test for these.
Finally, we make use of the new methods & fix the itest to show the correct
behaviour.

Fixes #8397

This commit adds an itest to demonstrate that the following bug exists: If channel Shutdown is initiated but then a re-connect is done before the shutdown is complete, then the initiating node currently does not properly resend the Shutdown message as required by the spec. This will be fixed in an upcoming commit.

This method updates the channel status to ChanStatusShutdownSent to indicate that shutdown of the channel has started.

In this commit, we start using the new MarkShutdownSent before we send a Shutdown message and we use the DeliveryScript method to check if there is a persisted delivery script that we should use if we do a reconnect. With this commit, we can also fix the added itest to show that shutdown now correctly continues after a reconnect.
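
A minimal sketch of the reconnect decision described in this commit message, assuming a store that exposes a `DeliveryScript`-style read method. The `scriptStore` interface, `memStore` type and `shutdownScriptOnReconnect` helper are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoDeliveryScript is a hypothetical sentinel for "nothing persisted".
var errNoDeliveryScript = errors.New("no delivery script persisted")

// scriptStore is a hypothetical view of the persisted channel state.
type scriptStore interface {
	DeliveryScript() ([]byte, error)
}

// memStore is an in-memory stand-in for the database.
type memStore struct{ script []byte }

func (m memStore) DeliveryScript() ([]byte, error) {
	if len(m.script) == 0 {
		return nil, errNoDeliveryScript
	}
	return m.script, nil
}

// shutdownScriptOnReconnect returns the script to place in a re-sent
// Shutdown message and whether a Shutdown needs to be re-sent at all.
func shutdownScriptOnReconnect(s scriptStore) ([]byte, bool, error) {
	script, err := s.DeliveryScript()
	switch {
	case errors.Is(err, errNoDeliveryScript):
		// No Shutdown was sent before the disconnect.
		return nil, false, nil
	case err != nil:
		return nil, false, err
	}

	// A script was persisted, so reuse exactly that script.
	return script, true, nil
}

func main() {
	script, resend, err := shutdownScriptOnReconnect(memStore{})
	fmt.Println(script, resend, err)

	script, resend, err = shutdownScriptOnReconnect(memStore{script: []byte{0x51}})
	fmt.Printf("%x %v %v\n", script, resend, err)
}
```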

coderabbitai · 2024-01-31T12:51:03Z

Walkthrough

This update introduces critical enhancements to the Lightning Network Daemon (LND) aimed at improving the channel shutdown process, specifically ensuring compliance with BOLT2 requirements for retransmitting shutdown messages upon reconnection. The changes include the introduction of a new channel status to mark when a shutdown has been initiated, the ability to persist and retrieve a delivery script for a channel's shutdown, and various corrections and improvements across tests and documentation to support these features.

Changes

| Files | Change Summaries |
| --- | --- |
| `channeldb/channel.go`, `.../channel_test.go` | Added delivery script key, new channel status `ChanStatusShutdownSent`, and related methods/tests. |
| `docs/release-notes/release-notes-0.18.0.md` | Updated with changes on HTLCs handling and AMP struct population during shutdown. |
| `htlcswitch/interfaces.go`, `.../link.go` | Corrected typo from `DiableAdds` to `DisableAdds`. |
| `itest/.../lnd_coop_close_with_htlcs_test.go` | New test cases for channel closure with HTLCs and improved test structure. |
| `lnwallet/chancloser/...`, `lnwallet/channel.go` | Introduced methods for marking shutdown initiation and handling delivery script persistence. |
| `peer/brontide.go` | Adjusted cooperative close process and channel status handling. |

Assessment against linked issues

| Objective | Addressed | Explanation |
| --- | --- | --- |
| Ensure retransmission of `shutdown` on reestablish as per BOLT2 (#8397) | ✅ | |
| Address issue of not retransmitting `shutdown` post #8167 if `closing_signed` not yet sent (#8397) | ✅ | |
| Persist state before sending `shutdown` for restarting cooperative close on reestablish (#8397) | ✅ | |
| Ensure the pkscript re-transmitted in `shutdown` is identical to the prior one (#8397) | ✅ | |
| Maintain original ordering of owed `update_*`, `commit_sig`, and `shutdown` messages (#8397) | ✅ | |

Poem

In the realm of lightning, a change was sown,
Through fields of code, new seeds have grown.
🐇 With a hop and a skip, the shutdown's sent,
In harmony, channels close, our efforts well spent.

ellemouton · 2024-01-31T12:52:24Z

@coderabbitai review

coderabbitai

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between cf4f468 and f3be6c0.

Files selected for processing (11)

channeldb/channel.go (6 hunks)
channeldb/channel_test.go (1 hunks)
docs/release-notes/release-notes-0.18.0.md (1 hunks)
htlcswitch/interfaces.go (1 hunks)
htlcswitch/link.go (1 hunks)
itest/lnd_coop_close_with_htlcs_test.go (2 hunks)
lnwallet/chancloser/chancloser.go (1 hunks)
lnwallet/chancloser/chancloser_test.go (1 hunks)
lnwallet/chancloser/interface.go (1 hunks)
lnwallet/channel.go (2 hunks)
peer/brontide.go (4 hunks)

Files not reviewed due to errors (2)

peer/brontide.go (Error: unable to parse review)
channeldb/channel.go (Error: unable to parse review)

Files skipped from review due to trivial changes (1)

htlcswitch/link.go

Additional comments: 14

lnwallet/chancloser/interface.go (1)

38-40: The addition of the MarkShutdownSent method to the Channel interface is a necessary change to support the new functionality of persistently marking that the shutdown of a channel has been initiated. This aligns with the PR's objectives and the BOLT2 requirements.

itest/lnd_coop_close_with_htlcs_test.go (2)

18-34: The restructuring of test functions and the addition of new test cases (no restart and with restart) to verify cooperative channel closure with pending HTLCs, including restart scenarios, are well-implemented. These changes effectively enhance the test coverage for the new functionality, ensuring that the system behaves as expected under conditions that previously led to spec non-compliance.

125-253: The addition of the coopCloseWithHTLCsWithRestart test function, which tests the cooperative close flow when an HTLC is still pending on the channel and ensures that the shutdown process continues as expected even if a channel re-establish happens after one party has already initiated the shutdown, is a critical enhancement. This test directly addresses the PR's objectives by demonstrating compliance with the BOLT2 requirements regarding the retransmission of shutdown messages.

docs/release-notes/release-notes-0.18.0.md (1)

76-79: The release notes accurately describe the changes made in the PR, specifically mentioning the handling of in-flight HTLCs during channel shutdown and the re-initialization of the shutdown exchange upon reconnection. This documentation is crucial for users and developers to understand the improvements and how they comply with BOLT2 requirements.

lnwallet/chancloser/chancloser_test.go (1)

157-159: The addition of the MarkShutdownSent method to the mockChannel struct in the test file is a necessary change to support testing the new functionality of persistently marking that the shutdown of a channel has been initiated. This aligns with the PR's objectives and ensures that the tests can accurately simulate the behavior of the system with the new changes.

htlcswitch/interfaces.go (1)

142-142: Corrected typo in method name from DiableAdds to DisableAdds. This change improves readability and accuracy of the method name.

lnwallet/chancloser/chancloser.go (1)

356-366: Persisting the delivery script before sending the shutdown message ensures that the same delivery script is used if a reconnect happens, aligning with the PR's objectives to comply with BOLT2 specifications for message retransmission. This change enhances the robustness of the channel shutdown process.

channeldb/channel_test.go (6)

1161-1196: The test function TestDeliveryScript correctly tests the functionality of setting and retrieving a channel's delivery script when its status is updated to ChanStatusShutdownSent. The test follows a logical sequence: creating a test channel, verifying the absence of a delivery script initially, updating the channel status and setting the delivery script, and finally asserting that the status and script are correctly updated and retrieved. The use of require statements ensures that the test will halt on the first error, which is appropriate for this context.

1167-1169: The error handling for MakeTestDB is correctly implemented using require.NoError, ensuring that the test halts immediately if the test database cannot be created. This is a best practice for handling setup errors in tests.

1186-1186: The method MarkShutdownSent is correctly called with the expected parameters, including the delivery script and the boolean indicating the initiator status. This is key to testing the behavior under review.

1192-1195: The retrieval and assertion of the delivery script using DeliveryScript and require.Equal are correctly implemented, ensuring that the script set earlier in the test is accurately retrieved and matches the expected value.

1176-1177: The use of require.Error with ErrNoDeliveryScript correctly tests the expected behavior when attempting to retrieve a delivery script before one has been set. This ensures the method behaves as expected in error conditions.

1189-1190: The assertions using require.True to check the channel status after calling MarkShutdownSent are correctly placed and validate that the channel status is updated as expected. This is crucial for ensuring the method's side effects are as intended.

lnwallet/channel.go (1)

7139-7139: An empty line was added in the newOutgoingHtlcResolution function. This change seems to be for readability or formatting purposes.

Crypt-iQ · 2024-01-31T15:30:33Z

channeldb/channel.go

+
+	status := ChanStatusShutdownSent
+	if locallyInitiated {
+		status |= ChanStatusLocalCloseInitiator


We can't add another channel status here because on restart, execution will hit this if statement and the link won't be loaded:

lnd/peer/brontide.go, lines 881 to 882 in cf4f468:

	if !dbChan.HasChanStatus(channeldb.ChanStatusDefault) &&
		!dbChan.HasChanStatus(channeldb.ChanStatusRestored) {

The reason the itest works is that coop close then proceeds via restartCoopClose, but when the coop close transaction is being made it will just delete the HTLC and burn it to fees. This can be seen if you log the coop close transaction and proposedFee here:

lnd/lnwallet/channel.go, lines 7821 to 7825 in cf4f468:

	closeTx := CreateCooperativeCloseTx(
		fundingTxIn(lc.channelState), lc.channelState.LocalChanCfg.DustLimit,
		lc.channelState.RemoteChanCfg.DustLimit, ourBalance, theirBalance,
		localDeliveryScript, remoteDeliveryScript, closeTxOpts...,
	)

It can also be observed because no UpdateFulfillHTLC is sent during the itest.

Instead, we should just be able to use the existence of a delivery script to determine whether or not we need to continue coop close
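
To make the concern concrete, here is a small sketch of why setting any extra flag stops the link from being loaded. It assumes that the default status is represented as "no flags set" and therefore needs an exact comparison. The flag names and bit positions are hypothetical, and `linkWouldLoad` only mirrors the shape of the quoted check.

```go
package main

import "fmt"

// channelStatus is a bit vector; names and positions are assumptions made
// for this sketch only.
type channelStatus uint64

const (
	statusDefault      channelStatus = 0
	statusRestored     channelStatus = 1 << 0
	statusShutdownSent channelStatus = 1 << 1
)

// hasStatus reports whether query is set in current. The default status is
// the absence of flags, so it is compared exactly instead of being masked.
func hasStatus(current, query channelStatus) bool {
	if query == statusDefault {
		return current == statusDefault
	}
	return current&query == query
}

// linkWouldLoad mirrors the quoted condition: the link is skipped unless the
// channel is in the default or the restored state.
func linkWouldLoad(current channelStatus) bool {
	return hasStatus(current, statusDefault) ||
		hasStatus(current, statusRestored)
}

func main() {
	fmt.Println("default:", linkWouldLoad(statusDefault))
	fmt.Println("shutdown sent:", linkWouldLoad(statusShutdownSent))
}
```

Under these assumptions a channel carrying only the shutdown flag is neither default nor restored, so the link is skipped after a restart.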

suggestion: I think we should adjust the checks that you're referencing @Crypt-iQ as opposed to refraining from updating statuses according to the events that transpire.

We'd still need the checks for older nodes, so this would be adding a special case and duplicating some logic. Ultimately, loading the link with a non-ChanStatusDefault state means refactoring the way the channel status is used, because the link performs the isBorked check before trying to update anything. The way the check is written currently means it would fail for ChanStatusShutdownSent. It's certainly doable, but I don't think changing it is worth the risk.

The issue is that it isn't default though. We're definitely in a different state that isn't normal operation and can't accept new HTLC adds. Can you clarify what "the risk" is? I'm trying to weigh the cost of having increasingly fragmented logic (where consequences of changing stuff leaks like it is doing now) vs the cost of fixing things so the code straightforwardly reflects what is supposed to happen.

There's no issue with using ChanStatusDefault here, the link just needs to be aware that it should send shutdown. The risk is that we make a costly mistake when changing how the status field is used.

I think there's a way to navigate this. For one, it's not really functioning as a "status" since it's a bit-vector; it's a collection of flags. So really what the ChanStatusDefault check is doing is checking whether specific flags are unset, and acting accordingly. Perhaps we should invert the check: rather than asking "is it default", we should ask "does it have none of these conditions". We can always push it to an extra field, but I do think that the way we handle ChannelStatus right now is very fragile and should be reworked to be more structurally correct. Maybe an issue for a less tactical PR though.
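
The inverted check proposed here could look roughly like the sketch below: instead of requiring that no flag is set, the caller names the flags that should prevent the link from loading. Which flags belong in that set is an assumption of this example, as are all names and bit positions.

```go
package main

import "fmt"

// channelStatus is treated as a set of independent flags. All names and bit
// positions below are hypothetical.
type channelStatus uint64

const (
	flagBorked           channelStatus = 1 << 0
	flagCommitBroadcast  channelStatus = 1 << 1
	flagCoopBroadcast    channelStatus = 1 << 2
	flagShutdownSent     channelStatus = 1 << 3
	flagsBlockingTheLink               = flagBorked | flagCommitBroadcast | flagCoopBroadcast
)

// canLoadLink asks "does the channel have none of the blocking conditions"
// rather than "is the channel exactly in the default state".
func canLoadLink(current channelStatus) bool {
	return current&flagsBlockingTheLink == 0
}

func main() {
	fmt.Println("no flags:", canLoadLink(0))
	fmt.Println("shutdown sent:", canLoadLink(flagShutdownSent))
	fmt.Println("shutdown sent + borked:", canLoadLink(flagShutdownSent|flagBorked))
}
```

With this shape, adding a new informational flag does not change the outcome of existing checks unless the flag is explicitly added to the blocking set.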

ProofOfKeags

suggestion: Seems straightforward enough. I am glad you put in the ChanStatusShutdownSent, but in order to reconcile it with some more architectural shifts I'm trying to make, I'd recommend renaming it to ChanStatusTerminal or something like that. The reasons for this are a few. The spec merely states that we MUST do it if we've sent shutdown. It doesn't say we can't do it if we've received but not yet sent shutdown. By marking the channel as terminal the moment a valid shutdown is sent either direction, we can make the stored state more representative of channel state as opposed to link state. Having the message passing details leak into our persistent state without being symmetric (having state for each half of the duplex connection) is sus to me. So if we don't want to separate the persistence from the link state details we should probably also include state that tracks whether we've received shutdown as well.
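
One way to picture the symmetric state suggested here is a pair of booleans, one per direction, from which "terminal" is derived. This is only a sketch; the `shutdownState` type and its methods are hypothetical and not part of lnd.

```go
package main

import "fmt"

// shutdownState tracks each half of the duplex connection separately.
// The type and its methods are hypothetical.
type shutdownState struct {
	sent     bool // we have sent shutdown
	received bool // we have received a valid shutdown from the peer
}

// terminal reports whether shutdown has been exchanged in either direction.
func (s shutdownState) terminal() bool {
	return s.sent || s.received
}

// mustResendOnReconnect reflects the requirement quoted in the PR summary:
// a previously sent shutdown has to be sent again after a reconnect.
func (s shutdownState) mustResendOnReconnect() bool {
	return s.sent
}

func main() {
	onlyReceived := shutdownState{received: true}
	fmt.Println(onlyReceived.terminal(), onlyReceived.mustResendOnReconnect())

	onlySent := shutdownState{sent: true}
	fmt.Println(onlySent.terminal(), onlySent.mustResendOnReconnect())
}
```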

Other than that, I think the implementation seems fine except for the two blocking comments I have which should be very straightforward to address.

ProofOfKeags · 2024-01-31T18:56:08Z

itest/lnd_coop_close_with_htlcs_test.go

+	// Show that the channel is seen as active again by Alice and Bob.
+	//
+	// NOTE: This is a bug and will be fixed in an upcoming commit.
+	ht.AssertChannelActive(alice, chanPoint)
+	ht.AssertChannelActive(bob, chanPoint)


Is it normal for us to include "wrong" tests in intermediate commits? My gut tells me we want to assert the channel is inactive here knowing full well that the test fails. I'm not insisting we change it here but I probably wouldn't do things this way if it were me.

I really like this way of telling a story of a PR. So it's mostly for review & to make every commit "correct", meaning that any commit can be reverted to & it won't break the build of the main code or the pass rate of the tests.

I agree with the storyline thing 100%, I just usually start with the correct-but-failing test. What I'm concerned about is that you don't want every commit to be right, do you? I think including a test that tests the wrong semantics is worse than no test at all and if you happen to bisect your way to this commit without realizing it then you have an "anti-test" lurking. Maybe there's no feasible way this would happen IRL and it's not a concern but I just want to state that if a wrong test were to enter the codebase it is far worse than no test at all.

fwiw TDD recommends writing the test so that it fails first.

continuing this discussion here: #8464 (comment)

ProofOfKeags · 2024-01-31T19:02:55Z

channeldb/channel.go

+
+	status := ChanStatusShutdownSent
+	if locallyInitiated {
+		status |= ChanStatusLocalCloseInitiator


suggestion: I think we should adjust the checks that you're referencing @Crypt-iQ as opposed to refraining from updating statuses according to the events that transpire.

Crypt-iQ

The link needs to be put in shutdown mode (see usage of OnCommitOnce). The restartCoopClose logic should change here to accommodate that (or maybe there's another way of accomplishing the same thing). Additionally, if we send shutdown before the link is loaded and we owe update_add_htlc + commit_sig and then send those after, we've violated the spec. I like the approach of adding a channel status since it would make some link processing easier, but I am very hesitant about changing the semantics of the status field since if we mess up, channels get borked.
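
The ordering concern in this comment can be sketched as a function that builds the retransmission queue after a reestablish. It is one reading of the comment and not lnd code; the `msgKind` type and `retransmissionOrder` helper are hypothetical.

```go
package main

import "fmt"

// msgKind names a wire message type for the purpose of this sketch.
type msgKind string

const (
	updateAddHTLC msgKind = "update_add_htlc"
	commitSig     msgKind = "commit_sig"
	shutdown      msgKind = "shutdown"
)

// retransmissionOrder returns the messages to send after a reestablish:
// owed updates first, then commit_sig if one is owed, and only then a
// previously sent shutdown.
func retransmissionOrder(owedUpdates []msgKind, oweCommitSig, shutdownSent bool) []msgKind {
	out := make([]msgKind, 0, len(owedUpdates)+2)
	out = append(out, owedUpdates...)
	if oweCommitSig {
		out = append(out, commitSig)
	}
	if shutdownSent {
		out = append(out, shutdown)
	}
	return out
}

func main() {
	order := retransmissionOrder([]msgKind{updateAddHTLC}, true, true)
	fmt.Println(order)
}
```

Keeping shutdown at the end of the queue avoids the situation described above, where owed updates would be sent after shutdown.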

ellemouton added 4 commits January 31, 2024 12:09

multi: fix various typos

1bae43f

channeldb+lnwallet: add MarkShutdownSent

ff3555c

This method updates the channel status to ChanStatusShutdownSent to indicate that shutdown of the channel has started.

docs: update release notes

f3be6c0

ellemouton force-pushed the resend-shutdown branch from 09ba12f to f3be6c0 Compare January 31, 2024 12:52

ellemouton self-assigned this Jan 31, 2024

ellemouton added spec bug fix labels Jan 31, 2024

ellemouton added this to the v0.18.0 milestone Jan 31, 2024

coderabbitai bot reviewed Jan 31, 2024

Crypt-iQ reviewed Jan 31, 2024

ProofOfKeags suggested changes Jan 31, 2024

Crypt-iQ reviewed Feb 1, 2024

ellemouton mentioned this pull request Feb 6, 2024

multi: resend shutdown on reestablish #8464

Merged

ellemouton closed this Feb 6, 2024

saubyk removed this from the v0.18.0 milestone Feb 6, 2024
