Skip to content

multi: resend shutdown on reestablish #8447

Closed
ellemouton wants to merge 5 commits intolightningnetwork:masterfrom
ellemouton:resend-shutdown
Closed

multi: resend shutdown on reestablish #8447
ellemouton wants to merge 5 commits intolightningnetwork:masterfrom
ellemouton:resend-shutdown

Conversation

@ellemouton
Copy link
Copy Markdown
Collaborator

@ellemouton ellemouton commented Jan 31, 2024

Summary

In this PR we ensure that if a re-establish happens after shutdown is
sent but before fee negotiation starts, then shutdown is correctly
re-sent and the coop closure continues. This includes ensuring that
the delivery address sent in the first shutdown is the one that is
used in the final co-op close tx.

The issue

The issue is that we only mark a channel as ChanStatusCoopBroadcasted
at the time of actually doing the broadcast. This means that if there is a
re-connect between the shutdown message being sent and the coop
close tx being finalised then we will forget that we were in the middle of a
shutdown and will continue with normal operation on restart. This is not
compliant with the spec which says that on re-connect, if we previously
sent a shutdown then we MUST resend it again.

Fix overview

The fix is done by adding a new channel status: ChanStatusShutdownSent
which we activate when we are about to send shutdown to our peer. We
also persist our delivery address with this status so that we use the same
delivery address on restart.

PR flow

  1. First, the bug is recreated in an itest.
  2. Then, we a add the new status along with DB write and read methods &
    a test for these.
  3. Finally, we make use of the new methods & fix the itest to show the correct
    behaviour

Fixes #8397

This commit adds an itest to demonstrate that the following bug exists:
If channel Shutdown is initiated but then a re-connect is done before
the shutdown is complete, then the initiating node currently does not
properly resend the Shutdown message as required by the spec. This will
be fixed in an upcoming commit.
This method updates the channel status to ChanStatusShutdownSent to
indicate that shutdown of the channel has started.
In this commit, we start using the new MarkShutdownSent before we send a
Shutdown message and we use the DeliveryScript method to check if there
is a persisted delivery script that we should use if we do a reconnect.

With this commit, we can also fix the added itest to show that shutdown
now correctly continues after a reconnect.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 31, 2024

Walkthrough

This update introduces critical enhancements to the Lightning Network Daemon (LND) aimed at improving the channel shutdown process, specifically ensuring compliance with BOLT2 requirements for retransmitting shutdown messages upon reconnection. The changes include the introduction of a new channel status to mark when a shutdown has been initiated, the ability to persist and retrieve a delivery script for a channel's shutdown, and various corrections and improvements across tests and documentation to support these features.

Changes

Files Change Summaries
channeldb/channel.go, .../channel_test.go Added delivery script key, new channel status ChanStatusShutdownSent, and related methods/tests.
docs/release-notes/release-notes-0.18.0.md Updated with changes on HTLCs handling and AMP struct population during shutdown.
htlcswitch/interfaces.go, .../link.go Corrected typo from DiableAdds to DisableAdds.
itest/.../lnd_coop_close_with_htlcs_test.go New test cases for channel closure with HTLCs and improved test structure.
lnwallet/chancloser/..., lnwallet/channel.go Introduced methods for marking shutdown initiation and handling delivery script persistence.
peer/brontide.go Adjusted cooperative close process and channel status handling.

Assessment against linked issues

Objective Addressed Explanation
Ensure retransmission of shutdown on reestablish as per BOLT2 (#8397)
Address issue of not retransmitting shutdown post #8167 if closing_signed not yet sent (#8397)
Persist state before sending shutdown for restarting cooperative close on reestablish (#8397)
Ensure the pkscript re-transmitted in shutdown is identical to the prior one (#8397)
Maintain original ordering of owed update_*, commit_sig, and shutdown messages (#8397)

Poem

In the realm of lightning, a change was sown,
Through fields of code, new seeds have grown.
🐇 With a hop and a skip, the shutdown's sent,
In harmony, channels close, our efforts well spent.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository from git and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

@ellemouton
Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@ellemouton ellemouton self-assigned this Jan 31, 2024
@ellemouton ellemouton added this to the v0.18.0 milestone Jan 31, 2024
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between cf4f468 and f3be6c0.
Files selected for processing (11)
  • channeldb/channel.go (6 hunks)
  • channeldb/channel_test.go (1 hunks)
  • docs/release-notes/release-notes-0.18.0.md (1 hunks)
  • htlcswitch/interfaces.go (1 hunks)
  • htlcswitch/link.go (1 hunks)
  • itest/lnd_coop_close_with_htlcs_test.go (2 hunks)
  • lnwallet/chancloser/chancloser.go (1 hunks)
  • lnwallet/chancloser/chancloser_test.go (1 hunks)
  • lnwallet/chancloser/interface.go (1 hunks)
  • lnwallet/channel.go (2 hunks)
  • peer/brontide.go (4 hunks)
Files not reviewed due to errors (2)
  • peer/brontide.go (Error: unable to parse review)
  • channeldb/channel.go (Error: unable to parse review)
Files skipped from review due to trivial changes (1)
  • htlcswitch/link.go
Additional comments: 14
lnwallet/chancloser/interface.go (1)
  • 38-40: The addition of the MarkShutdownSent method to the Channel interface is a necessary change to support the new functionality of persistently marking that the shutdown of a channel has been initiated. This aligns with the PR's objectives and the BOLT2 requirements.
itest/lnd_coop_close_with_htlcs_test.go (2)
  • 18-34: The restructuring of test functions and the addition of new test cases (no restart and with restart) to verify cooperative channel closure with pending HTLCs, including restart scenarios, are well-implemented. These changes effectively enhance the test coverage for the new functionality, ensuring that the system behaves as expected under conditions that previously led to spec non-compliance.
  • 125-253: The addition of the coopCloseWithHTLCsWithRestart test function, which tests the cooperative close flow when an HTLC is still pending on the channel and ensures that the shutdown process continues as expected even if a channel re-establish happens after one party has already initiated the shutdown, is a critical enhancement. This test directly addresses the PR's objectives by demonstrating compliance with the BOLT2 requirements regarding the retransmission of shutdown messages.
docs/release-notes/release-notes-0.18.0.md (1)
  • 76-79: The release notes accurately describe the changes made in the PR, specifically mentioning the handling of in-flight HTLCs during channel shutdown and the re-initialization of the shutdown exchange upon reconnection. This documentation is crucial for users and developers to understand the improvements and how they comply with BOLT2 requirements.
lnwallet/chancloser/chancloser_test.go (1)
  • 157-159: The addition of the MarkShutdownSent method to the mockChannel struct in the test file is a necessary change to support testing the new functionality of persistently marking that the shutdown of a channel has been initiated. This aligns with the PR's objectives and ensures that the tests can accurately simulate the behavior of the system with the new changes.
htlcswitch/interfaces.go (1)
  • 142-142: Corrected typo in method name from DiableAdds to DisableAdds. This change improves readability and accuracy of the method name.
lnwallet/chancloser/chancloser.go (1)
  • 356-366: Persisting the delivery script before sending the shutdown message ensures that the same delivery script is used if a reconnect happens, aligning with the PR's objectives to comply with BOLT2 specifications for message retransmission. This change enhances the robustness of the channel shutdown process.
channeldb/channel_test.go (6)
  • 1161-1196: The test function TestDeliveryScript correctly tests the functionality of setting and retrieving a channel's delivery script when its status is updated to ChanStatusShutdownSent. The test follows a logical sequence: creating a test channel, verifying the absence of a delivery script initially, updating the channel status and setting the delivery script, and finally asserting that the status and script are correctly updated and retrieved. The use of require statements ensures that the test will halt on the first error, which is appropriate for this context.
  • 1167-1169: The error handling for MakeTestDB is correctly implemented using require.NoError, ensuring that the test halts immediately if the test database cannot be created. This is a best practice for handling setup errors in tests.
  • 1186-1186: The method MarkShutdownSent is correctly called with the expected parameters, including the delivery script and the boolean indicating the initiator status. This is key to testing the behavior under review.
  • 1192-1195: The retrieval and assertion of the delivery script using DeliveryScript and require.Equal are correctly implemented, ensuring that the script set earlier in the test is accurately retrieved and matches the expected value.
  • 1176-1177: The use of require.Error with ErrNoDeliveryScript correctly tests the expected behavior when attempting to retrieve a delivery script before one has been set. This ensures the method behaves as expected in error conditions.
  • 1189-1190: The assertions using require.True to check the channel status after calling MarkShutdownSent are correctly placed and validate that the channel status is updated as expected. This is crucial for ensuring the method's side effects are as intended.
lnwallet/channel.go (1)
  • 7139-7139: An empty line was added in the newOutgoingHtlcResolution function. This change seems to be for readability or formatting purposes.


status := ChanStatusShutdownSent
if locallyInitiated {
status |= ChanStatusLocalCloseInitiator
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't add another channel status here because on restart, execution will hit this if statement and the link won't be loaded:

lnd/peer/brontide.go

Lines 881 to 882 in cf4f468

if !dbChan.HasChanStatus(channeldb.ChanStatusDefault) &&
!dbChan.HasChanStatus(channeldb.ChanStatusRestored) {

The reason the itest works is that coop close then proceeds via restartCoopClose, but when the coop close transaction is being made it will just delete the HTLC and burn it to fees. This can be seen if you log the coop close transaction and proposedFee here:

lnd/lnwallet/channel.go

Lines 7821 to 7825 in cf4f468

closeTx := CreateCooperativeCloseTx(
fundingTxIn(lc.channelState), lc.channelState.LocalChanCfg.DustLimit,
lc.channelState.RemoteChanCfg.DustLimit, ourBalance, theirBalance,
localDeliveryScript, remoteDeliveryScript, closeTxOpts...,
)

It can also be observed because no UpdateFulfillHTLC is sent during the itest.

Instead, we should just be able to use the existence of a delivery script to determine whether or not we need to continue coop close

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: I think we should adjust the checks that you're referencing @Crypt-iQ as opposed to refraining from updating statuses according to the events that transpire.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd still need the checks for older nodes, so this would be adding a special case and duplicating some logic. Ultimately, loading the link with a non-ChanStatusDefault state means refactoring the way. the channel status is used because the link performs the isBorked check before trying to update anything. The way the check is written currently means it would fail for ChanStatusShutdownSent. It's certainly doable, but I don't think changing it is worth the risk

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that it isn't default though. We're definitely in a different state that isn't normal operation and can't accept new HTLC adds. Can you clarify what "the risk" is? I'm trying to weigh the cost of having increasingly fragmented logic (where consequences of changing stuff leaks like it is doing now) vs the cost of fixing things so the code straightforwardly reflects what is supposed to happen.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no issue with using ChanStatusDefault here, the link just needs to be aware that it should send shutdown. The risk is that we make a costly mistake when changing how the status field is used

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a way to navigate this. For one, it's not really functioning as a "status" as it's a bit-vector. Instead it's a collection of flags. So really what the ChanStatusDefault check is doing is trying to check if specific flags are unset, and act accordingly. I think perhaps inverting the check and rather than checking "is it default", we should check "is it not have any one of these conditions". We can always push it to an extra field but I do think that the way we handle ChannelStatus right now is very fragile and should be reworked to be more structurally correct. Maybe an issue for a less tactical PR though.

Copy link
Copy Markdown
Collaborator

@ProofOfKeags ProofOfKeags left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Seems straightforward enough. I am glad you put in the ChanStatusShutdownSent, but in order to reconcile it with some more architectural shifts I'm trying to make, I'd recommend renaming it to ChanStatusTerminal or something like that. The reasons for this are a few. The spec merely states that we MUST do it if we've sent shutdown. It doesn't say we can't do it if we've received but not yet sent shutdown. By marking the channel as terminal the moment a valid shutdown is sent either direction, we can make the stored state more representative of channel state as opposed to link state. Having the message passing details leak into our persistent state without being symmetric (having state for each half of the duplex connection) is sus to me. So if we don't want to separate the persistence from the link state details we should probably also include state that tracks whether we've received shutdown as well.

Other than that, I think the implementation seems fine except for the two blocking comments I have which should be very straightforward to address.

Comment on lines +199 to +203
// Show that the channel is seen as active again by Alice and Bob.
//
// NOTE: This is a bug and will be fixed in an upcoming commit.
ht.AssertChannelActive(alice, chanPoint)
ht.AssertChannelActive(bob, chanPoint)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it normal for us to include "wrong" tests in intermediate commits? My gut tells me we want to assert the channel is inactive here knowing full well that the test fails. I'm not insisting we change it here but I probably wouldn't do things this way if it were me.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this way of telling a story of a PR. So it's mostly for review & to make every commit "correct" meaning that any commit can be reverted to & it wont break the build of the main code or the pass rate of the tests

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the storyline thing 100%, I just usually start with the correct-but-failing test. What I'm concerned about is that you don't want every commit to be right, do you? I think including a test that tests the wrong semantics is worse than no test at all and if you happen to bisect your way to this commit without realizing it then you have an "anti-test" lurking. Maybe there's no feasible way this would happen IRL and it's not a concern but I just want to state that if a wrong test were to enter the codebase it is far worse than no test at all.

fwiw TDD recommends writing the test so that it fails first.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

continuing this discussion here: #8464 (comment)


status := ChanStatusShutdownSent
if locallyInitiated {
status |= ChanStatusLocalCloseInitiator
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: I think we should adjust the checks that you're referencing @Crypt-iQ as opposed to refraining from updating statuses according to the events that transpire.

Copy link
Copy Markdown
Collaborator

@Crypt-iQ Crypt-iQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link needs to be put in shutdown mode (see usage of OnCommitOnce). The restartCoopClose logic should change here to accommodate that(or maybe there's another way of accomplishing the same thing). Additionally, if we send shutdown before the link is loaded and we owe update_add_htlc + commit_sig and then send those after, we've violated the spec. I like the approach of adding a channel status since it would make some link processing easier, but I am very hesitant about changing the semantics of the status field since if we mess up, channels get borked

@ellemouton ellemouton closed this Feb 6, 2024
@saubyk saubyk removed this from the v0.18.0 milestone Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[spec]: retransmit shutdown on reestablish

4 participants