
Don't send init closing_signed too early after final HTLC removal #2529

Conversation

TheBlueMatt (Collaborator)

If we remove an HTLC (or fee update), commit, and receive our counterparty's revoke_and_ack, we remove all knowledge of said HTLC (or fee update). However, the latest local commitment transaction that we can broadcast still contains the HTLC (or old fee), thus we are not eligible for initiating the closing_signed negotiation if we're shutting down and are generally expecting a counterparty commitment_signed immediately.

Because we don't have any tracking of these updates in the Channel (only the ChannelMonitor is aware of the HTLC being in our latest local commitment transaction), we'd previously send a closing_signed too early, causing LDK<->LDK channels with an HTLC pending towards the channel initiator at the time of shutdown to always fail to cooperatively close.

To fix this race, we add an additional unpersisted bool to Channel and use that to gate sending the initial closing_signed.
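
A minimal sketch of the gating approach in Rust; the `expecting_peer_commitment_signed` name matches the flag referenced in the diffs below, but the struct and method names here are simplified stand-ins, not the actual LDK `Channel` API:

// Sketch only: simplified names, not the exact LDK types or call sites.
struct ChannelContext {
	// Unpersisted flag, set when the counterparty's `revoke_and_ack` removes the last
	// pending HTLC (or fee update) from our state while their latest commitment
	// transaction still contains it, meaning a `commitment_signed` should follow shortly.
	expecting_peer_commitment_signed: bool,
}

impl ChannelContext {
	// Called when processing a `revoke_and_ack` that clears an HTLC/fee update from our
	// per-channel state: we must now wait for the counterparty's `commitment_signed`.
	fn on_update_removed_by_revoke_and_ack(&mut self) {
		self.expecting_peer_commitment_signed = true;
	}

	// Called when the counterparty's `commitment_signed` arrives; the stale update is now
	// gone from the commitment transaction we could broadcast.
	fn on_counterparty_commitment_signed(&mut self) {
		self.expecting_peer_commitment_signed = false;
	}

	// Gate the initial `closing_signed`: even if shutdown has otherwise completed, don't
	// start fee negotiation while we still expect a `commitment_signed`.
	fn can_propose_closing_signed(&self, shutdown_complete: bool) -> bool {
		shutdown_complete && !self.expecting_peer_commitment_signed
	}
}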

codecov-commenter commented Aug 26, 2023

Codecov Report

Attention: 3 lines in your changes are missing coverage. Please review.

Comparison is base (281a0ae) 88.81% compared to head (70b1866) 89.16%.
Report is 12 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2529      +/-   ##
==========================================
+ Coverage   88.81%   89.16%   +0.34%     
==========================================
  Files         113      113              
  Lines       89116    91552    +2436     
  Branches    89116    91552    +2436     
==========================================
+ Hits        79152    81629    +2477     
+ Misses       7722     7703      -19     
+ Partials     2242     2220      -22     
Files Coverage Δ
lightning/src/ln/channel.rs 88.75% <100.00%> (+0.09%) ⬆️
lightning/src/ln/shutdown_tests.rs 96.66% <96.55%> (-0.01%) ⬇️

... and 13 files with indirect coverage changes



#[test]
fn htlc_removal_no_early_closing_signed() {
// Previously, if we have a pending inbound HTLC on a channel which has initiated shutdown,

I think the test coverage should also encompass a “pending outbound HTLC” to check that final HTLC removal works well in both directions. Also, the current test uses send_payment(&nodes[0], &[&nodes[1]]), so it’s more of an outbound HTLC from the viewpoint of nodes[0]? nodes[1] receives the SendClosingSigned at L1277.

TheBlueMatt (Collaborator, Author)

We already have other coverage of HTLCs in the other direction in shutdown_tests.rs


Yes, the test coverage of the other HTLC direction in shutdown_tests.rs would have to be checked, though note I still think you’re testing the pending outbound HTLC here?

TheBlueMatt (Collaborator, Author)

I have no idea what you're asking for.

@@ -3363,6 +3379,7 @@ impl<SP: Deref> Channel<SP> where
if let OutboundHTLCState::LocalAnnounced(_) = htlc.state {
log_trace!(logger, " ...promoting outbound LocalAnnounced {} to Committed", &htlc.payment_hash);
htlc.state = OutboundHTLCState::Committed;
*expecting_peer_cs = true;
Contributor

Don't we also need to set this on channel reestablish if we only get to process the RAA but not the CS?

TheBlueMatt (Collaborator, Author)

Yes, in theory, but by the time we get to a reestablish (if we got the RAA before shutdown/disconnect) we have forgotten about the HTLC so there's nothing we can do (except track the HTLC longer). Even after this patch we still risk force-closing if we restart right after the RAA but before the CS, but I'm not really sure that's worth fixing.

Contributor

It could affect mobile nodes given the higher frequency of restarts, but I'm fine with delaying the full fix for now.

TheBlueMatt (Collaborator, Author)

I think any peer should be sending those two messages back-to-back. So, yes, it could impact someone on mobile who races, but I'm still kinda meh on persisting this bool? I guess we could, but I really want to avoid persisting new things and move to deriving them from monitors universally.

wpaulino previously approved these changes Aug 28, 2023
alecchendev (Contributor) left a comment

Pretty much LGTM

wpaulino (Contributor) left a comment

Feel free to squash

TheBlueMatt force-pushed the 2023-08-shutdown-remove-early-sign branch from 20bfd30 to 52836ff on September 7, 2023 18:41
TheBlueMatt (Collaborator, Author)

Squashed with a trivial test cleanup:

$ git diff-tree -U1 20bfd30c3 52836ffdf
diff --git a/lightning/src/ln/shutdown_tests.rs b/lightning/src/ln/shutdown_tests.rs
index 2711faaae..7725740d2 100644
--- a/lightning/src/ln/shutdown_tests.rs
+++ b/lightning/src/ln/shutdown_tests.rs
@@ -1238,6 +1238,3 @@ fn do_outbound_update_no_early_closing_signed(use_htlc: bool) {
 	} else {
-		{
-			let mut feerate_lock = chanmon_cfgs[0].fee_estimator.sat_per_kw.lock().unwrap();
-			*feerate_lock *= 10;
-		}
+		*chanmon_cfgs[0].fee_estimator.sat_per_kw.lock().unwrap() *= 10;
 		nodes[0].node.timer_tick_occurred();
$ 

jbesraa (Contributor) left a comment

looks good

wpaulino previously approved these changes Sep 7, 2023
// If we're waiting on a monitor persistence, that implies we're also waiting to send some
// message to our counterparty (probably a `revoke_and_ack`). In such a case, we shouldn't
// initiate `closing_signed` negotiation until we're clear of all pending messages.
if (self.context.channel_state & (ChannelState::MonitorUpdateInProgress as u32)) != 0 {
Contributor

Is there a hidden intention behind adding this check right now, along with the check below?

If so, we should add a small comment in the check below. Otherwise these two look like (and are) two perfectly fine independent checks.

TheBlueMatt (Collaborator, Author)

Not quite sure what you're asking here - it's not a "hidden intention", but rather an extension of the check below: if we think we're ready to go but are actually waiting on our own monitor update to send some message, we're not really ready to go.

TheBlueMatt (Collaborator, Author)

Rebased to address a `use` conflict.

wpaulino (Contributor) left a comment

Sorry this somehow fell through, feel free to squash

@@ -3334,6 +3349,7 @@ impl<SP: Deref> Channel<SP> where
if let &InboundHTLCRemovalReason::Fulfill(_) = reason {
value_to_self_msat_diff += htlc.amount_msat as i64;
}
*expecting_peer_commitment_signed = true;
Contributor

Hmm, can we use the existing `mark_awaiting_response` here? This will also make sure we disconnect them if they take too long or don't send it at all.

TheBlueMatt (Collaborator, Author)

Should we do this more generally? It's an indication of an HTLC hang on the remote end which could still cause issues. I'd kinda like to do that as a followup, though, since it's a separate concept and a lot more stuff to test.

Contributor

Sounds good

TheBlueMatt force-pushed the 2023-08-shutdown-remove-early-sign branch from dfef4a7 to 168899f on October 31, 2023 18:08
wpaulino (Contributor)

Needs a rebase, CI is not happy

TheBlueMatt (Collaborator, Author)

Rebased.

TheBlueMatt force-pushed the 2023-08-shutdown-remove-early-sign branch from 168899f to 0fe7423 on November 5, 2023 04:11
wpaulino previously approved these changes Nov 6, 2023
valentinewallace (Contributor) left a comment

Nothing blocking, LGTM

use crate::chain::transaction::OutPoint;
use crate::events::{MessageSendEvent, MessageSendEventsProvider, ClosureReason};
use crate::events::{MessageSendEvent, HTLCDestination, MessageSendEventsProvider, ClosureReason};
Contributor

nit: imports out of order

// message to our counterparty (probably a `revoke_and_ack`). In such a case, we shouldn't
// initiate `closing_signed` negotiation until we're clear of all pending messages.
if (self.context.channel_state & (ChannelState::MonitorUpdateInProgress as u32)) != 0 {
return Ok((None, None, None));
Contributor

Missing test coverage here, also seems logically separate from the rest of the commit

TheBlueMatt (Collaborator, Author)

Oops, it's actually unreachable: `closing_negotiation_ready` (the first check we do) already checks that `MonitorUpdateInProgress` is not set. The test as written does hit this case. For clarity I went ahead and removed the redundant case and moved the comment to the top.

TheBlueMatt force-pushed the 2023-08-shutdown-remove-early-sign branch from 0fe7423 to 02c1070 on November 11, 2023 20:23
If we remove an HTLC (or fee update), commit, and receive our
counterparty's `revoke_and_ack`, we remove all knowledge of said
HTLC (or fee update). However, the latest local commitment
transaction that we can broadcast still contains the HTLC (or old
fee), thus we are not eligible for initiating the `closing_signed`
negotiation if we're shutting down and are generally expecting a
counterparty `commitment_signed` immediately.

Because we don't have any tracking of these updates in the `Channel`
(only the `ChannelMonitor` is aware of the HTLC being in our latest
local commitment transaction), we'd previously send a
`closing_signed` too early, causing LDK<->LDK channels with an HTLC
pending towards the channel initiator at the time of `shutdown` to
always fail to cooperatively close.

To fix this race, we add an additional unpersisted bool to
`Channel` and use that to gate sending the initial `closing_signed`.
TheBlueMatt force-pushed the 2023-08-shutdown-remove-early-sign branch from 02c1070 to 70b1866 on November 11, 2023 20:25
TheBlueMatt (Collaborator, Author)

Removed the redundant check so that we don't screw up and get confused later.

$ git diff-tree -U1 0fe7423d 70b18663
diff --git a/lightning/src/ln/channel.rs b/lightning/src/ln/channel.rs
index 5c341b264..f36777467 100644
--- a/lightning/src/ln/channel.rs
+++ b/lightning/src/ln/channel.rs
@@ -4400,2 +4400,6 @@ impl<SP: Deref> Channel<SP> where
 	{
+		// If we're waiting on a monitor persistence, that implies we're also waiting to send some
+		// message to our counterparty (probably a `revoke_and_ack`). In such a case, we shouldn't
+		// initiate `closing_signed` negotiation until we're clear of all pending messages. Note
+		// that closing_negotiation_ready checks this case (as well as a few others).
 		if self.context.last_sent_closing_fee.is_some() || !self.closing_negotiation_ready() {
@@ -4411,9 +4415,2 @@ impl<SP: Deref> Channel<SP> where

-		// If we're waiting on a monitor persistence, that implies we're also waiting to send some
-		// message to our counterparty (probably a `revoke_and_ack`). In such a case, we shouldn't
-		// initiate `closing_signed` negotiation until we're clear of all pending messages.
-		if (self.context.channel_state & (ChannelState::MonitorUpdateInProgress as u32)) != 0 {
-			return Ok((None, None, None));
-		}
-
 		// If we're waiting on a counterparty `commitment_signed` to clear some updates from our

valentinewallace (Contributor) left a comment

I think this can land since it's not too different from when it was last ACK'd by @wpaulino

TheBlueMatt (Collaborator, Author)

> I think this can land since it's not too different from when it was last ACK'd by @wpaulino

So land it :p

TheBlueMatt merged commit 5d187f6 into lightningdevkit:main on Nov 14, 2023
14 of 15 checks passed