Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delay RAA-after-next processing until PaymentSent is are handled #2112

Merged

Conversation

TheBlueMatt
Copy link
Collaborator

In 0ad1f4c943bdc9037d0c43d1b74c745befa065f0 we fixed a nasty bug
where a failure to persist a `ChannelManager` faster than a
`ChannelMonitor` could result in the loss of a `PaymentSent` event,
eventually resulting in a `PaymentFailed` instead!

As noted in that commit, there's still some risk, though its been
substantially reduced - if we receive an `update_fulfill_htlc`
message for an outbound payment, and persist the initial removal
`ChannelMonitorUpdate`, then respond with our own
`commitment_signed` + `revoke_and_ack`, followed by receiving our
peer's final `revoke_and_ack`, and then persist the
`ChannelMonitorUpdate` generated from that, all prior to completing
a `ChannelManager` persistence, we'll still forget the HTLC and
eventually trigger a `PaymentFailed` rather than the correct
`PaymentSent`.

Here we fully fix the issue by delaying the final
`ChannelMonitorUpdate` persistence until the `PaymentSent` event
has been processed and document the fact that a spurious
`PaymentFailed` event can still be generated for a sent payment.

The original fix in 0ad1f4c943bdc9037d0c43d1b74c745befa065f0 is
still incredibly useful here, allowing us to avoid blocking the
first `ChannelMonitorUpdate` until the event processing completes,
as this would cause us to add event-processing delay in our general
commitment update latency. Instead, we ultimately race the user
handling the `PaymentSent` event with how long it takes our
`revoke_and_ack` + `commitment_signed` to make it to our
counterparty and receive the response `revoke_and_ack`. This should
give the user plenty of time to handle the event before we need to
make progress.

Sadly, because we change our `ChannelMonitorUpdate` semantics, this
change requires a number of test changes, avoiding checking for a
post-RAA `ChannelMonitorUpdate` until after we process a
`PaymentSent` event. Note that this does not apply to payments we
learned the preimage for on-chain - ensuring `PaymentSent` events
from such resolutions will be addressed in a future PR. Thus, tests
which resolve payments on-chain switch to a direct call to the
`expect_payment_sent` function with the claim-expected flag unset.

Depends on #2111, is just the last commit on it. See discussion there for merge-ordering.

@TheBlueMatt TheBlueMatt force-pushed the 2023-02-sent-persist-order branch 5 times, most recently from 6789818 to 6cb3b5e Compare March 17, 2023 07:04
@codecov-commenter
Copy link

codecov-commenter commented Mar 17, 2023

Codecov Report

Patch coverage: 99.53% and project coverage change: +0.39% 🎉

Comparison is base (d4ad826) 90.40% compared to head (097fc94) 90.79%.

❗ Current head 097fc94 differs from pull request most recent head 31049ed. Consider uploading reports for the commit 31049ed to get more accurate results

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2112      +/-   ##
==========================================
+ Coverage   90.40%   90.79%   +0.39%     
==========================================
  Files         106      106              
  Lines       56268    59332    +3064     
  Branches    56268    59332    +3064     
==========================================
+ Hits        50868    53871    +3003     
- Misses       5400     5461      +61     
Files Changed Coverage Δ
lightning/src/events/mod.rs 44.44% <ø> (+1.92%) ⬆️
lightning/src/ln/channelmanager.rs 88.37% <97.22%> (+2.84%) ⬆️
lightning-invoice/src/utils.rs 97.67% <100.00%> (-0.03%) ⬇️
lightning/src/chain/chainmonitor.rs 95.01% <100.00%> (-0.03%) ⬇️
lightning/src/ln/chanmon_update_fail_tests.rs 98.71% <100.00%> (+0.01%) ⬆️
lightning/src/ln/channel.rs 91.42% <100.00%> (+1.95%) ⬆️
lightning/src/ln/functional_test_utils.rs 89.30% <100.00%> (+0.48%) ⬆️
lightning/src/ln/functional_tests.rs 98.26% <100.00%> (+0.11%) ⬆️
lightning/src/ln/monitor_tests.rs 98.52% <100.00%> (-0.01%) ⬇️
lightning/src/ln/outbound_payment.rs 91.93% <100.00%> (+1.68%) ⬆️
... and 3 more

... and 27 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@valentinewallace valentinewallace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me pending a second reviewer!

@TheBlueMatt
Copy link
Collaborator Author

Squashed without further changes.

@@ -1152,7 +1152,7 @@ impl OutboundPayments {
payment_id,
payment_hash,
path,
}, None));
}, Some(ev_completion_action)));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does PaymentPathSuccessful also need the completion action? There's no coverage here, setting it to None and all tests pass.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having it there means we won't lose PaymentPathSuccessful events on a rare restart edge case, but currently isn't externally visible and will only have an impact if we change event handling (eg where event handling can return an err) - because users always have to handle all events in one big batch before we remove them, you either fully handle events or you don't handle any events. If we move to event handling being able to return an err, a user could handle one event and not the other and lose PaymentPathSuccessful events, maybe. It doesn't matter much, really, but I'd prefer to leave it.

wpaulino
wpaulino previously approved these changes Aug 14, 2023
@@ -5928,10 +5938,16 @@ where
let peer_state = &mut *peer_state_lock;
match peer_state.channel_by_id.entry(msg.channel_id) {
hash_map::Entry::Occupied(mut chan) => {
let funding_txo = chan.get().context.get_funding_txo();
let (htlcs_to_fail, monitor_update_opt) = try_chan_entry!(self, chan.get_mut().revoke_and_ack(&msg, &self.fee_estimator, &self.logger), chan);
let funding_txo_opt = chan.get().context.get_funding_txo();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: use expect

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should have been added to this line, no? No big deal though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, unwraping on this line is unsafe and can panic, as we could receive an RAA message for a channel that isnt yet funded, and wouldnt fail until we call into channel...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, or would have, in the version of LDK when this code was written - before the channel split :)

wpaulino
wpaulino previously approved these changes Aug 16, 2023
@wpaulino
Copy link
Contributor

Ah, needs a rebase to fix some test compile errors.

@TheBlueMatt
Copy link
Collaborator Author

TheBlueMatt commented Aug 16, 2023

Rebased with the following additional diff:

$ git diff
diff --git a/lightning/src/ln/functional_tests.rs b/lightning/src/ln/functional_tests.rs
index 2aecf72d4..440d9df3e 100644
--- a/lightning/src/ln/functional_tests.rs
+++ b/lightning/src/ln/functional_tests.rs
@@ -10096,8 +10096,8 @@ fn do_test_multi_post_event_actions(do_reload: bool) {
                nodes[1].node.peer_disconnected(&nodes[0].node.get_our_node_id());
                nodes[2].node.peer_disconnected(&nodes[0].node.get_our_node_id());

-               reconnect_nodes(&nodes[0], &nodes[1], (false, false), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (false, false));
-               reconnect_nodes(&nodes[0], &nodes[2], (false, false), (0, 0), (0, 0), (0, 0), (0, 0), (0, 0), (false, false));
+               reconnect_nodes(ReconnectArgs::new(&nodes[0], &nodes[1]));
+               reconnect_nodes(ReconnectArgs::new(&nodes[0], &nodes[2]));
        }

        let events = nodes[0].node.get_and_clear_pending_events();
diff --git a/lightning/src/ln/payment_tests.rs b/lightning/src/ln/payment_tests.rs
index 44d7566b0..fe5bd37df 100644
--- a/lightning/src/ln/payment_tests.rs
+++ b/lightning/src/ln/payment_tests.rs
@@ -3746,7 +3746,7 @@ fn do_test_custom_tlvs_consistency(first_tlvs: Vec<(u64, Vec<u8>)>, second_tlvs:
                }

                do_claim_payment_along_route(&nodes[0], &[&[&nodes[1], &nodes[3]], &[&nodes[2], &nodes[3]]], false, our_payment_preimage);
-               expect_payment_sent(&nodes[0], our_payment_preimage, Some(Some(2000)), true);
+               expect_payment_sent(&nodes[0], our_payment_preimage, Some(Some(2000)), true, true);
        } else {
                // Expect fail back
                let expected_destinations = vec![HTLCDestination::FailedPayment { payment_hash: our_payment_hash }];

wpaulino
wpaulino previously approved these changes Aug 16, 2023
@valentinewallace
Copy link
Contributor

Needs rebase :(

This is a trivial refactor which will be used in the next commit.
In 0ad1f4c we fixed a nasty bug
where a failure to persist a `ChannelManager` faster than a
`ChannelMonitor` could result in the loss of a `PaymentSent` event,
eventually resulting in a `PaymentFailed` instead!

As noted in that commit, there's still some risk, though its been
substantially reduced - if we receive an `update_fulfill_htlc`
message for an outbound payment, and persist the initial removal
`ChannelMonitorUpdate`, then respond with our own
`commitment_signed` + `revoke_and_ack`, followed by receiving our
peer's final `revoke_and_ack`, and then persist the
`ChannelMonitorUpdate` generated from that, all prior to completing
a `ChannelManager` persistence, we'll still forget the HTLC and
eventually trigger a `PaymentFailed` rather than the correct
`PaymentSent`.

Here we fully fix the issue by delaying the final
`ChannelMonitorUpdate` persistence until the `PaymentSent` event
has been processed and document the fact that a spurious
`PaymentFailed` event can still be generated for a sent payment.

The original fix in 0ad1f4c is
still incredibly useful here, allowing us to avoid blocking the
first `ChannelMonitorUpdate` until the event processing completes,
as this would cause us to add event-processing delay in our general
commitment update latency. Instead, we ultimately race the user
handling the `PaymentSent` event with how long it takes our
`revoke_and_ack` + `commitment_signed` to make it to our
counterparty and receive the response `revoke_and_ack`. This should
give the user plenty of time to handle the event before we need to
make progress.

Sadly, because we change our `ChannelMonitorUpdate` semantics, this
change requires a number of test changes, avoiding checking for a
post-RAA `ChannelMonitorUpdate` until after we process a
`PaymentSent` event. Note that this does not apply to payments we
learned the preimage for on-chain - ensuring `PaymentSent` events
from such resolutions will be addressed in a future PR. Thus, tests
which resolve payments on-chain switch to a direct call to the
`expect_payment_sent` function with the claim-expected flag unset.
@TheBlueMatt
Copy link
Collaborator Author

Rebased.

@TheBlueMatt TheBlueMatt merged commit cb923c6 into lightningdevkit:main Aug 21, 2023
12 of 14 checks passed
Async Support automation moved this from In Progress to Done Aug 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

None yet

4 participants