Skip to content

peer: add new abstract message router#8434

Merged
Roasbeef merged 0 commit intolightningnetwork:protofsmfrom
Roasbeef:peer-msg-router
Mar 5, 2024
Merged

peer: add new abstract message router#8434
Roasbeef merged 0 commit intolightningnetwork:protofsmfrom
Roasbeef:peer-msg-router

Conversation

@Roasbeef
Copy link
Copy Markdown
Member

@Roasbeef Roasbeef commented Jan 27, 2024

In this commit, we add a new abstract message router. Over time, the goal is that this message router replaces the logic we currently have in the readHandler (the giant switch for each message).

With this new abstraction, can reduce the responsibilities of the readHandler to just reading messages off the wire and handing them off to the msg router. The readHandler no longer needs to know where the messages should go, or how they should be dispatched.

This will be used in tandem with the new protofsm module in an upcoming PR implementing the new rbf-coop close.

Summary by CodeRabbit

  • New Features
    • Introduced a new message routing system to enhance communication and handling of messages between peers, improving network reliability and performance.

@Roasbeef Roasbeef added this to the v0.18.0 milestone Jan 27, 2024
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 27, 2024

Important

Auto Review Skipped

Auto reviews are disabled on base/target branches other than the default branch. Please add the base/target branch pattern to the list of additional branches to be reviewed in the settings.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository.

To trigger a single review, invoke the @coderabbitai review command.

Walkthrough

This update introduces a sophisticated message router designed to efficiently handle incoming wire messages within a network of peers. The implementation encompasses interfaces for message endpoints and the router, alongside a concrete peer-specific router version. Complementing the implementation, a comprehensive test suite ensures the router's functionality, focusing on adding and removing endpoints, message routing, and error management, is thoroughly verified.

Changes

Files Change Summary
peer/msg_router.go Added a new message router with interfaces for endpoints and the router.
peer/msg_router_test.go Introduced tests for the message router, including endpoint management, message routing, and error handling.

🐇✨
In the realm of code, where messages flow,
A router arises, with logic aglow.
Through tests, it's tried, a steadfast guide,
Ensuring messages, to their end, glide.
🚀🐾

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit-tests for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit tests for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit tests.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

CodeRabbit Discord Community

Join our Discord Community to get help, request features, and share feedback.

@Roasbeef
Copy link
Copy Markdown
Member Author

@coderabbitai review

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits Files that changed from the base of the PR and between 0d37621 and 6391d0a.
Files selected for processing (2)
  • peer/msg_router.go (1 hunks)
  • peer/msg_router_test.go (1 hunks)
Additional comments: 4
peer/msg_router_test.go (1)
  • 11-31: The mockEndpoint struct and its methods (Name, CanRouteTo, SendMessage) are correctly implemented using the testify mock framework. This setup is appropriate for testing the behavior of the message router with mocked endpoints.
peer/msg_router.go (3)
  • 12-19: The error definitions ErrDuplicateEndpoint and ErrUnableToRouteMsg are clear and appropriately scoped to the functionality of the message router. This is a good practice for error handling, making it easier to identify and handle specific errors.
  • 26-40: The MsgEndpoint interface is well-defined, encapsulating the necessary operations for a message endpoint. The separation of concerns is evident, with methods for identifying the endpoint, checking message routability, and handling messages. This design adheres to the SOLID principles, particularly the Single Responsibility Principle.
  • 42-59: The MsgRouter interface definition is concise and captures the essence of a message router's responsibilities: registering/unregistering endpoints and routing messages. This abstraction allows for flexibility in implementation, which is a good design choice.

Comment on lines +33 to +146
// TestMessageRouterOperation tests the basic operation of the message router:
// add new endpoints, route to them, remove, them, etc.
func TestMessageRouterOperation(t *testing.T) {
msgRouter := NewPeerMsgRouter()
msgRouter.Start()
defer msgRouter.Stop()

openChanMsg := &lnwire.OpenChannel{}
commitSigMsg := &lnwire.CommitSig{}

errorMsg := &lnwire.Error{}

// For this test, we'll have two endpoints, each with distinct names.
// One endpoint will only handle OpenChannel, while the other will
// handle the CommitSig message.
fundingEndpoint := &mockEndpoint{}
fundingEndpointName := "funding"
fundingEndpoint.On("Name").Return(fundingEndpointName)
fundingEndpoint.On("CanRouteTo", openChanMsg).Return(true)
fundingEndpoint.On("CanRouteTo", errorMsg).Return(false)
fundingEndpoint.On("CanRouteTo", commitSigMsg).Return(false)
fundingEndpoint.On("SendMessage", openChanMsg).Return(true)

commitEndpoint := &mockEndpoint{}
commitEndpointName := "commit"
commitEndpoint.On("Name").Return(commitEndpointName)
commitEndpoint.On("CanRouteTo", commitSigMsg).Return(true)
commitEndpoint.On("CanRouteTo", openChanMsg).Return(false)
commitEndpoint.On("CanRouteTo", errorMsg).Return(false)
commitEndpoint.On("SendMessage", commitSigMsg).Return(true)

t.Run("add endpoints", func(t *testing.T) {
// First, we'll add the funding endpoint to the router.
require.NoError(t, msgRouter.RegisterEndpoint(fundingEndpoint))

// There should be a single endpoint registered.
require.Len(t, msgRouter.Endpoints(), 1)

// The name of the registered endpoint should be "funding".
require.Equal(t, "funding", msgRouter.Endpoints()[0].Name())
})

t.Run("duplicate endpoint reject", func(t *testing.T) {
// Next, we'll attempt to add the funding endpoint again. This
// should return an ErrDuplicateEndpoint error.
require.ErrorIs(
t, msgRouter.RegisterEndpoint(fundingEndpoint),
ErrDuplicateEndpoint,
)
})

t.Run("route to endpoint", func(t *testing.T) {
// Next, we'll add our other endpoint, then attempt to route a
// message.
require.NoError(t, msgRouter.RegisterEndpoint(commitEndpoint))

// If we try to route an message none of the endpoints know of,
// we should get an error.
require.ErrorIs(
t, msgRouter.RouteMsg(errorMsg), ErrUnableToRouteMsg,
)

fundingEndpoint.AssertCalled(t, "CanRouteTo", errorMsg)
commitEndpoint.AssertCalled(t, "CanRouteTo", errorMsg)

// Next, we'll route the open channel message. Only the
// fundingEndpoint should be used.
require.NoError(t, msgRouter.RouteMsg(openChanMsg))

fundingEndpoint.AssertCalled(t, "CanRouteTo", openChanMsg)
commitEndpoint.AssertCalled(t, "CanRouteTo", openChanMsg)

fundingEndpoint.AssertCalled(t, "SendMessage", openChanMsg)
commitEndpoint.AssertNotCalled(t, "SendMessage", openChanMsg)

// We'll do the same for the commit sig message.
require.NoError(t, msgRouter.RouteMsg(commitSigMsg))

fundingEndpoint.AssertCalled(t, "CanRouteTo", commitSigMsg)
commitEndpoint.AssertCalled(t, "CanRouteTo", commitSigMsg)

commitEndpoint.AssertCalled(t, "SendMessage", commitSigMsg)
fundingEndpoint.AssertNotCalled(t, "SendMessage", commitSigMsg)
})

t.Run("remove endpoints", func(t *testing.T) {
// Finally, we'll remove both endpoints.
require.NoError(
t, msgRouter.UnregisterEndpoint(fundingEndpointName),
)
require.NoError(
t, msgRouter.UnregisterEndpoint(commitEndpointName),
)

// There should be no endpoints registered.
require.Len(t, msgRouter.Endpoints(), 0)

// Trying to route a message should fail.
require.ErrorIs(
t, msgRouter.RouteMsg(openChanMsg),
ErrUnableToRouteMsg,
)
require.ErrorIs(
t, msgRouter.RouteMsg(commitSigMsg),
ErrUnableToRouteMsg,
)
})

commitEndpoint.AssertExpectations(t)
fundingEndpoint.AssertExpectations(t)
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test TestMessageRouterOperation is well-structured and covers several scenarios including adding endpoints, rejecting duplicate endpoints, routing to the correct endpoint, and removing endpoints. However, there are a few areas for improvement:

  1. The test could benefit from more detailed comments explaining the purpose of each sub-test for better readability.
  2. Consider adding a test case for the scenario where an endpoint can route to multiple message types, ensuring the router's flexibility and correctness in handling such cases.
+ // Test case for an endpoint that can route to multiple message types.
+ t.Run("multi-message endpoint", func(t *testing.T) {
+     // Setup and assertions here...
+ })

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation.

Suggested change
// TestMessageRouterOperation tests the basic operation of the message router:
// add new endpoints, route to them, remove, them, etc.
func TestMessageRouterOperation(t *testing.T) {
msgRouter := NewPeerMsgRouter()
msgRouter.Start()
defer msgRouter.Stop()
openChanMsg := &lnwire.OpenChannel{}
commitSigMsg := &lnwire.CommitSig{}
errorMsg := &lnwire.Error{}
// For this test, we'll have two endpoints, each with distinct names.
// One endpoint will only handle OpenChannel, while the other will
// handle the CommitSig message.
fundingEndpoint := &mockEndpoint{}
fundingEndpointName := "funding"
fundingEndpoint.On("Name").Return(fundingEndpointName)
fundingEndpoint.On("CanRouteTo", openChanMsg).Return(true)
fundingEndpoint.On("CanRouteTo", errorMsg).Return(false)
fundingEndpoint.On("CanRouteTo", commitSigMsg).Return(false)
fundingEndpoint.On("SendMessage", openChanMsg).Return(true)
commitEndpoint := &mockEndpoint{}
commitEndpointName := "commit"
commitEndpoint.On("Name").Return(commitEndpointName)
commitEndpoint.On("CanRouteTo", commitSigMsg).Return(true)
commitEndpoint.On("CanRouteTo", openChanMsg).Return(false)
commitEndpoint.On("CanRouteTo", errorMsg).Return(false)
commitEndpoint.On("SendMessage", commitSigMsg).Return(true)
t.Run("add endpoints", func(t *testing.T) {
// First, we'll add the funding endpoint to the router.
require.NoError(t, msgRouter.RegisterEndpoint(fundingEndpoint))
// There should be a single endpoint registered.
require.Len(t, msgRouter.Endpoints(), 1)
// The name of the registered endpoint should be "funding".
require.Equal(t, "funding", msgRouter.Endpoints()[0].Name())
})
t.Run("duplicate endpoint reject", func(t *testing.T) {
// Next, we'll attempt to add the funding endpoint again. This
// should return an ErrDuplicateEndpoint error.
require.ErrorIs(
t, msgRouter.RegisterEndpoint(fundingEndpoint),
ErrDuplicateEndpoint,
)
})
t.Run("route to endpoint", func(t *testing.T) {
// Next, we'll add our other endpoint, then attempt to route a
// message.
require.NoError(t, msgRouter.RegisterEndpoint(commitEndpoint))
// If we try to route an message none of the endpoints know of,
// we should get an error.
require.ErrorIs(
t, msgRouter.RouteMsg(errorMsg), ErrUnableToRouteMsg,
)
fundingEndpoint.AssertCalled(t, "CanRouteTo", errorMsg)
commitEndpoint.AssertCalled(t, "CanRouteTo", errorMsg)
// Next, we'll route the open channel message. Only the
// fundingEndpoint should be used.
require.NoError(t, msgRouter.RouteMsg(openChanMsg))
fundingEndpoint.AssertCalled(t, "CanRouteTo", openChanMsg)
commitEndpoint.AssertCalled(t, "CanRouteTo", openChanMsg)
fundingEndpoint.AssertCalled(t, "SendMessage", openChanMsg)
commitEndpoint.AssertNotCalled(t, "SendMessage", openChanMsg)
// We'll do the same for the commit sig message.
require.NoError(t, msgRouter.RouteMsg(commitSigMsg))
fundingEndpoint.AssertCalled(t, "CanRouteTo", commitSigMsg)
commitEndpoint.AssertCalled(t, "CanRouteTo", commitSigMsg)
commitEndpoint.AssertCalled(t, "SendMessage", commitSigMsg)
fundingEndpoint.AssertNotCalled(t, "SendMessage", commitSigMsg)
})
t.Run("remove endpoints", func(t *testing.T) {
// Finally, we'll remove both endpoints.
require.NoError(
t, msgRouter.UnregisterEndpoint(fundingEndpointName),
)
require.NoError(
t, msgRouter.UnregisterEndpoint(commitEndpointName),
)
// There should be no endpoints registered.
require.Len(t, msgRouter.Endpoints(), 0)
// Trying to route a message should fail.
require.ErrorIs(
t, msgRouter.RouteMsg(openChanMsg),
ErrUnableToRouteMsg,
)
require.ErrorIs(
t, msgRouter.RouteMsg(commitSigMsg),
ErrUnableToRouteMsg,
)
})
commitEndpoint.AssertExpectations(t)
fundingEndpoint.AssertExpectations(t)
}
// TestMessageRouterOperation tests the basic operation of the message router:
// add new endpoints, route to them, remove, them, etc.
func TestMessageRouterOperation(t *testing.T) {
msgRouter := NewPeerMsgRouter()
msgRouter.Start()
defer msgRouter.Stop()
openChanMsg := &lnwire.OpenChannel{}
commitSigMsg := &lnwire.CommitSig{}
errorMsg := &lnwire.Error{}
// For this test, we'll have two endpoints, each with distinct names.
// One endpoint will only handle OpenChannel, while the other will
// handle the CommitSig message.
fundingEndpoint := &mockEndpoint{}
fundingEndpointName := "funding"
fundingEndpoint.On("Name").Return(fundingEndpointName)
fundingEndpoint.On("CanRouteTo", openChanMsg).Return(true)
fundingEndpoint.On("CanRouteTo", errorMsg).Return(false)
fundingEndpoint.On("CanRouteTo", commitSigMsg).Return(false)
fundingEndpoint.On("SendMessage", openChanMsg).Return(true)
commitEndpoint := &mockEndpoint{}
commitEndpointName := "commit"
commitEndpoint.On("Name").Return(commitEndpointName)
commitEndpoint.On("CanRouteTo", commitSigMsg).Return(true)
commitEndpoint.On("CanRouteTo", openChanMsg).Return(false)
commitEndpoint.On("CanRouteTo", errorMsg).Return(false)
commitEndpoint.On("SendMessage", commitSigMsg).Return(true)
t.Run("add endpoints", func(t *testing.T) {
// First, we'll add the funding endpoint to the router.
require.NoError(t, msgRouter.RegisterEndpoint(fundingEndpoint))
// There should be a single endpoint registered.
require.Len(t, msgRouter.Endpoints(), 1)
// The name of the registered endpoint should be "funding".
require.Equal(t, "funding", msgRouter.Endpoints()[0].Name())
})
t.Run("duplicate endpoint reject", func(t *testing.T) {
// Next, we'll attempt to add the funding endpoint again. This
// should return an ErrDuplicateEndpoint error.
require.ErrorIs(
t, msgRouter.RegisterEndpoint(fundingEndpoint),
ErrDuplicateEndpoint,
)
})
t.Run("route to endpoint", func(t *testing.T) {
// Next, we'll add our other endpoint, then attempt to route a
// message.
require.NoError(t, msgRouter.RegisterEndpoint(commitEndpoint))
// If we try to route an message none of the endpoints know of,
// we should get an error.
require.ErrorIs(
t, msgRouter.RouteMsg(errorMsg), ErrUnableToRouteMsg,
)
fundingEndpoint.AssertCalled(t, "CanRouteTo", errorMsg)
commitEndpoint.AssertCalled(t, "CanRouteTo", errorMsg)
// Next, we'll route the open channel message. Only the
// fundingEndpoint should be used.
require.NoError(t, msgRouter.RouteMsg(openChanMsg))
fundingEndpoint.AssertCalled(t, "CanRouteTo", openChanMsg)
commitEndpoint.AssertCalled(t, "CanRouteTo", openChanMsg)
fundingEndpoint.AssertCalled(t, "SendMessage", openChanMsg)
commitEndpoint.AssertNotCalled(t, "SendMessage", openChanMsg)
// We'll do the same for the commit sig message.
require.NoError(t, msgRouter.RouteMsg(commitSigMsg))
fundingEndpoint.AssertCalled(t, "CanRouteTo", commitSigMsg)
commitEndpoint.AssertCalled(t, "CanRouteTo", commitSigMsg)
commitEndpoint.AssertCalled(t, "SendMessage", commitSigMsg)
fundingEndpoint.AssertNotCalled(t, "SendMessage", commitSigMsg)
})
t.Run("remove endpoints", func(t *testing.T) {
// Finally, we'll remove both endpoints.
require.NoError(
t, msgRouter.UnregisterEndpoint(fundingEndpointName),
)
require.NoError(
t, msgRouter.UnregisterEndpoint(commitEndpointName),
)
// There should be no endpoints registered.
require.Len(t, msgRouter.Endpoints(), 0)
// Trying to route a message should fail.
require.ErrorIs(
t, msgRouter.RouteMsg(openChanMsg),
ErrUnableToRouteMsg,
)
require.ErrorIs(
t, msgRouter.RouteMsg(commitSigMsg),
ErrUnableToRouteMsg,
)
})
// Test case for an endpoint that can route to multiple message types.
t.Run("multi-message endpoint", func(t *testing.T) {
// Setup and assertions here...
})
commitEndpoint.AssertExpectations(t)
fundingEndpoint.AssertExpectations(t)
}

@Roasbeef Roasbeef requested review from ellemouton and removed request for yyforyongyu January 27, 2024 03:31
Copy link
Copy Markdown
Collaborator

@ellemouton ellemouton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice!

Main question is around if we need to make this MsgHandler peer specific? Looks to me like this implementation of MsgHandler is very generic and can be re-used as is anywhere where a MsgHandler would be needed.

Also - should we maybe start using this in this PR by replacing the readHandler?

// sub-system capable of routing any incoming wire message to a set of
// registered endpoints.
//
// TODO(roasbeef): move to diff sub-system?
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah agreed. Then can rename peer.PeerMsgRouter to peer.MsgRouter and reduce stuttering

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can move to protofsm. Also the case that with the latest commits in that PR, a default state machine now also implements this new interface.

So in the case of the new rbf-coop state machine, the init process would be:

  • make state machine
  • add as endpoint
  • let readHandler handle sending the messages to it in an opaque manner

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually circling back to this, if we adhere to these guidelines: https://go.dev/wiki/CodeReviewComments#interfaces

Go interfaces generally belong in the package that uses values of the interface type, not the package that implements those values.

Then the interface is in the correct place, as the peer package uses the MsgEndpoint interface. The MsgRouter interface isn't strictly required, but I view it as sort of package level documentation. The PR that starts to hook it up uses the interface, mainly to not be tightly coupled to the current concrete impl: https://github.com/lightningnetwork/lnd/blob/8692815610e22710621ae1934f25c1e731f1a9d5/peer/brontide.go#L505-L507


// PeerMsgRouter is a type of message router that is capable of routing new
// incoming messages that have been read off the wire by the peer.
type PeerMsgRouter struct {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is "peer" specific about this MsgRouter? Wouldnt this kinda be what all impls of MsgRouter would look like? and if so - do we need a msg router interface? this seems like a complete component that can just be plugged in anywhere that needs a MsgRouter

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah nothing peer specific (added mainly to differentiate between the interface and concrete impl). Interface in place mainly so the peer struct itself declares what it wants, allowing it to avoid depending on the concrete type.


// HandleMessage handles the target message, and returns true if the
// message was able to be processed.
SendMessage(msg lnwire.Message) bool
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason not to combine these two?

So like SendMessage(lnwire.Message) bool, bool

where the first bool is "success" and the second one is "handled"?

Just thinking about the case where the underlying impl is dynamic and could return true for CanRouteTo but then something changes between when that is called and when SendMessage is called

Copy link
Copy Markdown
Member Author

@Roasbeef Roasbeef Jan 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends on the type of behavior we want for the router: should it be cautious, or optimistic? This also depends on if we want to allow multiple endpoints to handle the same message or not.

Today it's cautious in that it tries to makes sure the endpoint can accept the message before routing to it. Without this, it would call YOLO call SendMessage on everything, then later see if it was able to be routed at all. I think this also allows us to implement more conservative modes in the future, where we enforce that only a single endpoint can accept an incoming message.

Just thinking about the case where the underlying impl is dynamic and could return true for CanRouteTo but then something changes between when that is called and when SendMessage is called

IIUC, I think the behavior is the same if we have it be two phase, vs the optimistic single shot:

Two phase:

  • We call CanRouteTo, that returns true.
  • We try to send send the message via SendMessage that returns false.
  • Fail back to caller of RouteMsg with an error.

Single shot:

  • We call SendMessage, that calls CanRouteTo internally (how else does it know if it can handle the message or not?). That returns false.
  • Fail back to caller of RouteMsg with an error.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be very resistant to having multiple subsystems "handle" a particular message. There are cases where it's fine but they are the exception and not the rule. I actually wonder if we want to invert the way we do this. Rather than asking the subsystem "can you handle message X". The subsystem lists all of the messages it handles. This would allow us to more aggressively snipe conflicts.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I don't have any use case in mind today where a multiple sub-systems would handle a single message.

However, if we want to explicitly prevent that, then I think we need to make a decision re ordering as:

  • With the way registration works today, you don't know ahead of time if two sub-systems would potentially want the same message.
  • You can't base it off just the message type as there'll be many co-op close state machines active for each channel undergoing shutdown.

Given the above, we'd either need to just short circuit things (first one that can route, gets it, need to maintain registration order now), or create some sort of unique keys for each (channel, msg) pair and base uniqueness off of that.

Given we don't yet have a scenario in which a multiple sub-systems would accept the same message, I'm inclined to stick with the current design, and update the godoc strings to note that there're no guarantees re unique handling.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The subsystem lists all of the messages it handles.

I don't think that's sufficient, as if we have N channels doing concurrent co-op closes, we'd have N endpoints that all want a shutdown message, but one scoped to a particular channel. So in this case, you'd have (shutdown, chanID) as the unique key.

I started to go down this line a bit to improve the routing performance (map look up vs iteration), but it felt very premature, as dispatching likely won't be the bottleneck, and wanted to see how things are used more in the wild before restricting the application space.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see what's going on now. You want the message router to not just route based off of message type but also based off of channel id too. In that case, I think you're right. However, I would love to get to a point where all messages regardless of type, but that are scoped to a channel id have one top level system they run through. This is the ChannelLifecycle idea again.

@Roasbeef
Copy link
Copy Markdown
Member Author

Main question is around if we need to make this MsgHandler peer specific? Looks to me like this implementation of MsgHandler is very generic and can be re-used as is anywhere where a MsgHandler would be needed.

By peer specific, do you mean localized to a given peer connection? The interface is more generic, but the current implementation is intended to be used in a localized version for each peer, while the endpoints may be global. An example of this would be creating a FundingManager endpoint (global), then having each peer router (local) funnel funding related messages to that.

You're right tho that the Peer name inclusion in the default router impl isn't 100% necessary. Initially I was going to put the interface in the protofsm package, but then decided to just leave it here for now to be moved later.

Also - should we maybe start using this in this PR by replacing the readHandler?

Goal in this PR is just to add it, then allow follow up PRs to clean up the sections as needed. With the next PR, we'll gain a new endpoint for the new rbf-coop state machine, with those messages being sent there directly, rather than into the channel in the peer, which then has the handle method to dispatch them.

Copy link
Copy Markdown
Collaborator

@ProofOfKeags ProofOfKeags left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main consideration is the idea of allowing multiple dispatch. If we want to have a high level of assurance of implementation, we need an "owner" or a system who is responsible for ensuring a message is properly handled. Splitting that responsibility across multiple systems at the router layer is going to be counterproductive towards that end. In the case that we want to split responsibility of handling messages that should happen internal to a single system that handles a message.

If you disagree and insist on having multiple dispatch, it is a requirement that we ensure that the different subsystems that handle a message have completely independent state. Those systems must be prohibited from talking to each other or sharing state in any other way. If we don't then we're in trouble. The reason I'm hesitant to go this way is that there's no real place to put that requirement either in testing or in documentation. It is a "lore" requirement. Less "lore" is shorter ramp times for new contributors, and lower defect rate.


// HandleMessage handles the target message, and returns true if the
// message was able to be processed.
SendMessage(msg lnwire.Message) bool
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be very resistant to having multiple subsystems "handle" a particular message. There are cases where it's fine but they are the exception and not the rule. I actually wonder if we want to invert the way we do this. Rather than asking the subsystem "can you handle message X". The subsystem lists all of the messages it handles. This would allow us to more aggressively snipe conflicts.

RegisterEndpoint(MsgEndpoint) error

// UnregisterEndpoint unregisters the target endpoint from the router.
UnregisterEndpoint(EndPointName) error
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a smell to me. Dynamically changing the router state is not something we want to imply is a good idea. I can be convinced otherwise but I think the message router should make sure all messages are handle-able at all times.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't support unregistering, then how would we ever clean up the resources allocated to being able to route a given endpoint (in this case, the entry in the map)?

If we think of the co-op close case, a new endpoint is only registered after we send/recv shutdown. Once the channel has closed on-chain, there's no reason for the endpoints to hang around.

I'd categorize this as the micro view: all unique execution instances have their own endpoint. We can contrast with the macro view wherein: all unique messages have their own endpoint, with another msg router instance existing at that layer to further demultiplex.

This PR implements micro routing. While I think if we want to have all message endpoints be registered (and permanent) at start up, we'd go with the macro route. At this point, not sure which is preferable. You can also build the macro version with the abstractions laid out in this PR (msg router hooks into another msg router).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm definitely partial to going the macro route. It helps with exhaustiveness analysis. If we allow unregistering it means that there are messages that are only handleable at some times. To take your coop close case, I'm taking you to mean that ClosingSigned or ClosingComplete is only valid for the lifetime of the negotiation process. The approach you lay out here makes the structure of that logic implicit in the runtime execution of the code rather than explicit in the source where we have a check that says if our channel is in state X, message Y is not allowed.

Implicit logic is error prone for living codebases. It means that I have to keep certain contextual facts in my mind that are not reflected in the source. These assumptions may or may not change as time goes on. As a result, if you have not laid eyes on the code for a significant period of time, learning of this hidden assumption is not guaranteed. Contrast with the macro route where I can say "hmm let's see how message X from the bolt spec gets handled throughout the code" and then following it.

I do think this improves on the state of affairs but the dynamic nature of it is something I'm suspicious of.

@Roasbeef Roasbeef changed the base branch from master to coop-close-v2-wire-messages February 1, 2024 03:53
@Roasbeef Roasbeef changed the base branch from coop-close-v2-wire-messages to master February 1, 2024 03:53
@Roasbeef Roasbeef changed the base branch from master to protofsm February 1, 2024 03:54
@Roasbeef Roasbeef changed the base branch from protofsm to master February 1, 2024 03:54
@Roasbeef Roasbeef changed the base branch from master to protofsm February 7, 2024 02:46
@Roasbeef Roasbeef requested a review from Crypt-iQ as a code owner February 7, 2024 02:46
@Roasbeef
Copy link
Copy Markdown
Member Author

Roasbeef commented Feb 7, 2024

Pushed up a new diff (r3): https://reviewable.io/reviews/lightningnetwork/lnd/8434

@morehouse
Copy link
Copy Markdown
Collaborator

I'm not sure I understand the motivation for this change. It seems like the goal is to simplify readHandler, but then this new abstract message router seems to increase complexity overall.

Today it's very easy to understand how messages get routed by looking at the switch statement in readHandler. If we start using this new message router, the actual behavior will be more obfuscated.

@Roasbeef
Copy link
Copy Markdown
Member Author

Roasbeef commented Feb 15, 2024

Today it's very easy to understand how messages get routed by looking at the switch statement in readHandler. If we start using this new message router, the actual behavior will be more obfuscated.

I think the same could be said for most instances where a new abstraction and/or layer of indirection is created to create looser coupling or allow logic to be moved elsewhere. Sure, breaking up that 1000 line function may at the surface level obfuscate things a bit more, but the refactoring process broke up the functions into smaller components that are easier to test, and also grok at a glance.

Stepping back, today peer/brontide.go is a 4045 line file. The file is so large, as initially it was peer.go at the top level package, and was mainly concerned with some basic initialization, but primarily p2p feature negotiation. However, over time the peer.go file grew to understand all relevant messages we want to send/recv, and also handle figuring out where to send these messages to, and in many cases actually handle the messages directly which means in-line protocol logic+awareness.

The goal of this PR is to be able to pare back the peer structs responsibilities significantly, reducing it to mainly upfront initialization of relevant sub-systems and a place to figure out which feature bits were negotiated. This PR takes an initial step towards that by removing the responsibility of figuring out which messages need to be handled and how to handle them. What remains is logic mostly related to the life cycle management of the link (new channel created, co-op close starting, handle channel reest, send channel as link to switch, etc). IMO, this can also be extracted as well into a new sub-system with the peer simply handing over relevant messages like channel_ready, and channel_reest, etc.

In my mind, a minimal peer struct only needs to handle:

  • writing messages to the wire
  • reading them from the wire, handing off elsewhere for dispatch
  • ping timeout logic
  • feature bit negotiation

@morehouse
Copy link
Copy Markdown
Collaborator

I'm not opposed to reducing the scope of Brontide, but I'm not sure this PR is a good tradeoff.

The switch statement in readHandler is currently ~100 lines of simple straightforward code. This PR introduces 274 lines of code for MultiMsgRouter and MsgEndpoint. To actually use the new interface, we will need to encapsulate the logic of each case of the switch statement in a MsgEndpoint plus extra boilerplate to declare the endpoint and implement all 3 methods. We will likely end up replacing ~100 lines of code with ~500 lines of code.

The new MultiMsgRouter also introduces new complexity and opportunities for bugs:

  • endpoints can be registered and unregistered dynamically while the router is running
  • messages can be routed to multiple endpoints

These features make it quite difficult to know what the routing behavior will be at any point in time, since we don't know what endpoints are currently registered and whether more than one endpoint will act on the same message. We have taken code that was easy to analyze statically and made it unnecessarily complicated.


We should be able to move the message routing logic to its own module without bloating the code or increasing complexity. The routing behavior could continue to be static and many potential bugs avoided.

@Crypt-iQ
Copy link
Copy Markdown
Collaborator

Is there a follow-up PR incoming that I could look at to see how this would fit together? Otherwise, imo it's hard to compare the original code vs this.

@lightninglabs-deploy
Copy link
Copy Markdown
Collaborator

@Crypt-iQ: review reminder
@Roasbeef, remember to re-request review from reviewers when ready

@Roasbeef
Copy link
Copy Markdown
Member Author

Is there a follow-up PR incoming that I could look at to see how this would fit together? Otherwise, imo it's hard to compare the original code vs this.

Yeah this is at #8453, cleaning up the commits still, but you can see how the main loop would look like eventually here:

@Roasbeef
Copy link
Copy Markdown
Member Author

Roasbeef commented Feb 29, 2024

@morehouse I think this PR needs to be viewed in context with the protofsm PR and the rbf-close PR above. For new sub-systems created using the protofsm pattern, the FSM executor already has all the methods needed to be used as a MsgEndpoint, so you actually don't need unique boiler plate for each new sub-system one wishes to route to. Instead, you implement the states, then make the FSM executor using those states, and you're already able to register the endpoint from that.

In terms of the existing messages, although there're about half a dozen messages handled in the main readHandler loop, they only go to around 3 unique sub-systems, so I think the amount of extra boiler plate would be minimal. Eventually this msg router logic can also replace the existing msg stream usage, so activeMsgStreams and the msgStream struct itself.

These features make it quite difficult to know what the routing behavior will be at any point in time, since we don't know what endpoints are currently registered and whether more than one endpoint will act on the same message

You'll still be able to analyze all the registered endpoints in a single place, as all registration will happen when the peer struct is created (if you don't register at the start, if a message comes along it'll be unhandled), so there can still be a single place where all the endpoints are registered (loadActiveChannels or w/e that ends up being refactored into).

messages can be routed to multiple endpoints

See above for the convo re how to restrict that if we wish. Given the general MsgRouter interface, one could create something like a UniqueMsgRouter that enforces a 1:1 mapping between messages and endpoints.


This PR would also allow users to re-instantiate the peer struct outside of lnd in a new Go program, and be able to add handlers for new custom messages or other protocol features without modifying the lnd code.

In your mind, how can we achieve something like that while also requiring all message handling to be statically declared in a single place? When you say statically declared, I take that to mean that:

  1. All the messages are enumerated via a switch statement with the core type.
  2. The peer then also needs to know where the message should be sent to, which means include a ref to that as an attribute in the peer struct.

I totally agree that at times tight coupling of components can mean less code, whereas loose coupling invariably means a layer of indirection which can add more code. Ultimately, I think the flexibility is worth it, as the peer struct has gotten so bloated over the past few years with the addition of new features.

@Roasbeef Roasbeef requested a review from morehouse as a code owner February 29, 2024 22:47
@Roasbeef Roasbeef merged commit b8a229f into lightningnetwork:protofsm Mar 5, 2024
@Roasbeef
Copy link
Copy Markdown
Member Author

Roasbeef commented Mar 5, 2024

Only pushed up commits I had locally, not sure why it shows the branch as being merged here in the UI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

6 participants