[1/4] - protofsm: add new package for driving generic protocol FSMs by Roasbeef · Pull Request #8337 · lightningnetwork/lnd

Roasbeef · 2024-01-03T01:39:52Z

In this PR, we create a new package, protofsm, which is intended to
abstract away something we've done dozens of times in the daemon:
creating a new event-driven protocol FSM. Examples of this are the co-op
close state machine and the channel state machine itself.

This package picks out the common themes of:

* clear states and transitions between them
* calling out to special daemon adapters for I/O such as transaction
  broadcast or sending a message to a peer
* cleaning up after state machine execution
* notifying relevant callers of updates to the state machine

The goal of this PR is that devs can now implement a state machine
based on this primary interface:

// State defines an abstract state along with its state transition function,
// which takes as input an event and an environment, and returns a state
// transition (next state, and set of events to emit). A state can also either
// be terminal, or not; a terminal state causes state execution to halt.
type State[Event any, Env Environment] interface {
	// ProcessEvent takes an event and an environment, and returns a new
	// state transition. This will be iteratively called until either a
	// terminal state is reached, or no further internal events are
	// emitted.
	ProcessEvent(event Event, env Env) (*StateTransition[Event, Env], error)

	// IsTerminal returns true if this state is terminal, and false otherwise.
	IsTerminal() bool
}

This lets them focus only on each state transition, rather than on all
the boilerplate involved (processing new events, advancing to
completion, doing I/O, etc.).

Instead, they just make their states, then create the state machine
given the starting state and env. The only other custom component needed
is something capable of mapping wire messages or other events from the
"outside world" into the domain of the state machine.
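As a rough illustration of the workflow described above, here is a minimal sketch of a toy machine built on the quoted `State` interface. The `Environment` method, the fields of `StateTransition`, the `drive` helper, and the `idle`/`running`/`done` states are all assumptions made for this example; the real executor in the PR handles far more (daemon I/O, cleanup, notifications).

```go
package main

import "fmt"

// Environment is a stand-in constraint; its real definition is not shown in
// the PR description, so this one-method version is an assumption.
type Environment interface {
	Name() string
}

// StateTransition is an assumed, simplified shape: the next state plus any
// internal events to feed back into the machine.
type StateTransition[Event any, Env Environment] struct {
	NextState State[Event, Env]
	NewEvents []Event
}

// State mirrors the interface quoted in the PR description.
type State[Event any, Env Environment] interface {
	ProcessEvent(event Event, env Env) (*StateTransition[Event, Env], error)
	IsTerminal() bool
}

// Toy event and environment types (hypothetical).
type toyEvent string

type toyEnv struct{}

func (toyEnv) Name() string { return "toy" }

type transition = StateTransition[toyEvent, toyEnv]

type idle struct{}
type running struct{}
type done struct{}

func (idle) IsTerminal() bool    { return false }
func (running) IsTerminal() bool { return false }
func (done) IsTerminal() bool    { return true }

// idle moves to running on "start" and emits an internal "finish" event.
func (idle) ProcessEvent(e toyEvent, _ toyEnv) (*transition, error) {
	if e != "start" {
		return nil, fmt.Errorf("idle: unexpected event %q", e)
	}
	return &transition{NextState: running{}, NewEvents: []toyEvent{"finish"}}, nil
}

// running moves to the terminal state on "finish".
func (running) ProcessEvent(e toyEvent, _ toyEnv) (*transition, error) {
	if e != "finish" {
		return nil, fmt.Errorf("running: unexpected event %q", e)
	}
	return &transition{NextState: done{}}, nil
}

// done is terminal and rejects every event.
func (done) ProcessEvent(e toyEvent, _ toyEnv) (*transition, error) {
	return nil, fmt.Errorf("done: machine already terminated, got %q", e)
}

// drive applies events until a terminal state is reached or no further
// internal events are emitted.
func drive[Event any, Env Environment](
	start State[Event, Env], env Env, first Event,
) (State[Event, Env], error) {
	current := start
	queue := []Event{first}
	for len(queue) > 0 && !current.IsTerminal() {
		next := queue[0]
		queue = queue[1:]
		t, err := current.ProcessEvent(next, env)
		if err != nil {
			return current, err
		}
		current = t.NextState
		queue = append(queue, t.NewEvents...)
	}
	return current, nil
}

func main() {
	final, err := drive[toyEvent, toyEnv](idle{}, toyEnv{}, "start")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("final state: %T, terminal: %v\n", final, final.IsTerminal())
}
```

Each state only describes its own transition; the loop that advances the machine lives in one place and can be shared by every machine.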

The set of types is based on a pseudo sum type system wherein you
declare an interface, make the sole method private, then create other
instances based on that interface. This restricts call sites (they must
pass in that interface type), and with some tooling, exhaustive matching
can also be enforced via a linter.
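The sealed-interface pattern described here can be sketched as follows. The `closeEvent`, `sendShutdown`, and `peerOffer` names are hypothetical and chosen only for illustration; they are not types from the PR.

```go
package main

import "fmt"

// closeEvent acts as the "sum type". Because its only method is unexported,
// only types declared in this package can implement it.
type closeEvent interface {
	closeEventSealed()
}

// The variants of the sum type (hypothetical).
type sendShutdown struct{ feeRate int }
type peerOffer struct{ fee int }

func (sendShutdown) closeEventSealed() {}
func (peerOffer) closeEventSealed()    {}

// describe accepts only the sealed interface, so callers cannot pass an
// arbitrary value. A linter can check that the switch covers every variant.
func describe(e closeEvent) string {
	switch ev := e.(type) {
	case sendShutdown:
		return fmt.Sprintf("shutdown at %d sat/vb", ev.feeRate)
	case peerOffer:
		return fmt.Sprintf("offer of %d sat", ev.fee)
	default:
		return "unknown event"
	}
}

func main() {
	fmt.Println(describe(sendShutdown{feeRate: 10}))
	fmt.Println(describe(peerOffer{fee: 500}))
}
```

The compiler enforces the call-site restriction; exhaustiveness of the type switch is not checked by the compiler and relies on external tooling, as noted above.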

The best way to get the hang of the pattern proposed here is to check out
the tests. They make a mock state machine, and then use the new executor
to drive it to completion. You'll also get a view of how the code will
actually look, with the focus being on the input event, current state,
and output transition (which can also emit events to drive itself forward).

yyforyongyu

Really like the uniformed StateMachine🤩 I think the blockbeat from #7951 can even fit in this picture - that there exists a set of universal events such as a new block event, and we force every state machine to process it. My main question is whether we could stop generalizing at ProcessEvent, and leave the implementations of executeDaemonEvent to specific subsystems? This naturally leads to the question of whether we need this DaemonAdapters interface, as it seems it's not a common functionality that's shared by all subsystems.

Still need to think it through, but a few ideas:

* We could make StateMachine an interface, and maybe add something like BaseMachine that has the minimal methods such as driveMachine.
* I like that State is an interface, which makes writing the tests much easier. It's just that the name is a bit confusing I guess, as it's sort of like a processor, and each state has its own processor.

My understanding of the design is: an event-driven machine that's pipelined with state processors. The machine doesn't care about the specifics of the event; instead, it's the state processor's responsibility to handle the event and instruct a new state. I think we could stop here without distinguishing internal vs external events, and apply it to a few subsystems to see its effect.

yyforyongyu · 2024-01-03T12:07:33Z

protofsm/state_machine.go

+// executeDaemonEvent executes a daemon event, which is a special type of event
+// that can be emitted as part of the state transition function of the state
+// machine. An error is returned if the type of event is unknown.
+func (s *StateMachine[Event, Env]) executeDaemonEvent(event DaemonEvent) error {


feels like it's leaking the implementation details from other subsystems

I think you'll have a better idea of the interaction once the new co-op close stuff is up, but the general idea is that:

All the state machine state transitions are pure functions

They emit events for the executor (prob should rename this struct slightly) to apply themselves

Something needs to be aware of the boundary between the pure state machine, and the daemon execution env it runs in

This thing handles that role of knowing all the global I/O or daemon actions to execute itself, and potentially emit an event back into the state machine (post execution hook)

Otherwise, what do you think should be handling the I/O between the daemon and the state machine?

Maybe it could be hidden behind currentState.ProcessEvent? Since it generates the transition, it might as well process it based on the new transition, like broadcast or send message.

I think putting more things behind ProcessEvent would negatively impact testability. With this construction we can test the state transitions themselves in a pure environment and then wire up the execution of the generated events separately.

Maybe it could be hidden behind currentState.ProcessEvent? Since it generates the transition, it might as well process it based on the new transition, like broadcast or send message.

So the idea is that the actual state transitions never need to concern themselves with any of these details. They just emit the event, then wait for w/e new event to be sent in. There's no leakage of implementation details at this StateMachine level, as we'll pass in a concrete implementation based on lnd later, here's an idea of what that looks like: ce75ef8.

This is to be considered universal, just like the POSIX interface we all know and love today. In this case, our processes are these FSMs, and the syscalls are ways to interact with the chain or daemon.
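To make the split discussed in this thread concrete, here is a minimal sketch of a pure transition that only returns side-effect requests, plus a separate executor that applies them through an adapter interface. The `daemonEvent`, `sendMsg`, `broadcastTx`, `adapters`, and `recorder` names are assumptions for this example and do not reproduce the PR's `DaemonAdapters` or `executeDaemonEvent`.

```go
package main

import "fmt"

// daemonEvent is a hypothetical side-effect request emitted by a transition.
type daemonEvent interface{ daemonEventSealed() }

type sendMsg struct{ peer, payload string }
type broadcastTx struct{ txid string }

func (sendMsg) daemonEventSealed()     {}
func (broadcastTx) daemonEventSealed() {}

// adapters is the assumed I/O boundary between the machine and the daemon.
type adapters interface {
	SendMessage(peer, payload string) error
	BroadcastTx(txid string) error
}

// onCloseRequested is a pure transition: it performs no I/O and only
// returns the requests it wants executed.
func onCloseRequested(peer string) []daemonEvent {
	return []daemonEvent{sendMsg{peer: peer, payload: "shutdown"}}
}

// execute applies the requested side effects using the given adapters, and
// returns an error for an unknown event type.
func execute(a adapters, events []daemonEvent) error {
	for _, e := range events {
		switch ev := e.(type) {
		case sendMsg:
			if err := a.SendMessage(ev.peer, ev.payload); err != nil {
				return err
			}
		case broadcastTx:
			if err := a.BroadcastTx(ev.txid); err != nil {
				return err
			}
		default:
			return fmt.Errorf("unknown daemon event: %T", e)
		}
	}
	return nil
}

// recorder is a test double that records calls instead of doing I/O.
type recorder struct{ log []string }

func (r *recorder) SendMessage(peer, payload string) error {
	r.log = append(r.log, "send:"+peer+":"+payload)
	return nil
}

func (r *recorder) BroadcastTx(txid string) error {
	r.log = append(r.log, "broadcast:"+txid)
	return nil
}

func main() {
	r := &recorder{}
	if err := execute(r, onCloseRequested("alice")); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(r.log)
}
```

With this shape, `onCloseRequested` can be unit tested by inspecting its return value alone, while `execute` is tested against a recording adapter, which is the testability argument made above.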

Roasbeef · 2024-01-04T00:35:46Z

we could make StateMachine an interface,

Why do you think this should be an interface? The goal here is to provide a generic implementation that can drive any FSM, which is defined from that starting/initial state, and all the state transition functions. If you look at the test, it takes that mock state machine, and is able to drive that with the shared semantics of: terminal states, clean up functions, pure state transitions that emit any side effects as events, etc.

and leave the implementations of executeDaemonEvent to specific subsystems

The goal of those was to implement all the side effects we'd ever need in a single place. The daemon events added were just the ones I needed to implement the new co-op close state machine nearly from scratch. I think if we look at all the state machines we've written in the codebase, maybe there's ~10 daemon level adapters that are used continuously. One that's missing right now is requesting to be notified of something confirming.

ProofOfKeags

I did a no-nit high level review here. My biggest squint was around the SendWhen impure pseudo-predicate. Not gonna lie, I don't like it. However, I suspect that the reason you went this route is that making it pure would require hooks into the state changes of surrounding subsystems in ways that would require significant changes to the overall LND codebase before this could be inserted.

That said, there still may be no way around it. The main concern here is that the polling approach may miss the opportunities it needs to send the message out. The example here is OnCommit/OnFlush where we poll and still owe a commitment so we can't send, but then we do the commit and immediately follow up with another state change, thereby re-falsifying the SendWhen predicate before the next poll cycle.

In the case of shutdown and the coop close negotiations, this technically violates the spec. Idk what the practical consequences of that would be (they may be benign), but unless we can synchronize directly into the channel update lifecycle, we can't really be spec compliant.

On the other hand, you could make the argument that it isn't the state machine's responsibility to understand when a message should be synchronized into the message stream at all. Its job is simply to generate the response and the caller would queue it for sending at the next possible opportunity. This is the approach I took with the coop close v1: The ChanCloser is completely unaware of how the messages are dispatched, it just knows what to send, not when or how.

ProofOfKeags · 2024-01-08T21:21:56Z

protofsm/state_machine.go

+// executeDaemonEvent executes a daemon event, which is a special type of event
+// that can be emitted as part of the state transition function of the state
+// machine. An error is returned if the type of event is unknown.
+func (s *StateMachine[Event, Env]) executeDaemonEvent(event DaemonEvent) error {


I think putting more things behind ProcessEvent would negatively impact testability. With this construction we can test the state transitions themselves in a pure environment and then wire up the execution of the generated events separately.

Roasbeef · 2024-02-06T03:07:13Z

PTAL.

ProofOfKeags

Main thing is I think we want to make state machines not able to "throw a disable" to another channel.

Roasbeef · 2024-02-29T22:47:47Z

Pushed up a new set of commits with some bug fixes and some additional functionality that came in handy when starting to hook up the new RBF coop close state machine to the peer struct.

ProofOfKeags

Updated comments

Crypt-iQ

LGTM once CI addressed

ProofOfKeags

I think this is good to go broadly speaking. There's some cleanup that needs to happen for CI and some nice simplifications using SpewLogClosure but we can get this approved today.

ProofOfKeags

Send it

In this commit, we add an optional daemon event that can be specified to dispatch during init. This is useful for instances where before we start, we want to make sure we have a registered spend/conf notification before normal operation starts. We also add new unit tests to cover this, and the prior spend/conf event additions.

In this commit, we add the ability for the state machine to consume wire messages. This'll allow the creation of a new generic message router that takes the place of the current peer `readHandler` in an upcoming commit.

This'll be used later to uniquely identify state machines for routing/dispatch purposes.

We'll use this to be able to signal to a caller that a critical error occurred during the state transition.

Adding this makes a state machine easier to unit test, as the caller can specify a custom polling interval.

In this commit, we add the SpendMapper which allows callers to create custom spend events. Before this commit, the caller would be able to have an event sent to them in the case a spend happens, but that event wouldn't have any of the relevant spend details. With this new addition, the caller can specify how to take a generic spend event, and transform it into the state machine specific spend event.
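The mapping idea in this commit message can be sketched roughly as below. The `spendDetail` struct, the `spendMapper` function type, `dispatchSpend`, and `closeConfirmed` are hypothetical names for illustration; the actual `SpendMapper` signature in the PR may differ.

```go
package main

import "fmt"

// spendDetail is a hypothetical stand-in for a generic spend notification.
type spendDetail struct {
	SpenderTxID string
	Height      int32
}

// spendMapper converts a generic spend into a machine-specific event.
type spendMapper[Event any] func(spendDetail) Event

// closeConfirmed is a hypothetical machine-specific event that carries the
// spend details the state machine cares about.
type closeConfirmed struct {
	txid   string
	height int32
}

// toCloseConfirmed is the caller-supplied mapping.
func toCloseConfirmed(d spendDetail) closeConfirmed {
	return closeConfirmed{txid: d.SpenderTxID, height: d.Height}
}

// dispatchSpend maps the spend and hands the resulting event to the machine
// through sendEvent.
func dispatchSpend[Event any](
	d spendDetail, mapper spendMapper[Event], sendEvent func(Event),
) {
	sendEvent(mapper(d))
}

func main() {
	detail := spendDetail{SpenderTxID: "abc", Height: 100}
	dispatchSpend[closeConfirmed](detail, toCloseConfirmed, func(e closeConfirmed) {
		fmt.Printf("machine received: %+v\n", e)
	})
}
```

The executor stays generic over `Event`; only the mapper knows how the spend details translate into the machine's own event type.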

In this commit, we update the execution logic to allow multiple internal events to be emitted. This is useful to handle potential out of order state transitions, as they can be cached, then emitted once the relevant pre-conditions have been met.

This fixes an issue that can occur when we have concurrent calls to `Stop` while the state machine is driving forward.

Roasbeef added spec no-changelog protocol fsm labels Jan 3, 2024

Roasbeef added this to the v0.18.0 milestone Jan 3, 2024

Roasbeef requested a review from ProofOfKeags January 3, 2024 01:39

Roasbeef force-pushed the fn-module-goodies branch from 9c090b1 to f35b72e Compare January 3, 2024 02:35

Roasbeef force-pushed the protofsm branch from 1b5bd31 to 4aff35b Compare January 3, 2024 02:36

yyforyongyu reviewed Jan 3, 2024

Roasbeef commented Jan 4, 2024

Roasbeef commented Jan 4, 2024

saubyk assigned Roasbeef Jan 4, 2024

ProofOfKeags reviewed Jan 8, 2024

Roasbeef force-pushed the fn-module-goodies branch from f35b72e to 1d1c138 Compare January 24, 2024 03:12

Roasbeef changed the base branch from fn-module-goodies to master January 24, 2024 03:21

Roasbeef force-pushed the protofsm branch from 6c75f3e to b1a273c Compare January 24, 2024 03:24

Roasbeef force-pushed the protofsm branch from 66d9199 to 345bd6d Compare February 6, 2024 03:06

Roasbeef requested review from ProofOfKeags and yyforyongyu February 6, 2024 03:07

guggero reviewed Feb 6, 2024

Roasbeef force-pushed the protofsm branch 2 times, most recently from e0265c1 to 057c481 Compare February 7, 2024 00:16

ProofOfKeags suggested changes Feb 8, 2024

Roasbeef force-pushed the protofsm branch from 057c481 to f8c9d29 Compare February 29, 2024 22:47

Roasbeef force-pushed the protofsm branch from 65f991b to bc513b1 Compare March 5, 2024 05:52

Roasbeef force-pushed the protofsm branch from 06c691e to dee8e41 Compare September 24, 2024 06:52

ProofOfKeags reviewed Nov 13, 2024

Roasbeef force-pushed the protofsm branch from dee8e41 to 4ffe409 Compare November 14, 2024 01:15

Crypt-iQ reviewed Nov 14, 2024

ProofOfKeags self-requested a review November 14, 2024 17:57

ProofOfKeags reviewed Nov 14, 2024
