Skip to content

[1/4] - protofsm: add new package for driving generic protocol FSMs #8337

Merged
Roasbeef merged 12 commits intomasterfrom
protofsm
Nov 19, 2024
Merged

[1/4] - protofsm: add new package for driving generic protocol FSMs #8337
Roasbeef merged 12 commits intomasterfrom
protofsm

Conversation

@Roasbeef
Copy link
Copy Markdown
Member

@Roasbeef Roasbeef commented Jan 3, 2024

In this PR, we create a new package, protofsm which is intended to
abstract away from something we've done dozens of time in the daemon:
create a new event-drive protocol FSM. One example of this is the co-op
close state machine, and also the channel state machine itself.

This packages picks out the common themes of:

  • clear states and transitions between them
  • calling out to special daemon adapters for I/O such as transaction
    broadcast or sending a message to a peer
  • cleaning up after state machine execution
  • notifying relevant callers of updates to the state machine

The goal of this PR, is that devs can now implement a state machine
based off of this primary interface:

// State defines an abstract state along, namely its state transition function
// that takes as input an event and an environment, and returns a state
// transition (next state, and set of events to emit). As state can also either
// be terminal, or not, a terminal event causes state execution to halt.
type State[Event any, Env Environment] interface {
	// ProcessEvent takes an event and an environment, and returns a new
	// state transition. This will be iteratively called until either a
	// terminal state is reached, or no further internal events are
	// emitted.
	ProcessEvent(event Event, env Env) (*StateTransition[Event, Env], error)

	// IsTerminal returns true if this state is terminal, and false otherwise.
	IsTerminal() bool
}

With their focus being only on each state transition, rather than all
the boiler plate involved (processing new events, advancing to
completion, doing I/O, etc, etc).

Instead, they just make their states, then create the state machine
given the starting state and env. The only other custom component needed
is something capable of mapping wire messages or other events from the
"outside world" into the domain of the state machine.

The set of types is based on a pseudo sum type system wherein you
declare an interface, make the sole method private, then create other
instances based on that interface. This restricts call sites (must pass
in that interface) type, and with some tooling, exhaustive matching can
also be enforced via a linter.

The best way to get a hang of the pattern proposed here is to check out
the tests. They make a mock state machine, and then use the new executor
to drive it to completion. You'll also get a view of how the code will
actually look, with the focus being on the: input event, current state,
and output transition (can also emit events to drive itself forward).

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 3, 2024

Pull reviewers stats

Stats of the last 30 days for lnd:

User Total reviews Time to review Total comments
yyforyongyu
🥇
6
▀▀▀
3d 19h 57m
▀▀▀▀▀
3
Roasbeef
🥈
5
▀▀▀
1d 18h 46m
▀▀
18
▀▀▀▀▀▀▀
guggero
🥉
4
▀▀
1d 7h 51m
▀▀
3
bhandras
2
3h 27m
0
calvinrzachman
1
6m
1
ziggie1984
1
20h 14m
0

Copy link
Copy Markdown
Member

@yyforyongyu yyforyongyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really like the uniformed StateMachine🤩 I think the blockbeat from #7951 can even fit in this picture - that there exists a set of universal events such as a new block event, and we force every state machine to process it. My main question is whether we could stop generalizing at ProcessEvent, and leave the implementations of executeDaemonEvent to specific subsystems? This naturally leads to the question of whether we need this DaemonAdapters interface, as it seems it's not a common functionality that's shared by all subsystems.

Still need to think through, but a few ideas,

  • we could make StateMachine an interface, and maybe add something like BaseMachine that has the minimal methods such as driveMachine.
  • I like that State is an interface which makes writing the tests much easier. It's just that the name is a bit confusing I guess, as it's sort like a processor, and each state has its own processor.

My understanding of the design is, an event-driven machine that's pipelined with state processors, the machine doesn't care about the specifics of the event, instead, it's the state processor's responsibility to handle the event and instruct a new state. I think we could stop here without distinguishing interval vs external events, apply it to a few subsystems to see its effect.

// executeDaemonEvent executes a daemon event, which is a special type of event
// that can be emitted as part of the state transition function of the state
// machine. An error is returned if the type of event is unknown.
func (s *StateMachine[Event, Env]) executeDaemonEvent(event DaemonEvent) error {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels like it's leaking the implementation details from other subsystems

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'll have a better idea of the interaction once the new co-op close stuff is up, but the general idea is that:

  • All the state machine state transitions are pure functions
  • They emit events for the executor (prob should rename this struct slightly) to apply themselves
  • Something needs to be aware of the boundary between the pure state machine, and the daemon execution env it runs in
  • This thing handles that role of knowing all the global I/O or daemon actions to execute itself, and potentially emit an event back into the state machine (post execution hook)

Otherwise, what do you think should be handling the I/O between the daemon and the state machine?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it could hidden behind currentState.ProcessEvent? Since it generates the transition, it might as well process it based on the new transition, like broadcast or send message.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think putting more things behind ProcessEvent would negatively impact testability. With this construction we can test the state transitions themselves in a pure environment and then wire up the execution of the generated events separately.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it could hidden behind currentState.ProcessEvent? Since it generates the transition, it might as well process it based on the new transition, like broadcast or send message.

So the idea is that the actual state transitions never need to concern themselves with any of these details. They just emit the event, then wait for w/e new event to be sent in. There's no leakage of implementation details at this StateMachine level, as we'll pass in a concrete implementation based on lnd later, here's an idea of what that looks like: ce75ef8.

This is to be considered universal, just like the POSIX interface we all know and love today. In this case, our processes re these FSMs, and the syscalls ways to interact with the chain or daemon.

@Roasbeef
Copy link
Copy Markdown
Member Author

Roasbeef commented Jan 4, 2024

we could make StateMachine an interface,

Why do you think this should be an interface? The goal here is to provide a generic implementation that can drive any FSM, which is defined from that starting/initial state, and all the state transition functions. If you look at the test, it takes that mock state machine, and is able to drive that with the shared semantics of: terminal states, clean up functions, pure state transitions that emit any side effects as events, etc.

and leave the implementations of executeDaemonEvent to specific subsystems

The goal of those was to implement all the side effects we'd ever need in a single place. The daemon events added were just the ones I needed to implement the new co-op close state machine nearly from scratch. I think if we look at all the state machines we've written in the codebase, maybe there's ~10 daemon level adapters that are used continuously. One that's missing right now is requesting to be notified of something confirming.

Copy link
Copy Markdown
Collaborator

@ProofOfKeags ProofOfKeags left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a no-nit high level review here. My biggest squint was around the SendWhen impure pseudo-predicate. Not gonna lie, I don't like it. However, I suspect that the reason you went this route is that making it pure would require hooks into the state changes of surrounding subsystems in ways that would require significant changes to the overall LND codebase before this could be inserted.

That said, there still may be no way around it. The main concern here is that the polling approach may miss the opportunities it needs to send the message out. The example here is OnCommit/OnFlush where we poll and still owe a commitment so we can't send, but then we do the commit and immediately follow up with another state change, thereby re-falsifying the SendWhen predicate before the next poll cycle.

In the case of shutdown and the coop close negotiations, this technically violates the spec. Idk what the practical consequences of that would be (they may be benign), but unless we can synchronize directly into the channel update lifecycle, we can't really be spec compliant.

On the other hand, you could make the argument that it isn't the state machine's responsibility to understand when a message should be synchronized into the message stream at all. It's job is simply to generate the response and the caller would queue it for sending at the next possible opportunity. This is the approach I took with the coop close v1: The ChanCloser is completely unaware of how the messages are dispatched, it just knows what to send, not when or how.

// executeDaemonEvent executes a daemon event, which is a special type of event
// that can be emitted as part of the state transition function of the state
// machine. An error is returned if the type of event is unknown.
func (s *StateMachine[Event, Env]) executeDaemonEvent(event DaemonEvent) error {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think putting more things behind ProcessEvent would negatively impact testability. With this construction we can test the state transitions themselves in a pure environment and then wire up the execution of the generated events separately.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 24, 2024

Important

Review skipped

Auto reviews are limited to specific labels.

🏷️ Labels to auto review (1)
  • llm-review

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@Roasbeef Roasbeef changed the base branch from fn-module-goodies to master January 24, 2024 03:21
@Roasbeef
Copy link
Copy Markdown
Member Author

Roasbeef commented Feb 6, 2024

PTAL.

@Roasbeef Roasbeef force-pushed the protofsm branch 2 times, most recently from e0265c1 to 057c481 Compare February 7, 2024 00:16
Copy link
Copy Markdown
Collaborator

@ProofOfKeags ProofOfKeags left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main thing is I think we want to make state machines not able to "throw a disable" to another channel.

@Roasbeef
Copy link
Copy Markdown
Member Author

Pushed up a new set of commits with some bug fixes and some additional functionality that came in handy when starting to hook up the new RBF coop close state machine to the peer struct.

@lightninglabs-deploy
Copy link
Copy Markdown
Collaborator

@yyforyongyu: review reminder
@Crypt-iQ: review reminder
@morehouse: review reminder
@Roasbeef, remember to re-request review from reviewers when ready

Copy link
Copy Markdown
Collaborator

@ProofOfKeags ProofOfKeags left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated comments

Copy link
Copy Markdown
Collaborator

@Crypt-iQ Crypt-iQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once CI addressed

@ProofOfKeags ProofOfKeags self-requested a review November 14, 2024 17:57
Copy link
Copy Markdown
Collaborator

@ProofOfKeags ProofOfKeags left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to go broadly speaking. There's some cleanup that needs to happen for CI and some nice simplifications using SpewLogClosure but we can get this approved today.

@Roasbeef Roasbeef force-pushed the protofsm branch 2 times, most recently from e8c30d5 to 66815e5 Compare November 15, 2024 00:07
Copy link
Copy Markdown
Collaborator

@ProofOfKeags ProofOfKeags left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Send it

In this PR, we create a new package, `protofsm` which is intended to
abstract away from something we've done dozens of time in the daemon:
create a new event-drive protocol FSM. One example of this is the co-op
close state machine, and also the channel state machine itself.

This packages picks out the common themes of:

  * clear states and transitions between them
  * calling out to special daemon adapters for I/O such as transaction
    broadcast or sending a message to a peer
  * cleaning up after state machine execution
  * notifying relevant callers of updates to the state machine

The goal of this PR, is that devs can now implement a state machine
based off of this primary interface:
```go
// State defines an abstract state along, namely its state transition function
// that takes as input an event and an environment, and returns a state
// transition (next state, and set of events to emit). As state can also either
// be terminal, or not, a terminal event causes state execution to halt.
type State[Event any, Env Environment] interface {
	// ProcessEvent takes an event and an environment, and returns a new
	// state transition. This will be iteratively called until either a
	// terminal state is reached, or no further internal events are
	// emitted.
	ProcessEvent(event Event, env Env) (*StateTransition[Event, Env], error)

	// IsTerminal returns true if this state is terminal, and false otherwise.
	IsTerminal() bool
}
```

With their focus being _only_ on each state transition, rather than all
the boiler plate involved (processing new events, advancing to
completion, doing I/O, etc, etc).

Instead, they just make their states, then create the state machine
given the starting state and env. The only other custom component needed
is something capable of mapping wire messages or other events from the
"outside world" into the domain of the state machine.

The set of types is based on a pseudo sum type system wherein you
declare an interface, make the sole method private, then create other
instances based on that interface. This restricts call sites (must pass
in that interface) type, and with some tooling, exhaustive matching can
also be enforced via a linter.

The best way to get a hang of the pattern proposed here is to check out
the tests. They make a mock state machine, and then use the new executor
to drive it to completion. You'll also get a view of how the code will
actually look, with the focus being on the: input event, current state,
and output transition (can also emit events to drive itself forward).
In this commit, we add an optional daemon event that can be specified to
dispatch during init. This is useful for instances where before we
start, we want to make sure we have a registered spend/conf notification
before normal operation starts.

We also add new unit tests to cover this, and the prior spend/conf event
additions.
In this commit, we add the ability for the state machine to consume wire
messages. This'll allow the creation of a new generic message router
that takes the place of the current peer `readHandler` in an upcoming
commit.
This'll be used later to uniquely identify state machines for
routing/dispatch purposes.
We'll use this to be able to signal to a caller that a critical error
occurred during the state transition.
Adding this makes a state machine easier to unit test, as the caller can
specify a custom polling interval.
In this commit, we add the SpendMapper which allows callers to create
custom spent events. Before this commit, the caller would be able to
have an event sent to them in the case a spend happens, but that event
wouldn't have any of the relevant spend details.

With this new addition, the caller can specify how to take a generic
spend event, and transform it into the state machine specific spend
event.
In this commit, we update the execution logic to allow multiple internal
events to be emitted. This is useful to handle potential out of order
state transitions, as they can be cached, then emitted once the relevant
pre-conditions have been met.
This fixes an isuse that can occur when we have concurrent calls to
`Stop` while the state machine is driving forward.
@Roasbeef Roasbeef merged commit c3fac0e into master Nov 19, 2024
@guggero guggero deleted the protofsm branch February 4, 2025 08:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

No open projects
Status: Done

Development

Successfully merging this pull request may close these issues.

8 participants