Skip to content

Conversation

stevesg
Copy link
Contributor

@stevesg stevesg commented Mar 8, 2021

What this PR does:
Use BasicService in state_replication.go for replication loop.

Original discussion:
#3839 (comment)

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Copy link
Contributor Author

@stevesg stevesg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having done this I mentioned to @pstibrany whether we should convert the whole Alertmanager to be a Service, but that would be a much bigger commit, and has more risk than this change which is reasonably isolated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seemed like the least-worst option for the time being but open to suggestions, alternatives being:

  • Add Stop() method to State interface - Means we would need to a Stop method to Peer in Prometheus
  • Storing the service separately as stateService or something, again.

Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! LGTM (modulo a couple of nits). I left few comments about the design evolution, but I would keep it for a follow-up if we agree on it.

am.dispatcher.Stop()
}

if service, ok := am.state.(*state); ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's more clean if you cast to the Service interface.

Suggested change
if service, ok := am.state.(*state); ok {
if service, ok := am.state.(services.Service); ok {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That being said, the next step of refactoring I see (but I would do it in a separate PR) is to require am.state to implement the services.Service interface. We can wrap cfg.Peer into our own struct which implements services.Service (using services.NewIdleService()) and add services.Service to NilPeer too. This way we keep all implementations consistent.

Thoughts @stevesg @pstibrany ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable. We should consider how to call Settle for the non-sharded case too, ideally that would be consistent (the Service wrapper for the Peer wrapper does the settling in starting, as for the replication).

The alternative to the road we've started on, is to have a single "PeerService" which wraps any type of Peer, though that would then need to be extended with some sort of "run" function, so you probably end back at making them all services...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so you probably end back at making them all services...

I think having separate services would keep things cleaner at the end. Code duplication should be minimal.

}

func (s *state) handleBroadcasts() {
func (s *state) running(ctx context.Context) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me leave some comments here, but not related to running(). Something to discuss about (but I would keep it separated from this PR):

  1. Why not doing the Settle() in the service starting?
  2. If we do There is no code and I am sad. #1, Ready() then would be waiting until service is running?

Thoughts @stevesg @pstibrany ?

Copy link
Contributor Author

@stevesg stevesg Mar 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of implementing WaitReady in terms of the waiting for the service to be running, seems like the service starting/running model fits nicely onto settling/broadcasting. Also means we can remove the readyc channel, which is nice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. I was just going to ask why do we still need readyc channel, good that you suggest removing it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this idea as well!

Signed-off-by: Steve Simpson <[email protected]>
Copy link
Contributor

@pracucci pracucci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (modulo a couple of nits) as 1st refactoring step.

Signed-off-by: Steve Simpson <[email protected]>
am.Stop()

if service, ok := am.state.(services.Service); ok {
if err := service.AwaitTerminated(context.Background()); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn’t we accept a context to StopAndWait method, and use that for waiting? (It has “wait” in the name, so it would make sense to me) In such case, we should also return error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As much as I love putting context everywhere (getting good at it), I think it's better to address this in a separate PR, as this is a pre-existing problem with the code (e.g. the other Stop() functions all need to be updated, and the calling code).

}

func (s *state) handleBroadcasts() {
func (s *state) running(ctx context.Context) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. I was just going to ask why do we still need readyc channel, good that you suggest removing it.

Copy link
Contributor

@gotjosh gotjosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Way better than the previous design!

}

func (s *state) handleBroadcasts() {
func (s *state) running(ctx context.Context) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love this idea as well!

Signed-off-by: Steve Simpson <[email protected]>
@pracucci pracucci merged commit b8d4761 into cortexproject:master Mar 10, 2021
roystchiang pushed a commit to roystchiang/cortex that referenced this pull request Apr 6, 2022
…xproject#3930)

* Use BasicService in state_replication.go for replication loop.

Signed-off-by: Steve Simpson <[email protected]>

* Review comments.

Signed-off-by: Steve Simpson <[email protected]>

* Review comments.

Signed-off-by: Steve Simpson <[email protected]>

* Review comments.

Signed-off-by: Steve Simpson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants