Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api) Added Status API #167

Closed
wants to merge 5 commits into from
Closed

Conversation

ab623
Copy link

@ab623 ab623 commented Aug 8, 2023

I have recently picked up learning Go and trying to learn to contribute back into open source. And thought that Sablier would be a good project to get involved in.

I have been putting in the early stages of a status page, which intends to show the status of the services Sablier manages, along with useful information about those services, mainly, the status, expiration time, and remaining time.

I had to make quite a few updates across the solution. I have documented the rationale below

  • Gin - Updates to include status page. The response payload is designed in a way so that it can be extended with global configs.
  • KV store - This required an update of a new method to be able to get a value along with expiration information
  • Instance - Updated structs to store additional state data
  • Providers - Updated providers with a new function to get the list of managed containers, using label enable = true
  • Session Manager - Few methods to get the service status

Output:
Calling /api/status :

{
  "managed_services": [
    {
      "name": "whoami-secondary",
      "currentReplicas": 1,
      "desiredReplicas": 1,
      "status": "ready",
      "expires_on": "0001-01-01T00:00:00Z",
      "expires_in": 0
    },
    {
      "name": "whoami-primary",
      "currentReplicas": 0,
      "desiredReplicas": 1,
      "status": "unrecoverable",
      "message": "ready"
      "expires_on": "2023-08-08T11:57:59.3412899Z",
      "expires_in": 260
    }
  ]
}

Potential non-blocking issues:

I haven't used Go before, nor any of the libraries such as Gin, so some good feedback and advice would be helpful. I have tired to stick close to the format in the project.

@ab623 ab623 mentioned this pull request Aug 8, 2023
@ab623 ab623 changed the title feat(api) Added Status Page feat(api) Added Status API Aug 8, 2023
@acouvreur
Copy link
Owner

Hey @ab623, thanks for your contribution, I will be looking it it

@ab623
Copy link
Author

ab623 commented Aug 8, 2023

Thanks @acouvreur. Additionally I just pushed a commit to fix #153. I did it on this branch as it requires the functionality in the initial commits.

This works by checking all managed containers on startup, and if the containers are running already, it just applies the default timeout to them. If a containers is already stopped it just skips it.

This way the containers can start first, and then Sabiler can be started after and the containers that it manages will be brought down after the default timeout.

@acouvreur
Copy link
Owner

So for the status page we have a few things to consider:

  • Instances that have been started by name
  • Instances that have been started by group (auto discovery)

If it is started by name, we will only see the status when it's up, because as soon as it's down, the instance will be out of the KV store.

For instances started with the group name, then you can already use the group values to build the status page.
So I think you do not need to add anything to the providers.

What do you think about that?

@ab623
Copy link
Author

ab623 commented Aug 10, 2023

My implementation already accounts for that and is independent of the calling method.

It looks for the enable label and uses that to track all monitored containers.

Then if it is up, it exists in the KV and pulls the expiration data, if not then the service is down, but the container name is still shown in the output, just with a exited status.

Copy link
Owner

@acouvreur acouvreur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I agree with your changes, but I think we need to generify the usage to instance instead of container.

Also, I believe the startup feature is not as expected. We should review it again. And it should also be enabled through a configuration flag.

DesiredReplicas int `json:"desiredReplicas"`
Status string `json:"status"`
Message string `json:"message,omitempty"`
ExpiresAt time.Time `json:"expires_on"`
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The property name is ExpiresAt but the json is expires_on
We should use one exclusively

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any preference on this?

}

func (s *Status) ServeStatus(c *gin.Context) {
containers, _ := s.SessionsManager.GetManagedContainers()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not always containers, it might be swarm services, kubernetes deployments, etc.
We should use instances instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense here. I only have docker experience. I will look into using instances.

log.Debugf("Could not get list of managed containers to apply default")
}

var startContainers []string
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not necessarily containers, we should use instances

@@ -69,6 +75,30 @@ func (sm *SessionsManager) initWatchers() {
go sm.consumeInstanceStopped(instanceStopped)
}

func (sm *SessionsManager) initContainerManagementStart() {
output, err := sm.provider.GetManagedContainers()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of GetManagedContainers, you should use sm.groups which is already populated with the discovered instances

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay using sm.groups does make sense and simplified things. In that case we don't need to update the interface, and I can remove the "GetManagedContainers".

I think I will need to flatten a copy of the sm.groups, as this is a map with the group name as the key. And I believe the status API should be flat with the group as a property of the instance, rather than being grouped by groupname (which the API consumer can do on the client side)

}
}

sm.RequestSession(startContainers, sm.config.Sessions.DefaultDuration)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure about this. I believe the behavior is that we want to stop the instances started we found which are not in the KV.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with this partially. I believe that if I have an already running containers, and I restart sablier, it would bring my instances down instantly. I prefer having it this way, but I also see your argument for instantly bringing them down.

Maybe we can create a setting that is like STARTUP_DEFAULT_EXPIRATION_DURATION which can be set to "0m" for instant or whatever the user sets. If the setting it blank it defaults to the SESSIONS_DEFAULT_DURATION.

Would that work for you?

@@ -273,6 +303,38 @@ func (s *SessionsManager) RequestReadySessionGroup(ctx context.Context, group st
return s.RequestReadySession(ctx, names, duration, timeout)
}

func (s *SessionsManager) GetManagedContainers() ([]string, error) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily containers, use instance instead

return output, nil
}

func (s *SessionsManager) GetContainerStatus(name string) *InstanceState {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily containers, use instance instead

if !ok {
return EntityWithTimeout{}, ok
}
if e.expired() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this method should do any check on the timeouts and should not delete from the KV.

I think the method should be GetExpiration for a key.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully agree with this. If I wanted to get a value and an expiration, under your suggestion I would need to call 2 separate functions, which COULD cause an issue whereby, I manage to retrieve the value, but in the time between calling Get and GetExpiration the key expired, it would cause an error because the Key would have expired.

Keeping it as 1 call to get the value and expiration as a single struct makes more sense in my opinion. Why do you think we shouldn't be checking on timeout or deleting from KV like Get does?

@acouvreur acouvreur added enhancement New feature or request help wanted Extra attention is needed provider Issue related to a provider labels Aug 11, 2023
@acouvreur acouvreur self-assigned this Aug 11, 2023
@ab623
Copy link
Author

ab623 commented Aug 12, 2023

Overall I agree with your changes, but I think we need to generify the usage to instance instead of container.

Also, I believe the startup feature is not as expected. We should review it again. And it should also be enabled through a configuration flag.

Thanks for the feedback. Glad my first foray into the Go world and this project was somewhat successful.

I will refactor this to use the instances and not containers as per your suggestions and review the startup feature. I have left a few comments on both. Would be good to get your thoughts on them.

@acouvreur
Copy link
Owner

Overall I agree with your changes, but I think we need to generify the usage to instance instead of container.
Also, I believe the startup feature is not as expected. We should review it again. And it should also be enabled through a configuration flag.

Thanks for the feedback. Glad my first foray into the Go world and this project was somewhat successful.

I will refactor this to use the instances and not containers as per your suggestions and review the startup feature. I have left a few comments on both. Would be good to get your thoughts on them.

Take your time for the refactor by the way, if you need some help feel free to ask me for it.

Thanks again for your contribution by the way :)

@ab623
Copy link
Author

ab623 commented Aug 27, 2023

Thanks. I'm travelling for a few weeks. So won't have time to work on this, but will pick it up when I'm back.

Appreciate the support.

I left a few comments on the review can you take a look and let me know your thoughts.

@ab623
Copy link
Author

ab623 commented Nov 2, 2023

Hey @acouvreur. I'm back from some holidays. Should I continue to work on this, or wait until the rewrite? Or are you building this in as part of the rewrite 😄

@acouvreur
Copy link
Owner

Hey @acouvreur. I'm back from some holidays. Should I continue to work on this, or wait until the rewrite? Or are you building this in as part of the rewrite 😄

Hey! Welcome back!

I'll add this to the rewrite! No need to try to merge conflict.

Feel free to try and play with the rewrite branch, help is very appreciated!

@ab623
Copy link
Author

ab623 commented Nov 2, 2023

Alright. Good to know! I'll close this PR. I'll give the refactor branch a go also.

@ab623 ab623 closed this Nov 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed provider Issue related to a provider
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants