Asynchronous health checks #13

Open
uniqueg opened this issue Nov 13, 2020 · 4 comments
Labels
flag: good 1st issue Good for newcomers priority: medium Medium priority type: feature New feature or request type flag: spec change Proposed change requires spec changes workload: days Likely takes days to resolve

Comments

@uniqueg
Member

uniqueg commented Nov 13, 2020

To give clients an idea of the stability of a given service, an (optional) daemon could be implemented in this service that periodically sends heartbeat requests to the individual services (e.g., to their GET /service-info endpoints). To provide this information to clients effectively, the ExternalService schema could be extended with an object property that provides some or all of the following (and possibly more) information:

  • Status code of last heartbeat
  • Time of last heartbeat
  • Time of last non-success status
  • Time of last success status

The frequency of heartbeats (and timeout!) is probably something that the admin of the cloud registry should set up in the app configuration.

@uniqueg uniqueg added priority: medium Medium priority status flag: discuss Needs input from various people status flag: help wanted Extra attention is needed status: blocked Something prevents progress type: feature New feature or request workload: weeks Likely takes weeks to resolve type flag: spec change Proposed change requires spec changes labels Nov 13, 2020
@uniqueg uniqueg added workload: days Likely takes days to resolve and removed workload: weeks Likely takes weeks to resolve labels Mar 1, 2021
@uniqueg
Member Author

uniqueg commented Mar 1, 2021

Eventually this is probably something that should be discussed with the GA4GH to be implemented globally in the specs. However, for now I think this can be implemented in a relatively simple way:

  • Provide status code and time of last heartbeat, and time of last (un)successful heartbeat request via a dedicated endpoint, e.g., GET services/{service_id}/health
  • Add parameters for heartbeat frequency and timeout to the app configuration in cloud_registry/config.yaml
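
The dedicated endpoint could look roughly like the following framework-agnostic sketch. The function name `get_service_health`, the in-memory `HEALTH_RECORDS` store, the response field names, and the 404 message are hypothetical and only serve to illustrate the idea; the real handler would presumably read from the registry's database.

```python
from typing import Any, Dict, Tuple

# Hypothetical in-memory store mapping service_id -> latest health record.
HEALTH_RECORDS: Dict[str, Dict[str, Any]] = {
    "svc-1": {
        "last_status_code": 200,
        "last_heartbeat": "2021-03-01T10:00:00+00:00",
        "last_success": "2021-03-01T10:00:00+00:00",
        "last_failure": None,
    },
}


def get_service_health(service_id: str) -> Tuple[Dict[str, Any], int]:
    """Return (response body, HTTP status) for GET services/{service_id}/health."""
    record = HEALTH_RECORDS.get(service_id)
    if record is None:
        # Unknown service: report this instead of returning an empty record.
        return {"message": f"Service '{service_id}' not found."}, 404
    return record, 200


print(get_service_health("svc-1"))
print(get_service_health("unknown"))
```

Keeping the health data behind its own endpoint means the existing ExternalService schema stays untouched until a spec change is agreed on.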

@uniqueg uniqueg removed status: blocked Something prevents progress status flag: discuss Needs input from various people status flag: help wanted Extra attention is needed labels Mar 1, 2021
@uniqueg
Member Author

uniqueg commented Mar 1, 2021

FYI, there is a related discussion at GA4GH, but nothing concrete has come of it yet, so I would go ahead as outlined.

@uniqueg uniqueg added the flag: good 1st issue Good for newcomers label Mar 1, 2021
@uniqueg
Member Author

uniqueg commented Mar 1, 2021

cwl-WES has an implementation of a daemon that runs tasks asynchronously in the background, although for a very different purpose. Perhaps there is a Python API for running something like cron jobs... In any case, it's important that these background checks scale to hundreds or even thousands (but certainly not millions) of services, so heartbeats should probably be sent no more often than once every 30 minutes or so, with a maximum timeout of 3 seconds.
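
One possible shape for such background checks, using only the Python standard library (`asyncio.to_thread` needs Python 3.9+), is sketched below. This is not how cwl-WES does it; the function names, the `max_concurrency` default of 20, and the choice to clamp configured values silently are assumptions made for illustration. The 30-minute and 3-second bounds are the ones suggested above.

```python
import asyncio
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Optional, Tuple

MIN_INTERVAL_S = 30 * 60  # heartbeats no more often than every 30 minutes
MAX_TIMEOUT_S = 3.0       # per-request timeout of at most 3 seconds


def clamp_settings(interval_s: float, timeout_s: float) -> Tuple[float, float]:
    """Force configured values into the bounds above (hypothetical policy)."""
    return max(interval_s, MIN_INTERVAL_S), min(timeout_s, MAX_TIMEOUT_S)


def probe(url: str, timeout_s: float) -> Optional[int]:
    """Blocking GET; return the status code, or None if unreachable/timed out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except OSError:
        return None  # covers URLError, connection errors and timeouts


async def check_all(
    urls: List[str],
    timeout_s: float,
    max_concurrency: int = 20,
    probe_fn: Callable[[str, float], Optional[int]] = probe,
) -> Dict[str, Optional[int]]:
    """Probe all URLs once, with at most `max_concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def check_one(url: str) -> Optional[int]:
        async with semaphore:
            # Run the blocking probe in a worker thread.
            return await asyncio.to_thread(probe_fn, url, timeout_s)

    codes = await asyncio.gather(*(check_one(url) for url in urls))
    return dict(zip(urls, codes))


async def run_forever(urls: List[str], interval_s: float, timeout_s: float) -> None:
    """Repeat heartbeat rounds indefinitely; storing results is left out."""
    interval_s, timeout_s = clamp_settings(interval_s, timeout_s)
    while True:
        results = await check_all(urls, timeout_s)
        print(results)  # placeholder for persisting the health records
        await asyncio.sleep(interval_s)
```

With a few thousand services, a 3-second timeout, and 20 concurrent requests, a worst-case round where every service times out would still finish within several minutes, well inside a 30-minute interval. A scheduler library or a task queue would be a reasonable alternative to the hand-rolled loop.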

@uniqueg
Member Author

uniqueg commented Mar 1, 2021

Related issue #20, could be implemented in coordination with this issue
