Remove cron dependency for worker health checker#1638

Merged
zachmargolis merged 1 commit into master from
margolis-health-check-without-cron
Aug 24, 2017

@zachmargolis
Contributor

Why: cron (leader host) is not compatible with autoscaling groups

@monfresh
Contributor

Does this mean we can't use cron jobs for anything? If so, will we need to rework everything in config/schedule.rb? Also, what will be the new frequency of checking the worker health?

Contributor

@brodygov brodygov left a comment

Looks good to me. It would also be good to put some comments in the commit or in the code to explain that we now have the health check kick off the dummy job.

**Why**: cron (leader host) is not compatible with autoscaling groups
Now, when our load balancer hits the worker health check endpoint
(which it already does), we will enqueue dummy jobs to make sure we're
healthy
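
The flow described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: `FakeQueue`, `WorkerHealth`, and the 120-second `max_age_seconds` threshold are all hypothetical, and the in-memory queue stands in for the real job backend.

```ruby
# Sketch only: FakeQueue and WorkerHealth are hypothetical names, and the
# in-memory queue stands in for a real background job system.
class FakeQueue
  def initialize
    @jobs = []
  end

  def enqueue(job)
    @jobs << job
  end

  # Simulates a worker process draining the queue.
  def work_off
    @jobs.shift.call until @jobs.empty?
  end
end

class WorkerHealth
  attr_reader :last_run_at

  # max_age_seconds is an assumed threshold, not a value from the source.
  def initialize(queue, max_age_seconds: 120)
    @queue = queue
    @max_age_seconds = max_age_seconds
    @last_run_at = nil
  end

  # Called each time the load balancer hits the health check endpoint.
  def check(now: Time.now)
    # Enqueue a dummy job whose only effect is to record that a worker ran it.
    @queue.enqueue(-> { @last_run_at = Time.now })
    healthy?(now: now)
  end

  # Healthy if some dummy job has completed recently enough.
  def healthy?(now: Time.now)
    !@last_run_at.nil? && (now - @last_run_at) <= @max_age_seconds
  end
end
```

In this sketch the check frequency is simply however often the load balancer polls the endpoint, so no cron entry is needed. The very first check reports unhealthy because no dummy job has completed yet; a real implementation may handle that startup case differently.
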
@zachmargolis force-pushed the margolis-health-check-without-cron branch from 78caaa3 to a346500 on August 24, 2017 18:01
@zachmargolis merged commit 679a6ac into master on Aug 24, 2017
@zachmargolis deleted the margolis-health-check-without-cron branch on August 24, 2017 18:10
@brodygov
Contributor

Right, in an auto scaling group all the instances are identical, so we don't have any way to ensure that cron jobs are only run on a single instance.

These are the strategies I've seen most commonly for dealing with this:

  • Have a leader election to elect one app server as leader, and have it run all the jobs
  • Run jobs from cron on all servers, but use a distributed lock system (could be backed by Redis or the database): a server takes out a lock for the job before running it, which ensures the job only runs once per interval
  • Run jobs from a queueing system that is guaranteed to enqueue jobs at the rate you expect, either by triggering from an external system like New Relic, or by having a distributed queue system (which probably does a leader election or locks internally)
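
As a rough illustration of the distributed lock strategy from the list above, here is a minimal sketch. `InMemoryLockStore`, `run_once_per_interval`, the key format, and the `example_job` name are all hypothetical. The in-memory store only stands in for a shared backend; with Redis, the equivalent atomic operation would be the `SET key value NX EX ttl` command.

```ruby
# Sketch only: InMemoryLockStore stands in for a shared store such as Redis
# or a database table. All names here are hypothetical.
class InMemoryLockStore
  def initialize
    @keys = {}
    @mutex = Mutex.new
  end

  # Atomically sets the key only if it is absent.
  # Returns true if this caller took the lock, false otherwise.
  def set_if_absent(key, value)
    @mutex.synchronize do
      return false if @keys.key?(key)
      @keys[key] = value
      true
    end
  end
end

# Runs the block at most once per interval across all servers sharing `store`.
# Every server calls this from its own cron; only the first caller in each
# interval wins the lock and runs the job.
def run_once_per_interval(store, job_name, interval_seconds, host:, now: Time.now)
  # All calls within the same interval map to the same bucket and lock key.
  bucket = now.to_i / interval_seconds
  key = "cron-lock:#{job_name}:#{bucket}"
  return false unless store.set_if_absent(key, host)
  yield
  true
end

store = InMemoryLockStore.new
run_once_per_interval(store, "example_job", 60, host: "server-a") do
  puts "running example_job"
end
```

Unlike a real backend with key expiry, this in-memory store never evicts old lock keys, and it is only shared within a single process, so it demonstrates the locking logic rather than actual cross-server coordination.
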

@zachmargolis
Contributor Author

We're currently working on getting ahold of Sidekiq Enterprise to run our cron jobs for us, for things like SFTPing.
