Remove cron dependency for worker health checker by zachmargolis · Pull Request #1638 · 18F/identity-idp

zachmargolis · 2017-08-24T17:33:13Z

Why: cron (leader host) is not compatible with autoscaling groups

monfresh · 2017-08-24T18:00:29Z

Does this mean we can't use cron jobs for anything? If so, will we need to rework everything in config/schedule.rb? Also, what will be the new frequency of checking the worker health?

brodygov

Looks good to me. It would also be good to put some comments in the commit or in the code to explain that we now have the health check kick off the dummy job.

**Why**: cron (leader host) is not compatible with autoscaling groups. Now, when our load balancer hits the worker health check endpoint (which it already does), we will enqueue dummy jobs to make sure we're healthy.
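As a rough illustration of the approach described above, here is a minimal sketch in plain Ruby. It is not the code from this PR: `WorkerHealthChecker`, `run_pending_jobs`, the in-memory array used as a queue, and the `max_age_seconds` threshold are all hypothetical names and values chosen for the example. Each health check request enqueues a dummy job, and the check reports healthy only if a dummy job completed recently.

```ruby
# Hypothetical sketch; names and thresholds are assumptions, not identity-idp code.
class WorkerHealthChecker
  def initialize(queue, max_age_seconds: 300)
    @queue = queue                     # stand-in for a real background job queue
    @max_age_seconds = max_age_seconds # how stale the last success may be
    @last_success_at = nil
  end

  # Called on every load balancer hit: enqueue a dummy job, then report
  # whether a previous dummy job finished recently enough.
  def check(now: Time.now)
    @queue << ->(ran_at) { @last_success_at = ran_at }
    healthy?(now: now)
  end

  def healthy?(now: Time.now)
    return false if @last_success_at.nil?

    (now - @last_success_at) <= @max_age_seconds
  end
end

# Stand-in for a background worker: runs every queued dummy job.
def run_pending_jobs(queue, now: Time.now)
  queue.shift.call(now) until queue.empty?
end
```

In this sketch the very first check reports unhealthy because no dummy job has run yet; a real implementation would need to decide how to treat that startup window.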

brodygov · 2017-08-24T19:54:40Z

Right, in an auto scaling group all the instances are identical, so we don't have any way to ensure that cron jobs are only run on a single instance.

The strategies I've seen most commonly for dealing with this:

- Have a leader election to elect one app server as leader, and have it run all the jobs.
- Run jobs from cron on all servers, but use a distributed lock system (could be backed by Redis or the database) so that a server takes out a lock for the job before running it, ensuring that it only gets run once per interval.
- Run jobs from a queueing system that is guaranteed to enqueue jobs at the rate you expect, either by triggering from an external system like New Relic, or by having a distributed queue system (which probably does a leader election or locks internally).
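The distributed lock strategy can be sketched roughly as follows. This is an illustration only: `InMemoryLockStore` is a hypothetical in-process stand-in for a shared store such as Redis or a database table, and `run_once_per_interval` and the `cron-lock:` key format are made up for the example. Every server attempts the job on its cron schedule, but only the first to claim the lock for the current interval runs it.

```ruby
# Hypothetical sketch; a real deployment would use a shared store, not process memory.
class InMemoryLockStore
  def initialize
    @keys = {}
    @mutex = Mutex.new
  end

  # Sets the key only if it is absent. Returns true when this caller won the lock.
  def set_if_absent(key, owner)
    @mutex.synchronize do
      return false if @keys.key?(key)

      @keys[key] = owner
      true
    end
  end
end

# Runs the block at most once per interval across all hosts sharing the store.
# Returns true if this host ran the job, false if another host already had.
def run_once_per_interval(store, job_name, host:, interval_seconds:, now: Time.now)
  bucket = now.to_i / interval_seconds # same value for every host in one interval
  key = "cron-lock:#{job_name}:#{bucket}"
  return false unless store.set_if_absent(key, host)

  yield
  true
end
```

Keys in this sketch never expire; with a real store you would typically attach an expiry somewhat longer than the interval so old locks are cleaned up.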

zachmargolis · 2017-08-24T19:58:58Z

We're currently working on getting ahold of Sidekiq Enterprise to run our cron jobs for us, for things like SFTPing.

zachmargolis added the status - ready for review label Aug 24, 2017

zachmargolis self-assigned this Aug 24, 2017

zachmargolis requested review from brodygov, jmhooper and monfresh August 24, 2017 17:33

brodygov approved these changes Aug 24, 2017

Remove cron dependency for worker health checker

a346500

**Why**: cron (leader host) is not compatible with autoscaling groups. Now, when our load balancer hits the worker health check endpoint (which it already does), we will enqueue dummy jobs to make sure we're healthy.

zachmargolis force-pushed the margolis-health-check-without-cron branch from 78caaa3 to a346500 Compare August 24, 2017 18:01

zachmargolis merged commit 679a6ac into master Aug 24, 2017

zachmargolis deleted the margolis-health-check-without-cron branch August 24, 2017 18:10

zachmargolis mentioned this pull request Sep 11, 2017

Revert "Remove cron dependency for worker health checker" #1667

Merged
