Conversation
changelog: Internal, Platform, Improve health checking

Adds a check that attempts writing to the Redis session store and then reading the key back. A default TTL of 1 second is used. The check will fail if:

* Redis is not writeable
* The value cannot be read back from Redis
* It takes more than 1 second to read the value back from Redis
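As a rough sketch, the write-then-read check described above could look something like the following. The method names, the `HEALTH_TTL_SECONDS` constant, and the `FakeRedis` stand-in are illustrative assumptions, not the PR's actual code:

```ruby
require 'securerandom'

HEALTH_TTL_SECONDS = 1 # default TTL from the changelog above

# Write a throwaway key with a short TTL, then read it back.
# Returns true only if the round trip succeeds.
def health_write_and_read(client)
  key   = "health-check-#{SecureRandom.uuid}"
  value = SecureRandom.uuid
  client.setex(key, HEALTH_TTL_SECONDS, value)
  client.get(key) == value
end

# In-memory stand-in for a Redis client so this sketch runs without a server.
class FakeRedis
  def initialize
    @store = {}
  end

  def setex(key, _ttl, value)
    @store[key] = value
  end

  def get(key)
    @store[key]
  end
end

puts health_write_and_read(FakeRedis.new) # prints "true"
```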
force-pushed from 47c788a to 9f822dc
```ruby
# @api private
def health_write_and_read
  REDIS_POOL.with do |client|
```
Not a blocker, but would we want to check all the Redis pools?
We could at least add a check for the rate-limiting Redis. Attempts storage is not always enabled, so it seems like a conditional health check would be needed for that.
Should we combine it all here in one spot and call this RedisHealthChecker?
I think that makes sense, yeah.
I guess they could be separate too, I don't have strong feelings, whatever is easier.
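If the checks were combined, a `RedisHealthChecker` along the lines discussed might look roughly like this. The pool names, the `Struct` stand-in for `HealthCheckSummary`, and the per-pool check body are all illustrative assumptions:

```ruby
# Minimal stand-in for the app's HealthCheckSummary, for illustration only.
HealthCheckSummary = Struct.new(:healthy, :result, keyword_init: true)

# Hypothetical combined checker covering several Redis pools.
class RedisHealthChecker
  # pools: a Hash of name => object responding to #with (a connection pool)
  def initialize(pools)
    @pools = pools
  end

  def check
    results = @pools.transform_values { |pool| write_and_read(pool) }
    HealthCheckSummary.new(healthy: results.values.all?, result: results)
  end

  private

  # Placeholder per-pool check; the real body would do the setex/get round trip.
  def write_and_read(pool)
    pool.with { |client| client.ping == 'PONG' }
  rescue StandardError
    false
  end
end

# Tiny fake pool/client so the sketch is runnable without Redis.
FakeClient = Struct.new(:ok) do
  def ping
    ok ? 'PONG' : raise('down')
  end
end
FakePool = Struct.new(:client) do
  def with
    yield(client)
  end
end

checker = RedisHealthChecker.new(
  sessions: FakePool.new(FakeClient.new(true)),
  rate_limit: FakePool.new(FakeClient.new(true))
)
puts checker.check.healthy # prints "true"
```

One pool failing marks the whole summary unhealthy, while the per-pool results remain visible in `result`.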
```ruby
# @return [HealthCheckSummary]
def check
  HealthCheckSummary.new(healthy: true, result: health_write_and_read)
```
Similar to the Outbound health check, would we want to cache this for a period of time so that we mitigate this as a potential vector for Denial of Service? I suppose we'd want to be careful to not use Redis as the cache as well.
We are multiproc now and may be multithread soon, so getting a per-instance global might be tricky. (For me, a Ruby n00b.)
That said we are talking hundreds of calls a minute for normal health checking. Some sort of caching is a good idea.
ooo when did we upgrade to multithreaded?
Haven't yet @zachmargolis... I updated my note.
> That said we are talking hundreds of calls a minute for normal health checking. Some sort of caching is a good idea.
Maybe a whole separate convo, but if this health check is happening so frequently, should we break redis health into a separate endpoint? Like separate high frequency "can you serve traffic" health check endpoints from medium frequency "are all connections working 100% normal"?
Because if redis is down, typically it's down for all instances, so doing one quick check across all instances every so often (1/minute) is fine vs 1/instance/minute or whatever?
I am thinking of some nasty failure modes, like Redis being unavailable from one of the AZs: we want all the instances in that AZ to go bye-bye. A Redis call is cheap and made for nearly every page served, which makes me worry less about how frequently this will run.
We spoke a bit and I think we should defer caching unless it becomes a problem.
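For reference, the per-process, time-based caching that was deferred here could be sketched as follows. The TTL, the wrapper class, and its interface are illustrative assumptions only:

```ruby
# Sketch of a per-process cache around a health check result.
# Caching was deferred in this PR; this only illustrates the idea.
class CachedCheck
  def initialize(ttl_seconds:, &check)
    @ttl       = ttl_seconds
    @check     = check
    @mutex     = Mutex.new # guards state if the app goes multithreaded
    @cached_at = nil
  end

  def call
    @mutex.synchronize do
      now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      if @cached_at.nil? || now - @cached_at >= @ttl
        @result    = @check.call
        @cached_at = now
      end
      @result
    end
  end
end

calls  = 0
cached = CachedCheck.new(ttl_seconds: 60) do
  calls += 1
  true # pretend the real Redis round trip happened here
end

3.times { cached.call }
puts calls # prints "1": the underlying check ran only once
```

Note this caches per process, so under multiproc each worker still runs its own check once per TTL window.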
mitchellhenke left a comment:
This is a minor suggestion that moves it in the direction of potentially supporting multiple Redis checks here.
Co-authored-by: Mitchell Henke <mitchell.henke@gsa.gov>
Checking in on the status of old pull requests. Is this still being worked on, or can we close it?

Closing due to inactivity. This can be restored / reopened in the future if needed.
🎫 Ticket
https://github.com/18F/identity-devops/issues/5629
🛠 Summary of changes
Add a Redis session check to the existing `/api/health` endpoint to ensure a working connection with Redis when assessing the health of a server.

📜 Testing Plan
TBD
👀 Screenshots
TBD