
feat: use Redis read replicas #8405

Merged
bnjjj merged 84 commits into dev from caroline/redis-replicas
Nov 21, 2025

Conversation

@carodewig
Contributor

@carodewig carodewig commented Oct 10, 2025

Sends commands to Redis read replicas when (a) the command is read-only and (b) Redis is running in clustered mode.
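The routing rule above can be sketched as a small pure function. This is a hypothetical model, not the router's code: `Command`, `Target`, `route`, and the particular set of commands treated as read-only are assumptions chosen for illustration.

```rust
/// Where a command is sent (hypothetical type, not the router's API).
#[derive(Debug, PartialEq)]
enum Target {
    Primary,
    Replica,
}

/// A few representative cache commands (hypothetical subset).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum Command {
    Get,
    Mget,
    Ttl,
    Set,
    Del,
    Expire,
}

impl Command {
    /// Read-only commands never modify the keyspace.
    fn is_read_only(self) -> bool {
        matches!(self, Command::Get | Command::Mget | Command::Ttl)
    }
}

/// Route to a replica only when BOTH conditions hold:
/// (a) the command is read-only and (b) Redis runs in clustered mode.
/// Everything else, including every write, goes to the primary.
fn route(cmd: Command, clustered: bool) -> Target {
    if cmd.is_read_only() && clustered {
        Target::Replica
    } else {
        Target::Primary
    }
}
```

Keeping the decision in one function makes both conditions easy to test independently: a write is never routed to a replica, whatever the deployment mode.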

Changes include:

  • fix scheme checking (and update the corresponding tests)
  • direct read-only commands to replicas
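The PR text does not show the scheme-checking fix itself. As a hedged illustration only, here is one way a Redis URL scheme might be classified into a deployment mode and a TLS flag, which is what condition (b) depends on. The scheme set (`redis`, `rediss`, `redis-cluster`, `rediss-cluster`; sentinel and others omitted) is an assumption, and `Mode` / `classify_scheme` are hypothetical names.

```rust
/// Deployment mode inferred from the URL scheme (hypothetical type).
#[derive(Debug, PartialEq)]
enum Mode {
    Standalone,
    Clustered,
}

/// Classify a Redis URL by its scheme, returning (mode, uses_tls).
/// Assumed scheme set: redis, rediss, redis-cluster, rediss-cluster.
/// Anything else yields None instead of a guess.
fn classify_scheme(url: &str) -> Option<(Mode, bool)> {
    // No "://" at all means there is no scheme to check.
    let (scheme, _rest) = url.split_once("://")?;
    // Match the whole scheme exactly; a prefix check such as
    // starts_with("redis") would also match "rediss" and "redis-cluster".
    match scheme {
        "redis" => Some((Mode::Standalone, false)),
        "rediss" => Some((Mode::Standalone, true)),
        "redis-cluster" => Some((Mode::Clustered, false)),
        "rediss-cluster" => Some((Mode::Clustered, true)),
        _ => None,
    }
}
```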

The changes apply to all Redis caches, including entity, response, and query plan caches.
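In clustered mode, every cache key (entity, response, or query plan) belongs to one of 16384 hash slots, and a read can only be served by a replica of the shard that owns that slot. The Redis Cluster specification defines the slot as the CRC16 (XMODEM variant) of the key modulo 16384, hashing only the `{...}` hash tag when one is present. The sketch below implements that rule; in practice this is presumably handled by the Redis client library rather than router code, and `crc16` / `key_hash_slot` are illustrative names.

```rust
/// CRC16-CCITT, XMODEM variant (poly 0x1021, init 0, no reflection),
/// the checksum Redis Cluster uses for key hashing.
fn crc16(data: &[u8]) -> u16 {
    let mut crc: u16 = 0;
    for &byte in data {
        crc ^= (byte as u16) << 8;
        for _ in 0..8 {
            // Bits shifted out of the u16 are discarded, as CRC requires.
            crc = if crc & 0x8000 != 0 {
                (crc << 1) ^ 0x1021
            } else {
                crc << 1
            };
        }
    }
    crc
}

/// Hash slot of a key. If the key contains '{' followed later by '}'
/// with at least one byte between them, only that part is hashed;
/// otherwise the whole key is hashed.
fn key_hash_slot(key: &[u8]) -> u16 {
    if let Some(open) = key.iter().position(|&b| b == b'{') {
        if let Some(len) = key[open + 1..].iter().position(|&b| b == b'}') {
            if len > 0 {
                return crc16(&key[open + 1..open + 1 + len]) % 16384;
            }
        }
    }
    crc16(key) % 16384
}
```

Hash tags let related keys (for example `{user1000}.following` and `{user1000}.followers`) share a slot, and therefore the same primary and replicas.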

NB: enabling this required me to turn lazy connections off. With lazy connections enabled, Redis commands would queue in memory rather than ever being sent to a node. The RTF comparison dashboard below was captured using Google Cloud Memorystore; the first round of queries had lazy connections off, and the second had them on:
[Screenshot, 2025-11-14: RTF comparison dashboard, lazy connections off (first round) vs. on (second round)]


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • PR description explains the motivation for the change and relevant context for reviewing
  • PR description links appropriate GitHub/Jira tickets (creating when necessary)
  • Changeset is included for user-facing changes
  • Changes are compatible[1]
  • Documentation[2] completed
  • Performance impact assessed and acceptable
  • Metrics and logs are added[3] and documented
  • Tests added and passing[4]
    • Unit tests
    • Integration tests
    • Manual tests, as necessary

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Many (if not most) features benefit from built-in observability and debug-level logs. Please read this guidance on metrics best practices.

  4. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@apollo-librarian
Contributor

apollo-librarian bot commented Oct 10, 2025

✅ Docs preview has no changes

The preview was not built because there were no changes.

Build ID: 4c573836b916a52b3f2ac928
Build Logs: View logs


@carodewig carodewig changed the title from "[draft] redis read replicas" to "feat: use Redis read replicas" Nov 17, 2025
@carodewig carodewig marked this pull request as ready for review November 17, 2025 16:17
@carodewig carodewig requested a review from a team November 17, 2025 16:17
@carodewig carodewig requested a review from a team as a code owner November 17, 2025 16:17
Comment thread .circleci/config.yml
Comment thread apollo-router/src/cache/redis.rs
Comment thread apollo-router/src/cache/redis.rs
```rust
.inner
.set::<(), _, _>(key, value, expiration, None, false)

// NOTE: we need a writer, so don't use replicas() here
```
Contributor

What happens if we use .replicas() here? Is it type-safe, or would we get an error?

Contributor Author

We would get a Redis error at runtime; it would still compile.
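To make the "it would still compile" answer concrete: if primary and replica connections share one type, picking the wrong one for a write is a value-level mistake the compiler cannot see, so it only surfaces when the command runs. The toy model below is hypothetical (`Role`, `CmdError`, and `execute_set` are not the router's or the Redis client's API), and the actual error a replica returns depends on the deployment and client.

```rust
use std::collections::HashMap;

/// Role of the node a command is sent to (hypothetical toy model).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Primary,
    Replica,
}

#[derive(Debug, PartialEq)]
enum CmdError {
    /// A write reached a node that only serves reads.
    WriteToReplica,
}

/// Both roles share one type, so passing Role::Replica for a write
/// type-checks; the mistake is only detected inside this call.
fn execute_set(
    role: Role,
    store: &mut HashMap<String, String>,
    key: &str,
    value: &str,
) -> Result<(), CmdError> {
    match role {
        // Only the primary accepts writes.
        Role::Primary => {
            store.insert(key.to_string(), value.to_string());
            Ok(())
        }
        // The call site compiled fine; the failure appears at runtime.
        Role::Replica => Err(CmdError::WriteToReplica),
    }
}
```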

Contributor

It might be worth adding a monitor to our dashboards to detect that kind of behavior, because it would be really critical. What do you think?

Contributor Author

Added! I manually created the alert for now and will codify it in Terraform next week, after making sure it's not too flaky.

Comment thread apollo-router/src/cache/redis.rs
Comment thread apollo-router/tests/redis_monitor.rs
Comment thread apollo-router/tests/redis_monitor.rs
Comment thread docker-compose.yml
@carodewig carodewig requested a review from bnjjj November 19, 2025 15:32
Contributor

@aaronArinder aaronArinder left a comment


I think this all makes sense, but it's probably worth getting a second opinion too, just in case. One thing that worries me (and that we haven't tested, I don't think?) is what happens when a node gets kicked out of the cluster. Do we handle MOVEDs/redirection correctly? I assume so, but who knows!
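For context on the redirection question: per the Redis Cluster specification, a node that does not serve a key's slot replies with an error of the form `MOVED <slot> <host:port>` (or `ASK <slot> <host:port>` while a slot is migrating), and the client is expected to retry against that address. This is presumably handled by the Redis client library rather than router code; the sketch only shows what the redirect carries. `Redirect` and `parse_redirect` are hypothetical names, and the input is the error text without the RESP `-` prefix.

```rust
/// A cluster redirection parsed from an error reply (hypothetical type).
#[derive(Debug, PartialEq)]
enum Redirect<'a> {
    /// Slot ownership moved: refresh the slot map, then retry on `addr`.
    Moved { slot: u16, addr: &'a str },
    /// Slot is migrating: retry once on `addr`, preceded by ASKING.
    Ask { slot: u16, addr: &'a str },
}

/// Parse "MOVED <slot> <host:port>" or "ASK <slot> <host:port>".
/// Returns None for any other error text.
fn parse_redirect(err: &str) -> Option<Redirect<'_>> {
    let mut parts = err.split_whitespace();
    let kind = parts.next()?;
    let slot: u16 = parts.next()?.parse().ok()?;
    let addr = parts.next()?;
    // Valid slots are 0..16384; trailing tokens mean an unexpected format.
    if slot >= 16384 || parts.next().is_some() {
        return None;
    }
    match kind {
        "MOVED" => Some(Redirect::Moved { slot, addr }),
        "ASK" => Some(Redirect::Ask { slot, addr }),
        _ => None,
    }
}
```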

```rust
};

// PR-8405: must not use lazy connections or else commands will queue rather than being sent
config.replica.lazy_connections = false;
```
Contributor

Better safe than sorry!

```diff
 }
 
-// Backwords compatibility with old redis client
+// Backwards compatibility with old redis client
```
Contributor

Suggested change
```diff
-// Backwards compatibility with old redis client
+// Backwoods compatibility with old redis client
```

@carodewig
Contributor Author

> One thing that worries me (and that we haven't tested, I don't think?) is what happens when a node gets kicked out of the cluster. Do we handle MOVEDs/redirection correctly?

@aaronArinder This should work just fine! We've tested this in the internal router (although with plain Redis rather than Redis Cluster), where a replica instance was promoted and the router re-established the connection with minimal interruption.

@bnjjj bnjjj merged commit 3b5d2ed into dev Nov 21, 2025
14 of 15 checks passed
@bnjjj bnjjj deleted the caroline/redis-replicas branch November 21, 2025 09:26
carodewig added a commit that referenced this pull request Nov 21, 2025
carodewig added a commit that referenced this pull request Nov 21, 2025
This reverts commit 3b5d2ed.

(cherry picked from commit 171c245)

# Conflicts:
#	apollo-router/tests/integration/redis.rs
abernix added a commit that referenced this pull request Dec 1, 2025
@abernix abernix mentioned this pull request Dec 12, 2025

Labels

None yet

Projects

None yet


3 participants