fix: correct Redis client counting metric #8161

Merged
carodewig merged 13 commits into dev from caroline/fix-redis-client-counter on Sep 19, 2025

@carodewig
Contributor

@carodewig carodewig commented Aug 28, 2025

This PR removes the apollo.router.cache.redis.connections metric and replaces it with apollo.router.cache.redis.clients.

The connections metric was implemented with an up-down counter, which was sometimes not collected properly (i.e., it could go negative). The name *.connections was also inaccurate, given that each of our Redis clients makes multiple connections, one to each node in the Redis pool (if in clustered mode).

The new clients metric counts the number of clients across the router via an AtomicU64 and surfaces that value in a gauge.
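
A minimal sketch of the approach described above, assuming a process-wide static counter. The name ACTIVE_CLIENT_COUNT matches the one visible in the review diff; the helper functions are hypothetical, and the actual gauge registration through the router's telemetry layer is omitted.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Global count of live Redis clients (name taken from the review diff).
static ACTIVE_CLIENT_COUNT: AtomicU64 = AtomicU64::new(0);

/// Hypothetical helper: record that a client was created.
fn client_created() {
    ACTIVE_CLIENT_COUNT.fetch_add(1, Ordering::Relaxed);
}

/// Hypothetical helper: record that a client was shut down.
/// Note: fetch_sub wraps on underflow, so calls must be paired with client_created.
fn client_closed() {
    ACTIVE_CLIENT_COUNT.fetch_sub(1, Ordering::Relaxed);
}

/// Hypothetical helper: the value a gauge callback would report at collection time.
fn observe_client_count() -> u64 {
    ACTIVE_CLIENT_COUNT.load(Ordering::Relaxed)
}
```

Because the gauge reads the current value on each collection, a missed or reordered delta cannot leave the reported number negative, which is the failure mode attributed to the up-down counter.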

Caveat: the old metric included a kind attribute to reflect the number of clients in each pool (e.g., entity caching, query planning). The new metric does not include this attribute, as doing so would require us to store a global Arc<RwLock<HashMap>> (to be able to track each pool type separately). I'm happy to change the implementation to do this, but as the purpose of the metric is to make sure the number of clients isn't growing unbounded (#7319), it seemed unnecessary.
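
For comparison, here is a rough sketch of what the per-kind alternative mentioned in the caveat might look like. Everything in it is an assumption for illustration: the static name, the function names, and the kind strings are hypothetical, and the Arc is kept only to mirror the type named above (a static does not strictly need it).

```rust
use std::collections::HashMap;
use std::sync::{Arc, OnceLock, RwLock};

type KindCounts = Arc<RwLock<HashMap<&'static str, u64>>>;

// Hypothetical global registry: one count per pool kind.
static CLIENTS_BY_KIND: OnceLock<KindCounts> = OnceLock::new();

fn registry() -> &'static KindCounts {
    CLIENTS_BY_KIND.get_or_init(|| Arc::new(RwLock::new(HashMap::new())))
}

/// Increment the count for one pool kind.
fn client_created_for(kind: &'static str) {
    let mut counts = registry().write().expect("lock poisoned");
    *counts.entry(kind).or_insert(0) += 1;
}

/// Decrement the count for one pool kind, never going below zero.
fn client_closed_for(kind: &'static str) {
    let mut counts = registry().write().expect("lock poisoned");
    if let Some(count) = counts.get_mut(kind) {
        *count = count.saturating_sub(1);
    }
}

/// Value a per-kind gauge observation would report; unknown kinds read as zero.
fn observe_kind(kind: &str) -> u64 {
    let counts = registry().read().expect("lock poisoned");
    counts.get(kind).copied().unwrap_or(0)
}
```

The extra lock and map are the cost being weighed in the caveat: every client creation and shutdown takes a write lock, whereas the single counter needs only one atomic operation.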


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • PR description explains the motivation for the change and relevant context for reviewing
  • PR description links appropriate GitHub/Jira tickets (creating when necessary)
  • Changeset is included for user-facing changes
  • Changes are compatible [1]
  • Documentation [2] completed
  • Performance impact assessed and acceptable
  • Metrics and logs are added [3] and documented
  • Tests added and passing [4]
    • Unit tests
    • Integration tests
    • Manual tests, as necessary

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. A lot of (if not most) features benefit from built-in observability and debug-level logs. Please read this guidance on metrics best-practices.

  4. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@apollo-librarian

apollo-librarian bot commented Aug 28, 2025

✅ Docs preview ready

The preview is ready to be viewed. View the preview

File Changes

0 new, 1 changed, 0 removed
* graphos/routing/(latest)/observability/telemetry/instrumentation/standard-instruments.mdx

Build ID: 39049c2d889800e1a1c8ef5a
Build Logs: View logs

URL: https://www.apollographql.com/docs/deploy-preview/39049c2d889800e1a1c8ef5a

@carodewig carodewig marked this pull request as ready for review September 2, 2025 16:00
@carodewig carodewig requested a review from a team September 2, 2025 16:00
Contributor

@BrynCooke BrynCooke left a comment

Needs docs and changelog, but otherwise it looks OK.

bnjjj
bnjjj previously requested changes Sep 3, 2025
-1,
kind = caller
);
ACTIVE_CLIENT_COUNT.fetch_sub(1, Ordering::Relaxed);
Contributor

Nit: It would be nice if the sub was in a Drop implementation.

Contributor Author

I don't think I can put it in the DropSafeRedisPool Drop implementation, but I think this accomplishes the same thing: 547bd2a

Contributor Author

Comment for posterity: I think having the decrement in a Drop implementation would actually make the metric not useful, as the behavior in #7319 was that dropping the client didn't actually terminate the connection.
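
For readers following this thread, a sketch of the Drop-based decrement suggested in the nit might look like the following. ClientCountGuard is a hypothetical type, not one from the router, and this is not what commit 547bd2a does. It is shown only to illustrate the concern in the comment above: the count falls as soon as the guard is dropped, whether or not the underlying connection has actually terminated.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Same global counter name as in the review diff.
static ACTIVE_CLIENT_COUNT: AtomicU64 = AtomicU64::new(0);

/// Hypothetical RAII guard: increments on creation, decrements on drop.
struct ClientCountGuard;

impl ClientCountGuard {
    fn new() -> Self {
        ACTIVE_CLIENT_COUNT.fetch_add(1, Ordering::Relaxed);
        ClientCountGuard
    }
}

impl Drop for ClientCountGuard {
    fn drop(&mut self) {
        // Runs when the handle goes out of scope, which says nothing about
        // whether the Redis connection behind it has been closed.
        ACTIVE_CLIENT_COUNT.fetch_sub(1, Ordering::Relaxed);
    }
}
```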

@carodewig
Contributor Author

@BrynCooke Does #8174 eliminate the need for this PR? I prefer the existing UpDownCounter implementation to the global I had to add here.

@BrynCooke
Contributor

> @BrynCooke Does #8174 eliminate the need for this PR? I prefer the existing UpDownCounter implementation to the global I had to add here.

Not yet, there are other followup PRs. I'll deal with the conversion from gauges back to updown counters separately.

@BrynCooke
Contributor

Do docs need updating?

@carodewig carodewig requested a review from a team as a code owner September 17, 2025 20:41
@carodewig
Contributor Author

@BrynCooke Yep, good catch!

I marked the changeset as a feat because we're adding a metric and removing another. Should it be marked as fix or breaking instead?

@BrynCooke
Contributor

Seeing as the old metric was broken it can be a fix, but make sure to include wording in the changelog as to why the old metric has gone and what users need to do.

@carodewig carodewig dismissed bnjjj’s stale review September 19, 2025 13:28

Unit added to metric

@carodewig carodewig merged commit 8ad820d into dev Sep 19, 2025
15 checks passed
@carodewig carodewig deleted the caroline/fix-redis-client-counter branch September 19, 2025 13:28
@abernix abernix mentioned this pull request Oct 27, 2025
3 participants