fix: correct Redis client counting metric#8161
Conversation
✅ Docs preview readyThe preview is ready to be viewed. View the preview File Changes 0 new, 1 changed, 0 removedBuild ID: 39049c2d889800e1a1c8ef5a URL: https://www.apollographql.com/docs/deploy-preview/39049c2d889800e1a1c8ef5a |
This comment has been minimized.
This comment has been minimized.
BrynCooke
left a comment
There was a problem hiding this comment.
Needs docs and changelog, but otherwise it looks OK.
apollo-router/src/cache/redis.rs
Outdated
| -1, | ||
| kind = caller | ||
| ); | ||
| ACTIVE_CLIENT_COUNT.fetch_sub(1, Ordering::Relaxed); |
There was a problem hiding this comment.
Nit: It would be nice if the sub was in a Drop implementation.
There was a problem hiding this comment.
I don't think I can put it in the DropSafeRedisPool Drop implementation, but I think this accomplishes the same thing: 547bd2a
There was a problem hiding this comment.
Comment for posterity: I think having the decrement in a Drop implementation would actually make the metric not useful, as the behavior in #7319 was that dropping the client didn't actually terminate the connection.
|
@BrynCooke Does #8174 eliminate the need for this PR? I prefer the existing |
Not yet, there are other followup PRs. I'll deal with the conversion from gauges back to updown counters separately. |
|
Do docs need updating? |
|
@BrynCooke Yep, good catch! I marked the changeset as a |
|
Seeing as the old metric was broken it can be a fix, but make sure to include wording in the changelog as to why the old metric has gone and what users need to do. |
This PR removes the
apollo.router.cache.redis.connectionsmetric and replaces it with aapollo.router.cache.redis.clients.The
connectionsmetric was implemented with an up-down counter which would sometimes not be collected properly (i.e. it could go negative). The name*.connectionswas also inaccurate given that our Redis clients will each make multiple connections, one to each node in the Redis pool (if in clustered mode).The new
clientsmetric counts the number of clients across the router via anAtomicU64and surfaces that value in a gauge.Caveat: the old metric included a
kindattribute to reflect the number of clients in each pool (ie entity caching, query planning). The new metric does not include this attribute as doing so would require we store a globalArc<RwLock<HashMap>>(to be able to track each pool type separately). I'm happy to change the implementation to do this, but as the purpose of the metric is to make sure the number of clients isn't growing unbounded (#7319), it seemed unnecessary.Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Note any exceptions here
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. ↩
Configuration is an important part of many changes. Where applicable please try to document configuration examples. ↩
A lot of (if not most) features benefit from built-in observability and
debug-level logs. Please read this guidance on metrics best-practices. ↩Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. ↩