feat(metrics): add error counters for comprehensive monitoring coverage#3729
feat(metrics): add error counters for comprehensive monitoring coverage#3729
Conversation
|
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
|
📝 WalkthroughWalkthroughThis change refactors and extends Prometheus metrics across several internal packages. It introduces new error-tracking metrics, standardizes metric namespaces to "unkey," updates label sets and naming conventions, and removes or replaces legacy metrics. Some metrics are renamed for clarity, and error counters are consistently added for key, cache, circuit breaker, HTTP, rate-limit, batch, buffer, and database operations. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Service
participant Prometheus
Client->>Service: Perform operation (e.g., DB, HTTP, Key, etc.)
Service->>Prometheus: Increment main metric (with "unkey" namespace)
alt Error occurs
Service->>Prometheus: Increment corresponding error counter
end
Service-->>Client: Return result
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
Thank you for following the naming conventions for pull request titles! 🙏 |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (13)
go/internal/services/keys/validation.go(0 hunks)go/pkg/circuitbreaker/lib.go(2 hunks)go/pkg/circuitbreaker/metrics.go(0 hunks)go/pkg/db/replica.go(5 hunks)go/pkg/prometheus/metrics/batch.go(1 hunks)go/pkg/prometheus/metrics/buffer.go(1 hunks)go/pkg/prometheus/metrics/cache.go(6 hunks)go/pkg/prometheus/metrics/chproxy.go(2 hunks)go/pkg/prometheus/metrics/circuitbreaker.go(1 hunks)go/pkg/prometheus/metrics/database.go(2 hunks)go/pkg/prometheus/metrics/http.go(4 hunks)go/pkg/prometheus/metrics/keys.go(2 hunks)go/pkg/prometheus/metrics/ratelimit.go(9 hunks)
💤 Files with no reviewable changes (2)
- go/pkg/circuitbreaker/metrics.go
- go/internal/services/keys/validation.go
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.go: Follow comprehensive documentation guidelines for Go code as described in go/GO_DOCUMENTATION_GUIDELINES.md
Every public function/type in Go code must be documented
Prefer interfaces for testability in Go code
Use AIDEV-* comments for complex/important code in Go services
Files:
go/pkg/db/replica.gogo/pkg/circuitbreaker/lib.gogo/pkg/prometheus/metrics/buffer.gogo/pkg/prometheus/metrics/batch.gogo/pkg/prometheus/metrics/keys.gogo/pkg/prometheus/metrics/cache.gogo/pkg/prometheus/metrics/chproxy.gogo/pkg/prometheus/metrics/http.gogo/pkg/prometheus/metrics/circuitbreaker.gogo/pkg/prometheus/metrics/ratelimit.gogo/pkg/prometheus/metrics/database.go
**/*.{env,js,ts,go}
📄 CodeRabbit Inference Engine (CLAUDE.md)
All environment variables must follow the format: UNKEY_<SERVICE_NAME>_VARNAME
Files:
go/pkg/db/replica.gogo/pkg/circuitbreaker/lib.gogo/pkg/prometheus/metrics/buffer.gogo/pkg/prometheus/metrics/batch.gogo/pkg/prometheus/metrics/keys.gogo/pkg/prometheus/metrics/cache.gogo/pkg/prometheus/metrics/chproxy.gogo/pkg/prometheus/metrics/http.gogo/pkg/prometheus/metrics/circuitbreaker.gogo/pkg/prometheus/metrics/ratelimit.gogo/pkg/prometheus/metrics/database.go
🧠 Learnings (3)
📓 Common learnings
Learnt from: Flo4604
PR: unkeyed/unkey#3606
File: go/pkg/prometheus/metrics/database.go:29-30
Timestamp: 2025-07-16T10:06:35.397Z
Learning: In Go packages, variables defined in one file within a package (like `latencyBuckets` and `constLabels` in go/pkg/prometheus/metrics/http.go) are accessible from other files in the same package without requiring imports. This is a common pattern for sharing configuration across multiple files within a package.
Learnt from: chronark
PR: unkeyed/unkey#2901
File: go/pkg/otel/metrics/metrics.go:11-22
Timestamp: 2025-02-26T15:07:05.646Z
Learning: In the metrics package init function, panicking on initialization errors is acceptable since it occurs during startup and indicates a fundamental issue that should be addressed immediately rather than allowing the application to continue with incorrect metrics setup.
📚 Learning: for debugging database replica usage in go/pkg/db/replica.go, it's acceptable to mark queryrowcontex...
Learnt from: Flo4604
PR: unkeyed/unkey#3606
File: go/pkg/db/replica.go:8-11
Timestamp: 2025-07-16T15:38:53.491Z
Learning: For debugging database replica usage in go/pkg/db/replica.go, it's acceptable to mark QueryRowContext operations as "success" even though SQL errors only surface during row.Scan() calls. The timing metrics are the primary concern for debugging replica performance patterns.
Applied to files:
go/pkg/db/replica.gogo/pkg/prometheus/metrics/database.go
📚 Learning: in go packages, variables defined in one file within a package (like `latencybuckets` and `constlabe...
Learnt from: Flo4604
PR: unkeyed/unkey#3606
File: go/pkg/prometheus/metrics/database.go:29-30
Timestamp: 2025-07-16T10:06:35.397Z
Learning: In Go packages, variables defined in one file within a package (like `latencyBuckets` and `constLabels` in go/pkg/prometheus/metrics/http.go) are accessible from other files in the same package without requiring imports. This is a common pattern for sharing configuration across multiple files within a package.
Applied to files:
go/pkg/db/replica.gogo/pkg/circuitbreaker/lib.gogo/pkg/prometheus/metrics/keys.gogo/pkg/prometheus/metrics/cache.gogo/pkg/prometheus/metrics/chproxy.gogo/pkg/prometheus/metrics/http.gogo/pkg/prometheus/metrics/ratelimit.gogo/pkg/prometheus/metrics/database.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Agent Local / test_agent_local
- GitHub Check: Build / Build
- GitHub Check: Test API / API Test Local
- GitHub Check: Test Go API Local / Test
- GitHub Check: Test Packages / Test
🔇 Additional comments (21)
go/pkg/prometheus/metrics/keys.go (3)
16-16: Approve the refined comment for better clarity.The updated comment focusing on "API traffic patterns" removes the previous reference to error rates, which is now properly handled by the separate
KeyVerificationErrorsTotalmetric. This improves semantic separation between total verifications and error tracking.
22-22: LGTM: Namespace standardization.Adding the "unkey" namespace aligns with the PR objective to ensure all metrics are placed within the consistent namespace.
31-47: Excellent documentation for the new error metric.The documentation clearly distinguishes between program functionality errors (tracked by this metric) and business logic validation errors like "FORBIDDEN" or "RATE_LIMITED". The example usage and relationship to the total verifications metric is well explained.
go/pkg/prometheus/metrics/database.go (3)
15-24: Approve metric rename and namespace addition.The pluralization from
DatabaseOperationLatencytoDatabaseOperationsLatencyimproves naming consistency, and adding the "unkey" namespace aligns with the PR objectives. The documentation remains comprehensive with clear usage examples.
43-52: Approve metric rename and namespace addition.The pluralization from
DatabaseOperationTotaltoDatabaseOperationsTotalmaintains consistency with the latency metric, and the namespace addition follows the standardization pattern.
54-67: Well-documented error tracking metric.The new
DatabaseOperationsErrorsTotalmetric follows the established pattern for error tracking with appropriate labels and clear documentation. The example usage demonstrates proper implementation.go/pkg/db/replica.go (2)
46-47: LGTM: Consistent metric variable updates.The metric variable names have been correctly updated to match the pluralized forms defined in
go/pkg/prometheus/metrics/database.go. This maintains consistency across the codebase.
72-73: LGTM: All database operations consistently updated.All database operation methods (
PrepareContext,QueryContext,QueryRowContext,Begin) have been consistently updated to use the pluralized metric variable names, maintaining alignment with the metric definitions.Also applies to: 98-99, 122-123, 146-147
go/pkg/circuitbreaker/lib.go (2)
12-12: LGTM: Import addition for centralized metrics.Adding the prometheus metrics package import enables usage of the centralized circuit breaker metrics, aligning with the metric consolidation effort.
202-202: Labels Match Updated Metric DefinitionThe
CircuitBreakerRequestsmetric is defined with labels["service", "action"], and the call on line 202 ofgo/pkg/circuitbreaker/lib.go:metrics.CircuitBreakerRequests.WithLabelValues(cb.config.name, string(cb.state)).Inc()correctly maps
service → cb.config.nameandaction → cb.state. No changes required.go/pkg/prometheus/metrics/buffer.go (1)
51-65: Well-designed error tracking metric.The new
BufferErrorsTotalmetric follows the established pattern for error tracking across the codebase. The documentation is comprehensive with clear example usage, and the label design (name,error_type) provides appropriate granularity for monitoring buffer-specific errors.go/pkg/prometheus/metrics/batch.go (1)
88-103: LGTM! Well-structured error tracking metric.The new
BatchItemsProcessedErrorsTotalmetric follows established patterns and conventions. The documentation is comprehensive with clear example usage, and the metric structure is consistent with the relatedBatchItemsProcessedTotalmetric.go/pkg/prometheus/metrics/circuitbreaker.go (1)
22-33: LGTM! Comprehensive error tracking metric.The new
CircuitBreakerErrorsTotalmetric is well-structured with appropriate labels for service and error type, following established patterns for error tracking metrics.go/pkg/prometheus/metrics/cache.go (2)
22-22: LGTM! Consistent namespace standardization.All existing cache metrics have been properly updated with the "unkey" namespace, maintaining consistency across the metrics system.
Also applies to: 39-39, 57-57, 73-73, 89-89, 105-105
114-160: LGTM! Comprehensive error tracking for cache operations.The three new error metrics (
CacheReadsErrorsTotal,CacheWritesErrorsTotal,CacheRevalidationsErrorsTotal) provide excellent coverage for cache error monitoring. The documentation is thorough with clear examples, and the metric structure is consistent with the related operational metrics.go/pkg/prometheus/metrics/http.go (2)
60-60: LGTM! Consistent namespace standardization.All HTTP metrics have been properly updated with the "unkey" namespace while maintaining their existing functionality and label structures.
Also applies to: 77-77, 110-110
86-100: LGTM! Well-designed error tracking metric.The new
HTTPRequestErrorTotalmetric uses the same label structure asHTTPRequestTotal, ensuring consistency for error rate calculations. The documentation includes clear usage examples.go/pkg/prometheus/metrics/chproxy.go (2)
21-21: LGTM! Consistent namespace standardization.The existing ClickHouse proxy metrics have been properly updated with the "unkey" namespace while maintaining their functionality.
Also applies to: 53-53
30-44: LGTM! Comprehensive error tracking for ClickHouse proxy.The two new error metrics (
ChproxyErrorsTotalandChproxyRowsErrorsTotal) provide excellent coverage for both general proxy errors and row processing errors. The consistent use of the "endpoint" label across all ClickHouse proxy metrics enables effective error rate analysis.Also applies to: 62-76
go/pkg/prometheus/metrics/ratelimit.go (2)
22-22: LGTM! Namespace standardization implemented correctly.The addition of
Namespace: "unkey"to all existing rate-limit metrics correctly implements the standardization objective. This ensures consistent metric naming across the monitoring infrastructure.Also applies to: 37-37, 52-52, 67-67, 82-82, 97-97, 114-114, 130-130, 148-148
157-170: LGTM! Well-implemented error counter with comprehensive documentation.The new
RatelimitRefreshFromOriginErrorsTotalcounter follows Prometheus best practices:
- Proper naming convention with "_total" suffix
- Comprehensive godoc documentation with example usage
- Consistent configuration with existing metrics
- Logical pairing with
RatelimitRefreshFromOriginfor complete observabilityThis enhances monitoring coverage as intended by the PR objectives.
chronark
left a comment
There was a problem hiding this comment.
I think most of the error metrics do not apply, or need to be changed
if you actually use them, you'll find out :)
|
My plan was going to merge this and then actually instrument them (and thus find out what was useful and what wasn't). Will fix 😄 |
a1ef985 to
4a15c3b
Compare
4a15c3b to
8f6d9ac
Compare
Add missing error counter metrics. Ensure everything is in the unkey namespace. Add/update godoc comments
8f6d9ac to
839d8ed
Compare
|
@chronark I think I addressed everything. |
|
hmm okay |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (14)
go/internal/services/keys/validation.go(0 hunks)go/pkg/circuitbreaker/lib.go(2 hunks)go/pkg/circuitbreaker/metrics.go(0 hunks)go/pkg/db/replica.go(5 hunks)go/pkg/prometheus/metrics/batch.go(1 hunks)go/pkg/prometheus/metrics/buffer.go(1 hunks)go/pkg/prometheus/metrics/cache.go(6 hunks)go/pkg/prometheus/metrics/chproxy.go(2 hunks)go/pkg/prometheus/metrics/circuitbreaker.go(1 hunks)go/pkg/prometheus/metrics/database.go(2 hunks)go/pkg/prometheus/metrics/http.go(4 hunks)go/pkg/prometheus/metrics/keys.go(2 hunks)go/pkg/prometheus/metrics/panic.go(1 hunks)go/pkg/prometheus/metrics/ratelimit.go(9 hunks)
💤 Files with no reviewable changes (2)
- go/pkg/circuitbreaker/metrics.go
- go/internal/services/keys/validation.go
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go
📄 CodeRabbit Inference Engine (CLAUDE.md)
**/*.go: Follow comprehensive documentation guidelines for Go code as described in go/GO_DOCUMENTATION_GUIDELINES.md
Every public function/type in Go code must be documented
Prefer interfaces for testability in Go code
Use AIDEV-* comments for complex/important code in Go services
Files:
go/pkg/prometheus/metrics/panic.gogo/pkg/circuitbreaker/lib.gogo/pkg/prometheus/metrics/keys.gogo/pkg/prometheus/metrics/circuitbreaker.gogo/pkg/prometheus/metrics/chproxy.gogo/pkg/db/replica.gogo/pkg/prometheus/metrics/batch.gogo/pkg/prometheus/metrics/buffer.gogo/pkg/prometheus/metrics/database.gogo/pkg/prometheus/metrics/cache.gogo/pkg/prometheus/metrics/http.gogo/pkg/prometheus/metrics/ratelimit.go
**/*.{env,js,ts,go}
📄 CodeRabbit Inference Engine (CLAUDE.md)
All environment variables must follow the format: UNKEY_<SERVICE_NAME>_VARNAME
Files:
go/pkg/prometheus/metrics/panic.gogo/pkg/circuitbreaker/lib.gogo/pkg/prometheus/metrics/keys.gogo/pkg/prometheus/metrics/circuitbreaker.gogo/pkg/prometheus/metrics/chproxy.gogo/pkg/db/replica.gogo/pkg/prometheus/metrics/batch.gogo/pkg/prometheus/metrics/buffer.gogo/pkg/prometheus/metrics/database.gogo/pkg/prometheus/metrics/cache.gogo/pkg/prometheus/metrics/http.gogo/pkg/prometheus/metrics/ratelimit.go
🧠 Learnings (7)
📓 Common learnings
Learnt from: Flo4604
PR: unkeyed/unkey#3606
File: go/pkg/prometheus/metrics/database.go:29-30
Timestamp: 2025-07-16T10:06:35.397Z
Learning: In Go packages, variables defined in one file within a package (like `latencyBuckets` and `constLabels` in go/pkg/prometheus/metrics/http.go) are accessible from other files in the same package without requiring imports. This is a common pattern for sharing configuration across multiple files within a package.
Learnt from: chronark
PR: unkeyed/unkey#2901
File: go/pkg/otel/metrics/metrics.go:11-22
Timestamp: 2025-02-26T15:07:05.646Z
Learning: In the metrics package init function, panicking on initialization errors is acceptable since it occurs during startup and indicates a fundamental issue that should be addressed immediately rather than allowing the application to continue with incorrect metrics setup.
📚 Learning: in the metrics package init function, panicking on initialization errors is acceptable since it occu...
Learnt from: chronark
PR: unkeyed/unkey#2901
File: go/pkg/otel/metrics/metrics.go:11-22
Timestamp: 2025-02-26T15:07:05.646Z
Learning: In the metrics package init function, panicking on initialization errors is acceptable since it occurs during startup and indicates a fundamental issue that should be addressed immediately rather than allowing the application to continue with incorrect metrics setup.
Applied to files:
go/pkg/prometheus/metrics/panic.go
📚 Learning: in go packages, variables defined in one file within a package (like `latencybuckets` and `constlabe...
Learnt from: Flo4604
PR: unkeyed/unkey#3606
File: go/pkg/prometheus/metrics/database.go:29-30
Timestamp: 2025-07-16T10:06:35.397Z
Learning: In Go packages, variables defined in one file within a package (like `latencyBuckets` and `constLabels` in go/pkg/prometheus/metrics/http.go) are accessible from other files in the same package without requiring imports. This is a common pattern for sharing configuration across multiple files within a package.
Applied to files:
go/pkg/prometheus/metrics/panic.gogo/pkg/circuitbreaker/lib.gogo/pkg/prometheus/metrics/keys.gogo/pkg/prometheus/metrics/chproxy.gogo/pkg/db/replica.gogo/pkg/prometheus/metrics/database.gogo/pkg/prometheus/metrics/cache.gogo/pkg/prometheus/metrics/http.gogo/pkg/prometheus/metrics/ratelimit.go
📚 Learning: applies to go/deploy/**/*.{go,js,ts,tsx,py,sh,md,txt,json,yaml,yml,ini,env,conf,html,css,scss,xml,c,...
Learnt from: CR
PR: unkeyed/unkey#0
File: go/deploy/CLAUDE.md:0-0
Timestamp: 2025-07-21T18:05:58.236Z
Learning: Applies to go/deploy/**/*.{go,js,ts,tsx,py,sh,md,txt,json,yaml,yml,ini,env,conf,html,css,scss,xml,c,h,cpp,java,rb,rs,php,pl,sql} : Use `AIDEV-NOTE:`, `AIDEV-TODO:`, `AIDEV-BUSINESS_RULE:`, or `AIDEV-QUESTION:` (all-caps prefix) as anchor comments aimed at AI and developers.
Applied to files:
go/pkg/prometheus/metrics/circuitbreaker.go
📚 Learning: applies to go/deploy/**/*.{go,js,ts,tsx,py,sh,md,txt,json,yaml,yml,ini,env,conf,html,css,scss,xml,c,...
Learnt from: CR
PR: unkeyed/unkey#0
File: go/deploy/CLAUDE.md:0-0
Timestamp: 2025-07-21T18:05:58.236Z
Learning: Applies to go/deploy/**/*.{go,js,ts,tsx,py,sh,md,txt,json,yaml,yml,ini,env,conf,html,css,scss,xml,c,h,cpp,java,rb,rs,php,pl,sql} : Do not remove `AIDEV-*`s without explicit human instruction.
Applied to files:
go/pkg/prometheus/metrics/circuitbreaker.go
📚 Learning: for debugging database replica usage in go/pkg/db/replica.go, it's acceptable to mark queryrowcontex...
Learnt from: Flo4604
PR: unkeyed/unkey#3606
File: go/pkg/db/replica.go:8-11
Timestamp: 2025-07-16T15:38:53.491Z
Learning: For debugging database replica usage in go/pkg/db/replica.go, it's acceptable to mark QueryRowContext operations as "success" even though SQL errors only surface during row.Scan() calls. The timing metrics are the primary concern for debugging replica performance patterns.
Applied to files:
go/pkg/db/replica.gogo/pkg/prometheus/metrics/database.go
📚 Learning: the `cloudflareratelimiter` type definition in `apps/api/src/pkg/env.ts` should not have its interfa...
Learnt from: chronark
PR: unkeyed/unkey#2544
File: apps/api/src/pkg/env.ts:4-6
Timestamp: 2024-10-23T12:05:31.121Z
Learning: The `cloudflareRatelimiter` type definition in `apps/api/src/pkg/env.ts` should not have its interface changed; it should keep the `limit` method returning `Promise<{ success: boolean }>` without additional error properties.
Applied to files:
go/pkg/prometheus/metrics/ratelimit.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test Go API Local / Test
- GitHub Check: Build / Build
- GitHub Check: Test Agent Local / test_agent_local
- GitHub Check: Test API / API Test Local
- GitHub Check: Test Packages / Test
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (23)
go/pkg/prometheus/metrics/batch.go (1)
88-103: LGTM! Well-structured error tracking metric.The new
BatchItemsProcessedErrorsTotalmetric follows the established patterns with proper namespace, clear documentation, and appropriate labeling. The example usage and help text provide good guidance for implementation.go/pkg/prometheus/metrics/keys.go (3)
22-22: LGTM! Namespace standardization applied.Adding the "unkey" namespace aligns with the broader metrics standardization effort across the codebase.
16-16: LGTM! Comment updated appropriately.The comment update to focus solely on API traffic patterns is correct since error rates are now tracked by the separate error metric below.
31-47: LGTM! Clear separation of error types.The new
KeyVerificationErrorsTotalmetric properly distinguishes between program functionality errors and key validation outcomes (like "FORBIDDEN"). The documentation clearly explains this distinction and provides good example usage.go/pkg/prometheus/metrics/database.go (3)
15-15: LGTM! Metric name pluralization improves consistency.Renaming from singular to plural form (
DatabaseOperationLatency→DatabaseOperationsLatency) makes the naming more consistent with other metrics in the codebase.Also applies to: 24-24
26-26: LGTM! Namespace standardization applied.Adding the "unkey" namespace aligns with the broader metrics standardization effort across the codebase.
Also applies to: 45-45
54-66: LGTM! Well-designed error tracking metric.The new
DatabaseOperationsErrorsTotalmetric follows established patterns and appropriately omits the status label since it's specifically for errors. The documentation and example usage are clear and helpful.go/pkg/circuitbreaker/lib.go (2)
12-12: LGTM! Import added for centralized metrics.Adding the import for the centralized prometheus metrics package aligns with the refactoring effort to consolidate metric definitions.
202-202: Metric labels verified and usage is correctI’ve confirmed that in go/pkg/prometheus/metrics/circuitbreaker.go, CircuitBreakerRequests is defined with labels []string{"service", "action"}, and in go/pkg/circuitbreaker/lib.go we call
metrics.CircuitBreakerRequests.WithLabelValues(cb.config.name, string(cb.state)).Inc()which maps
service→cb.config.nameandaction→cb.state. No changes required.go/pkg/db/replica.go (1)
46-47: LGTM! Consistent metric variable renaming.The renaming from singular to plural form (
DatabaseOperationLatency→DatabaseOperationsLatency,DatabaseOperationTotal→DatabaseOperationsTotal) is applied consistently across all database operation methods. This aligns with the pluralized metric names in the metrics package.Also applies to: 72-73, 98-99, 122-123, 146-147
go/pkg/prometheus/metrics/buffer.go (1)
51-65: LGTM! Well-structured error metric addition.The new
BufferErrorsTotalmetric follows the established pattern with proper:
- Namespace consistency ("unkey")
- Appropriate metric type (Counter for error tracking)
- Consistent labeling with existing buffer metrics
- Clear documentation and example usage
go/pkg/prometheus/metrics/circuitbreaker.go (2)
15-15: LGTM! Namespace standardization applied.The addition of the "unkey" namespace aligns with the broader standardization effort across the metrics package.
20-20: LGTM! Consistent labeling and well-structured error metric.The label consistency between
CircuitBreakerRequestsand the newCircuitBreakerErrorsTotalmetric using["service", "action"]is excellent. The new error metric follows the established pattern with proper documentation and namespace consistency.Also applies to: 27-33
go/pkg/prometheus/metrics/cache.go (2)
22-22: LGTM! Consistent namespace standardization.The addition of the "unkey" namespace to all existing cache metrics maintains consistency across the metrics package.
Also applies to: 39-39, 57-57, 73-73, 89-89, 105-105
114-128: Approve error metrics definitions
- The new
CacheReadsErrorsTotalandCacheRevalidationsErrorsTotalingo/pkg/prometheus/metrics/cache.goadhere to the existing naming, namespace (“unkey”), subsystem (“cache”), and labeling conventions.- Inspection of
go/pkg/cache/cache.goshows reads report misses via theokreturn value and revalidations log errors (no panics), so these counters will capture meaningful failure events.Ready to merge.
go/pkg/prometheus/metrics/http.go (2)
60-60: LGTM! Consistent namespace standardization.The addition of the "unkey" namespace to all HTTP metrics maintains consistency with the broader metrics package standardization effort.
Also applies to: 77-77, 110-110
86-100: LGTM! Well-structured HTTP error metric.The new
HTTPRequestErrorTotalmetric excellently complements the existing HTTP metrics with:
- Consistent labeling using
["method", "path", "status"]- Proper namespace and subsystem alignment
- Clear documentation and usage example
- Appropriate metric type for error tracking
go/pkg/prometheus/metrics/chproxy.go (4)
21-21: LGTM! Namespace standardization implemented correctly.The addition of the "unkey" namespace aligns with the PR objectives for consistent metric namespacing across the codebase.
30-44: Excellent implementation of error tracking metric.The new
ChproxyErrorsTotalmetric follows established patterns with proper documentation, consistent labeling, and appropriate naming conventions. This enhances observability for ClickHouse proxy error monitoring.
53-53: LGTM! Consistent namespace addition.This change maintains consistency with the namespace standardization applied to other metrics in this file.
62-76: Well-designed row processing error metric.The
ChproxyRowsErrorsTotalmetric properly complements the existingChproxyRowsTotalmetric by providing specific error tracking for row processing operations. The implementation follows all established conventions and documentation standards.go/pkg/prometheus/metrics/ratelimit.go (2)
22-22: LGTM! Comprehensive namespace standardization across all rate-limit metrics.All existing rate-limit metrics have been consistently updated to include the "unkey" namespace, covering gauges, counters, counter vectors, and histograms. This aligns perfectly with the PR objectives for metric namespace standardization.
Also applies to: 37-37, 52-52, 67-67, 82-82, 97-97, 114-114, 130-130, 148-148
157-170: Excellent addition of refresh error tracking.The
RatelimitRefreshFromOriginErrorsTotalmetric properly complements the existingRatelimitRefreshFromOrigincounter by providing specific error tracking for origin refresh operations. The implementation follows all documentation standards and naming conventions.
|
then I’ll approve |
|
I guess it's kinda like tdd?? here I create the metrics.. then I go attempt to make use of them later to "exercise" what I would expect. e.g. the |
|
yeah sure, I just don't know why you want to lengthen your feedback cycle by doing a PR each time :D |
Graphite Automations"Post a GIF when PR approved" took an action on this PR • (08/05/25)1 gif was posted to this PR based on Andreas Thomas's automation. |
|
I assumed you'd assume I knew wtf I was doing (to an extent) 🤣 |
|
haha yeah I do, that’s why I approved it and just leave you to cook :) |
|
No good jeopardy gifs exist for this wait... |
|
CI is fucked right now |
|
We can start with "make it better" 😸 |

Add missing error counter metrics. Ensure everything is in the unkey namespace. Add/update godoc comments
What does this PR do?
Fixes # (issue)
If there is not an issue for this, please create one first. This is used to tracking purposes and also helps use understand why this PR exists
Type of change
How should this be tested?
Checklist
Required
pnpm buildpnpm fmtconsole.logsgit pull origin mainAppreciated
Summary by CodeRabbit