feat(metrics): add error counters for comprehensive monitoring coverage #3729

Merged
imeyer merged 1 commit into main from push-ooyormzzsosr
Aug 5, 2025

Conversation

imeyer (Contributor) commented Aug 4, 2025

Add missing error counter metrics. Ensure everything is in the unkey namespace. Add/update godoc comments

What does this PR do?

Fixes # (issue)

If there is not an issue for this, please create one first. This is used for tracking purposes and also helps us understand why this PR exists.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Chore (refactoring code, technical debt, workflow improvements)
  • Enhancement (small improvements)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How should this be tested?

  • Test A
  • Test B

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Contributing Guide
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand areas
  • Ran pnpm build
  • Ran pnpm fmt
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from main onto my branch with git pull origin main
  • My changes don't cause any responsiveness issues

Appreciated

  • If a UI change was made: Added a screen recording or screenshots to this PR
  • Updated the Unkey Docs if changes were necessary

Summary by CodeRabbit

  • New Features
    • Introduced new Prometheus metrics to track errors across batch processing, buffer operations, cache operations, ClickHouse proxy, circuit breaker, database operations, HTTP requests, key verifications, and rate-limit refreshes.
  • Enhancements
    • Added a unified "unkey" namespace to all Prometheus metrics for improved consistency.
    • Renamed several metrics for clarity (e.g., from singular to plural forms).
    • Updated metric labels for better granularity and monitoring.
  • Removals
    • Removed obsolete metrics and related code, including key credit usage and legacy circuit breaker request tracking.

vercel bot commented Aug 4, 2025

The latest updates on your projects.

| Name | Status | Updated (UTC) |
|---|---|---|
| dashboard | ✅ Ready | Aug 5, 2025 1:53pm |
| engineering | ✅ Ready | Aug 5, 2025 1:53pm |

@vercel vercel bot temporarily deployed to Preview – engineering August 4, 2025 17:57 Inactive
changeset-bot bot commented Aug 4, 2025

⚠️ No Changeset found

Latest commit: 839d8ed

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

coderabbitai bot (Contributor) commented Aug 4, 2025

Walkthrough

This change refactors and extends Prometheus metrics across several internal packages. It introduces new error-tracking metrics, standardizes metric namespaces to "unkey," updates label sets and naming conventions, and removes or replaces legacy metrics. Some metrics are renamed for clarity, and error counters are consistently added for key, cache, circuit breaker, HTTP, rate-limit, batch, buffer, and database operations.
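
To make the namespace change concrete, here is a minimal sketch of how a namespace, subsystem, and name combine into the exported metric name. It uses only the standard library and mirrors the joining rule documented for `prometheus.BuildFQName` in client_golang. The subsystem and metric name used below are hypothetical and are not taken from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// fqName joins namespace, subsystem, and name with underscores, skipping an
// empty namespace or subsystem. An empty name yields an empty result.
func fqName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	parts = append(parts, name)
	return strings.Join(parts, "_")
}

func main() {
	// Hypothetical subsystem and metric name, for illustration only.
	fmt.Println(fqName("unkey", "database", "operations_errors_total"))
	// Without a namespace the exported name loses its prefix.
	fmt.Println(fqName("", "database", "operations_errors_total"))
}
```

Because the namespace becomes a prefix of the exported name, any dashboard or alert that queries an unprefixed name would likely need to be updated after a change like this.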

Changes

| Cohort | File(s) | Change Summary |
|---|---|---|
| Key Credit Metrics Removal | go/internal/services/keys/validation.go | Removes all Prometheus metric emission for key credit usage in withCredits; deletes related imports. |
| Circuit Breaker Metrics Refactor | go/pkg/circuitbreaker/lib.go, go/pkg/circuitbreaker/metrics.go | Switches circuit breaker metric emission to updated global metric; deletes old metrics file. |
| Database Metrics Rename & Error Counter | go/pkg/db/replica.go, go/pkg/prometheus/metrics/database.go | Renames database operation metrics to plural forms, adds namespace, and introduces new error counter. |
| Batch, Buffer Metrics Add Error Counters | go/pkg/prometheus/metrics/batch.go, go/pkg/prometheus/metrics/buffer.go | Adds new Prometheus error counter metrics for batch and buffer operations. |
| Cache Metrics Namespace & Error Counters | go/pkg/prometheus/metrics/cache.go | Adds namespace to all cache metrics and introduces error counters for reads and revalidations. |
| ClickHouse Proxy Metrics Namespace & Errors | go/pkg/prometheus/metrics/chproxy.go | Adds namespace to all chproxy metrics and introduces error counters for requests and rows. |
| Circuit Breaker Metrics Update | go/pkg/prometheus/metrics/circuitbreaker.go | Updates request metric labels and namespace, adds error counter for circuit breaker errors. |
| HTTP Metrics Namespace & Error Counter | go/pkg/prometheus/metrics/http.go | Adds namespace to all HTTP metrics and introduces an error counter metric. |
| Key Verification Metrics Update | go/pkg/prometheus/metrics/keys.go | Adds namespace, introduces new key verification error counter, removes unrelated metric. |
| Rate Limit Metrics Namespace & Error Counter | go/pkg/prometheus/metrics/ratelimit.go | Adds namespace to all rate-limit metrics and introduces error counter for origin refresh errors. |
| Panic Metrics Subsystem Update | go/pkg/prometheus/metrics/panic.go | Changes the PanicsTotal metric subsystem label from "handler" to "internal". |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Service
    participant Prometheus

    Client->>Service: Perform operation (e.g., DB, HTTP, Key, etc.)
    Service->>Prometheus: Increment main metric (with "unkey" namespace)
    alt Error occurs
        Service->>Prometheus: Increment corresponding error counter
    end
    Service-->>Client: Return result
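
The flow in the diagram can be sketched in a few lines of Go. This is only an illustration using the standard library: the `counter` type is a stand-in for a Prometheus counter, and the names `opsTotal`, `opsErrorsTotal`, `errBoom`, and `instrument` are hypothetical rather than identifiers from this PR.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// counter is a stand-in for a Prometheus counter: it only ever goes up.
type counter struct{ n atomic.Int64 }

func (c *counter) Inc()         { c.n.Add(1) }
func (c *counter) Value() int64 { return c.n.Load() }

// Hypothetical metrics; the real ones live in go/pkg/prometheus/metrics.
var (
	opsTotal       counter
	opsErrorsTotal counter
)

// errBoom is a hypothetical failure used to exercise the error path.
var errBoom = errors.New("boom")

// instrument runs op, always counts the attempt, counts an error only when
// op fails, and hands the result back to the caller unchanged.
func instrument(op func() error) error {
	opsTotal.Inc()
	err := op()
	if err != nil {
		opsErrorsTotal.Inc()
	}
	return err
}

func main() {
	_ = instrument(func() error { return nil })
	_ = instrument(func() error { return errBoom })
	fmt.Println("total:", opsTotal.Value(), "errors:", opsErrorsTotal.Value())
}
```

Keeping the error counter separate from the main counter means an error rate can be derived by dividing one by the other, without having to filter on a status label.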

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • perkinsjr
  • mcstepp

@vercel vercel bot temporarily deployed to Preview – dashboard August 4, 2025 17:57 Inactive
@imeyer imeyer marked this pull request as ready for review August 4, 2025 18:24
github-actions bot (Contributor) commented Aug 4, 2025

Thank you for following the naming conventions for pull request titles! 🙏

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 67dcae0 and a1ef985.

📒 Files selected for processing (13)
  • go/internal/services/keys/validation.go (0 hunks)
  • go/pkg/circuitbreaker/lib.go (2 hunks)
  • go/pkg/circuitbreaker/metrics.go (0 hunks)
  • go/pkg/db/replica.go (5 hunks)
  • go/pkg/prometheus/metrics/batch.go (1 hunks)
  • go/pkg/prometheus/metrics/buffer.go (1 hunks)
  • go/pkg/prometheus/metrics/cache.go (6 hunks)
  • go/pkg/prometheus/metrics/chproxy.go (2 hunks)
  • go/pkg/prometheus/metrics/circuitbreaker.go (1 hunks)
  • go/pkg/prometheus/metrics/database.go (2 hunks)
  • go/pkg/prometheus/metrics/http.go (4 hunks)
  • go/pkg/prometheus/metrics/keys.go (2 hunks)
  • go/pkg/prometheus/metrics/ratelimit.go (9 hunks)
💤 Files with no reviewable changes (2)
  • go/pkg/circuitbreaker/metrics.go
  • go/internal/services/keys/validation.go
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit Inference Engine (CLAUDE.md)

**/*.go: Follow comprehensive documentation guidelines for Go code as described in go/GO_DOCUMENTATION_GUIDELINES.md
Every public function/type in Go code must be documented
Prefer interfaces for testability in Go code
Use AIDEV-* comments for complex/important code in Go services

Files:

  • go/pkg/db/replica.go
  • go/pkg/circuitbreaker/lib.go
  • go/pkg/prometheus/metrics/buffer.go
  • go/pkg/prometheus/metrics/batch.go
  • go/pkg/prometheus/metrics/keys.go
  • go/pkg/prometheus/metrics/cache.go
  • go/pkg/prometheus/metrics/chproxy.go
  • go/pkg/prometheus/metrics/http.go
  • go/pkg/prometheus/metrics/circuitbreaker.go
  • go/pkg/prometheus/metrics/ratelimit.go
  • go/pkg/prometheus/metrics/database.go
**/*.{env,js,ts,go}

📄 CodeRabbit Inference Engine (CLAUDE.md)

All environment variables must follow the format: UNKEY_<SERVICE_NAME>_VARNAME

Files:

  • go/pkg/db/replica.go
  • go/pkg/circuitbreaker/lib.go
  • go/pkg/prometheus/metrics/buffer.go
  • go/pkg/prometheus/metrics/batch.go
  • go/pkg/prometheus/metrics/keys.go
  • go/pkg/prometheus/metrics/cache.go
  • go/pkg/prometheus/metrics/chproxy.go
  • go/pkg/prometheus/metrics/http.go
  • go/pkg/prometheus/metrics/circuitbreaker.go
  • go/pkg/prometheus/metrics/ratelimit.go
  • go/pkg/prometheus/metrics/database.go
🧠 Learnings (3)
📓 Common learnings
Learnt from: Flo4604
PR: unkeyed/unkey#3606
File: go/pkg/prometheus/metrics/database.go:29-30
Timestamp: 2025-07-16T10:06:35.397Z
Learning: In Go packages, variables defined in one file within a package (like `latencyBuckets` and `constLabels` in go/pkg/prometheus/metrics/http.go) are accessible from other files in the same package without requiring imports. This is a common pattern for sharing configuration across multiple files within a package.
Learnt from: chronark
PR: unkeyed/unkey#2901
File: go/pkg/otel/metrics/metrics.go:11-22
Timestamp: 2025-02-26T15:07:05.646Z
Learning: In the metrics package init function, panicking on initialization errors is acceptable since it occurs during startup and indicates a fundamental issue that should be addressed immediately rather than allowing the application to continue with incorrect metrics setup.
📚 Learning: for debugging database replica usage in go/pkg/db/replica.go, it's acceptable to mark queryrowcontex...
Learnt from: Flo4604
PR: unkeyed/unkey#3606
File: go/pkg/db/replica.go:8-11
Timestamp: 2025-07-16T15:38:53.491Z
Learning: For debugging database replica usage in go/pkg/db/replica.go, it's acceptable to mark QueryRowContext operations as "success" even though SQL errors only surface during row.Scan() calls. The timing metrics are the primary concern for debugging replica performance patterns.

Applied to files:

  • go/pkg/db/replica.go
  • go/pkg/prometheus/metrics/database.go
📚 Learning: in go packages, variables defined in one file within a package (like `latencybuckets` and `constlabe...
Learnt from: Flo4604
PR: unkeyed/unkey#3606
File: go/pkg/prometheus/metrics/database.go:29-30
Timestamp: 2025-07-16T10:06:35.397Z
Learning: In Go packages, variables defined in one file within a package (like `latencyBuckets` and `constLabels` in go/pkg/prometheus/metrics/http.go) are accessible from other files in the same package without requiring imports. This is a common pattern for sharing configuration across multiple files within a package.

Applied to files:

  • go/pkg/db/replica.go
  • go/pkg/circuitbreaker/lib.go
  • go/pkg/prometheus/metrics/keys.go
  • go/pkg/prometheus/metrics/cache.go
  • go/pkg/prometheus/metrics/chproxy.go
  • go/pkg/prometheus/metrics/http.go
  • go/pkg/prometheus/metrics/ratelimit.go
  • go/pkg/prometheus/metrics/database.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test Agent Local / test_agent_local
  • GitHub Check: Build / Build
  • GitHub Check: Test API / API Test Local
  • GitHub Check: Test Go API Local / Test
  • GitHub Check: Test Packages / Test
🔇 Additional comments (21)
go/pkg/prometheus/metrics/keys.go (3)

16-16: Approve the refined comment for better clarity.

The updated comment focusing on "API traffic patterns" removes the previous reference to error rates, which is now properly handled by the separate KeyVerificationErrorsTotal metric. This improves semantic separation between total verifications and error tracking.


22-22: LGTM: Namespace standardization.

Adding the "unkey" namespace aligns with the PR objective to ensure all metrics are placed within the consistent namespace.


31-47: Excellent documentation for the new error metric.

The documentation clearly distinguishes between program functionality errors (tracked by this metric) and business logic validation errors like "FORBIDDEN" or "RATE_LIMITED". The example usage and relationship to the total verifications metric is well explained.
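
The distinction drawn here, between outcomes such as "FORBIDDEN" and genuine program errors, can be sketched as follows. This is a standard-library illustration under stated assumptions: the maps stand in for labeled Prometheus counters, and `recordVerification`, `errLookup`, and the `"internal"` error type are hypothetical names. Whether the real code also counts a failed verification in the total is not stated in the review; this sketch counts it only as an error.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for labeled Prometheus counters. Plain maps are not safe for
// concurrent use; that is acceptable for a single-goroutine sketch.
var (
	keyVerificationsTotal       = map[string]int{} // keyed by outcome code
	keyVerificationsErrorsTotal = map[string]int{} // keyed by error type
)

// errLookup is a hypothetical internal failure, e.g. the key store being down.
var errLookup = errors.New("key lookup failed")

// recordVerification counts a completed verification by its outcome code.
// Outcomes such as "FORBIDDEN" are valid results, not errors. Only a non-nil
// err, meaning the service could not do its job, reaches the error counter.
func recordVerification(code string, err error) {
	if err != nil {
		keyVerificationsErrorsTotal["internal"]++
		return
	}
	keyVerificationsTotal[code]++
}

func main() {
	recordVerification("VALID", nil)
	recordVerification("FORBIDDEN", nil)
	recordVerification("RATE_LIMITED", nil)
	recordVerification("", errLookup)

	fmt.Println("outcomes:", keyVerificationsTotal)
	fmt.Println("errors:", keyVerificationsErrorsTotal)
}
```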

go/pkg/prometheus/metrics/database.go (3)

15-24: Approve metric rename and namespace addition.

The pluralization from DatabaseOperationLatency to DatabaseOperationsLatency improves naming consistency, and adding the "unkey" namespace aligns with the PR objectives. The documentation remains comprehensive with clear usage examples.


43-52: Approve metric rename and namespace addition.

The pluralization from DatabaseOperationTotal to DatabaseOperationsTotal maintains consistency with the latency metric, and the namespace addition follows the standardization pattern.


54-67: Well-documented error tracking metric.

The new DatabaseOperationsErrorsTotal metric follows the established pattern for error tracking with appropriate labels and clear documentation. The example usage demonstrates proper implementation.

go/pkg/db/replica.go (2)

46-47: LGTM: Consistent metric variable updates.

The metric variable names have been correctly updated to match the pluralized forms defined in go/pkg/prometheus/metrics/database.go. This maintains consistency across the codebase.


72-73: LGTM: All database operations consistently updated.

All database operation methods (PrepareContext, QueryContext, QueryRowContext, Begin) have been consistently updated to use the pluralized metric variable names, maintaining alignment with the metric definitions.

Also applies to: 98-99, 122-123, 146-147

go/pkg/circuitbreaker/lib.go (2)

12-12: LGTM: Import addition for centralized metrics.

Adding the prometheus metrics package import enables usage of the centralized circuit breaker metrics, aligning with the metric consolidation effort.


202-202: Labels Match Updated Metric Definition

The CircuitBreakerRequests metric is defined with labels ["service", "action"], and the call on line 202 of go/pkg/circuitbreaker/lib.go:

metrics.CircuitBreakerRequests.WithLabelValues(cb.config.name, string(cb.state)).Inc()

correctly maps service → cb.config.name and action → cb.state. No changes required.
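
Why the order of the arguments matters can be shown with a small stand-in for a counter vector. The sketch below uses only the standard library; `counterVec` and its methods are hypothetical, and the service name `"clickhouse"` and state `"open"` are made-up values. It imitates one documented behavior of `WithLabelValues` in client_golang: values are matched to label names by position, and a wrong number of values causes a panic.

```go
package main

import (
	"fmt"
	"strings"
)

// counterVec is a stand-in for a Prometheus CounterVec: one counter per
// unique combination of label values, matched to label names by position.
type counterVec struct {
	labels []string
	counts map[string]int
}

func newCounterVec(labels ...string) *counterVec {
	return &counterVec{labels: labels, counts: map[string]int{}}
}

// key renders values as name="value" pairs and panics when the number of
// values does not match the number of label names.
func (v *counterVec) key(values []string) string {
	if len(values) != len(v.labels) {
		panic(fmt.Sprintf("expected %d label values, got %d", len(v.labels), len(values)))
	}
	pairs := make([]string, len(values))
	for i, val := range values {
		pairs[i] = fmt.Sprintf("%s=%q", v.labels[i], val)
	}
	return strings.Join(pairs, ",")
}

// Inc increments the series identified by the given label values.
func (v *counterVec) Inc(values ...string) { v.counts[v.key(values)]++ }

// Get returns the current count for one combination of label values.
func (v *counterVec) Get(values ...string) int { return v.counts[v.key(values)] }

func main() {
	requests := newCounterVec("service", "action")
	// Hypothetical values; the order must match the label names above.
	requests.Inc("clickhouse", "open")
	requests.Inc("clickhouse", "open")
	fmt.Println(requests.Get("clickhouse", "open"))
	// Swapping the arguments addresses a different, untouched series.
	fmt.Println(requests.Get("open", "clickhouse"))
}
```

A swapped pair of arguments compiles and runs without complaint, which is why a review check on the argument order, like the one above, is worthwhile.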

go/pkg/prometheus/metrics/buffer.go (1)

51-65: Well-designed error tracking metric.

The new BufferErrorsTotal metric follows the established pattern for error tracking across the codebase. The documentation is comprehensive with clear example usage, and the label design (name, error_type) provides appropriate granularity for monitoring buffer-specific errors.

go/pkg/prometheus/metrics/batch.go (1)

88-103: LGTM! Well-structured error tracking metric.

The new BatchItemsProcessedErrorsTotal metric follows established patterns and conventions. The documentation is comprehensive with clear example usage, and the metric structure is consistent with the related BatchItemsProcessedTotal metric.

go/pkg/prometheus/metrics/circuitbreaker.go (1)

22-33: LGTM! Comprehensive error tracking metric.

The new CircuitBreakerErrorsTotal metric is well-structured with appropriate labels for service and error type, following established patterns for error tracking metrics.

go/pkg/prometheus/metrics/cache.go (2)

22-22: LGTM! Consistent namespace standardization.

All existing cache metrics have been properly updated with the "unkey" namespace, maintaining consistency across the metrics system.

Also applies to: 39-39, 57-57, 73-73, 89-89, 105-105


114-160: LGTM! Comprehensive error tracking for cache operations.

The three new error metrics (CacheReadsErrorsTotal, CacheWritesErrorsTotal, CacheRevalidationsErrorsTotal) provide excellent coverage for cache error monitoring. The documentation is thorough with clear examples, and the metric structure is consistent with the related operational metrics.

go/pkg/prometheus/metrics/http.go (2)

60-60: LGTM! Consistent namespace standardization.

All HTTP metrics have been properly updated with the "unkey" namespace while maintaining their existing functionality and label structures.

Also applies to: 77-77, 110-110


86-100: LGTM! Well-designed error tracking metric.

The new HTTPRequestErrorTotal metric uses the same label structure as HTTPRequestTotal, ensuring consistency for error rate calculations. The documentation includes clear usage examples.
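
The benefit of a shared label structure is that the two counters can be divided series by series. The sketch below illustrates this with the standard library only. The `labels` struct and its fields (`Method`, `Path`, `Status`) are assumptions chosen for the example; the actual label names are not listed in this review.

```go
package main

import "fmt"

// labels is a hypothetical label set shared by the request and error counters.
type labels struct {
	Method string
	Path   string
	Status int
}

// errorRatio divides errors by requests for each label set present in
// requests. Label sets with no recorded errors yield 0, and label sets with
// a zero request count are skipped to avoid dividing by zero.
func errorRatio(requests, errs map[labels]float64) map[labels]float64 {
	out := make(map[labels]float64, len(requests))
	for k, total := range requests {
		if total == 0 {
			continue
		}
		out[k] = errs[k] / total
	}
	return out
}

func main() {
	k := labels{Method: "POST", Path: "/v2/keys.verifyKey", Status: 500}
	requests := map[labels]float64{k: 200}
	errs := map[labels]float64{k: 50}

	for key, ratio := range errorRatio(requests, errs) {
		fmt.Printf("%s %s %d: %.2f\n", key.Method, key.Path, key.Status, ratio)
	}
}
```

If the two counters used different labels, the lookup `errs[k]` would miss and the ratio would silently come out as zero, which is the failure mode identical label sets avoid.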

go/pkg/prometheus/metrics/chproxy.go (2)

21-21: LGTM! Consistent namespace standardization.

The existing ClickHouse proxy metrics have been properly updated with the "unkey" namespace while maintaining their functionality.

Also applies to: 53-53


30-44: LGTM! Comprehensive error tracking for ClickHouse proxy.

The two new error metrics (ChproxyErrorsTotal and ChproxyRowsErrorsTotal) provide excellent coverage for both general proxy errors and row processing errors. The consistent use of the "endpoint" label across all ClickHouse proxy metrics enables effective error rate analysis.

Also applies to: 62-76

go/pkg/prometheus/metrics/ratelimit.go (2)

22-22: LGTM! Namespace standardization implemented correctly.

The addition of Namespace: "unkey" to all existing rate-limit metrics correctly implements the standardization objective. This ensures consistent metric naming across the monitoring infrastructure.

Also applies to: 37-37, 52-52, 67-67, 82-82, 97-97, 114-114, 130-130, 148-148


157-170: LGTM! Well-implemented error counter with comprehensive documentation.

The new RatelimitRefreshFromOriginErrorsTotal counter follows Prometheus best practices:

  • Proper naming convention with "_total" suffix
  • Comprehensive godoc documentation with example usage
  • Consistent configuration with existing metrics
  • Logical pairing with RatelimitRefreshFromOrigin for complete observability

This enhances monitoring coverage as intended by the PR objectives.

chronark (Collaborator) left a comment

I think most of the error metrics do not apply, or need to be changed
if you actually use them, you'll find out :)

@imeyer imeyer marked this pull request as draft August 4, 2025 20:24
imeyer (Contributor, Author) commented Aug 4, 2025

My plan was to merge this and then actually instrument them (and thus find out what was useful and what wasn't). Will fix 😄

@imeyer imeyer force-pushed the push-ooyormzzsosr branch from a1ef985 to 4a15c3b Compare August 5, 2025 13:44
@vercel vercel bot temporarily deployed to Preview – dashboard August 5, 2025 13:45 Inactive
@vercel vercel bot temporarily deployed to Preview – engineering August 5, 2025 13:45 Inactive
@imeyer imeyer force-pushed the push-ooyormzzsosr branch from 4a15c3b to 8f6d9ac Compare August 5, 2025 13:50
@vercel vercel bot temporarily deployed to Preview – engineering August 5, 2025 13:51 Inactive
@vercel vercel bot temporarily deployed to Preview – dashboard August 5, 2025 13:51 Inactive
Add missing error counter metrics. Ensure everything is in the unkey namespace. Add/update godoc comments
@imeyer imeyer force-pushed the push-ooyormzzsosr branch from 8f6d9ac to 839d8ed Compare August 5, 2025 13:51
@imeyer imeyer marked this pull request as ready for review August 5, 2025 13:52
@vercel vercel bot temporarily deployed to Preview – dashboard August 5, 2025 13:53 Inactive
@vercel vercel bot temporarily deployed to Preview – engineering August 5, 2025 13:53 Inactive
imeyer (Contributor, Author) commented Aug 5, 2025

@chronark I think I addressed everything.

imeyer (Contributor, Author) commented Aug 5, 2025

#3729 (comment)

chronark (Collaborator) commented Aug 5, 2025

hmm okay
I don’t see any benefit in that tbh, but sure

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a1ef985 and 839d8ed.

📒 Files selected for processing (14)
  • go/internal/services/keys/validation.go (0 hunks)
  • go/pkg/circuitbreaker/lib.go (2 hunks)
  • go/pkg/circuitbreaker/metrics.go (0 hunks)
  • go/pkg/db/replica.go (5 hunks)
  • go/pkg/prometheus/metrics/batch.go (1 hunks)
  • go/pkg/prometheus/metrics/buffer.go (1 hunks)
  • go/pkg/prometheus/metrics/cache.go (6 hunks)
  • go/pkg/prometheus/metrics/chproxy.go (2 hunks)
  • go/pkg/prometheus/metrics/circuitbreaker.go (1 hunks)
  • go/pkg/prometheus/metrics/database.go (2 hunks)
  • go/pkg/prometheus/metrics/http.go (4 hunks)
  • go/pkg/prometheus/metrics/keys.go (2 hunks)
  • go/pkg/prometheus/metrics/panic.go (1 hunks)
  • go/pkg/prometheus/metrics/ratelimit.go (9 hunks)
💤 Files with no reviewable changes (2)
  • go/pkg/circuitbreaker/metrics.go
  • go/internal/services/keys/validation.go
🧰 Additional context used
📓 Path-based instructions (2)
**/*.go

📄 CodeRabbit Inference Engine (CLAUDE.md)

**/*.go: Follow comprehensive documentation guidelines for Go code as described in go/GO_DOCUMENTATION_GUIDELINES.md
Every public function/type in Go code must be documented
Prefer interfaces for testability in Go code
Use AIDEV-* comments for complex/important code in Go services

Files:

  • go/pkg/prometheus/metrics/panic.go
  • go/pkg/circuitbreaker/lib.go
  • go/pkg/prometheus/metrics/keys.go
  • go/pkg/prometheus/metrics/circuitbreaker.go
  • go/pkg/prometheus/metrics/chproxy.go
  • go/pkg/db/replica.go
  • go/pkg/prometheus/metrics/batch.go
  • go/pkg/prometheus/metrics/buffer.go
  • go/pkg/prometheus/metrics/database.go
  • go/pkg/prometheus/metrics/cache.go
  • go/pkg/prometheus/metrics/http.go
  • go/pkg/prometheus/metrics/ratelimit.go
**/*.{env,js,ts,go}

📄 CodeRabbit Inference Engine (CLAUDE.md)

All environment variables must follow the format: UNKEY_<SERVICE_NAME>_VARNAME

Files:

  • go/pkg/prometheus/metrics/panic.go
  • go/pkg/circuitbreaker/lib.go
  • go/pkg/prometheus/metrics/keys.go
  • go/pkg/prometheus/metrics/circuitbreaker.go
  • go/pkg/prometheus/metrics/chproxy.go
  • go/pkg/db/replica.go
  • go/pkg/prometheus/metrics/batch.go
  • go/pkg/prometheus/metrics/buffer.go
  • go/pkg/prometheus/metrics/database.go
  • go/pkg/prometheus/metrics/cache.go
  • go/pkg/prometheus/metrics/http.go
  • go/pkg/prometheus/metrics/ratelimit.go
🧠 Learnings (7)
📓 Common learnings
Learnt from: Flo4604
PR: unkeyed/unkey#3606
File: go/pkg/prometheus/metrics/database.go:29-30
Timestamp: 2025-07-16T10:06:35.397Z
Learning: In Go packages, variables defined in one file within a package (like `latencyBuckets` and `constLabels` in go/pkg/prometheus/metrics/http.go) are accessible from other files in the same package without requiring imports. This is a common pattern for sharing configuration across multiple files within a package.
Learnt from: chronark
PR: unkeyed/unkey#2901
File: go/pkg/otel/metrics/metrics.go:11-22
Timestamp: 2025-02-26T15:07:05.646Z
Learning: In the metrics package init function, panicking on initialization errors is acceptable since it occurs during startup and indicates a fundamental issue that should be addressed immediately rather than allowing the application to continue with incorrect metrics setup.
📚 Learning: in the metrics package init function, panicking on initialization errors is acceptable since it occu...
Learnt from: chronark
PR: unkeyed/unkey#2901
File: go/pkg/otel/metrics/metrics.go:11-22
Timestamp: 2025-02-26T15:07:05.646Z
Learning: In the metrics package init function, panicking on initialization errors is acceptable since it occurs during startup and indicates a fundamental issue that should be addressed immediately rather than allowing the application to continue with incorrect metrics setup.

Applied to files:

  • go/pkg/prometheus/metrics/panic.go
📚 Learning: in go packages, variables defined in one file within a package (like `latencybuckets` and `constlabe...
Learnt from: Flo4604
PR: unkeyed/unkey#3606
File: go/pkg/prometheus/metrics/database.go:29-30
Timestamp: 2025-07-16T10:06:35.397Z
Learning: In Go packages, variables defined in one file within a package (like `latencyBuckets` and `constLabels` in go/pkg/prometheus/metrics/http.go) are accessible from other files in the same package without requiring imports. This is a common pattern for sharing configuration across multiple files within a package.

Applied to files:

  • go/pkg/prometheus/metrics/panic.go
  • go/pkg/circuitbreaker/lib.go
  • go/pkg/prometheus/metrics/keys.go
  • go/pkg/prometheus/metrics/chproxy.go
  • go/pkg/db/replica.go
  • go/pkg/prometheus/metrics/database.go
  • go/pkg/prometheus/metrics/cache.go
  • go/pkg/prometheus/metrics/http.go
  • go/pkg/prometheus/metrics/ratelimit.go
📚 Learning: applies to go/deploy/**/*.{go,js,ts,tsx,py,sh,md,txt,json,yaml,yml,ini,env,conf,html,css,scss,xml,c,...
Learnt from: CR
PR: unkeyed/unkey#0
File: go/deploy/CLAUDE.md:0-0
Timestamp: 2025-07-21T18:05:58.236Z
Learning: Applies to go/deploy/**/*.{go,js,ts,tsx,py,sh,md,txt,json,yaml,yml,ini,env,conf,html,css,scss,xml,c,h,cpp,java,rb,rs,php,pl,sql} : Use `AIDEV-NOTE:`, `AIDEV-TODO:`, `AIDEV-BUSINESS_RULE:`, or `AIDEV-QUESTION:` (all-caps prefix) as anchor comments aimed at AI and developers.

Applied to files:

  • go/pkg/prometheus/metrics/circuitbreaker.go
📚 Learning: applies to go/deploy/**/*.{go,js,ts,tsx,py,sh,md,txt,json,yaml,yml,ini,env,conf,html,css,scss,xml,c,...
Learnt from: CR
PR: unkeyed/unkey#0
File: go/deploy/CLAUDE.md:0-0
Timestamp: 2025-07-21T18:05:58.236Z
Learning: Applies to go/deploy/**/*.{go,js,ts,tsx,py,sh,md,txt,json,yaml,yml,ini,env,conf,html,css,scss,xml,c,h,cpp,java,rb,rs,php,pl,sql} : Do not remove `AIDEV-*`s without explicit human instruction.

Applied to files:

  • go/pkg/prometheus/metrics/circuitbreaker.go
📚 Learning: for debugging database replica usage in go/pkg/db/replica.go, it's acceptable to mark queryrowcontex...
Learnt from: Flo4604
PR: unkeyed/unkey#3606
File: go/pkg/db/replica.go:8-11
Timestamp: 2025-07-16T15:38:53.491Z
Learning: For debugging database replica usage in go/pkg/db/replica.go, it's acceptable to mark QueryRowContext operations as "success" even though SQL errors only surface during row.Scan() calls. The timing metrics are the primary concern for debugging replica performance patterns.

Applied to files:

  • go/pkg/db/replica.go
  • go/pkg/prometheus/metrics/database.go
📚 Learning: the `cloudflareratelimiter` type definition in `apps/api/src/pkg/env.ts` should not have its interfa...
Learnt from: chronark
PR: unkeyed/unkey#2544
File: apps/api/src/pkg/env.ts:4-6
Timestamp: 2024-10-23T12:05:31.121Z
Learning: The `cloudflareRatelimiter` type definition in `apps/api/src/pkg/env.ts` should not have its interface changed; it should keep the `limit` method returning `Promise<{ success: boolean }>` without additional error properties.

Applied to files:

  • go/pkg/prometheus/metrics/ratelimit.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Test Go API Local / Test
  • GitHub Check: Build / Build
  • GitHub Check: Test Agent Local / test_agent_local
  • GitHub Check: Test API / API Test Local
  • GitHub Check: Test Packages / Test
  • GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (23)
go/pkg/prometheus/metrics/batch.go (1)

88-103: LGTM! Well-structured error tracking metric.

The new BatchItemsProcessedErrorsTotal metric follows the established patterns with proper namespace, clear documentation, and appropriate labeling. The example usage and help text provide good guidance for implementation.

go/pkg/prometheus/metrics/keys.go (3)

22-22: LGTM! Namespace standardization applied.

Adding the "unkey" namespace aligns with the broader metrics standardization effort across the codebase.
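
To illustrate what the namespace change does to the exported series name: the Prometheus Go client joins the non-empty `Namespace`, `Subsystem`, and `Name` options with underscores. The sketch below reproduces that joining rule without the library. `fqName` is a stand-in written for this example, and the `cache` / `reads_total` values are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// fqName joins the non-empty parts with underscores, which is how the
// Prometheus Go client composes Namespace, Subsystem and Name.
// It is a dependency-free stand-in, not the library function itself.
func fqName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

func main() {
	// "cache" and "reads_total" are hypothetical subsystem/name values.
	fmt.Println(fqName("", "cache", "reads_total"))      // before the namespace is set
	fmt.Println(fqName("unkey", "cache", "reads_total")) // after the namespace is set
}
```

Because the prefix becomes part of the series name, any dashboard or alert that queries the un-prefixed name would need to be updated alongside a change like this.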


16-16: LGTM! Comment updated appropriately.

The comment update to focus solely on API traffic patterns is correct since error rates are now tracked by the separate error metric below.


31-47: LGTM! Clear separation of error types.

The new KeyVerificationErrorsTotal metric properly distinguishes between program functionality errors and key validation outcomes (like "FORBIDDEN"). The documentation clearly explains this distinction and provides good example usage.
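
The distinction described above can be sketched as follows: a rejected key is still a successful verification from the program's point of view, while a failure to reach a decision is a program error. This is a minimal sketch using plain maps instead of Prometheus counters. The `counters` type, its fields, and `errBackend` are hypothetical and not taken from `keys.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// errBackend is a hypothetical infrastructure failure.
var errBackend = errors.New("database unavailable")

// counters stands in for two Prometheus counters.
type counters struct {
	verifications map[string]int // keyed by outcome code, e.g. "VALID", "FORBIDDEN"
	programErrors int            // failures of the service itself
}

// record counts one verification attempt. A non-nil err means the service
// could not reach a decision, so only the error counter moves. Otherwise
// the outcome code is counted, even when the key was rejected.
func (c *counters) record(code string, err error) {
	if err != nil {
		c.programErrors++
		return
	}
	c.verifications[code]++
}

func main() {
	c := &counters{verifications: map[string]int{}}
	c.record("VALID", nil)
	c.record("FORBIDDEN", nil) // a validation outcome, not an error
	c.record("", errBackend)   // a program error
	fmt.Println(c.verifications, c.programErrors)
}
```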

go/pkg/prometheus/metrics/database.go (3)

15-15: LGTM! Metric name pluralization improves consistency.

Renaming from singular to plural form (DatabaseOperationLatency → DatabaseOperationsLatency) makes the naming more consistent with other metrics in the codebase.

Also applies to: 24-24


26-26: LGTM! Namespace standardization applied.

Adding the "unkey" namespace aligns with the broader metrics standardization effort across the codebase.

Also applies to: 45-45


54-66: LGTM! Well-designed error tracking metric.

The new DatabaseOperationsErrorsTotal metric follows established patterns and appropriately omits the status label since it's specifically for errors. The documentation and example usage are clear and helpful.

go/pkg/circuitbreaker/lib.go (2)

12-12: LGTM! Import added for centralized metrics.

Adding the import for the centralized prometheus metrics package aligns with the refactoring effort to consolidate metric definitions.


202-202: Metric labels verified and usage is correct

I’ve confirmed that in go/pkg/prometheus/metrics/circuitbreaker.go, CircuitBreakerRequests is defined with labels []string{"service", "action"}, and in go/pkg/circuitbreaker/lib.go we call

metrics.CircuitBreakerRequests.WithLabelValues(cb.config.name, string(cb.state)).Inc()

which maps service → cb.config.name and action → cb.state. No changes required.
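
The check above depends on `WithLabelValues` being positional: values are matched to the declared label names in order. The sketch below shows that pairing without the Prometheus library. `bindLabels` is a stand-in written for this example, and the `clickhouse` / `closed` values are hypothetical.

```go
package main

import (
	"fmt"
)

// bindLabels pairs label names with positional values, the way
// WithLabelValues matches its arguments to the labels declared on a vector.
// It returns an error when the counts differ.
func bindLabels(names, values []string) (map[string]string, error) {
	if len(names) != len(values) {
		return nil, fmt.Errorf("expected %d label values, got %d", len(names), len(values))
	}
	out := make(map[string]string, len(names))
	for i, n := range names {
		out[n] = values[i]
	}
	return out, nil
}

func main() {
	names := []string{"service", "action"}
	// Hypothetical values standing in for cb.config.name and string(cb.state).
	labels, err := bindLabels(names, []string{"clickhouse", "closed"})
	fmt.Println(labels, err)
}
```

In the real client, `WithLabelValues` panics on a count mismatch and `GetMetricWithLabelValues` returns an error instead. Swapping two values of the same type is not detected by either, which is why a manual check of the argument order is worthwhile.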

go/pkg/db/replica.go (1)

46-47: LGTM! Consistent metric variable renaming.

The renaming from singular to plural form (DatabaseOperationLatency → DatabaseOperationsLatency, DatabaseOperationTotal → DatabaseOperationsTotal) is applied consistently across all database operation methods. This aligns with the pluralized metric names in the metrics package.

Also applies to: 72-73, 98-99, 122-123, 146-147

go/pkg/prometheus/metrics/buffer.go (1)

51-65: LGTM! Well-structured error metric addition.

The new BufferErrorsTotal metric follows the established pattern with proper:

  • Namespace consistency ("unkey")
  • Appropriate metric type (Counter for error tracking)
  • Consistent labeling with existing buffer metrics
  • Clear documentation and example usage
go/pkg/prometheus/metrics/circuitbreaker.go (2)

15-15: LGTM! Namespace standardization applied.

The addition of the "unkey" namespace aligns with the broader standardization effort across the metrics package.


20-20: LGTM! Consistent labeling and well-structured error metric.

The label consistency between CircuitBreakerRequests and the new CircuitBreakerErrorsTotal metric using ["service", "action"] is excellent. The new error metric follows the established pattern with proper documentation and namespace consistency.

Also applies to: 27-33

go/pkg/prometheus/metrics/cache.go (2)

22-22: LGTM! Consistent namespace standardization.

The addition of the "unkey" namespace to all existing cache metrics maintains consistency across the metrics package.

Also applies to: 39-39, 57-57, 73-73, 89-89, 105-105


114-128: Approve error metrics definitions

  • The new CacheReadsErrorsTotal and CacheRevalidationsErrorsTotal in go/pkg/prometheus/metrics/cache.go adhere to the existing naming, namespace (“unkey”), subsystem (“cache”), and labeling conventions.
  • Inspection of go/pkg/cache/cache.go shows reads report misses via the ok return value and revalidations log errors (no panics), so these counters will capture meaningful failure events.

Ready to merge.

go/pkg/prometheus/metrics/http.go (2)

60-60: LGTM! Consistent namespace standardization.

The addition of the "unkey" namespace to all HTTP metrics maintains consistency with the broader metrics package standardization effort.

Also applies to: 77-77, 110-110


86-100: LGTM! Well-structured HTTP error metric.

The new HTTPRequestErrorTotal metric excellently complements the existing HTTP metrics with:

  • Consistent labeling using ["method", "path", "status"]
  • Proper namespace and subsystem alignment
  • Clear documentation and usage example
  • Appropriate metric type for error tracking
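
As an illustration of how a counter with `method`, `path`, and `status` labels might be fed, here is a minimal standard-library sketch of an HTTP middleware. Everything in it is an assumption for the example: the `errorCounts` map stands in for the counter vector, the `>= 500` threshold is a guess at what counts as an error, and `/v1/keys` is a hypothetical path.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync"
)

// errorCounts stands in for a counter vector keyed by method, path and status.
type errorCounts struct {
	mu sync.Mutex
	m  map[[3]string]int
}

func (e *errorCounts) inc(method, path, status string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.m[[3]string{method, path, status}]++
}

// statusRecorder captures the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// countErrors wraps next and counts responses with status >= 500.
// The threshold is an assumption made for this sketch.
func countErrors(e *errorCounts, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, req)
		if rec.status >= 500 {
			e.inc(req.Method, req.URL.Path, strconv.Itoa(rec.status))
		}
	})
}

// serveOnce sends one in-memory request through the middleware to a
// handler that replies with the given status.
func serveOnce(e *errorCounts, method, path string, status int) {
	h := countErrors(e, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(status)
	}))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(method, path, nil))
}

func main() {
	e := &errorCounts{m: map[[3]string]int{}}
	serveOnce(e, "GET", "/v1/keys", 200)
	serveOnce(e, "GET", "/v1/keys", 500)
	fmt.Println(e.m)
}
```

One design point worth keeping in mind with a `path` label: using the raw URL path can create a very large number of series, so a route pattern is usually a safer label value than the literal request path.
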
go/pkg/prometheus/metrics/chproxy.go (4)

21-21: LGTM! Namespace standardization implemented correctly.

The addition of the "unkey" namespace aligns with the PR objectives for consistent metric namespacing across the codebase.


30-44: Excellent implementation of error tracking metric.

The new ChproxyErrorsTotal metric follows established patterns with proper documentation, consistent labeling, and appropriate naming conventions. This enhances observability for ClickHouse proxy error monitoring.


53-53: LGTM! Consistent namespace addition.

This change maintains consistency with the namespace standardization applied to other metrics in this file.


62-76: Well-designed row processing error metric.

The ChproxyRowsErrorsTotal metric properly complements the existing ChproxyRowsTotal metric by providing specific error tracking for row processing operations. The implementation follows all established conventions and documentation standards.

go/pkg/prometheus/metrics/ratelimit.go (2)

22-22: LGTM! Comprehensive namespace standardization across all rate-limit metrics.

All existing rate-limit metrics have been consistently updated to include the "unkey" namespace, covering gauges, counters, counter vectors, and histograms. This aligns perfectly with the PR objectives for metric namespace standardization.

Also applies to: 37-37, 52-52, 67-67, 82-82, 97-97, 114-114, 130-130, 148-148


157-170: Excellent addition of refresh error tracking.

The RatelimitRefreshFromOriginErrorsTotal metric properly complements the existing RatelimitRefreshFromOrigin counter by providing specific error tracking for origin refresh operations. The implementation follows all documentation standards and naming conventions.

Collaborator

chronark commented Aug 5, 2025

then I’ll approve


graphite-app bot commented Aug 5, 2025

TV gif. Timmy from Shaun the Sheep blinks and extends 2 thumbs up as a lopsided grin emerges on the side of his face. (Added via Giphy)

Contributor Author

imeyer commented Aug 5, 2025

I guess it's kinda like tdd?? here I create the metrics.. then I go attempt to make use of them later to "exercise" what I would expect. e.g. the CacheReadsError metric is likely useless, but I step through each function and check the logic to prove that is the case.

Collaborator

chronark commented Aug 5, 2025

yeah sure, I just don't know why you want to lengthen your feedback cycle by doing a PR each time :D


graphite-app bot commented Aug 5, 2025

Graphite Automations

"Post a GIF when PR approved" took an action on this PR • (08/05/25)

1 gif was posted to this PR based on Andreas Thomas's automation.

Contributor Author

imeyer commented Aug 5, 2025

I assumed you'd assume I knew wtf I was doing (to an extent) 🤣

Collaborator

chronark commented Aug 5, 2025

haha yeah I do, that’s why I approved it and just leave you to cook :)

@imeyer imeyer enabled auto-merge August 5, 2025 14:13
@imeyer imeyer added this pull request to the merge queue Aug 5, 2025
Contributor Author

imeyer commented Aug 5, 2025

No good jeopardy gifs exist for this wait...

Collaborator

chronark commented Aug 5, 2025

CI is fucked right now
I think everyone agrees, we just need to agree on how we make it better

Contributor Author

imeyer commented Aug 5, 2025

We can start with "make it better" 😸

Merged via the queue into main with commit 416f662 Aug 5, 2025
24 of 25 checks passed
@imeyer imeyer deleted the push-ooyormzzsosr branch August 5, 2025 14:49
@coderabbitai coderabbitai bot mentioned this pull request Aug 19, 2025
18 tasks