
feat: track credits and latency in clickhouse #3854

Closed
chronark wants to merge 2 commits into main from 08-26-feat_track_credits_and_latency_in_clickhouse

Conversation

@chronark
Collaborator

@chronark chronark commented Aug 26, 2025

What does this PR do?

The main goal is to add credit tracking and latency tracking, because we need both in the dashboard.

This PR adds new raw tables for verifications and ratelimits that include the required columns, and optimises the table settings.

This PR only adds tables; no code reads from or writes to them yet. However, a materialized view populates the new tables in real time.
I have already migrated our preview cluster and want to let it bake for a few days before migrating prod.

It also replaces the Goose migration system for ClickHouse with a Docker-based initialization approach. Instead of running migrations with Goose, the ClickHouse schema is now initialized directly during container startup using SQL files mounted into the container.

Key changes:

  • Created a custom ClickHouse Dockerfile that includes schema files
  • Added an initialization script that runs on container startup
  • Reorganized schema files into a more structured database-based layout
  • Added improved schema versions with better partitioning, TTL, and performance optimizations
  • Removed Goose dependencies from workflows and documentation
  • Updated Docker Compose configuration to use the new ClickHouse container
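The startup initialization described above could be sketched roughly as below. The schema directory layout, the variable names, and the client invocation are assumptions for illustration, not the actual deployment/init-clickhouse.sh.

```shell
#!/usr/bin/env sh
# Hypothetical sketch of the container-startup schema load described above.
# SCHEMA_DIR and CH_CLIENT are assumed names, not taken from the real
# init-clickhouse.sh; CH_CLIENT is overridable so the loop can be dry-run
# without a live ClickHouse server.
SCHEMA_DIR="${SCHEMA_DIR:-/opt/clickhouse-schemas}"
CH_CLIENT="${CH_CLIENT:-clickhouse-client --queries-file}"

apply_schemas() {
  # Iterate database directories (e.g. 001_verifications, 002_ratelimits)
  # and apply their *.sql files in lexical order, so numbered prefixes
  # control creation order (raw tables before materialized views).
  for db_dir in "$SCHEMA_DIR"/*/; do
    [ -d "$db_dir" ] || continue
    for sql_file in "$db_dir"*.sql; do
      [ -e "$sql_file" ] || continue
      echo "applying $sql_file"
      $CH_CLIENT "$sql_file" || return 1
    done
  done
}
```

Baking the schema files into the image and running a loop like this from the entrypoint on first start is what replaces the separate Goose migrator container.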

@perkinsjr for your PR, we need to merge this one first and then revert the schema changes in yours; after that we should be good to go. I can do that if you want, lmk.

Type of change

  • Chore (refactoring code, technical debt, workflow improvements)
  • Enhancement (small improvements)

How should this be tested?

  • Start the local environment with make up and verify ClickHouse initializes correctly
  • Run integration tests to ensure ClickHouse functionality works as expected
  • Verify that the new schema structure is created properly in ClickHouse
  • Check that the migration documentation for v1 to v2 tables works as described
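When verifying that ClickHouse initialized correctly, note that a plain SELECT 1 only proves the server is up, not that the init SQL has run. A hedged sketch of a schema-aware wait loop follows; the database/table names and the QUERY_CMD hook are assumptions, not part of this PR's code.

```shell
#!/usr/bin/env sh
# Hypothetical readiness probe: poll until an expected v2 table exists,
# rather than only checking server liveness with SELECT 1.
# QUERY_CMD is an assumed hook (e.g. wrapping `clickhouse-client --query`)
# so the loop can be exercised without a live server.
wait_for_table() {
  db="$1"; table="$2"; tries="${3:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    n=$($QUERY_CMD "SELECT count() FROM system.tables WHERE database = '$db' AND name = '$table'" 2>/dev/null) || n=0
    [ "$n" = "1" ] && return 0
    i=$((i + 1))
    sleep 1
  done
  echo "table $db.$table not found after $tries attempts" >&2
  return 1
}
```

Something like wait_for_table ratelimits raw_ratelimits_v2 could then gate integration tests (or a compose healthcheck) on the schema actually existing.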

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Contributing Guide
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand areas
  • Ran pnpm build
  • Ran pnpm fmt
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from main onto my branch with git pull origin main
  • My changes don't cause any responsiveness issues

Summary by CodeRabbit

  • New Features

    • Expanded analytics for verifications and rate limits: added latency (avg, p75, p99), spent credits, last-used tracking, and minute/hour/day/month aggregations.
  • Dev Experience

    • Simplified local setup: ClickHouse is built with auto-initialization; separate migrator removed; easier startup.
  • Tests

    • More reliable ClickHouse test startup with readiness wait; builds image before running tests.
  • Documentation

    • Added zero-downtime migration guides; removed outdated ClickHouse test instructions.
  • Chores

    • Streamlined CI by removing Goose steps; deprecated migration-related Makefile targets.

@changeset-bot

changeset-bot bot commented Aug 26, 2025

⚠️ No Changeset found

Latest commit: 2f71831

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@coderabbitai
Contributor

coderabbitai bot commented Aug 26, 2025

📝 Walkthrough

Walkthrough

Removes Goose-based ClickHouse migrations from CI, Makefiles, tooling, and tests. Introduces a custom ClickHouse image and init script that load SQL schemas at container startup. Replaces old migration files under internal/clickhouse with new schema files under go/pkg/clickhouse/schema. Adds v2 tables/materialized views for verifications and ratelimits.

Changes

Cohort / File(s) Summary
CI workflows
.github/workflows/job_test_api_local.yaml, .github/workflows/job_test_unit.yaml
Removed Goose install/migration steps; kept remaining steps unchanged.
Makefiles
Makefile, go/Makefile
Deleted ClickHouse migration targets and retry loop for migrator; docker compose up no longer runs migrations.
Deployment: ClickHouse image and compose
deployment/Dockerfile.clickhouse, deployment/docker-compose.yaml, deployment/init-clickhouse.sh
Added custom ClickHouse Dockerfile copying schemas and init script; compose now builds clickhouse from this Dockerfile; removed clickhouse_migrator; reformatted some commands; init script executes SQLs on first start.
Docs (testing and migrations)
apps/engineering/content/docs/contributing/testing.mdx, go/pkg/clickhouse/schema/databases/001_verifications/MIGRATION_v1_to_v2.md, .../002_ratelimits/MIGRATION_v1_to_v2.md
Removed ClickHouse Tests/Goose docs; added migration guides for v1→v2 (verifications, ratelimits).
New ClickHouse schemas (verifications v2)
go/pkg/clickhouse/schema/databases/001_verifications/*.sql
Added v2 raw table, per-minute/hour/day/month aggregate tables and MVs, temp sync MV from v1, and index on request_id.
New ClickHouse schemas (ratelimits v2)
go/pkg/clickhouse/schema/databases/002_ratelimits/*.sql
Added v2 raw table with latency, per-minute/hour/day/month tables and MVs, last-used table/MV, temp sync MV from v1, and index on request_id.
Remove Goose-based migrations and image
internal/clickhouse/Dockerfile, internal/clickhouse/schema/*.sql, internal/clickhouse/schema/000_README.md
Deleted Goose migrator Dockerfile and all Goose SQL migration files across metrics, verifications, ratelimits, business, billing.
Internal ClickHouse test utilities
internal/clickhouse/src/testutil.ts, internal/clickhouse/package.json
Switched test container image to unkey-clickhouse:latest, added wait strategy and delay, removed Goose invocations; added script to build ClickHouse image; test script builds before running.
Local tooling
tools/local/src/main.ts
Removed clickhouse_migrator from startup sets.
Minor formatting
go/pkg/db/retry.go, go/pkg/retry/retry_test.go
Whitespace/alignment only; no behavioral changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Dev as Developer
  participant DC as docker-compose
  participant CH as ClickHouse (custom image)
  participant Init as init-clickhouse.sh
  participant FS as /opt/clickhouse-schemas

  Dev->>DC: docker compose up clickhouse
  DC->>CH: Build from deployment/Dockerfile.clickhouse
  CH->>Init: Run entrypoint init scripts on first start
  Init->>FS: Iterate database/* directories
  loop For each *.sql
    Init->>CH: clickhouse-client -q "<sql>"
    CH-->>Init: OK/ERR
  end
  Init-->>Dev: Initialization complete
sequenceDiagram
  autonumber
  actor CI as CI Workflow
  participant UT as Unit Tests
  participant API as API Local Tests

  CI->>UT: Setup Node / Build
  CI--X UT: Install goose (removed)
  CI->>UT: Run tests

  CI->>API: Build / Load MySQL schema
  CI--X API: Install goose (removed)
  CI--X API: goose up ClickHouse (removed)
  CI->>API: Run tests / dump & upload logs

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


@vercel

vercel bot commented Aug 26, 2025

The latest updates on your projects.

Project      Deployment  Preview   Comments  Updated (UTC)
dashboard    Ready       Preview   Comment   Aug 26, 2025 5:47pm

1 Skipped Deployment

Project      Deployment  Preview   Comments  Updated (UTC)
engineering  Ignored     Preview             Aug 26, 2025 5:47pm

Collaborator Author

This stack of pull requests is managed by Graphite.

@github-actions
Contributor

github-actions bot commented Aug 26, 2025

Thank you for following the naming conventions for pull request titles! 🙏

@graphite-app

graphite-app bot commented Aug 26, 2025

Graphite Automations

"Notify author when CI fails" took an action on this PR • (08/26/25)

1 teammate was notified about this PR based on Andreas Thomas's automation.

Member

@perkinsjr perkinsjr left a comment


I don't see anything insane that would cause issues.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 62

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
go/pkg/db/retry.go (2)

31-42: Optional: compute backoff instead of predefining slice (simpler, same semantics).

Removes the allocation and keeps behavior identical (50ms, 100ms, 200ms). Consider also adding jitter in the future to avoid synchronized retries, but that would require an extra import and is not necessary for this PR.

-    retry.Backoff(func(n int) time.Duration {
-      // Predefined backoff delays: 50ms, 100ms, 200ms
-      delays := []time.Duration{
-        DefaultBackoff,     // 50ms for attempt 1
-        DefaultBackoff * 2, // 100ms for attempt 2
-        DefaultBackoff * 4, // 200ms for attempt 3
-      }
-      if n <= 0 || n > len(delays) {
-        return DefaultBackoff // fallback to base delay
-      }
-      return delays[n-1]
-    }),
+    retry.Backoff(func(n int) time.Duration {
+      if n < 1 {
+        n = 1
+      }
+      // 50ms, 100ms, 200ms
+      return DefaultBackoff << (n - 1)
+    }),

43-56: Do not retry on context cancellation or deadline exceeded.

Currently, canceled or timed-out requests will still be retried, adding latency and wasting capacity. Skip retries for these cases.

     retry.ShouldRetry(func(err error) bool {
+      // Do not retry if the caller canceled or the context deadline elapsed.
+      if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
+        return false
+      }
       // Don't retry if resource is not found - this is a valid response
       if IsNotFound(err) {
         return false
       }

Add the necessary imports outside this hunk:

import (
  "context"
  "errors"
  "time"

  "github.com/unkeyed/unkey/go/pkg/retry"
)
go/pkg/retry/retry_test.go (1)

136-187: Optional: add boundary/limit cases to strengthen ShouldRetry coverage

Nice table-driven tests. Consider adding:

  • Attempts cap interaction with ShouldRetry (e.g., Attempts(2) + always-retry policy).
  • Zero-attempts behavior under ShouldRetry (mirror the “invalid attempts” case from TestRetry).

Apply this diff to extend the table with two focused cases:

@@
 	tests := []struct {
 		name          string
 		shouldRetry   func(error) bool
 		errorSequence []error
 		expectedCalls int
 		expectedError error
 		expectedSleep time.Duration
 	}{
+		// Attempts cap should stop even if ShouldRetry would continue
+		{
+			name: "attempts cap limits retries despite retryable errors",
+			shouldRetry: func(err error) bool { return true },
+			// two failures, then would succeed, but we will cap attempts to 2 below
+			errorSequence: []error{retryableError, retryableError, nil},
+			// we will set Attempts(2) for this case in the test body
+			expectedCalls: 2,
+			expectedError: retryableError,
+			expectedSleep: 100 * time.Millisecond,
+		},
+		// Zero attempts should short-circuit even with ShouldRetry policy
+		{
+			name: "zero attempts short-circuits even with ShouldRetry",
+			shouldRetry: func(err error) bool { return true },
+			errorSequence: []error{retryableError},
+			// we will set Attempts(0) for this case in the test body
+			expectedCalls: 0,
+			expectedError: errors.New("invalid attempts"), // will be replaced below
+			expectedSleep: 0,
+		},
 	{
 		name:          "should retry all errors by default",
 		shouldRetry:   nil, // default behavior
 		errorSequence: []error{retryableError, retryableError, nil},
 		expectedCalls: 3,
 		expectedError: nil,
 		expectedSleep: 300 * time.Millisecond, // 100ms + 200ms
 	},
@@
 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
 			var retrier *retry
-			if tt.shouldRetry != nil {
-				retrier = New(ShouldRetry(tt.shouldRetry))
-			} else {
-				retrier = New()
-			}
+			// Per-test overrides for the two new boundary cases
+			switch tt.name {
+			case "attempts cap limits retries despite retryable errors":
+				retrier = New(ShouldRetry(tt.shouldRetry), Attempts(2))
+			case "zero attempts short-circuits even with ShouldRetry":
+				retrier = New(ShouldRetry(tt.shouldRetry), Attempts(0))
+				// err value asserted below; keep a stable sentinel
+				tt.expectedError = retrier.validate() // assume Do() returns validate() error
+			default:
+				if tt.shouldRetry != nil {
+					retrier = New(ShouldRetry(tt.shouldRetry))
+				} else {
+					retrier = New()
+				}
+			}

Note: If retry.validate() isn’t available, replace the sentinel with require.Error(t, err) and drop the equality check for that case. I can adjust to your actual constructor/validation behavior if you point me to the implementation.

go/pkg/clickhouse/schema/databases/001_verifications/004_key_verifications_per_minute_mv_v2.sql (1)

5-9: Verify target table schema aligns with GROUP BY keys and column list.

You project identity_id, key_id, tags into the sink. Confirm verifications.key_verifications_per_minute_v2 defines these columns and they are included in ORDER BY appropriately. High-cardinality keys (identity_id, key_id) at per-minute granularity can explode rows; confirm this matches dashboard needs.

Also applies to: 18-25

deployment/docker-compose.yaml (2)

114-138: Healthcheck should validate schema readiness, not just server liveness.

SELECT 1 returns OK even if init SQL hasn’t run yet. This can start dependents before tables exist.

Probe for a known table (e.g., ratelimits.raw_ratelimits_v2):

-          "--query",
-          "SELECT 1",
+          "--query",
+          "SELECT count() FROM system.tables WHERE database = 'ratelimits' AND name = 'raw_ratelimits_v2'",

This better gates service readiness for apiv2/agent.


290-295: Unused volume “clickhouse-keeper”.

No service mounts clickhouse-keeper. Dead volumes cause confusion.

 volumes:
   mysql:
   clickhouse:
-  clickhouse-keeper:
   s3:
   metald-aio-data:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 041591d and 2f71831.

📒 Files selected for processing (88)
  • .github/workflows/job_test_api_local.yaml (0 hunks)
  • .github/workflows/job_test_unit.yaml (0 hunks)
  • Makefile (0 hunks)
  • apps/engineering/content/docs/contributing/testing.mdx (0 hunks)
  • deployment/Dockerfile.clickhouse (1 hunks)
  • deployment/docker-compose.yaml (4 hunks)
  • deployment/init-clickhouse.sh (1 hunks)
  • go/Makefile (0 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/003_key_verifications_per_minute_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/004_key_verifications_per_minute_mv_v2.sql (3 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/005_key_verifications_per_hour_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/006_key_verifications_per_hour_mv_v2.sql (2 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/007_key_verifications_per_day_v4.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/008_key_verifications_per_day_mv_v4.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/009_key_verifications_per_month_v4.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/010_key_verifications_per_month_mv_v4.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/011_temp_sync_v1_to_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/MIGRATION_v1_to_v2.md (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/002_raw_ratelimits_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/003_ratelimits_per_minute_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/004_ratelimits_per_minute_mv_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/005_ratelimits_per_hour_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_hour_mv_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/007_ratelimits_per_day_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/008_ratelimits_per_day_mv_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/009_ratelimits_per_month_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/010_ratelimits_per_month_mv_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/011_ratelimits_last_used_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/012_ratelimits_last_used_mv_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/013_temp_sync_v1_to_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/MIGRATION_v1_to_v2.md (1 hunks)
  • go/pkg/db/retry.go (1 hunks)
  • go/pkg/retry/retry_test.go (5 hunks)
  • internal/clickhouse/Dockerfile (0 hunks)
  • internal/clickhouse/package.json (1 hunks)
  • internal/clickhouse/schema/000_README.md (0 hunks)
  • internal/clickhouse/schema/001_create_databases.sql (0 hunks)
  • internal/clickhouse/schema/002_create_metrics_raw_api_requests_v1.sql (0 hunks)
  • internal/clickhouse/schema/003_create_verifications_raw_key_verifications_v1.sql (0 hunks)
  • internal/clickhouse/schema/004_create_verifications_key_verifications_per_hour_v1.sql (0 hunks)
  • internal/clickhouse/schema/005_create_verifications_key_verifications_per_day_v1.sql (0 hunks)
  • internal/clickhouse/schema/006_create_verifications_key_verifications_per_month_v1.sql (0 hunks)
  • internal/clickhouse/schema/007_create_verifications_key_verifications_per_hour_mv_v1.sql (0 hunks)
  • internal/clickhouse/schema/008_create_verifications_key_verifications_per_day_mv_v1.sql (0 hunks)
  • internal/clickhouse/schema/009_create_verifications_key_verifications_per_month_mv_v1.sql (0 hunks)
  • internal/clickhouse/schema/010_create_ratelimits_raw_ratelimits_table.sql (0 hunks)
  • internal/clickhouse/schema/011_create_telemetry_raw_sdks_v1.sql (0 hunks)
  • internal/clickhouse/schema/012_create_billing_billable_verifications_per_month_v1.sql (0 hunks)
  • internal/clickhouse/schema/013_create_billing_billable_verifications_per_month_mv_v1.sql (0 hunks)
  • internal/clickhouse/schema/014_create_ratelimits_ratelimits_per_minute_v1.sql (0 hunks)
  • internal/clickhouse/schema/015_create_ratelimits_ratelimits_per_hour_v1.sql (0 hunks)
  • internal/clickhouse/schema/016_create_ratelimits_ratelimits_per_day_v1.sql (0 hunks)
  • internal/clickhouse/schema/017_create_ratelimits_ratelimits_per_month_v1.sql (0 hunks)
  • internal/clickhouse/schema/018_create_ratelimits_ratelimits_per_minute_mv_v1.sql (0 hunks)
  • internal/clickhouse/schema/019_create_ratelimits_ratelimits_per_hour_mv_v1.sql (0 hunks)
  • internal/clickhouse/schema/020_create_ratelimits_ratelimits_per_day_mv_v1.sql (0 hunks)
  • internal/clickhouse/schema/021_create_ratelimits_ratelimits_per_month_mv_v1.sql (0 hunks)
  • internal/clickhouse/schema/022_create_business_active_workspaces_per_month_v1.sql (0 hunks)
  • internal/clickhouse/schema/023_create_business_active_workspaces_per_month_mv_v1.sql (0 hunks)
  • internal/clickhouse/schema/024_create_ratelimits_last_used_mv_v1.sql (0 hunks)
  • internal/clickhouse/schema/025_create_billable_verifications_v2.sql (0 hunks)
  • internal/clickhouse/schema/026_create_billable_ratelimits_v1.sql (0 hunks)
  • internal/clickhouse/schema/027_add_tags_to_verifications.raw_key_verifications_v1.sql (0 hunks)
  • internal/clickhouse/schema/028_create_verifications.key_verifications_per_hour_v2.sql (0 hunks)
  • internal/clickhouse/schema/030_create_verifications.key_verifications_per_day_v2.sql (0 hunks)
  • internal/clickhouse/schema/031_create_verifications.key_verifications_per_day_mv_v2.sql (0 hunks)
  • internal/clickhouse/schema/032_create_verifications.key_verifications_per_month_v2.sql (0 hunks)
  • internal/clickhouse/schema/033_create_verifications.key_verifications_per_month_mv_v2.sql (0 hunks)
  • internal/clickhouse/schema/034_billing_read_from_verifications.key_verifications_per_month_v2.sql (0 hunks)
  • internal/clickhouse/schema/035_business_update_active_workspaces_keys_per_month_mv_v1_read_from_verifications.key_verifications_per_month_v2.sql (0 hunks)
  • internal/clickhouse/schema/036_create_verifications.key_verifications_per_hour_v3.sql (0 hunks)
  • internal/clickhouse/schema/037_create_verifications.key_verifications_per_hour_mv_v3.sql (0 hunks)
  • internal/clickhouse/schema/038_create_verifications.key_verifications_per_day_v3.sql (0 hunks)
  • internal/clickhouse/schema/039_create_verifications.key_verifications_per_day_mv_v3.sql (0 hunks)
  • internal/clickhouse/schema/040_create_verifications.key_verifications_per_month_v3.sql (0 hunks)
  • internal/clickhouse/schema/041_create_verifications.key_verifications_per_month_mv_v3.sql (0 hunks)
  • internal/clickhouse/schema/042_create_api_requests_per_hour_v1.sql (0 hunks)
  • internal/clickhouse/schema/043_create_api_requests_per_hour_mv_v1.sql (0 hunks)
  • internal/clickhouse/schema/044_create_api_requests_per_minute_v1.sql (0 hunks)
  • internal/clickhouse/schema/045_create_api_requests_per_minute_mv_v1.sql (0 hunks)
  • internal/clickhouse/schema/046_create_api_requests_per_day_v1.sql (0 hunks)
  • internal/clickhouse/schema/047_create_api_requests_per_day_mv_v1.sql (0 hunks)
  • internal/clickhouse/schema/048_raw_ratelimits_metrics_indexes_v1.sql (0 hunks)
  • internal/clickhouse/schema/049_raw_api_metrics_ratelimit_indexes_v1.sql (0 hunks)
  • internal/clickhouse/schema/050_create_verifications.key_verifications_per_minute_v1.sql (0 hunks)
  • internal/clickhouse/src/testutil.ts (2 hunks)
  • tools/local/src/main.ts (1 hunks)
💤 Files with no reviewable changes (56)
  • apps/engineering/content/docs/contributing/testing.mdx
  • go/Makefile
  • internal/clickhouse/schema/022_create_business_active_workspaces_per_month_v1.sql
  • internal/clickhouse/schema/043_create_api_requests_per_hour_mv_v1.sql
  • internal/clickhouse/Dockerfile
  • internal/clickhouse/schema/026_create_billable_ratelimits_v1.sql
  • internal/clickhouse/schema/000_README.md
  • internal/clickhouse/schema/048_raw_ratelimits_metrics_indexes_v1.sql
  • .github/workflows/job_test_unit.yaml
  • internal/clickhouse/schema/021_create_ratelimits_ratelimits_per_month_mv_v1.sql
  • internal/clickhouse/schema/049_raw_api_metrics_ratelimit_indexes_v1.sql
  • internal/clickhouse/schema/001_create_databases.sql
  • internal/clickhouse/schema/023_create_business_active_workspaces_per_month_mv_v1.sql
  • internal/clickhouse/schema/045_create_api_requests_per_minute_mv_v1.sql
  • internal/clickhouse/schema/017_create_ratelimits_ratelimits_per_month_v1.sql
  • internal/clickhouse/schema/011_create_telemetry_raw_sdks_v1.sql
  • internal/clickhouse/schema/030_create_verifications.key_verifications_per_day_v2.sql
  • internal/clickhouse/schema/047_create_api_requests_per_day_mv_v1.sql
  • internal/clickhouse/schema/032_create_verifications.key_verifications_per_month_v2.sql
  • internal/clickhouse/schema/024_create_ratelimits_last_used_mv_v1.sql
  • internal/clickhouse/schema/040_create_verifications.key_verifications_per_month_v3.sql
  • internal/clickhouse/schema/038_create_verifications.key_verifications_per_day_v3.sql
  • internal/clickhouse/schema/027_add_tags_to_verifications.raw_key_verifications_v1.sql
  • internal/clickhouse/schema/041_create_verifications.key_verifications_per_month_mv_v3.sql
  • internal/clickhouse/schema/042_create_api_requests_per_hour_v1.sql
  • internal/clickhouse/schema/019_create_ratelimits_ratelimits_per_hour_mv_v1.sql
  • internal/clickhouse/schema/039_create_verifications.key_verifications_per_day_mv_v3.sql
  • internal/clickhouse/schema/034_billing_read_from_verifications.key_verifications_per_month_v2.sql
  • internal/clickhouse/schema/004_create_verifications_key_verifications_per_hour_v1.sql
  • internal/clickhouse/schema/046_create_api_requests_per_day_v1.sql
  • internal/clickhouse/schema/005_create_verifications_key_verifications_per_day_v1.sql
  • internal/clickhouse/schema/016_create_ratelimits_ratelimits_per_day_v1.sql
  • internal/clickhouse/schema/018_create_ratelimits_ratelimits_per_minute_mv_v1.sql
  • internal/clickhouse/schema/012_create_billing_billable_verifications_per_month_v1.sql
  • internal/clickhouse/schema/003_create_verifications_raw_key_verifications_v1.sql
  • internal/clickhouse/schema/044_create_api_requests_per_minute_v1.sql
  • internal/clickhouse/schema/014_create_ratelimits_ratelimits_per_minute_v1.sql
  • internal/clickhouse/schema/050_create_verifications.key_verifications_per_minute_v1.sql
  • internal/clickhouse/schema/031_create_verifications.key_verifications_per_day_mv_v2.sql
  • internal/clickhouse/schema/009_create_verifications_key_verifications_per_month_mv_v1.sql
  • internal/clickhouse/schema/010_create_ratelimits_raw_ratelimits_table.sql
  • internal/clickhouse/schema/028_create_verifications.key_verifications_per_hour_v2.sql
  • .github/workflows/job_test_api_local.yaml
  • internal/clickhouse/schema/036_create_verifications.key_verifications_per_hour_v3.sql
  • Makefile
  • internal/clickhouse/schema/033_create_verifications.key_verifications_per_month_mv_v2.sql
  • internal/clickhouse/schema/008_create_verifications_key_verifications_per_day_mv_v1.sql
  • internal/clickhouse/schema/015_create_ratelimits_ratelimits_per_hour_v1.sql
  • internal/clickhouse/schema/035_business_update_active_workspaces_keys_per_month_mv_v1_read_from_verifications.key_verifications_per_month_v2.sql
  • internal/clickhouse/schema/025_create_billable_verifications_v2.sql
  • internal/clickhouse/schema/037_create_verifications.key_verifications_per_hour_mv_v3.sql
  • internal/clickhouse/schema/020_create_ratelimits_ratelimits_per_day_mv_v1.sql
  • internal/clickhouse/schema/006_create_verifications_key_verifications_per_month_v1.sql
  • internal/clickhouse/schema/007_create_verifications_key_verifications_per_hour_mv_v1.sql
  • internal/clickhouse/schema/002_create_metrics_raw_api_requests_v1.sql
  • internal/clickhouse/schema/013_create_billing_billable_verifications_per_month_mv_v1.sql
🧰 Additional context used
📓 Path-based instructions (5)
**/*.go

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.go: Follow comprehensive documentation guidelines for Go code as described in go/GO_DOCUMENTATION_GUIDELINES.md
Every public function/type in Go code must be documented
Prefer interfaces for testability in Go code
Use AIDEV-* comments for complex/important code in Go services

Files:

  • go/pkg/db/retry.go
  • go/pkg/retry/retry_test.go
**/*.{env,js,ts,go}

📄 CodeRabbit inference engine (CLAUDE.md)

All environment variables must follow the format: UNKEY_<SERVICE_NAME>_VARNAME

Files:

  • go/pkg/db/retry.go
  • tools/local/src/main.ts
  • go/pkg/retry/retry_test.go
  • internal/clickhouse/src/testutil.ts
**/*.{js,jsx,ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{js,jsx,ts,tsx}: Use Biome for formatting and linting in TypeScript/JavaScript projects
Prefer named exports over default exports in TypeScript/JavaScript, except for Next.js pages

Files:

  • tools/local/src/main.ts
  • internal/clickhouse/src/testutil.ts
**/*.{ts,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx}: Follow strict TypeScript configuration
Use Zod for runtime validation in TypeScript projects

Files:

  • tools/local/src/main.ts
  • internal/clickhouse/src/testutil.ts
**/*_test.go

📄 CodeRabbit inference engine (CLAUDE.md)

**/*_test.go: Use table-driven tests in Go
Organize Go integration tests with real dependencies
Organize Go tests by HTTP status codes

Files:

  • go/pkg/retry/retry_test.go
🧠 Learnings (2)
📓 Common learnings
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
📚 Learning: 2025-04-22T14:43:11.724Z
Learnt from: chronark
PR: unkeyed/unkey#3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.

Applied to files:

  • go/pkg/clickhouse/schema/databases/001_verifications/003_key_verifications_per_minute_v2.sql
🧬 Code graph analysis (1)
tools/local/src/main.ts (2)
tools/local/src/db.ts (1)
  • prepareDatabase (13-81)
tools/local/src/cmd/api.ts (1)
  • bootstrapApi (9-52)
🪛 LanguageTool
go/pkg/clickhouse/schema/databases/002_ratelimits/MIGRATION_v1_to_v2.md

[grammar] ~1-~1: Use correct spacing
Context: ...Migration Guide: Raw Ratelimits v1 to v2 This guide explains how to migrate from ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~3-~3: Use correct spacing
Context: ... raw_ratelimits_v2 with zero downtime. ## What's Changed in v2 - ✅ **Latency trac...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~5-~5: Use correct spacing
Context: ... zero downtime. ## What's Changed in v2 - ✅ Latency tracking: New latency co...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~7-~7: There might be a mistake here.
Context: ...cy tracking**: New latency column for ratelimit check performance - ✅ Partitioning:...

(QB_NEW_EN_OTHER)


[grammar] ~7-~7: There might be a mistake here.
Context: ...` column for ratelimit check performance - ✅ Partitioning: Monthly partitions f...

(QB_NEW_EN)


[grammar] ~8-~8: There might be a mistake here.
Context: ...ter performance and lifecycle management - ✅ TTL: Automatic data cleanup after ...

(QB_NEW_EN)


[grammar] ~9-~9: There might be a mistake here.
Context: ...*: Automatic data cleanup after 3 months - ✅ Deduplication: Protection against ...

(QB_NEW_EN)


[grammar] ~10-~10: There might be a mistake here.
Context: ...against duplicate inserts during retries - ✅ Better ordering: Optimized ORDER B...

(QB_NEW_EN)


[grammar] ~11-~11: There might be a mistake here.
Context: ...timized ORDER BY for time-series queries - ✅ Latency aggregations: avg, p75, p9...

(QB_NEW_EN)


[grammar] ~12-~12: Insert the missing word
Context: ...s - ✅ Latency aggregations: avg, p75, p99 latency metrics in all aggregation ...

(QB_NEW_EN_OTHER_ERROR_IDS_32)


[grammar] ~12-~12: Use correct spacing
Context: ...atency metrics in all aggregation tables ## Migration Steps ### Phase 1: Deploy New...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~14-~14: Use correct spacing
Context: ...l aggregation tables ## Migration Steps ### Phase 1: Deploy New Schema (Zero Downtim...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~16-~16: Use correct spacing
Context: ...ase 1: Deploy New Schema (Zero Downtime) 1. Deploy the new schema files - This cre...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[typographical] ~18-~18: To join two clauses or set off examples, consider using an em dash.
Context: ...ntime) 1. Deploy the new schema files - This creates v2 tables and temporary sync: ...

(QB_NEW_EN_DASH_RULE_EM)


[grammar] ~18-~18: Use correct spacing
Context: ...is creates v2 tables and temporary sync: bash # The auto-migration will create: # - raw_ratelimits_v2 (new table) # - All new aggregation tables (minute/hour/day/month v2) # - temp_sync_v1_to_v2 (materialized view for live sync) 2. Verify deployment - Check that all new...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[typographical] ~26-~26: To join two clauses or set off examples, consider using an em dash.
Context: ...ive sync) 2. **Verify deployment** - Check that all new tables exist: sql ...

(QB_NEW_EN_DASH_RULE_EM)


[grammar] ~26-~26: Use correct spacing
Context: ...ent** - Check that all new tables exist: sql SHOW TABLES FROM ratelimits LIKE '%v2%'; ### Phase 2: Backfill Historical Data **⚠️ ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~31-~31: Use correct spacing
Context: ...` ### Phase 2: Backfill Historical Data ⚠️ Important: The temporary materializ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~33-~33: Use correct spacing
Context: ...storical data while new writes continue. Run these queries manually to backfi...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~35-~35: Use correct spacing
Context: ...* to backfill historical data in chunks: #### Step 1: Find your data range ```sql SELE...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~37-~37: There might be a mistake here.
Context: ...unks: #### Step 1: Find your data range sql SELECT toDateTime(min(time)/1000) as earliest_data, toDateTime(max(time)/1000) as latest_data, count(*) as total_rows FROM ratelimits.raw_ratelimits_v1; #### Step 2: Backfill month by month Replace ...

(QB_NEW_EN_OTHER)


[grammar] ~46-~46: There might be a mistake here.
Context: ...`` #### Step 2: Backfill month by month Replace the date ranges based on your ac...

(QB_NEW_EN_OTHER)


[grammar] ~47-~47: Use correct spacing
Context: ...s based on your actual data from Step 1: sql -- January 2024 (example - adjust dates) INSERT INTO ratelimits.raw_ratelimits_v2 SELECT request_id, time, workspace_id, namespace_id, identifier, passed, 0.0 as latency -- v1 doesn't have this column, default to 0.0 FROM ratelimits.raw_ratelimits_v1 WHERE time >= toUnixTimestamp64Milli(toDateTime('2024-01-01 00:00:00')) AND time < toUnixTimestamp64Milli(toDateTime('2024-02-01 00:00:00')); -- February 2024 INSERT INTO ratelimits.raw_ratelimits_v2 SELECT request_id, time, workspace_id, namespace_id, identifier, passed, 0.0 as latency FROM ratelimits.raw_ratelimits_v1 WHERE time >= toUnixTimestamp64Milli(toDateTime('2024-02-01 00:00:00')) AND time < toUnixTimestamp64Milli(toDateTime('2024-03-01 00:00:00')); -- Continue for each month until you reach current data... #### Step 3: Monitor progress ```sql -- Check...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~81-~81: Use correct spacing
Context: ...ta... #### Step 3: Monitor progresssql -- Check backfill progress SELECT 'v1' as version, toDateTime(min(time)/1000) as earliest, toDateTime(max(time)/1000) as latest, count() as rows FROM ratelimits.raw_ratelimits_v1 UNION ALL SELECT 'v2' as version, toDateTime(min(time)/1000) as earliest, toDateTime(max(time)/1000) as latest, count() as rows FROM ratelimits.raw_ratelimits_v2 ORDER BY version; ``` ### Phase 3: Switch Application to v2 1. **...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~100-~100: Use correct spacing
Context: ... ### Phase 3: Switch Application to v2 1. **Update your application** to write tor...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~102-~102: Use the right verb tense
Context: ...write to raw_ratelimits_v2 instead of v1 2. Include the new field: - latency (Float6...

(QB_NEW_EN_OTHER_ERROR_IDS_13)


[grammar] ~103-~103: Use correct spacing
Context: ...ad of v1 2. Include the new field: - latency (Float64) - ratelimit check latency in m...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~104-~104: Use correct spacing
Context: ...cy in milliseconds with sub-ms precision ### Phase 4: Cleanup (After Confirming v2 Wo...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~106-~106: Use correct spacing
Context: ...e 4: Cleanup (After Confirming v2 Works) 1. Drop the temporary sync view: ```sq...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~108-~108: Use correct spacing
Context: ...s) 1. Drop the temporary sync view: sql DROP VIEW ratelimits.temp_sync_v1_to_v2; 2. Optional: Keep v1 as backup or drop it...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~113-~113: Use correct spacing
Context: ...ptional: Keep v1 as backup** or drop it: sql -- To drop v1 (only after confirming v2 works perfectly) DROP TABLE ratelimits.raw_ratelimits_v1; ## Rollback Plan If you need to rollback: ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~119-~119: Use correct spacing
Context: ..._ratelimits_v1; ``` ## Rollback Plan If you need to rollback: 1. **Switch ap...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~121-~121: Use the right verb tense
Context: ... ``` ## Rollback Plan If you need to rollback: 1. Switch application back to writing to ...

(QB_NEW_EN_OTHER_ERROR_IDS_13)


[grammar] ~124-~124: There might be a mistake here.
Context: ...1` 2. Keep the temp sync view active 3. Investigate issues with v2 4. **Re-run...

(QB_NEW_EN_OTHER)


[grammar] ~125-~125: There might be a mistake here.
Context: ...active 3. Investigate issues with v2 4. Re-run migration when ready ## Import...

(QB_NEW_EN_OTHER)


[grammar] ~126-~126: There might be a mistake here.
Context: ...th v2 4. Re-run migration when ready ## Important Notes - 🔄 Zero downtime:...

(QB_NEW_EN_OTHER)


[grammar] ~128-~128: Use correct spacing
Context: ...gration** when ready ## Important Notes - 🔄 Zero downtime: New data automatic...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~130-~130: There might be a mistake here.
Context: ...cally flows to v2 via the temp sync view - 📊 Historical data: Will have `laten...

(QB_NEW_EN)


[grammar] ~131-~131: There might be a mistake here.
Context: ...ted later when you add latency tracking) - 🗓️ TTL: v2 data will auto-delete af...

(QB_NEW_EN)


[grammar] ~132-~132: There might be a mistake here.
Context: ... v2 data will auto-delete after 3 months - 🔍 Deduplication: Protects against d...

(QB_NEW_EN)


[grammar] ~133-~133: There might be a mistake here.
Context: ...against duplicate inserts during retries - 📈 Performance: Monthly partitions i...

(QB_NEW_EN)


[grammar] ~134-~134: There might be a mistake here.
Context: ...improve query performance on time ranges - ⚡ TDigest quantiles: Efficient appro...

(QB_NEW_EN)


[grammar] ~135-~135: There might be a mistake here.
Context: ...mate percentile calculations for latency ## New Aggregation Metrics Available With ...

(QB_NEW_EN)


[grammar] ~137-~137: Use correct spacing
Context: ...cy ## New Aggregation Metrics Available With v2, all aggregation tables now incl...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~139-~139: There might be a mistake here.
Context: ... v2, all aggregation tables now include: - latency_avg: Average ratelimit check latency - `lat...

(QB_NEW_EN)


[grammar] ~140-~140: There might be a mistake here.
Context: ...s now include: - latency_avg: Average ratelimit check latency - latency_p75: 75th per...

(QB_NEW_EN_OTHER)


[grammar] ~140-~140: There might be a mistake here.
Context: ...cy_avg: Average ratelimit check latency - latency_p75: 75th percentile latency - latency_p99...

(QB_NEW_EN)


[grammar] ~141-~141: There might be a mistake here.
Context: ...- latency_p75: 75th percentile latency - latency_p99: 99th percentile latency Perfect for m...

(QB_NEW_EN)


[grammar] ~142-~142: Use correct spacing
Context: ...- latency_p99: 99th percentile latency Perfect for monitoring ratelimit perform...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~144-~144: There might be a mistake here.
Context: ...centile latency Perfect for monitoring ratelimit performance and identifying bottlenecks...

(QB_NEW_EN_OTHER)


[grammar] ~144-~144: Use correct spacing
Context: ...performance and identifying bottlenecks! ## Questions? - Check the temp sync view i...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~146-~146: Use correct spacing
Context: ... identifying bottlenecks! ## Questions? - Check the temp sync view is active: `SEL...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~148-~148: There might be a mistake here.
Context: ... - Check the temp sync view is active: SELECT * FROM system.tables WHERE name = 'temp_sync_v1_to_v2' - Monitor both tables during migration to ...

(QB_NEW_EN)


[grammar] ~149-~149: There might be a mistake here.
Context: ...ing migration to ensure data consistency - The aggregation tables (minute/hour/day/...

(QB_NEW_EN_OTHER)


[grammar] ~150-~150: Use a period to end declarative sentences
Context: ...th) will automatically populate from v2 data

(QB_NEW_EN_OTHER_ERROR_IDS_25)

go/pkg/clickhouse/schema/databases/001_verifications/MIGRATION_v1_to_v2.md

[grammar] ~1-~1: Use correct spacing
Context: ...on Guide: Raw Key Verifications v1 to v2 This guide explains how to migrate from ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~3-~3: Use correct spacing
Context: ...ey_verifications_v2` with zero downtime. ## What's Changed in v2 - ✅ **Partitioning...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~5-~5: Use correct spacing
Context: ... zero downtime. ## What's Changed in v2 - ✅ Partitioning: Monthly partitions f...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~7-~7: There might be a mistake here.
Context: ...ter performance and lifecycle management - ✅ TTL: Automatic data cleanup after ...

(QB_NEW_EN)


[grammar] ~8-~8: There might be a mistake here.
Context: ...*: Automatic data cleanup after 3 months - ✅ Deduplication: Protection against ...

(QB_NEW_EN)


[grammar] ~9-~9: There might be a mistake here.
Context: ...against duplicate inserts during retries - ✅ Sub-millisecond latency: Float64 f...

(QB_NEW_EN)


[grammar] ~10-~10: There might be a mistake here.
Context: ...Float64 for precise latency measurements - ✅ Credit tracking: New `spent_credit...

(QB_NEW_EN)


[grammar] ~11-~11: There might be a mistake here.
Context: ...t tracking**: New spent_credits column - ✅ Better ordering: Optimized ORDER B...

(QB_NEW_EN)


[grammar] ~12-~12: Use correct spacing
Context: ...timized ORDER BY for time-series queries ## Migration Steps ### Phase 1: Deploy New...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~14-~14: Use correct spacing
Context: ... time-series queries ## Migration Steps ### Phase 1: Deploy New Schema (Zero Downtim...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~16-~16: Use correct spacing
Context: ...ase 1: Deploy New Schema (Zero Downtime) 1. Deploy the new schema files - This cre...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[typographical] ~18-~18: To join two clauses or set off examples, consider using an em dash.
Context: ...ntime) 1. Deploy the new schema files - This creates v2 tables and temporary sync: ...

(QB_NEW_EN_DASH_RULE_EM)


[grammar] ~18-~18: Use correct spacing
Context: ...is creates v2 tables and temporary sync: bash # The auto-migration will create: # - raw_key_verifications_v2 (new table) # - All new aggregation tables (minute/hour/day/month v2/v4) # - temp_sync_v1_to_v2 (materialized view for live sync) 2. Verify deployment - Check that all new...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[typographical] ~26-~26: To join two clauses or set off examples, consider using an em dash.
Context: ...ive sync) 2. **Verify deployment** - Check that all new tables exist: sql ...

(QB_NEW_EN_DASH_RULE_EM)


[grammar] ~26-~26: Use correct spacing
Context: ...ent** - Check that all new tables exist: sql SHOW TABLES FROM verifications LIKE '%v2%'; SHOW TABLES FROM verifications LIKE '%v4%'; ### Phase 2: Backfill Historical Data **⚠️ ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~32-~32: Use correct spacing
Context: ...` ### Phase 2: Backfill Historical Data ⚠️ Important: The temporary materializ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~34-~34: Use correct spacing
Context: ...storical data while new writes continue. Run these queries manually to backfi...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~36-~36: Use correct spacing
Context: ...* to backfill historical data in chunks: #### Step 1: Find your data range ```sql SELE...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~38-~38: There might be a mistake here.
Context: ...unks: #### Step 1: Find your data range sql SELECT toDateTime(min(time)/1000) as earliest_data, toDateTime(max(time)/1000) as latest_data, count(*) as total_rows FROM verifications.raw_key_verifications_v1; #### Step 2: Backfill month by month Replace ...

(QB_NEW_EN_OTHER)


[grammar] ~47-~47: There might be a mistake here.
Context: ...`` #### Step 2: Backfill month by month Replace the date ranges based on your ac...

(QB_NEW_EN_OTHER)


[grammar] ~48-~48: Use correct spacing
Context: ...s based on your actual data from Step 1: sql -- January 2024 (example - adjust dates) INSERT INTO verifications.raw_key_verifications_v2 SELECT request_id, time, workspace_id, key_space_id, identity_id, key_id, region, outcome, tags, 0 as spent_credits, -- v1 doesn't have this column, default to 0 0.0 as latency -- v1 doesn't have this column, default to 0.0 FROM verifications.raw_key_verifications_v1 WHERE time >= toUnixTimestamp64Milli(toDateTime('2024-01-01 00:00:00')) AND time < toUnixTimestamp64Milli(toDateTime('2024-02-01 00:00:00')); -- February 2024 INSERT INTO verifications.raw_key_verifications_v2 SELECT request_id, time, workspace_id, key_space_id, identity_id, key_id, region, outcome, tags, 0 as spent_credits, 0.0 as latency FROM verifications.raw_key_verifications_v1 WHERE time >= toUnixTimestamp64Milli(toDateTime('2024-02-01 00:00:00')) AND time < toUnixTimestamp64Milli(toDateTime('2024-03-01 00:00:00')); -- Continue for each month until you reach current data... #### Step 3: Monitor progress ```sql -- Check...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~90-~90: Use correct spacing
Context: ...ta... #### Step 3: Monitor progresssql -- Check backfill progress SELECT 'v1' as version, toDateTime(min(time)/1000) as earliest, toDateTime(max(time)/1000) as latest, count() as rows FROM verifications.raw_key_verifications_v1 UNION ALL SELECT 'v2' as version, toDateTime(min(time)/1000) as earliest, toDateTime(max(time)/1000) as latest, count() as rows FROM verifications.raw_key_verifications_v2 ORDER BY version; ``` ### Phase 3: Switch Application to v2 1. **...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~109-~109: Use correct spacing
Context: ... ### Phase 3: Switch Application to v2 1. **Update your application** to write tor...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~111-~111: Use the right verb tense
Context: ...o raw_key_verifications_v2 instead of v1 2. Include the new fields: - spent_credits ...

(QB_NEW_EN_OTHER_ERROR_IDS_13)


[grammar] ~112-~112: Use correct spacing
Context: ...d of v1 2. Include the new fields: - spent_credits (Int64) - credits consumed by this verif...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~113-~113: Use correct spacing
Context: ... - credits consumed by this verification - latency (Float64) - latency in milliseconds with...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~114-~114: Use correct spacing
Context: ...cy in milliseconds with sub-ms precision ### Phase 4: Cleanup (After Confirming v2 Wo...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~116-~116: Use correct spacing
Context: ...e 4: Cleanup (After Confirming v2 Works) 1. Drop the temporary sync view: ```sq...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~118-~118: Use correct spacing
Context: ...s) 1. Drop the temporary sync view: sql DROP VIEW verifications.temp_sync_v1_to_v2; 2. Optional: Keep v1 as backup or drop it...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~123-~123: Use correct spacing
Context: ...ptional: Keep v1 as backup** or drop it: sql -- To drop v1 (only after confirming v2 works perfectly) DROP TABLE verifications.raw_key_verifications_v1; ## Rollback Plan If you need to rollback: ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~129-~129: Use correct spacing
Context: ...rifications_v1; ``` ## Rollback Plan If you need to rollback: 1. **Switch ap...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~131-~131: Use the right verb tense
Context: ... ``` ## Rollback Plan If you need to rollback: 1. Switch application back to writing to ...

(QB_NEW_EN_OTHER_ERROR_IDS_13)


[grammar] ~134-~134: There might be a mistake here.
Context: ...1` 2. Keep the temp sync view active 3. Investigate issues with v2 4. **Re-run...

(QB_NEW_EN_OTHER)


[grammar] ~135-~135: There might be a mistake here.
Context: ...active 3. Investigate issues with v2 4. Re-run migration when ready ## Import...

(QB_NEW_EN_OTHER)


[grammar] ~136-~136: There might be a mistake here.
Context: ...th v2 4. Re-run migration when ready ## Important Notes - 🔄 Zero downtime:...

(QB_NEW_EN_OTHER)


[grammar] ~138-~138: Use correct spacing
Context: ...gration** when ready ## Important Notes - 🔄 Zero downtime: New data automatic...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~140-~140: There might be a mistake here.
Context: ...cally flows to v2 via the temp sync view - 📊 Historical data: Will have `spent...

(QB_NEW_EN)


[grammar] ~141-~141: There might be a mistake here.
Context: ...cy=0.0` (can be updated later if needed) - 🗓️ TTL: v2 data will auto-delete af...

(QB_NEW_EN)


[grammar] ~142-~142: There might be a mistake here.
Context: ... v2 data will auto-delete after 3 months - 🔍 Deduplication: Protects against d...

(QB_NEW_EN)


[grammar] ~143-~143: There might be a mistake here.
Context: ...against duplicate inserts during retries - 📈 Performance: Monthly partitions i...

(QB_NEW_EN)


[grammar] ~144-~144: There might be a mistake here.
Context: ...improve query performance on time ranges ## Questions? - Check the temp sync view i...

(QB_NEW_EN)


[grammar] ~146-~146: Use correct spacing
Context: ...erformance on time ranges ## Questions? - Check the temp sync view is active: `SEL...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~148-~148: There might be a mistake here.
Context: ... - Check the temp sync view is active: SELECT * FROM system.tables WHERE name = 'temp_sync_v1_to_v2' - Monitor both tables during migration to ...

(QB_NEW_EN)


[grammar] ~149-~149: There might be a mistake here.
Context: ...ing migration to ensure data consistency - The aggregation tables (minute/hour/day/...

(QB_NEW_EN_OTHER)


[grammar] ~150-~150: Use a period to end declarative sentences
Context: ...th) will automatically populate from v2 data

(QB_NEW_EN_OTHER_ERROR_IDS_25)

🪛 Checkov (3.2.334)
deployment/Dockerfile.clickhouse

[LOW] 1-8: Ensure that HEALTHCHECK instructions have been added to container images

(CKV_DOCKER_2)


[LOW] 1-8: Ensure that a user for the container has been created

(CKV_DOCKER_3)

🔇 Additional comments (22)
go/pkg/db/retry.go (1)

35-35: LGTM: formatting-only change.

Whitespace-only tweak on the 100ms entry. No functional impact.

go/pkg/retry/retry_test.go (3)

149-155: LGTM: formatting-only change; expectations remain consistent with default backoff

The alignment edits are cosmetic. The test semantics (retry-all default with 100ms, 200ms sleeps) are preserved and still validate the intended behavior.


196-201: LGTM: deterministic timing via injected sleeper

Overriding the internal sleeper to accumulate durations keeps tests fast and eliminates flakiness. Good pattern for backoff testing.


212-215: LGTM: strict equality on accumulated sleep is safe here

Since no real time elapses, exact equality on durations is appropriate. No tolerance window needed.

deployment/init-clickhouse.sh (1)

6-13: Manual Verification Required: glob behavior and sorting

The automated test did not yield the expected directory and file listings, so please confirm manually that:

  • shopt -s nullglob is in effect before any glob operations, ensuring globs with no matches expand to empty rather than literal patterns.
  • Database directories under /opt/clickhouse-schemas/ are iterated in natural (numeric) sort order (e.g. 001_…, 002_…, 010_…).
  • SQL files within each database directory are likewise processed via an explicit sort -V, guaranteeing consistent numbering order.

Once verified, update or retain the proposed refactor accordingly.

go/pkg/clickhouse/schema/databases/001_verifications/010_key_verifications_per_month_mv_v4.sql (1)

4-17: Guard against column-order coupling between MV and target table

Without an explicit column list, INSERT … SELECT semantics apply: mapping is by position. Any future reorder in the destination will silently corrupt data.

Please confirm the destination table verifications.key_verifications_per_month_v4 has columns in exactly this order:
workspace_id, key_space_id, identity_id, key_id, outcome, count, spent_credits, latency_avg, latency_p75, latency_p99, time, tags.

If not, adjust the SELECT order to match, or (preferred) define a view-on-view pattern that inserts with explicit column list. I can generate a script to parse the DDLs and validate order across all MV/targets if helpful.

Also applies to: 18-26
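
As a sketch of the safer pattern described above (table and column names are taken from this comment and may not match the actual DDL in the PR), the defensive check is to keep the SELECT output order identical to the target table's column order and verify it in review or CI:

```sql
-- Hypothetical sketch only: a TO-style MV still maps columns positionally,
-- so the SELECT below is written in exactly the target column order
-- (workspace_id, key_space_id, identity_id, key_id, outcome, count,
--  spent_credits, latency_avg, latency_p75, latency_p99, time, tags).
CREATE MATERIALIZED VIEW IF NOT EXISTS verifications.key_verifications_per_month_mv_v4
TO verifications.key_verifications_per_month_v4
AS
SELECT
  workspace_id,
  key_space_id,
  identity_id,
  key_id,
  outcome,
  count() AS count,
  sum(spent_credits) AS spent_credits,
  avg(latency) AS latency_avg,
  quantileTDigest(0.75)(latency) AS latency_p75,
  quantileTDigest(0.99)(latency) AS latency_p99,
  toStartOfMonth(fromUnixTimestamp64Milli(time)) AS time,
  tags
FROM verifications.raw_key_verifications_v2
GROUP BY workspace_id, key_space_id, identity_id, key_id, outcome, tags, time;
```

Any reorder of the destination DDL then has to be mirrored here explicitly, rather than silently shifting values into the wrong columns.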

go/pkg/clickhouse/schema/databases/001_verifications/008_key_verifications_per_day_mv_v4.sql (1)

4-17: Verify MV → table column order alignment

Same risk as the monthly MV. Please confirm the destination table column order exactly matches the SELECT order, or update accordingly.

If you want, I can generate a quick repo scan to compare MV SELECT order with table DDLs for all v4 per-day/month/hour artifacts.

Also applies to: 18-26

go/pkg/clickhouse/schema/databases/001_verifications/006_key_verifications_per_hour_mv_v2.sql (1)

1-3: MV column order must match target table exactly

Because TO … uses positional mapping, any column order mismatch will silently corrupt data. Please verify the destination verifications.key_verifications_per_hour_v2 order matches this SELECT.

I can provide a quick script to parse both DDLs and compare column sequences if you want an automated check.

Also applies to: 4-17, 18-26

tools/local/src/main.ts (1)

66-73: Seed flow should also wait for ClickHouse (if it touches CH) or explicitly note it doesn’t

If seed interacts with ClickHouse (directly or via services), add waitForClickHouse(); otherwise document that seed only targets PlanetScale.

-      await startContainers(["planetscale", "clickhouse", "agent"]);
+      await startContainers(["planetscale", "clickhouse", "agent"]);
+      await waitForClickHouse();
deployment/Dockerfile.clickhouse (1)

6-8: The above diff looks good, but I need to confirm exactly how your init-clickhouse.sh script references credentials to ensure the ENV vars line up. Could you please paste the first ~50 lines of deployment/init-clickhouse.sh? You can do that with:

sed -n '1,50p' deployment/init-clickhouse.sh

Once I see which env vars it uses (e.g. CLICKHOUSE_ADMIN_USER, CLICKHOUSE_ADMIN_PASSWORD, or others), I can verify that your compose file matches and finalize the hardened Dockerfile snippet.

go/pkg/clickhouse/schema/databases/002_ratelimits/008_ratelimits_per_day_mv_v2.sql (1)

13-19: Confirmed – keep the millisecond conversion

We’ve verified that in ratelimits.raw_ratelimits_v2, the time column is declared as Int64 (unix milliseconds) (see 002_raw_ratelimits_v2.sql:6). Therefore using fromUnixTimestamp64Milli(time) before toStartOfDay is correct and no change is needed.

• File: go/pkg/clickhouse/schema/databases/002_ratelimits/008_ratelimits_per_day_mv_v2.sql
• Lines: 13–19
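
To illustrate why the conversion is required (a minimal standalone example, not taken from the PR's schema files), `toStartOfDay` expects a Date/DateTime argument, so the raw Int64 millisecond value must be converted first:

```sql
-- `time` is stored as Int64 unix milliseconds, so convert before truncating:
SELECT toStartOfDay(fromUnixTimestamp64Milli(time)) AS day
FROM ratelimits.raw_ratelimits_v2
LIMIT 1;

-- Calling toStartOfDay(time) directly on the Int64 column would not work,
-- because there is no implicit epoch-milliseconds -> DateTime conversion.
```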

go/pkg/clickhouse/schema/databases/002_ratelimits/010_ratelimits_per_month_mv_v2.sql (1)

13-14: Ignore DateTime conversion suggestion for per-month MV

The time column in ratelimits.raw_ratelimits_v2 is defined as Int64 holding Unix milliseconds and not as a DateTime/DateTime64; thus, wrapping it with fromUnixTimestamp64Milli(time) before toStartOfMonth() is required:

  • In go/pkg/clickhouse/schema/databases/002_ratelimits/002_raw_ratelimits_v2.sql:6, time Int64 CODEC(Delta, LZ4) indicates an integer epoch milli value.

You can leave the existing expression

toStartOfMonth(fromUnixTimestamp64Milli(time)) AS time

as-is.

Likely an incorrect or invalid review comment.

go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_hour_mv_v2.sql (1)

13-13: No change needed: raw.time is Int64 (milliseconds) so conversion is correct

The raw_ratelimits_v2.time column is defined as Int64 representing Unix milliseconds, so using fromUnixTimestamp64Milli(time) (and thus toStartOfHour(...)) is required rather than calling toStartOfHour(time) directly. Thanks for suggesting the check—this conversion is indeed necessary here.

go/pkg/clickhouse/schema/databases/002_ratelimits/002_raw_ratelimits_v2.sql (3)

22-22: Validate ORDER BY against your dominant query patterns.

Current key is (workspace_id, time, namespace_id, identifier). If dashboards mostly filter a single workspace and narrow by time across all namespaces/identifiers, this is good. If they typically slice by (workspace_id, namespace_id, identifier) first and then time, (workspace_id, namespace_id, identifier, time) may improve locality.

Share expected WHERE/GROUP BY patterns so we can pick the optimal sort key. I can run a few CH “EXPLAIN PIPELINE” examples to confirm.
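
A minimal sketch of the trade-off, using hypothetical query shapes (the real dashboard queries may differ):

```sql
-- The current key (workspace_id, time, namespace_id, identifier)
-- favors broad time-range scans within one workspace:
SELECT count()
FROM ratelimits.raw_ratelimits_v2
WHERE workspace_id = 'ws_1'
  AND time >= toUnixTimestamp64Milli(now64() - INTERVAL 1 DAY);

-- The alternative key (workspace_id, namespace_id, identifier, time)
-- would give better data locality for narrow per-identifier lookups:
SELECT count()
FROM ratelimits.raw_ratelimits_v2
WHERE workspace_id = 'ws_1'
  AND namespace_id = 'ns_1'
  AND identifier = 'user_123'
  AND time >= toUnixTimestamp64Milli(now64() - INTERVAL 1 DAY);
```

Whichever query shape dominates should come first in the sort key, since ClickHouse can only skip granules on a prefix of the ORDER BY columns.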


13-16: Type choices look good; ensure writers send correct types.

passed Bool and latency Float64 are appropriate. Confirm writers actually send Bool/UInt8 for passed and millisecond precision for latency to avoid implicit casts.

If any producers send strings like "true"/"false", we should normalize at ingestion.


6-6: Efficient time compression codec.

Int64 CODEC(Delta, LZ4) on millisecond epoch is a solid default for time columns.

go/pkg/clickhouse/schema/databases/001_verifications/MIGRATION_v1_to_v2.md (1)

21-24: “v2/v4” aggregations mention is confusing.

You list “minute/hour/day/month v2/v4”. Are there v4 tables in this PR for verifications? If not, drop “/v4” to avoid operators looking for non-existent tables.

Once confirmed, I’ll update the wording here.

go/pkg/clickhouse/schema/databases/002_ratelimits/004_ratelimits_per_minute_mv_v2.sql (2)

13-19: GROUP BY alias is fine; double-check destination schema compatibility.

Grouping by the time alias is supported. Ensure ratelimits.ratelimits_per_minute_v2 matches column order/types: (workspace_id String, namespace_id String, identifier String, passed UInt64, total UInt64, latency_avg Float64, latency_p75 Float64, latency_p99 Float64, time DateTime) or DateTime64(3) if you prefer ms.

Please run DESCRIBE TABLE ratelimits.ratelimits_per_minute_v2 and confirm types; I’ll adjust if needed.


10-12: Good choice of TDigest for p75/p99.

TDigest handles skewed latency distributions well. No changes needed.
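
For reference, a hedged sketch of how these percentiles can be computed directly against the raw table (column names assumed from this PR's v2 schema); the MV aggregations should produce equivalent results:

```sql
-- quantileTDigest gives approximate percentiles with bounded memory,
-- which suits skewed latency distributions well.
SELECT
  namespace_id,
  avg(latency)                   AS latency_avg,
  quantileTDigest(0.75)(latency) AS latency_p75,
  quantileTDigest(0.99)(latency) AS latency_p99
FROM ratelimits.raw_ratelimits_v2
WHERE workspace_id = 'ws_1'
GROUP BY namespace_id;
```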

go/pkg/clickhouse/schema/databases/002_ratelimits/MIGRATION_v1_to_v2.md (1)

9-10: Retention wording vs DDL mismatch.

The doc says “3 months” TTL; the raw table uses INTERVAL 100 DAY. Align doc and schema.

Let me know the intended retention and I’ll update either the doc or the DDL.

deployment/docker-compose.yaml (2)

82-82: Verify image tag “redis:8.0” exists for your environment.

Some registries don’t yet publish redis:8.0. If unavailable, pin to a known-good (e.g., redis:7.2 or 7.4).

Confirm the tag resolves in your CI runners and dev machines.


53-56: Compose “deploy.replicas” is ignored outside Swarm.

If you run plain docker-compose, deploy: is ignored. If you intend 3 replicas locally, use docker compose up --scale apiv2=3 or switch to Swarm/Stack.

Confirm target runtime (Compose vs Swarm) so we can document the right scaling approach.

Comment on lines +59 to 63
await startContainers(["planetscale", "clickhouse", "agent"]);

const resources = await prepareDatabase();
!skipEnv && (await bootstrapApi(resources));
break;
Contributor

🧹 Nitpick (assertive)

Same readiness concern for API flow

Ensure ClickHouse (and schema init) is ready before bootstrapping the API.

Apply the same waitForClickHouse() call here. Also confirm the API reads CLICKHOUSE_URL from env; current local bootstrap uses CLICKHOUSE_URL (not prefixed UNKEY_), which may violate our env naming guideline for ts/js files. If that guideline applies here, consider UNKEY_CLICKHOUSE_URL.

🤖 Prompt for AI Agents
In tools/local/src/main.ts around lines 59–63, add a waitForClickHouse() call
before calling bootstrapApi to ensure ClickHouse and its schema are fully ready
(and verify prepareDatabase completes schema init before returning); also verify
the API expects the environment variable name—if our TS/JS env-naming guideline
requires UNKEY_ prefixes, change the local bootstrap to set UNKEY_CLICKHOUSE_URL
(and update any local reads/use of CLICKHOUSE_URL to UNKEY_CLICKHOUSE_URL) so
the API reads the correct prefixed variable; ensure the wait call and env-name
change are propagated where prepareDatabase/bootstrapApi interact.
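
waitForClickHouse() is referenced above but not shown; a minimal sketch of such a helper, assuming ClickHouse's HTTP /ping endpoint and a CLICKHOUSE_URL env var (both the name and defaults are assumptions, not existing code), could look like:

```typescript
// Sketch only: polls ClickHouse's HTTP /ping endpoint until it responds,
// or throws after the deadline passes.
async function waitForClickHouse(
  url: string = process.env.CLICKHOUSE_URL ?? "http://localhost:8123",
  timeoutMs: number = 60_000,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      const res = await fetch(`${url}/ping`); // ClickHouse answers "Ok." when ready
      if (res.ok) return;
    } catch {
      // server not accepting connections yet; fall through and retry
    }
    await new Promise((resolve) => setTimeout(resolve, 1_000));
  }
  throw new Error(`ClickHouse at ${url} not ready after ${timeoutMs}ms`);
}
```

Called before bootstrapApi(resources), this only confirms the HTTP interface is up; it does not verify that schema init created the individual tables.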

@coderabbitai coderabbitai bot left a comment

Review continued from previous batch...

Comment on lines +3 to +17
time DateTime,
workspace_id String,
key_space_id String,
identity_id String,
key_id String,
outcome LowCardinality(String),
tags Array(String),
count Int64,
spent_credits Int64,
latency_avg Float64,
latency_p75 Float64,
latency_p99 Float64
)
ENGINE = SummingMergeTree()
ORDER BY (workspace_id, time, key_space_id, identity_id, key_id, tags, outcome)

🛠️ Refactor suggestion

⚠️ Potential issue

Use AggregatingMergeTree with stateful metrics; avoid summing finalized averages/quantiles.

Same issue as ratelimits: SummingMergeTree will corrupt latency_* in merges. Also switch counters to unsigned and consider a normalized key for tags.

Proposed DDL:

 CREATE TABLE IF NOT EXISTS verifications.key_verifications_per_minute_v2
 (
   time          DateTime,
   workspace_id  String,
   key_space_id  String,
   identity_id   String,
   key_id        String,
   outcome       LowCardinality(String),
-  tags          Array(String),
-  count         Int64,
-  spent_credits Int64,
-  latency_avg   Float64,
-  latency_p75   Float64,
-  latency_p99   Float64
+  tags          Array(String),
+  tags_hash     UInt64 MATERIALIZED cityHash64(arrayStringConcat(arraySort(tags))),
+  count         SimpleAggregateFunction(sum, UInt64),
+  spent_credits SimpleAggregateFunction(sum, UInt64),
+  latency_avg_state AggregateFunction(avg, Float64),
+  latency_p75_state AggregateFunction(quantileTDigest(0.75), Float64),
+  latency_p99_state AggregateFunction(quantileTDigest(0.99), Float64)
 )
-ENGINE = SummingMergeTree()
-ORDER BY (workspace_id, time, key_space_id, identity_id, key_id, tags, outcome)
+ENGINE = AggregatingMergeTree()
+PARTITION BY toYYYYMM(time)
+ORDER BY (workspace_id, time, key_space_id, identity_id, key_id, tags_hash, outcome)
 ;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

  time          DateTime,
  workspace_id  String,
  key_space_id  String,
  identity_id   String,
  key_id        String,
  outcome       LowCardinality(String),
  tags          Array(String),
  count         Int64,
  spent_credits Int64,
  latency_avg   Float64,
  latency_p75   Float64,
  latency_p99   Float64
)
ENGINE = SummingMergeTree()
ORDER BY (workspace_id, time, key_space_id, identity_id, key_id, tags, outcome)

After:

CREATE TABLE IF NOT EXISTS verifications.key_verifications_per_minute_v2
(
  time          DateTime,
  workspace_id  String,
  key_space_id  String,
  identity_id   String,
  key_id        String,
  outcome       LowCardinality(String),
  tags          Array(String),
  tags_hash     UInt64 MATERIALIZED cityHash64(arrayStringConcat(arraySort(tags))),
  count         SimpleAggregateFunction(sum, UInt64),
  spent_credits SimpleAggregateFunction(sum, UInt64),
  latency_avg_state AggregateFunction(avg, Float64),
  latency_p75_state AggregateFunction(quantileTDigest(0.75), Float64),
  latency_p99_state AggregateFunction(quantileTDigest(0.99), Float64)
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(time)
ORDER BY (workspace_id, time, key_space_id, identity_id, key_id, tags_hash, outcome)
;
🤖 Prompt for AI Agents
In
go/pkg/clickhouse/schema/databases/001_verifications/003_key_verifications_per_minute_v2.sql
around lines 3-17, replace the SummingMergeTree with an AggregatingMergeTree and
change types to use stateful aggregates and unsigned counters: convert count and
spent_credits to UInt64, replace latency_avg/latency_p75/latency_p99 with
appropriate AggregateFunction states (e.g., AggregateFunction(avgState, Float64)
and AggregateFunction(quantilesState, Float64) or quantileState variants), stop
storing finalized averages/quantiles as Float64, and adjust the ORDER BY to
avoid putting raw tags there (either remove tags from ORDER BY or use a
normalized tag key) — ensure the DDL stores aggregation states so merges compute
metrics correctly and consider normalizing tags (or moving to a tag mapping
table) before including them in the primary key.
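
If the stateful DDL sketched above were adopted, reads must finalize the states with -Merge combinators; roughly (column names follow the suggestion, not any shipped schema):

```sql
-- Sketch: finalizing AggregateFunction states at query time.
SELECT
  toStartOfHour(time)                           AS hour,
  sum(count)                                    AS count,         -- SimpleAggregateFunction(sum, ...)
  sum(spent_credits)                            AS spent_credits,
  avgMerge(latency_avg_state)                   AS latency_avg,
  quantileTDigestMerge(0.75)(latency_p75_state) AS latency_p75,
  quantileTDigestMerge(0.99)(latency_p99_state) AS latency_p99
FROM verifications.key_verifications_per_minute_v2
GROUP BY hour;
```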

Comment on lines +5 to +20
CREATE MATERIALIZED VIEW IF NOT EXISTS verifications.temp_sync_v1_to_v2
TO verifications.raw_key_verifications_v2
AS
SELECT
request_id,
time,
workspace_id,
key_space_id,
identity_id,
key_id,
region,
outcome,
tags,
0 as spent_credits, -- v1 doesn't have this column, default to 0
0.0 as latency -- v1 doesn't have this column, default to 0.0
FROM verifications.raw_key_verifications_v1;

🧹 Nitpick (assertive)

Same migration guardrails as ratelimits; avoid biasing new latency/credits.

  • Consider a time fence to avoid duplicating historical backfills during the live-sync window.
  • Emitting 0 for spent_credits and 0.0 for latency systematically skews aggregates. Prefer making these Nullable in v2 raw and emitting NULLs from the temp MV, or track a separate “observed” flag to exclude unknowns from averages/quantiles.

Example fence:

 FROM verifications.raw_key_verifications_v1;
+-- If backfilling, constrain the live-sync window:
+-- WHERE time >= toDateTime64('2025-08-26 00:00:00', 3, 'UTC')

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In
go/pkg/clickhouse/schema/databases/001_verifications/011_temp_sync_v1_to_v2.sql
around lines 5-20, the temp MV currently emits 0 and 0.0 for
spent_credits/latency and has no time fence which risks skewing aggregates and
duplicating historical backfills; change the approach so the v2 raw columns are
nullable (or add an explicit observed boolean) and have the MV emit NULLs (or
observed=false) for these unknown fields, and add a time fence condition (e.g.,
WHERE time >= <live_sync_start_ts>) to avoid replaying historical rows during
the live-sync window; ensure downstream schema is updated to accept NULLs (or
the observed flag) and adjust any aggregate queries to ignore NULLs or filter on
observed.
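
One practical upside of the Nullable approach: ClickHouse aggregate functions skip NULL inputs, so no extra filtering is needed once the column is Nullable(Float64). A sketch, assuming that column change is in place:

```sql
-- Sketch: with latency Nullable(Float64), NULL (unknown) rows are
-- excluded from avg/quantiles automatically, unlike 0.0 defaults.
SELECT
  avg(latency)                   AS latency_avg,
  quantileTDigest(0.99)(latency) AS latency_p99,
  countIf(latency IS NOT NULL)   AS observed_rows
FROM verifications.raw_key_verifications_v2;
```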

Comment on lines +7 to +12
- ✅ **Partitioning**: Monthly partitions for better performance and lifecycle management
- ✅ **TTL**: Automatic data cleanup after 3 months
- ✅ **Deduplication**: Protection against duplicate inserts during retries
- ✅ **Sub-millisecond latency**: Float64 for precise latency measurements
- ✅ **Credit tracking**: New `spent_credits` column
- ✅ **Better ordering**: Optimized ORDER BY for time-series queries

🧹 Nitpick (assertive)

Clarify retention: doc says “3 months” but tables often use “100 DAY”.

Keep the doc precise and consistent with the DDL (use “3 months” only if the table uses INTERVAL 3 MONTH). This avoids misunderstandings in compliance/retention expectations.

I can submit a doc-only patch once you confirm the intended TTL.


🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/databases/001_verifications/MIGRATION_v1_to_v2.md
around lines 7 to 12, the bullet saying “TTL: Automatic data cleanup after 3
months” is inconsistent with the DDL which uses a 100 DAY TTL; update the doc to
match the exact TTL used by the table (e.g., replace “3 months” with “100 DAY”
or change the DDL to use INTERVAL 3 MONTH), and be explicit by quoting the exact
INTERVAL expression used in the table definition so retention language is
precise and unambiguous.

Comment on lines +4 to +6
workspace_id String,
namespace_id String,
identifier String,

🧹 Nitpick (assertive)

Optional storage optimization: LowCardinality for strings and TTL.

For consistency and storage reduction (note: LowCardinality pays off only for columns with a bounded set of distinct values; workspace_id and namespace_id usually qualify, while identifier can be high-cardinality, so check its distinct-value count first):

-  workspace_id  String,
-  namespace_id  String,
-  identifier    String,
+  workspace_id  LowCardinality(String),
+  namespace_id  LowCardinality(String),
+  identifier    LowCardinality(String),

Add PARTITION BY and optional TTL as needed by retention policy.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
workspace_id String,
namespace_id String,
identifier String,
-- change these three columns to LowCardinality(String) for storage efficiency
workspace_id LowCardinality(String),
namespace_id LowCardinality(String),
identifier LowCardinality(String),

Comment on lines +4 to +14
SELECT
workspace_id,
namespace_id,
identifier,
countIf(passed > 0) as passed,
count(*) as total,
avg(latency) as latency_avg,
quantileTDigest(0.75)(latency) as latency_p75,
quantileTDigest(0.99)(latency) as latency_p99,
toStartOfMonth(fromUnixTimestamp64Milli(time)) AS time
FROM ratelimits.raw_ratelimits_v2

🛠️ Refactor suggestion

⚠️ Potential issue

Emit states for month MV as well for correctness.

Keep MV outputs consistent across all grains to prevent merge-time skew.

 SELECT
   workspace_id,
   namespace_id,
   identifier,
-  countIf(passed > 0) as passed,
-  count(*) as total,
-  avg(latency) as latency_avg,
-  quantileTDigest(0.75)(latency) as latency_p75,
-  quantileTDigest(0.99)(latency) as latency_p99,
+  sum(toUInt64(passed > 0)) AS passed,
+  count() AS total,
+  avgState(latency) AS latency_avg_state,
+  quantileTDigestState(0.75)(latency) AS latency_p75_state,
+  quantileTDigestState(0.99)(latency) AS latency_p99_state,
   toStartOfMonth(fromUnixTimestamp64Milli(time)) AS time
 FROM ratelimits.raw_ratelimits_v2
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

SELECT
  workspace_id,
  namespace_id,
  identifier,
  countIf(passed > 0) as passed,
  count(*) as total,
  avg(latency) as latency_avg,
  quantileTDigest(0.75)(latency) as latency_p75,
  quantileTDigest(0.99)(latency) as latency_p99,
  toStartOfMonth(fromUnixTimestamp64Milli(time)) AS time
FROM ratelimits.raw_ratelimits_v2

After:

SELECT
  workspace_id,
  namespace_id,
  identifier,
  sum(toUInt64(passed > 0)) AS passed,
  count() AS total,
  avgState(latency) AS latency_avg_state,
  quantileTDigestState(0.75)(latency) AS latency_p75_state,
  quantileTDigestState(0.99)(latency) AS latency_p99_state,
  toStartOfMonth(fromUnixTimestamp64Milli(time)) AS time
FROM ratelimits.raw_ratelimits_v2
🤖 Prompt for AI Agents
In
go/pkg/clickhouse/schema/databases/002_ratelimits/010_ratelimits_per_month_mv_v2.sql
around lines 4–14, the materialized view for monthly grain only emits aggregated
metrics and is missing the corresponding "state" output columns that other grain
MVs include; add the same state columns to the SELECT (e.g., passed_state,
total_state, latency_avg_state, latency_p75_state, latency_p99_state — actual
names matching the other MVs) and compute/aggregate them the same way those MVs
do, keep the SELECT aggregations and the toStartOfMonth(time) time bucketing
unchanged, and ensure the GROUP BY includes time, workspace_id, namespace_id and
identifier so the monthly MV outputs are consistent with other grains to avoid
merge-time skew.

Comment on lines +5 to +16
CREATE MATERIALIZED VIEW IF NOT EXISTS ratelimits.temp_sync_v1_to_v2
TO ratelimits.raw_ratelimits_v2
AS
SELECT
request_id,
time,
workspace_id,
namespace_id,
identifier,
passed,
0.0 as latency -- v1 doesn't have this column, default to 0.0
FROM ratelimits.raw_ratelimits_v1;

💡 Verification agent

🧩 Analysis chain

Guard against double-writes/backfills; consider Nullable latency.

  • If you backfill v2 separately while this MV is active, any concurrent backfill into v1 will also flow into v2, risking duplication. Plan sequencing or add a time fence.
  • Encoding unknown latency as 0.0 biases averages/quantiles. Prefer Nullable(Float64) in v2 raw and emit NULL here, or add a separate boolean “latency_known”.

If you adopt a time fence, one pragmatic approach (replace the timestamp at deploy):

 FROM ratelimits.raw_ratelimits_v1
+WHERE time >= toDateTime64('2025-08-26 00:00:00', 3, 'UTC')
 ;

And/or ensure v2 backfill reads from the authoritative source and v1 backfill is paused while the MV is active.


🏁 Script executed:

#!/bin/bash
# List all SQL migration files in the ratelimits directory
echo "=== Migration files ==="
ls go/pkg/clickhouse/schema/databases/002_ratelimits

# Find where the raw_ratelimits_v2 table is created
echo -e "\n=== CREATE TABLE raw_ratelimits_v2 context ==="
rg -nP "CREATE TABLE.*raw_ratelimits_v2" -C5 go/pkg/clickhouse/schema/databases/002_ratelimits

# Show the column definitions for raw_ratelimits_v2, focusing on the latency column
echo -e "\n=== Column definitions in raw_ratelimits_v2 ==="
rg -nP "raw_ratelimits_v2" -C5 go/pkg/clickhouse/schema/databases/002_ratelimits | rg -P "latency"

Length of output: 3849


🏁 Script executed:

#!/bin/bash
# Inspect raw_ratelimits_v2 schema for latency and time types
echo "=== raw_ratelimits_v2 CREATE TABLE ==="
sed -n '1,50p' go/pkg/clickhouse/schema/databases/002_ratelimits/002_raw_ratelimits_v2.sql

echo -e "\n=== raw_ratelimits_v2 latency & time columns ==="
grep -nE 'latency|time' go/pkg/clickhouse/schema/databases/002_ratelimits/002_raw_ratelimits_v2.sql

# Inspect raw_ratelimits_v1 schema for time type
echo -e "\n=== raw_ratelimits_v1 CREATE TABLE ==="
sed -n '1,50p' go/pkg/clickhouse/schema/databases/002_ratelimits/001_raw_ratelimits_v1.sql

echo -e "\n=== raw_ratelimits_v1 time column ==="
grep -nE 'time' go/pkg/clickhouse/schema/databases/002_ratelimits/001_raw_ratelimits_v1.sql

Length of output: 2559


🏁 Script executed:

#!/bin/bash
# Display the migration guide for v1 to v2 to inspect backfill instructions
sed -n '1,200p' go/pkg/clickhouse/schema/databases/002_ratelimits/MIGRATION_v1_to_v2.md

# Search specifically for any backfill sequencing guidance
grep -nC2 'backfill' go/pkg/clickhouse/schema/databases/002_ratelimits/MIGRATION_v1_to_v2.md

Length of output: 5474


Update MV to prevent backfill duplicates and support nullable latency

Please strengthen the temporary sync view in go/pkg/clickhouse/schema/databases/002_ratelimits/013_temp_sync_v1_to_v2.sql to avoid duplicate records and avoid biasing latency metrics:

• Add a time‐fence filter to the MV’s SELECT so it only ingests new rows after deployment. For example:

CREATE MATERIALIZED VIEW IF NOT EXISTS ratelimits.temp_sync_v1_to_v2
TO ratelimits.raw_ratelimits_v2 AS
SELECT
  request_id,
  time,
  workspace_id,
  namespace_id,
  identifier,
  passed,
  NULL    AS latency          -- switch to Nullable latency
FROM ratelimits.raw_ratelimits_v1
WHERE time >= toUnixTimestamp64Milli(toDateTime('2025-08-26 00:00:00'))
;

• Change raw_ratelimits_v2.latency to Nullable(Float64) (e.g.

ALTER TABLE ratelimits.raw_ratelimits_v2
  MODIFY COLUMN latency Nullable(Float64);

) and have the MV emit NULL for unknown latency instead of 0.0.

• (Optional) Alternatively, if you want to keep latency non-nullable, add a boolean latency_known flag in v2 to distinguish backfilled rows.

These updates will (1) prevent overlapping backfill batches from being re-streamed by the MV and (2) avoid skewing average/quantile calculations with default zeros.

🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/databases/002_ratelimits/013_temp_sync_v1_to_v2.sql
around lines 5-16, the materialized view currently emits 0.0 for latency and
will re-ingest historical rows causing duplicate/backfill overlap; update the MV
SELECT to only ingest rows after deployment by adding a WHERE time >=
toUnixTimestamp64Milli(toDateTime('<DEPLOY_TS>')) filter, emit NULL for latency
instead of 0.0, and ensure the target column is nullable by changing
raw_ratelimits_v2.latency to Nullable(Float64) via an ALTER TABLE MODIFY COLUMN
statement (or alternatively add a latency_known boolean if you must keep latency
non-nullable).

Comment on lines +31 to +36
### Phase 2: Backfill Historical Data

**⚠️ Important**: The temporary materialized view (`temp_sync_v1_to_v2`) is now active and syncing all NEW data from v1 to v2. You can safely backfill historical data while new writes continue.

Run these queries **manually** to backfill historical data in chunks:


🛠️ Refactor suggestion

Add a high-water-mark and grace window to the backfill plan.

Same rationale as verifications: define HWM at start, backfill < HWM, then after cutover run a small final catch-up (e.g., +5 minutes) before dropping the temp sync MV.

I can add the snippet to the doc.


🤖 Prompt for AI Agents
In go/pkg/clickhouse/schema/databases/002_ratelimits/MIGRATION_v1_to_v2.md
around lines 31-36, the backfill instructions lack a high-water-mark (HWM) and a
grace window; update the plan to compute and record an HWM timestamp at the
start, restrict chunked backfills to rows with timestamps strictly less than the
HWM, and after cutover run a final catch-up pass that backfills data up to HWM +
5 minutes (grace window) before dropping the temporary materialized view; add a
short example SQL snippet showing how to set the HWM, use it in WHERE clauses
for chunked backfills, and run the final catch-up with the +5 minute grace
window.
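
A rough shape for the HWM flow described above (timestamps and chunk bounds are placeholders; the column list mirrors the temp MV):

```sql
-- Sketch: high-water-mark backfill. Record the HWM when the temp sync
-- MV goes live, then backfill strictly below it in chunks.
-- Step 1: record the HWM, e.g. SELECT now64(3);
-- Step 2: chunked backfill below the HWM (one day shown):
INSERT INTO ratelimits.raw_ratelimits_v2
SELECT request_id, time, workspace_id, namespace_id, identifier, passed,
       0.0 AS latency
FROM ratelimits.raw_ratelimits_v1
WHERE time >= toUnixTimestamp64Milli(toDateTime64('2025-08-25 00:00:00', 3))
  AND time <  toUnixTimestamp64Milli(toDateTime64('2025-08-26 00:00:00', 3));
-- Step 3: after cutover, re-run once for [HWM, HWM + 5 minutes),
-- then DROP the temp sync materialized view.
```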

Comment on lines +137 to +145
## New Aggregation Metrics Available

With v2, all aggregation tables now include:
- `latency_avg`: Average ratelimit check latency
- `latency_p75`: 75th percentile latency
- `latency_p99`: 99th percentile latency

Perfect for monitoring ratelimit performance and identifying bottlenecks!


🧹 Nitpick (assertive)

Call out minute bucketing explicitly for dashboards.

Since v2 exposes latency_avg/p75/p99 per-minute/hour/day/month, add an example query that reads one bucket to help dashboard authors and validate time bucketing.

I can add a “Sample queries” section with a per-minute and per-hour example.
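
As a starting point for such a section, a hedged per-minute example (ids are placeholders; verify table and column names against the deployed schema):

```sql
-- Sketch: read one hour of per-minute latency for a single namespace.
SELECT time, passed, total, latency_avg, latency_p75, latency_p99
FROM ratelimits.ratelimits_per_minute_v2
WHERE workspace_id = 'ws_example'
  AND namespace_id = 'ns_example'
  AND time >= now() - INTERVAL 1 HOUR
ORDER BY time;
```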


@chronark chronark marked this pull request as draft August 28, 2025 08:28
@chronark chronark closed this Sep 5, 2025
@chronark chronark deleted the 08-26-feat_track_credits_and_latency_in_clickhouse branch November 5, 2025 12:45