📝 Walkthrough

Adds v3 ClickHouse aggregation tables (minute/hour/day/month) and their materialized views, extends raw retention TTL, updates app and test references to v3, and introduces migration scripts and DDL to populate and backfill the v3 tables.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Raw as key_verifications_raw_v2
    participant MinMV as key_verifications_per_minute_mv_v3
    participant Min as key_verifications_per_minute_v3
    participant HourMV as key_verifications_per_hour_mv_v3
    participant Hour as key_verifications_per_hour_v3
    participant DayMV as key_verifications_per_day_mv_v3
    participant Day as key_verifications_per_day_v3
    participant MonthMV as key_verifications_per_month_mv_v3
    participant Month as key_verifications_per_month_v3
    participant App as Application
    Raw->>MinMV: materialize (count, spent_credits, avgState, quantilesTDigestState) to minute
    MinMV->>Min: write minute aggregates
    Min->>HourMV: merge-aggregate (avgMergeState, quantilesTDigestMergeState) to hour
    HourMV->>Hour: write hour aggregates
    Hour->>DayMV: merge-aggregate to day
    DayMV->>Day: write day aggregates
    Day->>MonthMV: merge-aggregate to month
    MonthMV->>Month: write month aggregates
    rect rgb(230,245,230)
        Note over App,Month: Application and tests now query *_v3 tables via aliases
        App->>Min: query v3 tables
        App->>Hour: query v3 tables
        App->>Day: query v3 tables
        App->>Month: query v3 tables
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches: 🧪 Generate unit tests (beta)
@coderabbitai review

✅ Actions performed: Review triggered.

Thank you for following the naming conventions for pull request titles! 🙏
Actionable comments posted: 4
🧹 Nitpick comments (10)
tools/migrate/ch_logs.ts (3)
92-105: Consider batching identity resolution for better performance.

The current implementation resolves missing `external_id` values one-by-one within the row processing loop. For large datasets with many missing external IDs, this results in O(n) database queries, which can significantly slow down the migration. Consider batching the identity lookups:

```typescript
// Collect all missing identity_ids first
const missingIdentityIds = rows
  .filter(row => row.external_id === "" && !identityCache.has(row.identity_id))
  .map(row => row.identity_id);

// Batch query all missing identities
if (missingIdentityIds.length > 0) {
  const identities = await db.query.identities.findMany({
    where: (table, { inArray }) => inArray(table.id, missingIdentityIds),
  });
  // Populate cache
  for (const identity of identities) {
    identityCache.set(identity.id, identity.externalId);
  }
}

// Now process rows using the cache
for (const row of rows) {
  if (row.external_id === "") {
    const externalId = identityCache.get(row.identity_id);
    if (!externalId) {
      console.error("identity not found", row.identity_id);
      continue;
    }
    row.external_id = externalId;
  }
  inserts.push(row);
}
```
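If the set of missing identity IDs can be very large, a further refinement (a sketch; the `chunk` helper and the batch size are illustrative, not part of the script) is to split the batched lookup into bounded chunks so a single `inArray` query never carries tens of thousands of IDs:

```typescript
// Split an array into fixed-size chunks so each batched lookup stays
// below a sane parameter-count limit for the database driver.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Example: 2500 missing identity IDs -> batches of at most 1000 each.
const ids = Array.from({ length: 2500 }, (_, i) => `identity_${i}`);
const batches = chunk(ids, 1000);
```

Each batch would then be resolved with one `findMany` call before falling back to the cache loop above.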
40-41: Consider making the time range configurable.

The start and end timestamps are hardcoded, which reduces the script's flexibility and reusability. Consider making these configurable via environment variables or command-line arguments.

Example using environment variables:

```diff
- const start = 1724930749353;
- const end = 1738212041696;
+ const start = Number(process.env.MIGRATION_START_TIME || "1724930749353");
+ const end = Number(process.env.MIGRATION_END_TIME || "1738212041696");
```

Or use a command-line argument library like `yargs` or `commander` for better usability.
69-70: Consider a parameterized query approach.

While the direct interpolation of `t` and `t + dt` is safe here (since they're numbers), using parameterized queries is generally a better practice for maintainability and consistency.

Note: Since the ClickHouse query API may not support standard parameterization, and the current approach is safe for numeric values, this is a minor style suggestion. The current implementation is acceptable given that `t` is a controlled numeric value.

go/pkg/clickhouse/key_verifications_test.go (2)
245-260: Avoid range-variable capture in parallel subtests.

In the parallel subtests you call `t.Parallel()` inside `t.Run` while closing over the loop variable `table` (and in some cases `keyID`/`identityID`). On Go versions before the new range semantics, this can cause all subtests to use the last loop value, making assertions flaky or misleading. The newer external_id tests already use `tbl := table` / `id := identityID` to avoid this; mirroring that pattern here would make these older blocks safer and more consistent.

Example pattern within each `for _, table := range ...`:

```go
for _, table := range []string{...} {
  tbl := table
  t.Run(tbl, func(t *testing.T) {
    t.Parallel()
    // use tbl instead of table
  })
}
```

Likewise, introduce `k := keyID` or similar when `keyID` is also captured in a parallel subtest.

Also applies to: 269-279, 290-300, 314-323
328-471: New `external_id` coverage is thorough and matches the data model.

The added subtests around `external_id` (per-identity storage, filtering, outcome breakdowns, identity–external_id consistency, and credits spent per `external_id`) align with how the data is generated and provide strong end-to-end coverage across all v3 aggregation tables. The use of `id := identityID` / `tbl := table` inside parallel subtests correctly avoids range-capture pitfalls as well.

One minor follow-up you might consider later: if a `billable_verifications_per_month_v3` table is introduced, mirroring the existing billing-per-workspace assertion against that table as well would keep billing coverage aligned with the rest of the v3 rollout.

go/pkg/clickhouse/user.go (1)
173-186: Default allowed tables correctly extended to include v3 variants.

Adding the four `default.key_verifications_per_*_v3` tables to `DefaultAllowedTables` ensures new ClickHouse users can access the new analytics surfaces without losing access to existing v2 tables. Once v2 is fully deprecated, you may want a follow-up pass to remove the v2 entries and tighten the permissions.

go/cmd/create-clickhouse-user/main.go (2)
37-41: Update the description to reflect v3 analytics tables.

The help text still says “grants SELECT on all v2 analytics tables”, but you now also grant on the v3 key verification tables. I’d reword this block to something like “v2/v3 analytics tables” or just “analytics tables (key verifications, ratelimits, API requests)” to avoid it going stale again.
67-79: v3 tables correctly added to allowlist; consider rollout ordering.

Including the v3 key verification tables in `allowedTables` looks correct and keeps the CLI in sync with the new schema. Just ensure migrations that create these v3 tables are applied before running `create-clickhouse-user` in environments where they don’t yet exist, or GRANTs inside `ConfigureUser` could fail.

go/pkg/clickhouse/schema/004_key_verifications_per_day_v3.sql (2)
1-31: Day‑level v3 schema looks consistent; optional index on `external_id`.

The v3 day table schema (including `external_id` in both columns and ORDER BY) and the 356‑day TTL look consistent with the rest of the v3 pipeline. If you expect frequent filters on `external_id` specifically (beyond `workspace_id`/time/`key_space_id`), you might consider adding a bloom filter index for it as well, mirroring `identity_id`/`key_id`. Otherwise, this definition looks good.
33-59: Align the MV SELECT column order with the target table to avoid implicit mapping issues.

The aggregation logic (summing counts/credits and merging latency states) looks correct, but the `SELECT` list currently emits `time` last while the target table has `time` as its first column. To avoid relying on ClickHouse’s implicit column mapping and any type/order surprises, I’d explicitly match the table’s column order in the MV:

```diff
-CREATE MATERIALIZED VIEW key_verifications_per_day_mv_v3 TO key_verifications_per_day_v3 AS
-SELECT
-  workspace_id,
-  key_space_id,
-  identity_id,
-  external_id,
-  key_id,
-  outcome,
-  tags,
-  sum(count) as count,
-  sum(spent_credits) as spent_credits,
-  avgMergeState(latency_avg) as latency_avg,
-  quantilesTDigestMergeState(0.75)(latency_p75) as latency_p75,
-  quantilesTDigestMergeState(0.99)(latency_p99) as latency_p99,
-  toDate(toStartOfDay(time)) AS time
-FROM
-  key_verifications_per_hour_v3
-GROUP BY
-  workspace_id,
-  time,
-  key_space_id,
-  identity_id,
-  external_id,
-  key_id,
-  outcome,
-  tags
-;
+CREATE MATERIALIZED VIEW key_verifications_per_day_mv_v3
+TO key_verifications_per_day_v3 AS
+SELECT
+  toDate(toStartOfDay(time)) AS time,
+  workspace_id,
+  key_space_id,
+  identity_id,
+  external_id,
+  key_id,
+  outcome,
+  tags,
+  sum(count) AS count,
+  sum(spent_credits) AS spent_credits,
+  avgMergeState(latency_avg) AS latency_avg,
+  quantilesTDigestMergeState(0.75)(latency_p75) AS latency_p75,
+  quantilesTDigestMergeState(0.99)(latency_p99) AS latency_p99
+FROM
+  key_verifications_per_hour_v3
+GROUP BY
+  time,
+  workspace_id,
+  key_space_id,
+  identity_id,
+  external_id,
+  key_id,
+  outcome,
+  tags
+;
```

This keeps the MV unambiguous and aligned with the table schema.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`go/pkg/clickhouse/migrations/atlas.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (15)
- apps/api/src/routes/v1_analytics_getVerifications.ts (1 hunks)
- apps/engineering/content/docs/architecture/services/analytics.mdx (1 hunks)
- go/apps/api/routes/v2_analytics_get_verifications/handler.go (1 hunks)
- go/cmd/create-clickhouse-user/main.go (1 hunks)
- go/pkg/clickhouse/key_verifications_test.go (14 hunks)
- go/pkg/clickhouse/migrations/20251120214146.sql (1 hunks)
- go/pkg/clickhouse/schema/001_key_verifications_raw_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/002_key_verifications_per_minute_v3.sql (1 hunks)
- go/pkg/clickhouse/schema/003_key_verifications_per_hour_v3.sql (1 hunks)
- go/pkg/clickhouse/schema/004_key_verifications_per_day_v3.sql (1 hunks)
- go/pkg/clickhouse/user.go (1 hunks)
- internal/clickhouse/src/keys/active_keys.ts (1 hunks)
- internal/clickhouse/src/verification_outcomes_propagate_correctly.test.ts (2 hunks)
- internal/clickhouse/src/verifications.ts (1 hunks)
- tools/migrate/ch_logs.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- go/pkg/clickhouse/schema/003_key_verifications_per_hour_v3.sql
- go/pkg/clickhouse/schema/002_key_verifications_per_minute_v3.sql
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4190
File: go/internal/services/keys/verifier.go:51-53
Timestamp: 2025-10-30T15:10:52.743Z
Learning: PR #4190 for unkeyed/unkey is focused solely on database schema and query changes for identity-based credits. It adds IdentityCredits and KeyCredits fields to structs and queries, but does not implement the priority enforcement logic in the usagelimiter. The logic implementation is intentionally deferred to a later PR in the stack.
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4016
File: go/pkg/clickhouse/migrations/20250925091254.sql:10-10
Timestamp: 2025-09-25T09:18:38.763Z
Learning: In ClickHouse v2 table migrations, the region field should not be carried over from v1 because v1 uses Cloudflare region identifiers while v2 uses AWS region identifiers. These are incompatible data formats, so setting region to empty string in materialized view migrations is the correct approach.
Learnt from: chronark
Repo: unkeyed/unkey PR: 3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Learnt from: chronark
Repo: unkeyed/unkey PR: 3958
File: internal/clickhouse/src/requests.ts:7-8
Timestamp: 2025-09-12T13:25:41.849Z
Learning: In the ClickHouse v2 table migration, the location field was renamed from `colo` to `region` across all tables including `default.api_requests_raw_v2`. The zod insert schemas must be updated accordingly to avoid runtime insert failures.
📚 Learning: 2025-09-12T13:25:41.849Z
Learnt from: chronark
Repo: unkeyed/unkey PR: 3958
File: internal/clickhouse/src/requests.ts:7-8
Timestamp: 2025-09-12T13:25:41.849Z
Learning: In the ClickHouse v2 table migration, the location field was renamed from `colo` to `region` across all tables including `default.api_requests_raw_v2`. The zod insert schemas must be updated accordingly to avoid runtime insert failures.
Applied to files:
internal/clickhouse/src/verifications.ts
📚 Learning: 2025-09-25T09:18:38.763Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4016
File: go/pkg/clickhouse/migrations/20250925091254.sql:10-10
Timestamp: 2025-09-25T09:18:38.763Z
Learning: In ClickHouse v2 table migrations, the region field should not be carried over from v1 because v1 uses Cloudflare region identifiers while v2 uses AWS region identifiers. These are incompatible data formats, so setting region to empty string in materialized view migrations is the correct approach.
Applied to files:
internal/clickhouse/src/verifications.ts
📚 Learning: 2025-04-22T14:40:51.459Z
Learnt from: chronark
Repo: unkeyed/unkey PR: 3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Applied to files:
- go/pkg/clickhouse/migrations/20251120214146.sql
- go/pkg/clickhouse/schema/001_key_verifications_raw_v2.sql
- go/pkg/clickhouse/schema/004_key_verifications_per_day_v3.sql
📚 Learning: 2025-04-22T14:43:11.724Z
Learnt from: chronark
Repo: unkeyed/unkey PR: 3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
Applied to files:
- go/pkg/clickhouse/migrations/20251120214146.sql
- go/pkg/clickhouse/schema/001_key_verifications_raw_v2.sql
- go/pkg/clickhouse/schema/004_key_verifications_per_day_v3.sql
📚 Learning: 2025-07-17T14:24:20.403Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3631
File: go/pkg/db/bulk_keyring_insert.sql.go:23-25
Timestamp: 2025-07-17T14:24:20.403Z
Learning: In go/pkg/db/bulk_keyring_insert.sql.go and similar bulk insert generated files, hardcoded zero values for fields like size_approx and size_last_updated_at are intentional and reflect the original SQL query structure, not missing parameters.
Applied to files:
go/pkg/clickhouse/schema/001_key_verifications_raw_v2.sql
📚 Learning: 2025-08-14T16:25:48.167Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 3785
File: go/apps/api/routes/v2_keys_reroll_key/401_test.go:52-61
Timestamp: 2025-08-14T16:25:48.167Z
Learning: User Flo4604 requested creation of a GitHub issue to track converting all test files to use table-driven test patterns as a broader codebase improvement, following the suggestion made during review of go/apps/api/routes/v2_keys_reroll_key/401_test.go.
Applied to files:
go/pkg/clickhouse/key_verifications_test.go
📚 Learning: 2025-10-30T15:10:52.743Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4190
File: go/internal/services/keys/verifier.go:51-53
Timestamp: 2025-10-30T15:10:52.743Z
Learning: PR #4190 for unkeyed/unkey is focused solely on database schema and query changes for identity-based credits. It adds IdentityCredits and KeyCredits fields to structs and queries, but does not implement the priority enforcement logic in the usagelimiter. The logic implementation is intentionally deferred to a later PR in the stack.
Applied to files:
- go/pkg/clickhouse/schema/004_key_verifications_per_day_v3.sql
- tools/migrate/ch_logs.ts
📚 Learning: 2025-01-31T13:50:45.004Z
Learnt from: chronark
Repo: unkeyed/unkey PR: 2825
File: apps/dashboard/app/(app)/logs/components/table/logs-table.tsx:100-118
Timestamp: 2025-01-31T13:50:45.004Z
Learning: Sensitive data in logs is masked at the API layer (apps/api/src/pkg/middleware/metrics.ts) before being written to Clickhouse. This includes redacting API keys, plaintext values, and authorization headers using regex replacements.
Applied to files:
tools/migrate/ch_logs.ts
🧬 Code graph analysis (1)
tools/migrate/ch_logs.ts (1)
apps/dashboard/lib/db.ts (1)
`db` (5-26)
🪛 GitHub Actions: autofix.ci
tools/migrate/ch_logs.ts
[error] 35-35: lint/correctness/noUnusedVariables: 'insert' is defined but never used. Unused variable. Suggested fix: rename to '_insert' or apply unsafe fix. Some errors were emitted while applying fixes.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: PR title
- GitHub Check: Test API / API Test Local
- GitHub Check: Test Dashboard / Test Dashboard
- GitHub Check: Test Go API Local / Test
- GitHub Check: Test Packages / Test
- GitHub Check: Build / Build
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (8)
go/pkg/clickhouse/key_verifications_test.go (1)
121-133: v3 aggregation table wiring in tests looks consistent.

All updated loops now target the v3 aggregation tables (per-minute/hour/day/month), matching the rest of the PR (handlers, TS clients, migrations). Query shapes and expectations remain unchanged, so coverage is preserved while pointing at the new storage.
Also applies to: 135-154, 167-180, 194-205, 222-233, 245-260, 269-279, 290-300, 314-323
go/pkg/clickhouse/schema/001_key_verifications_raw_v2.sql (1)
44-45: TTL change to 90 days on raw_v2 looks fine; ensure the migration is applied.

Bumping the raw_v2 TTL from 1 month to `INTERVAL 90 DAY` is consistent with extending raw retention and doesn’t affect ordering or indexes. Given these schema files mirror production, just make sure the corresponding ClickHouse migration has already adjusted the live table TTL to match this definition.

internal/clickhouse/src/verification_outcomes_propagate_correctly.test.ts (1)
111-115: Swapping the per-day table to v3 keeps the outcome propagation test aligned.

Pointing both the `OPTIMIZE TABLE` and the direct aggregation query at `default.key_verifications_per_day_v3` matches the rest of the v3 rollout while preserving the test’s intent of validating raw→aggregated outcome propagation.

Also applies to: 118-128
go/apps/api/routes/v2_analytics_get_verifications/handler.go (1)
31-45: v2 analytics handler now correctly targets v3 aggregation tables.

Updating `tableAliases` and `allowedTables` to the `default.key_verifications_per_*_v3` tables keeps the SQL parser’s aliasing and allowlist in sync with the new schema, while still permitting access to `default.key_verifications_raw_v2`. This matches the ClickHouse user defaults and the TS clients’ v3 table usage.

internal/clickhouse/src/verifications.ts (1)
261-341: Timeseries intervals now consume v3 tables consistently.

All entries in `INTERVALS` have been updated to use the `default.key_verifications_per_*_v3` tables, so every granularity (minute → quarter) reads from the new aggregation layer while reusing the existing query and fill logic. This keeps the TS client aligned with the Go handlers and migrations that now target v3.

apps/api/src/routes/v1_analytics_getVerifications.ts (1)
223-245: Deprecated v1 analytics route correctly upgraded to v3 tables.

The `tables` mapping now targets `default.key_verifications_per_{hour,day,month}_v3`, so v1 consumers read from the same aggregation layer as v2 and the internal TS clients. The existing `fill` and date handling logic can remain unchanged as long as the v3 schemas preserve the hour/day/month time column types used previously.

internal/clickhouse/src/keys/active_keys.ts (1)
62-142: Active-keys timeseries now reads from v3 verification aggregates.

Updating `ACTIVE_KEYS_INTERVALS` to point at `default.key_verifications_per_*_v3` ensures the active-keys charts are driven by the new aggregation tables, matching the rest of the analytics stack without changing query semantics.

go/pkg/clickhouse/migrations/20251120214146.sql (1)
1-4: Confirm the intended retention split between day vs hour/minute/raw TTLs.

The ALTERs look syntactically fine and consistent (day_v3 at 356 days, hour/minute_v3 and raw_v2 at 90 days). Please just double‑check this retention profile is what you want for v3 (especially reducing/aligning raw_v2 to 90d) before running in prod, as it’s hard to undo once TTL starts deleting data.
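One lightweight way to keep this retention profile auditable (a sketch; the table→days map below simply restates the numbers from this review and must be kept in sync with the actual DDL) is to encode it as data and assert the invariant that coarser aggregates outlive the finer-grained sources they are derived from:

```typescript
// Intended retention per table, in days, per this migration:
// day-level keeps 356 days; everything feeding it keeps 90 days.
const retentionDays: Record<string, number> = {
  key_verifications_raw_v2: 90,
  key_verifications_per_minute_v3: 90,
  key_verifications_per_hour_v3: 90,
  key_verifications_per_day_v3: 356,
};

// Invariant: sources of the day rollup must not be retained longer
// than the rollup itself (otherwise the split is probably a mistake).
const dayRetention = retentionDays.key_verifications_per_day_v3;
const sourcesOk = [
  "key_verifications_raw_v2",
  "key_verifications_per_minute_v3",
  "key_verifications_per_hour_v3",
].every((table) => retentionDays[table] <= dayRetention);
```

A check like this could live next to the migration tooling and fail loudly when a TTL edit breaks the intended split.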
…unkeyed/unkey into 11-20-feat_add_external_id_to_order_key
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`go/pkg/clickhouse/migrations/atlas.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (3)
- go/pkg/clickhouse/migrations/20251121081837.sql (1 hunks)
- go/pkg/clickhouse/schema/017_active_workspaces_per_month_v2.sql (1 hunks)
- go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4190
File: go/internal/services/keys/verifier.go:51-53
Timestamp: 2025-10-30T15:10:52.743Z
Learning: PR #4190 for unkeyed/unkey is focused solely on database schema and query changes for identity-based credits. It adds IdentityCredits and KeyCredits fields to structs and queries, but does not implement the priority enforcement logic in the usagelimiter. The logic implementation is intentionally deferred to a later PR in the stack.
Learnt from: chronark
Repo: unkeyed/unkey PR: 3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4016
File: go/pkg/clickhouse/migrations/20250925091254.sql:10-10
Timestamp: 2025-09-25T09:18:38.763Z
Learning: In ClickHouse v2 table migrations, the region field should not be carried over from v1 because v1 uses Cloudflare region identifiers while v2 uses AWS region identifiers. These are incompatible data formats, so setting region to empty string in materialized view migrations is the correct approach.
Learnt from: chronark
Repo: unkeyed/unkey PR: 3958
File: internal/clickhouse/src/requests.ts:7-8
Timestamp: 2025-09-12T13:25:41.849Z
Learning: In the ClickHouse v2 table migration, the location field was renamed from `colo` to `region` across all tables including `default.api_requests_raw_v2`. The zod insert schemas must be updated accordingly to avoid runtime insert failures.
📚 Learning: 2025-09-25T09:18:38.763Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4016
File: go/pkg/clickhouse/migrations/20250925091254.sql:10-10
Timestamp: 2025-09-25T09:18:38.763Z
Learning: In ClickHouse v2 table migrations, the region field should not be carried over from v1 because v1 uses Cloudflare region identifiers while v2 uses AWS region identifiers. These are incompatible data formats, so setting region to empty string in materialized view migrations is the correct approach.
Applied to files:
- go/pkg/clickhouse/migrations/20251121081837.sql
- go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql
- go/pkg/clickhouse/schema/017_active_workspaces_per_month_v2.sql
📚 Learning: 2025-04-22T14:40:51.459Z
Learnt from: chronark
Repo: unkeyed/unkey PR: 3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Applied to files:
- go/pkg/clickhouse/migrations/20251121081837.sql
- go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql
- go/pkg/clickhouse/schema/017_active_workspaces_per_month_v2.sql
📚 Learning: 2025-04-22T14:43:11.724Z
Learnt from: chronark
Repo: unkeyed/unkey PR: 3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
Applied to files:
go/pkg/clickhouse/schema/017_active_workspaces_per_month_v2.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test Dashboard / Test Dashboard
- GitHub Check: Test API / API Test Local
- GitHub Check: Build / Build
- GitHub Check: Test Packages / Test
- GitHub Check: Test Go API Local / Test
🔇 Additional comments (3)
go/pkg/clickhouse/schema/019_billable_verifications_per_month_v2.sql (1)
18-18: Verify source table `key_verifications_per_month_v3` exists before this schema is applied.

This schema file documents the target state. The materialized view now sources from `key_verifications_per_month_v3` instead of `key_verifications_per_month_v2`. Ensure the upstream v3 table is created and populated before this schema definition is applied in the target environment.

go/pkg/clickhouse/schema/017_active_workspaces_per_month_v2.sql (1)
13-13: Confirm the `key_verifications_per_month_v3` source table schema and timeline.

This schema documents the target state where the keys-based materialized view sources from `key_verifications_per_month_v3`. Verify that the v3 table structure (schema, columns) is compatible with the expected aggregation (selecting `workspace_id` and `toDate(time)`). Note that the ratelimits-based view (line 15) correctly continues sourcing from `ratelimits_per_month_v2`.

go/pkg/clickhouse/migrations/20251121081837.sql (1)
1-8: Verify migration sequence and data continuity during cutover.

This migration drops and recreates materialized views to switch their data source from v2 to v3 tables. Ensure the following:

- Pre-migration check: Verify that `key_verifications_per_month_v3` exists and is populated before this migration executes.
- Cutover timing: Confirm there are no data gaps between when the old views stop ingesting (DROP) and the new views start ingesting (CREATE). Depending on ClickHouse transaction handling, a gap could cause missing aggregations.
- Backfill strategy: If v3 data has a different historical extent than v2, clarify whether backfill or recomputation of aggregations is needed.
- Migration order: This migration should execute after all v3 base tables and intermediate aggregations are ready (per PR objectives step 1).
Aligns with PR objective to migrate tables before deploying APIs.
Actionable comments posted: 2
♻️ Duplicate comments (1)
tools/migrate/ch_logs.ts (1)
67-138: Add connection cleanup.

The past review requested proper cleanup of database connections in a `finally` block, but this is still missing. The MySQL connection (`conn`) and the ClickHouse clients should be closed when the script completes or fails.

Wrap the main processing loop in try-finally:

```typescript
try {
  for (const table of tables) {
    // ... existing processing loop
  }
} finally {
  await conn.end();
  await rawCH.close();
  // Note: Check if the ClickHouse wrapper client (ch) needs cleanup
}
```
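If the script grows more resources to tear down, the try/finally pattern above can be factored into a small generic helper (a sketch; `withCleanup` is an illustrative name, not an existing utility in the repo) so cleanup always runs, even when the job throws:

```typescript
// Run an async job and always invoke the registered cleanup callbacks,
// even when the job throws; cleanup failures are logged, not rethrown.
async function withCleanup<T>(
  job: () => Promise<T>,
  ...cleanups: Array<() => Promise<void>>
): Promise<T> {
  try {
    return await job();
  } finally {
    for (const cleanup of cleanups) {
      await cleanup().catch((err) => console.error("cleanup failed", err));
    }
  }
}
```

Usage would look like `withCleanup(runMigration, () => conn.end(), () => rawCH.close())`, with each closer supplied as its own callback.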
🧹 Nitpick comments (2)
tools/migrate/ch_logs.ts (2)
49-51: Validate required MySQL environment variables.

The script validates `CLICKHOUSE_URL` but not the MySQL connection variables (`DATABASE_USERNAME`, `DATABASE_PASSWORD`, `DATABASE_HOST`). Missing variables will cause cryptic connection errors.

Add validation before line 49:

```typescript
if (!process.env.DATABASE_USERNAME || !process.env.DATABASE_PASSWORD || !process.env.DATABASE_HOST) {
  throw new Error("DATABASE_USERNAME, DATABASE_PASSWORD, and DATABASE_HOST must be set");
}
```
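A reusable variant of this check (a sketch; `requireEnv` is an illustrative helper, not something in the script) makes the failure message name the exact missing variable instead of listing all three:

```typescript
// Read a required variable from an env-like record, failing fast with
// the variable name in the error instead of a cryptic connection error.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (!value) {
    throw new Error(`missing required environment variable: ${name}`);
  }
  return value;
}

// In the script this would be called as, e.g.:
//   const user = requireEnv(process.env, "DATABASE_USERNAME");
const fakeEnv = { DATABASE_USERNAME: "root" };
const user = requireEnv(fakeEnv, "DATABASE_USERNAME");
```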
123-123: Extract concurrency tuning parameters to constants.

The concurrency adjustment uses magic numbers (500, 10, 10, 2) that would be clearer as named constants.

Define constants at the top of the file:

```typescript
const MAX_CONCURRENCY = 500;
const MIN_CONCURRENCY = 10;
const CONCURRENCY_INCREMENT_FACTOR = 10;
const CONCURRENCY_DECREASE_FACTOR = 2;
```

Then update the adjustments:

```typescript
concurrency = Math.min(MAX_CONCURRENCY, concurrency + CONCURRENCY_INCREMENT_FACTOR / concurrency);
// ...
concurrency = Math.max(MIN_CONCURRENCY, concurrency / CONCURRENCY_DECREASE_FACTOR);
```

Also applies to: 123-123, 127-127
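The adjustment shape here is essentially AIMD (additive increase on success, multiplicative decrease on failure). Factoring it into a pure helper (a sketch; the function and constant names are illustrative, mirroring the constants suggested in this comment) also makes the tuning unit-testable:

```typescript
const MAX_CONCURRENCY = 500;
const MIN_CONCURRENCY = 10;
const CONCURRENCY_INCREMENT_FACTOR = 10;
const CONCURRENCY_DECREASE_FACTOR = 2;

// Additive increase on success (the increment shrinks as concurrency
// grows), multiplicative decrease on failure, clamped to [MIN, MAX].
function adjustConcurrency(current: number, succeeded: boolean): number {
  if (succeeded) {
    return Math.min(MAX_CONCURRENCY, current + CONCURRENCY_INCREMENT_FACTOR / current);
  }
  return Math.max(MIN_CONCURRENCY, current / CONCURRENCY_DECREASE_FACTOR);
}
```

The migration loop would then call `concurrency = adjustConcurrency(concurrency, batchSucceeded)` in both branches.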
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- tools/migrate/ch_logs.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4190
File: go/internal/services/keys/verifier.go:51-53
Timestamp: 2025-10-30T15:10:52.743Z
Learning: PR #4190 for unkeyed/unkey is focused solely on database schema and query changes for identity-based credits. It adds IdentityCredits and KeyCredits fields to structs and queries, but does not implement the priority enforcement logic in the usagelimiter. The logic implementation is intentionally deferred to a later PR in the stack.
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4379
File: apps/engineering/content/docs/architecture/services/analytics.mdx:66-72
Timestamp: 2025-11-21T09:03:42.171Z
Learning: In the unkeyed/unkey repository, when updating ClickHouse analytics table versions, public-facing table aliases (e.g., `key_verifications_per_month_v1`) should only be bumped to newer internal table versions (v2 → v3) if the change affects user-visible behavior or query results. Internal schema changes that don't impact end users (like adding fields to the order key) don't require updating the alias mapping.
Learnt from: chronark
Repo: unkeyed/unkey PR: 3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4016
File: go/pkg/clickhouse/migrations/20250925091254.sql:10-10
Timestamp: 2025-09-25T09:18:38.763Z
Learning: In ClickHouse v2 table migrations, the region field should not be carried over from v1 because v1 uses Cloudflare region identifiers while v2 uses AWS region identifiers. These are incompatible data formats, so setting region to empty string in materialized view migrations is the correct approach.
Learnt from: chronark
Repo: unkeyed/unkey PR: 3958
File: internal/clickhouse/src/requests.ts:7-8
Timestamp: 2025-09-12T13:25:41.849Z
Learning: In the ClickHouse v2 table migration, the location field was renamed from `colo` to `region` across all tables including `default.api_requests_raw_v2`. The zod insert schemas must be updated accordingly to avoid runtime insert failures.
📚 Learning: 2025-04-22T14:40:51.459Z
Learnt from: chronark
Repo: unkeyed/unkey PR: 3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Applied to files:
tools/migrate/ch_logs.ts
📚 Learning: 2025-10-21T09:45:47.560Z
Learnt from: ogzhanolguncu
Repo: unkeyed/unkey PR: 4107
File: go/pkg/clickhouse/key_verifications_test.go:20-20
Timestamp: 2025-10-21T09:45:47.560Z
Learning: In ClickHouse tests (e.g., go/pkg/clickhouse/key_verifications_test.go), parallel execution with t.Parallel() is safe when tests use workspaceID-based isolation. Each test generates a unique workspaceID (via uid.New(uid.WorkspacePrefix)), and all database operations are scoped to that workspaceID, providing logical isolation even when tests share a single ClickHouse instance.
Applied to files:
tools/migrate/ch_logs.ts
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Test Go API Local / Test
- GitHub Check: Test API / API Test Local
- GitHub Check: Test Packages / Test
- GitHub Check: Build / Build
🔇 Additional comments (1)
tools/migrate/ch_logs.ts (1)
7-29: Update script to target v3 tables instead of v2.

V3 tables exist in the codebase (created in migration 20251120154236.sql) with the `external_id` field included, yet this migration script targets only v2 aggregate tables. This script is the only place still referencing `key_verifications_per_minute_v2`, `per_hour_v2`, `per_day_v2`, and `per_month_v2`. Since the PR objective is adding `external_id` support to v3 tables, update the table names in lines 9, 15, 20, 25 from v2 to v3 to align with the current migration target.
⛔ Skipped due to learnings

- Flo4604 · PR 4379 · apps/engineering/content/docs/architecture/services/analytics.mdx:66-72 · 2025-11-21: public-facing table aliases (e.g., `key_verifications_per_month_v1`) should only be bumped to newer internal table versions (v2 → v3) if the change affects user-visible behavior or query results; internal schema changes that don't impact end users (like adding fields to the order key) don't require updating the alias mapping.
- chronark · PR 3161 · go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql · 2025-04-22: the ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
- chronark · PR 3161 · go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql · 2025-04-22: the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs; schema changes require dedicated migration scripts.
- Flo4604 · PR 4016 · go/pkg/clickhouse/migrations/20250925091254.sql · 2025-09-25: in v2 migrations the region field should not be carried over from v1 because v1 uses Cloudflare region identifiers while v2 uses AWS region identifiers; setting region to empty string in materialized view migrations is the correct approach.
- chronark · PR 3958 · internal/clickhouse/src/requests.ts · 2025-09-12: the location field was renamed from `colo` to `region` across all tables including `default.api_requests_raw_v2`; the zod insert schemas must be updated accordingly to avoid runtime insert failures.
…unkeyed/unkey into 11-20-feat_add_external_id_to_order_key
Actionable comments posted: 2
♻️ Duplicate comments (4)
go/pkg/clickhouse/migrations/20251125075943.sql (1)
21-21: Fix TTL retention period to 365 days.

The TTL uses 356 days instead of 365 days for a year, which creates an incorrect retention window. This same issue appears in the schema file and should be corrected for consistency.
Apply this diff:
```diff
-PRIMARY KEY (`workspace_id`, `time`, `key_space_id`, `identity_id`, `external_id`, `key_id`, `outcome`, `tags`) ORDER BY (`workspace_id`, `time`, `key_space_id`, `identity_id`, `external_id`, `key_id`, `outcome`, `tags`) PARTITION BY (toStartOfMonth(time)) TTL time + toIntervalDay(356) SETTINGS index_granularity = 8192;
+PRIMARY KEY (`workspace_id`, `time`, `key_space_id`, `identity_id`, `external_id`, `key_id`, `outcome`, `tags`) ORDER BY (`workspace_id`, `time`, `key_space_id`, `identity_id`, `external_id`, `key_id`, `outcome`, `tags`) PARTITION BY (toStartOfMonth(time)) TTL time + toIntervalDay(365) SETTINGS index_granularity = 8192;
```

go/pkg/clickhouse/schema/004_key_verifications_per_day_v3.sql (1)
31-31: Fix TTL retention period to 365 days.

The TTL uses 356 days instead of 365 days for a year. This should be corrected to reflect an accurate yearly retention period.
Apply this diff:
```diff
-TTL time + INTERVAL 356 DAY DELETE
+TTL time + INTERVAL 365 DAY DELETE
```

tools/migrate/ch_logs.ts (2)
27-27: Fix retention calculation to use 365 days per year.

The retention calculation uses 356 days instead of 365 days, resulting in an incorrect 4-year retention window.

Apply this diff:

```diff
- retention: 4 * 356 * 24 * 60 * 60 * 1000,
+ retention: 4 * 365 * 24 * 60 * 60 * 1000,
```
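To make the off-by-nine concrete, here is a small TypeScript sketch (names are illustrative, not from the migration script) showing how much retention a 356-day "year" loses over the 4-year window:

```typescript
// A 356-day "year" silently shortens a 4-year retention window.
const DAY_MS = 24 * 60 * 60 * 1000;

function retentionMs(years: number, daysPerYear: number): number {
  return years * daysPerYear * DAY_MS;
}

const wrong = retentionMs(4, 356);
const right = retentionMs(4, 365);

// 9 missing days per year, times 4 years
const shortfallDays = (right - wrong) / DAY_MS;

console.log(shortfallDays); // 36
```

Thirty-six days of minute-level data would expire earlier than intended, which is why the review flags both the SQL TTLs and the TypeScript constant.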
219-229: Critical: SQL injection vulnerability in UPDATE query.

The UPDATE query uses string interpolation to insert `externalId`, `workspace_id`, `key_space_id`, and `identity_id` directly into the SQL statement. If any of these values contain special characters (e.g., single quotes in "O'Brien"), it could break the query or enable SQL injection attacks.

Use parameterized queries if supported by the ClickHouse client:

```diff
- await rawCH.exec({
-   query: `
-     UPDATE ${table}
-     SET external_id = '${externalId}'
-     WHERE
-       workspace_id = '${row.workspace_id}'
-       AND key_space_id = '${row.key_space_id}'
-       AND identity_id = '${row.identity_id}'
-       AND ( external_id = '' OR external_id = 'undefined' )
-   `,
- });
+ await rawCH.exec({
+   query: `
+     UPDATE ${table}
+     SET external_id = {externalId:String}
+     WHERE
+       workspace_id = {workspaceId:String}
+       AND key_space_id = {keySpaceId:String}
+       AND identity_id = {identityId:String}
+       AND ( external_id = '' OR external_id = 'undefined' )
+   `,
+   query_params: {
+     externalId: externalId,
+     workspaceId: row.workspace_id,
+     keySpaceId: row.key_space_id,
+     identityId: row.identity_id,
+   },
+ });
```

If parameterized queries are not available, at minimum escape single quotes:

```typescript
const escape = (str: string) => str.replace(/'/g, "\\'");

await rawCH.exec({
  query: `
    UPDATE ${table}
    SET external_id = '${escape(externalId)}'
    WHERE
      workspace_id = '${escape(row.workspace_id)}'
      AND key_space_id = '${escape(row.key_space_id)}'
      AND identity_id = '${escape(row.identity_id)}'
      AND ( external_id = '' OR external_id = 'undefined' )
  `,
});
```
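The escaping fallback can be sanity-checked in isolation; this standalone sketch mirrors the helper suggested in the comment (the helper name is illustrative, not the real client code):

```typescript
// Escape single quotes so a value like "O'Brien" cannot terminate
// the SQL string literal early.
const escapeSingleQuotes = (str: string): string => str.replace(/'/g, "\\'");

const externalId = "O'Brien";
const fragment = `external_id = '${escapeSingleQuotes(externalId)}'`;

console.log(fragment); // external_id = 'O\'Brien'
```

Note this only guards the quote character; true parameterization (`query_params`) also handles backslashes, encodings, and type coercion, which is why the review prefers it where the client supports it.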
🧹 Nitpick comments (4)
tools/migrate/ch_copy.ts (4)
63-77: Avoid logging full JSON result sets for every window

`console.log(json);` will dump the full response payload for each time window, which can explode logs and slow the script for large ranges or many keys. Consider either removing this log or replacing it with a summarized variant (e.g., `json.data.length` and maybe a few sample IDs), or gating the full dump behind a debug flag:

```diff
- console.log(json);
+ if (process.env.DEBUG_CH_COPY === "1") {
+   console.log(json);
+ }
```
83-127: Concurrency control looks solid; minor optional cleanups

The semaphore + `Promise.race` pattern with AIMD-style concurrency control is well thought out and should behave nicely under load. A couple of optional polish ideas:

- Extract the concurrency tuning logic into small helpers for readability (e.g., `onSuccess()` / `onError()`).
- Consider rounding `concurrency` when comparing against `semaphore.size` to avoid subtle float vs int comparisons over time (even though it's unlikely to cause real issues).

No blockers here; just nits if you touch this again.
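For readers unfamiliar with the pattern, here is a minimal sketch of AIMD (additive-increase, multiplicative-decrease) concurrency control, extracted into the helper shape the nit suggests. Class and method names are illustrative, not the actual ch_copy.ts code:

```typescript
// AIMD: grow the limit by 1 on each success, halve it on each error.
// This converges on the highest sustainable concurrency for the backend.
class AdaptiveConcurrency {
  constructor(
    private current = 4,
    private readonly min = 1,
    private readonly max = 64,
  ) {}

  onSuccess(): void {
    // additive increase: probe for more headroom slowly
    this.current = Math.min(this.max, this.current + 1);
  }

  onError(): void {
    // multiplicative decrease: back off hard when the backend pushes back
    this.current = Math.max(this.min, Math.floor(this.current / 2));
  }

  get limit(): number {
    // integer limit, avoiding float vs int comparison drift
    return Math.round(this.current);
  }
}

const ctl = new AdaptiveConcurrency();
ctl.onSuccess(); // 4 -> 5
ctl.onSuccess(); // 5 -> 6
ctl.onError();   // 6 -> 3
console.log(ctl.limit); // 3
```

Keeping the increase/decrease rules in named methods makes the tuning policy auditable in one place, which is the readability win the nit is after.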
131-131: Ensure your runtime supports top-level `await` for this script

The script ends with a top-level `await Promise.all(...)`. That's fine in ESM/modern TS setups (e.g., `module` set appropriately and using `tsx`/Node with top-level await support), but will fail in older CommonJS or misconfigured builds. Just double-check this is executed in an environment where top-level `await` is allowed (e.g., via `tsx tools/migrate/ch_copy.ts` or similar).
89-113: Use `query_params` option (not `params`) with @clickhouse/client-web v1.11.1

The suggestion to use named query parameters is sound: the current string interpolation of `keyId` should be replaced with a typed parameter. However, the proposed diff uses `params`, but @clickhouse/client-web v1.11.1 expects `query_params` as the option key.

```diff
 .query({
   query: `
     INSERT INTO ${v3}
     SELECT time, workspace_id, key_space_id, identity_id, external_id, key_id, outcome, tags, count, spent_credits, latency_avg, latency_p75, latency_p99
     FROM ${v2}
     WHERE time >= fromUnixTimestamp64Milli(${t})
       AND time < fromUnixTimestamp64Milli(${t + dt})
-      AND key_id = '${keyId}'
-  `,
- })
+      AND key_id = {keyId:String}
+  `,
+  query_params: { keyId },
+ })
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`go/pkg/clickhouse/migrations/atlas.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (8)
- apps/engineering/content/docs/architecture/services/analytics.mdx (1 hunks)
- go/pkg/clickhouse/migrations/20251125075943.sql (1 hunks)
- go/pkg/clickhouse/schema/002_key_verifications_per_minute_v3.sql (1 hunks)
- go/pkg/clickhouse/schema/003_key_verifications_per_hour_v3.sql (1 hunks)
- go/pkg/clickhouse/schema/004_key_verifications_per_day_v3.sql (1 hunks)
- go/pkg/clickhouse/schema/005_key_verifications_per_month_v3.sql (1 hunks)
- tools/migrate/ch_copy.ts (1 hunks)
- tools/migrate/ch_logs.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- go/pkg/clickhouse/schema/005_key_verifications_per_month_v3.sql
- apps/engineering/content/docs/architecture/services/analytics.mdx
- go/pkg/clickhouse/schema/003_key_verifications_per_hour_v3.sql
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4190
File: go/internal/services/keys/verifier.go:51-53
Timestamp: 2025-10-30T15:10:52.743Z
Learning: PR #4190 for unkeyed/unkey is focused solely on database schema and query changes for identity-based credits. It adds IdentityCredits and KeyCredits fields to structs and queries, but does not implement the priority enforcement logic in the usagelimiter. The logic implementation is intentionally deferred to a later PR in the stack.
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4379
File: apps/engineering/content/docs/architecture/services/analytics.mdx:66-72
Timestamp: 2025-11-21T09:03:42.171Z
Learning: In the unkeyed/unkey repository, when updating ClickHouse analytics table versions, public-facing table aliases (e.g., `key_verifications_per_month_v1`) should only be bumped to newer internal table versions (v2 → v3) if the change affects user-visible behavior or query results. Internal schema changes that don't impact end users (like adding fields to the order key) don't require updating the alias mapping.
Learnt from: chronark
Repo: unkeyed/unkey PR: 3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Learnt from: chronark
Repo: unkeyed/unkey PR: 3958
File: internal/clickhouse/src/requests.ts:7-8
Timestamp: 2025-09-12T13:25:41.849Z
Learning: In the ClickHouse v2 table migration, the location field was renamed from `colo` to `region` across all tables including `default.api_requests_raw_v2`. The zod insert schemas must be updated accordingly to avoid runtime insert failures.
Learnt from: chronark
Repo: unkeyed/unkey PR: 3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4016
File: go/pkg/clickhouse/migrations/20250925091254.sql:10-10
Timestamp: 2025-09-25T09:18:38.763Z
Learning: In ClickHouse v2 table migrations, the region field should not be carried over from v1 because v1 uses Cloudflare region identifiers while v2 uses AWS region identifiers. These are incompatible data formats, so setting region to empty string in materialized view migrations is the correct approach.
📚 Learning: 2025-11-21T09:03:42.171Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4379
File: apps/engineering/content/docs/architecture/services/analytics.mdx:66-72
Timestamp: 2025-11-21T09:03:42.171Z
Learning: In the unkeyed/unkey repository, when updating ClickHouse analytics table versions, public-facing table aliases (e.g., `key_verifications_per_month_v1`) should only be bumped to newer internal table versions (v2 → v3) if the change affects user-visible behavior or query results. Internal schema changes that don't impact end users (like adding fields to the order key) don't require updating the alias mapping.
Applied to files:
go/pkg/clickhouse/schema/004_key_verifications_per_day_v3.sql, go/pkg/clickhouse/schema/002_key_verifications_per_minute_v3.sql, tools/migrate/ch_copy.ts, tools/migrate/ch_logs.ts, go/pkg/clickhouse/migrations/20251125075943.sql
📚 Learning: 2025-04-22T14:40:51.459Z
Learnt from: chronark
Repo: unkeyed/unkey PR: 3161
File: go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql:31-33
Timestamp: 2025-04-22T14:40:51.459Z
Learning: The ClickHouse table schemas in the codebase mirror the production environment and cannot be modified directly in PRs without careful migration planning.
Applied to files:
go/pkg/clickhouse/schema/004_key_verifications_per_day_v3.sql, go/pkg/clickhouse/schema/002_key_verifications_per_minute_v3.sql, tools/migrate/ch_copy.ts, tools/migrate/ch_logs.ts, go/pkg/clickhouse/migrations/20251125075943.sql
📚 Learning: 2025-04-22T14:43:11.724Z
Learnt from: chronark
Repo: unkeyed/unkey PR: 3161
File: go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql:1-13
Timestamp: 2025-04-22T14:43:11.724Z
Learning: In the unkey project, the SQL files in clickhouse/schema/databases represent the current production schema and shouldn't be modified directly in PRs. Schema changes require dedicated migration scripts.
Applied to files:
go/pkg/clickhouse/schema/004_key_verifications_per_day_v3.sql, go/pkg/clickhouse/schema/002_key_verifications_per_minute_v3.sql, go/pkg/clickhouse/migrations/20251125075943.sql
📚 Learning: 2025-10-30T15:10:52.743Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4190
File: go/internal/services/keys/verifier.go:51-53
Timestamp: 2025-10-30T15:10:52.743Z
Learning: PR #4190 for unkeyed/unkey is focused solely on database schema and query changes for identity-based credits. It adds IdentityCredits and KeyCredits fields to structs and queries, but does not implement the priority enforcement logic in the usagelimiter. The logic implementation is intentionally deferred to a later PR in the stack.
Applied to files:
go/pkg/clickhouse/schema/004_key_verifications_per_day_v3.sql, go/pkg/clickhouse/schema/002_key_verifications_per_minute_v3.sql
📚 Learning: 2025-09-25T09:18:38.763Z
Learnt from: Flo4604
Repo: unkeyed/unkey PR: 4016
File: go/pkg/clickhouse/migrations/20250925091254.sql:10-10
Timestamp: 2025-09-25T09:18:38.763Z
Learning: In ClickHouse v2 table migrations, the region field should not be carried over from v1 because v1 uses Cloudflare region identifiers while v2 uses AWS region identifiers. These are incompatible data formats, so setting region to empty string in materialized view migrations is the correct approach.
Applied to files:
tools/migrate/ch_copy.ts, tools/migrate/ch_logs.ts, go/pkg/clickhouse/migrations/20251125075943.sql
📚 Learning: 2025-10-21T09:45:47.560Z
Learnt from: ogzhanolguncu
Repo: unkeyed/unkey PR: 4107
File: go/pkg/clickhouse/key_verifications_test.go:20-20
Timestamp: 2025-10-21T09:45:47.560Z
Learning: In ClickHouse tests (e.g., go/pkg/clickhouse/key_verifications_test.go), parallel execution with t.Parallel() is safe when tests use workspaceID-based isolation. Each test generates a unique workspaceID (via uid.New(uid.WorkspacePrefix)), and all database operations are scoped to that workspaceID, providing logical isolation even when tests share a single ClickHouse instance.
Applied to files:
tools/migrate/ch_copy.ts, tools/migrate/ch_logs.ts
📚 Learning: 2025-02-27T14:35:15.251Z
Learnt from: ogzhanolguncu
Repo: unkeyed/unkey PR: 2918
File: apps/dashboard/lib/trpc/routers/api/overview-api-search.ts:0-0
Timestamp: 2025-02-27T14:35:15.251Z
Learning: When using Drizzle ORM with LIKE queries, use the sql`` tagged template syntax (e.g., sql`${table.name} LIKE ${`%${input.query}%`}`) instead of direct string interpolation to prevent SQL injection attacks.
Applied to files:
tools/migrate/ch_logs.ts
📚 Learning: 2025-09-12T13:25:41.849Z
Learnt from: chronark
Repo: unkeyed/unkey PR: 3958
File: internal/clickhouse/src/requests.ts:7-8
Timestamp: 2025-09-12T13:25:41.849Z
Learning: In the ClickHouse v2 table migration, the location field was renamed from `colo` to `region` across all tables including `default.api_requests_raw_v2`. The zod insert schemas must be updated accordingly to avoid runtime insert failures.
Applied to files:
go/pkg/clickhouse/migrations/20251125075943.sql
🧬 Code graph analysis (2)
tools/migrate/ch_copy.ts (1)
internal/clickhouse/src/testutil.ts (1)
start(27-50)
tools/migrate/ch_logs.ts (1)
apps/dashboard/lib/db.ts (1)
db(5-26)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test Dashboard / Test Dashboard
- GitHub Check: Test Packages / Test
- GitHub Check: Test Go API Local / Test
- GitHub Check: Test API / API Test Local
- GitHub Check: Build / Build
- GitHub Check: Analyze (javascript-typescript)
🔇 Additional comments (11)
go/pkg/clickhouse/migrations/20251125075943.sql (4)
1-1: LGTM: TTL modification on raw table.

The 90-day TTL on the raw table aligns with the retention period of the minute and hour v3 tables.
23-81: LGTM: v3 table schemas are consistent.

All v3 tables (hour, minute, month) consistently include `external_id` in the order key after `identity_id`, with appropriate bloom filter indexes and TTL settings.
83-89: LGTM: v2 materialized views correctly redirected to v3 sources.

The v2 materialized views are properly updated to consume data from the new v3 tables while maintaining their v2 target tables. This approach preserves backward compatibility while leveraging the improved v3 schema.
91-97: LGTM: v3 materialized views form correct aggregation chain.

The materialized views properly cascade from raw_v2 through minute, hour, day, to month v3 tables, with all aggregation functions (`avgMergeState`, `quantilesTDigestMergeState`) correctly applied and `external_id` included in the GROUP BY clauses.

go/pkg/clickhouse/schema/004_key_verifications_per_day_v3.sql (1)
34-60: LGTM: Materialized view correctly aggregates hour-level data.

The view properly aggregates from `key_verifications_per_hour_v3` with appropriate merge state functions and includes `external_id` in the grouping dimensions.

go/pkg/clickhouse/schema/002_key_verifications_per_minute_v3.sql (2)
1-32: LGTM: Minute-level table schema is correct.

The table structure properly includes `external_id` in the order key, uses appropriate DateTime granularity, and has a reasonable 90-day TTL for minute-level data.
34-60: LGTM: Materialized view correctly aggregates raw events.

The view properly uses `avgState` and `quantilesTDigestState` (not Merge variants) when aggregating from the raw table, and correctly includes `external_id` from the source data.

tools/migrate/ch_logs.ts (4)
31-96: LGTM: Robust setup with cache handling.

The script includes proper environment validation, persistent caching to reduce database lookups, and signal handlers for graceful shutdown with cache persistence.
106-189: LGTM: Well-designed batch processing with adaptive concurrency.

The windowed processing approach with adaptive concurrency control (increases on success, decreases on failure) is a robust pattern for managing load on both ClickHouse and MySQL during migration.
194-218: LGTM: Identity lookup with effective caching.

The caching strategy effectively reduces database load by storing both found and deleted (null) identities, preventing repeated lookups for missing records.
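The "cache the misses too" pattern praised here generalizes well; this TypeScript sketch shows the shape of it (the `Identity` type and `lookupFn` are hypothetical stand-ins, not the real ch_logs.ts code):

```typescript
// Cache both found identities and null (deleted) ones, so repeated
// lookups for a missing record never hit the database again.
type Identity = { externalId: string };

function cachedLookup(
  cache: Map<string, Identity | null>,
  identityId: string,
  lookupFn: (id: string) => Identity | null,
): Identity | null {
  if (cache.has(identityId)) {
    // hit, even when the cached value is null (a deleted identity)
    return cache.get(identityId) ?? null;
  }
  const identity = lookupFn(identityId);
  cache.set(identityId, identity); // store null results as well
  return identity;
}

// demo: the second lookup for a deleted identity never reaches lookupFn
let dbCalls = 0;
const cache = new Map<string, Identity | null>();
const lookup = (_id: string): Identity | null => {
  dbCalls++;
  return null; // simulate an identity deleted from the source database
};
cachedLookup(cache, "id_123", lookup);
cachedLookup(cache, "id_123", lookup);
console.log(dbCalls); // 1
```

The key detail is using `Map.has` rather than a truthiness check on `Map.get`, since a cached `null` is a valid hit.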
232-232: LGTM: Clean exit after migration completes.

The explicit exit ensures the process terminates cleanly after all tables are processed and the cache is saved.
perkinsjr
left a comment
I don't see anything that would be an issue
Graphite Automations: "Post a GIF when PR approved" took an action on this PR • (11/25/25). 1 gif was posted to this PR based on Andreas Thomas's automation.


What does this PR do?
v3 tables just add the `external_id` to the order key.
I will first
Type of change
How should this be tested?
Checklist
Required
- `pnpm build`
- `pnpm fmt`
- `make fmt` on `/go` directory
- `console.logs`
- `git pull origin main`