
fix: do not send 4x as many rows to clickhouse #3161

Merged
chronark merged 3 commits into main from ratelimit-deduplication
Apr 24, 2025

Conversation


chronark (Collaborator) commented Apr 22, 2025

What does this PR do?

We now run clickhouse in integration tests and assert the correct rows are inserted.

As an experiment, this also restructures the way we define clickhouse tables and how migrations would run.
We're trying it out with test containers first; it has no impact on production yet.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Chore (refactoring code, technical debt, workflow improvements)
  • Enhancement (small improvements)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How should this be tested?

  • Test A
  • Test B

Checklist

Required

  • Filled out the "How to test" section in this PR
  • Read Contributing Guide
  • Self-reviewed my own code
  • Commented on my code in hard-to-understand areas
  • Ran pnpm build
  • Ran pnpm fmt
  • Checked for warnings, there are none
  • Removed all console.logs
  • Merged the latest changes from main onto my branch with git pull origin main
  • My changes don't cause any responsiveness issues

Appreciated

  • If a UI change was made: Added a screen recording or screenshots to this PR
  • Updated the Unkey Docs if changes were necessary

Summary by CodeRabbit

  • New Features

    • Introduced comprehensive ClickHouse database support, including schema migrations for verifications, ratelimits, metrics, billing, and business analytics.
    • Added new aggregated and raw data tables, as well as materialized views for detailed analytics and reporting.
    • Embedded all SQL migration files for ClickHouse, enabling seamless initialization and upgrades.
  • Improvements

    • Enhanced test harness and integration tests to use real ClickHouse containers and validate data persistence.
    • Increased concurrency for processing ratelimit events.
    • Refined API and test utilities to support ClickHouse alongside MySQL.
  • Bug Fixes

    • Improved error handling and environment setup for database containers.
  • Documentation

    • Added detailed README for ClickHouse schema naming conventions.
  • Dependency Updates

    • Upgraded several dependencies to their latest versions for improved stability and features.
  • Refactor

    • Streamlined configuration and initialization logic for database containers and batch processing.

We now run clickhouse in integration tests and assert the correct rows
are inserted.

changeset-bot bot commented Apr 22, 2025

⚠️ No Changeset found

Latest commit: 43ebe17

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types



vercel bot commented Apr 22, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| dashboard | ✅ Ready | Visit Preview | 💬 Add feedback | Apr 24, 2025 5:37am |
| engineering | ✅ Ready | Visit Preview | 💬 Add feedback | Apr 24, 2025 5:37am |


coderabbitai bot commented Apr 22, 2025

Caution

Review failed

The pull request is closed.

📝 Walkthrough

This change introduces comprehensive ClickHouse integration across the Go codebase, including infrastructure, schema, test harness, and API layers. It adds embedded SQL migrations for ClickHouse, creates new databases and tables for verifications, ratelimits, metrics, billing, and business analytics, and implements materialized views for efficient aggregation. The test harness and container orchestration now support ClickHouse alongside MySQL, with updated configuration and environment handling. New generic query utilities and direct ClickHouse connection accessors are provided. Integration and route tests are refactored to validate ClickHouse persistence, and batch processing logic is improved for concurrency and encapsulation.
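The walkthrough mentions a new generic `Select[T]` utility for querying ClickHouse into typed slices. Its actual implementation is not shown in this summary; as an illustrative sketch only, the core loop of such a helper might look like the following, where `rowScanner` is a stand-in interface invented for this example rather than the real driver type used in `go/pkg/clickhouse/select.go`:

```go
package main

import "fmt"

// rowScanner is a stand-in for the subset of a driver's rows interface that
// a generic select helper needs. It is an assumption for this sketch, not
// the actual interface used in go/pkg/clickhouse/select.go.
type rowScanner[T any] interface {
	Next() bool
	Scan(dest *T) error
}

// collect drains rows into a typed slice, mirroring the idea behind the
// generic Select[T] utility described in the walkthrough.
func collect[T any](rows rowScanner[T]) ([]T, error) {
	var out []T
	for rows.Next() {
		var row T
		if err := rows.Scan(&row); err != nil {
			return nil, err
		}
		out = append(out, row)
	}
	return out, nil
}

// stubRows is a tiny in-memory implementation used to demonstrate the helper.
type stubRows[T any] struct {
	data []T
	i    int
}

func (s *stubRows[T]) Next() bool {
	if s.i >= len(s.data) {
		return false
	}
	s.i++
	return true
}

func (s *stubRows[T]) Scan(dest *T) error {
	*dest = s.data[s.i-1]
	return nil
}

func main() {
	rows := &stubRows[string]{data: []string{"ratelimits", "metrics"}}
	got, err := collect[string](rows)
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // [ratelimits metrics]
}
```

The value of the generic shape is that callers get `[]T` back without per-table scanning boilerplate in tests and assertions.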

Changes

File(s) / Path(s) Change Summary
go/apps/api/integration/harness.go, go/pkg/testutil/http.go Test harness and HTTP test utilities now support ClickHouse: ClickHouse containers are started, migrations run from embedded files, and real ClickHouse clients are initialized and injected into the harness.
go/apps/api/integration/multi_node_ratelimiting/run.go Integration test enhanced to verify rate limiting data and metrics persistence in ClickHouse after load tests, with new aggregation and assertion logic.
go/apps/api/routes/v2_ratelimit_limit/200_test.go Test suite refactored: namespace creation extracted to a helper, and a new subtest validates that ratelimit events are flushed to ClickHouse.
go/cmd/api/main.go Redundant type conversions removed for configuration fields.
go/go.mod Dependency versions updated for several direct and indirect packages.
go/pkg/batch/process.go Batch state is now managed locally within the processing goroutine instead of as a struct field, improving encapsulation.
go/pkg/clickhouse/client.go, go/pkg/clickhouse/interface.go, go/pkg/clickhouse/noop.go ClickHouse client and interface updated: logging on initialization, increased batch processor concurrency, and new public Conn() method for direct connection access.
go/pkg/clickhouse/select.go New generic Select[T] utility added for querying ClickHouse and unmarshaling results into typed slices.
go/pkg/clickhouse/schema/embed.go Embedded filesystem introduced to provide access to all SQL migration files for ClickHouse schemas.
go/pkg/clickhouse/schema/databases/... Numerous new SQL migration scripts added, creating databases, tables, and materialized views for verifications, ratelimits, metrics, billing, and business domains in ClickHouse.
go/pkg/testutil/containers/api.go, go/pkg/testutil/containers/clickhouse.go Container orchestration refactored: API and ClickHouse startup now use configuration structs, embedded migrations, and improved DSN/network handling.
go/pkg/clickhouse/schema/databases/000_README.md Documentation added specifying naming conventions for ClickHouse tables and views.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test Harness
    participant MySQL as MySQL Container
    participant CH as ClickHouse Container
    participant API as API Cluster

    Test->>CH: Start ClickHouse container & run embedded migrations
    Note right of CH: Embedded SQL files executed
    Test->>CH: Initialize ClickHouse client
    Test->>MySQL: Start MySQL container
    Test->>MySQL: Initialize MySQL client
    Test->>API: Start API cluster with MySQL & ClickHouse DSNs
    API->>CH: Persist and query ratelimit, metrics, etc.
    API->>MySQL: Persist and query relational data
    Test->>CH: Query ClickHouse for test assertions
```

Possibly related PRs

Suggested reviewers

  • perkinsjr
  • mcstepp
  • MichaelUnkey
  • ogzhanolguncu

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (1.64.8)

Error: you are using a configuration file for golangci-lint v2 with golangci-lint v1: please use golangci-lint v2


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 58bf585 and 43ebe17.

⛔ Files ignored due to path filters (1)
  • go/go.sum is excluded by !**/*.sum
📒 Files selected for processing (1)
  • go/go.mod (5 hunks)

@vercel vercel bot temporarily deployed to Preview – dashboard April 22, 2025 14:28 Inactive
@vercel vercel bot temporarily deployed to Preview – engineering April 22, 2025 14:29 Inactive
Contributor

niceeee. we have a proper namespacing/folder structure for schemas :praise-the-unkey:

@coderabbitai bot left a comment

Actionable comments posted: 6

🔭 Outside diff range comments (1)
go/pkg/clickhouse/schema/databases/002_ratelimits/010_ratelimits_last_used_v1.sql (1)

1-12: ⚠️ Potential issue

Critical: Table column type mismatch for AggregatingMergeTree
The AggregatingMergeTree engine requires the column storing an aggregate state (e.g., from maxSimpleState) to be of type AggregateFunction. Defining time Int64 will fail when the materialized view tries to INSERT a state.

Please update the time column to the correct state type. For example:

 CREATE TABLE IF NOT EXISTS ratelimits.ratelimits_last_used_v1
 (
-  time          Int64,
+  time          SimpleAggregateFunction(max, Int64),
   workspace_id  String,
   namespace_id  String,
   identifier    String
 )
 ENGINE = AggregatingMergeTree()
 ORDER BY (workspace_id, namespace_id, time, identifier);
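For context, the companion materialized view would then emit the aggregate state with the `maxSimpleState` combinator. The following is a hypothetical sketch (the repository's actual `011_ratelimits_last_used_mv_v1.sql` is not shown in this review, so column and source-table names are inferred from the surrounding migrations):

```sql
-- Hypothetical sketch of the feeding MV; not the actual file contents.
CREATE MATERIALIZED VIEW IF NOT EXISTS ratelimits.ratelimits_last_used_mv_v1
TO ratelimits.ratelimits_last_used_v1
AS
SELECT
  workspace_id,
  namespace_id,
  identifier,
  maxSimpleState(time) AS time
FROM ratelimits.raw_ratelimits_v1
GROUP BY workspace_id, namespace_id, identifier;
```

With the target column typed as `SimpleAggregateFunction(max, Int64)`, merges of the `AggregatingMergeTree` keep only the latest timestamp per key.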
🧹 Nitpick comments (40)
go/pkg/clickhouse/schema/databases/005_business/001_database.sql (1)

1-2: Unify SQL formatting for consistency.

Here the semicolon is placed on its own line, but other database migrations (e.g., verifications, ratelimits, metrics) include the semicolon on the same line. To maintain consistency, consider combining into a single statement:

CREATE DATABASE IF NOT EXISTS business;
go/pkg/clickhouse/noop.go (1)

15-16: Remove duplicate interface assertion

The interface assertion var _ Bufferer = (*noop)(nil) appears twice consecutively on lines 15 and 16.

var _ Bufferer = (*noop)(nil)
-var _ Bufferer = (*noop)(nil)
go/pkg/clickhouse/schema/databases/005_business/002_active_workspaces_per_month_v1.sql (1)

1-8: Consider optimizing table structure for query patterns

The table definition looks good for basic monthly workspace tracking. However, consider the following optimizations:

  1. No explicit PRIMARY KEY is defined (it defaults to the ORDER BY key). If you frequently query by workspace_id, consider adding a secondary index or changing the sorting key.
  2. There's no TTL (Time-To-Live) specified. If this data doesn't need to be retained indefinitely, adding a TTL could help manage storage.
  3. Consider specifying compression options if this table will store significant data.
CREATE TABLE IF NOT EXISTS business.active_workspaces_per_month_v1
(
  time          Date,
  workspace_id  String
)
ENGINE = MergeTree()
ORDER BY (time)
+PRIMARY KEY (time)
+SETTINGS index_granularity = 8192
;
go/pkg/clickhouse/client.go (2)

66-68: Be cautious with connection URL logging

Adding connection information logging is good for debugging, but be aware that the URL might contain sensitive information such as credentials. Consider masking sensitive parts of the URL in production environments.

-config.Logger.Info("initializing clickhouse client",
-	"url", config.URL,
-)
+// Parse the URL to mask credentials if present
+parsedURL, err := url.Parse(config.URL)
+if err == nil && parsedURL.User != nil {
+	parsedURL.User = url.UserPassword(parsedURL.User.Username(), "******")
+}
+config.Logger.Info("initializing clickhouse client",
+	"url", parsedURL.String(),
+)

254-256: Exposing raw connection increases flexibility but requires caution

Adding the Conn() method provides flexibility for direct database access, which is useful for custom queries and testing. However, it also allows bypassing the client's abstractions and safeguards.

Consider documenting usage guidelines in the method comment to ensure proper use:

+// Conn returns the underlying ClickHouse connection.
+// Use with caution as it bypasses the client's batch processing and other safeguards.
+// Primarily intended for direct queries and testing purposes.
func (c *clickhouse) Conn() ch.Conn {
	return c.conn
}
go/pkg/clickhouse/schema/databases/005_business/003_active_workspaces_ratelimits_per_month_mv_v1.sql (1)

1-8: Well-structured materialized view for workspace analytics

This materialized view effectively aggregates data from the ratelimits table into the business analytics table. The implementation follows good practices for maintaining derived data automatically.

One minor style suggestion: consider removing the blank line at the beginning of the file for consistency with other SQL files.

-

CREATE MATERIALIZED VIEW IF NOT EXISTS business.active_workspaces_ratelimits_per_month_mv_v1
TO business.active_workspaces_per_month_v1
AS
SELECT
  workspace_id, toDate(time) as time
FROM ratelimits.ratelimits_per_month_v1
;
go/pkg/clickhouse/schema/databases/001_verifications/016_key_verifications_per_hour_mv_v3.sql (1)

1-23: Verify SQL dialect support for alias in GROUP BY or use full expression
ClickHouse may not allow grouping by the time alias; it’s safer to use the full toStartOfHour(fromUnixTimestamp64Milli(time)) expression. Consider this optional refactor to avoid runtime errors:

 GROUP BY
   workspace_id,
   key_space_id,
   identity_id,
   key_id,
   outcome,
-  time,
+  toStartOfHour(fromUnixTimestamp64Milli(time)),
   tags
;

Please confirm your ClickHouse version supports alias-based grouping or apply the above change.

go/pkg/clickhouse/schema/databases/004_billing/005_billalble_verifications_v2_mv.sql (1)

2-15: Use explicit expressions instead of alias in GROUP BY for compatibility
Like above, grouping on year and month aliases may not be accepted in all ClickHouse versions. To ensure portability, reference the functions directly:

 GROUP BY
     workspace_id,
-    year,
-    month
+    toYear(time),
+    toMonth(time)
 ;

This guarantees the view compiles across ClickHouse releases.

go/pkg/clickhouse/schema/databases/002_ratelimits/004_ratelimits_per_hour_v1.sql (1)

1-14: Consider adding a PARTITION BY clause for better data lifecycle management
Currently the table is ordered by (workspace_id, namespace_id, time, identifier), but lacks partitioning. Introducing a PARTITION BY toYYYYMMDD(time) (or monthly) can greatly improve merge performance and enable TTL-based cleanup.

 ENGINE = SummingMergeTree()
+PARTITION BY toYYYYMMDD(time)
 ORDER BY (workspace_id, namespace_id, time, identifier)

Partitioning by day (or month) will help manage retention and speed up queries against date ranges.

go/pkg/clickhouse/schema/databases/001_verifications/014_key_verifications_per_month_mv_v2.sql (1)

1-22: Verify column mappings and consider aliasing for clarity

This MV writes into verifications.key_verifications_per_month_v2. Please confirm that the SELECT output column order and names align exactly with the target table’s schema (e.g., (workspace_id, key_space_id, identity_id, key_id, outcome, count, time, tags)). Additionally, consider renaming the count alias to a more descriptive name (e.g., verification_count) to avoid shadowing the SQL function and improve readability.

go/pkg/clickhouse/schema/databases/004_billing/003_billable_verifications_per_month_mv_v1.sql (1)

1-15: Consider renaming count alias for clarity

The view currently uses count(*) AS count, which shadows the built‑in SQL function and can be confusing downstream. It’s recommended to use a descriptive alias such as billable_count or verification_count to clearly convey its intent in the billing schema.

go/pkg/clickhouse/schema/databases/002_ratelimits/008_ratelimits_per_month_v1.sql (1)

1-14: Consider adding partitioning to the SummingMergeTree
Right now the table is unpartitioned, which could lead to a large number of partitions as it grows. For monthly aggregates, adding a PARTITION BY toYYYYMM(time) (or equivalent) can improve query performance and data lifecycle management.

go/pkg/clickhouse/schema/databases/002_ratelimits/009_ratelimits_per_month_mv_v1.sql (1)

1-18: Optional: Partitioning recommendation for materialized view target
Although this is a view, ensure that the target table ratelimits_per_month_v1 (defined in 008_ratelimits_per_month_v1.sql) uses partitioning by month to optimize storage and queries.

go/pkg/clickhouse/schema/databases/002_ratelimits/001_raw_ratelimits_v1.sql (1)

3-4: Consider using DateTime instead of Int64 for time

Storing time as Int64 unix milliseconds works, but consider using ClickHouse's native DateTime/DateTime64 type. This would enable more readable queries and direct use of ClickHouse's time functions without conversion.

-    -- unix milli
-    time          Int64,
+    time          DateTime64(3),  -- millisecond precision
go/pkg/clickhouse/schema/databases/001_verifications/006_key_verifications_per_day_mv_v1.sql (3)

4-11: Avoid shadowing input column and improve clarity

Alias time here shadows the original time field from raw_key_verifications_v1. Consider renaming the alias to something like day_start or date_bucket to clearly signal that this is the bucketed timestamp.


10-11: Rename aggregated count column for clarity

Using count as both the function and the output column name can be confusing. Renaming the alias to verification_count (or similar) will make the schema more self-documenting.


1-20: Consider backfilling existing data

By default, ClickHouse materialized views only capture new inserts. If you need this view to aggregate historical rows already in raw_key_verifications_v1, run a one‑off backfill INSERT into the target table; note that POPULATE cannot be combined with the TO clause this view uses.
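A one-off backfill would replay the raw rows through the same aggregation the view performs. This is a hedged sketch only: the column list is inferred from the per-month table schema noted elsewhere in this review, not taken from the actual `005_key_verifications_per_day_v1.sql`:

```sql
-- Hypothetical one-off backfill; verify columns against the real target
-- table before running. ClickHouse rejects POPULATE on MVs with a TO
-- clause, hence the explicit INSERT ... SELECT.
INSERT INTO verifications.key_verifications_per_day_v1
SELECT
  workspace_id,
  key_space_id,
  identity_id,
  key_id,
  outcome,
  count(*) AS count,
  toStartOfDay(fromUnixTimestamp64Milli(time)) AS time,
  tags
FROM verifications.raw_key_verifications_v1
GROUP BY
  workspace_id,
  key_space_id,
  identity_id,
  key_id,
  outcome,
  toStartOfDay(fromUnixTimestamp64Milli(time)),
  tags;
```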

go/pkg/clickhouse/schema/databases/003_metrics/007_api_requests_per_day_v1.sql (3)

1-11: Add PARTITION BY to manage large partitions

Currently this table will use the primary key as the partitioning key, leading to unbounded partitions over time. For daily aggregates, consider:

PARTITION BY toYYYYMM(time)

to keep monthly partitions manageable.


3-4: Evaluate LowCardinality on high‑repetition columns

Columns like workspace_id and host may have low unique value counts. Marking them as LowCardinality(String) can reduce memory usage and speed up queries:

- workspace_id String,
- host         String,
+ workspace_id LowCardinality(String),
+ host         LowCardinality(String),

1-20: Implement TTL to enforce retention policy

To automatically purge stale metrics, add a TTL clause such as:

TTL time + INTERVAL 90 DAY

This helps control disk usage for rolling‑window analytics.
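Applied to an existing table, the retention policy can be added in place. A sketch, assuming the `time` column is (or is changed to) a Date/DateTime type, which TTL expressions require:

```sql
-- Hypothetical: add a 90-day retention window to the daily metrics table.
ALTER TABLE metrics.api_requests_per_day_v1
  MODIFY TTL time + INTERVAL 90 DAY DELETE;
```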

go/pkg/clickhouse/schema/databases/001_verifications/013_key_verifications_per_month_v2.sql (3)

1-11: Define PARTITION BY for monthly aggregates

Without an explicit partition key, all data lands in a single partition. For a monthly summary table, add:

PARTITION BY toYYYYMM(time)

This keeps partitions aligned with calendar boundaries.


1-10: Optimize tags column with LowCardinality

If individual tag values repeat frequently, using Array(LowCardinality(String)) can significantly reduce index size:

- tags Array(String),
+ tags Array(LowCardinality(String)),

1-14: Add TTL for long‑term data cleanup

Even monthly aggregates can grow unbounded. Consider:

TTL time + INTERVAL 1 YEAR

to automatically remove data older than your retention period.

go/pkg/clickhouse/schema/databases/001_verifications/017_key_verifications_per_day_v3.sql (3)

1-11: Introduce PARTITION BY for daily summary

To avoid a single ever‑growing partition, partition by month (or day):

PARTITION BY toYYYYMM(time)

This optimizes merges and query performance.


1-10: Use LowCardinality on tags array elements

If tag values are drawn from a small set, switch to:

- tags Array(String),
+ tags Array(LowCardinality(String)),

to shrink index size.


1-14: Set a TTL for daily aggregates

Hourly and daily summaries may only be needed short‑term. Add, for example:

TTL time + INTERVAL 30 DAY

to auto‑purge stale buckets.

go/pkg/clickhouse/schema/databases/001_verifications/015_key_verifications_per_hour_v3.sql (3)

1-11: Partition hourly data to control shard size

Define a partition key, for example:

PARTITION BY toYYYYMM(time)

so that each month’s hourly buckets live in their own partitions.


1-10: Apply LowCardinality to tags array if appropriate

For sparse tag sets, use:

- tags Array(String),
+ tags Array(LowCardinality(String)),

to reduce the memory footprint.


1-14: Add TTL for short‑term hourly summaries

Consider:

TTL time + INTERVAL 7 DAY

to automatically expire hourly aggregates after one week.

go/pkg/clickhouse/schema/databases/000_README.md (1)

18-20: Minor formatting inconsistency in the backtick usage.

There's a slight formatting inconsistency in the backtick usage around the prefix examples.

- - `raw_`: Input data tables
- - `tmp_{yourname}_`: Temporary tables for experiments, add your name, so it's easy to identify ownership.
+ - `raw_` - Input data tables
+ - `tmp_{yourname}_` - Temporary tables for experiments, add your name, so it's easy to identify ownership.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~18-~18: Loose punctuation mark.
Context: ...tion][version] ### Prefixes -raw: Input data tables - tmp_{yourname}_`: ...

(UNLIKELY_OPENING_PUNCTUATION)

go/apps/api/integration/multi_node_ratelimiting/run.go (3)

202-217: Fix step numbering in comments.

This section is labeled as "Step 6" but should be "Step 7" since there's already a Step 6 above.

-// Step 6: Verify Clickhouse Metrics Data
+// Step 7: Verify Clickhouse Metrics Data

208-208: Consider using a named query parameter for workspace_id.

This line uses string formatting for the workspace ID, which is inconsistent with the parameterized approach used in the previous query. Consider using named parameters here as well for consistency and to avoid potential SQL injection issues.

-row := h.CH.Conn().QueryRow(ctx, fmt.Sprintf(`SELECT count(*) as total_requests, count(DISTINCT request_id) as unique_requests FROM metrics.raw_api_requests_v1 WHERE workspace_id = '%s';`, h.Resources().UserWorkspace.ID))
+row := h.CH.Conn().QueryRow(ctx, `SELECT count(*) as total_requests, count(DISTINCT request_id) as unique_requests FROM metrics.raw_api_requests_v1 WHERE workspace_id = {workspace_id:String};`, 
+  map[string]string{
+    "workspace_id": h.Resources().UserWorkspace.ID,
+  })

214-214: Simplify boolean expressions by using direct equality checks.

The boolean expression can be simplified by directly comparing the values rather than applying the equality comparison to the result of the comparison.

-return metricsCount == uint64(totalRequests) && uniqueCount == uint64(totalRequests) // nolint:gosec
+return metricsCount == uint64(totalRequests) && uniqueCount == uint64(totalRequests) // nolint:gosec

No actual change needed here, just noting that the expression is already optimized.

go/pkg/testutil/containers/clickhouse.go (3)

17-43: Docs are out‑of‑sync with the actual API

The comment block still promises that the function returns a driver.Conn plus one DSN, but the signature now returns two DSNs and no connection. This can mislead future callers and reviewers.

Please revise the godoc or reinstate the original return value for consistency.


101-122: Retry loop could leak if Ping hangs longer than the context

Minor: the context inside Ping has a 10 s timeout, while the outer Retry has none.
If ClickHouse never comes up the goroutine will keep spinning until the test‑timeout kicks in.
Consider adding a max‑retry duration or wrapping the entire Retry with a parent context to avoid very long hangs.


132-179: SQL splitting by ; is brittle

runClickHouseMigrations naïvely splits on ;, which breaks on:

  • Embedded semicolons in string literals / comments
  • PL/SQL style multiline statements (future‑proofing)

A safer option is to use clickhouse.Conn.Exec with client.MultiStatement enabled or a SQL parser.

If you keep the current approach, guard against empty statements and trim comments.
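A rough sketch of the safer splitting this comment asks for: tracking single-quoted literals and `--` line comments so semicolons inside them don't terminate a statement. This is illustrative only (it ignores backslash escapes and `/* */` block comments) and is not the PR's implementation:

```go
package main

import (
	"fmt"
	"strings"
)

// splitStatements splits a migration script on top-level semicolons,
// skipping semicolons inside single-quoted literals and -- line comments.
// Sketch only: backslash escapes and /* */ comments are not handled.
func splitStatements(script string) []string {
	var stmts []string
	var b strings.Builder
	inString := false
	inComment := false
	for i := 0; i < len(script); i++ {
		c := script[i]
		switch {
		case inComment:
			b.WriteByte(c)
			if c == '\n' {
				inComment = false
			}
		case inString:
			b.WriteByte(c)
			if c == '\'' {
				inString = false
			}
		case c == '\'':
			inString = true
			b.WriteByte(c)
		case c == '-' && i+1 < len(script) && script[i+1] == '-':
			inComment = true
			b.WriteByte(c)
		case c == ';':
			// End of a top-level statement; drop empty fragments.
			if s := strings.TrimSpace(b.String()); s != "" {
				stmts = append(stmts, s)
			}
			b.Reset()
		default:
			b.WriteByte(c)
		}
	}
	if s := strings.TrimSpace(b.String()); s != "" {
		stmts = append(stmts, s)
	}
	return stmts
}

func main() {
	script := "INSERT INTO t VALUES ('a;b');\nSELECT 1;"
	for _, s := range splitStatements(script) {
		fmt.Printf("%q\n", s)
	}
}
```

Driver-side multi-statement execution, where available, remains the more robust option; this only hardens the naive split.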

go/pkg/clickhouse/schema/databases/003_metrics/002_raw_api_requests_v1.sql (2)

4-4: Prefer DateTime64(3) over raw Int64 for timestamps

Storing Unix‑milliseconds as Int64 makes time‑based functions, partition pruning and human querying harder.
DateTime64(3) keeps the same precision, is automatically understood by ClickHouse’s date functions, and compresses better.


29-30: Use IPv6 data type for ip_address

ClickHouse has a dedicated IPv6 (which also stores IPv4) providing better validation and sorting.
Storing as plain String loses this benefit and wastes space.

go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql (2)

6-6: Timestamp column type

Same comment as for the metrics table – switch time Int64 → time DateTime64(3) for native support of ClickHouse date functions.


21-22: Consider Enum8 for outcome

The outcome column has a fixed, known set.
Enum8('VALID'=1,'RATE_LIMITED'=2,'EXPIRED'=3,'DISABLED'=4) is faster, smaller and self‑documenting compared to LowCardinality(String).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fae4bcf and aed62bb.

⛔ Files ignored due to path filters (1)
  • go/go.sum is excluded by !**/*.sum
📒 Files selected for processing (68)
  • go/apps/api/integration/harness.go (4 hunks)
  • go/apps/api/integration/multi_node_ratelimiting/run.go (3 hunks)
  • go/apps/api/routes/v2_ratelimit_limit/200_test.go (9 hunks)
  • go/cmd/api/main.go (2 hunks)
  • go/go.mod (5 hunks)
  • go/pkg/batch/process.go (2 hunks)
  • go/pkg/clickhouse/client.go (4 hunks)
  • go/pkg/clickhouse/interface.go (2 hunks)
  • go/pkg/clickhouse/noop.go (2 hunks)
  • go/pkg/clickhouse/schema/databases/000_README.md (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/001_database.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/002_raw_key_verifications_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/003_key_verifications_per_hour_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/004_key_verifications_per_hour_mv_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/005_key_verifications_per_day_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/006_key_verifications_per_day_mv_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/007_key_verifications_per_month_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/008_key_verifications_per_month_mv_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/009_key_verifications_per_hour_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/010_key_verifications_per_hour_mv_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/011_key_verifications_per_day_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/012_key_verifications_per_day_mv_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/013_key_verifications_per_month_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/014_key_verifications_per_month_mv_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/015_key_verifications_per_hour_v3.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/016_key_verifications_per_hour_mv_v3.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/017_key_verifications_per_day_v3.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/018_key_verifications_per_day_mv_v3.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/019_key_verifications_per_month_v3.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/001_verifications/020_key_verifications_per_month_mv_v3.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/000_database.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/001_raw_ratelimits_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/002_ratelimits_per_minute_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/003_ratelimits_per_minute_mv_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/004_ratelimits_per_hour_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/005_ratelimits_per_hour_mv_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/006_ratelimits_per_day_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/007_ratelimits_per_day_mv_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/008_ratelimits_per_month_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/009_ratelimits_per_month_mv_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/010_ratelimits_last_used_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/002_ratelimits/011_ratelimits_last_used_mv_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/003_metrics/001_database.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/003_metrics/002_raw_api_requests_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/003_metrics/003_api_requests_per_hour_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/003_metrics/004_api_requests_per_hour_mv_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/003_metrics/005_api_requests_per_minute_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/003_metrics/006_api_requests_per_minute_mv_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/003_metrics/007_api_requests_per_day_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/003_metrics/008_api_requests_per_day_mv_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/004_billing/001_database.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/004_billing/002_billable_verifications_per_month_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/004_billing/003_billable_verifications_per_month_mv_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/004_billing/004_billable_verifications_v2.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/004_billing/005_billalble_verifications_v2_mv.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/004_billing/006_billable_ratelimits_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/004_billing/006_billable_ratelimits_v1_mv.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/005_business/001_database.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/005_business/002_active_workspaces_per_month_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/005_business/003_active_workspaces_keys_per_month_mv_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/databases/005_business/003_active_workspaces_ratelimits_per_month_mv_v1.sql (1 hunks)
  • go/pkg/clickhouse/schema/embed.go (1 hunks)
  • go/pkg/clickhouse/select.go (1 hunks)
  • go/pkg/codes/constants_gen.go (1 hunks)
  • go/pkg/testutil/containers/api.go (3 hunks)
  • go/pkg/testutil/containers/clickhouse.go (5 hunks)
  • go/pkg/testutil/http.go (2 hunks)
  • go/pkg/testutil/seed/seed.go (0 hunks)
💤 Files with no reviewable changes (1)
  • go/pkg/testutil/seed/seed.go
🧰 Additional context used
🧬 Code Graph Analysis (2)
go/pkg/clickhouse/client.go (1)
go/pkg/otel/logging/interface.go (1)
  • Logger (11-116)
go/pkg/testutil/containers/clickhouse.go (2)
go/pkg/testutil/containers/containers.go (1)
  • Containers (13-17)
go/pkg/clickhouse/schema/embed.go (1)
  • Migrations (10-10)
🪛 LanguageTool
go/pkg/clickhouse/schema/databases/000_README.md

[uncategorized] ~18-~18: Loose punctuation mark.
Context: ...tion][version] ### Prefixes -raw: Input data tables - tmp_{yourname}_`: ...

(UNLIKELY_OPENING_PUNCTUATION)

🔇 Additional comments (67)
go/pkg/codes/constants_gen.go (1)

2-2: Generated timestamp update is expected.

This file is auto‑generated by generate.go; updating the Generated at timestamp reflects a fresh regeneration. No manual edits are required.

go/pkg/clickhouse/schema/databases/001_verifications/001_database.sql (1)

1-1: Good: Idempotent database creation.

The CREATE DATABASE IF NOT EXISTS verifications; statement correctly ensures the verifications database is only created once. No changes needed.

go/pkg/clickhouse/schema/databases/002_ratelimits/000_database.sql (1)

1-1: Good: Idempotent database creation.

The CREATE DATABASE IF NOT EXISTS ratelimits; statement is correctly formed for idempotent schema setup. No adjustments required.

go/pkg/clickhouse/schema/databases/003_metrics/001_database.sql (1)

1-1: Good: Idempotent database creation.

The CREATE DATABASE IF NOT EXISTS metrics; statement is properly defined, ensuring idempotent application of the migration.

go/pkg/clickhouse/schema/databases/004_billing/001_database.sql (1)

1-2: Looks good, proper usage of IF NOT EXISTS

The SQL statement for creating the billing database is correctly using the IF NOT EXISTS clause to ensure idempotency, which is essential for migrations that may be run multiple times.

go/pkg/clickhouse/noop.go (2)

6-6: Looks good, proper import for ClickHouse driver

Adding the ClickHouse Go driver import supports the new interface method.


43-45: Implementation aligns with interface requirements

The Conn() method correctly implements the interface by returning nil, which is appropriate for a no-op implementation where the actual connection isn't needed.

go/cmd/api/main.go (2)

128-128: Good cleanup removing redundant type conversion

Removing the unnecessary int() type conversion improves code clarity since cmd.Int() already returns an int.


144-144: Good cleanup removing redundant type conversion

Removing the unnecessary int() type conversion improves code clarity since cmd.Int() already returns an int.

go/pkg/clickhouse/interface.go (2)

6-6: Looks good, proper import for ClickHouse driver

Adding the ClickHouse Go driver import supports the new interface method.


31-33: Interface extension enables direct connection access

Adding the Conn() method to the Querier interface provides direct access to the underlying ClickHouse connection, enabling more fine-grained control and the ability to verify exactly how many rows are being inserted - which aligns well with the PR objective of fixing the "4x rows" issue.

go/pkg/clickhouse/schema/embed.go (1)

1-10: Well-structured implementation of file embedding for SQL migrations

This file uses Go's embed package to include SQL migration files directly in the compiled binary, which is a good practice for several reasons:

  1. Ensures migrations are always available alongside the application
  2. Simplifies deployment by eliminating the need for separate files
  3. Makes testing more reliable and portable

The implementation follows Go's standard patterns for file embedding and includes clear documentation.

go/pkg/clickhouse/client.go (1)

145-145: Good performance optimization with increased concurrency

Doubling the number of consumers from 4 to 8 for the ratelimits batch processor should improve throughput. Note that adding consumers is only safe now that each consumer keeps its own local batch state; with the previously shared batch, extra consumers would have amplified the duplicate-row issue this PR fixes.

go/pkg/testutil/http.go (2)

57-57: Improved naming clarity for database connection string.

Renaming from generic dsn to more specific mysqlDSN makes the code more self-documenting and distinguishes it from the ClickHouse DSN added later.


91-99:


✅ Verification successful

Well-structured integration of real ClickHouse client.

The code correctly initializes a real ClickHouse connection instead of a mock, which directly addresses the PR objective of fixing the issue with incorrect row counts sent to ClickHouse. The proper error handling ensures test failures are clear and immediate if the ClickHouse client cannot be created.

Let's verify this test harness is being used in integration tests that check row counts:


🏁 Script executed:

#!/bin/bash
# Find integration tests that might be asserting on ClickHouse row counts
rg -A 5 -B 5 "clickhouse|ClickHouse" --type go go/apps/api/integration/
rg -A 5 -B 5 "assert.*row.*count" --type go go/apps/api/integration/

Length of output: 6453


Integration tests leverage the real ClickHouse client for aggregated counts

  • Found in go/apps/api/integration/harness.go: the test harness calls containerMgr.RunClickHouse() and uses clickhouse.New(...)—exactly the code under review.
  • In go/apps/api/integration/multi_node_ratelimiting/run.go, tests execute clickhouse.Select[...] to retrieve aggregated counts (total_requests, success_count, failure_count), validating row counts indirectly.

These usages confirm that your real ClickHouse initialization in go/pkg/testutil/http.go (lines 91–99) is exercised by integration tests and addresses the original issue. No further changes needed.

go/pkg/clickhouse/schema/databases/004_billing/006_billable_ratelimits_v1.sql (1)

1-10: Well-designed table schema for billable ratelimits aggregation.

This table effectively supports monthly aggregation of ratelimit usage per workspace:

  • The SummingMergeTree engine is an excellent choice as it automatically sums the count column during merges
  • Ordering by workspace_id, year, month enables efficient querying for billing reports
  • The schema is consistent with other billing tables, maintaining a clear pattern

This schema directly supports the PR objective by providing a structured way to track and verify the number of rows inserted for ratelimits.
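The merge-time behavior these comments rely on can be illustrated with a small fold: rows sharing the ORDER BY key are collapsed by summing the numeric columns. A minimal Go sketch of that semantics (hypothetical types, not ClickHouse itself):

```go
package main

import "fmt"

// key mirrors the table's ORDER BY (workspace_id, year, month).
type key struct {
	WorkspaceID string
	Year, Month uint16
}

// row is one inserted record: the ordering key plus the summed column.
type row struct {
	Key   key
	Count int64
}

// summingMerge collapses rows sharing the same key by summing count,
// which is the fold SummingMergeTree applies during background merges.
func summingMerge(rows []row) map[key]int64 {
	out := make(map[key]int64)
	for _, r := range rows {
		out[r.Key] += r.Count
	}
	return out
}

func main() {
	merged := summingMerge([]row{
		{key{"ws_1", 2025, 4}, 10},
		{key{"ws_1", 2025, 4}, 5},
		{key{"ws_2", 2025, 4}, 7},
	})
	fmt.Println(merged[key{"ws_1", 2025, 4}], len(merged)) // 15 2
}
```

This is also why duplicate inserts are hard to spot in aggregated tables: extra rows simply inflate the sums, which is what the integration tests in this PR now assert against.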

go/pkg/clickhouse/schema/databases/004_billing/004_billable_verifications_v2.sql (1)

1-10: Consistent schema design for billable verifications v2.

This table follows the same design principles as the ratelimits table:

  • Uses SummingMergeTree for efficient aggregation
  • Maintains the same column structure and ordering
  • The v2 suffix indicates this is an iteration on an earlier schema version

This schema, along with the test harness changes, will help validate correct insertion counts for verifications.

go/pkg/clickhouse/schema/databases/004_billing/002_billable_verifications_per_month_v1.sql (1)

1-10:


✅ Verification successful

Original billable verifications schema maintained for compatibility.

This v1 table has identical structure to the v2 version, suggesting it's kept for backward compatibility or migration purposes. Maintaining both versions is a good practice during schema evolution to ensure continuous operation.

The presence of both v1 and v2 tables raises a question:


🏁 Script executed:

#!/bin/bash
# Check if there are any materialized views or other objects that still reference the v1 table
rg -A 10 "billable_verifications_per_month_v1" go/pkg/clickhouse/schema/

Length of output: 2491


v1 table usage verified

  • The materialized view at go/pkg/clickhouse/schema/databases/004_billing/003_billable_verifications_per_month_mv_v1.sql writes into billing.billable_verifications_per_month_v1, confirming it’s still in use and must be retained for compatibility.

All set to approve these changes.

go/pkg/clickhouse/schema/databases/002_ratelimits/011_ratelimits_last_used_mv_v1.sql (1)

4-17: ratelimits_last_used_mv_v1: Materialized view definition looks solid
The use of maxSimpleState(time) with grouping on (workspace_id, namespace_id, identifier) correctly captures the last-used timestamp per identifier. No issues detected.

go/pkg/clickhouse/schema/databases/002_ratelimits/003_ratelimits_per_minute_mv_v1.sql (1)

1-17: Materialized view definition looks correct

The ratelimits_per_minute_mv_v1 view accurately aggregates by minute, computes passed and total counts, and writes to ratelimits.ratelimits_per_minute_v1. No issues spotted in the SELECT, grouping, or MV syntax.

go/pkg/clickhouse/schema/databases/001_verifications/018_key_verifications_per_day_mv_v3.sql (1)

1-22: Approve materialized view for daily aggregations
The key_verifications_per_day_mv_v3 correctly selects and groups by all required dimensions including tags and applies toStartOfDay on the millisecond timestamp. This aligns with the v3 schema evolution for daily verifications.

go/pkg/clickhouse/schema/databases/001_verifications/020_key_verifications_per_month_mv_v3.sql (1)

1-23: Approve materialized view for monthly aggregations
The key_verifications_per_month_mv_v3 view correctly mirrors the daily version, using toStartOfMonth and including tags. It follows the established v3 pattern for monthly summarization.

go/pkg/clickhouse/schema/databases/002_ratelimits/009_ratelimits_per_month_mv_v1.sql (1)

1-18: Approve monthly rate-limit materialized view
The ratelimits_per_month_mv_v1 view properly uses countIf(passed > 0) and count(*), grouping by the month-start timestamp. It aligns with existing per-minute, per-hour, and per-day views.

go/pkg/clickhouse/schema/databases/003_metrics/008_api_requests_per_day_mv_v1.sql (1)

1-18: Daily aggregation view for API requests looks correct
This materialized view properly buckets metrics.raw_api_requests_v1 into daily counts by workspace_id, path, response_status, host, method, and toStartOfDay(...). The SQL follows established conventions and should integrate seamlessly.

go/pkg/clickhouse/schema/databases/001_verifications/008_key_verifications_per_month_mv_v1.sql (1)

2-21: Monthly aggregation view for key verifications is accurate
The view groups raw_key_verifications_v1 by all key dimensions and toStartOfMonth(...) for a monthly summary. The syntax and naming align with the other verifications migrations.

go/pkg/clickhouse/schema/databases/001_verifications/010_key_verifications_per_hour_mv_v2.sql (1)

1-22: Hourly aggregation view for key verifications (v2) with tags is correct
This view extends the hourly aggregation to include the tags column, grouping on the same dimensions plus tags. The use of toStartOfHour(...) matches project standards and the grouping is complete.

go/pkg/clickhouse/schema/databases/004_billing/006_billable_ratelimits_v1_mv.sql (1)

2-15: Monthly billing view for rate limits is well-defined
The view filters passed > 0 and sums passed per workspace_id, year, and month using toYear(time) and toMonth(time). This matches billing requirements and is consistent with other aggregate views.

go/pkg/clickhouse/schema/databases/003_metrics/006_api_requests_per_minute_mv_v1.sql (1)

1-18: Per-minute aggregation view for API requests is solid
This view correctly groups raw API requests into one‑minute buckets via toStartOfMinute(fromUnixTimestamp64Milli(time)) and dimensions. The pattern mirrors existing per-hour and per-day views.

go/pkg/clickhouse/schema/databases/003_metrics/004_api_requests_per_hour_mv_v1.sql (1)

1-18: Materialized view for hourly API requests is correctly defined.
The aggregation logic, grouping keys, and timestamp truncation align with the intended design.

go/pkg/clickhouse/schema/databases/002_ratelimits/002_ratelimits_per_minute_v1.sql (1)

1-14: Per-minute ratelimits table schema looks good.
Use of SummingMergeTree and the specified ORDER BY columns is appropriate for incremental counters.

go/pkg/clickhouse/schema/databases/002_ratelimits/005_ratelimits_per_hour_mv_v1.sql (1)

1-18: Hourly ratelimits materialized view is properly constructed.
The countIf for passed requests, total count, and hourly timestamp truncation match the requirements.

go/pkg/clickhouse/schema/databases/001_verifications/012_key_verifications_per_day_mv_v2.sql (1)

1-23: Daily key verifications materialized view is accurate.
Including tags in the grouping and using toStartOfDay on the timestamp is correct for v2.

go/pkg/clickhouse/schema/databases/001_verifications/004_key_verifications_per_hour_mv_v1.sql (1)

1-21: Hourly key verifications materialized view implementation is sound.
The grouping on the truncated hour and count aggregation aligns with the raw data structure.

go/pkg/clickhouse/schema/databases/001_verifications/003_key_verifications_per_hour_v1.sql (1)

1-13: Well-designed hourly aggregation table with appropriate engine choice

This ClickHouse table is well-structured for hourly verification aggregations. The SummingMergeTree engine is ideal for this use case as it will automatically sum the count column during merges. The column ordering in the PRIMARY KEY supports efficient filtering by workspace and key space first, followed by time-based access patterns.

The use of LowCardinality for the outcome field is a good optimization for a column that likely has few distinct values.

go/pkg/clickhouse/schema/databases/001_verifications/007_key_verifications_per_month_v1.sql (1)

1-13: Consistent table design following the established pattern

This monthly aggregation table maintains the same well-designed structure as the hourly table, creating a consistent pattern across different time granularities. The identical column structure and SummingMergeTree engine configuration promote code maintainability and predictable query patterns.

The versioned naming convention (_v1) also indicates good planning for future schema evolution.

go/pkg/clickhouse/schema/databases/002_ratelimits/001_raw_ratelimits_v1.sql (2)

1-14: Well-structured raw events table with appropriate indexing

This table design for raw rate limit events uses appropriate indexing strategies with both the primary ORDER BY and secondary indices to support common query patterns. The MergeTree engine is a good choice for raw event storage.


16-19: Good use of secondary indices

The secondary indices on (workspace_id, time) and request_id will help optimize different query patterns beyond what the primary key ordering provides.

go/pkg/batch/process.go (3)

105-107: Improved thread safety by localizing batch state

Moving the batch slice from being a struct field to a local variable within the process method is an excellent refactoring. This ensures each consumer goroutine has its own independent batch state, eliminating potential race conditions.


109-116: Updated flushAndReset to use localized batch

The flushAndReset function has been correctly updated to work with the localized batch variable.


119-134: Updated main processing loop

The main processing loop has been properly updated to use the localized batch variable, maintaining the same batching logic while improving thread safety.

go/pkg/clickhouse/schema/databases/001_verifications/006_key_verifications_per_day_mv_v1.sql (1)

1-3: Validate MV target table existence and schema alignment

Ensure that the destination table verifications.key_verifications_per_day_v1 already exists and its column order and types exactly match the SELECT list (workspace_id, key_space_id, identity_id, key_id, outcome, count, time). A mismatch will cause the materialized view to fail at runtime.

go/pkg/clickhouse/schema/databases/001_verifications/019_key_verifications_per_month_v3.sql (1)

1-14: Well-structured aggregation table for monthly key verifications

This is a well-designed table with appropriate use of the SummingMergeTree engine, which will automatically sum the count column during merges. The ORDER BY clause includes all dimensions needed for efficient querying of verification metrics by workspace, key space, identity, key, and time, with tags and outcome at the end.

The inclusion of tags as an Array(String) adds important flexibility for multi-dimensional analysis.

go/pkg/clickhouse/schema/databases/003_metrics/003_api_requests_per_hour_v1.sql (1)

1-20: Well-designed hourly API metrics table with appropriate indexing

The table structure effectively supports hourly API request analytics with:

  • Good use of LowCardinality for the method column to optimize storage
  • Clear comments explaining the method column format
  • Appropriate use of SummingMergeTree for aggregation
  • Logical column ordering in the ORDER BY clause that prioritizes workspace_id and time

This will enable efficient querying of API usage patterns across multiple dimensions.

go/pkg/clickhouse/schema/databases/003_metrics/005_api_requests_per_minute_v1.sql (1)

1-20: Properly structured minute-level API metrics table

This table mirrors the hourly metrics table design but at minute-level granularity, which is appropriate for more detailed analytics. The table structure and indexing strategy are consistent with the hourly table, maintaining a good balance between query performance and storage efficiency.

go/pkg/clickhouse/schema/databases/001_verifications/005_key_verifications_per_day_v1.sql (1)

1-13: Appropriate daily verification metrics table design

This v1 table design provides a good structure for day-level verification metrics. The SummingMergeTree engine is appropriate for this use case, and the ORDER BY clause efficiently organizes data for common query patterns focusing on workspace, key space, and time.

Note that this v1 table doesn't include the tags array that exists in the v3 monthly table, which is expected for a v1 schema version.

go/apps/api/integration/harness.go (6)

8-8: Proper import of the ClickHouse package

Adding the ClickHouse import is necessary for the integration test harness to use ClickHouse functionality.


27-27: Good addition of ClickHouse client to the Harness struct

Adding the CH field of type clickhouse.ClickHouse to the Harness struct allows tests to interact with ClickHouse directly.


47-56: Proper ClickHouse container setup with error handling

The code correctly:

  1. Starts a ClickHouse container with migrations
  2. Creates a real ClickHouse client with the host DSN
  3. Uses a no-op logger appropriate for tests
  4. Checks for errors during client creation

This ensures ClickHouse is properly configured for integration tests.


57-57: Appropriate MySQL setup after ClickHouse initialization

Starting MySQL after ClickHouse is a logical sequence, as it ensures both databases are available before the test harness is fully initialized.


73-75: Proper field initialization in the Harness struct

The MySQL DSN and ClickHouse client are correctly assigned to the Harness struct fields, making them available for tests.


80-84: Well-structured API configuration with both database connections

The API configuration now properly includes both MySQL and ClickHouse DSNs, which allows the API to connect to both databases during tests. This change addresses the PR objective of enabling verification that the correct number of rows are sent to ClickHouse.

go/pkg/clickhouse/schema/databases/001_verifications/011_key_verifications_per_day_v2.sql (1)

1-14: Table schema looks well-designed for daily verification aggregations.

The table follows good ClickHouse design practices:

  • Uses SummingMergeTree engine which is appropriate for aggregation tables
  • Includes LowCardinality(String) for outcome which optimizes for repeated string values
  • Properly orders data by all dimensions to support efficient querying patterns
  • Includes tags as an Array type for flexible categorization
go/pkg/clickhouse/schema/databases/000_README.md (1)

1-72: Comprehensive naming convention documentation.

This README provides clear and structured guidance for maintaining consistency in the ClickHouse schema. The conventions are well-defined and the examples effectively illustrate each pattern.


go/pkg/clickhouse/schema/databases/001_verifications/009_key_verifications_per_hour_v2.sql (1)

1-14: Table schema looks well-designed for hourly verification aggregations.

The table structure maintains consistency with other time-based aggregation tables while providing hourly granularity:

  • Uses identical column structure and types as the daily aggregation table
  • Implements the same composite ordering key
  • Appropriately uses SummingMergeTree for aggregation
go/apps/api/integration/multi_node_ratelimiting/run.go (3)

14-14: LGTM: Added ClickHouse import.

Added import for the ClickHouse package to support new verification steps.


134-137: LGTM: Improved variable naming.

Renamed variable from err to callErr to avoid shadowing the earlier err variable.


166-201: LGTM: Added ClickHouse verification for rate limit data.

This Step 6 adds thorough verification of rate limit data in ClickHouse:

  • Uses a well-structured aggregatedCounts type for query results
  • Implements polling with require.Eventually to handle async writes
  • Verifies total requests, successes and failures match expected counts
go/pkg/clickhouse/select.go (2)

9-61: Great documentation! Extensive and clear explanations.

The documentation for this function is excellent. It covers:

  • Purpose and functionality
  • Parameter substitution and SQL injection prevention
  • Requirements for struct tags
  • Context usage for timeouts
  • Error handling expectations
  • Example usage with real code
  • Memory considerations for large result sets
  • Thread-safety information

This level of detail makes the function self-documenting and assists future developers in using it correctly.


62-74: Clean implementation with proper error handling

The implementation is concise and follows the principle of minimal code for maximum functionality. The error handling is appropriate, wrapping and forwarding any errors from the underlying driver.
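The struct-tag contract this documentation describes can be sketched with reflection: column names resolve to struct fields via `ch` tags. The scanRows helper below is a simplified, hypothetical stand-in (the real Select scans driver rows rather than plain maps), with illustrative field names:

```go
package main

import (
	"fmt"
	"reflect"
)

// scanRows maps rows of column values into a slice of T using `ch` struct
// tags, sketching the tag contract a generic Select enforces.
func scanRows[T any](cols []string, rows [][]any) ([]T, error) {
	var zero T
	t := reflect.TypeOf(zero)
	// Build a column name -> struct field index mapping from `ch` tags.
	fieldFor := make(map[string]int)
	for i := 0; i < t.NumField(); i++ {
		if tag := t.Field(i).Tag.Get("ch"); tag != "" {
			fieldFor[tag] = i
		}
	}
	out := make([]T, 0, len(rows))
	for _, r := range rows {
		v := reflect.New(t).Elem()
		for ci, col := range cols {
			fi, ok := fieldFor[col]
			if !ok {
				return nil, fmt.Errorf("no field tagged ch:%q", col)
			}
			v.Field(fi).Set(reflect.ValueOf(r[ci]))
		}
		out = append(out, v.Interface().(T))
	}
	return out, nil
}

// aggregatedCounts mirrors the shape used by the integration tests.
type aggregatedCounts struct {
	Total  int64 `ch:"total_requests"`
	Passed int64 `ch:"success_count"`
	Failed int64 `ch:"failure_count"`
}

func main() {
	rows, err := scanRows[aggregatedCounts](
		[]string{"total_requests", "success_count", "failure_count"},
		[][]any{{int64(10), int64(7), int64(3)}},
	)
	fmt.Println(err, rows[0].Passed) // <nil> 7
}
```

Missing or misspelled tags fail loudly at scan time rather than silently producing zero values, which is the failure mode struct-tag documentation usually needs to call out.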

go/apps/api/routes/v2_ratelimit_limit/200_test.go (3)

13-14: Good addition of ClickHouse imports

The addition of ClickHouse-related imports aligns with the PR objective of integrating ClickHouse for data verification in tests.


65-109: Excellent implementation of ClickHouse verification test

This new test case effectively validates that rate limit events are properly persisted in ClickHouse. The test:

  1. Creates a namespace with a unique identifier
  2. Sends a rate limit request
  3. Queries ClickHouse using the new Select generic function
  4. Verifies the event data was correctly stored

The use of require.Eventually is appropriate for handling potential asynchronous data flushing to ClickHouse.


325-338: Good refactoring with helper function

Extracting the namespace creation logic into a helper function improves code maintainability and reduces duplication across test cases. This is a positive change that follows the DRY principle.

go/pkg/testutil/containers/api.go (3)

22-26: Good struct design for API configuration

Creating a structured configuration type is a better approach than passing multiple parameters. This design:

  1. Makes the API more maintainable
  2. Allows for future expansion without changing function signatures
  3. Makes the code more readable by grouping related parameters
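The advantage over a positional signature can be shown in a few lines; apiConfig and runAPI below are hypothetical names, not the repository's actual types:

```go
package main

import "fmt"

// apiConfig groups the parameters the test container needs. Adding a field
// later (e.g. another DSN) won't ripple through call sites the way a new
// positional parameter would.
type apiConfig struct {
	Nodes         int
	MysqlDSN      string
	ClickhouseDSN string
}

func runAPI(cfg apiConfig) string {
	// A positional signature runAPI(nodes int, mysqlDSN, chDSN string) forces
	// every caller to change when a parameter is added; a struct does not.
	return fmt.Sprintf("nodes=%d mysql=%s ch=%s", cfg.Nodes, cfg.MysqlDSN, cfg.ClickhouseDSN)
}

func main() {
	fmt.Println(runAPI(apiConfig{Nodes: 2, MysqlDSN: "mysql://test", ClickhouseDSN: "clickhouse://test"}))
}
```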

28-28: Clean function signature refactoring

Updating the function signature to use the new config struct maintains backward compatibility while improving the API design.


56-75: Proper ClickHouse integration

The addition of the ClickHouse DSN to the environment variables ensures that the API containers can connect to ClickHouse. This integration is implemented correctly.

go/go.mod (2)

16-16: Appropriate addition of goose for database migrations

Adding github.com/pressly/goose/v3 is appropriate for managing database migrations, which aligns with the introduction of ClickHouse schema migrations.


22-22:


❓ Verification inconclusive

Dependency version update

Updating github.com/urfave/cli/v3 to v3.2.0 is likely necessary to support other changes in the codebase. Make sure to test for any breaking changes in the CLI functionality.


🏁 Script executed:

#!/bin/bash
# Check for any breaking changes in urfave/cli v3.2.0
gh api repos/urfave/cli/releases/tags/v3.2.0 --jq '.body' | grep -i "break"

Length of output: 73


Action Required: Verify CLI v3.2.0 for breaking changes
An automated scan of the GitHub release notes for “break” returned no matches, but keyword searches can miss subtler API changes. Please manually review the CHANGELOG or full release notes and exercise your CLI commands to ensure nothing has regressed.

• File: go/go.mod (line 22)

  github.com/urfave/cli/v3 v3.2.0

@github-actions
Contributor

github-actions bot commented Apr 22, 2025

Thank you for following the naming conventions for pull request titles! 🙏

@vercel vercel bot temporarily deployed to Preview – engineering April 22, 2025 14:50 Inactive
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
go/pkg/testutil/containers/clickhouse.go (2)

142-177: Enhanced migration process using embedded filesystem

The migration process has been improved by using the embedded filesystem from the schema package. Some suggestions:

  1. Consider adding a comment that explicitly states migrations are applied in lexicographic order (as guaranteed by fs.WalkDir).
  2. It might be worth adding logging for each migration being applied.
func runClickHouseMigrations(conn driver.Conn) error {
+	// Migrations are applied in lexicographic order as guaranteed by fs.WalkDir
	return fs.WalkDir(schema.Migrations, ".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}

+		// Apply migration from file
+		fmt.Printf("Applying ClickHouse migration: %s\n", path)
		f, err := schema.Migrations.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()

170-173: Verify SQL statement execution

Your code appends a semicolon to each SQL statement before execution. While this works, it could produce double semicolons if a statement already ends with one. Consider checking for a trailing semicolon before appending another (e.g. with strings.HasSuffix, which requires importing strings).

-			err = conn.Exec(context.Background(), fmt.Sprintf("%s;", query))
+			// Add semicolon if not already present
+			if !strings.HasSuffix(query, ";") {
+				query = query + ";"
+			}
+			err = conn.Exec(context.Background(), query)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aed62bb and 58bf585.

📒 Files selected for processing (1)
  • go/pkg/testutil/containers/clickhouse.go (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
go/pkg/testutil/containers/clickhouse.go (2)
go/pkg/testutil/containers/containers.go (1)
  • Containers (13-17)
go/pkg/clickhouse/schema/embed.go (1)
  • Migrations (10-10)
🔇 Additional comments (2)
go/pkg/testutil/containers/clickhouse.go (2)

74-78: Well done fixing the migration execution order issue!

The code now correctly runs migrations before closing the connection, addressing the critical issue identified in the previous review. Good addition of execution timing information for diagnostics.

Also applies to: 127-128


86-87: Nice improvement with Docker network integration

The container is now explicitly connected to the Docker network, and you're providing two DSN strings - one for host access and another for Docker network access. This facilitates inter-container communication in your test environment.

Also applies to: 96-97

@chronark chronark enabled auto-merge April 23, 2025 07:03
@chronark chronark disabled auto-merge April 24, 2025 05:36
@chronark chronark merged commit e7d3ed7 into main Apr 24, 2025
25 of 34 checks passed
@chronark chronark deleted the ratelimit-deduplication branch April 24, 2025 05:36
@vercel vercel bot temporarily deployed to Preview – engineering April 24, 2025 05:37 Inactive
@vercel vercel bot temporarily deployed to Preview – dashboard April 24, 2025 05:37 Inactive
@coderabbitai coderabbitai bot mentioned this pull request Jul 7, 2025
