
feat: ingesting analytics tags #2698

Merged
chronark merged 8 commits into main from verify-analytics-tags
Dec 5, 2024

Conversation

@chronark (Collaborator) commented Dec 3, 2024

Summary by CodeRabbit

  • New Features

    • Introduced new jobs for ClickHouse migration in the deployment workflow.
    • Added support for tags in verification requests, enhancing data aggregation capabilities.
    • New tasks for resetting ClickHouse migrations and improved error handling in verification processes.
  • Bug Fixes

    • Updated SQL queries to ensure accurate data retrieval and aggregation.
  • Documentation

    • Added a new section on "Tags" in the API documentation for better user guidance.
  • Tests

    • Enhanced testing coverage for tag functionality and materialized views in ClickHouse.
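The tags feature summarized above attaches free-form strings to each verification request so they can later be used for aggregation and filtering. As a rough sketch of what a request body carrying tags might look like — the `apiId`/`key` values are placeholders, and the constraints (1–128 characters per tag, at most 10 tags) come from the schema change reviewed further down:

```typescript
// Hypothetical shape of a verifyKey request body that includes analytics tags.
// Tag strings are free-form; a "key:value" convention makes later filtering
// easier but is not enforced by the schema.
interface VerifyKeyRequest {
  apiId: string;
  key: string;
  tags?: string[];
}

const body: VerifyKeyRequest = {
  apiId: "api_123", // placeholder
  key: "uk_abc",    // placeholder
  tags: ["path:/v1/posts", "method:GET", "plan:free"],
};

console.log(body.tags?.length); // 3
```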

@changeset-bot bot commented Dec 3, 2024

⚠️ No Changeset found

Latest commit: 93e5a49

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@vercel bot commented Dec 3, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name        | Status   | Updated (UTC)
dashboard   | ✅ Ready | Dec 5, 2024 1:22pm
engineering | ✅ Ready | Dec 5, 2024 1:22pm
play        | ✅ Ready | Dec 5, 2024 1:22pm
www         | ✅ Ready | Dec 5, 2024 1:22pm

@coderabbitai bot (Contributor) commented Dec 3, 2024

📝 Walkthrough

Walkthrough

The pull request introduces significant modifications to the GitHub Actions workflows and ClickHouse database management. New jobs for ClickHouse migrations have been added to the deployment workflow, enhancing the structure of job dependencies. Additionally, new workflows for handling ClickHouse migrations are introduced. The schema for verification data is updated to include a tags field, with corresponding SQL migrations and changes to the application logic to support this feature. Error handling improvements in the ClickHouse client and enhanced test coverage for the new functionalities are also included.

Changes

File Change Summary
.github/workflows/deploy.yaml Added jobs: clickhouse_migration_preview, clickhouse_migration_production; modified dependencies for existing jobs.
.github/workflows/job_clickhouse_migration_preview.yaml New workflow for ClickHouse migration with steps for code checkout, dependency installation, and migration deployment.
.github/workflows/job_clickhouse_migration_production.yaml New workflow for production ClickHouse migration with similar steps as preview.
.github/workflows/job_deploy_api_production.yaml Workflow name changed to "ClickHouse Migration"; updated secret requirement for CLOUDFLARE_API_TOKEN.
Taskfile.yml Added new task migrate-clickhouse-reset for resetting ClickHouse migrations.
apps/api/src/pkg/auth/root_key.ts Modified rootKeyAuth to include tags: [] in analytics event.
apps/api/src/routes/v1_keys_verifyKey.ts Added optional tags field to request body schema for verifyKey route.
apps/docs/apis/features/analytics.mdx Updated documentation to include a section on "Tags" for verification requests.
apps/docs/mint.json Added new property codeBlock with mode: "auto" in JSON configuration.
apps/docs/package.json Updated mintlify package version from ^4.0.182 to ^4.0.289.
internal/clickhouse/schema/027_add_tags_to_verifications.raw_key_verifications_v1.sql Added tags column to raw_key_verifications_v1 table.
internal/clickhouse/schema/028_create_verifications.key_verifications_per_hour_v2.sql Created new table key_verifications_per_hour_v2 with tags column.
internal/clickhouse/schema/029_create_verifications.key_verifications_per_hour_mv_v2.sql Created materialized view key_verifications_per_hour_mv_v2.
internal/clickhouse/schema/030_create_verifications.key_verifications_per_day_v2.sql Created new table key_verifications_per_day_v2 with tags column.
internal/clickhouse/schema/031_create_verifications.key_verifications_per_day_mv_v2.sql Created materialized view key_verifications_per_day_mv_v2.
internal/clickhouse/schema/032_create_verifications.key_verifications_per_month_v2.sql Created new table key_verifications_per_month_v2 with tags column.
internal/clickhouse/schema/033_create_verifications.key_verifications_per_month_mv_v2.sql Created materialized view key_verifications_per_month_mv_v2.
internal/clickhouse/schema/034_billing_read_from_verifications.key_verifications_per_month_v2.sql Updated billing materialized view to source from key_verifications_per_month_v2.
internal/clickhouse/schema/035_business_update_active_workspaces_keys_per_month_mv_v1_read_from_verifications.key_verifications_per_month_v2.sql Updated query for active workspaces materialized view to source from key_verifications_per_month_v2.
internal/clickhouse/src/client/client.ts Enhanced error handling in query method; introduced try-catch block.
internal/clickhouse/src/insert_verifications.test.ts Updated test for inserting verifications to include tags and improved error handling.
internal/clickhouse/src/latest_verifications.ts Updated getLatestVerifications to include tags in SQL query and response schema.
internal/clickhouse/src/testutil.ts Updated ClickHouseContainer class to include optional keepContainer parameter in start method.
internal/clickhouse/src/verification_tags.test.ts Added tests for verifying tags functionality and materialized view aggregations.
internal/clickhouse/src/verifications.ts Enhanced schema and SQL queries for verification data to include tags.
internal/clickhouse/src/verifications_billing.test.ts Updated test case to include tags in verification objects and improved error handling.
internal/clickhouse/vitest.config.ts Added new Vitest configuration file for testing parameters.
packages/error/src/errors/base.ts Added message property to BaseError class and updated toString method.
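The verifyKey schema change listed above bounds tags to at most 10 strings of 1–128 characters each. A minimal plain-TypeScript sketch of that validation, written without zod so the constraints are explicit; the function name is ours, not part of the PR:

```typescript
// Mirrors the documented constraints: each tag 1-128 chars, at most 10 tags.
function validateTags(tags: unknown): string[] {
  if (!Array.isArray(tags)) {
    throw new Error("tags must be an array of strings");
  }
  if (tags.length > 10) {
    throw new Error("at most 10 tags are allowed");
  }
  for (const tag of tags) {
    if (typeof tag !== "string" || tag.length < 1 || tag.length > 128) {
      throw new Error(`invalid tag: ${String(tag)}`);
    }
  }
  return tags;
}
```

In the actual route this is expressed declaratively via the zod request schema; the imperative version is shown only to make the limits concrete.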

Possibly related PRs

Suggested labels

Feature, 🕹️ oss.gg, :joystick: 300 points, hacktoberfest

Suggested reviewers

  • mcstepp
  • perkinsjr
  • MichaelUnkey
  • ogzhanolguncu

@github-actions bot commented Dec 3, 2024

Thank you for following the naming conventions for pull request titles! 🙏

@coderabbitai bot (Contributor) left a comment


Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (20)
internal/clickhouse/schema/028_create_verifications.key_verifications_per_hour_v2.sql (2)

13-14: Consider performance implications of ORDER BY clause.

Including the tags array in the ORDER BY clause might impact insert performance and storage efficiency, especially with large arrays. Consider whether this ordering is necessary for your query patterns.


1-15: Consider adding a TTL policy.

For time-series analytics data, it's recommended to define a TTL policy to automatically manage data retention and storage costs.

Example addition:

 ENGINE = SummingMergeTree()
 ORDER BY (workspace_id, key_space_id, identity_id, key_id, time, tags)
+TTL time + INTERVAL 90 DAY DELETE
internal/clickhouse/schema/032_create_verifications.key_verifications_per_month_v2.sql (1)

13-14: Consider performance impact of tags in ORDER BY clause

Including the tags array in the ORDER BY clause could impact query performance and storage efficiency when there are many unique tag combinations. Consider whether it's necessary to have tags as a sorting key, or if it could be moved to a PROJECTION if needed for specific queries.

internal/clickhouse/schema/029_create_verifications.key_verifications_per_hour_mv_v2.sql (2)

1-23: LGTM! Consistent with other time-based aggregations

The materialized view maintains consistency with daily and monthly aggregations:

  • Same dimension columns and types
  • Proper time bucketing for hourly granularity
  • Consistent aggregation logic

Consider adding a PROJECTION with a different sort order (e.g., by time first) if you need to optimize for time-range queries, as the current sort order in the target table might not be optimal for such queries.


1-23: Solid overall analytics schema design

The combination of these three migrations creates a well-structured analytics system with:

  • Consistent schema across hour/day/month granularities
  • Efficient use of ClickHouse features (SummingMergeTree, MVs)
  • Proper handling of the new tags dimension

A few considerations for the future:

  1. Monitor the cardinality of the tags array to ensure it doesn't impact performance
  2. Consider adding skip_tag_tracking hint if certain queries don't need tags
  3. Plan for data retention policies based on different time granularities
.github/workflows/job_clickhouse_migration_production.yaml (2)

22-23: Fix typo in step name

There's a typo in the step name for installing goose.

-      - name: Install gooes
+      - name: Install goose

26-31: Enhance migration deployment robustness

The migration step could benefit from additional error handling and validation:

  1. Add a timeout to prevent long-running migrations
  2. Validate migration success
       - name: Deploy
-        run: goose up
+        run: |
+          # Validate database connection
+          goose status || exit 1
+          
+          # Apply migrations with timeout
+          timeout 5m goose up
+          
+          # Verify migrations
+          goose status | grep -q "no pending migrations" || exit 1
         working-directory: internal/clickhouse/schema
         env:
           GOOSE_DRIVER: clickhouse
           GOOSE_DB_STRING: ${{ secrets.CLICKHOUSE_URL }}
+        timeout-minutes: 6
packages/error/src/errors/base.ts (2)

7-7: Consider removing duplicate message storage

The message is stored twice:

  1. In the parent Error class via super(opts.message)
  2. In the BaseError class via this.message

Since Error.prototype.message is already available, storing it again in BaseError is redundant.

-  public readonly message: string;

   constructor(opts: {
     message: string;
     cause?: BaseError;
     context?: TContext;
   }) {
     super(opts.message);
-    this.message = opts.message;
     this.cause = opts.cause;
     this.context = opts.context;
   }

Also applies to: 16-16


7-7: Add JSDoc documentation for the message property

Adding documentation will improve code maintainability and IDE support.

+  /** The error message describing what went wrong */
   public readonly message: string;
.github/workflows/job_clickhouse_migration_preview.yaml (1)

1-31: Consider workflow reusability improvements

The preview and production workflows are nearly identical, differing only in environment names. Consider creating a reusable workflow to reduce duplication.

Create a new file .github/workflows/job_clickhouse_migration.yaml:

name: ClickHouse Migration
on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string
    secrets:
      CLICKHOUSE_URL:
        required: true

jobs:
  deploy:
    permissions:
      contents: read
    environment: ${{ inputs.environment }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Install
        uses: ./.github/actions/install
        with:
          go: true
      
      - name: Install goose
        run: go install github.com/pressly/goose/v3/cmd/goose@latest
      
      - name: Deploy
        run: |
          # Validate database connection
          goose status || exit 1
          
          # Apply migrations with timeout
          timeout 5m goose up
          
          # Verify migrations
          goose status | grep -q "no pending migrations" || exit 1
        working-directory: internal/clickhouse/schema
        env:
          GOOSE_DRIVER: clickhouse
          GOOSE_DB_STRING: ${{ secrets.CLICKHOUSE_URL }}
        timeout-minutes: 6

Then update both workflow files to call this reusable workflow with their respective environments.

Taskfile.yml (1)

29-35: Add a warning comment about data loss.

The migrate-clickhouse-reset task looks good, but consider adding a comment warning about potential data loss when running this task.

Here's a suggested improvement:

  migrate-clickhouse-reset:
+   # WARNING: This task will reset all migrations and clear all data
    env:
      GOOSE_DRIVER: clickhouse
      GOOSE_DBSTRING: "tcp://default:password@127.0.0.1:9000"
      GOOSE_MIGRATION_DIR: ./internal/clickhouse/schema
    cmds:
      - goose down-to 0
internal/clickhouse/src/insert_verifications.test.ts (1)

22-22: Add assertion for tags in verification response.

The test improvements look good, but you should also verify that the tags are correctly returned in the response.

Add this assertion after line 41:

    expect(latestVerifications.val![0].time).toBe(verification.time);
    expect(latestVerifications.val![0].outcome).toBe("VALID");
    expect(latestVerifications.val![0].region).toBe(verification.region);
+   expect(latestVerifications.val![0].tags).toEqual(verification.tags);

Also applies to: 24-24, 27-28

internal/clickhouse/src/verifications_billing.test.ts (1)

42-42: LGTM! Consider adding edge case tests for tags.

The random tag generation aligns with the documented limit of 10 tags.

Consider adding specific test cases for:

  • Empty tags array
  • Maximum length tags (128 chars)
  • Tags with special characters
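The edge cases suggested above lend themselves to a small table-driven fixture; the values here are illustrative, assuming the documented limits of 10 tags and 128 characters per tag:

```typescript
// Candidate tag arrays for edge-case tests: empty, max-length, special chars.
const tagEdgeCases: string[][] = [
  [],                           // empty tags array
  ["x".repeat(128)],            // maximum documented tag length
  ["path:/v1/posts?id=1&x=2"],  // special characters
];

// Every fixture should stay within the documented limits.
for (const tags of tagEdgeCases) {
  console.assert(tags.length <= 10, "too many tags");
  console.assert(tags.every((t) => t.length <= 128), "tag too long");
}
```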
internal/clickhouse/src/testutil.ts (1)

33-36: Consider documenting the keepContainer option.

The new option improves test flexibility, but its purpose and use cases should be documented.

Add a comment explaining when to use keepContainer: true, for example:

  static async start(
    t: TaskContext,
+   // Set keepContainer to true when you need the container to persist
+   // after the test, e.g., for debugging or running multiple tests
+   // against the same database instance
    opts?: { keepContainer?: boolean },
  ): Promise<ClickHouseContainer> {
apps/docs/apis/features/analytics.mdx (2)

30-30: Fix grammatical issues in the documentation.

Apply these corrections:

  • Add a comma after "For example"
  • Change "an example tags" to "example tags"
-For example you might want to add the path
+For example, you might want to add the path

-Here's an example tags for a fictional blog API:
+Here's example tags for a fictional blog API:

Also applies to: 46-46

🧰 Tools
🪛 LanguageTool

[typographical] ~30-~30: After the expression ‘for example’ a comma is usually used.
Context: ...gate or filter data when querying. For example you might want to add the path of your ...

(COMMA_FOR_EXAMPLE)


79-81: Consider adding estimated timeline for querying capabilities.

The documentation clearly states that only tag ingestion is available. Consider adding an estimated timeline or suggesting users to watch the GitHub repository for updates about querying capabilities.

internal/clickhouse/src/client/client.ts (1)

43-54: Improved error handling implementation looks good!

The new try-catch block provides better error handling granularity and more detailed error messages. Consider adding error logging before returning the QueryError to help with debugging in production.

 } catch (err) {
   const message = err instanceof Error ? err.message : JSON.stringify(err);
+  console.error("ClickHouse query error:", { error: err, query: req.query });
   return Err(new QueryError(`Unable to query clickhouse: ${message}`, { query: req.query }));
 }
internal/clickhouse/src/verification_tags.test.ts (1)

6-50: Good test coverage for tag insertion

The test covers various tag scenarios well. Consider adding edge cases:

  • Empty tags array
  • Tags with special characters
  • Maximum length tags
  • Duplicate tags
 const tagsCases: Array<Array<string>> = [
+  [], // Empty tags
   ["key1:val1"],
   ["key1:val1", "key2:val2"],
+  ["special:!@#$%^&*()"], // Special characters
+  ["a".repeat(255)], // Max length
+  ["duplicate", "duplicate"], // Duplicates
   Array.from({ length: 100 })
     .map((_, i) => `tag_${i}`)
     .sort(),
 ];
.github/workflows/deploy.yaml (1)

132-132: Remove unnecessary empty lines

There are extra empty lines that can be removed to maintain consistent spacing.

-
    uses: ./.github/workflows/job_deploy_api_production.yaml
-
    uses: ./.github/workflows/job_deploy_api_enterprise.yaml

Also applies to: 174-174

apps/api/src/routes/v1_keys_verifyKey.ts (1)

34-48: Consider adding tag format validation

While the length constraints are good, consider adding format validation for tags to ensure consistent analytics:

  1. Validate key-value format if that's the intended usage (example shows "key=value" pattern)
  2. Consider restricting characters to prevent injection or parsing issues
 tags: z
   .array(z.string().min(1).max(128))
+  .refine((tags) => tags.every((tag) => /^[\w-]+=[^=]+$/.test(tag)), {
+    message: "Tags must follow the format 'key=value' with alphanumeric characters and hyphens for keys"
+  })
   .max(10)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 9ffce49 and 7e6c34d.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (28)
  • .github/workflows/deploy.yaml (2 hunks)
  • .github/workflows/job_clickhouse_migration_preview.yaml (1 hunks)
  • .github/workflows/job_clickhouse_migration_production.yaml (1 hunks)
  • .github/workflows/job_deploy_api_production.yaml (2 hunks)
  • Taskfile.yml (1 hunks)
  • apps/api/src/pkg/auth/root_key.ts (1 hunks)
  • apps/api/src/routes/v1_keys_verifyKey.ts (2 hunks)
  • apps/docs/apis/features/analytics.mdx (1 hunks)
  • apps/docs/mint.json (1 hunks)
  • apps/docs/package.json (1 hunks)
  • internal/clickhouse/schema/027_add_tags_to_verifications.raw_key_verifications_v1.sql (1 hunks)
  • internal/clickhouse/schema/028_create_verifications.key_verifications_per_hour_v2.sql (1 hunks)
  • internal/clickhouse/schema/029_create_verifications.key_verifications_per_hour_mv_v2.sql (1 hunks)
  • internal/clickhouse/schema/030_create_verifications.key_verifications_per_day_v2.sql (1 hunks)
  • internal/clickhouse/schema/031_create_verifications.key_verifications_per_day_mv_v2.sql (1 hunks)
  • internal/clickhouse/schema/032_create_verifications.key_verifications_per_month_v2.sql (1 hunks)
  • internal/clickhouse/schema/033_create_verifications.key_verifications_per_month_mv_v2.sql (1 hunks)
  • internal/clickhouse/schema/034_billing_read_from_verifications.key_verifications_per_month_v2.sql (1 hunks)
  • internal/clickhouse/schema/035_business_update_active_workspaces_keys_per_month_mv_v1_read_from_verifications.key_verifications_per_month_v2.sql (1 hunks)
  • internal/clickhouse/src/client/client.ts (1 hunks)
  • internal/clickhouse/src/insert_verifications.test.ts (1 hunks)
  • internal/clickhouse/src/latest_verifications.ts (2 hunks)
  • internal/clickhouse/src/testutil.ts (2 hunks)
  • internal/clickhouse/src/verification_tags.test.ts (1 hunks)
  • internal/clickhouse/src/verifications.ts (5 hunks)
  • internal/clickhouse/src/verifications_billing.test.ts (1 hunks)
  • internal/clickhouse/vitest.config.ts (1 hunks)
  • packages/error/src/errors/base.ts (2 hunks)
✅ Files skipped from review due to trivial changes (4)
  • apps/api/src/pkg/auth/root_key.ts
  • internal/clickhouse/vitest.config.ts
  • apps/docs/package.json
  • internal/clickhouse/schema/030_create_verifications.key_verifications_per_day_v2.sql
🧰 Additional context used
🪛 LanguageTool
apps/docs/apis/features/analytics.mdx

[typographical] ~30-~30: After the expression ‘for example’ a comma is usually used.
Context: ...gate or filter data when querying. For example you might want to add the path of your ...

(COMMA_FOR_EXAMPLE)


[grammar] ~46-~46: The plural noun “tags” cannot be used with the article “an”. Did you mean “an example tag”?
Context: ...you to record these in the tags. Here's an example tags for a fictional blog API: ```json [ ...

(A_NNS)

🔇 Additional comments (19)
internal/clickhouse/schema/027_add_tags_to_verifications.raw_key_verifications_v1.sql (1)

1-11: LGTM! Verify impact on existing queries.

The migration is well-structured with proper up/down paths and uses appropriate safety checks (IF EXISTS/IF NOT EXISTS). The Array(String) type with DEFAULT [] is a good choice for tags.

Let's verify if any existing queries might be affected by this schema change:

✅ Verification successful

Let me check the materialized view definitions to see if they need to be updated for the new tags column.


The schema change is safe - materialized views already include the tags column

The materialized views (v2 versions) that read from raw_key_verifications_v1 already include the tags column in both their SELECT and GROUP BY clauses, indicating this schema change was anticipated and properly handled in advance.

  • key_verifications_per_day_mv_v2
  • key_verifications_per_hour_mv_v2
  • key_verifications_per_month_mv_v2
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for queries that might need updating with the new tags column
rg -g '*.sql' 'FROM\s+verifications\.raw_key_verifications_v1' -A 5

Length of output: 3948


Script:

#!/bin/bash
# Get the full content of the materialized views to check their SELECT statements
cat internal/clickhouse/schema/031_create_verifications.key_verifications_per_day_mv_v2.sql \
    internal/clickhouse/schema/029_create_verifications.key_verifications_per_hour_mv_v2.sql \
    internal/clickhouse/schema/033_create_verifications.key_verifications_per_month_mv_v2.sql

Length of output: 1796

internal/clickhouse/schema/035_business_update_active_workspaces_keys_per_month_mv_v1_read_from_verifications.key_verifications_per_month_v2.sql (1)

1-16: Verify data consistency before applying in production.

Before switching the materialized view to read from key_verifications_per_month_v2, ensure that:

  1. All historical data has been migrated
  2. The new table contains equivalent or better data quality

Let's verify the data consistency:

✅ Verification successful

Let me gather more information about these tables and their relationship.


Let me check the migration sequence and data population strategy for v2.


Migration is safe to proceed

The verification concern is resolved based on the following findings:

  • The v2 table is populated directly from raw data through materialized view key_verifications_per_month_mv_v2
  • Data consistency is maintained as both v1 and v2 read from the same source raw_key_verifications_v1
  • The migration is part of a coordinated sequence (032-035) that properly handles the transition
  • The schema maintains all necessary fields for the materialized view's query
  • Previous migrations (034) have already validated this data path for billing purposes
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for data migration scripts between v1 and v2
rg -g '*.sql' 'INSERT.*INTO.*key_verifications_per_month_v2.*FROM.*key_verifications_per_month_v1'

# Look for any validation queries
rg -g '*.sql' 'verification.*month.*v[12].*count|sum|compare'

Length of output: 348


Script:

#!/bin/bash
# Look for the creation of v2 table and its structure
rg -g '*.sql' -B 5 -A 5 'CREATE.*key_verifications_per_month_v2'

# Look for any other migrations related to v2 table
fd -e sql | xargs grep -l 'key_verifications_per_month_v2'

# Check if there are any other materialized views or tables depending on v1
rg -g '*.sql' 'FROM.*key_verifications_per_month_v1'

Length of output: 2115


Script:

#!/bin/bash
# Check the content of v2 table creation and related migrations
cat internal/clickhouse/schema/032_create_verifications.key_verifications_per_month_v2.sql
cat internal/clickhouse/schema/033_create_verifications.key_verifications_per_month_mv_v2.sql
cat internal/clickhouse/schema/034_billing_read_from_verifications.key_verifications_per_month_v2.sql

Length of output: 1864

internal/clickhouse/schema/032_create_verifications.key_verifications_per_month_v2.sql (1)

1-15: LGTM! Well-structured table definition

The table structure is well-designed with:

  • Appropriate use of SummingMergeTree engine for aggregations
  • Correct data types for all columns
  • Proper up/down migrations
internal/clickhouse/schema/031_create_verifications.key_verifications_per_day_mv_v2.sql (1)

1-23: LGTM! Well-structured materialized view

The materialized view is correctly defined with:

  • Proper time bucketing using toStartOfDay
  • Consistent GROUP BY clause matching dimensions
  • Correct aggregation of count

Let's verify the target table exists:

✅ Verification successful

Materialized view structure aligns perfectly with target table

The materialized view is correctly defined with matching:

  • All required columns in correct types
  • Proper aggregation using SummingMergeTree engine
  • Consistent ordering columns
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the target table exists in other migration files
rg -l "CREATE TABLE verifications.key_verifications_per_day_v2" internal/clickhouse/schema/

Length of output: 178


Script:

#!/bin/bash
# Let's verify the target table structure to ensure it matches the MV
cat internal/clickhouse/schema/030_create_verifications.key_verifications_per_day_v2.sql

Length of output: 551

internal/clickhouse/schema/033_create_verifications.key_verifications_per_month_mv_v2.sql (2)

1-23: LGTM! Well-structured materialized view

The materialized view is well-structured with:

  • Proper time-based aggregation using toStartOfMonth
  • Comprehensive grouping fields for analytics
  • Correct inclusion of count aggregation

1-23: Consider performance impact of grouping by tags

The materialized view includes tags in both SELECT and GROUP BY clauses. Since tags can have high cardinality, this might impact the view's performance and storage requirements. Consider monitoring the view's performance after deployment.

internal/clickhouse/schema/034_billing_read_from_verifications.key_verifications_per_month_v2.sql (2)

18-33: LGTM! Well-structured rollback mechanism

The down migration correctly restores the original table source while maintaining the same aggregation logic.


1-15: Verify data consistency during migration

While the migration logic is correct, ensure there's a plan to handle the transition period:

  • Verify no billing data is lost during the switch
  • Confirm both v1 and v2 tables contain consistent data before the switch
✅ Verification successful

Migration path is complete and consistent

The verification shows that all necessary views have corresponding migration scripts:

  • Migration 034 updates billing.billable_verifications_per_month_mv_v1 to read from v2
  • Migration 035 updates business.active_workspaces_keys_per_month_mv_v1 to read from v2
  • Both migrations include proper rollback scripts
  • The aggregation logic and data filtering remain consistent across versions

The data consistency during migration is maintained as:

  1. The v2 table was created first (025)
  2. Views are updated sequentially (034, 035)
  3. All views maintain the same aggregation logic and essential fields
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for other billing-related views that might need similar updates
rg -l "FROM verifications.key_verifications_per_month_v1" --type sql

Length of output: 555


Script:

#!/bin/bash
# Let's check the content of these files to understand if they're already updated
echo "=== Checking 013_create_billing_billable_verifications_per_month_mv_v1.sql ==="
cat internal/clickhouse/schema/013_create_billing_billable_verifications_per_month_mv_v1.sql

echo -e "\n=== Checking 035_business_update_active_workspaces_keys_per_month_mv_v1_read_from_verifications.key_verifications_per_month_v2.sql ==="
cat internal/clickhouse/schema/035_business_update_active_workspaces_keys_per_month_mv_v1_read_from_verifications.key_verifications_per_month_v2.sql

echo -e "\n=== Checking 025_create_billable_verifications_v2.sql ==="
cat internal/clickhouse/schema/025_create_billable_verifications_v2.sql

echo -e "\n=== Checking 023_create_business_active_workspaces_per_month_mv_v1.sql ==="
cat internal/clickhouse/schema/023_create_business_active_workspaces_per_month_mv_v1.sql

Length of output: 3265

.github/workflows/job_deploy_api_production.yaml (1)

4-7: LGTM! Good security practice

Explicitly marking CLOUDFLARE_API_TOKEN as required is a good security practice.

internal/clickhouse/src/latest_verifications.ts (1)

17-17: LGTM! Verify schema compatibility.

The addition of the tags field to both the SQL query and schema validation looks correct. The implementation properly handles tags as an array of strings.

Let's verify the schema compatibility with the ClickHouse table definition:

Also applies to: 29-29

✅ Verification successful

Schema compatibility verified successfully

The tags column is correctly defined as Array(String) with a default empty array value in the ClickHouse table, which matches the schema implementation in the TypeScript code.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify tags column definition in ClickHouse schema
# Expected: Column 'tags' should be defined as Array(String)

# Search for tags column definition in migration files
fd -e sql . internal/clickhouse/schema -x grep -l "tags.*Array"

Length of output: 409


Script:

#!/bin/bash
# Let's check the actual column definition in the migration files
cat internal/clickhouse/schema/027_add_tags_to_verifications.raw_key_verifications_v1.sql

Length of output: 310

internal/clickhouse/src/verifications_billing.test.ts (1)

46-47: LGTM! Improved error handling for insertions.

Good addition of error checking for the insert operation.

internal/clickhouse/src/testutil.ts (1)

47-51: LGTM! Clean implementation of conditional cleanup.

The conditional container cleanup is well implemented and maintains backward compatibility.

internal/clickhouse/src/verifications.ts (3)

27-27: LGTM: Tags schema implementation with sorting

The tags implementation as a sorted string array is a good choice. Sorting the tags ensures consistent storage and comparison.

Also applies to: 40-40
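The benefit of sorting can be sketched in a few lines; `normalizeTags` is an illustrative name, not the actual schema code:

```typescript
// Hypothetical sketch of the sorting idea: normalizing tag order so that
// ["b","a"] and ["a","b"] compare and aggregate identically.
function normalizeTags(tags: string[]): string[] {
  return [...tags].sort(); // copy first to avoid mutating the caller's array
}

const a = normalizeTags(["region=us-east-1", "path=/v1/keys"]);
const b = normalizeTags(["path=/v1/keys", "region=us-east-1"]);
console.log(JSON.stringify(a) === JSON.stringify(b)); // true
```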


61-70: Query updates look good with improved time range inclusivity

The query correctly includes tags in both SELECT and GROUP BY clauses. The change from < to <= for the end time makes the range more intuitive by including the end timestamp.
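The boundary change can be illustrated with plain timestamps (the values below are arbitrary, not from the PR):

```typescript
// Illustrative only: how switching the end-time comparison from < to <=
// changes which time buckets a range query matches.
const buckets = [1700000000000, 1700003600000, 1700007200000]; // ms timestamps
const end = 1700007200000;

const exclusive = buckets.filter((t) => t < end);  // old behaviour: end excluded
const inclusive = buckets.filter((t) => t <= end); // new behaviour: end included

console.log(exclusive.length, inclusive.length); // 2 3
```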


92-101: Changes mirror hourly query updates

internal/clickhouse/src/verification_tags.test.ts (1)

52-64: Well-structured test setup for materialized views

Good approach testing all view types with various tag combinations. The test cases provide good coverage of different tag scenarios.

.github/workflows/deploy.yaml (1)

175-187: LGTM: Well-structured ClickHouse migration jobs

The migration jobs are properly sequenced in the deployment pipeline with appropriate dependencies:

  • Preview migrations run after local tests
  • Production migrations run after preview tests and staging deployment
apps/docs/mint.json (1)

416-418: LGTM: Valid Mintlify configuration

The codeBlock configuration with auto mode is correctly added to the documentation configuration.

apps/api/src/routes/v1_keys_verifyKey.ts (1)

381-381: LGTM: Proper analytics integration

The tags are correctly passed to analytics with a safe fallback to an empty array when tags are not provided.
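The fallback pattern looks roughly like this; the type and function names are illustrative, not the actual route handler code:

```typescript
// Sketch of the safe-fallback pattern for optional tags.
type VerifyRequest = { key: string; tags?: string[] };

function analyticsTags(req: VerifyRequest): string[] {
  return req.tags ?? []; // empty array when no tags were sent
}

console.log(analyticsTags({ key: "k1" }).length); // 0
console.log(analyticsTags({ key: "k1", tags: ["path=/v1/posts"] })[0]);
```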

@@ -1,11 +1,11 @@
-name: Deploy API Production
+name: ClickHouse Migration

⚠️ Potential issue

Workflow name appears inconsistent with its purpose

The workflow is named "ClickHouse Migration" but contains API deployment steps. This might cause confusion for other developers. Consider either:

  1. Keeping the original name "Deploy API Production" if this is primarily for API deployment
  2. Adding ClickHouse migration steps if this is intended for database migrations

@@ -0,0 +1,31 @@
name: ClickHouse Migration Preview

⚠️ Potential issue

Fix workflow name inconsistency

The workflow name "ClickHouse Migration Preview" doesn't match its purpose, as it's using production configurations. This could lead to confusion.

-name: ClickHouse Migration Preview
+name: ClickHouse Migration Production

Comment on lines +10 to +14
jobs:
  deploy:
    environment: Preview
    runs-on: ubuntu-latest
    steps:

🛠️ Refactor suggestion

Add explicit permissions declaration

GitHub Actions follows the principle of least privilege. It's recommended to explicitly declare required permissions.

 jobs:
   deploy:
+    permissions:
+      contents: read
     environment: Preview
     runs-on: ubuntu-latest

@@ -0,0 +1,31 @@
name: ClickHouse Migration Production

⚠️ Potential issue

Fix workflow name inconsistency

The workflow name "ClickHouse Migration Production" doesn't match its purpose, as it's using preview configurations.

-name: ClickHouse Migration Production
+name: ClickHouse Migration Preview

Comment on lines +119 to +132
   sum(count) as count,
+  tags
 FROM verifications.key_verifications_per_month_v2
 WHERE
   workspace_id = {workspaceId: String}
   AND key_space_id = {keySpaceId: String}
   AND time >= fromUnixTimestamp64Milli({start: Int64})
-  AND time < fromUnixTimestamp64Milli({end: Int64})
+  AND time <= fromUnixTimestamp64Milli({end: Int64})
   ${args.keyId ? "AND key_id = {keyId: String}" : ""}
-GROUP BY time, outcome
+GROUP BY time, outcome, tags
 ORDER BY time ASC
 WITH FILL
-  FROM toStartOfMonth(fromUnixTimestamp64Milli({start: Int64}))
-  TO toStartOfMonth(fromUnixTimestamp64Milli({end: Int64}))
+  FROM toStartOfDay(fromUnixTimestamp64Milli({start: Int64}))
+  TO toStartOfDay(fromUnixTimestamp64Milli({end: Int64}))

💡 Codebase verification

Time granularity mismatch detected between view and query

The materialized view key_verifications_per_month_v2 aggregates data using toStartOfMonth, but the query's FILL clause uses toStartOfDay. This creates an inconsistency:

  • The view stores monthly data: toStartOfMonth(fromUnixTimestamp64Milli(time)) AS time
  • The query fills daily gaps: FROM toStartOfDay(...) TO toStartOfDay(...)

This will lead to incorrect results as you cannot meaningfully fill daily gaps in monthly aggregated data.
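A sketch of a FILL clause whose granularity matches the monthly view, written as a query string in the style of verifications.ts; the STEP clause is an assumption about the intended fix, not the merged code:

```typescript
// Hedged sketch: keep the FILL boundaries and step at monthly granularity
// so gaps are filled at the same resolution the view aggregates at.
const monthlyFill = `
  ORDER BY time ASC
  WITH FILL
    FROM toStartOfMonth(fromUnixTimestamp64Milli({start: Int64}))
    TO toStartOfMonth(fromUnixTimestamp64Milli({end: Int64}))
    STEP INTERVAL 1 MONTH`;

console.log(monthlyFill.includes("toStartOfMonth")); // true
```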

🔗 Analysis chain

Verify time granularity change in monthly query

While the tags integration looks good, the change from toStartOfMonth to toStartOfDay in the FILL clause seems inconsistent with the monthly aggregation purpose.


Let me gather more context about the specific query and its usage.


Let me check the materialized view definition to understand how the data is aggregated at the source.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if this change was intentional by looking for related changes or discussions
rg -A 5 "toStartOfMonth|toStartOfDay.*MONTH" 

Length of output: 4075


Script:

#!/bin/bash
# Check the materialized view definition and the query usage
ast-grep --pattern 'key_verifications_per_month_v2'

# Also check for any GROUP BY time patterns
rg -B2 -A2 "GROUP BY.*time" internal/clickhouse/src/verifications.ts

Length of output: 668


Script:

#!/bin/bash
# Check the materialized view definition
cat internal/clickhouse/schema/033_create_verifications.key_verifications_per_month_mv_v2.sql

Length of output: 607

@ogzhanolguncu
Contributor

Other than that 10-request part, everything looks good to me. The reason for flagging the 10-request part was that people might assume they are limited to 10 tag variants, etc. Maybe I'm overthinking.

Member

@perkinsjr perkinsjr left a comment


See comment from oz in the docs

@chronark
Copy link
Collaborator Author

chronark commented Dec 5, 2024

agreed, fixed it, let's merge

@chronark chronark requested a review from perkinsjr December 5, 2024 12:51

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (4)
apps/docs/apis/features/analytics.mdx (4)

30-30: Add missing comma after "For example"

Add a comma for better readability.

-For example you might want to add the path of your API as a tag:
+For example, you might want to add the path of your API as a tag:
🧰 Tools
🪛 LanguageTool

[typographical] ~30-~30: After the expression ‘for example’ a comma is usually used.
Context: ...gate or filter data when querying. For example you might want to add the path of your ...

(COMMA_FOR_EXAMPLE)


38-38: Fix typo in "beneficial"

-That being said, having some structure for your tags could be benefitial for you.
+That being said, having some structure for your tags could be beneficial for you.

46-46: Fix grammar in example introduction

-Here's an example tags for a fictional blog API:
+Here are example tags for a fictional blog API:
🧰 Tools
🪛 LanguageTool

[grammar] ~46-~46: The plural noun “tags” cannot be used with the article “an”. Did you mean “an example tag”?
Context: ...you to record these in the tags. Here's an example tags for a fictional blog API: ```json [ ...

(A_NNS)


81-81: Add comma for better readability

-We're working on new query capabilities including filtering and aggregating by tags.
+We're working on new query capabilities, including filtering and aggregating by tags.
🧰 Tools
🪛 LanguageTool

[uncategorized] ~81-~81: Possible missing comma found.
Context: ...s possible. We're working on new query capabilities including filtering and aggregating by ...

(AI_HYDRA_LEO_MISSING_COMMA)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7e6c34d and 93e5a49.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (2)
  • apps/docs/apis/features/analytics.mdx (1 hunks)
  • internal/clickhouse/vitest.config.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/clickhouse/vitest.config.ts
🧰 Additional context used
🪛 LanguageTool
apps/docs/apis/features/analytics.mdx

[typographical] ~30-~30: After the expression ‘for example’ a comma is usually used.
Context: ...gate or filter data when querying. For example you might want to add the path of your ...

(COMMA_FOR_EXAMPLE)


[grammar] ~46-~46: The plural noun “tags” cannot be used with the article “an”. Did you mean “an example tag”?
Context: ...you to record these in the tags. Here's an example tags for a fictional blog API: ```json [ ...

(A_NNS)


[uncategorized] ~81-~81: Possible missing comma found.
Context: ...s possible. We're working on new query capabilities including filtering and aggregating by ...

(AI_HYDRA_LEO_MISSING_COMMA)

🔇 Additional comments (2)
apps/docs/apis/features/analytics.mdx (2)

35-35: Clarify the request limit

Based on previous discussions, let's make the limitation clearer to avoid confusion.

-You can specify up to 10 tags per request.
+You can specify up to 10 tags per verification request.

62-74: LGTM! Clear and well-structured example

The curl example effectively demonstrates how to include tags in the request, with good syntax highlighting and relevant line highlighting.

Comment on lines +49 to +56
```json
[
"path=/v1/posts.createPost",
"postId=post_1asofijqknslkfqWF",
"region=us-east-1",
"apiVersion=f8ad21bd", // a git sha of your deployment or semver
]
```

⚠️ Potential issue

Remove comment from JSON example

JSON doesn't support comments. Consider moving the explanation to the text above the example.

[
  "path=/v1/posts.createPost",
  "postId=post_1asofijqknslkfqWF",
  "region=us-east-1",
-  "apiVersion=f8ad21bd", // a git sha of your deployment or semver
+  "apiVersion=f8ad21bd"
]

Add the explanation above the JSON example:

Here are example tags for a fictional blog API (where apiVersion represents a git sha or semver of your deployment):
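The key=value convention from the docs can also be composed programmatically; the `tag()` helper below is hypothetical, not part of the SDK:

```typescript
// Illustrative helper for composing key=value tags like the docs example.
function tag(key: string, value: string): string {
  return `${key}=${value}`;
}

const tags = [
  tag("path", "/v1/posts.createPost"),
  tag("region", "us-east-1"),
  tag("apiVersion", "f8ad21bd"), // a git sha of your deployment or semver
];
console.log(tags[2]); // apiVersion=f8ad21bd
```

Keeping the comment in code (rather than in the JSON literal) sidesteps the JSON-comments problem entirely.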
