Feat/clickhouse mv support#69

Merged
Vunovati merged 3 commits into main from feat/clickhouse-mv-support on Feb 27, 2026

@Vunovati
Collaborator

@Vunovati Vunovati commented Feb 27, 2026

Summary by CodeRabbit

  • New Features

    • Materialized view–based metric discovery to speed up metric listing, with automatic fallback to full table scans when views are unavailable.
    • New API to generate the required materialized-view schema for deployment.
  • Tests

    • Comprehensive tests covering MV fast-paths, fallbacks, schema validation, multi-tenant isolation, and MV population/backfill scenarios.

@coderabbitai

coderabbitai Bot commented Feb 27, 2026

Walkthrough

Adds materialized-view (MV) based metric discovery: new MV DDL generator, MV detection and MV-backed discovery queries in the datasource, fallback to existing full-table-scan queries, and extensive tests covering MV paths, validation, and multi-tenant isolation.

Changes

| Cohort / File(s) | Summary |
|---|---|
| MV Schema Module: packages/clickhouse-datasource/src/discover-mv-schema.ts | New export getDiscoverMVSchema(database: string) that validates the DB identifier and returns DDL arrays for discovery target tables and materialized views. |
| Query builders & constants: packages/clickhouse-datasource/src/query-metrics.ts | Adds METRIC_TABLES, TABLE_MAP derived from it, DISCOVER_NAMES_TABLE, DISCOVER_ATTRS_TABLE, and new builders buildDetectDiscoverMVQuery() and buildDiscoverMetricsFromMV() for MV detection and MV-backed discovery queries. |
| Datasource integration & public API: packages/clickhouse-datasource/src/datasource.ts, packages/clickhouse-datasource/src/index.ts | Datasource now checks for MV tables via hasDiscoverMVs() and attempts the MV fast-path (buildDiscoverMetricsFromMV) with parallel execution; falls back to legacy buildDiscoverMetricsQueries() on failure. Public index now exports getDiscoverMVSchema. |
| Tests: packages/clickhouse-datasource/src/datasource.test.ts | Adds extensive tests for MV fast-path, MV population/backfill, MV fallback cases, schema validation (SQL injection, empty/leading-digit names), attribute handling, and MV multi-tenant isolation; ensures test isolation by dropping MV discovery tables. |
| Release metadata: .changeset/vast-cats-own.md | New changeset declaring a patch release and noting MV-based discovery as a feature. |
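
To make the schema-module row concrete, here is a minimal sketch of how a DDL generator could validate the database identifier before interpolating it. It is not the PR's implementation: the function name sketchDiscoverMVSchema, the target table discover_names_sketch, the MetricName and MetricType columns, and the ReplacingMergeTree engine are all assumptions chosen for illustration.

```typescript
// Hypothetical sketch: table names, columns, and engine are assumptions,
// not the DDL shipped in this PR.
const IDENTIFIER_RE = /^[A-Za-z_][A-Za-z0-9_]*$/;

interface DiscoverMVSchemaSketch {
  targetTables: string[];
  materializedViews: string[];
}

// Two of the source tables named in the review, paired with their metric type.
const SKETCH_SOURCES = [
  { type: "Gauge", table: "otel_metrics_gauge" },
  { type: "Sum", table: "otel_metrics_sum" },
];

function sketchDiscoverMVSchema(database: string): DiscoverMVSchemaSketch {
  // Reject empty names, leading digits, and anything containing SQL
  // punctuation before the value is interpolated into DDL.
  if (!IDENTIFIER_RE.test(database)) {
    throw new Error(`Invalid database identifier: ${JSON.stringify(database)}`);
  }

  const target = `${database}.discover_names_sketch`;
  const targetTables = [
    `CREATE TABLE IF NOT EXISTS ${target} ` +
      `(MetricName String, MetricType LowCardinality(String)) ` +
      `ENGINE = ReplacingMergeTree ORDER BY (MetricName, MetricType)`,
  ];
  // One materialized view per source table, all writing into the same target.
  const materializedViews = SKETCH_SOURCES.map(
    ({ type, table }) =>
      `CREATE MATERIALIZED VIEW IF NOT EXISTS ${database}.${table}_names_mv ` +
      `TO ${target} AS SELECT MetricName, '${type}' AS MetricType ` +
      `FROM ${database}.${table}`,
  );
  return { targetTables, materializedViews };
}
```

Returning the target-table statements separately from the view statements lets a caller create the targets first and the views second, which is the setup order the later review comment refers to.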

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant Datasource as Datasource
    participant MVDetector as MV Detector
    participant QueryBuilder as Query Builder
    participant ClickHouse as ClickHouse DB

    Client->>Datasource: discoverMetrics(request)
    Datasource->>MVDetector: hasDiscoverMVs()
    MVDetector->>ClickHouse: buildDetectDiscoverMVQuery()
    ClickHouse-->>MVDetector: detection rows (names/attrs tables exist?)
    MVDetector-->>Datasource: detection result

    alt MV Tables Found
        Datasource->>QueryBuilder: buildDiscoverMetricsFromMV()
        QueryBuilder-->>Datasource: namesQuery + attributesQuery
        Datasource->>ClickHouse: execute namesQuery & attributesQuery (parallel)
        ClickHouse-->>Datasource: names + attributes (MV fast path)
        Datasource-->>Client: discovered metrics (from MV)
    else MV Tables Not Found or MV query fails
        Datasource->>QueryBuilder: buildDiscoverMetricsQueries()
        QueryBuilder-->>Datasource: legacy namesQuery + attributesQuery
        Datasource->>ClickHouse: execute legacy queries
        ClickHouse-->>Datasource: names + attributes (full-scan)
        Datasource-->>Client: discovered metrics (fallback)
    end
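
The flow in the diagram can be sketched independently of the ClickHouse client. The version below is only an illustration under assumptions: the DiscoverBackend type and discoverWithFallback function are hypothetical names, and the three callbacks stand in for hasDiscoverMVs(), the MV-backed queries, and the legacy full-scan queries.

```typescript
// Hypothetical sketch of the detect -> fast path -> fallback flow.
type Rows = { names: string[]; attributes: string[] };
type DiscoverResult = Rows & { source: "mv" | "full-scan" };

type DiscoverBackend = {
  hasMVs: () => Promise<boolean>; // stands in for hasDiscoverMVs()
  queryMV: () => Promise<Rows>; // stands in for the MV-backed queries
  queryFullScan: () => Promise<Rows>; // stands in for the legacy queries
};

async function discoverWithFallback(
  backend: DiscoverBackend,
): Promise<DiscoverResult> {
  let useMV = false;
  try {
    useMV = await backend.hasMVs();
  } catch {
    // Detection failed: treat the MVs as unavailable.
    useMV = false;
  }

  if (useMV) {
    try {
      const rows = await backend.queryMV();
      return { ...rows, source: "mv" };
    } catch {
      // MV query failed: fall through to the full-scan path.
    }
  }

  // Errors from the full-scan path are not caught; there is nothing left to fall back to.
  const rows = await backend.queryFullScan();
  return { ...rows, source: "full-scan" };
}
```

Only the MV path is wrapped in a catch; a failure of the full-scan path still rejects, so callers see a real outage instead of an empty result.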

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Feat/clickhouse read datasource #55 — Modifies ClickHouse datasource discovery flow and query builders; likely related to adding MV-based discovery on top of the same components.

Poem

🐰 Hop-hop, I scoped the MV scene,

DDL sprouted, tidy and clean,
Fast-path zips when tables are found,
Fallback hums when they aren't around,
A rabbit cheers discovery keen!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 62.50%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title 'Feat/clickhouse mv support' is partially related to the changeset: it references MV (materialized view) support, which is a key aspect of the changes, but it is overly broad and lacks specificity about the main change (adding MV-based discovery optimization for metrics). | Consider a more specific title such as 'Add materialized view support for metrics discovery optimization' to better convey the primary change and improve clarity when scanning commit history. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/clickhouse-datasource/src/datasource.ts (1)

238-256: ⚠️ Potential issue | 🟠 Major

Add error handling to degrade gracefully when MV detection or queries fail.

The current code bypasses the fallback to buildDiscoverMetricsQueries() if hasDiscoverMVs() or the subsequent queries throw. Any error (network, auth, timeout) will reject discoverMetrics entirely, despite a full-scan fallback being available. Wrap MV detection and queries in try-catch to attempt the fallback on failure.

💡 Suggested hardening (fallback-on-error)
-    // Detect MV tables and choose fast or fallback path
-    const useMV = await this.hasDiscoverMVs({ username, password, database });
-    const { namesQuery, attributesQuery } = useMV
-      ? buildDiscoverMetricsFromMV()
-      : buildDiscoverMetricsQueries();
-
-    const [nameRows, attrRows] = await Promise.all([
-      this.client
-        .query({ query: namesQuery, format: "JSONEachRow", auth, http_headers })
-        .then((rs) => streamParse(rs, chDiscoverNameRowSchema)),
-      this.client
-        .query({
-          query: attributesQuery,
-          format: "JSONEachRow",
-          auth,
-          http_headers,
-        })
-        .then((rs) => streamParse(rs, chDiscoverAttrRowSchema)),
-    ]);
+    const runDiscoverQueries = async (queries: {
+      namesQuery: string;
+      attributesQuery: string;
+    }) =>
+      Promise.all([
+        this.client
+          .query({
+            query: queries.namesQuery,
+            format: "JSONEachRow",
+            auth,
+            http_headers,
+          })
+          .then((rs) => streamParse(rs, chDiscoverNameRowSchema)),
+        this.client
+          .query({
+            query: queries.attributesQuery,
+            format: "JSONEachRow",
+            auth,
+            http_headers,
+          })
+          .then((rs) => streamParse(rs, chDiscoverAttrRowSchema)),
+      ]);
+
+    let useMV = false;
+    try {
+      useMV = await this.hasDiscoverMVs({ username, password, database });
+    } catch {
+      useMV = false;
+    }
+
+    let nameRows: z.output<typeof chDiscoverNameRowSchema>[];
+    let attrRows: z.output<typeof chDiscoverAttrRowSchema>[];
+    try {
+      [nameRows, attrRows] = await runDiscoverQueries(
+        useMV ? buildDiscoverMetricsFromMV() : buildDiscoverMetricsQueries()
+      );
+    } catch (error) {
+      if (!useMV) throw error;
+      [nameRows, attrRows] = await runDiscoverQueries(
+        buildDiscoverMetricsQueries()
+      );
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/clickhouse-datasource/src/datasource.ts` around lines 238 - 256,
Wrap the MV detection and MV-query execution in a try-catch inside
discoverMetrics so failures degrade to the full-scan path: call
hasDiscoverMVs(...) and, only if it succeeds and returns true, run the MV
queries built by buildDiscoverMetricsFromMV() (executed via
this.client.query(...) and streamParse(...)); if any of hasDiscoverMVs or the MV
queries throw, catch the error, log it, set useMV=false and fall back to
buildDiscoverMetricsQueries() to build namesQuery/attributesQuery and run those
queries instead—ensure the rest of the function uses the resolved
nameRows/attrRows from the fallback path when MV path fails.
🧹 Nitpick comments (2)
packages/clickhouse-datasource/src/datasource.test.ts (1)

11-11: Consider reusing shared discover table constants in tests.

The same discover table names are hardcoded in several places. Importing DISCOVER_NAMES_TABLE / DISCOVER_ATTRS_TABLE would reduce drift risk if names change.

Also applies to: 1400-1404, 1526-1530, 1544-1546

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/clickhouse-datasource/src/datasource.test.ts` at line 11, Tests
hardcode the discover table names; import and reuse the shared constants
DISCOVER_NAMES_TABLE and DISCOVER_ATTRS_TABLE instead of string literals to
avoid drift. Edit the test file to import them from the module that defines
them (e.g. import { DISCOVER_NAMES_TABLE, DISCOVER_ATTRS_TABLE } from
"./query-metrics.js") and replace every hardcoded table-name string
(including the occurrences around the referenced ranges) with the corresponding
constant (DISCOVER_NAMES_TABLE or DISCOVER_ATTRS_TABLE) wherever table names are
asserted or used.
packages/clickhouse-datasource/src/discover-mv-schema.ts (1)

21-27: Avoid maintaining metric table inventory in two places.

Line 21 defines METRIC_TABLES, but the same mapping already exists in packages/clickhouse-datasource/src/query-metrics.ts. If one side changes, MV DDL generation can silently fall out of sync. Consider extracting a shared source-of-truth constant/module consumed by both files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/clickhouse-datasource/src/discover-mv-schema.ts` around lines 21 -
27, Extract the metric table inventory into a single shared exported constant
and consume it from both places: move the existing METRIC_TABLES array into a
new module (export const METRIC_TABLES = [...] as const), remove the local
METRIC_TABLES declaration from discover-mv-schema.ts, and update both
discover-mv-schema and the code that previously duplicated the list in
query-metrics.ts to import and use the shared METRIC_TABLES; keep the exact
structure and the "as const" typing so downstream code (e.g., MV DDL generation
and any functions referencing METRIC_TABLES) continues to type-check and behave
the same.
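
A single source of truth for the metric table inventory could look like the sketch below. The type and table pairs are the ones listed in this review's test snippet; the MetricType alias and the way TABLE_MAP is derived are assumptions about how such a shared module might be written, not the PR's actual code.

```typescript
// Hypothetical shared module; in a real package these would be exported
// and imported by both query-metrics.ts and discover-mv-schema.ts.
const METRIC_TABLES = [
  { type: "Gauge", table: "otel_metrics_gauge" },
  { type: "Sum", table: "otel_metrics_sum" },
  { type: "Histogram", table: "otel_metrics_histogram" },
  {
    type: "ExponentialHistogram",
    table: "otel_metrics_exponential_histogram",
  },
  { type: "Summary", table: "otel_metrics_summary" },
] as const;

// Union of the literal metric type names, derived from the inventory.
type MetricType = (typeof METRIC_TABLES)[number]["type"];

// Lookup map derived from the same array, so the two cannot drift apart.
const TABLE_MAP = METRIC_TABLES.reduce(
  (acc, { type, table }) => {
    acc[type] = table;
    return acc;
  },
  {} as Record<MetricType, string>,
);
```

Because of the "as const" assertion, adding a metric type to the array extends both the MetricType union and TABLE_MAP without further edits.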

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c765de and e6a66e3.

📒 Files selected for processing (5)
  • packages/clickhouse-datasource/src/datasource.test.ts
  • packages/clickhouse-datasource/src/datasource.ts
  • packages/clickhouse-datasource/src/discover-mv-schema.ts
  • packages/clickhouse-datasource/src/index.ts
  • packages/clickhouse-datasource/src/query-metrics.ts

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/clickhouse-datasource/src/datasource.test.ts (1)

12-13: Avoid duplicating metric table/type mappings in tests.

This local metricTypes list can drift from production definitions. Reuse METRIC_TABLES from query-metrics.ts to keep test setup aligned automatically.

♻️ Suggested refactor
-import { DISCOVER_NAMES_TABLE, DISCOVER_ATTRS_TABLE } from "./query-metrics.js";
+import {
+  DISCOVER_NAMES_TABLE,
+  DISCOVER_ATTRS_TABLE,
+  METRIC_TABLES,
+} from "./query-metrics.js";
...
-      const metricTypes = [
-        { type: "Gauge", table: "otel_metrics_gauge" },
-        { type: "Sum", table: "otel_metrics_sum" },
-        { type: "Histogram", table: "otel_metrics_histogram" },
-        {
-          type: "ExponentialHistogram",
-          table: "otel_metrics_exponential_histogram",
-        },
-        { type: "Summary", table: "otel_metrics_summary" },
-      ];
-      for (const { type, table } of metricTypes) {
+      for (const { type, table } of METRIC_TABLES) {

Also applies to: 1579-1589

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/clickhouse-datasource/src/datasource.test.ts` around lines 12 - 13,
Replace the duplicated local metricTypes mapping in the test with the canonical
mapping exported from query-metrics; import METRIC_TABLES (or the appropriate
exported constant) from "./query-metrics.js" and use that instead of the local
metricTypes variable (also update the occurrences around lines where metricTypes
is used, including the block at 1579-1589) so the test relies on the production
METRIC_TABLES definition and cannot drift out of sync.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e6a66e3 and 625afaf.

📒 Files selected for processing (5)
  • .changeset/vast-cats-own.md
  • packages/clickhouse-datasource/src/datasource.test.ts
  • packages/clickhouse-datasource/src/datasource.ts
  • packages/clickhouse-datasource/src/discover-mv-schema.ts
  • packages/clickhouse-datasource/src/query-metrics.ts
✅ Files skipped from review due to trivial changes (1)
  • .changeset/vast-cats-own.md

Comment on lines +207 to +226
  private async hasDiscoverMVs(auth: {
    username: string;
    password: string;
    database: string;
  }): Promise<boolean> {
    const rs = await this.client.query({
      query: buildDetectDiscoverMVQuery(),
      format: "JSONEachRow",
      auth: { username: auth.username, password: auth.password },
      http_headers: { "X-ClickHouse-Database": auth.database },
    });
    const found = new Set<string>();
    for await (const batch of rs.stream()) {
      for (const row of batch) {
        const json = row.json() as { name: string };
        found.add(json.name);
      }
    }
    return found.has(DISCOVER_NAMES_TABLE) && found.has(DISCOVER_ATTRS_TABLE);
  }

⚠️ Potential issue | 🟠 Major

MV fast-path gating can return incomplete discovery during setup.

hasDiscoverMVs only checks target tables. If targets are created before MVs/backfill (the documented setup order), discoverMetrics can take MV path too early and return incomplete/empty results instead of degrading to full-scan.

💡 Suggested guard to prevent false-positive MV fast-path
     if (useMV) {
       try {
         const { namesQuery, attributesQuery } = buildDiscoverMetricsFromMV();
         [nameRows, attrRows] = await Promise.all([
           this.client
             .query({
               query: namesQuery,
               format: "JSONEachRow",
               auth,
               http_headers,
             })
             .then((rs) => streamParse(rs, chDiscoverNameRowSchema)),
           this.client
             .query({
               query: attributesQuery,
               format: "JSONEachRow",
               auth,
               http_headers,
             })
             .then((rs) => streamParse(rs, chDiscoverAttrRowSchema)),
         ]);
+        // Guard rollout/setup window: target tables may exist before MV pipeline is ready.
+        if (nameRows.length === 0 && attrRows.length === 0) {
+          useMV = false;
+        }
       } catch {
         // MV query failed — fall back to full-scan
         useMV = false;
       }
     }

Also applies to: 249-276

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/clickhouse-datasource/src/datasource.ts` around lines 207 - 226,
hasDiscoverMVs currently returns true if the two target tables exist, which can
false-positive during setup; update hasDiscoverMVs (and the analogous check at
the other occurrence) to also verify the tables contain data before enabling the
MV fast-path: after detecting DISCOVER_NAMES_TABLE and DISCOVER_ATTRS_TABLE, run
a lightweight row-existence check (e.g., SELECT count() or SELECT 1 ... LIMIT 1)
against both DISCOVER_NAMES_TABLE and
DISCOVER_ATTRS_TABLE and only return true when both exist and have at least one
row; modify the functions referencing hasDiscoverMVs accordingly so the
fast-path is gated by existence+non-empty checks instead of existence-only.
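
The existence-plus-non-empty gate described in this prompt could be sketched as follows. This is an illustration under assumptions, not the merged code: RunQuery is a hypothetical stand-in for the ClickHouse client call, the two table names are placeholders for DISCOVER_NAMES_TABLE and DISCOVER_ATTRS_TABLE, and the detection SQL is a guess at what buildDetectDiscoverMVQuery() might produce.

```typescript
// Hypothetical query runner: resolves to the parsed rows of a query.
type RunQuery = (sql: string) => Promise<Array<Record<string, unknown>>>;

// Placeholder names; the real values live in DISCOVER_NAMES_TABLE and
// DISCOVER_ATTRS_TABLE.
const NAMES_TABLE_SKETCH = "discover_names_sketch";
const ATTRS_TABLE_SKETCH = "discover_attrs_sketch";

async function hasPopulatedDiscoverTables(run: RunQuery): Promise<boolean> {
  // Step 1: both target tables must exist in the current database.
  const existing = await run(
    `SELECT name FROM system.tables ` +
      `WHERE database = currentDatabase() ` +
      `AND name IN ('${NAMES_TABLE_SKETCH}', '${ATTRS_TABLE_SKETCH}')`,
  );
  const found = existing.map((row) => String(row.name));
  if (
    found.indexOf(NAMES_TABLE_SKETCH) === -1 ||
    found.indexOf(ATTRS_TABLE_SKETCH) === -1
  ) {
    return false;
  }

  // Step 2: both must hold at least one row; LIMIT 1 keeps the probe cheap.
  for (const table of [NAMES_TABLE_SKETCH, ATTRS_TABLE_SKETCH]) {
    const rows = await run(`SELECT 1 FROM ${table} LIMIT 1`);
    if (rows.length === 0) return false;
  }
  return true;
}
```

One trade-off of this gate is that a database that genuinely has no metrics yet will always take the full-scan path, which is harmless because that scan is then cheap too.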

@Vunovati Vunovati merged commit ed34f9e into main Feb 27, 2026
2 checks passed
@coderabbitai coderabbitai Bot mentioned this pull request Feb 27, 2026