Feat/clickhouse mv support#69
Conversation
📝 WalkthroughWalkthroughAdds materialized-view (MV) based metric discovery: new MV DDL generator, MV detection and MV-backed discovery queries in the datasource, fallback to existing full-table-scan queries, and extensive tests covering MV paths, validation, and multi-tenant isolation. Changes
Sequence DiagramsequenceDiagram
participant Client as Client
participant Datasource as Datasource
participant MVDetector as MV Detector
participant QueryBuilder as Query Builder
participant ClickHouse as ClickHouse DB
Client->>Datasource: discoverMetrics(request)
Datasource->>MVDetector: hasDiscoverMVs()
MVDetector->>ClickHouse: buildDetectDiscoverMVQuery()
ClickHouse-->>MVDetector: detection rows (names/attrs tables exist?)
MVDetector-->>Datasource: detection result
alt MV Tables Found
Datasource->>QueryBuilder: buildDiscoverMetricsFromMV()
QueryBuilder-->>Datasource: namesQuery + attributesQuery
Datasource->>ClickHouse: execute namesQuery & attributesQuery (parallel)
ClickHouse-->>Datasource: names + attributes (MV fast path)
Datasource-->>Client: discovered metrics (from MV)
else MV Tables Not Found or MV query fails
Datasource->>QueryBuilder: buildDiscoverMetricsQueries()
QueryBuilder-->>Datasource: legacy namesQuery + attributesQuery
Datasource->>ClickHouse: execute legacy queries
ClickHouse-->>Datasource: names + attributes (full-scan)
Datasource-->>Client: discovered metrics (fallback)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/clickhouse-datasource/src/datasource.ts (1)
238-256:⚠️ Potential issue | 🟠 MajorAdd error handling to degrade gracefully when MV detection or queries fail.
The current code bypasses the fallback to
buildDiscoverMetricsQueries()ifhasDiscoverMVs()or the subsequent queries throw. Any error (network, auth, timeout) will rejectdiscoverMetricsentirely, despite a full-scan fallback being available. Wrap MV detection and queries in try-catch to attempt the fallback on failure.💡 Suggested hardening (fallback-on-error)
- // Detect MV tables and choose fast or fallback path - const useMV = await this.hasDiscoverMVs({ username, password, database }); - const { namesQuery, attributesQuery } = useMV - ? buildDiscoverMetricsFromMV() - : buildDiscoverMetricsQueries(); - - const [nameRows, attrRows] = await Promise.all([ - this.client - .query({ query: namesQuery, format: "JSONEachRow", auth, http_headers }) - .then((rs) => streamParse(rs, chDiscoverNameRowSchema)), - this.client - .query({ - query: attributesQuery, - format: "JSONEachRow", - auth, - http_headers, - }) - .then((rs) => streamParse(rs, chDiscoverAttrRowSchema)), - ]); + const runDiscoverQueries = async (queries: { + namesQuery: string; + attributesQuery: string; + }) => + Promise.all([ + this.client + .query({ + query: queries.namesQuery, + format: "JSONEachRow", + auth, + http_headers, + }) + .then((rs) => streamParse(rs, chDiscoverNameRowSchema)), + this.client + .query({ + query: queries.attributesQuery, + format: "JSONEachRow", + auth, + http_headers, + }) + .then((rs) => streamParse(rs, chDiscoverAttrRowSchema)), + ]); + + let useMV = false; + try { + useMV = await this.hasDiscoverMVs({ username, password, database }); + } catch { + useMV = false; + } + + let nameRows: z.output<typeof chDiscoverNameRowSchema>[]; + let attrRows: z.output<typeof chDiscoverAttrRowSchema>[]; + try { + [nameRows, attrRows] = await runDiscoverQueries( + useMV ? buildDiscoverMetricsFromMV() : buildDiscoverMetricsQueries() + ); + } catch (error) { + if (!useMV) throw error; + [nameRows, attrRows] = await runDiscoverQueries( + buildDiscoverMetricsQueries() + ); + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/clickhouse-datasource/src/datasource.ts` around lines 238 - 256, Wrap the MV detection and MV-query execution in a try-catch inside discoverMetrics so failures degrade to the full-scan path: call hasDiscoverMVs(...) and, only if it succeeds and returns true, run the MV queries built by buildDiscoverMetricsFromMV() (executed via this.client.query(...) and streamParse(...)); if any of hasDiscoverMVs or the MV queries throw, catch the error, log it, set useMV=false and fall back to buildDiscoverMetricsQueries() to build namesQuery/attributesQuery and run those queries instead—ensure the rest of the function uses the resolved nameRows/attrRows from the fallback path when MV path fails.
🧹 Nitpick comments (2)
packages/clickhouse-datasource/src/datasource.test.ts (1)
11-11: Consider reusing shared discover table constants in tests.The same discover table names are hardcoded in several places. Importing
DISCOVER_NAMES_TABLE/DISCOVER_ATTRS_TABLEwould reduce drift risk if names change.Also applies to: 1400-1404, 1526-1530, 1544-1546
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/clickhouse-datasource/src/datasource.test.ts` at line 11, Tests hardcode the discover table names; import and reuse the shared constants DISCOVER_NAMES_TABLE and DISCOVER_ATTRS_TABLE instead of string literals to avoid drift. Edit the test file to add these imports (e.g. import { getDiscoverMVSchema, DISCOVER_NAMES_TABLE, DISCOVER_ATTRS_TABLE } from "./discover-mv-schema.js") and replace every hardcoded table-name string (including the occurrences around the referenced ranges) with the corresponding constant (DISCOVER_NAMES_TABLE or DISCOVER_ATTRS_TABLE) wherever table names are asserted or used.packages/clickhouse-datasource/src/discover-mv-schema.ts (1)
21-27: Avoid maintaining metric table inventory in two places.Line 21 defines
METRIC_TABLES, but the same mapping already exists inpackages/clickhouse-datasource/src/query-metrics.ts. If one side changes, MV DDL generation can silently fall out of sync. Consider extracting a shared source-of-truth constant/module consumed by both files.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/clickhouse-datasource/src/discover-mv-schema.ts` around lines 21 - 27, Extract the metric table inventory into a single shared exported constant and consume it from both places: move the existing METRIC_TABLES array into a new module (export const METRIC_TABLES = [...] as const), remove the local METRIC_TABLES declaration from discover-mv-schema.ts, and update both discover-mv-schema and the code that previously duplicated the list in query-metrics.ts to import and use the shared METRIC_TABLES; keep the exact structure and the "as const" typing so downstream code (e.g., MV DDL generation and any functions referencing METRIC_TABLES) continues to type-check and behave the same.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@packages/clickhouse-datasource/src/datasource.ts`:
- Around line 238-256: Wrap the MV detection and MV-query execution in a
try-catch inside discoverMetrics so failures degrade to the full-scan path: call
hasDiscoverMVs(...) and, only if it succeeds and returns true, run the MV
queries built by buildDiscoverMetricsFromMV() (executed via
this.client.query(...) and streamParse(...)); if any of hasDiscoverMVs or the MV
queries throw, catch the error, log it, set useMV=false and fall back to
buildDiscoverMetricsQueries() to build namesQuery/attributesQuery and run those
queries instead—ensure the rest of the function uses the resolved
nameRows/attrRows from the fallback path when MV path fails.
---
Nitpick comments:
In `@packages/clickhouse-datasource/src/datasource.test.ts`:
- Line 11: Tests hardcode the discover table names; import and reuse the shared
constants DISCOVER_NAMES_TABLE and DISCOVER_ATTRS_TABLE instead of string
literals to avoid drift. Edit the test file to add these imports (e.g. import {
getDiscoverMVSchema, DISCOVER_NAMES_TABLE, DISCOVER_ATTRS_TABLE } from
"./discover-mv-schema.js") and replace every hardcoded table-name string
(including the occurrences around the referenced ranges) with the corresponding
constant (DISCOVER_NAMES_TABLE or DISCOVER_ATTRS_TABLE) wherever table names are
asserted or used.
In `@packages/clickhouse-datasource/src/discover-mv-schema.ts`:
- Around line 21-27: Extract the metric table inventory into a single shared
exported constant and consume it from both places: move the existing
METRIC_TABLES array into a new module (export const METRIC_TABLES = [...] as
const), remove the local METRIC_TABLES declaration from discover-mv-schema.ts,
and update both discover-mv-schema and the code that previously duplicated the
list in query-metrics.ts to import and use the shared METRIC_TABLES; keep the
exact structure and the "as const" typing so downstream code (e.g., MV DDL
generation and any functions referencing METRIC_TABLES) continues to type-check
and behave the same.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
packages/clickhouse-datasource/src/datasource.test.tspackages/clickhouse-datasource/src/datasource.tspackages/clickhouse-datasource/src/discover-mv-schema.tspackages/clickhouse-datasource/src/index.tspackages/clickhouse-datasource/src/query-metrics.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/clickhouse-datasource/src/datasource.test.ts (1)
12-13: Avoid duplicating metric table/type mappings in tests.This local
metricTypeslist can drift from production definitions. ReuseMETRIC_TABLESfromquery-metrics.tsto keep test setup aligned automatically.♻️ Suggested refactor
-import { DISCOVER_NAMES_TABLE, DISCOVER_ATTRS_TABLE } from "./query-metrics.js"; +import { + DISCOVER_NAMES_TABLE, + DISCOVER_ATTRS_TABLE, + METRIC_TABLES, +} from "./query-metrics.js"; ... - const metricTypes = [ - { type: "Gauge", table: "otel_metrics_gauge" }, - { type: "Sum", table: "otel_metrics_sum" }, - { type: "Histogram", table: "otel_metrics_histogram" }, - { - type: "ExponentialHistogram", - table: "otel_metrics_exponential_histogram", - }, - { type: "Summary", table: "otel_metrics_summary" }, - ]; - for (const { type, table } of metricTypes) { + for (const { type, table } of METRIC_TABLES) {Also applies to: 1579-1589
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/clickhouse-datasource/src/datasource.test.ts` around lines 12 - 13, Replace the duplicated local metricTypes mapping in the test with the canonical mapping exported from query-metrics; import METRIC_TABLES (or the appropriate exported constant) from "./query-metrics.js" and use that instead of the local metricTypes variable (also update the occurrences around lines where metricTypes is used, including the block at 1579-1589) so the test relies on the production METRIC_TABLES definition and cannot drift out of sync.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/clickhouse-datasource/src/datasource.ts`:
- Around line 207-226: hasDiscoverMVs currently returns true if the two target
tables exist, which can false-positive during setup; update hasDiscoverMVs (and
the analogous check at the other occurrence) to also verify the tables contain
data before enabling the MV fast-path: after detecting DISCOVER_NAMES_TABLE and
DISCOVER_ATTRS_TABLE, run a lightweight count (e.g., SELECT count() or SELECT
any() LIMIT 1) or equivalent minimal row-existence check against both
DISCOVER_NAMES_TABLE and DISCOVER_ATTRS_TABLE and only return true when both
exist and have at least one row; modify the functions referencing hasDiscoverMVs
accordingly so the fast-path is gated by existence+non-empty checks instead of
existence-only.
---
Nitpick comments:
In `@packages/clickhouse-datasource/src/datasource.test.ts`:
- Around line 12-13: Replace the duplicated local metricTypes mapping in the
test with the canonical mapping exported from query-metrics; import
METRIC_TABLES (or the appropriate exported constant) from "./query-metrics.js"
and use that instead of the local metricTypes variable (also update the
occurrences around lines where metricTypes is used, including the block at
1579-1589) so the test relies on the production METRIC_TABLES definition and
cannot drift out of sync.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
.changeset/vast-cats-own.mdpackages/clickhouse-datasource/src/datasource.test.tspackages/clickhouse-datasource/src/datasource.tspackages/clickhouse-datasource/src/discover-mv-schema.tspackages/clickhouse-datasource/src/query-metrics.ts
✅ Files skipped from review due to trivial changes (1)
- .changeset/vast-cats-own.md
| private async hasDiscoverMVs(auth: { | ||
| username: string; | ||
| password: string; | ||
| database: string; | ||
| }): Promise<boolean> { | ||
| const rs = await this.client.query({ | ||
| query: buildDetectDiscoverMVQuery(), | ||
| format: "JSONEachRow", | ||
| auth: { username: auth.username, password: auth.password }, | ||
| http_headers: { "X-ClickHouse-Database": auth.database }, | ||
| }); | ||
| const found = new Set<string>(); | ||
| for await (const batch of rs.stream()) { | ||
| for (const row of batch) { | ||
| const json = row.json() as { name: string }; | ||
| found.add(json.name); | ||
| } | ||
| } | ||
| return found.has(DISCOVER_NAMES_TABLE) && found.has(DISCOVER_ATTRS_TABLE); | ||
| } |
There was a problem hiding this comment.
MV fast-path gating can return incomplete discovery during setup.
hasDiscoverMVs only checks target tables. If targets are created before MVs/backfill (the documented setup order), discoverMetrics can take MV path too early and return incomplete/empty results instead of degrading to full-scan.
💡 Suggested guard to prevent false-positive MV fast-path
if (useMV) {
try {
const { namesQuery, attributesQuery } = buildDiscoverMetricsFromMV();
[nameRows, attrRows] = await Promise.all([
this.client
.query({
query: namesQuery,
format: "JSONEachRow",
auth,
http_headers,
})
.then((rs) => streamParse(rs, chDiscoverNameRowSchema)),
this.client
.query({
query: attributesQuery,
format: "JSONEachRow",
auth,
http_headers,
})
.then((rs) => streamParse(rs, chDiscoverAttrRowSchema)),
]);
+ // Guard rollout/setup window: target tables may exist before MV pipeline is ready.
+ if (nameRows.length === 0 && attrRows.length === 0) {
+ useMV = false;
+ }
} catch {
// MV query failed — fall back to full-scan
useMV = false;
}
}Also applies to: 249-276
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/clickhouse-datasource/src/datasource.ts` around lines 207 - 226,
hasDiscoverMVs currently returns true if the two target tables exist, which can
false-positive during setup; update hasDiscoverMVs (and the analogous check at
the other occurrence) to also verify the tables contain data before enabling the
MV fast-path: after detecting DISCOVER_NAMES_TABLE and DISCOVER_ATTRS_TABLE, run
a lightweight count (e.g., SELECT count() or SELECT any() LIMIT 1) or equivalent
minimal row-existence check against both DISCOVER_NAMES_TABLE and
DISCOVER_ATTRS_TABLE and only return true when both exist and have at least one
row; modify the functions referencing hasDiscoverMVs accordingly so the
fast-path is gated by existence+non-empty checks instead of existence-only.
Summary by CodeRabbit
New Features
Tests