Feat/metrics aggregation#105
Conversation
📝 WalkthroughWalkthroughAdds aggregated metric queries and ingestion telemetry across stack: schema/types, ClickHouse/SQLite datasource implementations, API/CLI/client/UI plumbing, collector-side ingestion metrics emission, and tests to validate aggregation and grouping behavior. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant APIRoute as API Route\n/signals/metrics/search
participant Datasource as ReadMetricsDatasource\n(ClickHouse/SQLite)
participant Database
Client->>APIRoute: POST body (aggregate?, groupBy?)
APIRoute->>APIRoute: validate body
alt aggregate present
APIRoute->>Datasource: getAggregatedMetrics(params)
Datasource->>Database: buildAggregatedMetricsQuery / SQL aggregate + GROUP BY
Database-->>Datasource: aggregated rows
Datasource->>APIRoute: { data: AggregatedMetricRow[], nextCursor: null }
else no aggregate
APIRoute->>Datasource: getMetrics(params)
Datasource->>Database: buildMetricsQuery
Database-->>Datasource: metric rows (paged)
Datasource->>APIRoute: searchResponse (may include cursor)
end
APIRoute-->>Client: 200 JSON
sequenceDiagram
participant TelemetryClient
participant CollectorRoute as Collector Route\n(/v1/traces,etc.)
participant WriteDS as WriteMetricsDatasource
participant Ingestion as emitIngestionMetrics
TelemetryClient->>CollectorRoute: POST payload
CollectorRoute->>WriteDS: writeTraces/Metrics/Logs
WriteDS-->>CollectorRoute: success
alt ingestionMetricsDatasource provided
CollectorRoute->>Ingestion: emitIngestionMetrics(signal, contentLength) (async)
Ingestion->>WriteDS: writeMetrics(kopai.ingestion.bytes/requests)
end
CollectorRoute-->>TelemetryClient: 200 OK
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (6)
packages/collector/src/routes/ingestion-metrics.ts (1)
3-4: Consider exposing a reset function for test isolation.The module-level
lastEmitNsMap persists state across test runs. While tests currently use unique signal names to avoid interference, exposing aresetLastEmitNs()function (or usingvi.resetModules()in tests) would provide more robust test isolation.♻️ Optional: Add reset function for testing
/** Tracks the last emit time per signal for Delta startTimeUnixNano. */ const lastEmitNs = new Map<string, string>(); + +/** `@internal` Reset state for testing. */ +export function _resetLastEmitNs(): void { + lastEmitNs.clear(); +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/collector/src/routes/ingestion-metrics.ts` around lines 3 - 4, The module-level Map lastEmitNs persists across tests; add and export a resetLastEmitNs() function that clears lastEmitNs to allow test isolation. Implement a simple function named resetLastEmitNs that calls lastEmitNs.clear(), export it from the same module (alongside any existing exports like functions that read/write lastEmitNs), and update tests to call resetLastEmitNs() where needed or document its existence for test helpers. Ensure the function name is exactly resetLastEmitNs so callers and tests can reference it consistently.packages/collector/src/collector.test.ts (1)
1001-1003: Replace the fixed sleeps with a deterministic barrier.These assertions depend on the background write settling within 50/100ms, which is a common source of CI flakes and just adds idle time on fast runs. Have the mock resolve a promise and await that promise after
server.inject()instead.Also applies to: 1031-1032
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/collector/src/collector.test.ts` around lines 1001 - 1003, Replace the fixed setTimeout-based wait after server.inject() with a deterministic barrier: modify the mock for the background write (ingestionWriteSpy) so it returns a Promise you control (e.g., expose resolveIngestionWrite), have the test call server.inject(), then await that controlled promise instead of awaiting new Promise(r => setTimeout(r, 100)); ensure you wire the mock used by ingestionWriteSpy to call the resolver when the background work would complete and apply the same change to the other occurrence that currently uses a fixed sleep.packages/ui/src/pages/observability.tsx (1)
798-805: Avoid polling unbounded lifetime totals from the UI.Both cards issue all-time
sumqueries and refetch them every 10s because there is notimeUnixMin/timeUnixMaxbound. That makes every open metrics tab repeatedly aggregate the fullkopai.ingestion.*history. If these numbers need to be lifetime counters, I'd serve them from a pre-aggregated/cached source instead of recomputing them in the browser polling path.Also applies to: 825-833
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/ui/src/pages/observability.tsx` around lines 798 - 805, The UI is polling unbounded all-time `sum` metrics every 10s (see dataSource with method "searchMetricsPage", params metricType/metricName/aggregate and refetchIntervalMs), which forces expensive full-history aggregation; fix by adding time bounds (timeUnixMin/timeUnixMax) to the params to limit the query window or replace the query with a pre-aggregated/cached endpoint that returns lifetime counters, and reduce or remove the refetchIntervalMs polling if using a cached source; apply the same change to the other card (lines referencing the same dataSource pattern around 825-833).packages/clickhouse-datasource/src/datasource.ts (1)
347-359: Validate the constructed aggregate rows before returning them.Every other ClickHouse read path parses streamed data through a schema, but this method hand-builds
{ groups, value }. If the query shape changes orjson.valueis missing,Number(json.value)can becomeNaNand still escape as a typed result. Parsing the constructed object withdenormalizedSignals.aggregatedMetricSchemawould keep this path consistent with the rest of the datasource.♻️ Keep this path aligned with the others
- data.push({ groups, value: Number(json.value) }); + data.push( + denormalizedSignals.aggregatedMetricSchema.parse({ + groups, + value: Number(json.value), + }) + );🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/clickhouse-datasource/src/datasource.ts` around lines 347 - 359, The constructed aggregate rows (objects pushed into data) are not validated and can contain NaN or malformed fields; after building each { groups, value } use denormalizedSignals.aggregatedMetricSchema.safeParse (or parse) to validate the object and only push the parsed.value (or the validated result) into data, otherwise handle the error (throw, log, or skip) so invalid rows don't escape as typed AggregatedMetricRow; update the loop around denormalizedSignals.AggregatedMetricRow creation to perform this schema validation for each row.packages/clickhouse-datasource/src/query-metrics.ts (1)
314-315: Add a deterministic tie-breaker for grouped top-N queries.When
groupByis present, Line 314 orders only byvalue, so tied groups can swap in and out of the limited result set between refreshes. Appending thegroup_*aliases as secondary sort keys would make this stable; the SQLite implementation should mirror the same rule too.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/clickhouse-datasource/src/query-metrics.ts` around lines 314 - 315, When groupBy is used the ORDER BY clause "ORDER BY value DESC LIMIT {limit:UInt32}" must include deterministic tie-breakers: append the grouped column aliases (the generated group_* aliases) as secondary sort keys after value DESC (e.g. ORDER BY value DESC, group_1, group_2, ...), and apply the same change in the SQLite path that builds the grouped top-N query; locate the SQL assembly that emits the ORDER BY...LIMIT string in query-metrics.ts and update it to enumerate the group_* aliases when groupBy is present so results are stable across refreshes.packages/clickhouse-datasource/src/query-metrics.test.ts (1)
28-135: Cover the rejection paths too.The new builder throws on unsupported
metricTypevalues and unknownaggregatekeys, but this suite only exercises happy paths. A couple oftoThrowcases would lock down that validation contract.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/clickhouse-datasource/src/query-metrics.test.ts` around lines 28 - 135, Add negative tests to cover rejection paths for buildAggregatedMetricsQuery: call buildAggregatedMetricsQuery with an unsupported metricType (e.g., metricType: "UnknownType") and assert it throws, and call it with an invalid aggregate (e.g., aggregate: "unsupportedAgg") and assert it throws; use expect(() => buildAggregatedMetricsQuery({...})).toThrow(...) or toThrowError(...) and match the error text produced by the function so the validation contract for metricType and aggregate is locked down.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/cli/src/commands/metrics.ts`:
- Around line 162-165: toAggregateFn currently throws a generic Error for bad
--aggregate values; change it to throw a distinctive error (e.g., new Error with
name "InvalidArgumentError" or set error.code = 2) when the value is invalid,
and then update the generic catch block in the CLI command handler that
currently exits with code 1 to detect that distinctive error (error.name ===
"InvalidArgumentError" or error.code === 2) and call process.exit(2); this
ensures invalid-argument failures from toAggregateFn are exited with code 2
while other runtime/API errors still exit with code 1.
In `@packages/collector/src/index.ts`:
- Around line 36-39: The current hook sets request.ingestContentLength only from
request.headers and leaves it 0 when Content-Length is absent; instead
initialize request.ingestContentLength =
Number(request.headers["content-length"]) || 0 and, when that value is 0, attach
a short-lived listener to the incoming raw stream (request.raw) to accumulate
chunk lengths (on 'data' increment request.ingestContentLength by chunk.length)
and remove listeners on 'end'/'close'/'error' so routes like the handlers for
/v1/logs, /v1/metrics, and /v1/traces see the actual received bytes even for
chunked or header-stripped requests.
In `@packages/sdk/src/client.ts`:
- Around line 256-274: searchAggregatedMetrics currently parses the generic
metricsDataFilter but then assumes aggregated-only fields exist; add a guard
after parsing (validatedFilter) to ensure validatedFilter.aggregate is present
and has the expected shape (e.g., an array of { groups, value } entries) before
calling request, and if not present or invalid throw a clear error; reference
validatedFilter and searchAggregatedMetrics (and
aggregatedMetricsResponseSchema) so the check is colocated with the existing
parse and prevents hitting the non-aggregated server branch or failing
downstream parsing.
In `@packages/sqlite-datasource/src/db-datasource.ts`:
- Around line 756-762: The mapping of query rows in rows.map currently coerces
NULL aggregate results into 0 via Number(row.value) which hides empty-window
cases; in the rows.map callback (the block building groups and returning {
groups, value: Number(row.value) }) change the conversion to explicitly detect
row.value == null and preserve a null/undefined sentinel (or another explicit
marker) for denormalizedSignals.AggregatedMetricRow.value instead of converting
to 0, and only convert to Number(row.value) when row.value is non-null (for
sum=0 ensure you treat the numeric 0 as a valid number rather than as "no
data").
In `@packages/ui/src/components/observability/renderers/OtelMetricStat.tsx`:
- Around line 27-33: The current aggregated branch (isAggregatedRequest) wrongly
collapses grouped results by using response?.data[0]; instead, detect if the
aggregated response contains multiple rows (response?.data?.length > 1) and
fail-fast: either throw or return a clear error/UI indicating grouped aggregates
are unsupported by OtelMetricStat (e.g., "grouped aggregates not supported,
provide ungrouped aggregate"), or alternatively pass all AggregatedMetricRow
entries into MetricStat (use rows = response.data and adjust value handling).
Update the isAggregatedRequest branch to perform this check and choose one
behavior (reject grouped aggregates or accept them) so you don’t silently drop
buckets.
---
Nitpick comments:
In `@packages/clickhouse-datasource/src/datasource.ts`:
- Around line 347-359: The constructed aggregate rows (objects pushed into data)
are not validated and can contain NaN or malformed fields; after building each {
groups, value } use denormalizedSignals.aggregatedMetricSchema.safeParse (or
parse) to validate the object and only push the parsed.value (or the validated
result) into data, otherwise handle the error (throw, log, or skip) so invalid
rows don't escape as typed AggregatedMetricRow; update the loop around
denormalizedSignals.AggregatedMetricRow creation to perform this schema
validation for each row.
In `@packages/clickhouse-datasource/src/query-metrics.test.ts`:
- Around line 28-135: Add negative tests to cover rejection paths for
buildAggregatedMetricsQuery: call buildAggregatedMetricsQuery with an
unsupported metricType (e.g., metricType: "UnknownType") and assert it throws,
and call it with an invalid aggregate (e.g., aggregate: "unsupportedAgg") and
assert it throws; use expect(() =>
buildAggregatedMetricsQuery({...})).toThrow(...) or toThrowError(...) and match
the error text produced by the function so the validation contract for
metricType and aggregate is locked down.
In `@packages/clickhouse-datasource/src/query-metrics.ts`:
- Around line 314-315: When groupBy is used the ORDER BY clause "ORDER BY value
DESC LIMIT {limit:UInt32}" must include deterministic tie-breakers: append the
grouped column aliases (the generated group_* aliases) as secondary sort keys
after value DESC (e.g. ORDER BY value DESC, group_1, group_2, ...), and apply
the same change in the SQLite path that builds the grouped top-N query; locate
the SQL assembly that emits the ORDER BY...LIMIT string in query-metrics.ts and
update it to enumerate the group_* aliases when groupBy is present so results
are stable across refreshes.
In `@packages/collector/src/collector.test.ts`:
- Around line 1001-1003: Replace the fixed setTimeout-based wait after
server.inject() with a deterministic barrier: modify the mock for the background
write (ingestionWriteSpy) so it returns a Promise you control (e.g., expose
resolveIngestionWrite), have the test call server.inject(), then await that
controlled promise instead of awaiting new Promise(r => setTimeout(r, 100));
ensure you wire the mock used by ingestionWriteSpy to call the resolver when the
background work would complete and apply the same change to the other occurrence
that currently uses a fixed sleep.
In `@packages/collector/src/routes/ingestion-metrics.ts`:
- Around line 3-4: The module-level Map lastEmitNs persists across tests; add
and export a resetLastEmitNs() function that clears lastEmitNs to allow test
isolation. Implement a simple function named resetLastEmitNs that calls
lastEmitNs.clear(), export it from the same module (alongside any existing
exports like functions that read/write lastEmitNs), and update tests to call
resetLastEmitNs() where needed or document its existence for test helpers.
Ensure the function name is exactly resetLastEmitNs so callers and tests can
reference it consistently.
In `@packages/ui/src/pages/observability.tsx`:
- Around line 798-805: The UI is polling unbounded all-time `sum` metrics every
10s (see dataSource with method "searchMetricsPage", params
metricType/metricName/aggregate and refetchIntervalMs), which forces expensive
full-history aggregation; fix by adding time bounds (timeUnixMin/timeUnixMax) to
the params to limit the query window or replace the query with a
pre-aggregated/cached endpoint that returns lifetime counters, and reduce or
remove the refetchIntervalMs polling if using a cached source; apply the same
change to the other card (lines referencing the same dataSource pattern around
825-833).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dc9c13a6-18ec-416a-9241-50f28606d502
⛔ Files ignored due to path filters (1)
packages/ui/src/lib/__snapshots__/generate-prompt-instructions.test.ts.snapis excluded by!**/*.snap
📒 Files selected for processing (35)
packages/api/src/index.test.tspackages/api/src/routes/metrics.tspackages/api/src/signals.test.tspackages/app/src/collector/index.tspackages/cli/src/commands/metrics.tspackages/clickhouse-datasource/src/datasource.tspackages/clickhouse-datasource/src/query-metrics.test.tspackages/clickhouse-datasource/src/query-metrics.tspackages/collector/src/collector.test.tspackages/collector/src/index.tspackages/collector/src/routes/ingestion-metrics.test.tspackages/collector/src/routes/ingestion-metrics.tspackages/collector/src/routes/logs.tspackages/collector/src/routes/metrics.tspackages/collector/src/routes/traces.tspackages/core/src/data-filters-zod.tspackages/core/src/denormalized-signals-zod.tspackages/core/src/telemetry-datasource.tspackages/sdk/src/client.test.tspackages/sdk/src/client.tspackages/sdk/src/mocks/handlers.tspackages/sdk/src/types.tspackages/sqlite-datasource/src/datasource-read.test.tspackages/sqlite-datasource/src/db-datasource.tspackages/sqlite-datasource/src/optimized-datasource.tspackages/ui/src/components/observability/DynamicDashboard/DynamicDashboard.test.tsxpackages/ui/src/components/observability/MetricStat/index.tsxpackages/ui/src/components/observability/renderers/OtelMetricStat.tsxpackages/ui/src/hooks/use-kopai-data.test.tspackages/ui/src/hooks/use-kopai-data.tspackages/ui/src/hooks/use-live-logs.test.tspackages/ui/src/lib/renderer.test.tsxpackages/ui/src/pages/observability.test.tsxpackages/ui/src/pages/observability.tsxpackages/ui/src/providers/kopai-provider.tsx
| function toAggregateFn(value: string | undefined): AggregateFn | undefined { | ||
| if (value === undefined) return undefined; | ||
| if (isAggregateFn(value)) return value; | ||
| throw new Error(`Invalid aggregate function: ${value}`); |
There was a problem hiding this comment.
Exit code for invalid arguments should be 2, not 1.
When toAggregateFn throws for an invalid --aggregate value, the error is caught in the generic catch block (line 117-120) which exits with code 1. Per coding guidelines, invalid arguments should exit with code 2.
Proposed fix
+class InvalidArgumentError extends Error {
+ constructor(message: string) {
+ super(message);
+ this.name = "InvalidArgumentError";
+ }
+}
+
function toAggregateFn(value: string | undefined): AggregateFn | undefined {
if (value === undefined) return undefined;
if (isAggregateFn(value)) return value;
- throw new Error(`Invalid aggregate function: ${value}`);
+ throw new InvalidArgumentError(`Invalid aggregate function: ${value}`);
}Then update the catch block to differentiate:
} catch (err) {
outputError(err, format === "json");
- process.exit(1);
+ process.exit(err instanceof InvalidArgumentError ? 2 : 1);
}As per coding guidelines: "Use appropriate exit codes: 0 for success, 1 for API/runtime error, 2 for invalid arguments, 3 for config error".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/cli/src/commands/metrics.ts` around lines 162 - 165, toAggregateFn
currently throws a generic Error for bad --aggregate values; change it to throw
a distinctive error (e.g., new Error with name "InvalidArgumentError" or set
error.code = 2) when the value is invalid, and then update the generic catch
block in the CLI command handler that currently exits with code 1 to detect that
distinctive error (error.name === "InvalidArgumentError" or error.code === 2)
and call process.exit(2); this ensures invalid-argument failures from
toAggregateFn are exited with code 2 while other runtime/API errors still exit
with code 1.
| fastify.addHook("onRequest", async (request) => { | ||
| request.ingestContentLength = | ||
| Number(request.headers["content-length"]) || 0; | ||
| }); |
There was a problem hiding this comment.
ingestContentLength falls back to 0 for requests without Content-Length.
The new ingestion metrics derive bytes entirely from the header here. Any accepted chunked upload or intermediary that strips Content-Length will still reach /v1/logs, /v1/metrics, and /v1/traces, but each route will emit kopai.ingestion.bytes = 0 for that request. If this metric is meant to reflect actual bytes received, it needs a stream-length fallback instead of a header-only read.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/collector/src/index.ts` around lines 36 - 39, The current hook sets
request.ingestContentLength only from request.headers and leaves it 0 when
Content-Length is absent; instead initialize request.ingestContentLength =
Number(request.headers["content-length"]) || 0 and, when that value is 0, attach
a short-lived listener to the incoming raw stream (request.raw) to accumulate
chunk lengths (on 'data' increment request.ingestContentLength by chunk.length)
and remove listeners on 'end'/'close'/'error' so routes like the handlers for
/v1/logs, /v1/metrics, and /v1/traces see the actual received bytes even for
chunked or header-stripped requests.
| const data: denormalizedSignals.AggregatedMetricRow[] = rows.map( | ||
| (row) => { | ||
| const groups: Record<string, string> = {}; | ||
| for (const [i, groupKey] of groupByKeys.entries()) { | ||
| groups[groupKey] = String(row[`group_${String(i)}`] ?? ""); | ||
| } | ||
| return { groups, value: Number(row.value) }; |
There was a problem hiding this comment.
Don’t coerce empty aggregates into zero.
When the filter matches no rows and there is no groupBy, SQLite still returns a single aggregate row with value = NULL for avg/min/max (and sum). Line 762 turns that into 0, so an empty window becomes indistinguishable from real data. Handle row.value == null explicitly before converting it; if sum=0 is intentional, special-case that instead of relying on Number(null).
Possible fix
- const data: denormalizedSignals.AggregatedMetricRow[] = rows.map(
+ const data: denormalizedSignals.AggregatedMetricRow[] = rows.flatMap(
(row) => {
+ if (row.value == null) {
+ return [];
+ }
const groups: Record<string, string> = {};
for (const [i, groupKey] of groupByKeys.entries()) {
groups[groupKey] = String(row[`group_${String(i)}`] ?? "");
}
- return { groups, value: Number(row.value) };
+ return [{ groups, value: Number(row.value) }];
}
);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const data: denormalizedSignals.AggregatedMetricRow[] = rows.map( | |
| (row) => { | |
| const groups: Record<string, string> = {}; | |
| for (const [i, groupKey] of groupByKeys.entries()) { | |
| groups[groupKey] = String(row[`group_${String(i)}`] ?? ""); | |
| } | |
| return { groups, value: Number(row.value) }; | |
| const data: denormalizedSignals.AggregatedMetricRow[] = rows.flatMap( | |
| (row) => { | |
| if (row.value == null) { | |
| return []; | |
| } | |
| const groups: Record<string, string> = {}; | |
| for (const [i, groupKey] of groupByKeys.entries()) { | |
| groups[groupKey] = String(row[`group_${String(i)}`] ?? ""); | |
| } | |
| return [{ groups, value: Number(row.value) }]; | |
| } | |
| ); |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/sqlite-datasource/src/db-datasource.ts` around lines 756 - 762, The
mapping of query rows in rows.map currently coerces NULL aggregate results into
0 via Number(row.value) which hides empty-window cases; in the rows.map callback
(the block building groups and returning { groups, value: Number(row.value) })
change the conversion to explicitly detect row.value == null and preserve a
null/undefined sentinel (or another explicit marker) for
denormalizedSignals.AggregatedMetricRow.value instead of converting to 0, and
only convert to Number(row.value) when row.value is non-null (for sum=0 ensure
you treat the numeric 0 as a valid number rather than as "no data").
There was a problem hiding this comment.
♻️ Duplicate comments (2)
packages/sdk/src/client.ts (1)
256-264:⚠️ Potential issue | 🟠 MajorAdd a runtime guard for
aggregatein aggregated search.Type-level enforcement helps TS callers, but at runtime (especially JS consumers) this can still call the raw branch and then fail response parsing. Add an explicit guard after parsing.
🐛 Suggested fix
async searchAggregatedMetrics( - filter: MetricsDataFilter & { + filter: Omit<MetricsDataFilter, "cursor"> & { aggregate: NonNullable<MetricsDataFilter["aggregate"]>; }, opts?: RequestOptions ): Promise<{ data: AggregatedMetricRow[]; nextCursor: null }> { const validatedFilter = dataFilterSchemas.metricsDataFilterSchema.parse(filter); + if (!validatedFilter.aggregate) { + throw new Error("searchAggregatedMetrics requires filter.aggregate"); + } return request( `${this.baseUrl}/signals/metrics/search`,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/sdk/src/client.ts` around lines 256 - 264, In searchAggregatedMetrics, after calling dataFilterSchemas.metricsDataFilterSchema.parse(filter) assign to validatedFilter and add an explicit runtime guard that verifies validatedFilter.aggregate is present and non-null/undefined; if the check fails, throw a clear error (or return a rejected Promise) before proceeding to the raw/response-parsing branch so JS consumers don't reach the parsing logic with a missing aggregate. Ensure the guard references validatedFilter and the function name searchAggregatedMetrics so it’s easy to locate.packages/cli/src/commands/metrics.ts (1)
165-169:⚠️ Potential issue | 🟠 MajorInvalid CLI args still terminate with exit code 1.
Line 168 throws
InvalidArgumentError, but the command catch path still exits with1, so invalid arguments are misclassified.As per coding guidelines: "Use appropriate exit codes: 0 for success, 1 for API/runtime error, 2 for invalid arguments, 3 for config error (missing url)".🐛 Suggested fix
} catch (err) { outputError(err, format === "json"); - process.exit(1); + process.exit(err instanceof InvalidArgumentError ? 2 : 1); }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/cli/src/commands/metrics.ts` around lines 165 - 169, The toAggregateFn helper currently throws InvalidArgumentError for bad CLI values but the command's catch path still exits with code 1; update the error handling so invalid argument errors result in exit code 2. Specifically, either (A) adjust the top-level catch in the metrics command to detect InvalidArgumentError (using the InvalidArgumentError symbol) and call process.exit(2) or set process.exitCode = 2, or (B) change the command runner to map InvalidArgumentError to exit code 2 before calling process.exit(1). Ensure you reference toAggregateFn and InvalidArgumentError when locating the code to change and keep all other runtime errors exiting with code 1.
🧹 Nitpick comments (1)
skills/create-dashboard/rules/workflow.md (1)
60-60: Consider clarifying aggregation options and MetricTable reference.The documentation is clear overall, but could be enhanced with:
- Specifying whether
"sum"is the only valid aggregation function or if others (e.g.,"avg","min","max") are supported- Briefly introducing
MetricTablesince this is its first mention in the documentThese additions would help users understand the full range of options and avoid confusion about the referenced component.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@skills/create-dashboard/rules/workflow.md` at line 60, Clarify the docs around MetricStat and MetricTable: update the MetricStat line that currently shows `aggregate: "sum"` to list supported aggregation options (e.g., `"sum"`, `"avg"`, `"min"`, `"max"`, etc.) and explicitly state which are valid; keep the guidance that `groupBy` should not be used with MetricStat and instead add a one-sentence introduction to `MetricTable` (what it is and that it should be used for grouped results) so readers know why MetricTable is referenced and when to choose it over MetricStat.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@packages/cli/src/commands/metrics.ts`:
- Around line 165-169: The toAggregateFn helper currently throws
InvalidArgumentError for bad CLI values but the command's catch path still exits
with code 1; update the error handling so invalid argument errors result in exit
code 2. Specifically, either (A) adjust the top-level catch in the metrics
command to detect InvalidArgumentError (using the InvalidArgumentError symbol)
and call process.exit(2) or set process.exitCode = 2, or (B) change the command
runner to map InvalidArgumentError to exit code 2 before calling
process.exit(1). Ensure you reference toAggregateFn and InvalidArgumentError
when locating the code to change and keep all other runtime errors exiting with
code 1.
In `@packages/sdk/src/client.ts`:
- Around line 256-264: In searchAggregatedMetrics, after calling
dataFilterSchemas.metricsDataFilterSchema.parse(filter) assign to
validatedFilter and add an explicit runtime guard that verifies
validatedFilter.aggregate is present and non-null/undefined; if the check fails,
throw a clear error (or return a rejected Promise) before proceeding to the
raw/response-parsing branch so JS consumers don't reach the parsing logic with a
missing aggregate. Ensure the guard references validatedFilter and the function
name searchAggregatedMetrics so it’s easy to locate.
---
Nitpick comments:
In `@skills/create-dashboard/rules/workflow.md`:
- Line 60: Clarify the docs around MetricStat and MetricTable: update the
MetricStat line that currently shows `aggregate: "sum"` to list supported
aggregation options (e.g., `"sum"`, `"avg"`, `"min"`, `"max"`, etc.) and
explicitly state which are valid; keep the guidance that `groupBy` should not be
used with MetricStat and instead add a one-sentence introduction to
`MetricTable` (what it is and that it should be used for grouped results) so
readers know why MetricTable is referenced and when to choose it over
MetricStat.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 08d7cbeb-5ac8-4176-9536-4d0a87f73abf
📒 Files selected for processing (16)
.changeset/yellow-sites-rhyme.mdpackages/cli/README.mdpackages/cli/src/commands/metrics.tspackages/sdk/src/client.test.tspackages/sdk/src/client.tspackages/ui/src/components/observability/renderers/OtelMetricStat.tsxpackages/ui/src/hooks/use-kopai-data.tsskills/create-dashboard/SKILL.mdskills/create-dashboard/rules/workflow.mdskills/create-dashboard/tile.jsonskills/otel-instrumentation/SKILL.mdskills/otel-instrumentation/references/cli-reference.mdskills/otel-instrumentation/tile.jsonskills/root-cause-analysis/SKILL.mdskills/root-cause-analysis/references/metric-filters.mdskills/root-cause-analysis/tile.json
✅ Files skipped from review due to trivial changes (9)
- skills/otel-instrumentation/SKILL.md
- skills/create-dashboard/tile.json
- skills/otel-instrumentation/references/cli-reference.md
- skills/otel-instrumentation/tile.json
- skills/create-dashboard/SKILL.md
- skills/root-cause-analysis/SKILL.md
- skills/root-cause-analysis/references/metric-filters.md
- .changeset/yellow-sites-rhyme.md
- skills/root-cause-analysis/tile.json
🚧 Files skipped from review as they are similar to previous changes (3)
- packages/ui/src/hooks/use-kopai-data.ts
- packages/sdk/src/client.test.ts
- packages/ui/src/components/observability/renderers/OtelMetricStat.tsx
Summary by CodeRabbit
New Features
metrics searchadds--aggregateand repeatable--group-byDocumentation