Feat/metrics aggregation #105

Merged
Vunovati merged 6 commits into main from feat/metrics-aggregation on Mar 23, 2026

Conversation

Collaborator

@Vunovati Vunovati commented Mar 20, 2026

Summary by CodeRabbit

  • New Features

    • Metric aggregation support (sum, avg, min, max, count) and grouping by attributes
    • CLI: metrics search adds --aggregate and repeatable --group-by
    • Ingestion telemetry: emit per-endpoint ingestion metrics (bytes, requests) and dashboard widgets for total bytes/requests
    • SDK and UI: client + components support aggregated metric responses
  • Documentation

    • CLI and skill docs updated with aggregation examples and guidance


coderabbitai Bot commented Mar 20, 2026

Walkthrough

Adds aggregated metric queries and ingestion telemetry across the stack: schema/types, ClickHouse/SQLite datasource implementations, API/CLI/client/UI plumbing, collector-side ingestion metrics emission, and tests that validate aggregation and grouping behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Core Types & Schemas: `packages/core/src/data-filters-zod.ts`, `packages/core/src/denormalized-signals-zod.ts`, `packages/core/src/telemetry-datasource.ts` | Added aggregate and groupBy to metrics filter schema (with runtime refinements); added aggregatedMetricSchema/AggregatedMetricRow; extended ReadMetricsDatasource with getAggregatedMetrics. |
| ClickHouse Datasource: `packages/clickhouse-datasource/src/query-metrics.ts`, `packages/clickhouse-datasource/src/datasource.ts`, `packages/clickhouse-datasource/src/query-metrics.test.ts` | Added buildAggregatedMetricsQuery and ClickHouseReadDatasource.getAggregatedMetrics to run parameterized aggregate queries, stream rows, extract groups, and return aggregated rows; tests validate SQL generation and params. |
| SQLite Datasource: `packages/sqlite-datasource/src/db-datasource.ts`, `packages/sqlite-datasource/src/optimized-datasource.ts`, `packages/sqlite-datasource/src/datasource-read.test.ts` | Implemented getAggregatedMetrics with JSON-path-safe grouping, aggregation selection, ordering and limit; added isRecord and escapeJsonPath; OptimizedDatasource delegates to DbDatasource; comprehensive tests added. |
| API Routes & Tests: `packages/api/src/routes/metrics.ts`, `packages/api/src/signals.test.ts`, `packages/api/src/index.test.ts` | POST /signals/metrics/search now accepts aggregation: conditionally calls getAggregatedMetrics vs getMetrics, response schema union added, and tests cover aggregate and invalid-parameter cases. |
| Collector: ingestion metrics: `packages/collector/src/routes/ingestion-metrics.ts`, `packages/collector/src/routes/ingestion-metrics.test.ts`, `packages/collector/src/index.ts`, `packages/collector/src/routes/{traces,metrics,logs}.ts`, `packages/collector/src/collector.test.ts` | New functions to build/emit OTLP ingestion DELTA metrics per signal; request ingestContentLength tracked; optional ingestionMetricsDatasource threaded into routes and emitted asynchronously; tests added. |
| SDK & Client: `packages/sdk/src/client.ts`, `packages/sdk/src/types.ts`, `packages/sdk/src/client.test.ts`, `packages/sdk/src/mocks/handlers.ts` | Added KopaiClient.searchAggregatedMetrics and aggregated response schema; exported AggregatedMetricRow type; mocks and tests updated for aggregated responses and validation. |
| CLI: `packages/cli/src/commands/metrics.ts`, `packages/cli/README.md` | Added --aggregate <fn> and repeatable --group-by <attr> options, validation helpers, and routing to searchAggregatedMetrics when aggregate provided; docs updated. |
| UI Components & Hooks: `packages/ui/src/components/.../MetricStat/index.tsx`, `packages/ui/src/components/.../OtelMetricStat.tsx`, `packages/ui/src/pages/observability.tsx`, `packages/ui/src/providers/kopai-provider.tsx`, `packages/ui/src/hooks/use-kopai-data.ts`, plus multiple tests | MetricStat can accept direct value/unit; OtelMetricStat detects aggregated responses and renders single-value case (no sparkline) or grouped-error; dashboard adds OTEL Ingestion cards; hooks route aggregated requests to searchAggregatedMetrics; test mocks updated. |
| App Integration: `packages/app/src/collector/index.ts` | Passes telemetry datasource as ingestionMetricsDatasource into collector route registration. |
| Changeset & Skills Docs: `.changeset/*`, `skills/*`, `packages/cli/README.md` | Release changeset added and several skill/tile docs version bumps and CLI examples documenting aggregation/groupBy. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant APIRoute as API Route\n/signals/metrics/search
    participant Datasource as ReadMetricsDatasource\n(ClickHouse/SQLite)
    participant Database

    Client->>APIRoute: POST body (aggregate?, groupBy?)
    APIRoute->>APIRoute: validate body
    alt aggregate present
        APIRoute->>Datasource: getAggregatedMetrics(params)
        Datasource->>Database: buildAggregatedMetricsQuery / SQL aggregate + GROUP BY
        Database-->>Datasource: aggregated rows
        Datasource->>APIRoute: { data: AggregatedMetricRow[], nextCursor: null }
    else no aggregate
        APIRoute->>Datasource: getMetrics(params)
        Datasource->>Database: buildMetricsQuery
        Database-->>Datasource: metric rows (paged)
        Datasource->>APIRoute: searchResponse (may include cursor)
    end
    APIRoute-->>Client: 200 JSON
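
The branching in the diagram above can be sketched roughly as follows. This is an illustrative stand-in, not the route code from the PR: the simplified `MetricsFilter` and `RawMetricRow` shapes, the `handleSearch` name, and the in-memory `fakeDatasource` are assumptions. Only `getAggregatedMetrics`, `getMetrics`, and the `{ data, nextCursor }` response shape are taken from the walkthrough.

```typescript
// Hypothetical, simplified types; the real schemas live in packages/core.
type AggregateFn = "sum" | "avg" | "min" | "max" | "count";

interface MetricsFilter {
  metricName: string;
  aggregate?: AggregateFn;
  groupBy?: string[];
}

interface AggregatedMetricRow {
  groups: Record<string, string>;
  value: number;
}

interface RawMetricRow {
  metricName: string;
  value: number;
}

interface ReadMetricsDatasource {
  getMetrics(
    filter: MetricsFilter
  ): Promise<{ data: RawMetricRow[]; nextCursor: string | null }>;
  getAggregatedMetrics(
    filter: MetricsFilter
  ): Promise<{ data: AggregatedMetricRow[]; nextCursor: null }>;
}

// Route-level dispatch: a filter with `aggregate` takes the aggregated path,
// everything else takes the paged raw-row path.
async function handleSearch(ds: ReadMetricsDatasource, filter: MetricsFilter) {
  if (filter.aggregate !== undefined) {
    return ds.getAggregatedMetrics(filter);
  }
  return ds.getMetrics(filter);
}

// In-memory stand-in used only to make the sketch runnable.
const fakeDatasource: ReadMetricsDatasource = {
  async getMetrics(filter) {
    return {
      data: [{ metricName: filter.metricName, value: 1 }],
      nextCursor: null,
    };
  },
  async getAggregatedMetrics() {
    return { data: [{ groups: {}, value: 42 }], nextCursor: null };
  },
};
```

Calling `handleSearch(fakeDatasource, { metricName: "m", aggregate: "sum" })` resolves to the aggregated shape, whose `nextCursor` is always `null` because aggregated results are not paged.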
sequenceDiagram
    participant TelemetryClient
    participant CollectorRoute as Collector Route\n(/v1/traces,etc.)
    participant WriteDS as WriteMetricsDatasource
    participant Ingestion as emitIngestionMetrics

    TelemetryClient->>CollectorRoute: POST payload
    CollectorRoute->>WriteDS: writeTraces/Metrics/Logs
    WriteDS-->>CollectorRoute: success
    alt ingestionMetricsDatasource provided
        CollectorRoute->>Ingestion: emitIngestionMetrics(signal, contentLength) (async)
        Ingestion->>WriteDS: writeMetrics(kopai.ingestion.bytes/requests)
    end
    CollectorRoute-->>TelemetryClient: 200 OK
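
A minimal sketch of the emission step in the diagram above, under stated assumptions. The `DeltaPoint` shape, the `buildIngestionPoints` helper, the millisecond-to-nanosecond string conversion, and the choice to use start == end for the first window are all hypothetical; the PR builds full OTLP payloads. The metric names and the per-signal `lastEmitNs` map are the parts that appear in the review.

```typescript
// Hypothetical data-point shape; not the OTLP structure used in the PR.
interface DeltaPoint {
  metricName: string;
  signal: string;
  startTimeUnixNano: string;
  timeUnixNano: string;
  value: number;
}

// Last emit time per signal, used as the start of the next delta window.
const lastEmitNs = new Map<string, string>();

function buildIngestionPoints(
  signal: string,
  contentLength: number,
  nowMs: number
): DeltaPoint[] {
  // Appending six zeros converts integer milliseconds to a nanosecond string.
  const timeUnixNano = `${nowMs}000000`;
  // Assumption: the first emit for a signal has no previous window.
  const startTimeUnixNano = lastEmitNs.get(signal) ?? timeUnixNano;
  lastEmitNs.set(signal, timeUnixNano);
  const base = { signal, startTimeUnixNano, timeUnixNano };
  return [
    { ...base, metricName: "kopai.ingestion.bytes", value: contentLength },
    { ...base, metricName: "kopai.ingestion.requests", value: 1 },
  ];
}

// Fire-and-forget: the caller does not await the write, and a failed write
// is logged instead of failing the ingest request.
function emitIngestionMetrics(
  write: (points: DeltaPoint[]) => Promise<void>,
  signal: string,
  contentLength: number,
  nowMs: number = Date.now()
): void {
  void write(buildIngestionPoints(signal, contentLength, nowMs)).catch(
    (err: unknown) => {
      console.error("ingestion metrics write failed", err);
    }
  );
}
```

Because each delta window starts where the previous one ended, summing `kopai.ingestion.bytes` over any time range gives the bytes ingested in that range.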

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 I nibbled on code and found a sum,

Groups and bytes now hum and drum.
Ingested metrics hop in line,
Aggregates sparkle, neat and fine.
Rabbity cheers — the dashboards shine!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 30.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Feat/metrics aggregation' is specific and directly related to the main changeset, which introduces comprehensive metrics aggregation functionality across multiple packages including new aggregate query methods, grouping capabilities, and CLI/UI support. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (6)
packages/collector/src/routes/ingestion-metrics.ts (1)

3-4: Consider exposing a reset function for test isolation.

The module-level lastEmitNs Map persists state across test runs. While tests currently use unique signal names to avoid interference, exposing a resetLastEmitNs() function (or using vi.resetModules() in tests) would provide more robust test isolation.

♻️ Optional: Add reset function for testing
 /** Tracks the last emit time per signal for Delta startTimeUnixNano. */
 const lastEmitNs = new Map<string, string>();
+
+/** @internal Reset state for testing. */
+export function resetLastEmitNs(): void {
+  lastEmitNs.clear();
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/collector/src/routes/ingestion-metrics.ts` around lines 3 - 4, The
module-level Map lastEmitNs persists across tests; add and export a
resetLastEmitNs() function that clears lastEmitNs to allow test isolation.
Implement a simple function named resetLastEmitNs that calls lastEmitNs.clear(),
export it from the same module (alongside any existing exports like functions
that read/write lastEmitNs), and update tests to call resetLastEmitNs() where
needed or document its existence for test helpers. Ensure the function name is
exactly resetLastEmitNs so callers and tests can reference it consistently.
packages/collector/src/collector.test.ts (1)

1001-1003: Replace the fixed sleeps with a deterministic barrier.

These assertions depend on the background write settling within 50/100ms, which is a common source of CI flakes and just adds idle time on fast runs. Have the mock resolve a promise and await that promise after server.inject() instead.

Also applies to: 1031-1032
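
One way to build such a barrier is a small deferred-promise helper. The sketch below is self-contained and does not use the project's test setup: `createDeferred`, `handleRequest`, and `exampleTest` are hypothetical names, and `handleRequest` only imitates a route that responds before its background write runs.

```typescript
// A promise plus its resolver, so a test can wait for one specific event.
function createDeferred<T>() {
  let resolve!: (value: T) => void;
  const promise = new Promise<T>((res) => {
    resolve = res;
  });
  return { promise, resolve };
}

// Hypothetical stand-in for the route: it returns first and performs the
// write later, on a microtask.
function handleRequest(
  write: (bytes: number) => Promise<void>,
  bytes: number
): string {
  void Promise.resolve().then(() => write(bytes));
  return "200 OK";
}

async function exampleTest(): Promise<number[]> {
  const writeCalled = createDeferred<void>();
  const seen: number[] = [];

  // The mock resolves the barrier at the moment the background write happens.
  const writeMock = async (bytes: number): Promise<void> => {
    seen.push(bytes);
    writeCalled.resolve();
  };

  handleRequest(writeMock, 128);
  await writeCalled.promise; // waits exactly as long as needed, no fixed sleep
  return seen;
}
```

If the background write never runs, the awaited promise never settles and the test runner's own timeout reports the failure, which is easier to diagnose than an assertion that raced a 100ms sleep.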

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/collector/src/collector.test.ts` around lines 1001 - 1003, Replace
the fixed setTimeout-based wait after server.inject() with a deterministic
barrier: modify the mock for the background write (ingestionWriteSpy) so it
returns a Promise you control (e.g., expose resolveIngestionWrite), have the
test call server.inject(), then await that controlled promise instead of
awaiting new Promise(r => setTimeout(r, 100)); ensure you wire the mock used by
ingestionWriteSpy to call the resolver when the background work would complete
and apply the same change to the other occurrence that currently uses a fixed
sleep.
packages/ui/src/pages/observability.tsx (1)

798-805: Avoid polling unbounded lifetime totals from the UI.

Both cards issue all-time sum queries and refetch them every 10s because there is no timeUnixMin/timeUnixMax bound. That makes every open metrics tab repeatedly aggregate the full kopai.ingestion.* history. If these numbers need to be lifetime counters, I'd serve them from a pre-aggregated/cached source instead of recomputing them in the browser polling path.

Also applies to: 825-833
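
As a rough illustration of the bounded variant, the helper below builds params for a fixed trailing window. `timeUnixMin`, `timeUnixMax`, `metricName`, `metricType`, and `aggregate` are named in this review, but their exact types and units are assumptions here (nanosecond strings, a `"Sum"` metric type), as is the `boundedSumParams` helper itself.

```typescript
// Hypothetical params shape for an aggregated, time-bounded query.
interface AggregateParams {
  metricType: string;
  metricName: string;
  aggregate: "sum";
  timeUnixMin: string;
  timeUnixMax: string;
}

// Appending six zeros turns non-negative integer milliseconds into nanoseconds.
const MS_TO_NS_SUFFIX = "000000";

function boundedSumParams(
  metricName: string,
  nowMs: number,
  windowMs: number
): AggregateParams {
  return {
    metricType: "Sum", // assumption: the ingestion metrics are Sum metrics
    metricName,
    aggregate: "sum",
    timeUnixMin: `${nowMs - windowMs}${MS_TO_NS_SUFFIX}`,
    timeUnixMax: `${nowMs}${MS_TO_NS_SUFFIX}`,
  };
}
```

With a window such as the last 24 hours, each 10s refetch scans a bounded slice of `kopai.ingestion.*` instead of the full history, at the cost of the card no longer showing a lifetime total.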

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/ui/src/pages/observability.tsx` around lines 798 - 805, The UI is
polling unbounded all-time `sum` metrics every 10s (see dataSource with method
"searchMetricsPage", params metricType/metricName/aggregate and
refetchIntervalMs), which forces expensive full-history aggregation; fix by
adding time bounds (timeUnixMin/timeUnixMax) to the params to limit the query
window or replace the query with a pre-aggregated/cached endpoint that returns
lifetime counters, and reduce or remove the refetchIntervalMs polling if using a
cached source; apply the same change to the other card (lines referencing the
same dataSource pattern around 825-833).
packages/clickhouse-datasource/src/datasource.ts (1)

347-359: Validate the constructed aggregate rows before returning them.

Every other ClickHouse read path parses streamed data through a schema, but this method hand-builds { groups, value }. If the query shape changes or json.value is missing, Number(json.value) can become NaN and still escape as a typed result. Parsing the constructed object with denormalizedSignals.aggregatedMetricSchema would keep this path consistent with the rest of the datasource.

♻️ Keep this path aligned with the others
-          data.push({ groups, value: Number(json.value) });
+          data.push(
+            denormalizedSignals.aggregatedMetricSchema.parse({
+              groups,
+              value: Number(json.value),
+            })
+          );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/clickhouse-datasource/src/datasource.ts` around lines 347 - 359, The
constructed aggregate rows (objects pushed into data) are not validated and can
contain NaN or malformed fields; after building each { groups, value } use
denormalizedSignals.aggregatedMetricSchema.safeParse (or parse) to validate the
object and only push the parsed.value (or the validated result) into data,
otherwise handle the error (throw, log, or skip) so invalid rows don't escape as
typed AggregatedMetricRow; update the loop around
denormalizedSignals.AggregatedMetricRow creation to perform this schema
validation for each row.
packages/clickhouse-datasource/src/query-metrics.ts (1)

314-315: Add a deterministic tie-breaker for grouped top-N queries.

When groupBy is present, Line 314 orders only by value, so tied groups can swap in and out of the limited result set between refreshes. Appending the group_* aliases as secondary sort keys would make this stable; the SQLite implementation should mirror the same rule too.
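
A small sketch of the suggested ordering, assuming the grouped columns are aliased `group_0`, `group_1`, and so on in SELECT order (the SQLite mapping code later in this review reads `group_${i}` keys; the ClickHouse builder may number them differently). `buildOrderBy` is a hypothetical helper, not a function from the PR.

```typescript
// Builds the ORDER BY clause for a grouped top-N query.
// The aliases are generated from an index, never from user input, so
// interpolating them into the SQL text does not open an injection path.
function buildOrderBy(groupByCount: number): string {
  const tieBreakers = Array.from(
    { length: groupByCount },
    (_, i) => `group_${i}`
  );
  return "ORDER BY " + ["value DESC"].concat(tieBreakers).join(", ");
}
```

With no `groupBy` the clause stays `ORDER BY value DESC`; with two grouping attributes it becomes `ORDER BY value DESC, group_0, group_1`, so groups with equal values keep a fixed relative order under `LIMIT`.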

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/clickhouse-datasource/src/query-metrics.ts` around lines 314 - 315,
When groupBy is used the ORDER BY clause "ORDER BY value DESC LIMIT
{limit:UInt32}" must include deterministic tie-breakers: append the grouped
column aliases (the generated group_* aliases) as secondary sort keys after
value DESC (e.g. ORDER BY value DESC, group_1, group_2, ...), and apply the same
change in the SQLite path that builds the grouped top-N query; locate the SQL
assembly that emits the ORDER BY...LIMIT string in query-metrics.ts and update
it to enumerate the group_* aliases when groupBy is present so results are
stable across refreshes.
packages/clickhouse-datasource/src/query-metrics.test.ts (1)

28-135: Cover the rejection paths too.

The new builder throws on unsupported metricType values and unknown aggregate keys, but this suite only exercises happy paths. A couple of toThrow cases would lock down that validation contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/clickhouse-datasource/src/query-metrics.test.ts` around lines 28 -
135, Add negative tests to cover rejection paths for
buildAggregatedMetricsQuery: call buildAggregatedMetricsQuery with an
unsupported metricType (e.g., metricType: "UnknownType") and assert it throws,
and call it with an invalid aggregate (e.g., aggregate: "unsupportedAgg") and
assert it throws; use expect(() =>
buildAggregatedMetricsQuery({...})).toThrow(...) or toThrowError(...) and match
the error text produced by the function so the validation contract for
metricType and aggregate is locked down.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/cli/src/commands/metrics.ts`:
- Around line 162-165: toAggregateFn currently throws a generic Error for bad
--aggregate values; change it to throw a distinctive error (e.g., new Error with
name "InvalidArgumentError" or set error.code = 2) when the value is invalid,
and then update the generic catch block in the CLI command handler that
currently exits with code 1 to detect that distinctive error (error.name ===
"InvalidArgumentError" or error.code === 2) and call process.exit(2); this
ensures invalid-argument failures from toAggregateFn are exited with code 2
while other runtime/API errors still exit with code 1.

In `@packages/collector/src/index.ts`:
- Around line 36-39: The current hook sets request.ingestContentLength only from
request.headers and leaves it 0 when Content-Length is absent; instead
initialize request.ingestContentLength =
Number(request.headers["content-length"]) || 0 and, when that value is 0, attach
a short-lived listener to the incoming raw stream (request.raw) to accumulate
chunk lengths (on 'data' increment request.ingestContentLength by chunk.length)
and remove listeners on 'end'/'close'/'error' so routes like the handlers for
/v1/logs, /v1/metrics, and /v1/traces see the actual received bytes even for
chunked or header-stripped requests.

In `@packages/sdk/src/client.ts`:
- Around line 256-274: searchAggregatedMetrics currently parses the generic
metricsDataFilter but then assumes aggregated-only fields exist; add a guard
after parsing (validatedFilter) to ensure validatedFilter.aggregate is present
and has the expected shape (e.g., an array of { groups, value } entries) before
calling request, and if not present or invalid throw a clear error; reference
validatedFilter and searchAggregatedMetrics (and
aggregatedMetricsResponseSchema) so the check is colocated with the existing
parse and prevents hitting the non-aggregated server branch or failing
downstream parsing.

In `@packages/sqlite-datasource/src/db-datasource.ts`:
- Around line 756-762: The mapping of query rows in rows.map currently coerces
NULL aggregate results into 0 via Number(row.value) which hides empty-window
cases; in the rows.map callback (the block building groups and returning {
groups, value: Number(row.value) }) change the conversion to explicitly detect
row.value == null and preserve a null/undefined sentinel (or another explicit
marker) for denormalizedSignals.AggregatedMetricRow.value instead of converting
to 0, and only convert to Number(row.value) when row.value is non-null (for
sum=0 ensure you treat the numeric 0 as a valid number rather than as "no
data").

In `@packages/ui/src/components/observability/renderers/OtelMetricStat.tsx`:
- Around line 27-33: The current aggregated branch (isAggregatedRequest) wrongly
collapses grouped results by using response?.data[0]; instead, detect if the
aggregated response contains multiple rows (response?.data?.length > 1) and
fail-fast: either throw or return a clear error/UI indicating grouped aggregates
are unsupported by OtelMetricStat (e.g., "grouped aggregates not supported,
provide ungrouped aggregate"), or alternatively pass all AggregatedMetricRow
entries into MetricStat (use rows = response.data and adjust value handling).
Update the isAggregatedRequest branch to perform this check and choose one
behavior (reject grouped aggregates or accept them) so you don’t silently drop
buckets.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dc9c13a6-18ec-416a-9241-50f28606d502

📥 Commits

Reviewing files that changed from the base of the PR and between 7e911c3 and deae6c1.

⛔ Files ignored due to path filters (1)
  • packages/ui/src/lib/__snapshots__/generate-prompt-instructions.test.ts.snap is excluded by !**/*.snap
📒 Files selected for processing (35)
  • packages/api/src/index.test.ts
  • packages/api/src/routes/metrics.ts
  • packages/api/src/signals.test.ts
  • packages/app/src/collector/index.ts
  • packages/cli/src/commands/metrics.ts
  • packages/clickhouse-datasource/src/datasource.ts
  • packages/clickhouse-datasource/src/query-metrics.test.ts
  • packages/clickhouse-datasource/src/query-metrics.ts
  • packages/collector/src/collector.test.ts
  • packages/collector/src/index.ts
  • packages/collector/src/routes/ingestion-metrics.test.ts
  • packages/collector/src/routes/ingestion-metrics.ts
  • packages/collector/src/routes/logs.ts
  • packages/collector/src/routes/metrics.ts
  • packages/collector/src/routes/traces.ts
  • packages/core/src/data-filters-zod.ts
  • packages/core/src/denormalized-signals-zod.ts
  • packages/core/src/telemetry-datasource.ts
  • packages/sdk/src/client.test.ts
  • packages/sdk/src/client.ts
  • packages/sdk/src/mocks/handlers.ts
  • packages/sdk/src/types.ts
  • packages/sqlite-datasource/src/datasource-read.test.ts
  • packages/sqlite-datasource/src/db-datasource.ts
  • packages/sqlite-datasource/src/optimized-datasource.ts
  • packages/ui/src/components/observability/DynamicDashboard/DynamicDashboard.test.tsx
  • packages/ui/src/components/observability/MetricStat/index.tsx
  • packages/ui/src/components/observability/renderers/OtelMetricStat.tsx
  • packages/ui/src/hooks/use-kopai-data.test.ts
  • packages/ui/src/hooks/use-kopai-data.ts
  • packages/ui/src/hooks/use-live-logs.test.ts
  • packages/ui/src/lib/renderer.test.tsx
  • packages/ui/src/pages/observability.test.tsx
  • packages/ui/src/pages/observability.tsx
  • packages/ui/src/providers/kopai-provider.tsx

Comment thread packages/cli/src/commands/metrics.ts Outdated
Comment on lines +162 to +165
function toAggregateFn(value: string | undefined): AggregateFn | undefined {
  if (value === undefined) return undefined;
  if (isAggregateFn(value)) return value;
  throw new Error(`Invalid aggregate function: ${value}`);

⚠️ Potential issue | 🟡 Minor

Exit code for invalid arguments should be 2, not 1.

When toAggregateFn throws for an invalid --aggregate value, the error is caught in the generic catch block (line 117-120) which exits with code 1. Per coding guidelines, invalid arguments should exit with code 2.

Proposed fix
+class InvalidArgumentError extends Error {
+  constructor(message: string) {
+    super(message);
+    this.name = "InvalidArgumentError";
+  }
+}
+
 function toAggregateFn(value: string | undefined): AggregateFn | undefined {
   if (value === undefined) return undefined;
   if (isAggregateFn(value)) return value;
-  throw new Error(`Invalid aggregate function: ${value}`);
+  throw new InvalidArgumentError(`Invalid aggregate function: ${value}`);
 }

Then update the catch block to differentiate:

     } catch (err) {
       outputError(err, format === "json");
-      process.exit(1);
+      process.exit(err instanceof InvalidArgumentError ? 2 : 1);
     }

As per coding guidelines: "Use appropriate exit codes: 0 for success, 1 for API/runtime error, 2 for invalid arguments, 3 for config error".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/commands/metrics.ts` around lines 162 - 165, toAggregateFn
currently throws a generic Error for bad --aggregate values; change it to throw
a distinctive error (e.g., new Error with name "InvalidArgumentError" or set
error.code = 2) when the value is invalid, and then update the generic catch
block in the CLI command handler that currently exits with code 1 to detect that
distinctive error (error.name === "InvalidArgumentError" or error.code === 2)
and call process.exit(2); this ensures invalid-argument failures from
toAggregateFn are exited with code 2 while other runtime/API errors still exit
with code 1.

Comment thread packages/collector/src/index.ts
Comment on lines +36 to +39
fastify.addHook("onRequest", async (request) => {
  request.ingestContentLength =
    Number(request.headers["content-length"]) || 0;
});

⚠️ Potential issue | 🟡 Minor

ingestContentLength falls back to 0 for requests without Content-Length.

The new ingestion metrics derive bytes entirely from the header here. Any accepted chunked upload or intermediary that strips Content-Length will still reach /v1/logs, /v1/metrics, and /v1/traces, but each route will emit kopai.ingestion.bytes = 0 for that request. If this metric is meant to reflect actual bytes received, it needs a stream-length fallback instead of a header-only read.
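
A stream-length fallback could look roughly like the sketch below. It is framework-agnostic: `countBytes` and `resolveIngestBytes` are hypothetical helpers, and how they would be wired into the collector's Fastify hooks is not shown. The sketch relies only on the fact that Node request streams are async iterables of `Buffer` chunks, and `Buffer` is a `Uint8Array`.

```typescript
// Passes chunks through unchanged while counting them, then reports the
// total once the source is fully drained. If the consumer stops early,
// onDone is never called.
async function* countBytes(
  source: AsyncIterable<Uint8Array>,
  onDone: (totalBytes: number) => void
): AsyncGenerator<Uint8Array> {
  let total = 0;
  for await (const chunk of source) {
    total += chunk.byteLength;
    yield chunk;
  }
  onDone(total);
}

// Prefer a valid, positive Content-Length header; otherwise fall back to the
// number of bytes actually counted on the stream.
function resolveIngestBytes(
  header: string | undefined,
  counted: number
): number {
  const fromHeader = Number(header);
  return header !== undefined && Number.isFinite(fromHeader) && fromHeader > 0
    ? fromHeader
    : counted;
}
```

The counted value is only known after the body has been read, so the metric would have to be emitted after parsing rather than from an `onRequest` hook.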

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/collector/src/index.ts` around lines 36 - 39, The current hook sets
request.ingestContentLength only from request.headers and leaves it 0 when
Content-Length is absent; instead initialize request.ingestContentLength =
Number(request.headers["content-length"]) || 0 and, when that value is 0, attach
a short-lived listener to the incoming raw stream (request.raw) to accumulate
chunk lengths (on 'data' increment request.ingestContentLength by chunk.length)
and remove listeners on 'end'/'close'/'error' so routes like the handlers for
/v1/logs, /v1/metrics, and /v1/traces see the actual received bytes even for
chunked or header-stripped requests.

Comment thread packages/sdk/src/client.ts

Comment thread packages/sqlite-datasource/src/db-datasource.ts
Comment on lines +756 to +762
const data: denormalizedSignals.AggregatedMetricRow[] = rows.map(
  (row) => {
    const groups: Record<string, string> = {};
    for (const [i, groupKey] of groupByKeys.entries()) {
      groups[groupKey] = String(row[`group_${String(i)}`] ?? "");
    }
    return { groups, value: Number(row.value) };

⚠️ Potential issue | 🟠 Major

Don’t coerce empty aggregates into zero.

When the filter matches no rows and there is no groupBy, SQLite still returns a single aggregate row with value = NULL for avg/min/max (and sum). Line 762 turns that into 0, so an empty window becomes indistinguishable from real data. Handle row.value == null explicitly before converting it; if sum=0 is intentional, special-case that instead of relying on Number(null).
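
The coercion at issue is plain JavaScript behavior and can be shown in isolation. In the sketch below, `toAggregateValue` is a hypothetical helper that keeps "no data" distinct from zero by returning `null`; adopting it would require the row schema to allow a nullable value, which is an assumption and not something this review confirms.

```typescript
// Number(null) is 0, so a NULL aggregate turns into a real-looking value.
const coercedNull = Number(null);

// Converts a raw SQL aggregate result, preserving NULL as null.
function toAggregateValue(raw: unknown): number | null {
  if (raw == null) return null; // NULL or undefined: no rows matched
  const n = Number(raw);
  return Number.isNaN(n) ? null : n; // a genuine 0 stays 0
}
```

Here `coercedNull` is `0`, while `toAggregateValue(null)` is `null` and `toAggregateValue(0)` is `0`, so an empty window and a real zero no longer collapse into the same result.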

Possible fix
-      const data: denormalizedSignals.AggregatedMetricRow[] = rows.map(
+      const data: denormalizedSignals.AggregatedMetricRow[] = rows.flatMap(
         (row) => {
+          if (row.value == null) {
+            return [];
+          }
           const groups: Record<string, string> = {};
           for (const [i, groupKey] of groupByKeys.entries()) {
             groups[groupKey] = String(row[`group_${String(i)}`] ?? "");
           }
-          return { groups, value: Number(row.value) };
+          return [{ groups, value: Number(row.value) }];
         }
       );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/sqlite-datasource/src/db-datasource.ts` around lines 756 - 762, The
mapping of query rows in rows.map currently coerces NULL aggregate results into
0 via Number(row.value) which hides empty-window cases; in the rows.map callback
(the block building groups and returning { groups, value: Number(row.value) })
change the conversion to explicitly detect row.value == null and preserve a
null/undefined sentinel (or another explicit marker) for
denormalizedSignals.AggregatedMetricRow.value instead of converting to 0, and
only convert to Number(row.value) when row.value is non-null (for sum=0 ensure
you treat the numeric 0 as a valid number rather than as "no data").

Comment thread packages/ui/src/components/observability/renderers/OtelMetricStat.tsx Outdated

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (2)
packages/sdk/src/client.ts (1)

256-264: ⚠️ Potential issue | 🟠 Major

Add a runtime guard for aggregate in aggregated search.

Type-level enforcement helps TS callers, but at runtime (especially JS consumers) this can still call the raw branch and then fail response parsing. Add an explicit guard after parsing.

🐛 Suggested fix
   async searchAggregatedMetrics(
-    filter: MetricsDataFilter & {
+    filter: Omit<MetricsDataFilter, "cursor"> & {
       aggregate: NonNullable<MetricsDataFilter["aggregate"]>;
     },
     opts?: RequestOptions
   ): Promise<{ data: AggregatedMetricRow[]; nextCursor: null }> {
     const validatedFilter =
       dataFilterSchemas.metricsDataFilterSchema.parse(filter);
+    if (!validatedFilter.aggregate) {
+      throw new Error("searchAggregatedMetrics requires filter.aggregate");
+    }

     return request(
       `${this.baseUrl}/signals/metrics/search`,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/sdk/src/client.ts` around lines 256 - 264, In
searchAggregatedMetrics, after calling
dataFilterSchemas.metricsDataFilterSchema.parse(filter) assign to
validatedFilter and add an explicit runtime guard that verifies
validatedFilter.aggregate is present and non-null/undefined; if the check fails,
throw a clear error (or return a rejected Promise) before proceeding to the
raw/response-parsing branch so JS consumers don't reach the parsing logic with a
missing aggregate. Ensure the guard references validatedFilter and the function
name searchAggregatedMetrics so it’s easy to locate.
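
A minimal sketch of such a guard, written as a standalone helper so it can be read in isolation. The `AggregateFn` union, the `FilterSketch` interface, and `requireAggregate` are hypothetical and exist only for this illustration; the real `MetricsDataFilter` has more fields, and the actual fix may simply inline the check as in the suggested diff.

```typescript
// Aggregation functions listed in the PR summary.
type AggregateFn = "sum" | "avg" | "min" | "max" | "count";

// Hypothetical, trimmed-down stand-in for MetricsDataFilter.
interface FilterSketch {
  metricName?: string;
  aggregate?: AggregateFn;
  groupBy?: string[];
}

// Throws when `aggregate` is missing, and narrows the type when it is present.
function requireAggregate<T extends { aggregate?: AggregateFn }>(
  filter: T
): T & { aggregate: AggregateFn } {
  if (filter.aggregate == null) {
    throw new Error("searchAggregatedMetrics requires filter.aggregate");
  }
  return filter as T & { aggregate: AggregateFn };
}
```

Because the check runs at runtime, it also covers plain JavaScript callers, for whom the `NonNullable` constraint in the signature is invisible.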
packages/cli/src/commands/metrics.ts (1)

165-169: ⚠️ Potential issue | 🟠 Major

Invalid CLI args still terminate with exit code 1.

Line 168 throws InvalidArgumentError, but the command catch path still exits with 1, so invalid arguments are misclassified.

🐛 Suggested fix
   } catch (err) {
     outputError(err, format === "json");
-    process.exit(1);
+    process.exit(err instanceof InvalidArgumentError ? 2 : 1);
   }
As per coding guidelines: "Use appropriate exit codes: 0 for success, 1 for API/runtime error, 2 for invalid arguments, 3 for config error (missing url)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/commands/metrics.ts` around lines 165 - 169, The
toAggregateFn helper currently throws InvalidArgumentError for bad CLI values
but the command's catch path still exits with code 1; update the error handling
so invalid argument errors result in exit code 2. Specifically, either (A)
adjust the top-level catch in the metrics command to detect InvalidArgumentError
(using the InvalidArgumentError symbol) and call process.exit(2) or set
process.exitCode = 2, or (B) change the command runner to map
InvalidArgumentError to exit code 2 before calling process.exit(1). Ensure you
reference toAggregateFn and InvalidArgumentError when locating the code to
change and keep all other runtime errors exiting with code 1.
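
The exit-code convention quoted above can be sketched as one pure mapping function. Both error classes below are local stand-ins so the block is self-contained: `InvalidArgumentError` mirrors the error the command throws, while `ConfigError` and `exitCodeFor` are hypothetical names that do not appear in the PR.

```typescript
// Local stand-in for the InvalidArgumentError thrown by toAggregateFn.
class InvalidArgumentError extends Error {}

// Hypothetical error type for a config problem such as a missing url.
class ConfigError extends Error {}

// Maps an error to the exit codes from the coding guidelines:
// 1 = API/runtime error, 2 = invalid arguments, 3 = config error.
function exitCodeFor(err: unknown): number {
  if (err instanceof InvalidArgumentError) return 2;
  if (err instanceof ConfigError) return 3;
  return 1;
}
```

Under this sketch the catch block would call `process.exit(exitCodeFor(err))` instead of hard-coding `1`, which keeps the mapping testable without terminating the process.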
🧹 Nitpick comments (1)
skills/create-dashboard/rules/workflow.md (1)

60-60: Consider clarifying aggregation options and MetricTable reference.

The documentation is clear overall, but could be enhanced with:

  1. Specifying whether "sum" is the only valid aggregation function or if others (e.g., "avg", "min", "max") are supported
  2. Briefly introducing MetricTable since this is its first mention in the document

These additions would help users understand the full range of options and avoid confusion about the referenced component.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/create-dashboard/rules/workflow.md` at line 60, Clarify the docs
around MetricStat and MetricTable: update the MetricStat line that currently
shows `aggregate: "sum"` to list supported aggregation options (e.g., `"sum"`,
`"avg"`, `"min"`, `"max"`, etc.) and explicitly state which are valid; keep the
guidance that `groupBy` should not be used with MetricStat and instead add a
one-sentence introduction to `MetricTable` (what it is and that it should be
used for grouped results) so readers know why MetricTable is referenced and when
to choose it over MetricStat.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 08d7cbeb-5ac8-4176-9536-4d0a87f73abf

📥 Commits

Reviewing files that changed from the base of the PR and between deae6c1 and 65d49a8.

📒 Files selected for processing (16)
  • .changeset/yellow-sites-rhyme.md
  • packages/cli/README.md
  • packages/cli/src/commands/metrics.ts
  • packages/sdk/src/client.test.ts
  • packages/sdk/src/client.ts
  • packages/ui/src/components/observability/renderers/OtelMetricStat.tsx
  • packages/ui/src/hooks/use-kopai-data.ts
  • skills/create-dashboard/SKILL.md
  • skills/create-dashboard/rules/workflow.md
  • skills/create-dashboard/tile.json
  • skills/otel-instrumentation/SKILL.md
  • skills/otel-instrumentation/references/cli-reference.md
  • skills/otel-instrumentation/tile.json
  • skills/root-cause-analysis/SKILL.md
  • skills/root-cause-analysis/references/metric-filters.md
  • skills/root-cause-analysis/tile.json
✅ Files skipped from review due to trivial changes (9)
  • skills/otel-instrumentation/SKILL.md
  • skills/create-dashboard/tile.json
  • skills/otel-instrumentation/references/cli-reference.md
  • skills/otel-instrumentation/tile.json
  • skills/create-dashboard/SKILL.md
  • skills/root-cause-analysis/SKILL.md
  • skills/root-cause-analysis/references/metric-filters.md
  • .changeset/yellow-sites-rhyme.md
  • skills/root-cause-analysis/tile.json
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/ui/src/hooks/use-kopai-data.ts
  • packages/sdk/src/client.test.ts
  • packages/ui/src/components/observability/renderers/OtelMetricStat.tsx

@Vunovati Vunovati merged commit 3894c34 into main Mar 23, 2026
2 checks passed