Skip to content

Feat/cc observability improvements#140

Open
Vunovati wants to merge 3 commits into
mainfrom
feat/cc-obsdervability-improvements
Open

Feat/cc observability improvements#140
Vunovati wants to merge 3 commits into
mainfrom
feat/cc-obsdervability-improvements

Conversation

@Vunovati
Copy link
Copy Markdown
Collaborator

@Vunovati Vunovati commented May 15, 2026

Summary by CodeRabbit

  • New Features

    • Log aggregation (grouping + counts) available via CLI, API and UI.
    • Metric time‑series bucketing (1m/5m/1h/1d) with grouping support across CLI, SDK and UI.
    • Three new metric visualizations: Bar Chart, Donut Chart, and Leaderboard.
    • Updated observability dashboard fixtures showing aggregated metrics, time‑series, and log panels.
  • Documentation

    • CLI docs updated with aggregation and time‑bucket flag examples.

Review Change Stack

Vunovati added 2 commits May 15, 2026 14:48
Adds the catalog components and data source methods needed to render
the Claude Code monitoring dashboard from kopai's UITree system.

New catalog components (categorical visualisations over searchAggregatedMetrics
/ searchLogsAggregate):
- MetricBarChart: horizontal/vertical bars, log-scale support, group-label coercion
- MetricDonutChart: proportional slices with "Other" overflow bucket
- MetricLeaderboard: ranked rows with inline proportional bars

New data source methods:
- searchLogsAggregate: COUNT(*) GROUP BY attribute keys over logs (LIMIT 1000).
  ClickHouse via LogAttributes Map accessor, SQLite via json_extract.
- searchMetricsTimeSeries: time-bucketed metric aggregates (1m/5m/1h/1d).
  ClickHouse via toStartOfInterval, SQLite via raw bigint bucket arithmetic.
  Separate /signals/metrics/timeseries route to keep response-schema union bounded.

CLI:
- kopai logs search --aggregate count --group-by <attr>
- kopai metrics search --time-bucket <1m|5m|1h|1d>

OtelMetricTimeSeries renderer adapts timeseries shape to the existing
MetricTimeSeries component without touching it.

All schemas validated test-first; refinements enforce groupBy requires
aggregate, cursor incompatible with aggregate, timeBucket requires
aggregate+groupBy.
UITree matching PRD §Acceptance Criteria — KPIs (cost/sessions/commits/
PRs/LOC/active time/cache tokens), Cost (time-series + by-model bars),
Tokens (donut + log-scale bars), Attribution (top users/skill/agent
leaderboards), Tool Usage (acceptance + most-used over searchLogsAggregate),
Activity Stream (LogTimeline), Trace Explorer (TraceDetail).
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 15, 2026

📝 Walkthrough

<review_stack_artifact>

</review_stack_artifact>

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/cc-obsdervability-improvements

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (9)
packages/ui/src/components/observability/MetricDonutChart/MetricDonutChart.stories.tsx (1)

59-59: 💤 Low value

Use new Error() instead of new globalThis.Error().

Same issue as in MetricBarChart stories: prefer the standard new Error() constructor.

✨ Simplify error construction
-  args: { rows: [], error: new globalThis.Error("Failed to load donut") },
+  args: { rows: [], error: new Error("Failed to load donut") },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/ui/src/components/observability/MetricDonutChart/MetricDonutChart.stories.tsx`
at line 59, Replace the nonstandard constructor usage in the MetricDonutChart
story: change the error construction from new globalThis.Error("Failed to load
donut") to the standard new Error("Failed to load donut") in the story args
(look for the args object where rows and error are set in
MetricDonutChart.stories.tsx) so the story uses the normal Error constructor.
packages/ui/src/components/observability/MetricBarChart/MetricBarChart.stories.tsx (1)

73-73: 💤 Low value

Use new Error() instead of new globalThis.Error().

The globalThis.Error reference is unnecessarily verbose. Standard practice is to use new Error() directly, which is more concise and conventional.

✨ Simplify error construction
-    error: new globalThis.Error("Failed to load bar chart data"),
+    error: new Error("Failed to load bar chart data"),
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/ui/src/components/observability/MetricBarChart/MetricBarChart.stories.tsx`
at line 73, Replace the verbose globalThis.Error usage with the standard Error
constructor: change the error property assignment (in
MetricBarChart.stories.tsx, the object containing error: new
globalThis.Error("Failed to load bar chart data")) to use new Error("Failed to
load bar chart data") so the story uses the conventional Error instantiation.
packages/ui/src/components/observability/MetricBarChart/index.tsx (1)

61-68: ⚡ Quick win

Consider extracting shared label coercion logic.

This coerceGroupLabel helper is duplicated with subtle differences across MetricBarChart (lines 61-68), MetricDonutChart (coerceLabel, lines 53-60), and MetricLeaderboard (lines 36-48). The implementations differ in how they handle empty/null values, which could lead to inconsistent "(no value)" display across components.

For example:

  • {a: "", b: "foo"} → MetricBarChart: "(no value) / foo" vs. MetricLeaderboard: " / foo"
💡 Suggestion: Extract to shared utility

Consider moving this to ../utils/labels.ts with a consistent implementation that all three components can import, ensuring uniform behavior across the observability UI.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ui/src/components/observability/MetricBarChart/index.tsx` around
lines 61 - 68, The three label helpers (coerceGroupLabel in MetricBarChart,
coerceLabel in MetricDonutChart, and the label function in MetricLeaderboard)
duplicate logic and handle empty/null differently; extract a single utility
(e.g., export function coerceMetricLabel or coerceGroupLabel) into a shared
module (suggested ../utils/labels.ts) that normalizes empty string, null, and
undefined to NO_VALUE_LABEL and joins parts with " / " consistently, then import
and use that utility from MetricBarChart (coerceGroupLabel), MetricDonutChart
(coerceLabel), and MetricLeaderboard so all components display "(no value)"
uniformly.
packages/ui-core/src/lib/observability-catalog.ts (1)

182-193: ⚡ Quick win

Consider adding searchLogsAggregate to MetricDonutChart's accepted data sources.

The component currently accepts only searchAggregatedMetrics, while MetricBarChart and MetricLeaderboard both accept searchAggregatedMetrics and searchLogsAggregate. Since all three use the same AggregatedMetricRow data shape ({groups, value}), this asymmetry may be unintentional. No usage of searchLogsAggregate with donut charts was found in the codebase, but adding it would maintain consistency across similar aggregation-based components.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ui-core/src/lib/observability-catalog.ts` around lines 182 - 193,
MetricDonutChart's acceptsDataFrom only lists "searchAggregatedMetrics" but
should also include "searchLogsAggregate" to match MetricBarChart and
MetricLeaderboard and support the same AggregatedMetricRow shape; update the
MetricDonutChart entry (symbol: MetricDonutChart in observability-catalog) to
add "searchLogsAggregate" to the acceptsDataFrom array so it accepts both
sources.
packages/ui/src/components/observability/DynamicDashboard/fixtures/claude-code-dashboard.ts (1)

255-261: 💤 Low value

Consider adjusting grid columns for single-child row.

The kpi-row-3 Grid uses columns: 4 but contains only 1 child (kpi-cache-writes). This creates unnecessary whitespace. Consider setting columns: 1 or adding additional KPI metrics to fill the row for visual balance.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/ui/src/components/observability/DynamicDashboard/fixtures/claude-code-dashboard.ts`
around lines 255 - 261, The Grid node "kpi-row-3" defines props.columns as 4
while only containing a single child "kpi-cache-writes", causing excess
whitespace; update the Grid props for "kpi-row-3" to use columns: 1 (or add more
KPI children) so the layout fits a single-item row, locate the "kpi-row-3"
object in the fixture and change its props.columns value (or append additional
child keys) to resolve the imbalance.
packages/sqlite-datasource/src/db-datasource.ts (1)

1108-1115: ⚡ Quick win

getAggregatedLogs ignores filter.limit and hardcodes 1000.

getAggregatedMetrics (Line 745) honors filter.limit ?? 100, while getAggregatedLogs always emits LIMIT 1000 regardless of what the caller requested. That divergence means an API/CLI consumer asking for, say, top‑10 groupings will still get up to 1000 rows back from this datasource, forcing trimming upstream and burning DB work. Prefer the same filter.limit ?? <default> pattern (capped at 1000 if you want a hard ceiling) for consistency across aggregations.

♻️ Proposed change
   async getAggregatedLogs(filter: dataFilterSchemas.LogsDataFilter): Promise<{
     data: denormalizedSignals.AggregatedLogRow[];
     nextCursor: null;
   }> {
     try {
+      const limit = Math.min(filter.limit ?? 100, 1000);
       const groupByKeys = filter.groupBy ?? [];
@@
-      // ORDER BY value DESC, LIMIT 1000 (max page size for aggregations)
-      query = query.orderBy(kyselySql`value`, "desc").limit(1000);
+      // ORDER BY value DESC, LIMIT (capped at 1000 for aggregations)
+      query = query.orderBy(kyselySql`value`, "desc").limit(limit);

Also applies to: 1197-1198

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/sqlite-datasource/src/db-datasource.ts` around lines 1108 - 1115,
getAggregatedLogs currently ignores filter.limit and always applies LIMIT 1000;
change it to use the same pattern as getAggregatedMetrics by reading
filter.limit and defaulting to the intended default (e.g., filter.limit ?? 100),
then apply a cap (Math.min(limit, 1000)) before passing it to the query builder;
update the LIMIT usage in getAggregatedLogs (and the related query/limit calls
around the other noted spots) to use that computed cappedLimit so callers
requesting fewer rows (e.g., top-10) get the correct number while still
enforcing a 1000 hard ceiling.
packages/clickhouse-datasource/src/datasource.test.ts (2)

2500-2528: ⚡ Quick win

Include getMetricsTimeSeries in clickhouseSettings forwarding coverage.

The new method should be part of the forwarding matrix to prevent silent regressions in clickhouse_settings propagation.

✅ Suggested test addition
     it.each([
       ["getLogs", () => localDs.getLogs({ requestContext: ctx })],
       [
         "getAggregatedLogs",
         () => localDs.getAggregatedLogs({ requestContext: ctx }),
       ],
       [
         "getMetrics",
         () => localDs.getMetrics({ metricType: "Gauge", requestContext: ctx }),
       ],
+      [
+        "getMetricsTimeSeries",
+        () =>
+          localDs.getMetricsTimeSeries({
+            metricType: "Gauge",
+            timeBucket: "1d",
+            requestContext: ctx,
+          }),
+      ],
       [
         "getAggregatedMetrics",
         () =>
           localDs.getAggregatedMetrics({
             metricType: "Gauge",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/clickhouse-datasource/src/datasource.test.ts` around lines 2500 -
2528, The test matrix misses getMetricsTimeSeries: add an entry to the it.each
array to call localDs.getMetricsTimeSeries with the same pattern as the other
metric tests (e.g., () => localDs.getMetricsTimeSeries({ metricType: "Gauge",
requestContext: ctx })) so the "%s forwards clickhouseSettings" test covers
propagation for getMetricsTimeSeries alongside getMetrics/getAggregatedMetrics.

1952-2049: ⚡ Quick win

Add one Gauge test case for getMetricsTimeSeries.

This suite currently validates only Sum. Since production supports both Gauge and Sum, a minimal Gauge case would protect that branch from regressions.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/clickhouse-datasource/src/datasource.test.ts` around lines 1952 -
2049, The test suite only covers metricType "Sum" for getMetricsTimeSeries; add
a parallel test case that calls ds.getMetricsTimeSeries with metricType: "Gauge"
(use the same timeBucket, groupBy, aggregate and requestContext helpers) to
exercise the Gauge code-path, assert nextCursor/null, validate data rows have
groups, timeBucketNs (compare to DAY_1_NS/ DAY_2_NS constants where
appropriate), numeric value types, and ordering (use expectAscending on
BigInt(timeBucketNs)) similar to the existing "Sum" tests so the Gauge branch is
protected from regressions.
packages/sqlite-datasource/src/datasource-read.test.ts (1)

2315-2692: ⚡ Quick win

Add a Gauge scenario in getMetricsTimeSeries tests.

Current coverage validates only Sum, but the implementation supports Gauge too. Adding one Gauge case would close that branch gap.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/sqlite-datasource/src/datasource-read.test.ts` around lines 2315 -
2692, Add a new test case exercising the Gauge branch of getMetricsTimeSeries:
create a short it(...) that inserts gauge datapoints (use the existing helper
analogous to createInsertSum, e.g. createInsertGauge or insertGauge) with
timestamps that bucket together, call readDs.getMetricsTimeSeries with
metricType: "Gauge" (same metricName, aggregate and groupBy as appropriate), and
assert the returned data length, timeBucketNs values and numeric values match
expected gauge aggregation; reference getMetricsTimeSeries and the insert helper
so the test mirrors one of the existing Sum scenarios (e.g., 1m or 1h) but uses
"Gauge" to cover that branch.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/api/src/routes/metrics.ts`:
- Around line 28-34: The timeseriesBodySchema refinement only checks timeBucket
but tests expect aggregate and groupBy to be required too; update the refinement
on dataFilterSchemas.metricsDataFilterSchema (the timeseriesBodySchema
definition) to also validate that d.aggregate and d.groupBy are present (and
non-empty if appropriate) and return the same 400 behavior when missing, or
alternatively ensure the base schema (dataFilterSchemas.metricsDataFilterSchema)
includes required checks for aggregate and groupBy so timeseriesBodySchema no
longer needs extra refinements; locate the timeseriesBodySchema symbol and add
the additional refine clauses (or extend the existing refine to check all three
fields) so signals.test.ts passes.

In `@packages/clickhouse-datasource/src/query-logs.ts`:
- Around line 197-236: The aggregated query builder buildAggregatedLogsQuery
currently ignores filter.limit and always uses AGGREGATED_LOGS_LIMIT; update the
function to respect a user-provided limit but cap it by AGGREGATED_LOGS_LIMIT
(e.g., const limit = Math.min(Number.isFinite(filter.limit) ? Math.max(0,
Math.floor(filter.limit)) : AGGREGATED_LOGS_LIMIT, AGGREGATED_LOGS_LIMIT)) and
use that computed limit in the final query LIMIT clause (replace LIMIT
${String(AGGREGATED_LOGS_LIMIT)} with LIMIT ${String(limit)}); ensure you
coerce/validate filter.limit to a non-negative integer before applying Math.min
to avoid injection or NaN issues.

In `@packages/clickhouse-datasource/src/query-metrics.ts`:
- Line 370: After looking up interval with const interval =
TIME_BUCKET_INTERVAL_MAP[timeBucket]; add an explicit guard that checks whether
interval is undefined (or falsy) and handle it immediately: throw a clear,
descriptive error (or return a rejected Promise) that includes the invalid
timeBucket value and allowed keys from TIME_BUCKET_INTERVAL_MAP so SQL
generation stops early; update the code around the TIME_BUCKET_INTERVAL_MAP
lookup (the timeBucket variable and any calling function that builds the
ClickHouse SQL) to perform this validation before proceeding.

In `@packages/sqlite-datasource/src/db-datasource.ts`:
- Around line 785-790: The error message inside getMetricsTimeSeries is
misleading: when metricType is neither "Gauge" nor "Sum" it currently throws
`aggregate is not supported for ${metricType}` (copied from
getAggregatedMetrics). Update the throw in the getMetricsTimeSeries function to
a clear, distinct message such as `"time-series metrics not supported for
${metricType}"` (or similar) so callers can distinguish time-bucket/path errors
from aggregation errors; keep the check on the metricType variable and leave the
subsequent timeBucket-required check unchanged.

In
`@packages/ui/src/components/observability/DynamicDashboard/fixtures/claude-code-dashboard.ts`:
- Around line 79-91: The Grid "kpi-row-1" currently lists five children
("kpi-cost", "kpi-sessions", "kpi-commits", "kpi-prs", "kpi-loc-added") but its
props set columns: 4, causing the last child to wrap; fix this by either
updating the Grid props.columns to 5 on the "kpi-row-1" object or
removing/relocating the extra child "kpi-loc-added" so the number of children
matches the columns, and ensure any consumer expecting a single-row KPI layout
references the updated "kpi-row-1" structure.

In `@packages/ui/src/components/observability/MetricBarChart/index.tsx`:
- Around line 168-170: The current renderData transformation (const renderData =
logScale ? data.map((d) => ({ ...d, value: Math.max(1, d.value) })) : data)
mutates the value used everywhere, causing tooltips to show clamped values;
instead preserve original values for display by creating a separate clampedValue
used only for rendering (e.g., keep objects as { ...d, originalValue: d.value,
clampedValue: Math.max(1, d.value) } or similar) and update the chart rendering
code to use clampedValue while the tooltip/label logic uses originalValue (or
show an explicit "adjusted for log scale" note when originalValue < 1). Ensure
changes touch renderData and any tooltip/display code that currently reads
d.value so tooltips use originalValue while bars use clampedValue.

In `@packages/ui/src/components/observability/MetricLeaderboard/index.tsx`:
- Around line 36-48: coerceGroupLabel currently preserves empty segments (e.g.
"{skill: '', user: 'alice'}" -> " / alice") which differs from MetricBarChart
and MetricDonutChart that render empty segments as "(no value)"; update
coerceGroupLabel to replace each null/undefined/empty-string segment with "(no
value)" before joining so it produces "(no value) / alice", and consider
extracting the logic into a shared utility used by MetricBarChart,
MetricDonutChart and coerceGroupLabel to ensure consistent labeling across
components.

In `@packages/ui/src/components/observability/renderers/OtelMetricTimeSeries.tsx`:
- Around line 36-46: The synthesiseRows function is incorrectly marking
time-bucketed points as MetricType: "Sum"; change it to MetricType: "Gauge" and
ensure StartTimeUnix is the fixed zero epoch used by fixtures (e.g. ts(0)) while
TimeUnix remains row.timeBucketNs, so the produced OtelMetricsRow objects (from
synthesiseRows handling TimeseriesMetricRow → OtelMetricsRow) reflect
instantaneous/gauge semantics rather than cumulative Sum semantics.

---

Nitpick comments:
In `@packages/clickhouse-datasource/src/datasource.test.ts`:
- Around line 2500-2528: The test matrix misses getMetricsTimeSeries: add an
entry to the it.each array to call localDs.getMetricsTimeSeries with the same
pattern as the other metric tests (e.g., () => localDs.getMetricsTimeSeries({
metricType: "Gauge", requestContext: ctx })) so the "%s forwards
clickhouseSettings" test covers propagation for getMetricsTimeSeries alongside
getMetrics/getAggregatedMetrics.
- Around line 1952-2049: The test suite only covers metricType "Sum" for
getMetricsTimeSeries; add a parallel test case that calls
ds.getMetricsTimeSeries with metricType: "Gauge" (use the same timeBucket,
groupBy, aggregate and requestContext helpers) to exercise the Gauge code-path,
assert nextCursor/null, validate data rows have groups, timeBucketNs (compare to
DAY_1_NS/ DAY_2_NS constants where appropriate), numeric value types, and
ordering (use expectAscending on BigInt(timeBucketNs)) similar to the existing
"Sum" tests so the Gauge branch is protected from regressions.

In `@packages/sqlite-datasource/src/datasource-read.test.ts`:
- Around line 2315-2692: Add a new test case exercising the Gauge branch of
getMetricsTimeSeries: create a short it(...) that inserts gauge datapoints (use
the existing helper analogous to createInsertSum, e.g. createInsertGauge or
insertGauge) with timestamps that bucket together, call
readDs.getMetricsTimeSeries with metricType: "Gauge" (same metricName, aggregate
and groupBy as appropriate), and assert the returned data length, timeBucketNs
values and numeric values match expected gauge aggregation; reference
getMetricsTimeSeries and the insert helper so the test mirrors one of the
existing Sum scenarios (e.g., 1m or 1h) but uses "Gauge" to cover that branch.

In `@packages/sqlite-datasource/src/db-datasource.ts`:
- Around line 1108-1115: getAggregatedLogs currently ignores filter.limit and
always applies LIMIT 1000; change it to use the same pattern as
getAggregatedMetrics by reading filter.limit and defaulting to the intended
default (e.g., filter.limit ?? 100), then apply a cap (Math.min(limit, 1000))
before passing it to the query builder; update the LIMIT usage in
getAggregatedLogs (and the related query/limit calls around the other noted
spots) to use that computed cappedLimit so callers requesting fewer rows (e.g.,
top-10) get the correct number while still enforcing a 1000 hard ceiling.

In `@packages/ui-core/src/lib/observability-catalog.ts`:
- Around line 182-193: MetricDonutChart's acceptsDataFrom only lists
"searchAggregatedMetrics" but should also include "searchLogsAggregate" to match
MetricBarChart and MetricLeaderboard and support the same AggregatedMetricRow
shape; update the MetricDonutChart entry (symbol: MetricDonutChart in
observability-catalog) to add "searchLogsAggregate" to the acceptsDataFrom array
so it accepts both sources.

In
`@packages/ui/src/components/observability/DynamicDashboard/fixtures/claude-code-dashboard.ts`:
- Around line 255-261: The Grid node "kpi-row-3" defines props.columns as 4
while only containing a single child "kpi-cache-writes", causing excess
whitespace; update the Grid props for "kpi-row-3" to use columns: 1 (or add more
KPI children) so the layout fits a single-item row, locate the "kpi-row-3"
object in the fixture and change its props.columns value (or append additional
child keys) to resolve the imbalance.

In `@packages/ui/src/components/observability/MetricBarChart/index.tsx`:
- Around line 61-68: The three label helpers (coerceGroupLabel in
MetricBarChart, coerceLabel in MetricDonutChart, and the label function in
MetricLeaderboard) duplicate logic and handle empty/null differently; extract a
single utility (e.g., export function coerceMetricLabel or coerceGroupLabel)
into a shared module (suggested ../utils/labels.ts) that normalizes empty
string, null, and undefined to NO_VALUE_LABEL and joins parts with " / "
consistently, then import and use that utility from MetricBarChart
(coerceGroupLabel), MetricDonutChart (coerceLabel), and MetricLeaderboard so all
components display "(no value)" uniformly.

In
`@packages/ui/src/components/observability/MetricBarChart/MetricBarChart.stories.tsx`:
- Line 73: Replace the verbose globalThis.Error usage with the standard Error
constructor: change the error property assignment (in
MetricBarChart.stories.tsx, the object containing error: new
globalThis.Error("Failed to load bar chart data")) to use new Error("Failed to
load bar chart data") so the story uses the conventional Error instantiation.

In
`@packages/ui/src/components/observability/MetricDonutChart/MetricDonutChart.stories.tsx`:
- Line 59: Replace the nonstandard constructor usage in the MetricDonutChart
story: change the error construction from new globalThis.Error("Failed to load
donut") to the standard new Error("Failed to load donut") in the story args
(look for the args object where rows and error are set in
MetricDonutChart.stories.tsx) so the story uses the normal Error constructor.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e6e53902-ac61-4168-bd04-8580ed8c2bb4

📥 Commits

Reviewing files that changed from the base of the PR and between d2bbff8 and 8ec4d7e.

⛔ Files ignored due to path filters (1)
  • packages/ui-core/src/lib/__snapshots__/generate-prompt-instructions.test.ts.snap is excluded by !**/*.snap
📒 Files selected for processing (57)
  • examples/ui-react-app/src/custom-observability-catalog.tsx
  • packages/api/src/index.test.ts
  • packages/api/src/routes/logs.ts
  • packages/api/src/routes/metrics.ts
  • packages/api/src/signals.test.ts
  • packages/cli/AGENTS.md
  • packages/cli/src/commands/logs.test.ts
  • packages/cli/src/commands/logs.ts
  • packages/cli/src/commands/metrics.test.ts
  • packages/cli/src/commands/metrics.ts
  • packages/clickhouse-datasource/src/datasource.test.ts
  • packages/clickhouse-datasource/src/datasource.ts
  • packages/clickhouse-datasource/src/query-logs.test.ts
  • packages/clickhouse-datasource/src/query-logs.ts
  • packages/clickhouse-datasource/src/query-metrics.test.ts
  • packages/clickhouse-datasource/src/query-metrics.ts
  • packages/core/package.json
  • packages/core/src/data-filters-zod.test.ts
  • packages/core/src/data-filters-zod.ts
  • packages/core/src/denormalized-signals-zod.test.ts
  • packages/core/src/denormalized-signals-zod.ts
  • packages/core/src/telemetry-datasource.ts
  • packages/sdk/src/client.test.ts
  • packages/sdk/src/client.ts
  • packages/sdk/src/index.ts
  • packages/sdk/src/mocks/handlers.ts
  • packages/sdk/src/types.ts
  • packages/sqlite-datasource/src/datasource-read.test.ts
  • packages/sqlite-datasource/src/db-datasource.ts
  • packages/sqlite-datasource/src/optimized-datasource.ts
  • packages/ui-core/src/hooks/use-kopai-data.test.ts
  • packages/ui-core/src/hooks/use-kopai-data.ts
  • packages/ui-core/src/hooks/use-live-logs.test.ts
  • packages/ui-core/src/lib/component-catalog.ts
  • packages/ui-core/src/lib/observability-catalog.ts
  • packages/ui-core/src/providers/kopai-provider.tsx
  • packages/ui/src/components/observability/DynamicDashboard/DynamicDashboard.test.tsx
  • packages/ui/src/components/observability/DynamicDashboard/fixtures/claude-code-dashboard.test.ts
  • packages/ui/src/components/observability/DynamicDashboard/fixtures/claude-code-dashboard.ts
  • packages/ui/src/components/observability/DynamicDashboard/fixtures/index.ts
  • packages/ui/src/components/observability/DynamicDashboard/index.tsx
  • packages/ui/src/components/observability/MetricBarChart/MetricBarChart.stories.tsx
  • packages/ui/src/components/observability/MetricBarChart/index.test.tsx
  • packages/ui/src/components/observability/MetricBarChart/index.tsx
  • packages/ui/src/components/observability/MetricDonutChart/MetricDonutChart.stories.tsx
  • packages/ui/src/components/observability/MetricDonutChart/index.test.tsx
  • packages/ui/src/components/observability/MetricDonutChart/index.tsx
  • packages/ui/src/components/observability/MetricLeaderboard/MetricLeaderboard.stories.tsx
  • packages/ui/src/components/observability/MetricLeaderboard/index.test.tsx
  • packages/ui/src/components/observability/MetricLeaderboard/index.tsx
  • packages/ui/src/components/observability/index.ts
  • packages/ui/src/components/observability/renderers/OtelMetricBarChart.tsx
  • packages/ui/src/components/observability/renderers/OtelMetricDonutChart.tsx
  • packages/ui/src/components/observability/renderers/OtelMetricLeaderboard.tsx
  • packages/ui/src/components/observability/renderers/OtelMetricTimeSeries.tsx
  • packages/ui/src/components/observability/renderers/index.ts
  • packages/ui/src/pages/observability.test.tsx

Comment on lines +28 to +34
const timeseriesBodySchema = dataFilterSchemas.metricsDataFilterSchema.refine(
(d) => d.timeBucket !== undefined,
{
message: "timeBucket is required for /timeseries endpoint",
path: ["timeBucket"],
}
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Incomplete validation for timeseries endpoint.

The route description (line 62) states that the endpoint requires aggregate, groupBy, AND timeBucket, but the refinement only validates timeBucket. The tests in signals.test.ts (lines 570-604) expect all three fields to be validated and return 400 when missing.

Either add refinements for aggregate and groupBy, or verify that the base schema already enforces these requirements and update the tests accordingly.

🔧 Proposed fix to validate all required fields
-const timeseriesBodySchema = dataFilterSchemas.metricsDataFilterSchema.refine(
-  (d) => d.timeBucket !== undefined,
-  {
-    message: "timeBucket is required for /timeseries endpoint",
-    path: ["timeBucket"],
-  }
-);
+const timeseriesBodySchema = dataFilterSchemas.metricsDataFilterSchema
+  .refine((d) => d.aggregate !== undefined, {
+    message: "aggregate is required for /timeseries endpoint",
+    path: ["aggregate"],
+  })
+  .refine((d) => d.groupBy !== undefined && d.groupBy.length > 0, {
+    message: "groupBy is required for /timeseries endpoint",
+    path: ["groupBy"],
+  })
+  .refine((d) => d.timeBucket !== undefined, {
+    message: "timeBucket is required for /timeseries endpoint",
+    path: ["timeBucket"],
+  });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const timeseriesBodySchema = dataFilterSchemas.metricsDataFilterSchema.refine(
(d) => d.timeBucket !== undefined,
{
message: "timeBucket is required for /timeseries endpoint",
path: ["timeBucket"],
}
);
const timeseriesBodySchema = dataFilterSchemas.metricsDataFilterSchema
.refine((d) => d.aggregate !== undefined, {
message: "aggregate is required for /timeseries endpoint",
path: ["aggregate"],
})
.refine((d) => d.groupBy !== undefined && d.groupBy.length > 0, {
message: "groupBy is required for /timeseries endpoint",
path: ["groupBy"],
})
.refine((d) => d.timeBucket !== undefined, {
message: "timeBucket is required for /timeseries endpoint",
path: ["timeBucket"],
});
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/api/src/routes/metrics.ts` around lines 28 - 34, The
timeseriesBodySchema refinement only checks timeBucket but tests expect
aggregate and groupBy to be required too; update the refinement on
dataFilterSchemas.metricsDataFilterSchema (the timeseriesBodySchema definition)
to also validate that d.aggregate and d.groupBy are present (and non-empty if
appropriate) and return the same 400 behavior when missing, or alternatively
ensure the base schema (dataFilterSchemas.metricsDataFilterSchema) includes
required checks for aggregate and groupBy so timeseriesBodySchema no longer
needs extra refinements; locate the timeseriesBodySchema symbol and add the
additional refine clauses (or extend the existing refine to check all three
fields) so signals.test.ts passes.

Comment thread packages/clickhouse-datasource/src/query-logs.ts
if (!timeBucket) {
throw new Error("buildMetricsTimeSeriesQuery requires filter.timeBucket");
}
const interval = TIME_BUCKET_INTERVAL_MAP[timeBucket];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add an explicit guard for unsupported timeBucket values.

Line 370 indexes the interval map but never validates the result. If an unvalidated caller passes an unexpected bucket, SQL generation fails later with a less clear ClickHouse parse error.

💡 Suggested fix
   const timeBucket = filter.timeBucket;
   if (!timeBucket) {
     throw new Error("buildMetricsTimeSeriesQuery requires filter.timeBucket");
   }
   const interval = TIME_BUCKET_INTERVAL_MAP[timeBucket];
+  if (!interval) {
+    throw new Error(`Unsupported timeBucket: ${String(timeBucket)}`);
+  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const interval = TIME_BUCKET_INTERVAL_MAP[timeBucket];
const timeBucket = filter.timeBucket;
if (!timeBucket) {
throw new Error("buildMetricsTimeSeriesQuery requires filter.timeBucket");
}
const interval = TIME_BUCKET_INTERVAL_MAP[timeBucket];
if (!interval) {
throw new Error(`Unsupported timeBucket: ${String(timeBucket)}`);
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/clickhouse-datasource/src/query-metrics.ts` at line 370, After
looking up interval with const interval = TIME_BUCKET_INTERVAL_MAP[timeBucket];
add an explicit guard that checks whether interval is undefined (or falsy) and
handle it immediately: throw a clear, descriptive error (or return a rejected
Promise) that includes the invalid timeBucket value and allowed keys from
TIME_BUCKET_INTERVAL_MAP so SQL generation stops early; update the code around
the TIME_BUCKET_INTERVAL_MAP lookup (the timeBucket variable and any calling
function that builds the ClickHouse SQL) to perform this validation before
proceeding.

Comment thread packages/sqlite-datasource/src/db-datasource.ts
Comment thread packages/ui/src/components/observability/MetricBarChart/index.tsx
Comment on lines +36 to +46
function synthesiseRows(rows: TimeseriesMetricRow[]): OtelMetricsRow[] {
return rows.map(
(row) =>
({
MetricType: "Sum",
TimeUnix: row.timeBucketNs,
StartTimeUnix: row.timeBucketNs,
Attributes: row.groups,
Value: row.value,
}) satisfies OtelMetricsRow
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how MetricTimeSeries component handles different metric types
ast-grep --pattern 'MetricType.*Sum|MetricType.*Gauge' 

# Look for buildMetrics or similar functions that process these rows
rg -nP --type=ts 'buildMetrics|MetricType.*===.*["\'](?:Sum|Gauge)' packages/ui/src/components/observability/

# Check the OtelMetricsRow schema definition
rg -nA10 --type=ts 'type OtelMetricsRow|interface OtelMetricsRow' packages/core/

Repository: kopai-app/kopai-mono

Length of output: 261


🏁 Script executed:

# First, let's find the OtelMetricsRow type definition
rg -n "type OtelMetricsRow|interface OtelMetricsRow" --type ts -A 15

Repository: kopai-app/kopai-mono

Length of output: 13203


🏁 Script executed:

# Look for MetricTimeSeries component and how it uses MetricType
fd -e tsx -e ts "MetricTimeSeries" | head -20

Repository: kopai-app/kopai-mono

Length of output: 227


🏁 Script executed:

# Search for Sum vs Gauge usage in the codebase
rg "MetricType.*Sum|MetricType.*Gauge" --type ts | head -20

Repository: kopai-app/kopai-mono

Length of output: 1984


🏁 Script executed:

# Check how StartTimeUnix and TimeUnix are used in other metric creation code
rg "StartTimeUnix|TimeUnix" --type ts -B 2 -A 2 | head -50

Repository: kopai-app/kopai-mono

Length of output: 3683


🏁 Script executed:

rg -n "otelMetricsSchema|const otelMetricsSchema" --type ts -A 20 packages/core/src/denormalized-signals-zod.ts

Repository: kopai-app/kopai-mono

Length of output: 1519


🏁 Script executed:

# Also check TimeseriesMetricRow definition
rg -n "timeseriesMetricSchema|type TimeseriesMetricRow" --type ts -A 15 packages/core/src/denormalized-signals-zod.ts

Repository: kopai-app/kopai-mono

Length of output: 653


🏁 Script executed:

# Check the MetricTimeSeries component to see if it uses StartTimeUnix differently
cat packages/ui/src/components/observability/MetricTimeSeries/index.tsx | head -100

Repository: kopai-app/kopai-mono

Length of output: 2437


🏁 Script executed:

rg -n "otelSumSchema" --type ts -B 5 -A 30 packages/core/src/denormalized-signals-zod.ts | head -60

Repository: kopai-app/kopai-mono

Length of output: 2392


🏁 Script executed:

# Also check how otelGaugeSchema differs from otelSumSchema
rg -n "otelGaugeSchema" --type ts -B 5 -A 30 packages/core/src/denormalized-signals-zod.ts | head -60

Repository: kopai-app/kopai-mono

Length of output: 2345


🏁 Script executed:

# Check if there are any comments or documentation about the semantic meaning
rg -n "StartTimeUnix|TimeUnix" --type ts -B 3 -A 3 packages/core/src/denormalized-signals-zod.ts | head -80

Repository: kopai-app/kopai-mono

Length of output: 1157


🏁 Script executed:

rg -n "AggregationTemporality|DELTA|CUMULATIVE" --type ts packages/ui/src/components/observability/renderers/OtelMetricTimeSeries.tsx packages/core/src/denormalized-signals-zod.ts

Repository: kopai-app/kopai-mono

Length of output: 643


🏁 Script executed:

# Check test fixtures to understand the semantic intent
cat packages/ui/src/components/observability/__fixtures__/metrics.ts | grep -A 30 "Sum\|Gauge" | head -80

Repository: kopai-app/kopai-mono

Length of output: 2390


🏁 Script executed:

# Check if any component actually uses StartTimeUnix for logic beyond display
rg "StartTimeUnix" --type ts packages/ui/src/components/observability/

Repository: kopai-app/kopai-mono

Length of output: 1091


🏁 Script executed:

# Check if MetricTimeSeries or related components filter/process based on metric type or time fields
rg -A 10 "MetricType.*Sum|row.MetricType" --type ts packages/ui/src/components/observability/MetricTimeSeries/ packages/ui/src/components/observability/MetricStat/

Repository: kopai-app/kopai-mono

Length of output: 2485


Use Gauge instead of Sum for time-bucketed metrics.

The synthesised rows set MetricType: "Sum" with StartTimeUnix and TimeUnix both equal to row.timeBucketNs. This violates OTEL semantics:

  • Sum represents cumulative or delta aggregations requiring distinct StartTimeUnix (period start) and TimeUnix (period end). Setting both equal implies an instantaneous measurement.
  • Gauge represents instantaneous values, which matches the per-bucket aggregation semantics of timeseries queries. The fixtures confirm this: Gauge uses StartTimeUnix: ts(0) (fixed) and TimeUnix: ts(varying).

Change to MetricType: "Gauge" to align with the semantic meaning of time-bucketed data points.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/ui/src/components/observability/renderers/OtelMetricTimeSeries.tsx`
around lines 36 - 46, The synthesiseRows function is incorrectly marking
time-bucketed points as MetricType: "Sum"; change it to MetricType: "Gauge" and
ensure StartTimeUnix is the fixed zero epoch used by fixtures (e.g. ts(0)) while
TimeUnix remains row.timeBucketNs, so the produced OtelMetricsRow objects (from
synthesiseRows handling TimeseriesMetricRow → OtelMetricsRow) reflect
instantaneous/gauge semantics rather than cumulative Sum semantics.

@Vunovati Vunovati changed the title Feat/cc obsdervability improvements Feat/cc observability improvements May 16, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/ui/src/components/observability/DynamicDashboard/fixtures/claude-code-dashboard.ts (1)

24-740: ⚡ Quick win

Extract repeated literals into constants to prevent config drift.

"claude-code" and claude_code.* prefixes are repeated across many data sources. A local constant (or small helper) would reduce typo risk and make future service/namespace renames safer.

Suggested refactor
+const CLAUDE_CODE_SERVICE = "claude-code";
+const CLAUDE_CODE_METRIC_PREFIX = "claude_code";
+
 export const claudeCodeDashboard = {
   root: "root",
   elements: {
@@
-          metricName: "claude_code.cost.usage",
+          metricName: `${CLAUDE_CODE_METRIC_PREFIX}.cost.usage`,
@@
-          serviceName: "claude-code",
+          serviceName: CLAUDE_CODE_SERVICE,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/ui/src/components/observability/DynamicDashboard/fixtures/claude-code-dashboard.ts`
around lines 24 - 740, The dashboard repeats the service name "claude-code" and
metric prefix "claude_code." across many dataSource entries; extract these into
local constants (e.g. SERVICE_NAME and METRIC_PREFIX) at the top of the file and
use them when building dataSource.params (replace literal serviceName values in
entries like "activity-timeline", "tool-usage-chart", "tool-acceptance-chart",
"trace-detail" and metricName values in Metric* entries such as
"cost-time-chart", "cost-by-model-chart", KPI entries like
"kpi-cost"/"kpi-sessions"/"kpi-commits"/"kpi-prs"/"kpi-loc-added" etc.) so that
all occurrences are composed from the constants (e.g. serviceName: SERVICE_NAME
and metricName: `${METRIC_PREFIX}cost.usage`) to avoid duplication and reduce
typo risk while keeping the existing keys like claudeCodeDashboard unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@packages/ui/src/components/observability/DynamicDashboard/fixtures/claude-code-dashboard.ts`:
- Around line 13-14: Update the docblock that currently reads "MetricStat × 9"
to reflect the actual number of KPIs defined in this fixture by changing it to
"MetricStat × 10" (look for the docblock line containing the text "MetricStat ×
9" in the claude-code-dashboard fixture and update it).

---

Nitpick comments:
In
`@packages/ui/src/components/observability/DynamicDashboard/fixtures/claude-code-dashboard.ts`:
- Around line 24-740: The dashboard repeats the service name "claude-code" and
metric prefix "claude_code." across many dataSource entries; extract these into
local constants (e.g. SERVICE_NAME and METRIC_PREFIX) at the top of the file and
use them when building dataSource.params (replace literal serviceName values in
entries like "activity-timeline", "tool-usage-chart", "tool-acceptance-chart",
"trace-detail" and metricName values in Metric* entries such as
"cost-time-chart", "cost-by-model-chart", KPI entries like
"kpi-cost"/"kpi-sessions"/"kpi-commits"/"kpi-prs"/"kpi-loc-added" etc.) so that
all occurrences are composed from the constants (e.g. serviceName: SERVICE_NAME
and metricName: `${METRIC_PREFIX}cost.usage`) to avoid duplication and reduce
typo risk while keeping the existing keys like claudeCodeDashboard unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5d6735b2-fb56-49fe-bb51-a3136e3388d0

📥 Commits

Reviewing files that changed from the base of the PR and between 8ec4d7e and 9b93aad.

📒 Files selected for processing (6)
  • packages/clickhouse-datasource/src/query-logs.ts
  • packages/sqlite-datasource/src/db-datasource.ts
  • packages/ui-core/src/lib/observability-catalog.ts
  • packages/ui/src/components/observability/DynamicDashboard/fixtures/claude-code-dashboard.ts
  • packages/ui/src/components/observability/MetricBarChart/index.tsx
  • packages/ui/src/components/observability/MetricLeaderboard/index.tsx
🚧 Files skipped from review as they are similar to previous changes (5)
  • packages/ui-core/src/lib/observability-catalog.ts
  • packages/ui/src/components/observability/MetricLeaderboard/index.tsx
  • packages/sqlite-datasource/src/db-datasource.ts
  • packages/ui/src/components/observability/MetricBarChart/index.tsx
  • packages/clickhouse-datasource/src/query-logs.ts

Comment on lines +13 to +14
* 1. KPIs — MetricStat × 9 (cost, sessions, commits, PRs, LOC ± ,
* active cli/user, cache reads/writes)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Docblock KPI count is incorrect.

Line 13 says MetricStat × 9, but this fixture defines 10 KPI stats (4 + 4 + 2). Please update the comment to avoid drift/confusion during maintenance.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/ui/src/components/observability/DynamicDashboard/fixtures/claude-code-dashboard.ts`
around lines 13 - 14, Update the docblock that currently reads "MetricStat × 9"
to reflect the actual number of KPIs defined in this fixture by changing it to
"MetricStat × 10" (look for the docblock line containing the text "MetricStat ×
9" in the claude-code-dashboard fixture and update it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant