Skip to content

feat: derived entity types (SQL-view backing) + measure inference#1161

Merged
buremba merged 30 commits into
mainfrom
feat/metric-derived-entities
May 31, 2026
Merged

feat: derived entity types (SQL-view backing) + measure inference#1161
buremba merged 30 commits into
mainfrom
feat/metric-derived-entities

Conversation

@buremba
Copy link
Copy Markdown
Member

@buremba buremba commented May 30, 2026

What

Adds derived entity types — an entity type can be a read-only SQL view over other relations instead of inserted rows. Its aggregate columns are inferred as measures (with a re-aggregation rule); grouping columns become dimensions. Stored entities are the default and are unchanged.

This is the substrate for filling things like subscriptions/spend from the event stream: define a view, get governed measures, no hand-annotation for the common cases.

Model (kept deliberately small)

  • One new authored key: defineEntityType({ backing: { sql, grain?, source? } }). Presence of backing ⇒ derived; absence ⇒ stored. No mode flag (derived ⟺ backing_sql IS NOT NULL).
  • Measures aren't a separate noun — they're columns of the view, annotated x-measure on the existing properties/x- convention. measure()/dimension() helpers + a ReaggRule type.
  • DB (decision B): typed backing_sql / backing_grain (text[]) / backing_source columns on entity_types (idempotent migration). The tool/CLI surface uses a single nested backing arg.

Inference / re-agg classifier

Parses the view SELECT (node-sql-parser) and classifies each projection column:
SUM/COUNT → additive · COUNT(DISTINCT)/MEDIAN/PERCENTILE → holistic · AVG and a/b → ratio · MIN/MAX → extremum · grouping cols → dimension. Inferred roles merge into metadata_schema; author-declared annotations always win. Reverting a type to stored strips the derived-only annotations.

Apply round-trip

defineEntityType → map-config → desired-state → diff → client upsert → server create/update → list/hoist. Stored entities carry no backing on either side, so they never churn the diff.

Tests / validation

  • Server integration (real embedded Postgres): create → get → update → revert-to-stored, the migration, and inference — 9 pass. (This run caught + fixed a real text[] write-encoding bug, now pgTextArray + ::text[].)
  • CLI wire payload (stubbed fetch, exact HTTP body): upsertEntityType sends nested backing for derived / null for stored.
  • Unit: the classifier (infer-measures) + applyInferredMeasures declared-wins + mapEntityType backing mapping.
  • typecheck: cli + server green.
  • Not run: a full live lobu run + lobu apply — it would boot a long-lived gateway against the shared dev DATABASE_URL (dev-DB pollution risk) and only adds the unchanged routing/transport hop, which is bracketed by the wire-payload + server-integration tests above.

CI note

If the integration job shows red, it's pre-existing environmental failures (isolated-vm not built for the runner, url-builder/public-origin/public-pages env-var tests) — none in files this branch touches; this branch's own suites pass in the same run.

Scope / deferred (intentional)

  • Cross-relation measures (trip.cost over events in a window — over/attribution) are deferred; trips/assets can be modeled as derived views in the meantime.
  • node-sql-parser retained: polyglot 0.1.15 doesn't expose usable table/projection extraction (getTables internal; getSourceTables throws on its untyped parse().ast), and the org-scoping path can't ride a reverse-engineered AST.
  • A clean single-pi code verdict is pending the Codex weekly-quota reset.

Follow-ups

  1. Trips/assets as derived views.
  2. The read path (query_sql / view template) that surfaces these measures with re-aggregation.

🤖 Generated with Claude Code


View with Codesmith Autofix with Codesmith
Need help on this PR? Tag @codesmith with what you need. Autofix is disabled.

Summary by CodeRabbit

  • New Features

    • Add support for SQL-view–backed (derived) entity types with configurable backing SQL, canonical grain, and source; can be set, updated, or cleared.
    • New declarative helpers for marking measures/dimensions in entity configs.
  • Improvements

    • Automatic inference of measure/dimension annotations from backing SQL, merged with author-declared annotations.
    • More robust query parsing and scoping/validation for derived backings; deterministic handling when writing backing state.
  • Bug Fixes

    • Diffing ignores server-inferred annotations unless explicitly declared, preventing spurious updates.
  • Tests

    • Added unit and integration tests for backing lifecycle, inference, querying, normalization, and diff idempotency.

Review Change Stack

buremba added 3 commits May 29, 2026 18:46
…tests

Add backing to entity types: a derived type is a read-only SQL view
(backing_sql) over other relations whose aggregate columns become measures;
stored is the default (absence of backing). Typed columns backing_sql/
backing_grain/backing_source on entity_types; full lobu apply round-trip
(map-config -> desired-state -> diff -> client upsert -> server create/update
-> list/hoist) plus SDK measure()/dimension() helpers and ReaggRule.

Validated: cli unit tests + server integration round-trip against real PG
(create/get/update/revert/stored), which caught a text[] write-encoding bug
(fixed via pgTextArray + ::text[]).
Step 2: parse a derived entity's backing SQL (node-sql-parser astify) and
classify each projection column — aggregates become measures with a re-agg
rule (SUM/COUNT->additive, COUNT DISTINCT/MEDIAN->holistic, AVG and a/b->ratio,
MIN/MAX->extremum), grouping columns become dimensions. Inferred roles merge
into metadata_schema as x-measure/x-dimension on create/update; author-declared
annotations win. Unit + integration tests.

node-sql-parser retained: polyglot 0.1.15 does not export usable table/
projection extraction (getTables internal; getSourceTables throws on parse().ast;
ast is untyped), so it stays the table-extraction + projection parser.
When an entity type reverts from derived to stored (backing=null), drop the
x-measure/x-dimension annotations that inference wrote — they're meaningless on
a row-backed type. Also re-infer on a metadata-only update of a still-derived
type (using the current backing_sql). Adds the revert assertion to the
integration test and CLI wire-payload tests (upsertEntityType sends nested
backing / null).
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 30, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds end-to-end support for SQL-view-backed derived entity types: DB migration columns, config and API contracts, CLI mapping/diff/upsert changes, server manage tool inference/stripping of measure/dimension annotations, SQL parsing migration, and unit/integration tests.

Changes

Derived Entity Types with SQL-View Backing

Layer / File(s) Summary
Database migration and config contracts
db/migrations/20260529130000_entity_types_backing.sql, packages/cli/src/config/define.ts
Adds backing_sql, backing_grain, backing_source columns; defines EntityBacking, extends EntityType with backing, properties, metadata, and exports measure/dimension helpers.
SQL measure/dimension inference utilities
packages/server/src/utils/infer-measures.ts
Parses SELECT projections to infer measure/dimension roles and reagg rules (inferColumns), merges inferred annotations into metadata_schema without overwriting author-declared entries (applyInferredMeasures), and strips derived-only annotations (stripMeasureAnnotations).
Inference unit tests
packages/server/src/utils/__tests__/infer-measures.test.ts
Tests classification for aggregates, ratios, SELECT * and parse failures, and verifies inference merges/preserves author annotations and tags server-inferred entries.
Server API input contracts and imports
packages/server/src/sandbox/namespaces/entity-schema.ts, packages/server/src/tools/admin/manage_entity_schema.ts
Add optional backing to createType/updateType contracts; import inference helpers and add BackingInputSchema to management tool.
manage_entity_schema implementation
packages/server/src/tools/admin/manage_entity_schema.ts
Extend entity-type result rows and SELECTs for backing fields; on create/update compute effective metadata_schema with inference/stripping depending on final backing state; persist backing fields conditionally in INSERT/UPDATE.
SQL parsing migration and validations
packages/server/src/utils/execute-data-sources.ts
Replace node-sql-parser with @polyglot-sql/sdk for parsing, table extraction, projection analysis, and query validation; add placeholder-stripping and AST helpers.
CLI types and mapping
packages/cli/src/commands/_lib/apply/client.ts, packages/cli/src/commands/_lib/apply/desired-state.ts, packages/cli/src/commands/_lib/apply/map-config.ts
Add optional backing to remote and desired types; map config backing into desired state, normalize backing.grain, and resolve backing.source to connector key.
CLI hoist/list/diff/upsert pipeline
packages/cli/src/commands/_lib/apply/client.ts, packages/cli/src/commands/_lib/apply/diff.ts
hoistEntityTypeSchema reconstructs top-level backing from backing_* columns; listEntityTypes uses hoist; diffEntityType strips server-inferred annotations before comparing and treats backing mismatches as changes; upsertEntityType deterministically sends backing object or null.
CLI and server tests
packages/cli/src/commands/_lib/apply/__tests__/client.test.ts, packages/cli/src/commands/_lib/apply/__tests__/map-config.test.ts, packages/cli/src/commands/_lib/apply/__tests__/diff-idempotency.test.ts, packages/server/src/__tests__/integration/entity-schema/entity-types.test.ts, packages/server/src/__tests__/integration/entity-schema/derived-entity-query.test.ts
Unit and integration tests covering mapping, client POST payloads (backing vs null), normalization of Postgres text[] literals, diff idempotency vs server-inferred annotations, derived entity lifecycle round-trip, and org-scoped execution of backing SQL.
Query scoping/security tests
packages/server/src/__tests__/unit/scoped-query-schema-rejection.test.ts
Parametrized tests asserting schema-qualified table references are rejected and safe unqualified queries are correctly scoped.
Dependency updates
packages/cli/package.json, packages/server/package.json
Remove node-sql-parser dependency after switching parsing to @polyglot-sql/sdk.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • lobu-ai/lobu#1146: Overlaps on packages/server/src/utils/execute-data-sources.ts changes around queryProjectsIdColumn and SQL parsing logic.

Suggested labels

skip-size-check

Poem

🐰 I nibble rows where SQL views bend light,
Inferring measures through the quiet night.
Config seeds and server's careful sweep,
Backings bloom, then rest—no churn to make us weep.
A hop, a test, the types now hold their song.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 65.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding derived entity types backed by SQL views with automatic measure inference. It is specific, on-topic, and directly reflects the primary objective of the changeset.
Description check ✅ Passed The description is comprehensive and well-structured, covering what, model, inference, apply round-trip, tests, and scope. It includes a test plan checklist and notes section, though the test plan boxes are not explicitly checked—however, the descriptive text indicates validation was performed. All key sections from the template are meaningfully addressed.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/metric-derived-entities

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 30, 2026

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
packages/cli/src/commands/_lib/apply/diff.ts (1)

471-489: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Entity metadata changes still diff as noop.

This adds backing, but DesiredEntityType.metadata is still excluded from the entity-type field list. That means authored measure/dimension annotation changes won’t produce an update row, so apply can’t reconcile override metadata after the first create.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/cli/src/commands/_lib/apply/diff.ts` around lines 471 - 489, The
entity-type diff omits DesiredEntityType.metadata so metadata-only changes
(e.g., authored measure/dimension annotations) show as noop; add a fields entry
for "metadata" in the same fields array (similar to "properties" and "backing")
with a changed predicate like (d, r) => !deepEqual(d.metadata, r.metadata) so
that DesiredEntityType.metadata differences create update rows and allow apply
to reconcile override metadata after create.
packages/cli/src/commands/_lib/apply/client.ts (1)

196-233: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Preserve metadata_schema annotations on entity-type round trips.

hoistEntityTypeSchema() only rehydrates properties/required, and upsertEntityType() only serializes those fields back into metadata_schema. That drops DesiredEntityType.metadata, so authored x-measure / x-dimension overrides never reach the server and can’t round-trip back into apply state. This needs to merge non-structural metadata into metadata_schema on write and rehydrate it on read.

Also applies to: 513-534, 567-605

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/cli/src/commands/_lib/apply/client.ts` around lines 196 - 233,
hoistEntityTypeSchema currently only extracts properties and required from
row.metadata_schema and drops any non-structural metadata annotations; update it
to also copy any remaining keys from metadata_schema (e.g., x-measure,
x-dimension or any other annotation) into the returned RemoteEntityType
(preserving them under metadata or metadata_schema as your type expects) so
round-tripped annotations are not lost, and correspondingly modify
upsertEntityType to merge the DesiredEntityType.metadata (or remaining metadata
fields) back into the metadata_schema it writes (rather than serializing only
properties/required), ensuring non-structural keys are preserved on both read
(hoistEntityTypeSchema) and write (upsertEntityType) paths; apply the same
preservation/merge logic in the other locations mentioned (around the other
hoist/upsert blocks at the referenced ranges) so annotations round-trip intact.
🧹 Nitpick comments (1)
packages/cli/src/commands/_lib/apply/desired-state.ts (1)

46-55: ⚡ Quick win

Extract a shared EntityBacking interface.

This backing shape is now defined inline here and again in the client types. Pulling it into a named interface keeps the contracts aligned and avoids the two copies drifting.

As per coding guidelines, "Use interface for defining object shapes in TypeScript files".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/cli/src/commands/_lib/apply/desired-state.ts` around lines 46 - 55,
Replace the inline backing type with a named interface: declare and export
interface EntityBacking { sql: string; grain?: string[]; source?: string } and
change the existing property signature from backing?: { ... } to backing?:
EntityBacking in the desired-state.ts definition; then update the corresponding
client types to import/use this shared EntityBacking interface (and export it
from the module that holds it) so both places reference the same interface
rather than duplicating the shape.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/cli/src/commands/_lib/apply/map-config.ts`:
- Around line 517-527: The backing.grain check treats an empty array as present
which prevents round-trip equality; change the condition in the backing mapping
(the ternary using entity.backing.grain) to only include grain when it's a
non-empty array (e.g., Array.isArray(entity.backing.grain) &&
entity.backing.grain.length > 0) and do the same normalization in the
corresponding write path so empty grain arrays are omitted consistently (the
same logic that produces the backing object with sql and
connectorKey(entity.backing.source) should mirror the write/serialization code).

In `@packages/server/src/tools/admin/manage_entity_schema.ts`:
- Around line 609-615: The branch that handles backing-only updates calls
applyInferredMeasures(args.metadata_schema, args.backing.sql) which passes
undefined when metadata_schema is omitted and thus rebuilds/drops existing
author-defined fields; change it to re-infer on top of the existing schema by
passing the currentSchema (or currentSchema ?? args.metadata_schema) into
applyInferredMeasures when args.metadata_schema is undefined—i.e., ensure the
code that sets effectiveSchema for the backing.sql case uses
applyInferredMeasures(currentSchema, args.backing.sql) so SQL-only updates
preserve existing metadata extensions.

---

Outside diff comments:
In `@packages/cli/src/commands/_lib/apply/client.ts`:
- Around line 196-233: hoistEntityTypeSchema currently only extracts properties
and required from row.metadata_schema and drops any non-structural metadata
annotations; update it to also copy any remaining keys from metadata_schema
(e.g., x-measure, x-dimension or any other annotation) into the returned
RemoteEntityType (preserving them under metadata or metadata_schema as your type
expects) so round-tripped annotations are not lost, and correspondingly modify
upsertEntityType to merge the DesiredEntityType.metadata (or remaining metadata
fields) back into the metadata_schema it writes (rather than serializing only
properties/required), ensuring non-structural keys are preserved on both read
(hoistEntityTypeSchema) and write (upsertEntityType) paths; apply the same
preservation/merge logic in the other locations mentioned (around the other
hoist/upsert blocks at the referenced ranges) so annotations round-trip intact.

In `@packages/cli/src/commands/_lib/apply/diff.ts`:
- Around line 471-489: The entity-type diff omits DesiredEntityType.metadata so
metadata-only changes (e.g., authored measure/dimension annotations) show as
noop; add a fields entry for "metadata" in the same fields array (similar to
"properties" and "backing") with a changed predicate like (d, r) =>
!deepEqual(d.metadata, r.metadata) so that DesiredEntityType.metadata
differences create update rows and allow apply to reconcile override metadata
after create.

---

Nitpick comments:
In `@packages/cli/src/commands/_lib/apply/desired-state.ts`:
- Around line 46-55: Replace the inline backing type with a named interface:
declare and export interface EntityBacking { sql: string; grain?: string[];
source?: string } and change the existing property signature from backing?: {
... } to backing?: EntityBacking in the desired-state.ts definition; then update
the corresponding client types to import/use this shared EntityBacking interface
(and export it from the module that holds it) so both places reference the same
interface rather than duplicating the shape.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 79e5317f-1032-418e-bd58-85233d758f90

📥 Commits

Reviewing files that changed from the base of the PR and between fc6a6d4 and 912e98d.

📒 Files selected for processing (13)
  • db/migrations/20260529130000_entity_types_backing.sql
  • packages/cli/src/commands/_lib/apply/__tests__/client.test.ts
  • packages/cli/src/commands/_lib/apply/__tests__/map-config.test.ts
  • packages/cli/src/commands/_lib/apply/client.ts
  • packages/cli/src/commands/_lib/apply/desired-state.ts
  • packages/cli/src/commands/_lib/apply/diff.ts
  • packages/cli/src/commands/_lib/apply/map-config.ts
  • packages/cli/src/config/define.ts
  • packages/server/src/__tests__/integration/entity-schema/entity-types.test.ts
  • packages/server/src/sandbox/namespaces/entity-schema.ts
  • packages/server/src/tools/admin/manage_entity_schema.ts
  • packages/server/src/utils/__tests__/infer-measures.test.ts
  • packages/server/src/utils/infer-measures.ts

Comment thread packages/cli/src/commands/_lib/apply/map-config.ts Outdated
Comment thread packages/server/src/tools/admin/manage_entity_schema.ts Outdated
buremba added 2 commits May 30, 2026 12:11
Caught by a live lobu apply round-trip against an embedded server:
- backing_grain (text[]) round-trips from the server as the raw literal
  '{a,b,c}', not a JS array, so remote.backing.grain != the config's array →
  'backing' churned every apply. Normalize the grain literal to string[] in
  hoist.
- a derived entity's properties are server-inferred (measure/dimension
  annotations from backing.sql), which the config never declares → skip the
  properties comparison for derived types (backing is the authored source).
Adds a listEntityTypes grain-normalization regression test.
…pply is exact

Replaces the properties-skip-for-derived shortcut (which made a DECLARED
measure override on a derived entity un-re-appliable unless the SQL also
changed) with a precise comparison:

- The server flags every inferred annotation `inferred: true` and injects ONLY
  that flag (no stray `type`), so a column's other keys are always the author's.
- The apply diff strips inferred-flagged annotations from both sides before
  comparing properties. Result: a pure-inferred derived type is idempotent, yet
  an author-declared override edit still surfaces as a real update.

Verified live (embedded PG): create infers + flags (no type); re-apply is noop;
adding a declared `measure({reagg:'ratio'})` override → 1 update, stored without
the flag, and re-apply of the override is noop.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
packages/server/src/utils/__tests__/infer-measures.test.ts (1)

51-57: ⚡ Quick win

Use an interface for the inferred-flag shape instead of inline object literals.

These new assertions introduce inline object-shape types ({ inferred?: boolean }). Please extract that shape into an interface to comply with TS guidelines.

Proposed change
 import { describe, expect, it } from 'vitest';
 import { applyInferredMeasures, inferColumns } from '../infer-measures';
 
+interface InferredAnnotation {
+  inferred?: boolean;
+}
+
 describe('inferColumns', () => {
@@
-    expect((props.n['x-measure'] as { inferred?: boolean }).inferred).toBe(true);
-    expect((props.company_id['x-dimension'] as { inferred?: boolean }).inferred).toBe(true);
+    expect((props.n['x-measure'] as InferredAnnotation).inferred).toBe(true);
+    expect((props.company_id['x-dimension'] as InferredAnnotation).inferred).toBe(true);
     expect(
-      (props.spend['x-measure'] as { inferred?: boolean }).inferred
+      (props.spend['x-measure'] as InferredAnnotation).inferred
     ).toBeUndefined();

As per coding guidelines, **/*.{ts,tsx}: "Use interface for defining object shapes in TypeScript files."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/utils/__tests__/infer-measures.test.ts` around lines 51 -
57, The test uses inline object-shape types ({ inferred?: boolean }) in
assertions; define a local interface (e.g., InferredFlag) at the top of the test
file and replace the inline types with that interface in the three assertions
referencing props.n['x-measure'], props.company_id['x-dimension'], and
props.spend['x-measure']; ensure the interface name matches all three casts so
TypeScript linting guidelines for using interface over inline object literals
are satisfied.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@packages/server/src/utils/__tests__/infer-measures.test.ts`:
- Around line 51-57: The test uses inline object-shape types ({ inferred?:
boolean }) in assertions; define a local interface (e.g., InferredFlag) at the
top of the test file and replace the inline types with that interface in the
three assertions referencing props.n['x-measure'],
props.company_id['x-dimension'], and props.spend['x-measure']; ensure the
interface name matches all three casts so TypeScript linting guidelines for
using interface over inline object literals are satisfied.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 15d14f41-9aae-47c5-9dd6-d53502dbb746

📥 Commits

Reviewing files that changed from the base of the PR and between 912e98d and b0d8222.

📒 Files selected for processing (6)
  • packages/cli/src/commands/_lib/apply/__tests__/client.test.ts
  • packages/cli/src/commands/_lib/apply/__tests__/diff-idempotency.test.ts
  • packages/cli/src/commands/_lib/apply/client.ts
  • packages/cli/src/commands/_lib/apply/diff.ts
  • packages/server/src/utils/__tests__/infer-measures.test.ts
  • packages/server/src/utils/infer-measures.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/cli/src/commands/_lib/apply/client.ts
  • packages/server/src/utils/infer-measures.ts

Step 3 (the read path) reuses the existing query_sql rather than adding a
dedicated tool: a derived type's backing_sql is fetched via get_type and run
through query_sql, which org-scopes every referenced table via
validateAndScopeQuery. The inferred x-measure rules document how each measure
rolls up to a coarser grain.

- Notes the read path in the backing field description (discoverability).
- Adds an integration test proving a stored derived backing_sql is queryable
  AND org-scoped through query_sql (a sibling org's events are excluded from
  the aggregate).

Backing views must reference allowlisted event columns (business data in the
events.metadata jsonb, real columns like score/semantic_type), not arbitrary
columns — the same column allowlist query_sql already enforces.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@packages/server/src/__tests__/integration/entity-schema/derived-entity-query.test.ts`:
- Line 51: The test SQL in derived-entity-query.test.ts currently queries the
raw events table; update the sql property value to use the current_event_records
view (replace 'FROM events' with 'FROM current_event_records') so the test
exercises supersession masking; modify the sql string in the object where sql:
'SELECT semantic_type, COUNT(*) AS event_count FROM events GROUP BY
semantic_type' is defined to reference current_event_records instead.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: b8463231-ee2a-4c4c-b93b-651137179320

📥 Commits

Reviewing files that changed from the base of the PR and between b0d8222 and a7fb1b4.

📒 Files selected for processing (2)
  • packages/server/src/__tests__/integration/entity-schema/derived-entity-query.test.ts
  • packages/server/src/tools/admin/manage_entity_schema.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/server/src/tools/admin/manage_entity_schema.ts

slug: 'events-by-type',
name: 'Events by type',
backing: {
sql: 'SELECT semantic_type, COUNT(*) AS event_count FROM events GROUP BY semantic_type',
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use current_event_records instead of events in the backing query.

This test currently bakes in a direct events read, which violates the server event-query invariant and can miss supersession masking behavior. Please update the test SQL to query current_event_records so this integration test enforces the production contract.

As per coding guidelines, "packages/server/src/**/*.ts: Use current_event_records view to mask superseded rows when querying events".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/server/src/__tests__/integration/entity-schema/derived-entity-query.test.ts`
at line 51, The test SQL in derived-entity-query.test.ts currently queries the
raw events table; update the sql property value to use the current_event_records
view (replace 'FROM events' with 'FROM current_event_records') so the test
exercises supersession masking; modify the sql string in the object where sql:
'SELECT semantic_type, COUNT(*) AS event_count FROM events GROUP BY
semantic_type' is defined to reference current_event_records instead.

Consolidate on one SQL engine. node-sql-parser was the second parser
(alongside polyglot's validateWithSchema); replace its two uses with polyglot's
AST API, which is postgres-aware and handles metadata jsonb extraction + casts:

- execute-data-sources: extractTableRefs (getTables/getTableNames — schema-
  qualified refs rejected, CTE names excluded) and queryProjectsIdColumn now run
  on polyglot parse(). Org-scoping still wraps the ORIGINAL sql in CTEs — polyglot
  is used only for AST inspection, never to deparse (its transpiler mangles jsonb
  string literals).
- infer-measures: classify projection columns via getExprType (sum/count/avg/
  min/max/div + DISTINCT) instead of node-sql-parser's astify.
- Remove node-sql-parser from server + cli deps (regenerated bun.lock).

Polyglot's ESM build auto-inits on import, so the synchronous parse path is
unchanged. Verified: a derived view over events.metadata jsonb
(SUM((metadata->>'amount')::numeric)) is inferred (additive) AND queried
org-scoped via query_sql, with a sibling org's rows excluded.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
packages/server/src/utils/execute-data-sources.ts (1)

93-106: polyglot-sql/sdk ast helpers (getTables, getTableNames, isStar) are part of the public API; optional chaining cleanup

  • @polyglot-sql/sdk exposes ast.getTables, ast.getTableNames, and ast.isStar, so the usages in packages/server/src/utils/execute-data-sources.ts are aligned with the SDK API.
  • Replace ast.isStar?.(...) with ast.isStar(...) in queryProjectsIdColumn to avoid unnecessary optional chaining.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/utils/execute-data-sources.ts` around lines 93 - 106, The
optional chaining around the SDK helper isStar is unnecessary and should be
removed: in the function queryProjectsIdColumn replace any call using
ast.isStar?.(...) with ast.isStar(...) so it uses the public API directly;
confirm other uses of ast.getTables and ast.getTableNames remain as-is (they are
part of the public API) and run type checks to ensure no undefined checks remain
around isStar calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@packages/server/src/utils/execute-data-sources.ts`:
- Around line 93-106: The optional chaining around the SDK helper isStar is
unnecessary and should be removed: in the function queryProjectsIdColumn replace
any call using ast.isStar?.(...) with ast.isStar(...) so it uses the public API
directly; confirm other uses of ast.getTables and ast.getTableNames remain as-is
(they are part of the public API) and run type checks to ensure no undefined
checks remain around isStar calls.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 318aba2a-23ef-4bc2-9112-3b18c20a6673

📥 Commits

Reviewing files that changed from the base of the PR and between a7fb1b4 and 86dea8b.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • packages/cli/package.json
  • packages/server/package.json
  • packages/server/src/__tests__/integration/entity-schema/derived-entity-query.test.ts
  • packages/server/src/utils/execute-data-sources.ts
  • packages/server/src/utils/infer-measures.ts
💤 Files with no reviewable changes (2)
  • packages/cli/package.json
  • packages/server/package.json
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/server/src/tests/integration/entity-schema/derived-entity-query.test.ts
  • packages/server/src/utils/infer-measures.ts

…cross-org leak)

SECURITY: the node-sql-parser→polyglot migration's schema-qualified rejection
used ast.getTables, which only returns the FIRST FROM table — it does not
descend into JOINs or sub-selects. A schema-qualified ref in a join/subquery
(e.g. `... JOIN public.connections c`) slipped past, and since schema-qualified
names bypass the org-scoping CTE name-shadowing, it read every org's rows
(incl. otherwise-excluded columns like connections.credentials). Regression vs
node-sql-parser, which caught these.

Fix: detect schema-qualified refs by recursing the raw AST node graph (the only
traversal that reaches join/subquery tables), instead of polyglot's
getTables/walk. getTableNames (complete + CTE-excluded) still drives scoping.

Red→green reproducers (joins, multi-join, IN/EXISTS subqueries, UNION branch,
CTE body, 3-part names) + clean-query no-false-positive cases in
scoped-query-schema-rejection.test.ts. Affects every validateAndScopeQuery
caller: query_sql, metric_series, watcher/view-template data sources, reaction SDK.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/server/src/__tests__/unit/scoped-query-schema-rejection.test.ts (1)

61-65: ⚡ Quick win

Strengthen the positive-path assertions to prove every table/branch is scoped.

toContain('organization_id') still passes if only the first table or UNION branch gets wrapped, which is very close to the regression this suite is meant to catch. For the join/CTE/union cases, assert the expected CTE aliases are present in out.sql (for example, both "events" AS ( and "entities" AS ().

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/server/src/__tests__/unit/scoped-query-schema-rejection.test.ts`
around lines 61 - 65, The current positive-path test uses a weak assertion
(expect(out.sql).toContain('organization_id')) which can pass if only one
table/branch is scoped; update the test that calls scope(sql) to assert that
each expected CTE alias is present in out.sql (e.g.
expect(out.sql).toContain('"events" AS (') and
expect(out.sql).toContain('"entities" AS (') for the join/union/CTE cases) and
still assert out.params[0] is 'org_test'; locate the test that calls scope(sql)
and add explicit contains checks for every table/branch alias the input SQL
should produce to guarantee all branches are scoped.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/server/src/utils/execute-data-sources.ts`:
- Around line 90-91: The AST walker rec currently bails out silently when depth
> 50 which allows deeply nested schema.table refs to bypass validation; change
the behavior so that when depth exceeds 50 the function fails closed by throwing
a specific error or setting a validation-failed flag (e.g., throw new Error('AST
depth exceeded validation limit') or mark a shared "validationFailed" variable)
instead of returning; update callers of rec (the validation routine that checks
schema-qualified references) to catch this condition and reject the query
accordingly so any traversal that hits the depth cap causes overall validation
to fail.

---

Nitpick comments:
In `@packages/server/src/__tests__/unit/scoped-query-schema-rejection.test.ts`:
- Around line 61-65: The current positive-path test uses a weak assertion
(expect(out.sql).toContain('organization_id')) which can pass if only one
table/branch is scoped; update the test that calls scope(sql) to assert that
each expected CTE alias is present in out.sql (e.g.
expect(out.sql).toContain('"events" AS (') and
expect(out.sql).toContain('"entities" AS (') for the join/union/CTE cases) and
still assert out.params[0] is 'org_test'; locate the test that calls scope(sql)
and add explicit contains checks for every table/branch alias the input SQL
should produce to guarantee all branches are scoped.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f3f9fe0d-f8b0-4234-8efc-1c245e2f5356

📥 Commits

Reviewing files that changed from the base of the PR and between 86dea8b and 129fdc8.

📒 Files selected for processing (2)
  • packages/server/src/__tests__/unit/scoped-query-schema-rejection.test.ts
  • packages/server/src/utils/execute-data-sources.ts

Comment thread packages/server/src/utils/execute-data-sources.ts Outdated
buremba and others added 6 commits May 30, 2026 14:25
…inferred, reagg drift)

From a multi-perspective audit of the derived-entities feature:
- map-config: omit an empty `grain: []` (it's truthy but the server reads text[]
  back as absent → perpetual backing churn). Compare `.length`, not truthiness.
- manage_entity_schema: reject empty/whitespace `backing.sql` in create+update —
  TypeBox minLength isn't enforced for this tool, so a blank view could be
  persisted as a corrupt (unqueryable, measure-less) derived type.
- applyInferredMeasures: strip prior server-inferred (`inferred: true`)
  annotations before re-inferring, so a non-CLI caller that round-trips a
  read-back schema with a changed view can't freeze stale re-agg rules or keep
  annotations for columns the new sql dropped. (CLI already strips these.)
- ReaggRule: add `semi_additive` to the server union so it matches the CLI's
  (author-declarable only; inference never emits it).
- define.ts: drop the contradictory `{ mode: "stored" }` doc (there is no mode
  field — presence of backing is the only discriminant).

Tests: grain:[] omission (map-config), empty-sql rejection (entity-types),
stale-inferred re-inference (infer-measures).
Points at owletto e5b652c (feat/metric-derived-entities): the view-aware UI
for derived entity types — renders the backing view via query_sql instead of
the misleading empty stored-rows table, hides create affordances, and shows
the measures + backing query read-only.
A derived (view-backed) type has no rows in `entities` — its data is its
backing_sql view. handleCreate did not check, so the API/SDK would insert an
`entities` row the view silently ignores (orphan data). The web UI hides the
'Add' affordance, but the backend must guard the API path too. Now rejects
create when the target type has backing_sql set. +test.
… members

Members can now run query_sql + metric_series over their org's OPERATIONAL data
(entities, events, connections, feeds, watchers, …) — previously query_sql was
owner/admin-only, so members couldn't read derived-entity metrics.

The auth/identity tables stay admin-only via a per-query table allowlist
(ADMIN_ONLY_QUERYABLE_TABLES = oauth_tokens, oauth_clients, user): a non-admin
referencing them is rejected ("requires admin access"), even in a JOIN/subquery
(reuses the join-aware extraction). Secret columns were already schema-excluded;
this is the table-level guard on top.

- validateAndScopeQuery gains a restrictedTables option.
- query_sql/metric_series pass it for non-admins (closes a pre-existing
  metric_series hole where members could already read oauth_tokens).
- tool-access: query_sql no longer hard-gated to admin.
- tests: member blocked from oauth_tokens/user, allowed events/connections/feeds;
  admin unrestricted; tool-access tier flips; member-write-access updated.

Note: an EXISTING MCP session keeps its role until refresh (pre-existing
session-caching), so a just-downgraded admin's open session may retain access
until reconnect; fresh sessions + the UI's per-request session auth are gated
correctly.
The merge of main into this branch reset the owletto pointer to main's SHA,
dropping the derived-entity UI bump. Re-point at ba9daf0 = the UI commit rebased
onto the latest owletto main (so it carries both the new-user-routing fixes and
the derived view UI).
@buremba
Copy link
Copy Markdown
Member Author

buremba commented May 31, 2026

bug_free 78, simplicity 76, slop 0, bugs 0, 0 blockers

Typecheck/unit passed. [env] Integration nonzero from untouched public-origin/url-builder/public-pages host expectations and mcp-proxy upstream SSRF blocking; changed entity/query tests passed. Ran bun SQL-scope probes for comments/CTEs, scalar subquery scoping, and member oauth_tokens block -> ok.

Full verdict JSON
{
  "bug_free_confidence": 78,
  "bugs": 0,
  "slop": 0,
  "simplicity": 76,
  "blockers": [],
  "change_type": "feat",
  "behavior_change_risk": "high",
  "tests_adequate": true,
  "suggested_fixes": [],
  "notes": "Typecheck/unit passed. [env] Integration nonzero from untouched public-origin/url-builder/public-pages host expectations and mcp-proxy upstream SSRF blocking; changed entity/query tests passed. Ran bun SQL-scope probes for comments/CTEs, scalar subquery scoping, and member oauth_tokens block -> ok.",
  "categories": {
    "src": 861,
    "tests": 929,
    "docs": 0,
    "config": 0,
    "deps": 12,
    "migrations": 80,
    "ci": 0,
    "generated": 0
  }
}

Local review gate — branch protection can require the pi-review commit status. See docs/REVIEW_SCHEMA.md.

buremba added 5 commits May 31, 2026 17:19
…ence

Discovery pass (multi-agent + make review bug_free 66) found the persisted-
inference / semantic-layer half was over-engineered and the source of two churn
bugs. Collapse to the minimal complete model:

- DROP persisted inference: no applyInferredMeasures / stripMeasureAnnotations,
  no inferred:true flag, no withoutInferredAnnotations diff-strip. metadata_schema
  is stored verbatim; measure vs. dimension is classified ON READ (get returns
  measure_columns). Kills both churn bugs; apply diff is plain + idempotent.
- DROP backing_grain + backing_source columns (zero readers; re-add with a
  consumer). One column remains: backing_sql (NULL ⇒ stored).
- DROP the 6-value ReaggRule + numerator/denominator + measure()/dimension()
  authoring API (write-only; no rollup engine). SDK backing is just { sql }.

Bug fixes from the same pass:
- query_sql: drop the over-broad ORDER BY/LIMIT guard (the base SQL is wrapped
  in a subquery, so inner ORDER BY / window functions are valid) + make sort_by
  optional (a view's columns aren't known upfront).
- inferColumns: generic aggregates (bool_or, jsonb_agg, percentile … WITHIN
  GROUP) now classify as measures, not dimensions.
- derived-row-creation rejection moved into createEntity (the single chokepoint
  that also resolves public-catalog types) — closes the public-type bypass.

Migration reduced to a single backing_sql column. ~450 net lines removed.

chore: bump owletto pointer to c9fa21c (lean derived UI).
pi flagged both (cross-org leaks, blocker-class):
- Multi-statement: only the first statement was parsed/scoped, so a trailing
  `; SELECT … FROM public.oauth_tokens` ran UNSCOPED (exploitable via
  metric_series, which is member-accessible and doesn't subquery-wrap). Now
  extractTableRefs rejects multiple statements.
- Depth-50 fail-open: the schema-qualified-ref recursion silently returned past
  depth 50, so a deeply-nested public.<table> bypassed scoping. Replaced with an
  iterative (stack) traversal — no depth cap, seen-set bounds cycles.

Regression repros (60-level nested public.events; multi-statement) added to
scoped-query-schema-rejection.test.ts. Consumers (query_sql/metric_series/
watchers/derived-query) re-run green.
…ight

pi found two more bypasses of the derived-row invariant (a derived type must
never have stored rows):
- entity-link-upsert (connector auto-create) inserted rows without going through
  createEntity's guard — added the backing_sql check there too.
- update could convert a POPULATED stored type to derived, orphaning its rows —
  now rejected when entity_count > 0.

Rather than keep patching insert paths (whack-a-mole — pi found a different one
each review round), add a DB BEFORE INSERT trigger on entities that rejects rows
for derived types. That makes the invariant hold regardless of code path; the
app guards stay for friendly errors. Fires only for derived types (one PK lookup;
no effect on normal stored inserts).

+regression tests: convert-populated rejected, direct raw insert rejected by the
trigger.
…hand + expression-nested subqueries)

The @polyglot-sql/sdk-based org-scoping relied on ast.getTableNames, which has
two blind spots that each leaked once query_sql became member-accessible:

1. PostgreSQL's `TABLE <name>` shorthand mis-parses as a column → zero table
   refs → the query ran UNSCOPED and past the admin-only-table gate.
2. getTableNames does not descend into subqueries nested inside an EXPRESSION
   (CASE / scalar SELECT-list) → such a table was neither org-scoped nor
   admin-gated, leaking oauth_tokens to non-admins and other orgs' rows.

Fixes:
- Reject non-SELECT/WITH queries (incl. the TABLE shorthand) up front.
- extractTableRefs now returns the UNION of getTableNames and a complete raw
  AST walk, so a table escapes scoping only if BOTH extractors miss it; this
  drives the unknown-table check, the admin gate, AND org-scoping.
- Fail-closed CTE-collision guard: reject a query whose CTE name shadows a real
  or admin table, since the parser cannot disambiguate CTE vs base table by
  lexical scope.

Regressions added at both the validateAndScopeQuery and the member-context
query_sql tool level.
Comment thread packages/server/src/utils/execute-data-sources.ts Fixed
buremba added 2 commits May 31, 2026 22:15
…ug read-tier + trigger UPDATE coverage

Blocker: metric_series is member-reachable but omitted safeColumns, so its
org-scoping CTEs used SELECT * — a member charting `connections` got the
withheld `credentials` column (OAuth tokens). Pass SAFE_COLUMN_DEFS, matching
query_sql.

- query_sql: only require owner/admin (and only escalate callerIsAdmin) when
  org_slug points at ANOTHER org; passing your own org slug stays read-tier,
  matching the existing comment's intent.
- derived-row trigger: also fire on UPDATE OF entity_type_id (re-pointing a
  stored row at a derived type), and add a guard on entity_types so setting
  backing_sql on a type that still has stored rows is rejected at the DB.

Regressions: member metric_series credential-column safety; both new trigger
paths.
…metric_series

Defense-in-depth beyond the app-layer admin gate, since parser-based table
extraction can't be proven airtight for arbitrary member SQL. Non-admin
query_sql/metric_series now run their (org-scoped, app-gated) SQL under a
`lobu_query_restricted` Postgres role via SET LOCAL ROLE. The role can read
every queryable table EXCEPT the auth/identity tables (oauth_tokens,
oauth_clients, user), so any future parser hole that reaches one hits a DB
"permission denied" instead of leaking. Cross-org isolation stays enforced by
the org-scoping CTEs; this role is purely a table-level backstop.

- Migration is fully tolerant: if the DB user lacks CREATEROLE (managed prod
  cluster), CREATE ROLE / grants no-op via DO/EXCEPTION blocks and the deploy
  still succeeds.
- Runtime applies SET LOCAL ROLE only when the role exists (cached per-process
  pg_roles check); absent the role, callers fall back to the app-layer gate, so
  no member query ever breaks.
- REVOKE list must stay in sync with ADMIN_ONLY_QUERYABLE_TABLES.

Tests: the role denies oauth_tokens (permission denied) but reads operational
tables; a member's ordinary query_sql still returns rows under the role.
buremba added 9 commits May 31, 2026 22:44
…eck uses assumability

- inferColumns: a cast/paren wraps the aggregate node, so COUNT(*)::int and
  SUM(x)::numeric were classified as dimensions. Peel cast/paren wrappers before
  matching MEASURE_EXPR_TYPES; a cast of a non-aggregate stays a dimension.
- restrictedQueryRoleAvailable: check pg_has_role(current_user, role, 'MEMBER'),
  not mere existence. A role that exists but isn't grantable to the connecting
  user would make SET LOCAL ROLE throw and abort every member query; now that
  case falls back to the app-layer gate instead. CASE short-circuits so
  pg_has_role isn't evaluated for a non-existent role.
…in the testable app layer

The restricted-role layer (SET LOCAL ROLE + a provisioned Postgres role)
protected only the 3 auth tables, didn't help cross-org, and spread the control
across 5 places with a hand-synced deny-list. Worse, its prod-specific branches
(no CREATEROLE / role not assumable) can't be exercised under the superuser
embedded Postgres tests, so the security-relevant behavior was verified only by
reading.

Keep the single, deterministic, fully-testable control in validateAndScopeQuery
(union table extraction + fail-closed CTE-collision guard + admin gate) and back
it with a completeness invariant test: an admin-only table referenced in ANY
position (join, WHERE/EXISTS, scalar + CASE-nested subquery, CTE body, UNION,
doubly-nested) must be rejected for a member, with the inverse asserting clean
shapes aren't over-blocked. That test is the canary for a future parser blind
spot — it goes red in CI before a leak can ship.

Removed: migration 20260531130000, RESTRICTED_QUERY_ROLE/restrictedQueryRoleAvailable,
the SET LOCAL ROLE wiring, and restricted-query-role.test.ts. Unchanged: parser
fixes, metric_series column allowlist, org_slug read-tier, derived-row triggers,
measure-inference cast fix.
…in validateAndScopeQuery

Two pre-existing edge cases the review surfaced once query_sql/metric_series
became member-reachable (both latent on main while the tools were admin-only):

- validateTableQuery rejected ORDER BY/GROUP BY references to a query's OWN
  output alias (e.g. SELECT count(*) AS n ... ORDER BY n) as 'unknown column'.
  Now suppresses that E201 only when the name is a top-level output alias; a
  genuinely-unknown column and an excluded column referenced via alias
  (SELECT credentials AS x) are still rejected. Org-scoping/admin-gating are
  independent of this validator, so no security relaxation.
- buildScopedQuery prepended a second WITH when the user query led with a SQL
  comment before WITH, emitting invalid SQL. Strip leading comments before the
  merge decision so the injected CTEs splice into the single existing WITH.
…empty-backing guard; measure-test coverage

From the adversarial audit (39 findings → 5 confirmed, all low/medium):

- execute-data-sources: drop the getTableNames half of extractTableRefs' union.
  Differential-tested across 20 query shapes (comma joins, LATERAL, ONLY,
  recursive CTE, set ops, subqueries in FROM/ORDER BY/HAVING): the raw walk is a
  strict SUPERSET of getTableNames, so it added zero coverage — and was no hedge
  either (if the raw walk broke, getTableNames alone is incomplete and would leak
  regardless). The completeness-invariant suite guards raw-walk-only.
- cli map-config: reject an empty/whitespace backing.sql at load time, matching
  the file's 'fail loud in the CLI before any remote mutation' contract instead
  of a confusing mid-apply server 4xx.
- infer-measures test: cover the genuine parse-failure branch ('SELECT (') — the
  prior 'this is not sql' case passes via the non-select guard (it parses to a
  'not' node), never exercising !res.success.
… trigger ignores soft-deleted rows

- metric_series ran its transaction WITHOUT SET TRANSACTION READ ONLY, while
  validateAndScopeQuery accepts data-modifying WITH CTEs (the SELECT/WITH guard
  only checks the prefix). A member could therefore mutate via
  `WITH x AS (DELETE FROM events RETURNING id) SELECT count(*) FROM x`. Add
  READ ONLY first in the txn (DB-enforced, mirrors query_sql) so writes fail
  closed. Regression: member DML-CTE rejects + the row is untouched.
- reject_derived_conversion_with_rows trigger counted ALL rows, but the app
  convert-guard counts only live rows (deleted_at IS NULL). A type whose rows
  are all soft-deleted was blocked at the DB despite the app allowing it. Add
  `AND e.deleted_at IS NULL` so the trigger matches app semantics. Regression:
  convert-to-derived succeeds when the only row is soft-deleted.
…w are allowed (wrapped as subquery)

The registry description told agents 'Do NOT include ORDER BY/LIMIT/OFFSET',
but query_sql wraps the query as a subquery so those are valid inside it
(pagination/sort come from the sort_by/limit/offset args). Only positional
params are forbidden. Matches QuerySqlSchema's sql-arg description.
…r scanner

CodeQL js/redos (high) on the SELECT/WITH guard's regex: the nested-quantifier
comment strip `(?:--…\n|/*…*/)*` backtracks catastrophically on crafted input
(many unclosed /*), and it runs on member-supplied SQL on every query_sql/
metric_series call — a real DoS. Replace with stripLeadingComments (a single
linear indexOf scan) + isReadQuery, shared by validateAndScopeQuery and
metric_series (removing metric_series' duplicate copy of the same flagged
regex). Regression: 100k unclosed /* resolves in <1s instead of hanging.
@buremba buremba merged commit 73abb03 into main May 31, 2026
21 checks passed
@buremba buremba deleted the feat/metric-derived-entities branch May 31, 2026 22:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants