feat: derived entity types (SQL-view backing) + measure inference#1161
Conversation
…tests Add backing to entity types: a derived type is a read-only SQL view (backing_sql) over other relations whose aggregate columns become measures; stored is the default (absence of backing). Typed columns backing_sql/ backing_grain/backing_source on entity_types; full lobu apply round-trip (map-config -> desired-state -> diff -> client upsert -> server create/update -> list/hoist) plus SDK measure()/dimension() helpers and ReaggRule. Validated: cli unit tests + server integration round-trip against real PG (create/get/update/revert/stored), which caught a text[] write-encoding bug (fixed via pgTextArray + ::text[]).
Step 2: parse a derived entity's backing SQL (node-sql-parser astify) and classify each projection column — aggregates become measures with a re-agg rule (SUM/COUNT->additive, COUNT DISTINCT/MEDIAN->holistic, AVG and a/b->ratio, MIN/MAX->extremum), grouping columns become dimensions. Inferred roles merge into metadata_schema as x-measure/x-dimension on create/update; author-declared annotations win. Unit + integration tests. node-sql-parser retained: polyglot 0.1.15 does not export usable table/ projection extraction (getTables internal; getSourceTables throws on parse().ast; ast is untyped), so it stays the table-extraction + projection parser.
When an entity type reverts from derived to stored (backing=null), drop the x-measure/x-dimension annotations that inference wrote — they're meaningless on a row-backed type. Also re-infer on a metadata-only update of a still-derived type (using the current backing_sql). Adds the revert assertion to the integration test and CLI wire-payload tests (upsertEntityType sends nested backing / null).
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds end-to-end support for SQL-view-backed derived entity types: DB migration columns, config and API contracts, CLI mapping/diff/upsert changes, server manage tool inference/stripping of measure/dimension annotations, SQL parsing migration, and unit/integration tests. ChangesDerived Entity Types with SQL-View Backing
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labelsskip-size-check Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/cli/src/commands/_lib/apply/diff.ts (1)
471-489:⚠️ Potential issue | 🟠 Major | ⚡ Quick winEntity metadata changes still diff as
noop.This adds
backing, butDesiredEntityType.metadatais still excluded from the entity-type field list. That means authored measure/dimension annotation changes won’t produce an update row, so apply can’t reconcile override metadata after the first create.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/cli/src/commands/_lib/apply/diff.ts` around lines 471 - 489, The entity-type diff omits DesiredEntityType.metadata so metadata-only changes (e.g., authored measure/dimension annotations) show as noop; add a fields entry for "metadata" in the same fields array (similar to "properties" and "backing") with a changed predicate like (d, r) => !deepEqual(d.metadata, r.metadata) so that DesiredEntityType.metadata differences create update rows and allow apply to reconcile override metadata after create.packages/cli/src/commands/_lib/apply/client.ts (1)
196-233:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftPreserve
metadata_schemaannotations on entity-type round trips.
hoistEntityTypeSchema()only rehydratesproperties/required, andupsertEntityType()only serializes those fields back intometadata_schema. That dropsDesiredEntityType.metadata, so authoredx-measure/x-dimensionoverrides never reach the server and can’t round-trip back into apply state. This needs to merge non-structural metadata intometadata_schemaon write and rehydrate it on read.Also applies to: 513-534, 567-605
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/cli/src/commands/_lib/apply/client.ts` around lines 196 - 233, hoistEntityTypeSchema currently only extracts properties and required from row.metadata_schema and drops any non-structural metadata annotations; update it to also copy any remaining keys from metadata_schema (e.g., x-measure, x-dimension or any other annotation) into the returned RemoteEntityType (preserving them under metadata or metadata_schema as your type expects) so round-tripped annotations are not lost, and correspondingly modify upsertEntityType to merge the DesiredEntityType.metadata (or remaining metadata fields) back into the metadata_schema it writes (rather than serializing only properties/required), ensuring non-structural keys are preserved on both read (hoistEntityTypeSchema) and write (upsertEntityType) paths; apply the same preservation/merge logic in the other locations mentioned (around the other hoist/upsert blocks at the referenced ranges) so annotations round-trip intact.
🧹 Nitpick comments (1)
packages/cli/src/commands/_lib/apply/desired-state.ts (1)
46-55: ⚡ Quick winExtract a shared
EntityBackinginterface.This backing shape is now defined inline here and again in the client types. Pulling it into a named interface keeps the contracts aligned and avoids the two copies drifting.
As per coding guidelines, "Use
interfacefor defining object shapes in TypeScript files".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/cli/src/commands/_lib/apply/desired-state.ts` around lines 46 - 55, Replace the inline backing type with a named interface: declare and export interface EntityBacking { sql: string; grain?: string[]; source?: string } and change the existing property signature from backing?: { ... } to backing?: EntityBacking in the desired-state.ts definition; then update the corresponding client types to import/use this shared EntityBacking interface (and export it from the module that holds it) so both places reference the same interface rather than duplicating the shape.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/cli/src/commands/_lib/apply/map-config.ts`:
- Around line 517-527: The backing.grain check treats an empty array as present
which prevents round-trip equality; change the condition in the backing mapping
(the ternary using entity.backing.grain) to only include grain when it's a
non-empty array (e.g., Array.isArray(entity.backing.grain) &&
entity.backing.grain.length > 0) and do the same normalization in the
corresponding write path so empty grain arrays are omitted consistently (the
same logic that produces the backing object with sql and
connectorKey(entity.backing.source) should mirror the write/serialization code).
In `@packages/server/src/tools/admin/manage_entity_schema.ts`:
- Around line 609-615: The branch that handles backing-only updates calls
applyInferredMeasures(args.metadata_schema, args.backing.sql) which passes
undefined when metadata_schema is omitted and thus rebuilds/drops existing
author-defined fields; change it to re-infer on top of the existing schema by
passing the currentSchema (or currentSchema ?? args.metadata_schema) into
applyInferredMeasures when args.metadata_schema is undefined—i.e., ensure the
code that sets effectiveSchema for the backing.sql case uses
applyInferredMeasures(currentSchema, args.backing.sql) so SQL-only updates
preserve existing metadata extensions.
---
Outside diff comments:
In `@packages/cli/src/commands/_lib/apply/client.ts`:
- Around line 196-233: hoistEntityTypeSchema currently only extracts properties
and required from row.metadata_schema and drops any non-structural metadata
annotations; update it to also copy any remaining keys from metadata_schema
(e.g., x-measure, x-dimension or any other annotation) into the returned
RemoteEntityType (preserving them under metadata or metadata_schema as your type
expects) so round-tripped annotations are not lost, and correspondingly modify
upsertEntityType to merge the DesiredEntityType.metadata (or remaining metadata
fields) back into the metadata_schema it writes (rather than serializing only
properties/required), ensuring non-structural keys are preserved on both read
(hoistEntityTypeSchema) and write (upsertEntityType) paths; apply the same
preservation/merge logic in the other locations mentioned (around the other
hoist/upsert blocks at the referenced ranges) so annotations round-trip intact.
In `@packages/cli/src/commands/_lib/apply/diff.ts`:
- Around line 471-489: The entity-type diff omits DesiredEntityType.metadata so
metadata-only changes (e.g., authored measure/dimension annotations) show as
noop; add a fields entry for "metadata" in the same fields array (similar to
"properties" and "backing") with a changed predicate like (d, r) =>
!deepEqual(d.metadata, r.metadata) so that DesiredEntityType.metadata
differences create update rows and allow apply to reconcile override metadata
after create.
---
Nitpick comments:
In `@packages/cli/src/commands/_lib/apply/desired-state.ts`:
- Around line 46-55: Replace the inline backing type with a named interface:
declare and export interface EntityBacking { sql: string; grain?: string[];
source?: string } and change the existing property signature from backing?: {
... } to backing?: EntityBacking in the desired-state.ts definition; then update
the corresponding client types to import/use this shared EntityBacking interface
(and export it from the module that holds it) so both places reference the same
interface rather than duplicating the shape.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 79e5317f-1032-418e-bd58-85233d758f90
📒 Files selected for processing (13)
db/migrations/20260529130000_entity_types_backing.sqlpackages/cli/src/commands/_lib/apply/__tests__/client.test.tspackages/cli/src/commands/_lib/apply/__tests__/map-config.test.tspackages/cli/src/commands/_lib/apply/client.tspackages/cli/src/commands/_lib/apply/desired-state.tspackages/cli/src/commands/_lib/apply/diff.tspackages/cli/src/commands/_lib/apply/map-config.tspackages/cli/src/config/define.tspackages/server/src/__tests__/integration/entity-schema/entity-types.test.tspackages/server/src/sandbox/namespaces/entity-schema.tspackages/server/src/tools/admin/manage_entity_schema.tspackages/server/src/utils/__tests__/infer-measures.test.tspackages/server/src/utils/infer-measures.ts
Caught by a live lobu apply round-trip against an embedded server:
- backing_grain (text[]) round-trips from the server as the raw literal
'{a,b,c}', not a JS array, so remote.backing.grain != the config's array →
'backing' churned every apply. Normalize the grain literal to string[] in
hoist.
- a derived entity's properties are server-inferred (measure/dimension
annotations from backing.sql), which the config never declares → skip the
properties comparison for derived types (backing is the authored source).
Adds a listEntityTypes grain-normalization regression test.
…pply is exact
Replaces the properties-skip-for-derived shortcut (which made a DECLARED
measure override on a derived entity un-re-appliable unless the SQL also
changed) with a precise comparison:
- The server flags every inferred annotation `inferred: true` and injects ONLY
that flag (no stray `type`), so a column's other keys are always the author's.
- The apply diff strips inferred-flagged annotations from both sides before
comparing properties. Result: a pure-inferred derived type is idempotent, yet
an author-declared override edit still surfaces as a real update.
Verified live (embedded PG): create infers + flags (no type); re-apply is noop;
adding a declared `measure({reagg:'ratio'})` override → 1 update, stored without
the flag, and re-apply of the override is noop.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/server/src/utils/__tests__/infer-measures.test.ts (1)
51-57: ⚡ Quick winUse an
interfacefor the inferred-flag shape instead of inline object literals.These new assertions introduce inline object-shape types (
{ inferred?: boolean }). Please extract that shape into aninterfaceto comply with TS guidelines.Proposed change
import { describe, expect, it } from 'vitest'; import { applyInferredMeasures, inferColumns } from '../infer-measures'; +interface InferredAnnotation { + inferred?: boolean; +} + describe('inferColumns', () => { @@ - expect((props.n['x-measure'] as { inferred?: boolean }).inferred).toBe(true); - expect((props.company_id['x-dimension'] as { inferred?: boolean }).inferred).toBe(true); + expect((props.n['x-measure'] as InferredAnnotation).inferred).toBe(true); + expect((props.company_id['x-dimension'] as InferredAnnotation).inferred).toBe(true); expect( - (props.spend['x-measure'] as { inferred?: boolean }).inferred + (props.spend['x-measure'] as InferredAnnotation).inferred ).toBeUndefined();As per coding guidelines,
**/*.{ts,tsx}: "Useinterfacefor defining object shapes in TypeScript files."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/utils/__tests__/infer-measures.test.ts` around lines 51 - 57, The test uses inline object-shape types ({ inferred?: boolean }) in assertions; define a local interface (e.g., InferredFlag) at the top of the test file and replace the inline types with that interface in the three assertions referencing props.n['x-measure'], props.company_id['x-dimension'], and props.spend['x-measure']; ensure the interface name matches all three casts so TypeScript linting guidelines for using interface over inline object literals are satisfied.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@packages/server/src/utils/__tests__/infer-measures.test.ts`:
- Around line 51-57: The test uses inline object-shape types ({ inferred?:
boolean }) in assertions; define a local interface (e.g., InferredFlag) at the
top of the test file and replace the inline types with that interface in the
three assertions referencing props.n['x-measure'],
props.company_id['x-dimension'], and props.spend['x-measure']; ensure the
interface name matches all three casts so TypeScript linting guidelines for
using interface over inline object literals are satisfied.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 15d14f41-9aae-47c5-9dd6-d53502dbb746
📒 Files selected for processing (6)
packages/cli/src/commands/_lib/apply/__tests__/client.test.tspackages/cli/src/commands/_lib/apply/__tests__/diff-idempotency.test.tspackages/cli/src/commands/_lib/apply/client.tspackages/cli/src/commands/_lib/apply/diff.tspackages/server/src/utils/__tests__/infer-measures.test.tspackages/server/src/utils/infer-measures.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/cli/src/commands/_lib/apply/client.ts
- packages/server/src/utils/infer-measures.ts
Step 3 (the read path) reuses the existing query_sql rather than adding a dedicated tool: a derived type's backing_sql is fetched via get_type and run through query_sql, which org-scopes every referenced table via validateAndScopeQuery. The inferred x-measure rules document how each measure rolls up to a coarser grain. - Notes the read path in the backing field description (discoverability). - Adds an integration test proving a stored derived backing_sql is queryable AND org-scoped through query_sql (a sibling org's events are excluded from the aggregate). Backing views must reference allowlisted event columns (business data in the events.metadata jsonb, real columns like score/semantic_type), not arbitrary columns — the same column allowlist query_sql already enforces.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@packages/server/src/__tests__/integration/entity-schema/derived-entity-query.test.ts`:
- Line 51: The test SQL in derived-entity-query.test.ts currently queries the
raw events table; update the sql property value to use the current_event_records
view (replace 'FROM events' with 'FROM current_event_records') so the test
exercises supersession masking; modify the sql string in the object where sql:
'SELECT semantic_type, COUNT(*) AS event_count FROM events GROUP BY
semantic_type' is defined to reference current_event_records instead.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: b8463231-ee2a-4c4c-b93b-651137179320
📒 Files selected for processing (2)
packages/server/src/__tests__/integration/entity-schema/derived-entity-query.test.tspackages/server/src/tools/admin/manage_entity_schema.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- packages/server/src/tools/admin/manage_entity_schema.ts
| slug: 'events-by-type', | ||
| name: 'Events by type', | ||
| backing: { | ||
| sql: 'SELECT semantic_type, COUNT(*) AS event_count FROM events GROUP BY semantic_type', |
There was a problem hiding this comment.
Use current_event_records instead of events in the backing query.
This test currently bakes in a direct events read, which violates the server event-query invariant and can miss supersession masking behavior. Please update the test SQL to query current_event_records so this integration test enforces the production contract.
As per coding guidelines, "packages/server/src/**/*.ts: Use current_event_records view to mask superseded rows when querying events".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@packages/server/src/__tests__/integration/entity-schema/derived-entity-query.test.ts`
at line 51, The test SQL in derived-entity-query.test.ts currently queries the
raw events table; update the sql property value to use the current_event_records
view (replace 'FROM events' with 'FROM current_event_records') so the test
exercises supersession masking; modify the sql string in the object where sql:
'SELECT semantic_type, COUNT(*) AS event_count FROM events GROUP BY
semantic_type' is defined to reference current_event_records instead.
Consolidate on one SQL engine. node-sql-parser was the second parser (alongside polyglot's validateWithSchema); replace its two uses with polyglot's AST API, which is postgres-aware and handles metadata jsonb extraction + casts: - execute-data-sources: extractTableRefs (getTables/getTableNames — schema- qualified refs rejected, CTE names excluded) and queryProjectsIdColumn now run on polyglot parse(). Org-scoping still wraps the ORIGINAL sql in CTEs — polyglot is used only for AST inspection, never to deparse (its transpiler mangles jsonb string literals). - infer-measures: classify projection columns via getExprType (sum/count/avg/ min/max/div + DISTINCT) instead of node-sql-parser's astify. - Remove node-sql-parser from server + cli deps (regenerated bun.lock). Polyglot's ESM build auto-inits on import, so the synchronous parse path is unchanged. Verified: a derived view over events.metadata jsonb (SUM((metadata->>'amount')::numeric)) is inferred (additive) AND queried org-scoped via query_sql, with a sibling org's rows excluded.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/server/src/utils/execute-data-sources.ts (1)
93-106: polyglot-sql/sdk ast helpers (getTables,getTableNames,isStar) are part of the public API; optional chaining cleanup
@polyglot-sql/sdkexposesast.getTables,ast.getTableNames, andast.isStar, so the usages inpackages/server/src/utils/execute-data-sources.tsare aligned with the SDK API.- Replace
ast.isStar?.(...)withast.isStar(...)inqueryProjectsIdColumnto avoid unnecessary optional chaining.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/utils/execute-data-sources.ts` around lines 93 - 106, The optional chaining around the SDK helper isStar is unnecessary and should be removed: in the function queryProjectsIdColumn replace any call using ast.isStar?.(...) with ast.isStar(...) so it uses the public API directly; confirm other uses of ast.getTables and ast.getTableNames remain as-is (they are part of the public API) and run type checks to ensure no undefined checks remain around isStar calls.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@packages/server/src/utils/execute-data-sources.ts`:
- Around line 93-106: The optional chaining around the SDK helper isStar is
unnecessary and should be removed: in the function queryProjectsIdColumn replace
any call using ast.isStar?.(...) with ast.isStar(...) so it uses the public API
directly; confirm other uses of ast.getTables and ast.getTableNames remain as-is
(they are part of the public API) and run type checks to ensure no undefined
checks remain around isStar calls.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 318aba2a-23ef-4bc2-9112-3b18c20a6673
⛔ Files ignored due to path filters (1)
bun.lockis excluded by!**/*.lock
📒 Files selected for processing (5)
packages/cli/package.jsonpackages/server/package.jsonpackages/server/src/__tests__/integration/entity-schema/derived-entity-query.test.tspackages/server/src/utils/execute-data-sources.tspackages/server/src/utils/infer-measures.ts
💤 Files with no reviewable changes (2)
- packages/cli/package.json
- packages/server/package.json
🚧 Files skipped from review as they are similar to previous changes (2)
- packages/server/src/tests/integration/entity-schema/derived-entity-query.test.ts
- packages/server/src/utils/infer-measures.ts
…cross-org leak) SECURITY: the node-sql-parser→polyglot migration's schema-qualified rejection used ast.getTables, which only returns the FIRST FROM table — it does not descend into JOINs or sub-selects. A schema-qualified ref in a join/subquery (e.g. `... JOIN public.connections c`) slipped past, and since schema-qualified names bypass the org-scoping CTE name-shadowing, it read every org's rows (incl. otherwise-excluded columns like connections.credentials). Regression vs node-sql-parser, which caught these. Fix: detect schema-qualified refs by recursing the raw AST node graph (the only traversal that reaches join/subquery tables), instead of polyglot's getTables/walk. getTableNames (complete + CTE-excluded) still drives scoping. Red→green reproducers (joins, multi-join, IN/EXISTS subqueries, UNION branch, CTE body, 3-part names) + clean-query no-false-positive cases in scoped-query-schema-rejection.test.ts. Affects every validateAndScopeQuery caller: query_sql, metric_series, watcher/view-template data sources, reaction SDK.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/server/src/__tests__/unit/scoped-query-schema-rejection.test.ts (1)
61-65: ⚡ Quick winStrengthen the positive-path assertions to prove every table/branch is scoped.
toContain('organization_id')still passes if only the first table or UNION branch gets wrapped, which is very close to the regression this suite is meant to catch. For the join/CTE/union cases, assert the expected CTE aliases are present inout.sql(for example, both"events" AS (and"entities" AS ().🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/server/src/__tests__/unit/scoped-query-schema-rejection.test.ts` around lines 61 - 65, The current positive-path test uses a weak assertion (expect(out.sql).toContain('organization_id')) which can pass if only one table/branch is scoped; update the test that calls scope(sql) to assert that each expected CTE alias is present in out.sql (e.g. expect(out.sql).toContain('"events" AS (') and expect(out.sql).toContain('"entities" AS (') for the join/union/CTE cases) and still assert out.params[0] is 'org_test'; locate the test that calls scope(sql) and add explicit contains checks for every table/branch alias the input SQL should produce to guarantee all branches are scoped.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/server/src/utils/execute-data-sources.ts`:
- Around line 90-91: The AST walker rec currently bails out silently when depth
> 50 which allows deeply nested schema.table refs to bypass validation; change
the behavior so that when depth exceeds 50 the function fails closed by throwing
a specific error or setting a validation-failed flag (e.g., throw new Error('AST
depth exceeded validation limit') or mark a shared "validationFailed" variable)
instead of returning; update callers of rec (the validation routine that checks
schema-qualified references) to catch this condition and reject the query
accordingly so any traversal that hits the depth cap causes overall validation
to fail.
---
Nitpick comments:
In `@packages/server/src/__tests__/unit/scoped-query-schema-rejection.test.ts`:
- Around line 61-65: The current positive-path test uses a weak assertion
(expect(out.sql).toContain('organization_id')) which can pass if only one
table/branch is scoped; update the test that calls scope(sql) to assert that
each expected CTE alias is present in out.sql (e.g.
expect(out.sql).toContain('"events" AS (') and
expect(out.sql).toContain('"entities" AS (') for the join/union/CTE cases) and
still assert out.params[0] is 'org_test'; locate the test that calls scope(sql)
and add explicit contains checks for every table/branch alias the input SQL
should produce to guarantee all branches are scoped.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: f3f9fe0d-f8b0-4234-8efc-1c245e2f5356
📒 Files selected for processing (2)
packages/server/src/__tests__/unit/scoped-query-schema-rejection.test.tspackages/server/src/utils/execute-data-sources.ts
…inferred, reagg drift)
From a multi-perspective audit of the derived-entities feature:
- map-config: omit an empty `grain: []` (it's truthy but the server reads text[]
back as absent → perpetual backing churn). Compare `.length`, not truthiness.
- manage_entity_schema: reject empty/whitespace `backing.sql` in create+update —
TypeBox minLength isn't enforced for this tool, so a blank view could be
persisted as a corrupt (unqueryable, measure-less) derived type.
- applyInferredMeasures: strip prior server-inferred (`inferred: true`)
annotations before re-inferring, so a non-CLI caller that round-trips a
read-back schema with a changed view can't freeze stale re-agg rules or keep
annotations for columns the new sql dropped. (CLI already strips these.)
- ReaggRule: add `semi_additive` to the server union so it matches the CLI's
(author-declarable only; inference never emits it).
- define.ts: drop the contradictory `{ mode: "stored" }` doc (there is no mode
field — presence of backing is the only discriminant).
Tests: grain:[] omission (map-config), empty-sql rejection (entity-types),
stale-inferred re-inference (infer-measures).
Points at owletto e5b652c (feat/metric-derived-entities): the view-aware UI for derived entity types — renders the backing view via query_sql instead of the misleading empty stored-rows table, hides create affordances, and shows the measures + backing query read-only.
A derived (view-backed) type has no rows in `entities` — its data is its backing_sql view. handleCreate did not check, so the API/SDK would insert an `entities` row the view silently ignores (orphan data). The web UI hides the 'Add' affordance, but the backend must guard the API path too. Now rejects create when the target type has backing_sql set. +test.
…ntities # Conflicts: # packages/owletto
… members
Members can now run query_sql + metric_series over their org's OPERATIONAL data
(entities, events, connections, feeds, watchers, …) — previously query_sql was
owner/admin-only, so members couldn't read derived-entity metrics.
The auth/identity tables stay admin-only via a per-query table allowlist
(ADMIN_ONLY_QUERYABLE_TABLES = oauth_tokens, oauth_clients, user): a non-admin
referencing them is rejected ("requires admin access"), even in a JOIN/subquery
(reuses the join-aware extraction). Secret columns were already schema-excluded;
this is the table-level guard on top.
- validateAndScopeQuery gains a restrictedTables option.
- query_sql/metric_series pass it for non-admins (closes a pre-existing
metric_series hole where members could already read oauth_tokens).
- tool-access: query_sql no longer hard-gated to admin.
- tests: member blocked from oauth_tokens/user, allowed events/connections/feeds;
admin unrestricted; tool-access tier flips; member-write-access updated.
Note: an EXISTING MCP session keeps its role until refresh (pre-existing
session-caching), so a just-downgraded admin's open session may retain access
until reconnect; fresh sessions + the UI's per-request session auth are gated
correctly.
The merge of main into this branch reset the owletto pointer to main's SHA, dropping the derived-entity UI bump. Re-point at ba9daf0 = the UI commit rebased onto the latest owletto main (so it carries both the new-user-routing fixes and the derived view UI).
|
bug_free 78, simplicity 76, slop 0, bugs 0, 0 blockers Typecheck/unit passed. [env] Integration nonzero from untouched public-origin/url-builder/public-pages host expectations and mcp-proxy upstream SSRF blocking; changed entity/query tests passed. Ran bun SQL-scope probes for comments/CTEs, scalar subquery scoping, and member oauth_tokens block -> ok. Full verdict JSON{
"bug_free_confidence": 78,
"bugs": 0,
"slop": 0,
"simplicity": 76,
"blockers": [],
"change_type": "feat",
"behavior_change_risk": "high",
"tests_adequate": true,
"suggested_fixes": [],
"notes": "Typecheck/unit passed. [env] Integration nonzero from untouched public-origin/url-builder/public-pages host expectations and mcp-proxy upstream SSRF blocking; changed entity/query tests passed. Ran bun SQL-scope probes for comments/CTEs, scalar subquery scoping, and member oauth_tokens block -> ok.",
"categories": {
"src": 861,
"tests": 929,
"docs": 0,
"config": 0,
"deps": 12,
"migrations": 80,
"ci": 0,
"generated": 0
}
}Local review gate — branch protection can require the |
…ence
Discovery pass (multi-agent + make review bug_free 66) found the persisted-
inference / semantic-layer half was over-engineered and the source of two churn
bugs. Collapse to the minimal complete model:
- DROP persisted inference: no applyInferredMeasures / stripMeasureAnnotations,
no inferred:true flag, no withoutInferredAnnotations diff-strip. metadata_schema
is stored verbatim; measure vs. dimension is classified ON READ (get returns
measure_columns). Kills both churn bugs; apply diff is plain + idempotent.
- DROP backing_grain + backing_source columns (zero readers; re-add with a
consumer). One column remains: backing_sql (NULL ⇒ stored).
- DROP the 6-value ReaggRule + numerator/denominator + measure()/dimension()
authoring API (write-only; no rollup engine). SDK backing is just { sql }.
Bug fixes from the same pass:
- query_sql: drop the over-broad ORDER BY/LIMIT guard (the base SQL is wrapped
in a subquery, so inner ORDER BY / window functions are valid) + make sort_by
optional (a view's columns aren't known upfront).
- inferColumns: generic aggregates (bool_or, jsonb_agg, percentile … WITHIN
GROUP) now classify as measures, not dimensions.
- derived-row-creation rejection moved into createEntity (the single chokepoint
that also resolves public-catalog types) — closes the public-type bypass.
Migration reduced to a single backing_sql column. ~450 net lines removed.
chore: bump owletto pointer to c9fa21c (lean derived UI).
pi flagged both (cross-org leaks, blocker-class): - Multi-statement: only the first statement was parsed/scoped, so a trailing `; SELECT … FROM public.oauth_tokens` ran UNSCOPED (exploitable via metric_series, which is member-accessible and doesn't subquery-wrap). Now extractTableRefs rejects multiple statements. - Depth-50 fail-open: the schema-qualified-ref recursion silently returned past depth 50, so a deeply-nested public.<table> bypassed scoping. Replaced with an iterative (stack) traversal — no depth cap, seen-set bounds cycles. Regression repros (60-level nested public.events; multi-statement) added to scoped-query-schema-rejection.test.ts. Consumers (query_sql/metric_series/ watchers/derived-query) re-run green.
…ight pi found two more bypasses of the derived-row invariant (a derived type must never have stored rows): - entity-link-upsert (connector auto-create) inserted rows without going through createEntity's guard — added the backing_sql check there too. - update could convert a POPULATED stored type to derived, orphaning its rows — now rejected when entity_count > 0. Rather than keep patching insert paths (whack-a-mole — pi found a different one each review round), add a DB BEFORE INSERT trigger on entities that rejects rows for derived types. That makes the invariant hold regardless of code path; the app guards stay for friendly errors. Fires only for derived types (one PK lookup; no effect on normal stored inserts). +regression tests: convert-populated rejected, direct raw insert rejected by the trigger.
…dow; bump owletto pointer
…hand + expression-nested subqueries) The @polyglot-sql/sdk-based org-scoping relied on ast.getTableNames, which has two blind spots that each leaked once query_sql became member-accessible: 1. PostgreSQL's `TABLE <name>` shorthand mis-parses as a column → zero table refs → the query ran UNSCOPED and past the admin-only-table gate. 2. getTableNames does not descend into subqueries nested inside an EXPRESSION (CASE / scalar SELECT-list) → such a table was neither org-scoped nor admin-gated, leaking oauth_tokens to non-admins and other orgs' rows. Fixes: - Reject non-SELECT/WITH queries (incl. the TABLE shorthand) up front. - extractTableRefs now returns the UNION of getTableNames and a complete raw AST walk, so a table escapes scoping only if BOTH extractors miss it; this drives the unknown-table check, the admin gate, AND org-scoping. - Fail-closed CTE-collision guard: reject a query whose CTE name shadows a real or admin table, since the parser cannot disambiguate CTE vs base table by lexical scope. Regressions added at both the validateAndScopeQuery and the member-context query_sql tool level.
…ug read-tier + trigger UPDATE coverage Blocker: metric_series is member-reachable but omitted safeColumns, so its org-scoping CTEs used SELECT * — a member charting `connections` got the withheld `credentials` column (OAuth tokens). Pass SAFE_COLUMN_DEFS, matching query_sql. - query_sql: only require owner/admin (and only escalate callerIsAdmin) when org_slug points at ANOTHER org; passing your own org slug stays read-tier, matching the existing comment's intent. - derived-row trigger: also fire on UPDATE OF entity_type_id (re-pointing a stored row at a derived type), and add a guard on entity_types so setting backing_sql on a type that still has stored rows is rejected at the DB. Regressions: member metric_series credential-column safety; both new trigger paths.
…metric_series Defense-in-depth beyond the app-layer admin gate, since parser-based table extraction can't be proven airtight for arbitrary member SQL. Non-admin query_sql/metric_series now run their (org-scoped, app-gated) SQL under a `lobu_query_restricted` Postgres role via SET LOCAL ROLE. The role can read every queryable table EXCEPT the auth/identity tables (oauth_tokens, oauth_clients, user), so any future parser hole that reaches one hits a DB "permission denied" instead of leaking. Cross-org isolation stays enforced by the org-scoping CTEs; this role is purely a table-level backstop. - Migration is fully tolerant: if the DB user lacks CREATEROLE (managed prod cluster), CREATE ROLE / grants no-op via DO/EXCEPTION blocks and the deploy still succeeds. - Runtime applies SET LOCAL ROLE only when the role exists (cached per-process pg_roles check); absent the role, callers fall back to the app-layer gate, so no member query ever breaks. - REVOKE list must stay in sync with ADMIN_ONLY_QUERYABLE_TABLES. Tests: the role denies oauth_tokens (permission denied) but reads operational tables; a member's ordinary query_sql still returns rows under the role.
…eck uses assumability - inferColumns: a cast/paren wraps the aggregate node, so COUNT(*)::int and SUM(x)::numeric were classified as dimensions. Peel cast/paren wrappers before matching MEASURE_EXPR_TYPES; a cast of a non-aggregate stays a dimension. - restrictedQueryRoleAvailable: check pg_has_role(current_user, role, 'MEMBER'), not mere existence. A role that exists but isn't grantable to the connecting user would make SET LOCAL ROLE throw and abort every member query; now that case falls back to the app-layer gate instead. CASE short-circuits so pg_has_role isn't evaluated for a non-existent role.
…in the testable app layer The restricted-role layer (SET LOCAL ROLE + a provisioned Postgres role) protected only the 3 auth tables, didn't help cross-org, and spread the control across 5 places with a hand-synced deny-list. Worse, its prod-specific branches (no CREATEROLE / role not assumable) can't be exercised under the superuser embedded Postgres tests, so the security-relevant behavior was verified only by reading. Keep the single, deterministic, fully-testable control in validateAndScopeQuery (union table extraction + fail-closed CTE-collision guard + admin gate) and back it with a completeness invariant test: an admin-only table referenced in ANY position (join, WHERE/EXISTS, scalar + CASE-nested subquery, CTE body, UNION, doubly-nested) must be rejected for a member, with the inverse asserting clean shapes aren't over-blocked. That test is the canary for a future parser blind spot — it goes red in CI before a leak can ship. Removed: migration 20260531130000, RESTRICTED_QUERY_ROLE/restrictedQueryRoleAvailable, the SET LOCAL ROLE wiring, and restricted-query-role.test.ts. Unchanged: parser fixes, metric_series column allowlist, org_slug read-tier, derived-row triggers, measure-inference cast fix.
…in validateAndScopeQuery Two pre-existing edge cases the review surfaced once query_sql/metric_series became member-reachable (both latent on main while the tools were admin-only): - validateTableQuery rejected ORDER BY/GROUP BY references to a query's OWN output alias (e.g. SELECT count(*) AS n ... ORDER BY n) as 'unknown column'. Now suppresses that E201 only when the name is a top-level output alias; a genuinely-unknown column and an excluded column referenced via alias (SELECT credentials AS x) are still rejected. Org-scoping/admin-gating are independent of this validator, so no security relaxation. - buildScopedQuery prepended a second WITH when the user query led with a SQL comment before WITH, emitting invalid SQL. Strip leading comments before the merge decision so the injected CTEs splice into the single existing WITH.
…empty-backing guard; measure-test coverage
From the adversarial audit (39 findings → 5 confirmed, all low/medium):
- execute-data-sources: drop the getTableNames half of extractTableRefs' union.
Differential-tested across 20 query shapes (comma joins, LATERAL, ONLY,
recursive CTE, set ops, subqueries in FROM/ORDER BY/HAVING): the raw walk is a
strict SUPERSET of getTableNames, so it added zero coverage — and was no hedge
either (if the raw walk broke, getTableNames alone is incomplete and would leak
regardless). The completeness-invariant suite guards raw-walk-only.
- cli map-config: reject an empty/whitespace backing.sql at load time, matching
the file's 'fail loud in the CLI before any remote mutation' contract instead
of a confusing mid-apply server 4xx.
- infer-measures test: cover the genuine parse-failure branch ('SELECT (') — the
prior 'this is not sql' case passes via the non-select guard (it parses to a
'not' node), never exercising !res.success.
… trigger ignores soft-deleted rows - metric_series ran its transaction WITHOUT SET TRANSACTION READ ONLY, while validateAndScopeQuery accepts data-modifying WITH CTEs (the SELECT/WITH guard only checks the prefix). A member could therefore mutate via `WITH x AS (DELETE FROM events RETURNING id) SELECT count(*) FROM x`. Add READ ONLY first in the txn (DB-enforced, mirrors query_sql) so writes fail closed. Regression: member DML-CTE rejects + the row is untouched. - reject_derived_conversion_with_rows trigger counted ALL rows, but the app convert-guard counts only live rows (deleted_at IS NULL). A type whose rows are all soft-deleted was blocked at the DB despite the app allowing it. Add `AND e.deleted_at IS NULL` so the trigger matches app semantics. Regression: convert-to-derived succeeds when the only row is soft-deleted.
…w are allowed (wrapped as subquery) The registry description told agents 'Do NOT include ORDER BY/LIMIT/OFFSET', but query_sql wraps the query as a subquery so those are valid inside it (pagination/sort come from the sort_by/limit/offset args). Only positional params are forbidden. Matches QuerySqlSchema's sql-arg description.
…r scanner CodeQL js/redos (high) on the SELECT/WITH guard's regex: the nested-quantifier comment strip `(?:--…\n|/*…*/)*` backtracks catastrophically on crafted input (many unclosed /*), and it runs on member-supplied SQL on every query_sql/ metric_series call — a real DoS. Replace with stripLeadingComments (a single linear indexOf scan) + isReadQuery, shared by validateAndScopeQuery and metric_series (removing metric_series' duplicate copy of the same flagged regex). Regression: 100k unclosed /* resolves in <1s instead of hanging.
What
Adds derived entity types — an entity type can be a read-only SQL view over other relations instead of inserted rows. Its aggregate columns are inferred as measures (with a re-aggregation rule); grouping columns become dimensions. Stored entities are the default and are unchanged.
This is the substrate for filling things like subscriptions/spend from the event stream: define a view, get governed measures, no hand-annotation for the common cases.
Model (kept deliberately small)
defineEntityType({ backing: { sql, grain?, source? } }). Presence ofbacking⇒ derived; absence ⇒ stored. Nomodeflag (derived ⟺backing_sql IS NOT NULL).x-measureon the existingproperties/x-convention.measure()/dimension()helpers + aReaggRuletype.backing_sql/backing_grain(text[]) /backing_sourcecolumns onentity_types(idempotent migration). The tool/CLI surface uses a single nestedbackingarg.Inference / re-agg classifier
Parses the view SELECT (node-sql-parser) and classifies each projection column:
SUM/COUNT→ additive ·COUNT(DISTINCT)/MEDIAN/PERCENTILE→ holistic ·AVGanda/b→ ratio ·MIN/MAX→ extremum · grouping cols → dimension. Inferred roles merge intometadata_schema; author-declared annotations always win. Reverting a type to stored strips the derived-only annotations.Apply round-trip
defineEntityType→ map-config → desired-state → diff → client upsert → server create/update → list/hoist. Stored entities carry nobackingon either side, so they never churn the diff.Tests / validation
text[]write-encoding bug, nowpgTextArray+::text[].)upsertEntityTypesends nestedbackingfor derived /nullfor stored.infer-measures) +applyInferredMeasuresdeclared-wins +mapEntityTypebacking mapping.lobu run+lobu apply— it would boot a long-lived gateway against the shared devDATABASE_URL(dev-DB pollution risk) and only adds the unchanged routing/transport hop, which is bracketed by the wire-payload + server-integration tests above.CI note
If the
integrationjob shows red, it's pre-existing environmental failures (isolated-vmnot built for the runner,url-builder/public-origin/public-pagesenv-var tests) — none in files this branch touches; this branch's own suites pass in the same run.Scope / deferred (intentional)
trip.costover events in a window —over/attribution) are deferred; trips/assets can be modeled as derived views in the meantime.node-sql-parserretained: polyglot 0.1.15 doesn't expose usable table/projection extraction (getTablesinternal;getSourceTablesthrows on its untypedparse().ast), and the org-scoping path can't ride a reverse-engineered AST.Follow-ups
query_sql/ view template) that surfaces these measures with re-aggregation.🤖 Generated with Claude Code
Need help on this PR? Tag
@codesmithwith what you need. Autofix is disabled.Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Tests