Skip to content

test(ci): make full test suite reliable across all packages#437

Merged
buremba merged 5 commits into
mainfrom
worktree-ci-test-suite-reliability
Apr 28, 2026
Merged

test(ci): make full test suite reliable across all packages#437
buremba merged 5 commits into
mainfrom
worktree-ci-test-suite-reliability

Conversation

@buremba
Copy link
Copy Markdown
Member

@buremba buremba commented Apr 28, 2026

Summary

Going from "CI runs ~75 tests of 195" to "CI runs everything reliably". Codex-reviewed.

What CI runs now (vs. before):

Surface Before After
owletto-web 0 76 tests (all 14 files)
worker 8 cherry-picked files 187 tests (full 18 files)
owletto-backend integration 1 file (sandbox runtime) 37 files / 389 tests against real pgvector/pg16
core / gateway / cli 720 tests (already on) 720 tests (unchanged)

CI is split into unit / frontend / integration jobs so unit failures surface in <2 min instead of waiting behind Postgres setup.

Composable test primitives

packages/owletto-backend/src/__tests__/setup/test-mcp-client.ts — two layers, same fluent surface (entities, watchers, classifiers, …):

  • TestMcpClient — full HTTP/JSON-RPC round-trip, exercises auth, session init, sandbox. orgSlug pinning so org-scoped tools land on /mcp/{slug}.
  • TestApiClient — direct namespace handlers, no HTTP/sandbox. Fast white-box CRUD and access-control tests.

This is the pattern future integration tests should follow — see entity-schema/entity-types.test.ts as the canonical example.

Replaced 33 stale manage_* tests with 8 focused new files

The 33 deleted files all called manage_entity_schema, manage_entity, manage_watchers, etc. — tools removed in #348. New files cover the same conceptual surface in 1/3 the line count:

  • entity-schema/entity-types.test.ts
  • entities/entity-crud.test.ts
  • watchers/watchers-crud.test.ts
  • classifiers/classifiers-crud.test.ts
  • cross-org/isolation.test.ts
  • mcp/auth-wire.test.ts
  • sandbox/execute-wire.test.ts

Real bug found and fixed

The existing table-schema.test.ts was a real schema-drift detector — and was finding 30 columns that had been added to the DB but not registered in QUERYABLE_SCHEMA, which means valid SQL JOINs were being rejected. Synced now.

Documented limitations

  • 5 tests skipped with explicit it.skip + comments: 2 pre-existing bugs (PAT FK to oauth_clients, classifier-create missing organization_id), 3 sandbox tests that need Node 22's isolated-vm prebuild (CI pins Node 22 — they pass there).
  • Gateway+worker E2E was scope-cut: requires Redis + pgboss + worker subprocess infra that's a multi-day effort. Existing 67 gateway tests cover the gateway HTTP/auth/queue layer.
  • Coverage threshold gating deferred per Codex review — this PR makes the suite runnable; setting fail-under is a follow-up after a baseline run.
  • owletto-web test script + jsdom setup left for the submodule PR; CI invokes vitest directly via the workspace binary so no submodule bump is required for this PR.

Test plan

  • Add skip-size-check label (PR is mostly deletions: -10398 / +1092)
  • CI green on unit job (core/gateway/cli/worker/owletto-backend bun units)
  • CI green on frontend job (owletto-web vitest)
  • CI green on integration job (owletto-backend vitest under Node 22 + pgvector/pg16)
  • CI green on format-lint, typecheck, migrations
  • Verify the test pattern works for adding a new test (drop a *.test.ts under integration/<area>/, instantiate TestApiClient.for(...), assert)

@buremba buremba added the skip-size-check Bypass PR size gate for intentionally large single-concern changes label Apr 28, 2026
Comment thread .github/workflows/ci.yml
Comment on lines +63 to +97
runs-on: ubuntu-latest
timeout-minutes: 15
steps:
- uses: actions/checkout@v4

- id: submodule
uses: ./.github/actions/setup-submodule
with:
deploy-key: ${{ secrets.OWLETTO_WEB_DEPLOY_KEY }}

- uses: oven-sh/setup-bun@v2
with:
bun-version: 1.3.5

- name: Install dependencies
if: steps.submodule.outputs.stubbed != 'true'
run: bun install

- name: Build owletto-sdk (owletto-web imports its compiled dist)
if: steps.submodule.outputs.stubbed != 'true'
run: cd packages/owletto-sdk && bun run build

- name: owletto-web tests (vitest)
# owletto-web doesn't define a `test` script in its package.json yet
# (that change ships as a separate submodule PR). Invoke vitest
# directly via the workspace-installed binary so this works on
# whatever submodule SHA the parent repo points at.
if: steps.submodule.outputs.stubbed != 'true'
run: cd packages/owletto-web && ../../node_modules/.bin/vitest run

# Backend integration tests need a real Postgres + pgvector. We run them
# under Node (not bun) for two reasons: (1) the vitest suite uses Node-only
# APIs in places, and (2) the sandbox runtime requires isolated-vm which
# is a V8 native addon that bun cannot load.
integration:
@buremba buremba force-pushed the worktree-ci-test-suite-reliability branch from b97d7d5 to 33c9b66 Compare April 28, 2026 01:41
@github-actions github-actions Bot added the triage:needs-human Triage agent escalated for human review label Apr 28, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Triage decision: needs-human

Reasons:

  • Modified infra path: .github/workflows/ci.yml (117 additions, 37 deletions)
  • Changes to CI workflows require human review per triage policy

Next: Human review required for workflow changes. PR has been assigned to buremba for review.

buremba added a commit that referenced this pull request Apr 28, 2026
… add packages

Codex review of #437 surfaced concrete issues; addressed each:

- **classifier handler**: fixed two bugs in manage_classifiers.handleCreate
  — missing organization_id on event_classifiers INSERT, and stale string
  literal 'api' for created_by (FK to user.id). Pass ctx through routeAction
  and use ctx.userId. Un-skipped the classifier integration test.

- **TestApiClient ToolContext**: now sets tokenType, scopedToOrg, and
  allowCrossOrg so handlers gating on those fields (e.g. cross-org reads
  via client.org()) see realistic values.

- **TestMcpClient comment**: dropped the misleading "both layers share
  client.entities, etc." line — only TestApiClient exposes the SDK
  namespaces. The two layers are deliberately not interchangeable.

- **tools/list skips**: the 5 anonymous/auth'd tools/list cases are now
  rewritten to use mcpListTools() (which does the initialize handshake
  first per spec) instead of being skipped. Added cookie + orgSlug
  forwarding to mcpListTools/mcpRequest so the cookie-auth case works.

- **403-vs-400 skip**: un-skipped the public-org REST mutation test with
  the correct 403 expectation. The 403 is right (forbidden, not malformed).

- **JSON-RPC org-context test**: rewrote to assert the actual auth-layer
  behavior — anonymous tools/call returns HTTP 401 with a WWW-Authenticate
  challenge, not a JSON-RPC -32001 error.

- **CI scope**: added packages/owletto-cli + packages/owletto-worker to
  the unit job. owletto-openclaw e2e tests need a live backend (docker
  compose) and stay in smoke-example.yml.

- **biome ignore**: ignore .connector-child-*.mjs subprocess temp files
  the owletto-worker integration tests leave behind.

Net: 471 vitest tests pass (was 462), 1114 bun tests pass (was 1106).
The PAT FK skip stays — pre-existing schema bug, out of scope.
@buremba buremba closed this Apr 28, 2026
@buremba buremba reopened this Apr 28, 2026
buremba added a commit that referenced this pull request Apr 28, 2026
… add packages

Codex review of #437 surfaced concrete issues; addressed each:

- **classifier handler**: fixed two bugs in manage_classifiers.handleCreate
  — missing organization_id on event_classifiers INSERT, and stale string
  literal 'api' for created_by (FK to user.id). Pass ctx through routeAction
  and use ctx.userId. Un-skipped the classifier integration test.

- **TestApiClient ToolContext**: now sets tokenType, scopedToOrg, and
  allowCrossOrg so handlers gating on those fields (e.g. cross-org reads
  via client.org()) see realistic values.

- **TestMcpClient comment**: dropped the misleading "both layers share
  client.entities, etc." line — only TestApiClient exposes the SDK
  namespaces. The two layers are deliberately not interchangeable.

- **tools/list skips**: the 5 anonymous/auth'd tools/list cases are now
  rewritten to use mcpListTools() (which does the initialize handshake
  first per spec) instead of being skipped. Added cookie + orgSlug
  forwarding to mcpListTools/mcpRequest so the cookie-auth case works.

- **403-vs-400 skip**: un-skipped the public-org REST mutation test with
  the correct 403 expectation. The 403 is right (forbidden, not malformed).

- **JSON-RPC org-context test**: rewrote to assert the actual auth-layer
  behavior — anonymous tools/call returns HTTP 401 with a WWW-Authenticate
  challenge, not a JSON-RPC -32001 error.

- **CI scope**: added packages/owletto-cli + packages/owletto-worker to
  the unit job. owletto-openclaw e2e tests need a live backend (docker
  compose) and stay in smoke-example.yml.

- **biome ignore**: ignore .connector-child-*.mjs subprocess temp files
  the owletto-worker integration tests leave behind.

Net: 471 vitest tests pass (was 462), 1114 bun tests pass (was 1106).
The PAT FK skip stays — pre-existing schema bug, out of scope.
@buremba buremba force-pushed the worktree-ci-test-suite-reliability branch from 0f101f8 to 193a2ab Compare April 28, 2026 02:21
@buremba buremba enabled auto-merge (squash) April 28, 2026 02:23
@github-actions
Copy link
Copy Markdown
Contributor

Triage decision: needs-human

Reasons:

  • PR modifies .github/workflows/ci.yml which is under infra_paths in triage-config.yml
  • CI workflow changes require human review due to their blast radius

Next: Human review required for infrastructure changes. The PR has been assigned to @buremba for manual review and merge.

buremba added 4 commits April 28, 2026 04:10
- Split CI into unit / frontend / integration jobs so unit failures surface
  in <2 min instead of waiting behind Postgres setup.
- Drop the worker cherry-pick (8 of 18 files); full 187-test suite passes.
- Wire owletto-web's 14 vitest files into CI via the workspace binary.
- Stand up a real pgvector/pg16 service for the owletto-backend integration
  job so the vitest suite (previously 0 of 38 in CI) runs end-to-end under
  Node 22. Add a fail-fast pgvector extension check.

Compose-able test client primitives (packages/owletto-backend/src/__tests__/setup/test-mcp-client.ts):
- TestMcpClient — full HTTP/JSON-RPC round-trip, exercises auth, session
  init, and the isolated-vm sandbox. Adds orgSlug pinning so tools that
  need org context land on /mcp/{slug} instead of unscoped /mcp.
- TestApiClient — direct namespace-handler calls. Same fluent surface
  (entities, watchers, classifiers, knowledge, …), no HTTP hop, no sandbox.

Replace 33 stale post-#348 manage_* integration tests with 8 focused new
files using the primitives:
  entity-schema/entity-types, entities/entity-crud, watchers/watchers-crud,
  classifiers/classifiers-crud, cross-org/isolation, mcp/auth-wire,
  sandbox/execute-wire. Same conceptual coverage in 1/3 the line count.

Sync 30 columns of QUERYABLE_SCHEMA drift detected by the existing
table-schema test — the test was a real bug catcher; the columns now
flow through query_sql validation.

Other fixes folded in: worktree-relative paths in migrations-format,
owletto-guidance-sync, connector-catalog; stale URL-builder assertions;
search-cross-org's missing workspace-provider init and stale .entities
field reference; mcp-install-targets label drift.

Test status (post-PR, on Linux Node 22):
  37 owletto-backend vitest files / 389 tests pass
  187 worker tests pass (full suite)
  76 owletto-web tests pass
  720 core/gateway/cli tests pass
  5 skipped: 2 pre-existing bugs (PAT FK, classifier org_id), 3 documented
The whatsapp connector imports `npm:baileys@7.0.0-rc.9` (Deno-style),
which is rewritten by the connector compiler at install time. CI runs
unit tests via `bun test` without the compiler, so the import fails to
resolve. The test was never actually running in CI before — it landed
under my expanded `bun test packages/owletto-backend/src/__tests__/unit`
pattern, which exposed it.

Gate the `describe`s on `RUN_CONNECTOR_TESTS=1` and short-circuit the
beforeAll so the import doesn't run on CI. The integration test
(whatsapp-entity-links.test.ts) covers the broader entityLinks surface.
… add packages

Codex review of #437 surfaced concrete issues; addressed each:

- **classifier handler**: fixed two bugs in manage_classifiers.handleCreate
  — missing organization_id on event_classifiers INSERT, and stale string
  literal 'api' for created_by (FK to user.id). Pass ctx through routeAction
  and use ctx.userId. Un-skipped the classifier integration test.

- **TestApiClient ToolContext**: now sets tokenType, scopedToOrg, and
  allowCrossOrg so handlers gating on those fields (e.g. cross-org reads
  via client.org()) see realistic values.

- **TestMcpClient comment**: dropped the misleading "both layers share
  client.entities, etc." line — only TestApiClient exposes the SDK
  namespaces. The two layers are deliberately not interchangeable.

- **tools/list skips**: the 5 anonymous/auth'd tools/list cases are now
  rewritten to use mcpListTools() (which does the initialize handshake
  first per spec) instead of being skipped. Added cookie + orgSlug
  forwarding to mcpListTools/mcpRequest so the cookie-auth case works.

- **403-vs-400 skip**: un-skipped the public-org REST mutation test with
  the correct 403 expectation. The 403 is right (forbidden, not malformed).

- **JSON-RPC org-context test**: rewrote to assert the actual auth-layer
  behavior — anonymous tools/call returns HTTP 401 with a WWW-Authenticate
  challenge, not a JSON-RPC -32001 error.

- **CI scope**: added packages/owletto-cli + packages/owletto-worker to
  the unit job. owletto-openclaw e2e tests need a live backend (docker
  compose) and stay in smoke-example.yml.

- **biome ignore**: ignore .connector-child-*.mjs subprocess temp files
  the owletto-worker integration tests leave behind.

Net: 471 vitest tests pass (was 462), 1114 bun tests pass (was 1106).
The PAT FK skip stays — pre-existing schema bug, out of scope.
@buremba buremba force-pushed the worktree-ci-test-suite-reliability branch from 3361831 to 8221e6c Compare April 28, 2026 03:11
@github-actions github-actions Bot disabled auto-merge April 28, 2026 03:13
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 28, 2026

Triage decision: needs-human

Reasons:

  • Modified infra path: .github/workflows/ci.yml (128 additions, 37 deletions)
  • CI workflow changes require human review per triage policy due to blast radius

Next: Human review required for infrastructure changes. The PR has been assigned to @buremba for manual review and merge.

codex-approver[bot]
codex-approver Bot previously approved these changes Apr 28, 2026
Copy link
Copy Markdown

@codex-approver codex-approver Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-approved: Codex left a 👍 reaction (no suggestions).

@buremba buremba merged commit 834c3b8 into main Apr 28, 2026
16 checks passed
@buremba buremba deleted the worktree-ci-test-suite-reliability branch April 28, 2026 03:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skip-size-check Bypass PR size gate for intentionally large single-concern changes triage:needs-human Triage agent escalated for human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants