Skip to content

Add Claude Code GitHub Workflow#3

Merged
buremba merged 2 commits into
mainfrom
add-claude-github-actions-1753660931839
Jul 28, 2025
Merged

Add Claude Code GitHub Workflow#3
buremba merged 2 commits into
mainfrom
add-claude-github-actions-1753660931839

Conversation

@buremba
Copy link
Copy Markdown
Member

@buremba buremba commented Jul 28, 2025

🤖 Installing Claude Code GitHub App

This PR adds a GitHub Actions workflow that enables Claude Code integration in our repository.

What is Claude Code?

Claude Code is an AI coding agent that can help with:

  • Bug fixes and improvements
  • Documentation updates
  • Implementing new features
  • Code reviews and suggestions
  • Writing tests
  • And more!

How it works

Once this PR is merged, we'll be able to interact with Claude by mentioning @claude in a pull request or issue comment.
Once the workflow is triggered, Claude will analyze the comment and surrounding context, and execute on the request in a GitHub action.

Important Notes

  • This workflow won't take effect until this PR is merged
  • @claude mentions won't work until after the merge is complete
  • The workflow runs automatically whenever Claude is mentioned in PR or issue comments
  • Claude gets access to the entire PR or issue context including files, diffs, and previous comments

Security

  • Our Anthropic API key is securely stored as a GitHub Actions secret
  • Only users with write access to the repository can trigger the workflow
  • All Claude runs are stored in the GitHub Actions run history
  • Claude's default tools are limited to reading/writing files and interacting with our repo by creating comments, branches, and commits.
  • We can add more allowed tools by adding them to the workflow file like:
allowed_tools: Bash(npm install),Bash(npm run build),Bash(npm run lint),Bash(npm run test)

There's more information in the Claude Code action repo.

After merging this PR, let's try mentioning @claude in a comment on any PR to get started!

@buremba buremba merged commit 23e733d into main Jul 28, 2025
3 of 14 checks passed
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Jul 28, 2025

Claude finished @buremba's task —— View job


PR Review: Add Claude Code GitHub Workflow

I'll review this pull request for code quality, best practices, potential bugs, performance considerations, security concerns, and test coverage.

Review Tasks:

  • Read and analyze the new workflow file .github/workflows/claude-code-review.yml
  • Read and analyze the modified workflow file .github/workflows/claude.yml
  • Review code quality and best practices
  • Check for potential bugs or issues
  • Assess performance considerations
  • Evaluate security concerns
  • Review test coverage
  • Provide comprehensive feedback

Code Review Summary

This PR adds two GitHub Actions workflows for Claude Code integration:

  1. New file: .github/workflows/claude-code-review.yml - Automated PR reviews
  2. Modified: .github/workflows/claude.yml - Manual @claude triggers

Overall Assessment: GOOD

The implementation follows GitHub Actions best practices and provides good security controls. Here's my detailed analysis:

🔍 Code Quality & Best Practices

Strengths:

  • Clear, descriptive workflow names and job names
  • Well-structured YAML with proper indentation
  • Comprehensive commented examples for customization
  • Uses official GitHub actions (actions/checkout@v4)
  • Proper semantic versioning for action (@beta)

Suggestions:

  • .github/workflows/claude-code-review.yml:32 - Consider using fetch-depth: 0 for full history if Claude needs access to complete git history for complex reviews
  • Both files use @beta version - consider pinning to specific release version for production stability

🐛 Potential Bugs/Issues

Minor Issues Found:

  • .github/workflows/claude.yml:26 - The actions: read permission comment mentions "Required for Claude to read CI results on PRs" but this permission might not be needed if Claude doesn't actually read CI results
  • No validation of whether secrets.CLAUDE_CODE_OAUTH_TOKEN exists before workflow execution

Recommendations:

  • Add conditional checks for required secrets
  • Consider adding error handling for failed Claude actions

Performance Considerations

Good:

  • fetch-depth: 1 minimizes checkout time (claude-code-review.yml:32)
  • Workflows only trigger on relevant events
  • Optional path filtering available (claude-code-review.yml:7-11)

Optimization Opportunities:

  • .github/workflows/claude-code-review.yml:4-5 - Consider adding paths filter to only run on code changes, not docs/markdown
  • Could add job concurrency controls to prevent multiple Claude runs on rapid pushes

🔒 Security Concerns

Security Strengths:

  • ✅ Proper permission scoping using permissions block
  • ✅ Uses OIDC token (id-token: write) for secure authentication
  • ✅ Secret management through GitHub Secrets
  • ✅ Read-only permissions for repository content

Security Considerations:

  • .github/workflows/claude.yml:16-19 - The trigger condition allows any user to mention @claude. Consider adding user/organization allowlists for sensitive repos
  • .github/workflows/claude-code-review.yml - Automated reviews run on all PRs by default, which could be expensive. Consider author filtering for public repos

Recommended Security Enhancements:

# Add to claude-code-review.yml for user filtering
if: |
  github.event.pull_request.author_association == 'MEMBER' ||
  github.event.pull_request.author_association == 'COLLABORATOR'

🧪 Test Coverage

Gap Identified:

  • No test workflows or validation for the GitHub Actions themselves
  • Missing integration tests to verify Claude responses

Suggestions:

  • Add workflow validation using actionlint or similar tools
  • Consider adding a test job that validates the workflow syntax
  • Test the integration with a sample PR/issue

📋 Additional Recommendations

  1. Documentation: The inline comments are excellent, but consider adding a README section about the workflow setup

  2. Monitoring: Add workflow status badges or notifications for failed Claude runs

  3. Rate Limiting: Consider adding workflow concurrency limits to prevent API abuse:

    concurrency:
      group: ${{ github.workflow }}-${{ github.ref }}
      cancel-in-progress: true
  4. Error Recovery: Add retry logic for transient failures

🎯 Specific File Feedback

.github/workflows/claude-code-review.yml:

  • Line 44-52: Excellent prompt structure for automated reviews
  • Lines 67-69: Smart conditional prompts for different contributor types
  • Consider uncommenting and customizing lines 72-77 for your specific needs

.github/workflows/claude.yml:

  • Lines 15-19: Robust trigger conditions covering all comment types
  • Line 40-41: Good addition of additional_permissions for CI result access
  • Consider adding the optional allowed_tools (line 53) for enhanced functionality

Final Verdict

This is a well-implemented Claude Code integration that follows GitHub Actions best practices. The code quality is high, security is properly considered, and the structure is maintainable. Main recommendations are around adding user filtering for security and performance optimization through path/author filtering.

Ready for production with minor security considerations addressed.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Jul 28, 2025

Claude encountered an error —— View job


I'll analyze this and get back to you.

@buremba buremba deleted the add-claude-github-actions-1753660931839 branch February 25, 2026 22:02
buremba added a commit that referenced this pull request Apr 26, 2026
Pi flagged two BLOCKERS in the previous round:

1. **Connection metadata leak.** `formatEntityResult` calls
   `fetchConnectionsForEntity(primaryEntity.id)` with no caller-org
   scope. For a public-catalog entity referenced by multiple tenants,
   any tenant searching that entity would receive other tenants'
   connection display names, configs, and feed entity names. Now
   skipped when the primary entity is in a different org from the
   caller — connections are tenant operational data, never canonical.

2. **Cross-tenant stat side channel.** Count subqueries in the SELECT
   (content_count, connection_count, watcher_count, children_count)
   computed globally for the entity id; for public-catalog entities
   referenced from many tenants, this leaks aggregate activity volumes.
   Now gated `CASE WHEN e.organization_id = $callerOrg THEN ... ELSE 0
   END` for each count, so cross-org rows return zeros for operational
   stats. Children query also scoped to primary's own org.

Also addressing IMPORTANT #3: tenant-local results were getting pushed
out by high-scoring public matches. ORDER BY now `(e.organization_id =
$caller) DESC, match_score DESC` so caller-org wins ties.
buremba added a commit that referenced this pull request Apr 26, 2026
…377)

* feat(world-model): cross-org relationship_types + catalog discovery in search

Closes the two BLOCKER gaps pi flagged after #374:

1. **Schema search path for entity_relationship_types** (`tools/admin/manage_entity.ts::handleLink`).
   Mirrors what #374 did for entity_types: tenant first, then any
   `visibility='public'` org. Tenant-local relationship types still win.
   Without this, even though entities can use public-catalog vocabulary,
   relationships couldn't — e.g. a tenant relating their `\$member` to a
   canonical Apple Inc would have to register a local copy of `works_at`.

2. **Public-catalog discovery in `tools/search.ts`**. Adds an
   `include_public_catalogs` arg (defaults to true) so tenant agents can
   find canonical entities (HMRC, banks, currencies, …) by name/type
   without knowing entity ids upfront. Result rows already carry
   `organization_id`, so the agent can tell tenant-local from canonical
   hits. `fetchEntityById` widens the same way so an entity_id lookup
   following a search hit resolves cleanly.

No DB migration. Tests:
- `tools/__tests__/search-cross-org.test.ts` (3): public+tenant in one
  call; flag=false hides public; private orgs not snooped
- `entity-relationships.test.ts`: tenant uses a `works-at-public`
  relationship_type defined in a public catalog org

* fix(cross-org-fixes): close privacy leaks in cross-org search

Pi flagged two BLOCKERS in the previous round:

1. **Connection metadata leak.** `formatEntityResult` calls
   `fetchConnectionsForEntity(primaryEntity.id)` with no caller-org
   scope. For a public-catalog entity referenced by multiple tenants,
   any tenant searching that entity would receive other tenants'
   connection display names, configs, and feed entity names. Now
   skipped when the primary entity is in a different org from the
   caller — connections are tenant operational data, never canonical.

2. **Cross-tenant stat side channel.** Count subqueries in the SELECT
   (content_count, connection_count, watcher_count, children_count)
   computed globally for the entity id; for public-catalog entities
   referenced from many tenants, this leaks aggregate activity volumes.
   Now gated `CASE WHEN e.organization_id = $callerOrg THEN ... ELSE 0
   END` for each count, so cross-org rows return zeros for operational
   stats. Children query also scoped to primary's own org.

Also addressing IMPORTANT #3: tenant-local results were getting pushed
out by high-scoring public matches. ORDER BY now `(e.organization_id =
$caller) DESC, match_score DESC` so caller-org wins ties.

* fix(cross-org-fixes): zero out children content_count for cross-org primaries

Pi follow-up: maintains the 'operational counts are zero for cross-org'
invariant consistently — children of a public-catalog primary now show
content_count=0 to match the primary's own zeroed stats.
buremba added a commit that referenced this pull request Apr 27, 2026
* feat(world-model): cross-org schema CRUD + read-side tolerance

Closes the tenant-facing surface that consumes the agent-side cross-org
plumbing landed in #374/#377. Items #1, #4 from
docs/plans/world-model.md "Outstanding work"; #3, #5 collapse to doc-only.

- manage_entity_schema list/get widen to (caller_org OR visibility=public)
  with tenant-first ORDER BY; rows now carry organization_slug. Same
  pattern used in entity-management.ts:249-260 resolver.
- resolve_path widens both intermediate and leaf entity lookups so a
  tenant path can traverse into a public-catalog entity referenced via
  a cross-org relationship.
- getEntity widens the read; comment already promised "own org or public".
- Re-key entity_count helpers from slug to entity_type_id so cross-org
  slug collisions don't merge counts across rows.
- Item #3 noted as already shipped (organization-dropdown.tsx already
  splits Your Organizations / Public Organizations with a separator).
- Item #5 deferred — no exposed updateOrganization mutation today; the
  guard SQL is preserved inline for the future implementer.

* docs(world-model): item #6 first-pass changelog

Pruned classification-test-brand (id=45) from market-intelligence.
Held back the $member rows (real membership, not cruft) and the
template-seed verticals (need user call before pruning whole orgs).

* fix(world-model): gate operational counts + $member ACL after cross-org widening

Pi review of #386 flagged three real regressions introduced by the
cross-org read widening. Fixes:

- getEntity: scope total_content / active_connections / watchers_count /
  children_count by caller org. When `e` is a public-catalog row, totals
  now reflect the caller's references to it, never aggregate cross-tenant
  activity.
- resolve_path leaf: same scoping for total_content (events) and
  watchers_count.
- Exclude $member from public-catalog fallback in getEntity, resolve_path
  intermediate, and resolve_path leaf. Member-redaction uses
  ctx.memberRole (caller's workspace role), so a tenant admin/owner could
  otherwise read a public catalog's $member email by virtue of being
  admin of their own org. $member rows are per-tenant by design.
- rtHandleList relationship_count: scope by caller's organization so
  public relationship-type rows don't expose global usage volume.

Pre-existing concerns flagged in review but out of scope for this PR
(documented for follow-up): resolve_path bootstrap entity-type counts
(unscoped + missing deleted_at), schema get's slug ambiguity across
multiple public catalogs, requireRelationshipType denying list_rules on
public RTs.
buremba added a commit that referenced this pull request May 15, 2026
… flow (#746)

End-to-end testing PR #740 against prod surfaced three blockers that each
silently undermined the new provider-key path. All three are tiny:

1. **`lobu apply` skipped provider-key push when settings were noop.**
   The `setProviderApiKey()` call was nested inside the `rowsByKind("settings")`
   loop, so on idempotent re-runs (settings unchanged) the key push never fired.
   First-time apply against a target where the agent already exists never
   wrote `agent_secrets`. Moved the push out of the settings loop into its
   own walk over `ctx.state.agents` — the secret value isn't part of the
   settings JSON, so it can need updating even when settings are noop.

2. **`lobu chat` ignored the active context unless `--context` was passed.**
   `chatCommand` only resolved the context when `options.context` was
   explicitly set; otherwise it fell straight through to `resolveGatewayUrl`
   (which defaults to `http://localhost:8787`). Anyone with `lobu context use
   <remote>` set silently sent every chat to a non-existent local gateway —
   manifesting as `Error: fetch failed`. Now resolves the *active* context
   first; the localhost default kicks in only when no context exists.

3. **Worker-side org lookup returned null because AsyncLocalStorage isn't set.**
   `readOrgSharedProviderKey()` used `tryGetOrgId()` to scope the agent_secrets
   read. The worker-spawn code path that calls `buildEnvVars()` runs outside
   the org-routing middleware's ALS scope, so the helper always returned null
   and the org-shared key was never injected — the agent kept getting "needs
   to connect <provider>" even with the row in place. Switched to deriving
   org from the `agentId` parameter via a join through the `agents` table.
   Globally-unique agent_id makes this correct today; the per-org PK swap
   (Phase B/C) won't break it because that migration adds a composite key
   without removing global uniqueness.

Tested:
- `bun test packages/cli/src/commands/_lib/apply` — 115 pass.
- `bun run typecheck` clean.
- Manual end-to-end verification against prod after deploy:
  - `lobu apply` shows `↻ provider-key food-ordering/z-ai`.
  - DB has the `provider:z-ai:apiKey` row in `lobu-team` org.
  - `lobu chat -a food-ordering "ping"` reaches the right gateway URL
    (was hitting localhost before fix #2).
  - With fix #3 deployed, the worker injects the key and the agent answers.
buremba added a commit that referenced this pull request May 16, 2026
The previous fixup commit updated db/schema.sql to match the reduced
4-index scope but missed committing the migration file itself, so HEAD
still had the original 8-index DROP statements. CI's dbmate ran the
8-drop migration but schema.sql only removed 4 → drift fail.

This commits the matching migration content.
buremba added a commit that referenced this pull request May 16, 2026
* perf: drop 8 unused indexes (5.16 GB) + event_count from list

Three changes bundled around the post-incident perf brainstorm, each
verified against prod:

* db/migrations: drops 8 indexes that pg_stat_user_indexes reports
  idx_scan=0 after 28h of prod uptime — `idx_events_embedding` (4.3 GB
  ivfflat ANN), `idx_events_raw_content_trgm` (554 MB trigram GIN),
  `idx_events_search_tsv` (228 MB fulltext GIN), and 5 smaller ones.
  pg_stat_statements over the same window confirms zero queries touch
  the shapes they serve (<->, <=>, payload_text ILIKE, @@ to_tsquery).
  Search code paths in content-search.ts exist but aren't called in
  prod today; rebuild CONCURRENTLY if/when needed.

  Migration uses plain DROP INDEX rather than CONCURRENTLY because
  dbmate's transaction:false directive doesn't actually exit the
  transaction block against the pq driver (see comment in
  20260426130001_db_integrity_cleanup_concurrent.sql). The operator
  runbook in docs/MIGRATIONS.md "When dbmate fails in prod" covers
  applying CONCURRENTLY out-of-band first, then recording the
  schema_migrations row.

* manage_connections.ts handleList: removes the per-row event_count
  subselect. The supersedes anti-join through current_event_records
  was the entire cost of that query — verified by EXPLAIN ANALYZE
  against prod, 1303ms → 2.3ms (566x). handleGet (single connection
  detail) still computes it; that path is one row and costs ~1.2ms.
  Submodule bump in packages/web pulls in the matching frontend
  changes (owletto-web#136).

* docs/MIGRATIONS.md: appends a Lobu-specific policy paragraph to the
  cascade section — connections are soft-deleted only in prod. The
  cascade UPDATE on events.connection_id is ~13s per call at current
  scale and is the issue at rank #8 in pg_stat_statements. The
  connection-creation rollback path that hard-deletes never-activated
  rows (no events yet, no cascade) is the only acceptable use.

* fix(perf): correct schema.sql drift + bump web submodule

CI's pg_dump produces plain NOT NULL form (vs PG18's named CONSTRAINTs).
The earlier commit's schema.sql had the CONSTRAINT form leaked in from
local regen, failing the schema-drift check. Restored from origin/main +
re-applied only the intended 4-index removals.

Web submodule bump: the earlier pin (perf/connections-detail-event-count-
fallback branch tip) wasn't reachable from web/main per the strict
reachability check. Re-point to current web/main HEAD.

* fix(perf): commit the reduced migration content (pi review #3+#4)

The previous fixup commit updated db/schema.sql to match the reduced
4-index scope but missed committing the migration file itself, so HEAD
still had the original 8-index DROP statements. CI's dbmate ran the
8-drop migration but schema.sql only removed 4 → drift fail.

This commits the matching migration content.
buremba added a commit that referenced this pull request May 16, 2026
Six findings, all addressed:

* **#1 preStop on Recreate makes the gap WORSE.** Pi caught the
  reversal — under `strategy: Recreate` the new pod doesn't start
  until the old one fully terminates, so adding a preStop sleep
  EXTENDS the no-available-server window rather than shrinking it.
  Default `preStopDelaySeconds` to 0 and only emit the lifecycle
  hook when > 0. Document that ops repos should set it only with
  RollingUpdate (which needs RWX storage on workspaces — out of
  scope here).
* **#2 migration too broad.** Tightened WHERE to require
  `feeds.status='active'` and `connections.status='active'` so we
  match exactly the set CheckDueFeeds would process. Paused /
  pending_auth / error feeds are left alone — they don't
  contribute to the error spam and may be intentionally recoverable.
* **#3 Sentry double-report.** `server.ts onError` already calls
  `Sentry.captureException` then `logger.error`; the new logger
  forwarder would have sent the same event again. Added
  `sentryReported: true` marker on that log line; logger transport
  skips forwarding when it sees the marker.
* **#4 dedupe fingerprint too coarse.** Added `err.message` to the
  fingerprint so distinct messages from the same catch site (same
  Error type, same top stack frame) get distinct fingerprints
  within the 60s window.
* **#5 existing string-error call sites are pre-existing tech debt**
  — left as-is for now. Documented in the PR body as a follow-up;
  the new forwarder handles `err` / `error` Error-object payloads
  well, and call sites with stringified errors are a separate
  cleanup pass.
* **#6 warn-skip leaves future orphans being polled forever.** Now
  `createSyncRunWithClient` soft-deletes the feed in-place when no
  active connector_definition is found, so CheckDueFeeds stops
  selecting it. Operators recover by clearing `deleted_at` after
  reinstalling the definition.
buremba added a commit that referenced this pull request May 16, 2026
* fix: close monitoring + deploy gaps from post-incident audit

Three gaps identified by the audit after #771/#772 landed (1914
errors/5min in stdout, zero Sentry issues, single-replica Recreate
deploy strategy = ~30s "no available server" window per rollout):

* **Sentry forwarding from pino** (utils/logger.ts). The existing
  Sentry capture middleware only fires on HTTP 500 responses;
  `logger.error()` from background jobs (CheckDueFeeds, runs queue,
  scheduled tasks) was invisible to monitoring. Now a pino destination
  inspects each log line and forwards `level >= error` to
  `Sentry.captureException` (with stack reconstruction) or
  `captureMessage`. In-process dedupe by (msg, err.type, top stack
  frame) with a 60s window prevents the 380/min repeating-error
  pattern from saturating Sentry — Sentry has its own grouping but
  every captureException still incurs an HTTP call.

* **Orphan-feed cleanup + graceful skip** (queue-helpers.ts +
  20260517020000_softdelete_orphan_feeds.sql). The 'website'
  connector was archived/uninstalled in some orgs but their feeds
  weren't soft-deleted. `createSyncRunWithClient` threw on every
  CheckDueFeeds tick when no active connector_definition existed for
  the (key, org) pair — produced the 380/min error stream that
  masked real issues. Code change: warn + skip instead of throw, so
  future orphans don't spam logs. Migration: one-shot soft-delete of
  feeds matching the orphan criteria (no pinned_version, active
  connection, no active connector_definition for the org).

* **PreStop drain + grace period** (charts/lobu/templates/
  deployment.yaml, values.yaml). Single-replica gateway with
  `strategy: Recreate` (workspaces PVC is RWO) means every deploy
  has a ~30s window where the old pod is gone but the new one isn't
  ready — Cloudflare returns "no available server" to live traffic
  during this window. The 2026-05-16 incident felt extreme because
  the pod was actually crash-looping; the same shape happens briefly
  on every healthy deploy. PreStop hook (15s sleep, configurable via
  app.preStopDelaySeconds) gives Service endpoint deregistration time
  to propagate before SIGTERM, so requests in-flight during the drain
  still hit a live process. terminationGracePeriodSeconds bumped to
  45 to cover preStop + graceful shutdown (~5s). Full elimination
  needs RWX storage for workspaces so we can roll instead of recreate
  — documented in the deployment.yaml strategy comment.

* fix(audit): address pi review on #775

Six findings, all addressed:

* **#1 preStop on Recreate makes the gap WORSE.** Pi caught the
  reversal — under `strategy: Recreate` the new pod doesn't start
  until the old one fully terminates, so adding a preStop sleep
  EXTENDS the no-available-server window rather than shrinking it.
  Default `preStopDelaySeconds` to 0 and only emit the lifecycle
  hook when > 0. Document that ops repos should set it only with
  RollingUpdate (which needs RWX storage on workspaces — out of
  scope here).
* **#2 migration too broad.** Tightened WHERE to require
  `feeds.status='active'` and `connections.status='active'` so we
  match exactly the set CheckDueFeeds would process. Paused /
  pending_auth / error feeds are left alone — they don't
  contribute to the error spam and may be intentionally recoverable.
* **#3 Sentry double-report.** `server.ts onError` already calls
  `Sentry.captureException` then `logger.error`; the new logger
  forwarder would have sent the same event again. Added
  `sentryReported: true` marker on that log line; logger transport
  skips forwarding when it sees the marker.
* **#4 dedupe fingerprint too coarse.** Added `err.message` to the
  fingerprint so distinct messages from the same catch site (same
  Error type, same top stack frame) get distinct fingerprints
  within the 60s window.
* **#5 existing string-error call sites are pre-existing tech debt**
  — left as-is for now. Documented in the PR body as a follow-up;
  the new forwarder handles `err` / `error` Error-object payloads
  well, and call sites with stringified errors are a separate
  cleanup pass.
* **#6 warn-skip leaves future orphans being polled forever.** Now
  `createSyncRunWithClient` soft-deletes the feed in-place when no
  active connector_definition is found, so CheckDueFeeds stops
  selecting it. Operators recover by clearing `deleted_at` after
  reinstalling the definition.

* chore: bump web submodule to current main for drift check
buremba added a commit that referenced this pull request May 16, 2026
Server (packages/server):
- HOST default of 0.0.0.0 conflicted with the no-auth bind guard. The Mac
  runner now explicitly sets HOST=127.0.0.1 in the spawn env (#1), and
  server.ts also gets the same loopback bind guard so an accidental
  LOBU_NO_AUTH=1 in production refuses to start instead of silently
  bypassing auth on the public bind (#5).

- ensureBootstrapPat now runs BEFORE httpServer.listen so the very first
  request can't 503 due to the seeding race (#3). The early-return now
  trusts the DB row, not the bootstrap-pat.txt file: a wiped LOBU_DATA_DIR
  with a leftover file used to leave no-auth permanently 503; now we
  re-mint the user/org/PAT (#4). Production safety guard still skips when
  OTHER (non-bootstrap) users exist.

- getNoAuthUser pins to the BOOTSTRAP_USER_ID + BOOTSTRAP_ORG_ID pair
  directly, not "first owner/admin membership LIMIT 1" — eliminates the
  nondeterminism if bootstrap-user ever gets cross-org memberships (#7).

- isLoopbackHost moved to packages/server/src/utils/loopback.ts so both
  start-local and server share it. Handles the full IPv4 loopback /8,
  ::1, [::1], and IPv4-mapped IPv6 loopback (::ffff:127.x.y.z) (#8).

- New CSRF middleware in index.ts. Only fires when LOBU_NO_AUTH=1. On
  mutating methods (POST/PUT/PATCH/DELETE) requires:
    * Host header is a loopback alias (defeats DNS rebinding).
    * Origin or Sec-Fetch-Site says same-origin/none, OR a custom
      X-Lobu-Client header is present (native clients omit Origin).
    * Content-Type, if set, must be application/json — defeats CSRF
      simple-request form posts that browsers allow without preflight.
  This is the only protection between the no-auth bypass and any
  malicious site the user visits in their browser, so it MUST be on
  whenever no-auth is on (#6).

Mac app (apps/mac/Lobu):
- LocalLobuRunner: pass HOST=127.0.0.1 alongside LOBU_NO_AUTH=1, and
  restore spawnedThisSession tracking so adoptLocalCredentials can refuse
  adoption when start() adopted a pre-existing server instead of
  spawning one. A malicious squatter or someone else's lobu run would
  otherwise receive our synthesised credentials (#2).
- WorkerClient sends X-Lobu-Client: menubar on every request so the new
  CSRF middleware accepts native client traffic that legitimately omits
  Origin (#6).
buremba added a commit that referenced this pull request May 16, 2026
…779)

* feat(server,mac): no-auth mode for embedded server (LOBU_NO_AUTH=1)

Replaces the closed PR #777 ("lift the bootstrap PAT") with the cleaner
Phase B from docs/plans/personal-mode-auth.md: server short-circuits
auth when LOBU_NO_AUTH=1, attributes every request to the local user
ensureBootstrapPat() seeded. The macOS menu bar spawns the runner with
that env set, then sets credentials directly — no PAT to read, no
verification call, no ownership tracking.

Server (packages/server):
- multi-tenant.ts: getNoAuthUser() loads the bootstrap-user + their
  personal org once and caches; resolveAuth() short-circuits with
  owner-role attribution when LOBU_NO_AUTH=1. URL-supplied org slug
  must match the local user's org (single-org by definition).
- start-local.ts: post-listen bind assertion refuses to serve on
  anything other than 127.0.0.1 / ::1 when LOBU_NO_AUTH=1. Surfaces a
  hard error early instead of silently exposing the local user's data.

Mac app (apps/mac/Lobu):
- LocalLobuRunner sets LOBU_NO_AUTH=1 in the spawn env alongside
  LOBU_DATA_DIR.
- AppState.connect() — when targeting the managed runner, calls
  adoptLocalCredentials() with synthesised OAuthCredentials (dummy
  bearer; server ignores it). No PAT file, no userinfo verification,
  no spawnedThisSession check — none of that is needed when the
  server itself bypasses auth.
- MenuBarContent button reads "Start" / "Connect" for managed-runner
  URLs.

User experience: click Start once. Server spawns with no-auth env,
popover transitions to signed-in within ~1 s. No browser, no code, no
approval. The dummy bearer the menu bar sends is never validated; it
exists only so the existing WorkerClient/Authorization scaffolding
doesn't have to learn a "no header" mode.

Defers (still in docs/plans/personal-mode-auth.md):
- CSRF middleware on mutating routes — browser-tab exfiltration risk
  remains until we ship Origin / Sec-Fetch-Site / Host / Content-Type
  checks. Today no-auth mode trusts that the loopback bind is the
  only attack surface.
- Per-user data dir / port for shared macOS user accounts.

* fix(no-auth): address all 8 pi blockers on PR #779

Server (packages/server):
- HOST default of 0.0.0.0 conflicted with the no-auth bind guard. The Mac
  runner now explicitly sets HOST=127.0.0.1 in the spawn env (#1), and
  server.ts also gets the same loopback bind guard so an accidental
  LOBU_NO_AUTH=1 in production refuses to start instead of silently
  bypassing auth on the public bind (#5).

- ensureBootstrapPat now runs BEFORE httpServer.listen so the very first
  request can't 503 due to the seeding race (#3). The early-return now
  trusts the DB row, not the bootstrap-pat.txt file: a wiped LOBU_DATA_DIR
  with a leftover file used to leave no-auth permanently 503; now we
  re-mint the user/org/PAT (#4). Production safety guard still skips when
  OTHER (non-bootstrap) users exist.

- getNoAuthUser pins to the BOOTSTRAP_USER_ID + BOOTSTRAP_ORG_ID pair
  directly, not "first owner/admin membership LIMIT 1" — eliminates the
  nondeterminism if bootstrap-user ever gets cross-org memberships (#7).

- isLoopbackHost moved to packages/server/src/utils/loopback.ts so both
  start-local and server share it. Handles the full IPv4 loopback /8,
  ::1, [::1], and IPv4-mapped IPv6 loopback (::ffff:127.x.y.z) (#8).

- New CSRF middleware in index.ts. Only fires when LOBU_NO_AUTH=1. On
  mutating methods (POST/PUT/PATCH/DELETE) requires:
    * Host header is a loopback alias (defeats DNS rebinding).
    * Origin or Sec-Fetch-Site says same-origin/none, OR a custom
      X-Lobu-Client header is present (native clients omit Origin).
    * Content-Type, if set, must be application/json — defeats CSRF
      simple-request form posts that browsers allow without preflight.
  This is the only protection between the no-auth bypass and any
  malicious site the user visits in their browser, so it MUST be on
  whenever no-auth is on (#6).

Mac app (apps/mac/Lobu):
- LocalLobuRunner: pass HOST=127.0.0.1 alongside LOBU_NO_AUTH=1, and
  restore spawnedThisSession tracking so adoptLocalCredentials can refuse
  adoption when start() adopted a pre-existing server instead of
  spawning one. A malicious squatter or someone else's lobu run would
  otherwise receive our synthesised credentials (#2).
- WorkerClient sends X-Lobu-Client: menubar on every request so the new
  CSRF middleware accepts native client traffic that legitimately omits
  Origin (#6).

* fix(no-auth): close pi round-2 gaps — startup-leak, partial-state, CSRF holes

Pi's verification of the previous fixup commit caught three remaining
issues:

1. AppState's init re-spawns the runner and starts polling using the
   persisted no-auth credentials WITHOUT checking spawnedThisSession.
   If a squatter happened to win :8787 on startup, the polling client
   would send our synthesised "noauth" bearer + X-Lobu-Client to it.
   Now: after startLocalLobu in init, if we didn't actually spawn the
   process, clear the persisted creds and stop. The user will see the
   connection card again and can sign in via OAuth.

2. ensureBootstrapPat checked only the user row's existence before the
   early-return. Partial state (user exists, org or member rows missing)
   would still wedge getNoAuthUser forever. Now we check all three rows
   together — any missing one triggers a re-mint.

3. CSRF middleware: tightened in three ways.
   - Missing Content-Type on a mutation is now rejected (was previously
     a bypass — `if (ct && ...)` skipped when ct was empty).
   - WorkerClient.markNotificationRead now sends Content-Type:
     application/json even with an empty body to satisfy the tightened
     check.
   - OAuthClient.postRawJSON and ChromeBridgeHost.mintChildToken now
     send X-Lobu-Client: menubar so they aren't rejected for missing
     Origin in no-auth mode.
   - Host header validation reuses the shared isLoopbackHost util
     (stripping port + brackets first) so the alias set is consistent
     with the bind-time enforcement.
buremba added a commit that referenced this pull request May 17, 2026
5 issues from pi's review of #814's WatcherDispatcher / completeWatcherRun
that all materialise on real device traffic:

1. (BLOCKER) Schedule advancement mismatch — completing a device watcher
   moved `last_fired_at` forward but never bumped `next_run_at`, so the
   scheduler tick re-materialised the same watcher every minute forever.
   Extract `advanceWatcherSchedule(sql|tx, watcherId)` from automation.ts
   (was `advanceWatcherScheduleAfterTerminalFailure`) and call it from
   both manage_watchers(action="complete_window") and the device
   complete-watcher endpoint after the in-transaction completion writes.

2. Device-identity binding — `claimed_by === body.worker_id` alone is
   spoofable across devices that share a user OAuth token: any other
   device with the token could claim a run under any worker_id, then a
   third caller with the same token could complete it. For user-scoped
   workers we now resolve the caller's `device_workers.id` from
   `(workerUserId, body.worker_id)` and require it to equal
   `approved_input.device_worker_id` (which materializeDueWatcherRuns
   already snapshotted from the watcher pin). Mismatch → 403.

3. Completion race — two concurrent POSTs both passed the unlocked
   `authorizeRunForWorker` status read, both opened a tx, both INSERTed
   a watcher_windows row; the loser's run-UPDATE saw 0 rows and the tx
   committed a phantom window. Now lock `SELECT ... FOR UPDATE` on the
   run row at the top of the tx; if status is no longer 'running',
   return 200 idempotently with `{idempotent: true}`.

4. Window-id allocation — drop the inline `COALESCE(MAX(id), 0) + 1`
   in favour of the codebase's shared `getNextNumericId(tx, 'watcher_windows')`
   helper (whitelisted, same pattern used by manage_watchers).

5. Malformed payload no longer stranded the run — validation that
   throws inside the tx rolls back to `running`, and there's no
   stale-run sweep today. Validate `approved_input.window_start` /
   `window_end` BEFORE the transaction; on bad input mark the run
   `failed` (so next_run_at advances normally) and return 400.

Tests: three integration tests covering #1, #3, and #5. The user-scoped
spoof test (#2) needs OAuth device-flow setup that isn't wired up in
the current test fixtures — leaving it for a follow-up that lands the
helper alongside other user-scoped worker tests.
buremba added a commit that referenced this pull request May 17, 2026
* feat(watchers): device-pinned watcher runs end-to-end (#798 PR-1)

The schema work in #811 added watchers.device_worker_id + agent_kind +
notification + cooldown columns, and #808 made the server-side
dispatcher skip runs already pinned to a device. This PR wires the
pinning + claim + completion sides so a user's Lobu Mac app can
actually execute the run via a local CLI agent (Claude Code, etc.).

Server-side:

  1. `materializeDueWatcherRuns` now reads `watchers.device_worker_id`
     and `watchers.agent_kind` and persists both into the run's
     `approved_input` JSONB. The existing #802 dispatcher exclusion
     keys off `approved_input->>'device_worker_id'`, so device-pinned
     rows stay `pending` on the server side.

  2. `/api/workers/poll` gains a parallel CTE branch for
     `run_type='watcher'` AND
     `approved_input->>'device_worker_id' = <this device's id>`. The
     poll response is short-circuited for watchers — no connector
     code, no credentials, no compiled_code lookup. It returns a
     watcher payload envelope `{ watcher, event, context }` that the
     device-side dispatcher uses to build a CLI prompt.

  3. New endpoint `POST /api/workers/me/runs/:runId/complete-watcher`:
     authorizes via the existing claim-ownership gate
     (`authorizeRunForWorker`), writes a `watcher_windows` row with
     `model_used='device-cli'` and `extracted_data.kind='device_cli_output'`,
     updates `runs.status` to completed/failed, advances
     `watchers.last_fired_at`, and emits a `watcher:updated` lifecycle
     event so dashboard metric_series picks up the run.

  4. The new path is whitelisted in the user-scoped worker route gate
     (was previously 403 for `/api/workers/me/...` outside
     auth-profiles/feeds).

Tests:

  - materializeDueWatcherRuns persists device_worker_id + agent_kind.
  - End-to-end complete-watcher → runs.completed, watcher_windows
    row with the CLI output, last_fired_at advanced.
  - Failure path: error supplied → runs.failed, no window row.
  - 409 for non-watcher run types, 404 for unknown run ids.

PR-2 (Owletto Mac app dispatcher + ClaudeCodeExecutor) consumes this
contract.

* fix(watchers): close pi review on device-side complete-watcher (#814)

5 issues from pi's review of #814's WatcherDispatcher / completeWatcherRun
that all materialise on real device traffic:

1. (BLOCKER) Schedule advancement mismatch — completing a device watcher
   moved `last_fired_at` forward but never bumped `next_run_at`, so the
   scheduler tick re-materialised the same watcher every minute forever.
   Extract `advanceWatcherSchedule(sql|tx, watcherId)` from automation.ts
   (was `advanceWatcherScheduleAfterTerminalFailure`) and call it from
   both manage_watchers(action="complete_window") and the device
   complete-watcher endpoint after the in-transaction completion writes.

2. Device-identity binding — `claimed_by === body.worker_id` alone is
   spoofable across devices that share a user OAuth token: any other
   device with the token could claim a run under any worker_id, then a
   third caller with the same token could complete it. For user-scoped
   workers we now resolve the caller's `device_workers.id` from
   `(workerUserId, body.worker_id)` and require it to equal
   `approved_input.device_worker_id` (which materializeDueWatcherRuns
   already snapshotted from the watcher pin). Mismatch → 403.

3. Completion race — two concurrent POSTs both passed the unlocked
   `authorizeRunForWorker` status read, both opened a tx, both INSERTed
   a watcher_windows row; the loser's run-UPDATE saw 0 rows and the tx
   committed a phantom window. Now lock `SELECT ... FOR UPDATE` on the
   run row at the top of the tx; if status is no longer 'running',
   return 200 idempotently with `{idempotent: true}`.

4. Window-id allocation — drop the inline `COALESCE(MAX(id), 0) + 1`
   in favour of the codebase's shared `getNextNumericId(tx, 'watcher_windows')`
   helper (whitelisted, same pattern used by manage_watchers).

5. Malformed payload no longer stranded the run — validation that
   throws inside the tx rolls back to `running`, and there's no
   stale-run sweep today. Validate `approved_input.window_start` /
   `window_end` BEFORE the transaction; on bad input mark the run
   `failed` (so next_run_at advances normally) and return 400.

Tests: three integration tests covering #1, #3, and #5. The user-scoped
spoof test (#2) needs OAuth device-flow setup that isn't wired up in
the current test fixtures — leaving it for a follow-up that lands the
helper alongside other user-scoped worker tests.

* fix(watchers): pi round-2 — bind workerId from auth, race-free id alloc, no double-advance

A) Bound-worker check now reads `c.var.mcpAuthInfo?.workerId` (PAT/OAuth
   token binding), not body.worker_id. Same-user attacker can't complete
   as another registered worker by lying in the payload.

B) `getNextNumericId` takes a per-table `pg_advisory_xact_lock`
   (`hashtext('<table>_id_alloc')`). Inside `completeWatcherRun`'s tx the
   lock is held until commit, so two concurrent completions on different
   watcher runs serialize on allocation instead of racing on MAX(id)+1.

C) Validation-failure path only advances the schedule when the
   `UPDATE … WHERE status='running'` actually matched a row (RETURNING-gated).
   Two concurrent malformed POSTs no longer double-tick `next_run_at`.

Tests:
- device spoof (Fix A): token bound to worker A, run pinned to worker B,
  body posts worker-B — asserts 403 and run stays running.
- concurrent allocation (Fix B): two completions on different watchers
  fired in parallel — both 200, distinct watcher_windows.id.
- double-advance (Fix C): two malformed POSTs against the same run —
  next_run_at advances only once.
buremba added a commit that referenced this pull request May 18, 2026
…entMessage TTL, persist-then-post

Addresses the four pi-review findings on PR #834.

[HIGH #1] Cross-tenant claim hole — keying by `id` alone let a leaked or
forged question id be consumed by any connection in any org. The claim
SQL now scopes by `(id, organization_id, connection_id, expected_user_id)`
and the migration adds those columns as NOT NULL. Cross-tenant and
cross-connection clicks return null without consuming the row.

[MED #2] Unbounded pendingSentMessages — the old 5-minute timer was
removed by mistake. A per-bridge `setInterval` now sweeps SentMessage
handles older than 24h (matching the DB row TTL) and also drops handles
for any row the scheduled DB sweep deleted. The sweep is unref'd and
cleared on bridge unregister.

[MED #3] Claim-then-auth race — folded into the SQL claim. The
`expected_user_id = $4` predicate means wrong-user clicks never set
`claimed_at`, so process death after the claim can no longer leave the
row in a half-consumed state. The old "claim → check author → async
restash" three-step (and `restashPendingQuestion` itself) is gone.

[LOW #4] Post-then-persist ordering — `onQuestionCreated` now persists
the row first, then posts the card. If the persist fails the card is
never shown; if the post fails after a successful persist the row is
dropped immediately rather than waiting for the 24h sweep.

Red→green coverage: the new test file asserts the scoped claim rejects
cross-tenant, cross-connection, and wrong-user clicks AND that the row
remains claimable by the rightful owner. These same assertions fail on
the pre-fix commit (verified locally against `claimPendingQuestion(id)`).
buremba added a commit that referenced this pull request May 18, 2026
…es (#834)

* refactor(server): move pending interactions to Postgres

Replaces the in-process `Map<questionId, PendingQuestionEntry>` in
`interaction-bridge.ts` with a `public.pending_interactions` table so a
button click landing on pod B can claim a question registered on pod A.

`claimPendingQuestion` is an atomic `UPDATE … SET claimed_at = now()
WHERE id = $1 AND claimed_at IS NULL RETURNING entry_payload` — Slack
webhook retries and racing clicks serialize on the row, the winner gets
the payload, losers see null and no-op.

The non-serializable platform `SentMessage` handle (used to edit the
original card after a click) stays in a small per-pod cache. Losing it
cross-pod only degrades card-edit UX; the answer still routes correctly.

A 24h stale-row sweep piggybacks on the existing scheduled
`sweep-ephemeral-tables` task via `sweepEphemeralTables` in
`core-services.ts`.

Migration is additive (one new table, two indexes). Single-pod behavior
is unchanged.

SSE fan-out (sse-manager.ts) is still per-pod and left for a follow-up
PR.

* fix(server): scope pending-interaction claims by org+conn+user, add SentMessage TTL, persist-then-post

Addresses the four pi-review findings on PR #834.

[HIGH #1] Cross-tenant claim hole — keying by `id` alone let a leaked or
forged question id be consumed by any connection in any org. The claim
SQL now scopes by `(id, organization_id, connection_id, expected_user_id)`
and the migration adds those columns as NOT NULL. Cross-tenant and
cross-connection clicks return null without consuming the row.

[MED #2] Unbounded pendingSentMessages — the old 5-minute timer was
removed by mistake. A per-bridge `setInterval` now sweeps SentMessage
handles older than 24h (matching the DB row TTL) and also drops handles
for any row the scheduled DB sweep deleted. The sweep is unref'd and
cleared on bridge unregister.

[MED #3] Claim-then-auth race — folded into the SQL claim. The
`expected_user_id = $4` predicate means wrong-user clicks never set
`claimed_at`, so process death after the claim can no longer leave the
row in a half-consumed state. The old "claim → check author → async
restash" three-step (and `restashPendingQuestion` itself) is gone.

[LOW #4] Post-then-persist ordering — `onQuestionCreated` now persists
the row first, then posts the card. If the persist fails the card is
never shown; if the post fails after a successful persist the row is
dropped immediately rather than waiting for the 24h sweep.

Red→green coverage: the new test file asserts the scoped claim rejects
cross-tenant, cross-connection, and wrong-user clicks AND that the row
remains claimable by the rightful owner. These same assertions fail on
the pre-fix commit (verified locally against `claimPendingQuestion(id)`).

* chore(schema): collapse double-blank-lines around pending_interactions to match dbmate dump

* fix(test): seed organizationId on makeConnection in connections-platform-isolation

The bridge's new per-tenant guard returns before resolveThread when
connection.organizationId is missing — three tests asserting
instanceChat.channel() got called fell to 0 calls. Fixture now sets
organizationId: "test-org" so the path reaches resolveThread; the
subsequent persist still fails on FK in CI (no org row), but is
caught by the bridge's persist-then-post catch block — which is what
the bridge does in production when persistence drops.
buremba added a commit that referenced this pull request May 19, 2026
…ing_otps

The prior addendum #2 draft proposed a new pairing_otps table + a new
GET /auth/pair route. Over-built. Lobu already has:

- GET /exchange-token (auth/routes.ts:292) — accepts a token, mints a
  session cookie via mintSessionCookieValue, redirects to ?next=URL
  with Referrer-Policy: no-referrer + relative-path-only enforcement.
- resolveDeepLinkToken (routes.ts:266) — accepts PATs or session tokens.
- PersonalAccessTokenService.create/verify/revoke (auth/tokens.ts) —
  full PAT lifecycle.
- personal_access_tokens already has expires_at + revoked_at.

Real delta is a one-column migration:

  ALTER TABLE personal_access_tokens
    ADD COLUMN single_use boolean NOT NULL DEFAULT false;

Plus ~20 LOC in PersonalAccessTokenService.verify to auto-revoke
single-use PATs on first successful verify, plus a tiny new
POST /auth/pair-token route that mints a fresh single-use PAT for
"Pair another device" in the SPA. Everything else reuses existing
plumbing.

Pairing URL becomes:
  /exchange-token?token=owl_pat_...&next=/auth/enrol-credential

No new table. No new auth primitive. Composes with v3 core
(ENCRYPTION_KEY as install operator password) and addendum #1
(passkey + macOS passcode unlock).
buremba added a commit that referenced this pull request May 19, 2026
- ensureInstallOperator now converges on every boot: if first-boot
  personal-org provisioning failed, later boots patch it instead of
  short-circuiting on the existing user row (pi finding #1).
- Preserve the legacy bootstrap-user (pre-PR #902) carve-out in every
  human-count predicate so upgraded installs don't treat it as a real
  human (pi finding #2).
- /api/local-init now fails closed when peerRemoteAddress is missing,
  guarded by LOBU_LOCAL_INIT_ALLOW_MISSING_PEER=1 for tests only (pi
  finding #3).
- Docs realigned to implementation: member-list / admin user-list
  carve-outs explicitly NOT in this PR (intentional, scoped via orgs);
  SPA password-manager mitigations deferred to a follow-up owletto PR;
  CLI flow corrected to use /api/local-init (which mints the worker
  PAT, unlike /api/auth/sign-in/email); hashPassword sourced from
  better-auth/crypto (the actual import) (pi findings #4, #5, #6).
buremba added a commit that referenced this pull request May 19, 2026
)

* feat(auth): install_operator bootstrap — unblock headless installs

Fresh lobu run boots with empty user table; CLI calls /api/local-init;
server says no_user_yet → no SPA available in /tmp / CI / containers.
ensureInstallOperator() auto-provisions a synthetic install_operator
user at first boot whose password is the install's ENCRYPTION_KEY.
A new principal_kind discriminator on the user table keeps the
operator out of human-discovery surfaces (signup count, password
reset, magic link, OAuth account-linking).

Supersedes #917 (closed): that PR went through 5+ design revisions
and accumulated machinery (pairing URL file, single_use PAT, POST
/auth/pair-token, /auth/enrol-credential SPA page, custom OTP table)
that codex review revealed was redundant with better-auth +
browser-native WebAuthn cross-device verification + existing
/api/local-init.

See docs/install-operator-bootstrap.md for the full design.

* docs(a1): explicit Chrome extension via Mac bridge + cross-platform fallback

* fix(auth): address pi review on #923

- ensureInstallOperator now converges on every boot: if first-boot
  personal-org provisioning failed, later boots patch it instead of
  short-circuiting on the existing user row (pi finding #1).
- Preserve the legacy bootstrap-user (pre-PR #902) carve-out in every
  human-count predicate so upgraded installs don't treat it as a real
  human (pi finding #2).
- /api/local-init now fails closed when peerRemoteAddress is missing,
  guarded by LOBU_LOCAL_INIT_ALLOW_MISSING_PEER=1 for tests only (pi
  finding #3).
- Docs realigned to implementation: member-list / admin user-list
  carve-outs explicitly NOT in this PR (intentional, scoped via orgs);
  SPA password-manager mitigations deferred to a follow-up owletto PR;
  CLI flow corrected to use /api/local-init (which mints the worker
  PAT, unlike /api/auth/sign-in/email); hashPassword sourced from
  better-auth/crypto (the actual import) (pi findings #4, #5, #6).

* refactor(auth): trim ceremony in install-operator.ts after audit

Audit found that `HUMAN_KIND`, `NOT_INSTALL_OPERATOR_PREDICATE`, and
`isInstallOperator` had zero production consumers — they were imported
only by the unit test that pinned them against themselves. The carve-out
SQL in `auth/index.tsx`, `auth/config.ts`, and `auth/routes.ts` all
hand-roll `principal_kind <> 'install_operator'` / `=== "install_operator"`
literals; the predicate constant never became a single source of truth.

`installOperatorEmail` was also only exported for the unit test; it's a
2-line helper used in one file, so de-export and shrink.

Drop the redundant `encryptionKey.length === 0` clause (empty string is
already falsy) and inline the `operatorEmail` temporary.

Delete the unit test file outright — every remaining subject is either a
plain string constant or the integration-tested `ensureInstallOperator`.
Coverage for the real behavior lives in the existing integration test
(`__tests__/integration/auth/install-operator.test.ts`), which still
passes.

Net: install-operator.ts 159 → 127 LOC; unit test 64 LOC removed; no
behavior change.

* fix(install_operator): validate ENCRYPTION_KEY shape at bootstrap

Before this fix, `ensureInstallOperator()` only checked that
`ENCRYPTION_KEY` was set, then handed it straight to `hashPassword()` —
which accepts any string. A 24-byte base64 or other non-canonical value
bootstrapped the operator fine, but the at-rest encryption path
(provider keys, secrets) requires a canonical 32-byte base64/hex key and
500s with "ENCRYPTION_KEY must be a canonical base64 or hex encoded
32-byte key" on every save. Net result: user could sign in but couldn't
persist any encrypted secret.

Now the install refuses to start with the same canonical error message
the runtime would emit, so the operator either signs in AND can save
secrets, or fails fast with an actionable hint
(`openssl rand -hex 32` / `openssl rand -base64 32`).

- `packages/core/src/utils/encryption.ts`: extract `decodeEncryptionKey`
  (pure, no side effects) and add `assertEncryptionKey` +
  `ENCRYPTION_KEY_FORMAT_ERROR` so upstream validators can reuse the
  exact same shape check the runtime uses.
- `packages/server/src/auth/install-operator.ts`: call
  `assertEncryptionKey` before `hashPassword`.
- Integration test: switch the existing fixture to a canonical hex key
  and add a new case that asserts a malformed key is rejected and no
  user row is written.

* fix(ci): add principal_kind to QUERYABLE_SCHEMA + reset owletto submodule

- table-schema: register the new user.principal_kind column added by the
  install-operator migration so the drift-detection test stops failing.
- owletto: reset the submodule pointer to the SHA that lobu/main carries
  (4f7c757), since this branch's pin (f611c1d) sits behind a real
  non-bot commit on owletto/main, tripping the Submodule Drift check.
buremba added a commit that referenced this pull request May 19, 2026
buremba added a commit that referenced this pull request May 19, 2026
…resh cache (codex round 2 #3)

Previously TarballFileSource.diffSinceRef() called fetch() before
requireMeta(), so a fresh-cache caller got a full re-ingest delta
(every current file as 'added') instead of the explicit
'source not fetched yet — call fetch() before diffSinceRef()' error
that LocalFileSource and GitFileSource both surface.

Pin the same contract here: check meta first; only then re-fetch
and diff. Adds a test asserting the throw on an uninitialised cache.
buremba added a commit that referenced this pull request May 19, 2026
buremba added a commit that referenced this pull request May 19, 2026
…resh cache (codex round 2 #3)

Previously TarballFileSource.diffSinceRef() called fetch() before
requireMeta(), so a fresh-cache caller got a full re-ingest delta
(every current file as 'added') instead of the explicit
'source not fetched yet — call fetch() before diffSinceRef()' error
that LocalFileSource and GitFileSource both surface.

Pin the same contract here: check meta first; only then re-fetch
and diff. Adds a test asserting the throw on an uninitialised cache.
buremba added a commit that referenced this pull request May 19, 2026
…ingestion (#933)

* feat(connector-sdk): FileSystemSource primitive for filesystem-shape ingestion

Reusable building block for connectors that ingest from a directory-tree-shaped
source: git repos, remote tarballs, and local pre-staged data. Connectors get
a Snapshot with walkFiles/readFile/readText (no rootDir exposure) and a
diffSinceRef() API keyed on an opaque ref the connector persists in its
checkpoint.

Three implementations:

  - GitFileSource: shallow clone via isomorphic-git (depth=1, singleBranch).
    Subsequent fetches do an incremental git.fetch + checkout. diffSinceRef
    walks two trees and classifies blobs by OID.
  - TarballFileSource: streaming download via native fetch, atomic extract
    via 'tar', manifest-based diff (sha256-per-file canonical hash).
  - LocalFileSource: no copy, snapshot points at the live path; manifest is
    still computed and persisted so diffs work across runs.

Cache layout is SDK-managed and opaque:

  ${WORKSPACE_DIR}/.lobu-cache/sources/<sha256(uri)[:32]>/
    ├── snapshot/       (files; rootDir hidden from connector)
    ├── manifest.json   (current ref + per-file sha256)
    ├── refs/<ref>.json (per-ref manifests for diffSinceRef)
    └── meta.json       (uri + kind; verified on cache reuse)

URI scheme handling:
  - git+https://owner/repo.git[@ref]  → GitFileSource (default ref: main)
  - https://...{.tar.gz,.tgz}         → TarballFileSource
  - file:///abs/path/                 → LocalFileSource
  - git+ssh://, ssh://                → rejected (operator-key auth out of scope)
  - git+http://, http://              → rejected (no plaintext clones/fetches)
  - s3://, gs://, azure://, etc.      → rejected with future-scheme message
  - https://...other-ext              → rejected (v1 supports .tar.gz/.tgz only)

Codex review (model_reasoning_effort=high) flagged five things; folded the
blockers into the implementation:

  1. Snapshot immutability (HIGH): mutable on-disk cache could let a Snapshot
     return future bytes for an old ref. Documented the contract clearly
     (consume snapshot before next fetch; ref pins for detection) and added a
     per-source mutex so concurrent fetch()es serialize.
  2. Cache hash collision-safety (MED): bumped sha256(uri) prefix from 16 to
     32 hex chars (64 → 128 bits). meta.json is read on reuse and refuses to
     touch a cache dir whose stored uri doesn't match the requested one.
  3. Manifest diff perf (MED): acknowledged whole-source-scan cost in v1.
     Per-ref manifest accumulation is a known limitation (no GC yet).
  4. URI rejections (LOW): dropped git+http:// from the supported list per
     the review — only git+https:// is accepted.
  5. Snapshot API: confirmed walkFiles/readFile/readText is the right
     abstraction; rootDir stays hidden.

Tests (56 across 5 files; bun:test):
  - URI parser: every accepted scheme + every rejected scheme + edge cases
  - Glob matcher: literal/star/double-star/question + regex-meta escapes
  - LocalFileSource: walk, read, security (refuse ../escape, refuse absolute),
    stable ref on no-op, diff for added/modified/removed, empty-ref fast
    path, unknown-prev-ref fallback
  - TarballFileSource: localhost http server fixture (no external network),
    swap-served-tarball mid-test to exercise diff, 404 handling, cache hit
  - GitFileSource: parseGitUri (including userinfo edge case), real
    isomorphic-git repo built in cache, diff across two commits, missing
    prev-ref error, prev===current empty diff
  - DirectorySnapshot path-escape protection unit tests

Test pass rate: 56/56. `bunx tsc --noEmit` clean in connector-sdk.

* fix(connector-sdk): address pi -p findings on FileSystemSource

Five issues raised in self-review (#933):

P1 — DirectorySnapshot.readFile() could be escaped via a symlink inside
the snapshot (lexical-only path check). Added realpath() resolution that
verifies the canonical target still falls under the canonical root.
Legitimate intra-snapshot symlinks still work; ones pointing outside
throw "resolves outside snapshot root via symlink".

P1 — GitFileSource exposed `.git/` to the connector because the snapshot
root was the clone dir. Added `exclude: isGitInternalPath` so walkFiles
skips `.git` and `.git/**` and readFile/readText refuse them with a
clear "excluded from snapshot" error.

P1 — LocalFileSource self-ingested `.lobu-cache/` when the source root
matched WORKSPACE_DIR — every fetch() mutated the ref and snapshots
leaked manifests. Skip `.lobu-cache` in collectFiles + DirectorySnapshot
exclude so the cache is invisible to the source it sits inside.

P2 — Glob `**/` required at least one directory segment, so `**/*.md`
silently skipped root-level `foo.md` and `docs/**/*.md` skipped
`docs/foo.md`. Now `**/` matches zero or more directory segments —
matches POSIX find / common glob semantics.

P3 — withSourceLock leaked one Map entry per distinct URI because the
finally identity-check compared two freshly-created `.catch(...)`
promises. Store the guarded promise once and compare against the stored
reference.

Added tests:
  - snapshot-security.test.ts: symlink-out-of-root rejected; in-root
    symlink allowed; LocalFileSource self-ingest stability.
  - git-file-source: walkFiles/readFile do not expose `.git/`.
  - glob: docs/**/*.md matches docs/foo.md AND docs/a/b/foo.md, **/*
    matches root-level files, **/*.md matches foo.md.

62/62 tests pass; typecheck clean.

* fix(connector-sdk): enforce HTTPS in Git/Tarball constructors (codex #2)

* fix(connector-sdk): scope .lobu-cache exclusion to actual cache location (codex #3)

* fix(connector-sdk): require fetched meta before diffSinceRef (codex #4)

* fix(connector-sdk): make Snapshot immutable across fetches (codex #1, #5)

- git: read blobs from the captured commit OID via new GitSnapshot, so a
  later fetch() that rewrites HEAD/working tree doesn't leak new bytes
  into older snapshots.
- tarball: install extracted files into a per-ref directory
  (refs/<ref>/), never overwriting; old snapshots stay readable, and a
  failed extraction can't leave a half-written canonical snapshot dir
  behind (folds in finding #5).
- local: materialise an immutable per-ref copy of the source via
  COPYFILE_FICLONE (reflink where supported, full copy otherwise) — a
  hardlink shares the inode and is NOT immutable when the source is
  truncate-rewritten in place.

* fix(connector-sdk): unify unknown-ref contract across all FileSystemSource impls (codex #6)

Git previously threw when prevRef wasn't reachable in the shallow clone;
tarball and local returned a full-reingest delta. Connectors got
divergent shapes for the same logical condition. Align on the local /
tarball behaviour: any unknown prevRef returns every current file in
'added', and the documented contract on FileSystemSource.diffSinceRef
spells that out explicitly.

* fix(connector-sdk): close LocalFileSource fetch race (codex round 2 #1)

Previously fetch() hashed live bytes then copied them, so a write
between hash and copy left snapshot.ref describing pre-write bytes
while snapshot.readFile returned post-write bytes — breaking the
immutability promise.

New pipeline: list paths → copy to staging → hash from staging → if
refs/<ref>/ exists drop staging, else rename staging → refs/<ref>/.
The bytes the Snapshot exposes are the same bytes the ref was hashed
over, regardless of source mutations during fetch().

Test pins this by firing a concurrent rewrite mid-fetch and asserting
sha256(snapshot.readFile bytes) reconstructs snapshot.ref exactly.

* fix(connector-sdk): reject http redirects in tarball + git fetch (codex round 2 #2)

Before: TarballFileSource called fetch(uri, { redirect: 'follow' }), and
GitFileSource used the stock isomorphic-git/http/node impl (simple-get
under the hood) which follows redirects unconditionally. An https
endpoint replying with 'Location: http://...' silently downgraded to
plaintext on either transport.

Tarball: switch to redirect: 'manual' + a 5-hop manual loop that
re-validates https:// on every redirect target.

Git: ship a custom HttpClient (sources/git-http.ts) built on
node:https that follows redirects manually with the same https-only
guard. Pass it to git.clone/fetch instead of the default impl.

Tests pin both: stand up an HTTPS server that 302s to a localhost
HTTP server and assert TarballFileSource / GitFileSource reject with
a plaintext-rejection error and never hit the http endpoint.

* fix(connector-sdk): tarball.diffSinceRef must throw before fetch on fresh cache (codex round 2 #3)

Previously TarballFileSource.diffSinceRef() called fetch() before
requireMeta(), so a fresh-cache caller got a full re-ingest delta
(every current file as 'added') instead of the explicit
'source not fetched yet — call fetch() before diffSinceRef()' error
that LocalFileSource and GitFileSource both surface.

Pin the same contract here: check meta first; only then re-fetch
and diff. Adds a test asserting the throw on an uninitialised cache.

* fix(connector-sdk): prune old per-ref cache dirs to keep N=3 most recent (codex round 2 #4)

Local + tarball sources never pruned refs/<hash>/ directories — a 1GB
tarball changing daily would grow 1GB/day forever. Both source impls
explicitly acknowledged this in code comments.

Add pruneOldRefDirs to TarballFileSource: after installing the new
ref dir, walk refs/, sort by mtime desc, rm-rf everything past the
3rd entry. Always preserve the just-installed dir even if its mtime
happens to be older than a cache-hit branch's. Per-ref manifest JSONs
(<ref>.json) are kept indefinitely so historical diffs still resolve.

LocalFileSource's prune logic + test landed alongside the race fix in
commit 0bfdd45 (it shared the new install pipeline). Here we add the
tarball-side prune + a test that performs 4 distinct-content fetches
and asserts at most 3 ref dirs remain.

GitFileSource is unaffected: it reads from the bare git store which
isomorphic-git's shallow fetch keeps small.

* test(connector-sdk): pin self-signed cert rejection with verification on (codex round 2 #5)

The existing tarball + git tests run with NODE_TLS_REJECT_UNAUTHORIZED='0'
so their self-signed-cert fixtures are reachable. That left a regression
gap: a future change that globally disables cert verification would not
have been caught by the suite.

Add a new test file (tls-verification.test.ts) that spawns a fresh
bun subprocess with NODE_TLS_REJECT_UNAUTHORIZED stripped from the
env, runs fetch() against the self-signed-cert HTTPS server, and
asserts both TarballFileSource and GitFileSource reject with a
cert-related error.

Subprocess isolation is needed because:
  1. Bun's test runner shares one process across files, so the env var
     set by sibling tests leaks across.
  2. Once TLS verification has been disabled in a process via that env
     var, deleting the var does NOT re-enable strict verification —
     the relaxed state appears latched at the TLS module level.

* fix(connector-sdk): seal LocalFileSource hashes during stream-copy + lock per-ref dir read-only (codex round 3 #1)

The previous fetch() did copy-all-then-hash-all. Between the per-file copy
and the per-file hash, an attacker who could discover the in-cache staging
dir could write to staging/a.txt — landing AFTER a.txt was copied but
BEFORE a.txt was hashed — making snap.ref record one byte sequence while
snapshot.readFile() returned a different one. Hash and bytes disagree.

Fix it in two layers:

(a) Staging now lives in os.tmpdir() with a 128-bit random name (via
    mkdtemp) and mode 0700, not under the cache root. The codex repro's
    watcher pattern of scanning refsDir/.staging.* can't even find the
    staging dir, and same-user peers have to defeat dir permissions.

(b) Each file is stream-piped from source through a sha256 hasher AND a
    write stream in a single pass. The manifest hash is SEALED on the
    bytes that landed in staging — there is no second read pass that
    could observe a post-copy mutation.

(c) After rename(staging → per-ref dir), recursively chmod the per-ref
    tree: files → 0400, dirs → 0500. Any post-rename mutation through
    the per-ref path now gets EACCES, pinning the bytes a Snapshot
    reads to the same bytes the hash was sealed over.

pruneOldRefDirs gains an unlock pass — rm -rf would EACCES on dirs in
mode 0500. Test cleanup helpers (forceRm) walk + chmod 0700/0600 before
rm for the same reason.

Test: codex round 3 #1 watcher repro is ported verbatim — 12x 8MB files
to widen the copy window, watcher races staging/a.txt mutation. The
post-fix invariants (snap.ref recomputed externally === snap.ref;
snap.readText('a.txt') === 'ORIGINAL') hold whether the attack window
opened or not.

* fix(connector-sdk): protect local cache-hit ref from pruning (codex round 3 #4)

When fetch() takes the cache-hit branch (refs/<ref> already exists), the
returned Snapshot points at an existing per-ref dir whose mtime hasn't
been touched. If that dir happens to be the oldest under refs/ and the
cache is at MAX_REF_DIRS, the mtime-sort prune deletes it — ENOENTing
the Snapshot we just handed back.

tarball-file-source.ts already solved this with a `protectedRefDir`
parameter (see lines 137 + 228-273). Wire the same plumbing through
LocalFileSource: pass the current refDir into pruneOldRefDirs and skip
it during the prune regardless of mtime ordering.

Test: create 4 distinct refs, force the first one's mtime to year 2000,
restore source content to match the first ref, fetch (cache-hit branch),
assert snap.readText still works — pre-fix it ENOENTs.

* fix(connector-sdk): keep staging inside cache root, fix EXDEV regression (codex round 4 #1)

Round 3 moved LocalFileSource staging to os.tmpdir()/lobu-stage-<rand>/,
then renamed into WORKSPACE_DIR/.lobu-cache/sources/<hash>/refs/<ref>/.
On Linux hosts where /tmp is a separate tmpfs mount from the workspace
dir, rename() throws EXDEV: cross-device link.

Move staging back inside the cache root at refs/<crypto-random-32hex>/.
Same filesystem as the destination per-ref dir, so rename() is atomic.
The crypto-random 32-hex name (no fixed prefix) means an external
readdir of refs/ can't distinguish in-flight staging from completed
per-ref dirs (64-hex sha256) without inspecting each entry.

Prune filters by name shape: only touch 64-hex directories. The
32-hex staging dirs are off-limits to the pruner.

* refactor(connector-sdk): drop chmod immutability theater, document honest contract (codex round 4 #2)

Round 3 chmod'd the per-ref cache dir to 0500 + files to 0400 after
rename, claiming it pinned snapshot.readFile bytes against post-fetch
mutation. Codex round 4 reproduced the flaw: chmod 0600 && writeFile
succeeds because the owner can re-mode their own files. Against the
threat model that matters (same worker UID, or hostile connector code
running in-process), the lock was theater.

Drop the chmod. The actual immutability mechanism is the
stream-seal-during-copy from round 3: each file's sha256 is computed
from the same stream that writes staging, so snap.ref and the bytes
snap.readFile returns at the moment fetch() resolved agree by
construction — even if the source file is rewritten mid-copy. That
fix stands on its own.

Removed:
- lockTreeReadOnly() / unlockTree() helpers
- chmod 0400 write mode in streamCopyAndHash
- forceRm test helper (existed only to undo chmod for rm)
- Test asserting per-ref dir write returns EACCES (assertion no
  longer holds; chmod was the mechanism)

Updated:
- Snapshot JSDoc in file-source.ts now states the honest same-UID
  mutability contract: cache is trusted to the worker, post-fetch
  same-UID mutation will be visible via readFile while ref stays
  pinned to fetch-time bytes
- Cross-process prune race documented as a v1 limitation in both
  file-source.ts and cache.ts withSourceLock
- Round-3 watcher repro test rewritten: it now exercises stream-seal
  by mutating the SOURCE files mid-fetch (the scenario stream-seal
  actually defends against) and still asserts externalHash ===
  snap.ref

* build(cli): declare isomorphic-git + tar runtime deps for FileSystemSource

The CLI bundles packages/server/dist/server.bundle.mjs which now imports
@lobu/connector-sdk → isomorphic-git + tar. dev.test.ts asserts every
non-@Lobu connector-sdk dep is also declared in @lobu/cli's deps so
`npx @lobu/cli run` resolves them from cli's node_modules.

* test(connector-sdk): suppress CodeQL alert on test-only TLS-verify disable

The self-signed TLS fixture sets NODE_TLS_REJECT_UNAUTHORIZED=0 around an
in-process HTTPS server. The production rejection posture is separately
pinned by tls-verification.test.ts which asserts the cert IS rejected with
verification on. Annotate with lgtm[] to silence the CodeQL false positive.
buremba added a commit that referenced this pull request May 19, 2026
Round-2 codex review of PR #940 surfaced three defects in the PAT bridge
added by 10bb63b (`fix(auth): accept owl_pat_ PATs in embedded Agent API
auth bridge`). All three live in the same middleware closure, so fix them
together in one extracted `createLobuAuthBridge()` factory.

#1 (HIGH) — missing tenant-membership check. After PAT verification the
bridge synthesised (user, session) and set `organizationId = patInfo.organizationId`
without checking the user was still a member of that org. The canonical
REST path enforces this at `workspace/multi-tenant.ts:425` and returns
403 `forbidden`. Mirror that: query the `member` table for
`(userId, organizationId)`; reject with the same 403 shape when missing.

#2 (MED) — cookie precedence over invalid PAT. The original ordering
hydrated Better Auth first and only attempted PAT validation inside an
`if (!c.get('user'))` guard. A request carrying both a valid session cookie
and `Authorization: Bearer owl_pat_<bad>` therefore authenticated as the
cookie user and the invalid PAT was never challenged. Reverse the order:
when the `Authorization` header carries `Bearer owl_pat_*`, the PAT path
is authoritative — failure short-circuits with 401 regardless of cookie.
Better Auth only runs when the header is absent or non-PAT.

#3 (MED) — null-org PAT silent re-scoping. `personal_access_tokens.organization_id`
is `ON DELETE SET NULL`; a PAT minted for a since-deleted org would fall
through to `resolveDefaultOrgId(userId)` and silently bind to the user's
earliest membership. Treat PATs with `organizationId === null` as invalid
on this path and return 401 with a message pointing at `lobu token`.

Refactor: extract the bridge from the closure inside `initLobuGateway`
into an exported `createLobuAuthBridge()` factory. The behaviour change
is what the bullets above describe; the factory exists so the next commit
can exercise the bridge from integration tests without bootstrapping the
full gateway.
buremba added a commit that referenced this pull request May 19, 2026
Round-2 codex review of PR #940 noted that existing PAT coverage hits the
MCP routes and the token service, but not the embedded /lobu/api/v1/agents/*
auth bridge introduced by 10bb63b. Add a focused integration suite that
mounts `createLobuAuthBridge` (exported in the previous commit) on a
minimal Hono app and exercises every contract the bridge has to honour.

11 tests across four describe blocks:

  - Happy path: valid PAT → 200, organizationId pinned to the PAT's org.
  - Rejection cases: unknown hash, expired, revoked, missing owl_pat_
    prefix, empty Authorization, non-Bearer scheme — all 401 (with the
    bridge's `invalid_token` shape on actual PATs, and the test handler's
    `no-user` shape on tokens the bridge correctly ignores).
  - Cookie precedence (codex #2): valid session cookie + invalid PAT →
    401 invalid_token, not 200 via cookie fallback.
  - Tenant membership (codex #1): valid PAT for an org the user has been
    removed from → 403 forbidden, mirroring multi-tenant.ts:425. Plus a
    defensive variant for a PAT minted against an org the user never
    joined.
  - Null org PAT (codex #3): valid PAT whose organization_id was set to
    NULL after creation (mirrors the ON DELETE SET NULL collapse path) →
    401 invalid_token, not silent re-resolution to the user's earliest
    membership via resolveDefaultOrgId.

Run with LOBU_TEST_BACKEND=pglite — no external Postgres required.
buremba added a commit that referenced this pull request May 19, 2026
* fix(server): mount /lobu prefix in PGlite assembly (parity with server.ts)

start-local.ts called initLobuGateway() but threw away the returned Hono
app, so the embedded gateway's public Agent API (/lobu/api/v1/agents/*),
worker gateway, MCP proxy, and bundled API docs were all unreachable in
PGlite mode — every call returned 404. server.ts already mounts the same
app at /lobu (PR #637); this aligns the PGlite entrypoint.

Reproducer:
  Before: GET /lobu/health -> 404
  After:  GET /lobu/health -> 200

* fix(auth): accept owl_pat_ PATs in embedded Agent API auth bridge

The lobuApp middleware only hydrated (user, session) from a Better Auth
session (cookie or bearer session-token); owl_pat_* personal access tokens
were ignored, so every /lobu/api/v1/agents/* call authenticated with a PAT
minted by 'lobu token create' (or returned by /api/local-init's
device_token) fell through to the unauthenticated path and the embedded
authProvider returned null. The qmsum-demo benchmark worked around this by
forging a Better Auth session cookie from BETTER_AUTH_SECRET.

Extend the middleware to verify Authorization: Bearer owl_pat_* tokens via
PersonalAccessTokenService.verify, look up the bound user, and synthesize
the same (user, session) shape the Better Auth path produces. The downstream
org-context middleware now honours an org id pinned on the PAT (PAT minted
for org A must run against org A) before falling back to the user's default
membership. This fixes both PGlite and Postgres assemblies — they share
this auth path.

Reproducer (PGlite):
  Before: GET /lobu/api/v1/agents -H 'Authorization: Bearer owl_pat_...' -> 401
  After:  GET /lobu/api/v1/agents -H 'Authorization: Bearer owl_pat_...' -> 200
  After:  GET /lobu/api/v1/agents -H 'Authorization: Bearer owl_pat_BAD' -> 401

* fix(auth): harden embedded Agent API auth bridge (codex #1, #2, #3)

Round-2 codex review of PR #940 surfaced three defects in the PAT bridge
added by 10bb63b (`fix(auth): accept owl_pat_ PATs in embedded Agent API
auth bridge`). All three live in the same middleware closure, so fix them
together in one extracted `createLobuAuthBridge()` factory.

#1 (HIGH) — missing tenant-membership check. After PAT verification the
bridge synthesised (user, session) and set `organizationId = patInfo.organizationId`
without checking the user was still a member of that org. The canonical
REST path enforces this at `workspace/multi-tenant.ts:425` and returns
403 `forbidden`. Mirror that: query the `member` table for
`(userId, organizationId)`; reject with the same 403 shape when missing.

#2 (MED) — cookie precedence over invalid PAT. The original ordering
hydrated Better Auth first and only attempted PAT validation inside an
`if (!c.get('user'))` guard. A request carrying both a valid session cookie
and `Authorization: Bearer owl_pat_<bad>` therefore authenticated as the
cookie user and the invalid PAT was never challenged. Reverse the order:
when the `Authorization` header carries `Bearer owl_pat_*`, the PAT path
is authoritative — failure short-circuits with 401 regardless of cookie.
Better Auth only runs when the header is absent or non-PAT.

#3 (MED) — null-org PAT silent re-scoping. `personal_access_tokens.organization_id`
is `ON DELETE SET NULL`; a PAT minted for a since-deleted org would fall
through to `resolveDefaultOrgId(userId)` and silently bind to the user's
earliest membership. Treat PATs with `organizationId === null` as invalid
on this path and return 401 with a message pointing at `lobu token`.

Refactor: extract the bridge from the closure inside `initLobuGateway`
into an exported `createLobuAuthBridge()` factory. The behaviour change
is what the bullets above describe; the factory exists so the next commit
can exercise the bridge from integration tests without bootstrapping the
full gateway.

* test(auth): cover embedded Agent API auth bridge (codex #4)

Round-2 codex review of PR #940 noted that existing PAT coverage hits the
MCP routes and the token service, but not the embedded /lobu/api/v1/agents/*
auth bridge introduced by 10bb63b. Add a focused integration suite that
mounts `createLobuAuthBridge` (exported in the previous commit) on a
minimal Hono app and exercises every contract the bridge has to honour.

11 tests across four describe blocks:

  - Happy path: valid PAT → 200, organizationId pinned to the PAT's org.
  - Rejection cases: unknown hash, expired, revoked, missing owl_pat_
    prefix, empty Authorization, non-Bearer scheme — all 401 (with the
    bridge's `invalid_token` shape on actual PATs, and the test handler's
    `no-user` shape on tokens the bridge correctly ignores).
  - Cookie precedence (codex #2): valid session cookie + invalid PAT →
    401 invalid_token, not 200 via cookie fallback.
  - Tenant membership (codex #1): valid PAT for an org the user has been
    removed from → 403 forbidden, mirroring multi-tenant.ts:425. Plus a
    defensive variant for a PAT minted against an org the user never
    joined.
  - Null org PAT (codex #3): valid PAT whose organization_id was set to
    NULL after creation (mirrors the ON DELETE SET NULL collapse path) →
    401 invalid_token, not silent re-resolution to the user's earliest
    membership via resolveDefaultOrgId.

Run with LOBU_TEST_BACKEND=pglite — no external Postgres required.

* fix(auth): case-insensitive Bearer scheme parsing per RFC 7235 (codex round-2)

Pre-fix: `Authorization: bearer owl_pat_*` (lowercase scheme) failed the
`header.startsWith('Bearer ')` literal match, so the PAT path was skipped
and the bridge fell through to the Better Auth cookie path — a valid
session cookie would silently mask an invalid/revoked PAT.

RFC 7235 §2.1 makes the auth scheme token case-insensitive. Parse it that
way. Token VALUE comparison stays case-sensitive — PAT hashes are.

* test(auth): cover lowercase Bearer bypass + cookie-only happy path (codex round-2)

Three new tests against `createLobuAuthBridge`:

1. Lowercase `bearer` scheme + invalid PAT + valid session cookie → 401
   (proves the case-insensitive parse + PAT precedence hold together; was
   the evasion gap before the fix).
2. Uppercase `BEARER` scheme + valid PAT → 200 (case-insensitive parse,
   success direction).
3. Cookie-only request (no Authorization header) → bridge reaches
   `next()` instead of short-circuiting with its own 401/403
   (`error: 'invalid_token'` / `error: 'forbidden'` would indicate the
   PAT or membership path mistakenly fired). End-to-end Better Auth cookie
   verification is exercised by entities/member-privacy-contract.test.ts
   via the full app; this minimal harness only owns the bridge contract.

* fix(auth): case-insensitive PAT prefix detection (codex round-3)

Mirror the Bearer scheme fix at the inner prefix check: `Bearer OWL_PAT_*`
now flows through PAT validation instead of falling through to cookie
auth. Token value handed to verify() is unchanged — PAT hashes stay
byte-exact.
buremba added a commit that referenced this pull request May 26, 2026
…th (#1069)

* fix(embeddings): version-stamp embeddings and batch the sync embed path

Finding #3 (HIGH): embeddings were never version-stamped, so swapping
EMBEDDINGS_MODEL to a different same-dimension model silently mixed
incompatible vector spaces with no detection. The connector-worker
discarded the service-reported `model`; event_embeddings had no model
column.

- Add `embedding_model text` to event_embeddings (migration
  20260526120000) plus a column comment.
- Thread the model stamp through the pipeline: capture the service
  `model`, FAIL LOUD via resolveServiceModel() when it differs from the
  worker's expected model (equal dimensionality is not enough), and
  persist it via ContentItem.embedding_model and
  CompleteEmbeddingsRequest. Both server INSERT paths (worker-api
  completeEmbeddings + insert-event upsertEmbedding) write the stamp;
  legacy/omitted stamps store NULL.

Finding #12 (MED, perf): the sync embedding path generated one embedding
per event (one HTTP round-trip / ONNX pass each). Accumulate a chunk's
texts and call batchGenerateEmbeddings once, mapping vectors back to each
event by index; empty-text events get no vector and a batch failure
fails open (items stream without embeddings), matching the prior
behaviour.

Reproducers:
- embeddings-model-stamp.test.ts: resolveServiceModel rejects a
  same-dimension mismatch and resolves the stamp otherwise.
- executor-batch-embed.test.ts: one chunk -> exactly one batch call with
  vectors + stamp mapped back per event.
- events/embedding-model-stamp.test.ts (integration): embedding_model
  round-trips through insertEvent; NULL when unsupplied.

* fix(embeddings): scope similarity + backfill by model stamp

Close the loop on the version stamp: persisting embedding_model is not
enough — vector search and the backfill trigger must also scope by it,
or a same-dimension model swap still compares the query against stale
vectors from another model and never re-embeds them.

- current_event_records now exposes emb.embedding_model (migration
  recreates the view, column appended at the end; down-migration
  restores the prior shape).
- content-search scopes every <=> comparison to the configured model
  (NULL = legacy, assumed current): matchCondition, similarity /
  combined-score exprs, and the candidate (recall) vector branch. The
  filtered_ids CTEs carry embedding_model so fi.* references resolve.
  Configured model is inlined as a validated SQL literal
  (configuredEmbeddingModelSqlLiteral), avoiding param-index surgery in
  the hot query builder.
- trigger-embed-backfill treats rows whose stamp differs from the
  configured model as needing backfill (not only missing rows);
  fetchEventsForEmbedding returns them too.
- completeEmbeddings + insert-event upsertEmbedding REPLACE a stale-model
  row on conflict (DO UPDATE ... WHERE model IS DISTINCT FROM), idempotent
  for same-model re-submits.

E2E reproducer (embedding-model-swap-e2e.test.ts): ingest under model A,
switch configured model to same-dimension model B; under B the row is
excluded from both the main and candidate search paths and flagged stale
by the backfill query, while under A it is still returned and not stale.
RED against the unscoped query (row leaked under B); GREEN after scoping.

* fix(embeddings): treat NULL stamps as non-comparable + guard query model

Address the review blockers: a NULL embedding_model (legacy row written
before stamping) has an UNKNOWN true model, so it must not be assumed to
match the configured model.

- content-search modelScopeFor now requires an EXACT match
  (embedding_model = configured); NULL rows are excluded from vector
  comparison until restamped. They remain reachable by text search.
- trigger-embed-backfill + worker-api fetchEventsForEmbedding now treat
  NULL as stale via `IS DISTINCT FROM`, so the backfill restamps legacy
  rows (self-healing; no permanent vector-search blackout).
- server-side generateEmbeddings (used to embed the search query) now
  FAILS LOUD when the embeddings service reports a model different from
  the configured one, instead of only logging — a wrong-model query
  vector must never be compared against model-scoped rows.
- configuredEmbeddingModelSqlLiteral validates the model against the
  service's name allowlist before inlining (defense-in-depth).

Tests:
- createTestEvent now stamps the configured model by default (mirrors
  real ingestion; pass embedding_model: null to simulate a legacy row),
  so existing vector-search fixtures stay searchable under exact scoping.
- embedding-model-swap-e2e adds a NULL-stamp case: a legacy row is
  excluded from vector search and flagged stale by the backfill.
- new unit embeddings-model-guard.test.ts: generateEmbeddings rejects a
  service model mismatch; configuredEmbeddingModelSqlLiteral rejects an
  unsafe identifier.

* fix(embeddings): stamp benchmark adapter embeddings with configured model

The memory-benchmark adapter inserted event_embeddings without a model
stamp, so under exact model-scoped vector search its rows would be
NULL-stamped and invisible to recall. Stamp them with the configured
model, consistent with real ingestion.

* test(server): drive real completeEmbeddings handler for stale-model replace + idempotency

Closes the coverage gap flagged in pre-merge review: the prior test hand-rolled
the upsert SQL instead of calling the handler, so a regression of completeEmbeddings
to ON CONFLICT DO NOTHING would have stayed green. New test invokes the real
handler via a minimal Context and asserts updated=1 on a stale-model replace and
updated=0 on a same-model re-submit (idempotent).

* fix(server): make embedding_model down-migration rollback-safe

CREATE OR REPLACE VIEW cannot remove a column from an existing view in
Postgres, so the prior down path would fail on rollback. Drop and recreate
current_event_records without embedding_model, then DROP COLUMN. (pi review
bugs:1 finding on the rollback-only path; up path was unaffected.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant