fix: Improve local deployment and worker execution#12
Conversation
- Fix Claude CLI accessibility for non-root users in worker container - Update service account name from claude-worker to peerbot - Fix variable name typo in entrypoint.sh - Use bun instead of node for worker execution 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
|
Claude finished @kirnap's task —— View job Code Review Complete ✅I've reviewed the PR changes and here's my comprehensive feedback: Todo List:
Code Quality & Best Practices ✅Dockerfile.worker (
job-manager.ts (
entrypoint.sh (
Potential Bugs & Issues
|
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
|
@claude apply changes in this branch |
|
Claude finished @buremba's task —— View job I've successfully applied and improved the changes in this branch! ✅ Todo List:
Changes Applied✅ Original PR improvements validated:
✅ Additional improvement applied:
SummaryThe branch now includes:
The changes follow the repository's guidelines and deployment instructions. The bot should now work properly with the improved local deployment setup and worker execution environment. Note: I couldn't run |
Add ENV variables to ensure Claude CLI is accessible to non-root users through both bashrc and container environment, addressing container compatibility issues. Co-authored-by: Burak Emre Kabakcı <buremba@users.noreply.github.com>
…th (#1069) * fix(embeddings): version-stamp embeddings and batch the sync embed path Finding #3 (HIGH): embeddings were never version-stamped, so swapping EMBEDDINGS_MODEL to a different same-dimension model silently mixed incompatible vector spaces with no detection. The connector-worker discarded the service-reported `model`; event_embeddings had no model column. - Add `embedding_model text` to event_embeddings (migration 20260526120000) plus a column comment. - Thread the model stamp through the pipeline: capture the service `model`, FAIL LOUD via resolveServiceModel() when it differs from the worker's expected model (equal dimensionality is not enough), and persist it via ContentItem.embedding_model and CompleteEmbeddingsRequest. Both server INSERT paths (worker-api completeEmbeddings + insert-event upsertEmbedding) write the stamp; legacy/omitted stamps store NULL. Finding #12 (MED, perf): the sync embedding path generated one embedding per event (one HTTP round-trip / ONNX pass each). Accumulate a chunk's texts and call batchGenerateEmbeddings once, mapping vectors back to each event by index; empty-text events get no vector and a batch failure fails open (items stream without embeddings), matching the prior behaviour. Reproducers: - embeddings-model-stamp.test.ts: resolveServiceModel rejects a same-dimension mismatch and resolves the stamp otherwise. - executor-batch-embed.test.ts: one chunk -> exactly one batch call with vectors + stamp mapped back per event. - events/embedding-model-stamp.test.ts (integration): embedding_model round-trips through insertEvent; NULL when unsupplied. * fix(embeddings): scope similarity + backfill by model stamp Close the loop on the version stamp: persisting embedding_model is not enough — vector search and the backfill trigger must also scope by it, or a same-dimension model swap still compares the query against stale vectors from another model and never re-embeds them. - current_event_records now exposes emb.embedding_model (migration recreates the view, column appended at the end; down-migration restores the prior shape). - content-search scopes every <=> comparison to the configured model (NULL = legacy, assumed current): matchCondition, similarity / combined-score exprs, and the candidate (recall) vector branch. The filtered_ids CTEs carry embedding_model so fi.* references resolve. Configured model is inlined as a validated SQL literal (configuredEmbeddingModelSqlLiteral), avoiding param-index surgery in the hot query builder. - trigger-embed-backfill treats rows whose stamp differs from the configured model as needing backfill (not only missing rows); fetchEventsForEmbedding returns them too. - completeEmbeddings + insert-event upsertEmbedding REPLACE a stale-model row on conflict (DO UPDATE ... WHERE model IS DISTINCT FROM), idempotent for same-model re-submits. E2E reproducer (embedding-model-swap-e2e.test.ts): ingest under model A, switch configured model to same-dimension model B; under B the row is excluded from both the main and candidate search paths and flagged stale by the backfill query, while under A it is still returned and not stale. RED against the unscoped query (row leaked under B); GREEN after scoping. * fix(embeddings): treat NULL stamps as non-comparable + guard query model Address the review blockers: a NULL embedding_model (legacy row written before stamping) has an UNKNOWN true model, so it must not be assumed to match the configured model. - content-search modelScopeFor now requires an EXACT match (embedding_model = configured); NULL rows are excluded from vector comparison until restamped. They remain reachable by text search. - trigger-embed-backfill + worker-api fetchEventsForEmbedding now treat NULL as stale via `IS DISTINCT FROM`, so the backfill restamps legacy rows (self-healing; no permanent vector-search blackout). - server-side generateEmbeddings (used to embed the search query) now FAILS LOUD when the embeddings service reports a model different from the configured one, instead of only logging — a wrong-model query vector must never be compared against model-scoped rows. - configuredEmbeddingModelSqlLiteral validates the model against the service's name allowlist before inlining (defense-in-depth). Tests: - createTestEvent now stamps the configured model by default (mirrors real ingestion; pass embedding_model: null to simulate a legacy row), so existing vector-search fixtures stay searchable under exact scoping. - embedding-model-swap-e2e adds a NULL-stamp case: a legacy row is excluded from vector search and flagged stale by the backfill. - new unit embeddings-model-guard.test.ts: generateEmbeddings rejects a service model mismatch; configuredEmbeddingModelSqlLiteral rejects an unsafe identifier. * fix(embeddings): stamp benchmark adapter embeddings with configured model The memory-benchmark adapter inserted event_embeddings without a model stamp, so under exact model-scoped vector search its rows would be NULL-stamped and invisible to recall. Stamp them with the configured model, consistent with real ingestion. * test(server): drive real completeEmbeddings handler for stale-model replace + idempotency Closes the coverage gap flagged in pre-merge review: the prior test hand-rolled the upsert SQL instead of calling the handler, so a regression of completeEmbeddings to ON CONFLICT DO NOTHING would have stayed green. New test invokes the real handler via a minimal Context and asserts updated=1 on a stale-model replace and updated=0 on a same-model re-submit (idempotent). * fix(server): make embedding_model down-migration rollback-safe CREATE OR REPLACE VIEW cannot remove a column from an existing view in Postgres, so the prior down path would fail on rollback. Drop and recreate current_event_records without embedding_model, then DROP COLUMN. (pi review bugs:1 finding on the rollback-only path; up path was unaffected.)
Summary
Test plan
🤖 Generated with Claude Code