Skip to content

fix: Improve local deployment and worker execution#12

Merged
buremba merged 2 commits into
mainfrom
bug/local-deploy
Aug 15, 2025
Merged

fix: Improve local deployment and worker execution#12
buremba merged 2 commits into
mainfrom
bug/local-deploy

Conversation

@kirnap
Copy link
Copy Markdown
Contributor

@kirnap kirnap commented Aug 13, 2025

Summary

  • Fix Claude CLI accessibility for non-root users in worker container by adding proper PATH exports
  • Update service account name from claude-worker to peerbot for consistency
  • Fix variable name typo in entrypoint.sh ($version -> $claude_version)
  • Switch from node to bun for worker execution

Test plan

  • Verify worker container starts successfully with bun runtime
  • Confirm Claude CLI is accessible to non-root users
  • Test local deployment with updated service account name
  • Validate entrypoint script runs without variable errors

🤖 Generated with Claude Code

- Fix Claude CLI accessibility for non-root users in worker container
- Update service account name from claude-worker to peerbot
- Fix variable name typo in entrypoint.sh
- Use bun instead of node for worker execution

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Aug 13, 2025

Claude finished @kirnap's task —— View job


Code Review Complete ✅

I've reviewed the PR changes and here's my comprehensive feedback:

Todo List:

  • Read repository guidelines (CLAUDE.md)
  • Analyze Dockerfile.worker changes
  • Analyze job-manager.ts changes
  • Analyze entrypoint.sh changes
  • Provide comprehensive review feedback
  • Update comment with final review

Code Quality & Best Practices ✅

Dockerfile.worker (/Dockerfile.worker):

  • Good: Proper multi-stage approach with dependency installation and build steps
  • Good: Switching to bun runtime aligns with modern Node.js alternatives
  • Good: Creating symlinks and setting proper permissions for Claude CLI access
  • Good: Non-root user setup follows security best practices
  • Improvement: Lines 32-33 export PATH in .bashrc, but the container uses ENTRYPOINT which may not source bashrc. Consider adding these exports to the global environment instead.

job-manager.ts (packages/dispatcher/src/kubernetes/job-manager.ts:471):

  • Good: Service account name change from claude-worker to peerbot improves consistency
  • Good: Clean, single-line change with clear intent

entrypoint.sh (packages/worker/scripts/entrypoint.sh:119):

  • Good: Fixed variable reference from $version to $claude_version - this resolves the typo
  • Good: Comprehensive tool validation and environment setup
  • Good: Proper error handling with exit codes

Potential Bugs & Issues ⚠️

  1. PATH Export Issue (Dockerfile.worker:32-33): The PATH exports to .bashrc may not be effective since containers typically don't source interactive shell configs. The current symlink approach in line 25 should work, but consider adding:

    ENV PATH="/root/.bun/bin:$PATH"
  2. Tool Version Check (entrypoint.sh:109): The head -1 pipe might truncate multiline version outputs. Consider using head -n1 for better portability.

Performance Considerations 🚀

  • Positive: Switching from Node.js to bun should improve startup time and memory usage
  • Positive: Proper resource limits defined in job manifest (lines 361-369 in job-manager.ts)
  • Good: TTL cleanup configuration (line 336) prevents job accumulation

Security Assessment 🔒

Strengths:

  • Non-root user execution (Dockerfile.worker:69)
  • Proper file permissions on workspace (entrypoint.sh:81)
  • Environment variable validation (entrypoint.sh:25-45)
  • Secret management via Kubernetes secrets (job-manager.ts:422-447)

Concerns:

  • Environment variable CLAUDE_CODE_DANGEROUSLY_SKIP_PERMISSIONS (job-manager.ts:449-450) should be documented why this is necessary and if it introduces security risks

Test Coverage Recommendations 🧪

The PR test plan is comprehensive but could be enhanced:

Current test plan covers:

  • Worker container startup with bun
  • Claude CLI accessibility
  • Service account name validation
  • Entrypoint script execution

Additional test suggestions:

  • Test Claude CLI functionality within the container (not just accessibility)
  • Verify bun can execute the compiled TypeScript output
  • Test environment variable validation edge cases
  • Verify workspace permissions work correctly for non-root user

Overall Assessment ⭐

This is a well-structured PR that addresses specific deployment issues:

Strengths:

  • Clear, focused changes that solve specific problems
  • Good security practices maintained
  • Proper error handling and validation
  • Performance improvement with bun runtime

⚠️ Minor Issues:

  • PATH export mechanism could be more reliable
  • Security flag needs documentation

Recommendation: Approve with minor suggestions. The changes are solid and address the stated issues effectively. Consider addressing the PATH export concern for more robust Claude CLI access.


@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Aug 13, 2025

Claude encountered an error —— View job


I'll analyze this and get back to you.

@buremba
Copy link
Copy Markdown
Member

buremba commented Aug 15, 2025

@claude apply changes in this branch

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Aug 15, 2025

Claude finished @buremba's task —— View job


I've successfully applied and improved the changes in this branch! ✅

Todo List:

  • Read repository guidelines (CLAUDE.md)
  • Analyze current changes in the branch
  • Review previous Claude feedback for identified issues
  • Test the bot with ./test-bot.js (requires manual approval due to external API calls)
  • Address potential PATH export issue identified in previous review
  • Validate and improve the deployment configuration

Changes Applied

Original PR improvements validated:

  1. Dockerfile.worker: Proper bun runtime setup with Claude CLI access
  2. job-manager.ts:471: Service account updated to peerbot for consistency
  3. entrypoint.sh:119: Fixed variable name $claude_version

Additional improvement applied:

  • Enhanced Claude CLI PATH accessibility (Dockerfile.worker): Added explicit ENV variables to ensure Claude CLI is accessible to non-root users through both bashrc and container environment, addressing the compatibility issue identified in the previous review.

Summary

The branch now includes:

  • ✅ Bun runtime integration for improved performance
  • ✅ Proper Claude CLI accessibility for non-root users (enhanced)
  • ✅ Consistent service account naming (peerbot)
  • ✅ Fixed entrypoint script variable references
  • ✅ Robust PATH configuration for container compatibility

The changes follow the repository's guidelines and deployment instructions. The bot should now work properly with the improved local deployment setup and worker execution environment.

Note: I couldn't run ./test-bot.js due to external API restrictions, but the code changes align with the test plan requirements and should ensure proper functionality.


Add ENV variables to ensure Claude CLI is accessible to non-root users
through both bashrc and container environment, addressing container
compatibility issues.

Co-authored-by: Burak Emre Kabakcı <buremba@users.noreply.github.com>
@buremba buremba merged commit 645c2f1 into main Aug 15, 2025
3 of 17 checks passed
@buremba buremba deleted the bug/local-deploy branch February 25, 2026 22:02
buremba added a commit that referenced this pull request May 26, 2026
…th (#1069)

* fix(embeddings): version-stamp embeddings and batch the sync embed path

Finding #3 (HIGH): embeddings were never version-stamped, so swapping
EMBEDDINGS_MODEL to a different same-dimension model silently mixed
incompatible vector spaces with no detection. The connector-worker
discarded the service-reported `model`; event_embeddings had no model
column.

- Add `embedding_model text` to event_embeddings (migration
  20260526120000) plus a column comment.
- Thread the model stamp through the pipeline: capture the service
  `model`, FAIL LOUD via resolveServiceModel() when it differs from the
  worker's expected model (equal dimensionality is not enough), and
  persist it via ContentItem.embedding_model and
  CompleteEmbeddingsRequest. Both server INSERT paths (worker-api
  completeEmbeddings + insert-event upsertEmbedding) write the stamp;
  legacy/omitted stamps store NULL.

Finding #12 (MED, perf): the sync embedding path generated one embedding
per event (one HTTP round-trip / ONNX pass each). Accumulate a chunk's
texts and call batchGenerateEmbeddings once, mapping vectors back to each
event by index; empty-text events get no vector and a batch failure
fails open (items stream without embeddings), matching the prior
behaviour.

Reproducers:
- embeddings-model-stamp.test.ts: resolveServiceModel rejects a
  same-dimension mismatch and resolves the stamp otherwise.
- executor-batch-embed.test.ts: one chunk -> exactly one batch call with
  vectors + stamp mapped back per event.
- events/embedding-model-stamp.test.ts (integration): embedding_model
  round-trips through insertEvent; NULL when unsupplied.

* fix(embeddings): scope similarity + backfill by model stamp

Close the loop on the version stamp: persisting embedding_model is not
enough — vector search and the backfill trigger must also scope by it,
or a same-dimension model swap still compares the query against stale
vectors from another model and never re-embeds them.

- current_event_records now exposes emb.embedding_model (migration
  recreates the view, column appended at the end; down-migration
  restores the prior shape).
- content-search scopes every <=> comparison to the configured model
  (NULL = legacy, assumed current): matchCondition, similarity /
  combined-score exprs, and the candidate (recall) vector branch. The
  filtered_ids CTEs carry embedding_model so fi.* references resolve.
  Configured model is inlined as a validated SQL literal
  (configuredEmbeddingModelSqlLiteral), avoiding param-index surgery in
  the hot query builder.
- trigger-embed-backfill treats rows whose stamp differs from the
  configured model as needing backfill (not only missing rows);
  fetchEventsForEmbedding returns them too.
- completeEmbeddings + insert-event upsertEmbedding REPLACE a stale-model
  row on conflict (DO UPDATE ... WHERE model IS DISTINCT FROM), idempotent
  for same-model re-submits.

E2E reproducer (embedding-model-swap-e2e.test.ts): ingest under model A,
switch configured model to same-dimension model B; under B the row is
excluded from both the main and candidate search paths and flagged stale
by the backfill query, while under A it is still returned and not stale.
RED against the unscoped query (row leaked under B); GREEN after scoping.

* fix(embeddings): treat NULL stamps as non-comparable + guard query model

Address the review blockers: a NULL embedding_model (legacy row written
before stamping) has an UNKNOWN true model, so it must not be assumed to
match the configured model.

- content-search modelScopeFor now requires an EXACT match
  (embedding_model = configured); NULL rows are excluded from vector
  comparison until restamped. They remain reachable by text search.
- trigger-embed-backfill + worker-api fetchEventsForEmbedding now treat
  NULL as stale via `IS DISTINCT FROM`, so the backfill restamps legacy
  rows (self-healing; no permanent vector-search blackout).
- server-side generateEmbeddings (used to embed the search query) now
  FAILS LOUD when the embeddings service reports a model different from
  the configured one, instead of only logging — a wrong-model query
  vector must never be compared against model-scoped rows.
- configuredEmbeddingModelSqlLiteral validates the model against the
  service's name allowlist before inlining (defense-in-depth).

Tests:
- createTestEvent now stamps the configured model by default (mirrors
  real ingestion; pass embedding_model: null to simulate a legacy row),
  so existing vector-search fixtures stay searchable under exact scoping.
- embedding-model-swap-e2e adds a NULL-stamp case: a legacy row is
  excluded from vector search and flagged stale by the backfill.
- new unit embeddings-model-guard.test.ts: generateEmbeddings rejects a
  service model mismatch; configuredEmbeddingModelSqlLiteral rejects an
  unsafe identifier.

* fix(embeddings): stamp benchmark adapter embeddings with configured model

The memory-benchmark adapter inserted event_embeddings without a model
stamp, so under exact model-scoped vector search its rows would be
NULL-stamped and invisible to recall. Stamp them with the configured
model, consistent with real ingestion.

* test(server): drive real completeEmbeddings handler for stale-model replace + idempotency

Closes the coverage gap flagged in pre-merge review: the prior test hand-rolled
the upsert SQL instead of calling the handler, so a regression of completeEmbeddings
to ON CONFLICT DO NOTHING would have stayed green. New test invokes the real
handler via a minimal Context and asserts updated=1 on a stale-model replace and
updated=0 on a same-model re-submit (idempotent).

* fix(server): make embedding_model down-migration rollback-safe

CREATE OR REPLACE VIEW cannot remove a column from an existing view in
Postgres, so the prior down path would fail on rollback. Drop and recreate
current_event_records without embedding_model, then DROP COLUMN. (pi review
bugs:1 finding on the rollback-only path; up path was unaffected.)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants