chore(pgvector-embedded): rebuild prebuilt artifacts (v0.8.1)#977
chore(pgvector-embedded): rebuild prebuilt artifacts (v0.8.1)#977github-actions[bot] wants to merge 8 commits into
Conversation
Foundation for user-declared connector dependencies: - SDK: add `runtime.nix.packages` to ConnectorRuntimeInfo so a connector declares its native (nixpkgs) tools; rides the existing `runtime` JSONB, no migration. npm deps are bundled at compile time and don't go here. - Compile: externalize `@lobu/connector-sdk` (+ `lobu` alias) instead of bundling it. The SDK pulls a large infra graph (Sentry/OTel/grpc/git) transitively, inflating every connector to multiple MB; the runtime provides the SDK (it's a connector-worker dep), so the bundle leaves it as a runtime-resolved import — the standard externalize-the-framework pattern. Bundles drop from ~MB to the connector's own code + deps. Also emit an inline source map (sourcesContent:false) for stack traces. - Executor: when a connector declares nix packages, wrap the child in `nix-shell -p <pkgs> --run "exec node ..."` so the tools are on PATH; `exec` preserves the IPC channel and kill semantics. Plain fork() stays the path when no packages are declared. Fail with a clear actionable error when nix-shell is absent but packages are required.
Threads a connector's declared native packages from storage to the executor so they're on PATH during a run: - worker-api poll: join connector_definitions and surface its `runtime`, emitting `nix_packages` in the poll response (the `runtime.nix.packages` the SDK extraction already persists in the existing runtime JSONB — no storage change needed). - daemon: carry `nix_packages` on PollResponse and pass it through the three executeCompiledConnector call sites (sync/action/auth). - executeCompiledConnector: forward `nixPackages` to executor.execute, which wraps the child in nix-shell when non-empty. Connectors that declare no native deps are unaffected (plain fork path).
…ckaging Connector npm deps now work in the apply→cloud path: - apply: project-supplied connectors are compiled on the CLI (the only place the project's node_modules exists, so esbuild can bundle the connector's declared deps) and uploaded as a pre-compiled bundle (`compiled: true`); the server stores the artifact instead of recompiling source it can't resolve deps for. - ensure-deps-installed: runs `bun install --ignore-scripts` in the connector's project (resolved via the nearest `lobu.toml`, so a connector inside a monorepo never installs the wrong root); no-ops when the project declares no package.json. - client.installConnector gains a `compiled` flag. - lobu init scaffolds package.json (with @lobu/connector-sdk devDependency for editor types) + tsconfig.json + connectors/. - AGENTS.md documents the convention: npm = bundled at compile, native = nix at run time; SDK is runtime-provided/externalized.
Keep esbuild + connector-worker + SDK out of apply-cmd's module-load path (only `lobu apply` with local *.connector.ts needs it). Documented in the AGENTS.md dynamic-import allow-list. Matches connector-run-cmd's pattern.
…oin, init merge - apply: compile `def.sourcePath` (the actual `.ts`) for local connectors, not `def.sourceFile` (an error-message label that may point at a `type: connector` YAML). `source_url` connectors (source fetched into `sourceCode`, no local deps) upload raw for gateway compile instead of being compiled locally. - worker-api: filter the connector_definitions join to `status = 'active'` so it matches the partial unique index `idx_connector_defs_org_key` — archived/draft rows share `(key, org)` and made the runtime lookup nondeterministic. - init: merge into an existing `package.json` (preserve the user's fields, add the SDK devDependency) and never overwrite an existing `tsconfig.json`, so `--here` into an existing project is non-destructive. - docs: note `bun install --ignore-scripts` (the actual command).
📝 WalkthroughWalkthroughThis PR adds local connector compilation during ChangesLocal Connector Compilation with Runtime Nix Dependencies
Sequence Diagram(s)sequenceDiagram
participant CLI as CLI apply
participant DepMgr as ensure-deps-installed
participant Compiler as connector-worker compile
participant ApplyClient as ApplyClient
participant Server as Server
participant Daemon as Worker daemon
participant Executor as SubprocessExecutor
CLI->>DepMgr: ensure deps for sourcePath
DepMgr->>DepMgr: locate lobu.toml, check stale
DepMgr->>DepMgr: bun install --ignore-scripts
CLI->>Compiler: compile .connector.ts file
Compiler->>Compiler: externalize `@lobu/connector-sdk`
Compiler-->>CLI: compiled sourceCode
CLI->>ApplyClient: installConnector(sourceCode, compiled: true)
ApplyClient->>Server: install_connector request
Server->>Server: store sourceCode + compiled flag
Daemon->>Server: pollWorkerJob
Server->>Server: LEFT JOIN connector_definitions for runtime.nix
Server-->>Daemon: PollResponse with nix_packages
Daemon->>Executor: executeCompiledConnector(nixPackages)
Executor->>Executor: validate nix-shell available
Executor->>Executor: spawn('nix-shell', exec node runner)
Executor->>Executor: child inherits nixPackages on PATH
Executor-->>Daemon: ExecutorResult
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 3❌ Failed checks (3 warnings)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
packages/connector-worker/src/executor/subprocess.ts (1)
201-224: 💤 Low valueConsider quoting
execArgvelements for defense-in-depth.The
execArgvarray contents are currently safe (controlled values like--max-old-space-size=512), but they're joined into thenix-shell --runcommand without quoting. IfexecArgvever accepts external input, this could become a shell injection vector.For now this is low risk since
execArgvis constructed internally, but you could defensively quote all arguments:🛡️ Optional defensive fix
- const nodeCmd = ['exec', 'node', ...execArgv, shellQuote(childRunnerPath)].join(' '); + const nodeCmd = ['exec', 'node', ...execArgv.map(shellQuote), shellQuote(childRunnerPath)].join(' ');🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/connector-worker/src/executor/subprocess.ts` around lines 201 - 224, The nodeCmd string used for nix-shell --run interpolates execArgv unquoted, which could be a shell-injection risk if execArgv ever contains external input; update the nodeCmd construction in the spawn branch so each execArgv element is passed through the existing shellQuote helper (like shellQuote(execArgv[i])) before joining, ensuring execArgv, childRunnerPath and other parts are properly quoted when building nodeCmd used in nixArgs (--run).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/cli/src/commands/_lib/ensure-deps-installed.ts`:
- Around line 35-42: The installIsStale function currently only compares
bun.lock vs node_modules; update it to also consider package.json modifications:
obtain the mtimeMs for package.json (join(root, "package.json")) and treat the
install as stale if package.json.mtimeMs > node_modules.mtimeMs (or if bun.lock
exists and bun.lock.mtimeMs > node_modules.mtimeMs); if package.json is missing
keep existing behavior, and preserve the try/catch around statSync calls for
node_modules, bun.lock, and package.json when determining staleness in
installIsStale.
In `@packages/cli/src/commands/init.ts`:
- Around line 720-726: The current broad catch around JSON.parse(await
readFile(pkgJsonPath, "utf-8")) masks parse, permission, and other I/O errors
and treats them as "missing file"; change the logic so you first attempt to read
the file and only treat ENOENT (file-not-found) as the condition to create the
fallback pkgJson, while rethrowing or surfaceing other errors (e.g., permission
errors or invalid JSON). Concretely: isolate the readFile call and catch only
errors whose code === "ENOENT" to set pkgJson = { ...fallback... }; run
JSON.parse in its own try and handle JSON parse errors separately (log and exit
or rethrow). Apply the same tightening of error checks to the similar catch
block around the scaffold/write flow referenced near the other catch (lines
~741-744) so you never overwrite an existing project on non-ENOENT errors.
In `@packages/server/src/worker-api.ts`:
- Around line 504-506: The LEFT JOIN to connector_definitions (alias cd) using
cd.key = r.connector_key and cd.organization_id = r.organization_id with
cd.status = 'active' can produce duplicate rows if multiple active rows exist;
change this join to select a single connector_definition per r by using a
LATERAL subquery (or an ordered subquery with ORDER BY ... LIMIT 1) that filters
status = 'active' and matches r.connector_key and r.organization_id (similar to
the strategy used in the claim CTE), ensuring only one cd row is returned per r
to avoid duplication.
---
Nitpick comments:
In `@packages/connector-worker/src/executor/subprocess.ts`:
- Around line 201-224: The nodeCmd string used for nix-shell --run interpolates
execArgv unquoted, which could be a shell-injection risk if execArgv ever
contains external input; update the nodeCmd construction in the spawn branch so
each execArgv element is passed through the existing shellQuote helper (like
shellQuote(execArgv[i])) before joining, ensuring execArgv, childRunnerPath and
other parts are properly quoted when building nodeCmd used in nixArgs (--run).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: a01dae13-cc9c-4755-a986-a93349395ae2
⛔ Files ignored due to path filters (4)
packages/pgvector-embedded/prebuilt/darwin-arm64/vector.dylibis excluded by!**/*.dylibpackages/pgvector-embedded/prebuilt/darwin-x64/vector.dylibis excluded by!**/*.dylibpackages/pgvector-embedded/prebuilt/linux-arm64/vector.sois excluded by!**/*.sopackages/pgvector-embedded/prebuilt/linux-x64/vector.sois excluded by!**/*.so
📒 Files selected for processing (13)
AGENTS.mdpackages/cli/src/commands/_lib/apply/apply-cmd.tspackages/cli/src/commands/_lib/apply/client.tspackages/cli/src/commands/_lib/ensure-deps-installed.tspackages/cli/src/commands/init.tspackages/connector-sdk/src/connector-types.tspackages/connector-worker/src/compile/index.tspackages/connector-worker/src/daemon/client.tspackages/connector-worker/src/daemon/executor.tspackages/connector-worker/src/executor/interface.tspackages/connector-worker/src/executor/runtime.tspackages/connector-worker/src/executor/subprocess.tspackages/server/src/worker-api.ts
| function installIsStale(root: string): boolean { | ||
| const nodeModules = join(root, "node_modules"); | ||
| if (!existsSync(nodeModules)) return true; | ||
| const lock = join(root, "bun.lock"); | ||
| if (!existsSync(lock)) return false; // deps present, no lockfile to compare against | ||
| try { | ||
| return statSync(lock).mtimeMs > statSync(nodeModules).mtimeMs; | ||
| } catch { |
There was a problem hiding this comment.
Staleness check should include package.json changes
Line 35-42 only compares bun.lock vs node_modules. If package.json changed but the lockfile wasn’t updated yet, this returns non-stale and skips install, causing downstream module-resolution failures during compile.
Proposed fix
function installIsStale(root: string): boolean {
const nodeModules = join(root, "node_modules");
if (!existsSync(nodeModules)) return true;
+ const packageJson = join(root, "package.json");
const lock = join(root, "bun.lock");
- if (!existsSync(lock)) return false; // deps present, no lockfile to compare against
try {
- return statSync(lock).mtimeMs > statSync(nodeModules).mtimeMs;
+ const nodeModulesMtime = statSync(nodeModules).mtimeMs;
+ const packageMtime = existsSync(packageJson)
+ ? statSync(packageJson).mtimeMs
+ : 0;
+ const lockMtime = existsSync(lock) ? statSync(lock).mtimeMs : 0;
+ return Math.max(packageMtime, lockMtime) > nodeModulesMtime;
} catch {
return false;
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function installIsStale(root: string): boolean { | |
| const nodeModules = join(root, "node_modules"); | |
| if (!existsSync(nodeModules)) return true; | |
| const lock = join(root, "bun.lock"); | |
| if (!existsSync(lock)) return false; // deps present, no lockfile to compare against | |
| try { | |
| return statSync(lock).mtimeMs > statSync(nodeModules).mtimeMs; | |
| } catch { | |
| function installIsStale(root: string): boolean { | |
| const nodeModules = join(root, "node_modules"); | |
| if (!existsSync(nodeModules)) return true; | |
| const packageJson = join(root, "package.json"); | |
| const lock = join(root, "bun.lock"); | |
| try { | |
| const nodeModulesMtime = statSync(nodeModules).mtimeMs; | |
| const packageMtime = existsSync(packageJson) | |
| ? statSync(packageJson).mtimeMs | |
| : 0; | |
| const lockMtime = existsSync(lock) ? statSync(lock).mtimeMs : 0; | |
| return Math.max(packageMtime, lockMtime) > nodeModulesMtime; | |
| } catch { | |
| return false; | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/cli/src/commands/_lib/ensure-deps-installed.ts` around lines 35 -
42, The installIsStale function currently only compares bun.lock vs
node_modules; update it to also consider package.json modifications: obtain the
mtimeMs for package.json (join(root, "package.json")) and treat the install as
stale if package.json.mtimeMs > node_modules.mtimeMs (or if bun.lock exists and
bun.lock.mtimeMs > node_modules.mtimeMs); if package.json is missing keep
existing behavior, and preserve the try/catch around statSync calls for
node_modules, bun.lock, and package.json when determining staleness in
installIsStale.
| try { | ||
| pkgJson = JSON.parse(await readFile(pkgJsonPath, "utf-8")) as Record< | ||
| string, | ||
| unknown | ||
| >; | ||
| } catch { | ||
| pkgJson = { |
There was a problem hiding this comment.
Restrict fallback behavior to missing-file errors only
At Line 720 and Line 741, the broad catch treats parse/permission/path errors as “file missing,” then proceeds to write scaffolding files. In --here flows, this can overwrite existing project files unexpectedly.
Proposed fix
- try {
- pkgJson = JSON.parse(await readFile(pkgJsonPath, "utf-8")) as Record<
- string,
- unknown
- >;
- } catch {
+ try {
+ const raw = await readFile(pkgJsonPath, "utf-8");
+ pkgJson = JSON.parse(raw) as Record<string, unknown>;
+ } catch (err) {
+ const code = (err as NodeJS.ErrnoException).code;
+ if (code !== "ENOENT") throw err;
pkgJson = {
name: projectName,
version: "0.0.0",
private: true,
type: "module",
};
}
@@
- try {
- await readFile(tsconfigPath, "utf-8"); // exists — leave the user's config untouched
- } catch {
+ try {
+ await readFile(tsconfigPath, "utf-8"); // exists — leave the user's config untouched
+ } catch (err) {
+ const code = (err as NodeJS.ErrnoException).code;
+ if (code !== "ENOENT") throw err;
await writeFile(
tsconfigPath,
`${JSON.stringify(Also applies to: 741-744
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/cli/src/commands/init.ts` around lines 720 - 726, The current broad
catch around JSON.parse(await readFile(pkgJsonPath, "utf-8")) masks parse,
permission, and other I/O errors and treats them as "missing file"; change the
logic so you first attempt to read the file and only treat ENOENT
(file-not-found) as the condition to create the fallback pkgJson, while
rethrowing or surfaceing other errors (e.g., permission errors or invalid JSON).
Concretely: isolate the readFile call and catch only errors whose code ===
"ENOENT" to set pkgJson = { ...fallback... }; run JSON.parse in its own try and
handle JSON parse errors separately (log and exit or rethrow). Apply the same
tightening of error checks to the similar catch block around the scaffold/write
flow referenced near the other catch (lines ~741-744) so you never overwrite an
existing project on non-ENOENT errors.
| LEFT JOIN connector_definitions cd ON cd.key = r.connector_key | ||
| AND cd.organization_id = r.organization_id | ||
| AND cd.status = 'active' |
There was a problem hiding this comment.
Potential row duplication if multiple active connector_definitions exist.
The LEFT JOIN connector_definitions at lines 504-506 doesn't include LIMIT 1 or deduplication. If an organization has multiple connector_definitions rows for the same key with status = 'active', this join will produce duplicate result rows. Consider using a LATERAL subquery with LIMIT 1 (similar to the claim CTE at lines 373-380) or adding ORDER BY ... LIMIT 1:
- LEFT JOIN connector_definitions cd ON cd.key = r.connector_key
- AND cd.organization_id = r.organization_id
- AND cd.status = 'active'
+ LEFT JOIN LATERAL (
+ SELECT runtime FROM connector_definitions
+ WHERE key = r.connector_key
+ AND organization_id = r.organization_id
+ AND status = 'active'
+ ORDER BY updated_at DESC, id DESC
+ LIMIT 1
+ ) cd ON true📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| LEFT JOIN connector_definitions cd ON cd.key = r.connector_key | |
| AND cd.organization_id = r.organization_id | |
| AND cd.status = 'active' | |
| LEFT JOIN LATERAL ( | |
| SELECT runtime FROM connector_definitions | |
| WHERE key = r.connector_key | |
| AND organization_id = r.organization_id | |
| AND status = 'active' | |
| ORDER BY updated_at DESC, id DESC | |
| LIMIT 1 | |
| ) cd ON true |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/server/src/worker-api.ts` around lines 504 - 506, The LEFT JOIN to
connector_definitions (alias cd) using cd.key = r.connector_key and
cd.organization_id = r.organization_id with cd.status = 'active' can produce
duplicate rows if multiple active rows exist; change this join to select a
single connector_definition per r by using a LATERAL subquery (or an ordered
subquery with ORDER BY ... LIMIT 1) that filters status = 'active' and matches
r.connector_key and r.organization_id (similar to the strategy used in the claim
CTE), ensuring only one cd row is returned per r to avoid duplication.
|
Closing as duplicate. The build-pgvector-embedded workflow opened 6 PRs for the same v0.8.1 rebuild; the build is non-deterministic so these only churn binary artifacts with no functional change. Several also carry already-merged work (#973, #978) or the release-please 9.0.0 commit (#939). main's existing prebuilt artifacts are kept. |
…1051) The push trigger used a bare paths filter, which GitHub ignores for freshly-created refs (it can't diff them). So every release-please tag (lobu-vX.Y.Z) ran the rebuild and, because the binaries aren't byte-reproducible, opened a no-op 'rebuild artifacts' PR each release (#972, #974, #975, #977, #979, #980, #981, #996, #1002, #1005, #1025, #1029). Scope the push trigger to branches: [main] so tag refs no longer match, and drop the workflow file from paths (only build.sh changes the artifacts). workflow_dispatch still covers deliberate pgvector/PG bumps.
Regenerated by the build-pgvector-embedded workflow (pgvector v0.8.1, PG 18).
Summary by CodeRabbit
Release Notes