refactor(pgvector-embedded): vendor shared control/SQL once, not per-platform (−2.7k lines)#978
Conversation
…platform vector.control + vector--<version>.sql are byte-identical across all four platforms (verified) — only the compiled library differs. They were vendored 4×, bloating the tree by ~2,750 lines of duplicated install SQL. Move them to the prebuilt root (vendored once); keep only the platform binary per dir. - injector (src/index.ts): read control/SQL from the shared root, library from prebuilt/<platform>/. - build.sh: stage library → prebuilt/<platform>/, control/SQL → prebuilt/ root. - build-pgvector-embedded.yml: artifact carries the shared files; open-pr stages binary per-platform + the shared control/SQL once. Validated: build.sh reproduces the layout; E2E inject into embedded PG + CREATE EXTENSION vector + a distance query all pass. Net -2,723 lines.
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR reorganizes pgvector-embedded artifact storage to avoid duplication: platform-specific native binaries remain in Changespgvector-embedded artifact separation
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
The previous commit's biome --write reformatting wasn't re-staged before the commit, so CI format:check flagged it. Commit the formatted version.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/pgvector-embedded/scripts/build.sh`:
- Around line 51-76: The shared prebuilt dir (SHARED_DIR) can retain old
vector.control and vector--*.sql files across version bumps; before copying the
new vector.control and vector--${PGVECTOR_SQL_VERSION}.sql into SHARED_DIR,
remove any existing vector.control and vector--*.sql in SHARED_DIR (e.g., purge
files matching "vector.control" and "vector--*.sql") so only the pinned
SQL/control are staged and no stale artifacts remain.
In `@packages/pgvector-embedded/src/index.ts`:
- Around line 49-54: The hasPrebuilt(platform: string = currentPlatformKey())
function currently returns true if the platform library and the shared control
file exist but doesn't verify the presence of the platform-specific SQL
artifact, causing runtime failures; update hasPrebuilt to also check for the
corresponding vector--*.sql file under PREBUILT_ROOT (e.g., match the platform
key to find a file like "vector--<platform>.sql" or search for files matching
"vector--*.sql") before returning true, using the existing PREBUILT_ROOT
constant and helper utilities (or fs.readdirSync/existsSync) so that
hasPrebuiltLibrary(platform) && existsSync(join(PREBUILT_ROOT,
"vector.control")) && sqlFileExists returns the final boolean.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: eaa93dec-2ce1-49b5-97c5-49414e4abd48
📒 Files selected for processing (11)
.github/workflows/build-pgvector-embedded.ymlpackages/pgvector-embedded/prebuilt/darwin-x64/vector--0.8.1.sqlpackages/pgvector-embedded/prebuilt/darwin-x64/vector.controlpackages/pgvector-embedded/prebuilt/linux-arm64/vector--0.8.1.sqlpackages/pgvector-embedded/prebuilt/linux-arm64/vector.controlpackages/pgvector-embedded/prebuilt/linux-x64/vector--0.8.1.sqlpackages/pgvector-embedded/prebuilt/linux-x64/vector.controlpackages/pgvector-embedded/prebuilt/vector--0.8.1.sqlpackages/pgvector-embedded/prebuilt/vector.controlpackages/pgvector-embedded/scripts/build.shpackages/pgvector-embedded/src/index.ts
💤 Files with no reviewable changes (6)
- packages/pgvector-embedded/prebuilt/linux-x64/vector.control
- packages/pgvector-embedded/prebuilt/darwin-x64/vector--0.8.1.sql
- packages/pgvector-embedded/prebuilt/darwin-x64/vector.control
- packages/pgvector-embedded/prebuilt/linux-x64/vector--0.8.1.sql
- packages/pgvector-embedded/prebuilt/linux-arm64/vector--0.8.1.sql
- packages/pgvector-embedded/prebuilt/linux-arm64/vector.control
| SHARED_DIR="${PKG_ROOT}/prebuilt" | ||
|
|
||
| rm -rf "${OUT_DIR}" | ||
| mkdir -p "${OUT_DIR}" | ||
| mkdir -p "${OUT_DIR}" "${SHARED_DIR}" | ||
|
|
||
| # Stage straight from the build dir — we run `make` but not `make install`, so | ||
| # nothing lands in the OS Postgres's pkglibdir/sharedir. The library, control | ||
| # file, and generated SQL all sit under the cloned/built pgvector tree. | ||
| # | ||
| # Compiled extension library — name differs per platform ($(DLSUFFIX)). | ||
| # Compiled extension library is the only PLATFORM-SPECIFIC artifact → goes in | ||
| # prebuilt/<platform>/ (name differs per platform via $(DLSUFFIX)). | ||
| cp "${WORK}/pgvector/vector"*.so "${OUT_DIR}/" 2>/dev/null || true | ||
| cp "${WORK}/pgvector/vector"*.dylib "${OUT_DIR}/" 2>/dev/null || true | ||
| # Fall back to an installed copy only if the build dir somehow lacks the lib. | ||
| if ! ls "${OUT_DIR}"/vector.* >/dev/null 2>&1; then | ||
| cp "${PKGLIBDIR}/vector".* "${OUT_DIR}/" | ||
| fi | ||
|
|
||
| # Control + the full-install SQL for the pinned version only. CREATE EXTENSION | ||
| # at default_version reads vector--<version>.sql directly; the vector--A--B.sql | ||
| # upgrade scripts are only for ALTER EXTENSION ... UPDATE, which never runs on a | ||
| # fresh embedded DB, so we don't ship them. `vector.control` is checked into the | ||
| # repo; `sql/vector--<version>.sql` is generated by `make`. | ||
| # Control + the full-install SQL for the pinned version are byte-identical | ||
| # across platforms, so they're vendored ONCE at the prebuilt root (not per | ||
| # platform). CREATE EXTENSION at default_version reads vector--<version>.sql | ||
| # directly; the vector--A--B.sql upgrade scripts only run on ALTER EXTENSION | ||
| # UPDATE, never on a fresh embedded DB, so we don't ship them. | ||
| PGVECTOR_SQL_VERSION="${PGVECTOR_VERSION#v}" | ||
| cp "${WORK}/pgvector/vector.control" "${OUT_DIR}/" | ||
| cp "${WORK}/pgvector/sql/vector--${PGVECTOR_SQL_VERSION}.sql" "${OUT_DIR}/" | ||
| cp "${WORK}/pgvector/vector.control" "${SHARED_DIR}/" | ||
| cp "${WORK}/pgvector/sql/vector--${PGVECTOR_SQL_VERSION}.sql" "${SHARED_DIR}/" |
There was a problem hiding this comment.
Clear previously vendored shared SQL/control files before staging the pinned version.
The script rewrites platform output but not the shared root. On a version bump, old vector--*.sql can persist and get re-uploaded via the wildcard path, so artifacts become non-deterministic.
Suggested patch
SHARED_DIR="${PKG_ROOT}/prebuilt"
rm -rf "${OUT_DIR}"
mkdir -p "${OUT_DIR}" "${SHARED_DIR}"
+rm -f "${SHARED_DIR}/vector.control"
+rm -f "${SHARED_DIR}"/vector--*.sql
# Stage straight from the build dir — we run `make` but not `make install`, so
# nothing lands in the OS Postgres's pkglibdir/sharedir. The library, control
@@
PGVECTOR_SQL_VERSION="${PGVECTOR_VERSION#v}"
cp "${WORK}/pgvector/vector.control" "${SHARED_DIR}/"
cp "${WORK}/pgvector/sql/vector--${PGVECTOR_SQL_VERSION}.sql" "${SHARED_DIR}/"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| SHARED_DIR="${PKG_ROOT}/prebuilt" | |
| rm -rf "${OUT_DIR}" | |
| mkdir -p "${OUT_DIR}" | |
| mkdir -p "${OUT_DIR}" "${SHARED_DIR}" | |
| # Stage straight from the build dir — we run `make` but not `make install`, so | |
| # nothing lands in the OS Postgres's pkglibdir/sharedir. The library, control | |
| # file, and generated SQL all sit under the cloned/built pgvector tree. | |
| # | |
| # Compiled extension library — name differs per platform ($(DLSUFFIX)). | |
| # Compiled extension library is the only PLATFORM-SPECIFIC artifact → goes in | |
| # prebuilt/<platform>/ (name differs per platform via $(DLSUFFIX)). | |
| cp "${WORK}/pgvector/vector"*.so "${OUT_DIR}/" 2>/dev/null || true | |
| cp "${WORK}/pgvector/vector"*.dylib "${OUT_DIR}/" 2>/dev/null || true | |
| # Fall back to an installed copy only if the build dir somehow lacks the lib. | |
| if ! ls "${OUT_DIR}"/vector.* >/dev/null 2>&1; then | |
| cp "${PKGLIBDIR}/vector".* "${OUT_DIR}/" | |
| fi | |
| # Control + the full-install SQL for the pinned version only. CREATE EXTENSION | |
| # at default_version reads vector--<version>.sql directly; the vector--A--B.sql | |
| # upgrade scripts are only for ALTER EXTENSION ... UPDATE, which never runs on a | |
| # fresh embedded DB, so we don't ship them. `vector.control` is checked into the | |
| # repo; `sql/vector--<version>.sql` is generated by `make`. | |
| # Control + the full-install SQL for the pinned version are byte-identical | |
| # across platforms, so they're vendored ONCE at the prebuilt root (not per | |
| # platform). CREATE EXTENSION at default_version reads vector--<version>.sql | |
| # directly; the vector--A--B.sql upgrade scripts only run on ALTER EXTENSION | |
| # UPDATE, never on a fresh embedded DB, so we don't ship them. | |
| PGVECTOR_SQL_VERSION="${PGVECTOR_VERSION#v}" | |
| cp "${WORK}/pgvector/vector.control" "${OUT_DIR}/" | |
| cp "${WORK}/pgvector/sql/vector--${PGVECTOR_SQL_VERSION}.sql" "${OUT_DIR}/" | |
| cp "${WORK}/pgvector/vector.control" "${SHARED_DIR}/" | |
| cp "${WORK}/pgvector/sql/vector--${PGVECTOR_SQL_VERSION}.sql" "${SHARED_DIR}/" | |
| SHARED_DIR="${PKG_ROOT}/prebuilt" | |
| rm -rf "${OUT_DIR}" | |
| mkdir -p "${OUT_DIR}" "${SHARED_DIR}" | |
| rm -f "${SHARED_DIR}/vector.control" | |
| rm -f "${SHARED_DIR}"/vector--*.sql | |
| # Stage straight from the build dir — we run `make` but not `make install`, so | |
| # nothing lands in the OS Postgres's pkglibdir/sharedir. The library, control | |
| # file, and generated SQL all sit under the cloned/built pgvector tree. | |
| # | |
| # Compiled extension library is the only PLATFORM-SPECIFIC artifact → goes in | |
| # prebuilt/<platform>/ (name differs per platform via $(DLSUFFIX)). | |
| cp "${WORK}/pgvector/vector"*.so "${OUT_DIR}/" 2>/dev/null || true | |
| cp "${WORK}/pgvector/vector"*.dylib "${OUT_DIR}/" 2>/dev/null || true | |
| # Fall back to an installed copy only if the build dir somehow lacks the lib. | |
| if ! ls "${OUT_DIR}"/vector.* >/dev/null 2>&1; then | |
| cp "${PKGLIBDIR}/vector".* "${OUT_DIR}/" | |
| fi | |
| # Control + the full-install SQL for the pinned version are byte-identical | |
| # across platforms, so they're vendored ONCE at the prebuilt root (not per | |
| # platform). CREATE EXTENSION at default_version reads vector--<version>.sql | |
| # directly; the vector--A--B.sql upgrade scripts only run on ALTER EXTENSION | |
| # UPDATE, never on a fresh embedded DB, so we don't ship them. | |
| PGVECTOR_SQL_VERSION="${PGVECTOR_VERSION#v}" | |
| cp "${WORK}/pgvector/vector.control" "${SHARED_DIR}/" | |
| cp "${WORK}/pgvector/sql/vector--${PGVECTOR_SQL_VERSION}.sql" "${SHARED_DIR}/" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/pgvector-embedded/scripts/build.sh` around lines 51 - 76, The shared
prebuilt dir (SHARED_DIR) can retain old vector.control and vector--*.sql files
across version bumps; before copying the new vector.control and
vector--${PGVECTOR_SQL_VERSION}.sql into SHARED_DIR, remove any existing
vector.control and vector--*.sql in SHARED_DIR (e.g., purge files matching
"vector.control" and "vector--*.sql") so only the pinned SQL/control are staged
and no stale artifacts remain.
| export function hasPrebuilt(platform: string = currentPlatformKey()): boolean { | ||
| return existsSync(join(prebuiltDir(platform), "vector.control")); | ||
| // Platform-specific library + the shared control file (vendored once at root). | ||
| return ( | ||
| hasPrebuiltLibrary(platform) && | ||
| existsSync(join(PREBUILT_ROOT, "vector.control")) | ||
| ); |
There was a problem hiding this comment.
Require shared SQL presence in hasPrebuilt() before reporting artifacts as usable.
Current check validates library + vector.control only. If vector--*.sql is missing, injection still proceeds and the failure is deferred to extension creation at runtime.
Suggested patch
export function hasPrebuilt(platform: string = currentPlatformKey()): boolean {
// Platform-specific library + the shared control file (vendored once at root).
+ const hasSharedSql =
+ existsSync(PREBUILT_ROOT) &&
+ readdirSync(PREBUILT_ROOT).some(
+ (f) => f.startsWith("vector--") && f.endsWith(".sql")
+ );
return (
hasPrebuiltLibrary(platform) &&
- existsSync(join(PREBUILT_ROOT, "vector.control"))
+ existsSync(join(PREBUILT_ROOT, "vector.control")) &&
+ hasSharedSql
);
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/pgvector-embedded/src/index.ts` around lines 49 - 54, The
hasPrebuilt(platform: string = currentPlatformKey()) function currently returns
true if the platform library and the shared control file exist but doesn't
verify the presence of the platform-specific SQL artifact, causing runtime
failures; update hasPrebuilt to also check for the corresponding vector--*.sql
file under PREBUILT_ROOT (e.g., match the platform key to find a file like
"vector--<platform>.sql" or search for files matching "vector--*.sql") before
returning true, using the existing PREBUILT_ROOT constant and helper utilities
(or fs.readdirSync/existsSync) so that hasPrebuiltLibrary(platform) &&
existsSync(join(PREBUILT_ROOT, "vector.control")) && sqlFileExists returns the
final boolean.
|
bug_free 55, simplicity 72, slop 8, bugs 1, 1 blockers Read diff/logs. Typecheck passed. [env] unit suite hit stale connector-sdk dist from connector-deps. Integration failed in embedded PG setup. Exploratory: booted bundled server with DATABASE_URL=file:///tmp/lobu-review-runtime and /health returned 200; embeddings child warned Cannot find module './cjs/index.cjs'. Blockers
Suggested fixes
Full verdict JSON{
"bug_free_confidence": 55,
"bugs": 1,
"slop": 8,
"simplicity": 72,
"blockers": [
"integration suite failed: embedded Postgres test backend aborts during initdb with missing @embedded-postgres native lib symlink (libicudata.77.dylib)"
],
"change_type": "feat",
"behavior_change_risk": "high",
"tests_adequate": false,
"suggested_fixes": [
{
"file": "packages/server/src/__tests__/setup/embedded-postgres-backend.ts",
"line": 46,
"change": "Fix the embedded-postgres test bootstrap so a fresh review/test run can initdb reliably; add a preflight/repair for required native library symlinks or avoid the broken native package resolution path before calling pg.initialise()."
}
],
"notes": "Read diff/logs. Typecheck passed. [env] unit suite hit stale connector-sdk dist from connector-deps. Integration failed in embedded PG setup. Exploratory: booted bundled server with DATABASE_URL=file:///tmp/lobu-review-runtime and /health returned 200; embeddings child warned Cannot find module './cjs/index.cjs'.",
"categories": {
"src": 2950,
"tests": 1516,
"docs": 90,
"config": 302,
"deps": 116,
"migrations": 100,
"ci": 140,
"generated": 922
}
}Local review gate — branch protection can require the |
What
vector.control+vector--0.8.1.sqlare byte-identical across all four platforms (verified by md5) — only the compiled library (vector.so/vector.dylib) is platform-specific. They were vendored 4×, bloating the repo by ~2,750 lines of duplicated install SQL.Move them to the prebuilt root (vendored once); keep only the binary per
prebuilt/<platform>/.Changes
src/index.ts): library fromprebuilt/<platform>/, control/SQL from the shared root.open-prstages binary per-platform + control/SQL once.Validation
build.shreproduces the committed layout (darwin-arm64 binary byte-identical).CREATE EXTENSION vector→ distance query — all pass ('[1,2,3]'::vector <-> '[1,2,4]'::vector = 1).Net: 71 insertions, 2,794 deletions. Follows up #965.
Summary by CodeRabbit
Refactor
Chores