PR5479 cross-platform install validation (throwaway) by danielhanchen · Pull Request #128 · danielhanchen/unsloth-staging-2

danielhanchen · 2026-05-18T12:15:56Z

Throwaway PR on the staging fork to validate unslothai/unsloth#5479's lockfile-strict install path on real ubuntu-latest / macos-14 / windows-latest runners.

This branch carries the PR head plus the pyproject Intel-GPU restore fix, with all upstream workflows stripped and three targeted validation workflows in their place:

.github/workflows/pr5479-validate-linux.yml
.github/workflows/pr5479-validate-macos.yml
.github/workflows/pr5479-validate-windows.yml

Per workflow:

shell or PowerShell syntax check on studio/setup.sh + build.sh (Linux/macOS) or studio/setup.ps1 (Windows)
integrity-hash spot-check across all resolved entries in the three lockfiles
npm ci against studio/, studio/backend/core/data_recipe/oxc-validator/, studio/frontend/
asserts no lockfile or manifest mutation after the installs
lockfile_supply_chain_audit.py positive run + negative run (missing-lockfile must exit 1)
residual sweep for any active bun install / bunx

Not for merge.

build.sh and studio/setup.sh both call naked `bun install` and `npm install`. With caret ranges in package.json (the default for most deps), those commands resolve a fresh minor/patch from the registry if one exists, even though the lockfile pins specific versions. An attacker who hijacks any transitive dep and publishes a malicious patch release can have it pulled into the release build or end-user install without anyone noticing. Both paths now use lockfile-strict mode: bun install -> bun install --frozen-lockfile npm install -> npm ci These install exactly what the committed lockfile pins, verify cryptographic hashes, and fail fast on any drift between package.json and the lockfile. The CI workflows that build the frontend already use `npm ci`; this aligns the local build and end-user setup paths with the same guarantee. Verified `npm ci --no-fund --no-audit --dry-run` exits 0 against the current studio/frontend lockfile (1042 packages, no drift).

Followup to 152fe8d. Three more sites still called naked `bun install` / `npm install`, which honour caret ranges in package.json and can pull a fresh minor/patch of a transitive dep from the registry on the next run. studio/setup.ps1 (4 sites): the Windows end-user installer. bun install -> bun install --frozen-lockfile (both initial and the cache-clear retry); the npm fallback and the OXC validator npm install both -> npm ci. Error messages updated to reference the new command. studio/setup.sh: the OXC validator runtime install for the Unix path was still naked `npm install`. Now `npm ci`. github/workflows/release-desktop.yml: the desktop release build's frontend install was still naked `npm install`. Now `npm ci` so a published .app/.dmg/.AppImage/.msi can never have shipped with a registry-resolved transitive that drifted from the committed lockfile. The pinned Tauri CLI install in the same workflow stays as `npm install --save-dev @tauri-apps/cli@2.10.1` because that line is intentionally adding a specific package to package.json, not syncing from the lockfile. Verified `npm ci --no-fund --no-audit --dry-run` exits 0 against both the studio/frontend and studio/backend/core/data_recipe/ oxc-validator lockfiles.

Followup to 7bb1eb6 (npm install -> npm ci for the oxc-validator runtime install in studio/setup.sh, studio/setup.ps1). That commit worked locally because the lockfile already existed there from an earlier `npm install`, but `npm ci` failed in CI because the lockfile was never committed: npm error EUSAGE npm error The `npm ci` command can only install with an existing npm error package-lock.json or npm-shrinkwrap.json with lockfileVersion >= 1. Root cause: the project-root .gitignore has a bare `package-lock.json` entry left over from a Python-template gitignore. The frontend lockfile was force-added past it; the oxc-validator lockfile never was. So a fresh actions/checkout did not have it. Fix: Force-commit studio/backend/core/data_recipe/oxc-validator/package-lock.json (5 packages, lockfileVersion 3, integrity-pinned). Replace the bare gitignore rule with explicit `!` exceptions for the two committed npm-project lockfiles, with a comment explaining why stray lockfiles in random Python subtrees are still ignored. The pyproject.toml package-data glob `backend/core/data_recipe/oxc-validator/*.json` already pulls the lockfile into the pip-installed wheel; the only gap was that fresh git checkouts (CI) didn't have it.

Collapse the worked-example narrative to one-line WHYs. Code is unchanged.

Two issues codex flagged: 1. bun.lock gate (studio/frontend/.gitignore line 14 ignores bun.lock, so it is never committed). bun install --frozen-lockfile cannot migrate from package-lock.json, so without a bun.lock the bun path always fails. setup.sh then misclassifies that as a corrupt cache, clears the user's bun cache, and re-runs the same guaranteed-failing command before falling back to npm. build.sh, studio/setup.sh, and studio/setup.ps1 now only enter the bun path when bun.lock is present; otherwise we go straight to npm ci. 2. OXC validator lockfile was outside the npm supply-chain scan surface. lockfile_supply_chain_audit.py default, npm audit, OSV, scan_npm_packages.py invocation, and the diff-for-new-install-scripts step all now cover both lockfiles. security-audit.yml pull_request paths filter triggers on changes to either. wheel-smoke checks the built wheel ships the OXC lockfile too. Verified: python3 scripts/lockfile_supply_chain_audit.py > OK: 0 findings across 2 npm + 1 cargo lockfile(s) python3 scripts/scan_npm_packages.py --lockfile oxc-validator/... > OK

Reviewer found a real bug in 0ddfc10: the new two-scan step ran under `set -o pipefail` (default `set -e` from the step shell), so a HIGH or CRITICAL on the frontend lockfile would abort the step before the OXC scan ran. Both reports are most useful exactly when one has already failed. Capture each rc via PIPESTATUS, run both scans unconditionally, write both into the step summary, and only then propagate the worst rc.

Two blockers from the parallel Opus review batch: 1. The Tauri CLI install in release-desktop.yml was the last unfrozen install path: `npm install --save-dev --prefix studio @tauri-apps/cli@2.10.1 --no-fund --no-audit` pins the top-level version but leaves the transitive closure floating, defeats the pre-install lockfile audit (no lockfile to scan), and skips integrity verification. Committed a minimal studio/package.json (devDep @tauri-apps/cli@2.10.1) plus the resolved studio/package-lock.json (12 packages: CLI + 11 platform-native binaries, all with integrity hashes, lockfileVersion 3). Switched the step to `npm ci --prefix studio` and added a pre-install lockfile_supply_chain_audit.py step ahead of it so any tarball postinstall is gated by the structural scan. Allowlisted studio/package-lock.json .gitignore and added it to the audit script's default scan set. 2. The bun branch was dead code in build.sh, studio/setup.sh, and studio/setup.ps1: nowhere in the repo is a bun.lock committed, and `bun install --frozen-lockfile` cannot migrate from package-lock.json. With no lockfile, every entry to the bun path either silently regenerates a bun.lock (under permissive install modes -- a fresh attack surface) or fails outright (under frozen-lockfile). Removed `npm install -g bun` bootstrap, the `_try_bun_install` helper + cache-retry, every `if bun.lock && command -v bun` guard, and the now-unreachable "fall back to npm" messaging. All three scripts now have a single `npm ci` path. bun.lock skip entries in lint-ci.yml + wheel-smoke.yml are kept as forward-compat sanity checks -- they assert bun.lock is NOT shipped / scanned, which is stronger after this commit, not weaker. Smoke-tested locally: `npm ci --prefix studio` resolves 3 packages (CLI + 2 linux native binaries), `npx --prefix studio tauri --version` prints `tauri-cli 2.10.1` exactly. `python3 scripts/lockfile_supply_chain_audit.py` scans 3 npm + 1 cargo lockfiles, 0 findings. `bash -n build.sh`, `bash -n studio/setup.sh`, and a pwsh scriptblock parse of studio/setup.ps1 all succeed.

Brings the parallel CI paths into line with the lockfile-pinned release path and tightens the supply-chain audit surface: studio-tauri-smoke.yml: run lockfile_supply_chain_audit.py before the Tauri CLI install, and install via `npm ci --prefix studio` against the committed studio/package-lock.json (was a mutable `npm install --save-dev` post-audit). This relocates the existing pre-install lockfile supply-chain audit step; the step's name and command are preserved verbatim so its purpose is unchanged, only its position relative to the install. The earlier security rationale about lifecycle scripts and the postinstall-dropper class is preserved on the Frontend build step where it actually applies (vite/esbuild lifecycle scripts run on the frontend install); the Tauri CLI install step gets a new rationale tied to `npm ci` semantics. security-audit.yml: * add studio/package.json and studio/package-lock.json to the PR path filter so a Tauri CLI lockfile change cannot bypass the workflow, * extend OSV-Scanner, scan_npm_packages.py (with LOG3 and exit- code propagation), and the install-script diff to cover studio/package-lock.json, * add an npm audit step for the Tauri CLI holder project, * extend the npm-provenance-and-install-scripts job with --ignore-scripts installs + npm audit signatures for the oxc-validator and Tauri CLI holder projects; the existing frontend audit-signatures step is renamed to "(Studio frontend, informational)" purely for disambiguation against the two new sibling steps, with its log path rerouted through $GITHUB_WORKSPACE so a single artifact upload can collect all three logs, * update the lockfile-audit step summary to list the Tauri CLI holder lockfile, * fix the stale "Initially non-blocking" comment on the now- blocking npm scan-packages step. build.sh and studio/setup.ps1 (oxc): pass --no-fund --no-audit to npm ci for parity with the other call sites. studio/setup.sh and studio/setup.ps1: restore the bun.lock exclusion in the frontend staleness check so a leftover local bun.lock from the migration does not trigger a spurious rebuild. scripts/lockfile_supply_chain_audit.py: emit a HIGH-severity missing-lockfile Finding when a requested lockfile does not exist, so a deleted default cannot silently pass the audit. Uses the script's own Finding accumulator pattern (sibling scripts/scan_npm_packages.py implements the same intent via an rc=2 hard-fail, its single-lockfile-per-invocation idiom; this script aggregates multiple lockfiles so Finding is the natural channel). scripts/check_frontend_dep_removal.py: add studio/package.json and studio/package-lock.json to EXPECTED_NOISE_FILES; the new Tauri CLI holder manifests must not count as frontend dep usage.

When a caller passes an explicit --npm-lockfile or --cargo-lockfile, they are scoping the scan to the paths they listed; the script was still silently grafting the other ecosystem's defaults on top, which meant `--npm-lockfile X` would also audit DEFAULT_CARGO_LOCKFILES. With the missing-lockfile Finding now emitted, that surfaced as a false positive whenever a caller explicitly scoped only one ecosystem. Default fallback is now reserved for the no-args CI invocation, where every default path is expected to exist.

Collapse 4-6 line "why this matters" blocks to 1-2 lines stating the single load-bearing fact in lockfile_supply_chain_audit.py (missing- lockfile finding rationale, CLI default-scoping rationale) and in studio-tauri-smoke.yml (pre-install audit ordering, npm ci semantics, frontend lifecycle-script context). No behaviour change.

The merge of main into ci/frozen-lockfile-installs in 888347d resolved pyproject.toml against the pre-merge branch state, silently dropping the intel-gpu xpu extras that landed on main in f7069b2 and were pinned in e20bbef (unslothai#5499) after this branch was created. None of the lockfile-pin feature commits touch pyproject.toml; this restore reverts only the unintended deletion and keeps the supply-chain hardening intact. Restored extras: intelgputorch271 + intel-gpu-torch271 intelgputorch291 + intel-gpu-torch291 intelgputorch2110 + intel-gpu-torch2110 intelgputorch2120 + intel-gpu-torch2120 Restored the triton-xpu 3.6.0 pin section inside intelgputorch210.

gemini-code-assist

Code Review

This pull request introduces several infrastructure and reliability improvements, including a robust offline mode for Hugging Face model loading, a new per-account and per-IP login rate limiter, and enhanced error handling for llama-server subprocesses. Key changes include DNS-based offline detection to prevent long-running network timeouts, improved shell command sanitization, and a new CopyableErrorChip component for the frontend. My review identified potential thread accumulation in the DNS probe and a security concern regarding the trust model for X-Forwarded-For headers in the rate limiter.

gemini-code-assist · 2026-05-18T12:26:26Z

+def _probe_dns_dead(host: str = "huggingface.co", timeout: float = 2.0) -> bool:
+    """Quick DNS check. Runs on a daemon thread so concurrent sockets
+    in the same process are not affected by socket.setdefaulttimeout."""
+    result: list[Optional[bool]] = [None]
+
+    def _probe() -> None:
+        try:
+            socket.gethostbyname(host)
+            result[0] = False
+        except Exception:
+            result[0] = True
+
+    t = threading.Thread(target = _probe, daemon = True)
+    t.start()
+    t.join(timeout)
+    # Thread still running -> resolver wedged -> treat as dead.
+    return True if result[0] is None else result[0]


The DNS probe creates a new thread on every call. If the DNS resolver is slow or wedged, and this function is called frequently (e.g., during concurrent model loads where the download phase is not locked), it could lead to an accumulation of threads. Consider caching the probe result for a short duration (e.g., 60 seconds) to mitigate this risk.

gemini-code-assist · 2026-05-18T12:26:26Z

+        xff = request.headers.get("x-forwarded-for", "")
+        if xff:
+            # First entry is the originating client.
+            normalized = _normalize_forwarded_addr(xff.split(",", 1)[0])
+            if normalized:
+                return normalized


Trusting the leftmost IP in the X-Forwarded-For header is susceptible to spoofing if the edge proxy does not strip or overwrite incoming X-Forwarded-For headers from the client. An attacker can provide a fake IP in the header to bypass per-IP rate limits. In a typical deployment with a trusted proxy, it is safer to either take the IP from the right side of the chain (the one added by your trusted proxy) or allow the user to configure the number of trusted proxy hops.

Opus reviewer 3 caught a crash path: when a lockfile is unreadable (chmod 000, permission denied, is-a-directory, broken pipe, etc.) the audit script bubbles up a raw Python traceback and exits 1 with no Finding emitted. In CI, a transient permission or encoding error is indistinguishable from a malicious lockfile -- the maintainer reads the log and sees noise, not a structured diagnosis. Wraps both path.read_text() calls (npm and cargo branches) in a try/except OSError that appends a new Finding kind "unreadable-lockfile" with the original errno detail, mirroring how missing-lockfile is handled today. Backward compatible (additive); all 24 existing audit fixtures still pass. Verified by fixture: chmod 000 on a lockfile -> exit 1 with [unreadable-lockfile] finding, no traceback.

…adable-lockfile

Adds bun.lock alongside the existing package-lock.json for: studio/ (Tauri CLI holder, 13 pkgs) studio/backend/core/data_recipe/oxc-validator/ (49 pkgs) studio/frontend/ (1090 pkgs) Generated with `bun install --lockfile-only --save-text-lockfile` against Bun 1.3.11. Each lockfile carries sha512 integrity entries and full platform-conditional optionalDependencies coverage for linux/darwin/win32 x64/arm64/musl variants. Updates .gitignore to mirror the existing package-lock.json opt-in pattern: a top-level `bun.lock` ignore with negations for the three surfaces, plus removal of the legacy `bun.lock` entry in studio/frontend/.gitignore. This unlocks `bun install --frozen-lockfile` as the fast path in setup.sh / setup.ps1 / build.sh (follow-up commit).

PR 5479 dropped Bun support because no bun.lock was committed. With the bun.lock files now committed for all three Studio install surfaces (previous commit), Bun --frozen-lockfile becomes a viable fast path again -- ~5-10x faster than npm ci in practice (measured locally: frontend 1.4s vs 8s, oxc 35ms vs 371ms, studio 14ms vs 444ms). Install logic per surface (studio/setup.sh, studio/setup.ps1, build.sh): 1. If bun.lock exists AND `bun` is on PATH: try `bun install --frozen-lockfile` verify critical binaries are present (tsc + vite for frontend, oxc-parser for oxc validator) -- workaround for the known bun-cache-corruption bug where install can exit 0 but leave binaries missing on validation failure: rm -rf node_modules + clear bun cache, fall through to npm ci 2. Else (no bun.lock, no bun, or bun failed): `npm ci` against the committed package-lock.json Both paths run lockfile-strict, so the install is byte-reproducible from whichever lockfile the chosen package manager understands. The build always runs through Node (`npm run build`) -- avoids bun runtime quirks on some platforms. Bun auto-install via `npm install -g bun` is NOT restored: a user who wants the speed-up installs Bun themselves, and the npm ci path remains the default-available install route. This matches the upstream Studio docs' install instructions.

Adds audit_bun_lockfile() that parses Bun's text lockfile format (bun.lock, lockfileVersion 1) and applies the same supply-chain checks already used on npm and cargo lockfiles: - non-registry-resolved-url (git+/file:/tarball sources) - missing-integrity-hash (no sha-prefixed tail on registry entry) - blocked-known-malicious (BLOCKED_NPM_VERSIONS hit) - known-ioc-string (IOC substring in raw body) - missing-lockfile (path doesn't exist) - unreadable-lockfile (chmod 000 / OSError) - malformed-lockfile (JSONC parse failure) - unsupported-lockfile-version (anything != 1) bun.lock is JSONC (valid JSON with trailing commas allowed); the parser strips trailing commas via a single regex before json.loads. Each package entry is a 4-element array: ["name@version", "<registry-url-or-empty>", {metadata}, "sha512-..."] Default scope adds the three bun.lock paths committed alongside the three package-lock.json paths. New repeatable --bun-lockfile flag mirrors --npm-lockfile and --cargo-lockfile. Verified locally: positive (3 npm + 3 bun + 1 cargo, real PR lockfiles): exit 0 unreadable bun.lock (chmod 000): exit 1, [unreadable-lockfile] malformed bun.lock: exit 1, [malformed-lockfile] missing bun.lock: exit 1, [missing-lockfile]

for more information, see https://pre-commit.ci

danielhanchen and others added 14 commits May 16, 2026 05:34

ci: tighten install-path comments

75a8129

Collapse the worked-example narrative to one-line WHYs. Code is unchanged.

Merge remote-tracking branch 'origin/main' into

d43b0e3

Merge branch 'main' into ci/frozen-lockfile-installs

c244a9d

Merge branch 'main' into ci/frozen-lockfile-installs

888347d

danielhanchen force-pushed the pr5479-validate branch 2 times, most recently from 1575d87 to 45d7761 Compare May 18, 2026 12:19

gemini-code-assist Bot reviewed May 18, 2026

View reviewed changes

danielhanchen force-pushed the pr5479-validate branch from 45d7761 to 3483891 Compare May 18, 2026 13:28

danielhanchen force-pushed the pr5479-validate branch from 3483891 to 33187ec Compare May 19, 2026 02:50

danielhanchen added 4 commits May 19, 2026 02:50

Merge remote-tracking branch 'origin/main' into pr5479-fix-audit-unre…

35a30f1

…adable-lockfile

danielhanchen force-pushed the pr5479-validate branch from 33187ec to 087ce93 Compare May 19, 2026 04:08

pre-commit-ci Bot and others added 3 commits May 19, 2026 04:17

[pre-commit.ci] auto fixes from pre-commit.com hooks

ca0d2fd

for more information, see https://pre-commit.ci

bun: pin version + auto-install + harden cache-corruption recovery

2f7cc2d

pr5479-r5: bun pin + auto-install + recovery validation

89462d3

danielhanchen force-pushed the pr5479-validate branch from 087ce93 to 89462d3 Compare May 19, 2026 04:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PR5479 cross-platform install validation (throwaway)#128

PR5479 cross-platform install validation (throwaway)#128
danielhanchen wants to merge 22 commits into
mainfrom
pr5479-validate

danielhanchen commented May 18, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 18, 2026

Uh oh!

gemini-code-assist Bot May 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

danielhanchen commented May 18, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants