Skip to content

ci: deterministic check for studio/frontend dep removals#5478

Merged
danielhanchen merged 14 commits into
mainfrom
studio/dep-removal-safety-check
May 16, 2026
Merged

ci: deterministic check for studio/frontend dep removals#5478
danielhanchen merged 14 commits into
mainfrom
studio/dep-removal-safety-check

Conversation

@danielhanchen

Copy link
Copy Markdown
Member

Summary

  • Adds a deterministic CI gate that catches accidental breakage when dependencies are removed from studio/frontend/package.json.
  • The gate diffs the new manifest against the PR base, finds every package no longer declared, and confirms each one is either unused everywhere in the repo or still resolvable from the new dependency tree.
  • Backed by a 64-case test suite that covers common removals, adversarial / sneaky removal patterns, and direct classify-function unit cases.

This was motivated by the recent #5438 (drop unused next dep) and #5477 (drop 11 more unused deps). Both are safe today, but only because every removal was hand-verified. This change turns that verification into a repeatable check.

What the check catches

For every removed package, it greps the entire repo (every .ts/.tsx/.js/.jsx/.mjs/.cjs/.html/.htm/.css/.scss/.sass/.json/.jsonc/.md/.mdx/.py/.rs/.toml/.yml/.yaml/.sh/.ps1/.bat/Dockerfile* outside node_modules, dist, build, target, .git, __pycache__, .venv) for these patterns, then feeds an 8-line window into the classifier so multi-line statements are also caught:

  • import X from "pkg" / import { X } from "pkg/sub" (static imports, single or multi-line)
  • import type { X } from "pkg" (type-only imports)
  • import "pkg" (side-effect imports; CSS @import distinguished via lookbehind)
  • import("pkg") (dynamic imports)
  • export * from "pkg" / export { x } from "pkg" / export type { Foo } from "pkg" (re-exports, single or multi-line)
  • require("pkg") and require.resolve("pkg")
  • @import "pkg" (CSS imports)
  • <script src="...pkg"> and <link href="...pkg"> (HTML)
  • /// <reference types="pkg" /> (TypeScript triple-slash)
  • new URL("pkg/...", import.meta.url) patterns
  • CSS url(.../pkg/...) references
  • Backtick template literals containing the package path
  • JSDoc @import("pkg") references
  • Bare quoted-string occurrences in any JS-like file (catches plugin lists in vite.config, tsconfig paths, eslint plugin arrays, shadcn components.json, biome configs, etc.)

Word-boundary safeguards keep foo from matching foobar, lodash from matching lodash-es, and @radix-ui/react-label from matching @radix-ui/react-label-extra. Package names appearing only in plain-text comments or URL tails do not trip the detector.

How the install-graph reachability works

After collecting source references, the checker resolves whether the package would still install after the removal:

  • BFS from the new package.json's declared deps through the lockfile dep graph.
  • Only top-level node_modules/<name> paths count as importable from src/. Nested copies under another package are noted but flagged as not importable from bare specifiers.
  • Reachability is computed from the new package.json, not the lockfile's <root> map, so a stale lockfile cannot give a false OK-via-transitive verdict.

A pip install playwright reference in a Python file or workflow is filtered out so removing the npm playwright is reported correctly.

Hygiene warnings (non-fatal unless --strict)

  • package-lock.json's <root> dep map disagrees with package.json (lockfile drift, run npm install).
  • @types/<X> is declared but <X> is no longer declared anywhere.
  • An import statement in studio/frontend/src/ references a bare specifier that has no matching dep declared in any field of package.json.

Test suite (tests/studio/test_frontend_dep_removal.py)

Three suites, 64 cases total, all passing.

Suite 1: 24 end-to-end edge cases (subprocess-driven, full pipeline)

ID Behaviour Expected
C1 remove next-themes (2 src imports, no transitive parent) FAIL
C2 remove @xyflow/react (root-only, src imports recipe-studio) FAIL
C3 remove katex (kept top-level by @streamdown/math, mermaid, rehype-katex) PASS
C4 remove clsx (kept by streamdown) PASS
C5 remove react (kept by countless peers) PASS
C6 remove @radix-ui/react-slot (kept by radix-ui + @assistant-ui/react) PASS
C7 remove zustand (top-level via @assistant-ui/react; nested xyflow 4.x irrelevant) PASS
C8 multi-remove with mixed safety FAIL [3 pkgs]
C9 remove @huggingface/hub (5 src imports) FAIL
C10 remove tailwind-merge (kept by streamdown) PASS
C11 remove a name that never appeared in the base PASS (no-op)
C12 move @hugeicons/react from deps to devDeps PASS (not a removal)
C13 remove @huggingface/hub AND @xyflow/react together FAIL [both]
C14 remove dexie (multiple src imports) FAIL
C15 remove motion (20+ src imports) FAIL
C16 remove canvas-confetti (used in confetti.tsx) FAIL
C17 remove recharts (used in chart.tsx) FAIL
C18 remove js-yaml (kept by @eslint/eslintrc; @types/js-yaml orphan warning) PASS
C19 remove node-forge (used in providers-api.ts) FAIL
C20 remove @tauri-apps/api (kept by all 5 @tauri-apps plugins) PASS
C21 remove mammoth FAIL
C22 remove unpdf FAIL
C23 remove remark-gfm (kept by streamdown) PASS
C24 remove date-fns (kept by react-day-picker and @base-ui/react) PASS

Suite 2: 28 classify() unit cases (direct function call against hand-crafted snippets)

Covers each pattern shape: static import, side-effect import, dynamic import, require, css_import, html_script, html_link, re_export (4 variants: *, named, type, multi-line), template_literal, new_url, tsc_triple_slash, jsdoc_import, string_literal. False-positive guards: substring collision (foo vs foobar), exact-prefix collision (lodash vs lodash-es), scoped-pkg collision (@radix-ui/react-label vs ...-extra), plain-text comments, URL tails, Python files, markdown links.

Suite 3: 12 adversarial end-to-end cases (write synthetic files, run full script, clean up)

Each case drops a synthetic file under studio/frontend/src/__dep_check_adversarial__/, runs the script as a subprocess against a base that adds a synthetic package, and asserts the outcome. Then cleans up.

ID Sneaky shape Expected
A01 multi-line import { ... } from "removed_pkg" FAIL
A02 export * from "removed_pkg" FAIL
A03 export { foo } from "removed_pkg" FAIL
A04 export type { Foo } from "removed_pkg" FAIL
A05 similar-prefix collision (pkg_e vs pkg_e_extra) PASS (no false positive)
A06 dynamic import FAIL
A07 new URL("pkg/...") FAIL
A08 string-concat dynamic import (genuinely unanalyzable) PASS (acknowledged limit)
A09 usage only inside a JS comment FAIL (conservative; err on safe side)
A10 Python file mentions pkg name PASS (Python cannot import npm)
A11 markdown doc mentions pkg PASS
A12 JSDoc @import("removed_pkg") FAIL

CI integration

studio-frontend-ci.yml gets one new step on pull_request events:

- name: Dependency removal safety check
  if: github.event_name == 'pull_request'
  working-directory: ${{ github.workspace }}
  run: |
    python3 scripts/check_frontend_dep_removal.py --base "origin/${{ github.base_ref }}"
    python3 tests/studio/test_frontend_dep_removal.py

The script reads package-lock.json directly, so it completes in well under a second and runs before npm ci. The full test suite (64 cases) runs end-to-end in a few seconds.

Manual usage

python3 scripts/check_frontend_dep_removal.py
python3 scripts/check_frontend_dep_removal.py --base HEAD~1
python3 scripts/check_frontend_dep_removal.py --base origin/main --strict
python3 scripts/check_frontend_dep_removal.py \
  --base-pkg /tmp/old_package.json --head-pkg studio/frontend/package.json

Exit codes: 0 clean, 1 one or more removed deps still referenced and not reachable, 2 invocation error.

Test plan

  • Script run from a clean branch (no removals) returns [OK] no dependencies removed.
  • Script run against studio/frontend: drop unused dependencies, move type pkg to devDeps #5477's branch reports all 11 removals as OK / OK-via-transitive with reachability paths.
  • All 24 edge cases pass.
  • All 28 classify() unit cases pass.
  • All 12 adversarial end-to-end cases pass.
  • End-to-end FAIL trace for next-themes returns exit 1 with file:line citations of every offending import.
  • CI: studio-frontend-ci runs the new step on this PR (and reports no findings since this PR removes nothing).

Adds a CI gate that catches the common foot-gun: a dep dropped from
studio/frontend/package.json that something in src/ still imports.

scripts/check_frontend_dep_removal.py
  Diffs package.json against a git base ref, collects every package
  no longer declared, and for each one:
    1. Greps the entire repo for any usage pattern (static / dynamic /
       side-effect imports, require, CSS @import, HTML script/link
       src, new URL(), triple-slash references, template literals,
       bare quoted strings in JS-like files).
    2. Resolves whether the package would still install by BFS'ing
       the dep graph in the new lockfile starting from the new
       package.json's declared deps (so a stale lockfile does not
       give false OK-via-transitive results).
    3. Distinguishes top-level node_modules/<name> from nested copies
       under other packages. Bare src/ imports only resolve to the
       top-level path.
    4. Pip-installed playwright references are filtered, so removing
       the npm playwright (CI uses the pip one) is reported correctly.

  Additional hygiene checks (warnings, fail with --strict):
    - lockfile <root> dep map matches package.json (catches drift).
    - @types/X is not orphaned when X is no longer declared.
    - No src/ import points at a package not declared in any field.

tests/studio/test_frontend_dep_removal.py
  24 deterministic cases. Each patches a copy of the head
  package.json, runs the script, and asserts (exit status,
  reported FAIL list). Covers:
    - Genuinely-breaking removals: next-themes, @xyflow/react,
      @huggingface/hub, dexie, motion, canvas-confetti, recharts,
      node-forge, mammoth, unpdf.
    - Safe-via-transitive removals: katex, clsx, react,
      @radix-ui/react-slot, zustand, tailwind-merge, remark-gfm,
      date-fns, js-yaml, @tauri-apps/api.
    - Mixed multi-removal failing on the unsafe entries only.
    - Non-existent / not-in-base names (no-op).
    - Move from deps to devDeps (not a removal).

.github/workflows/studio-frontend-ci.yml
  Runs the checker on pull_request events against
  origin/${{ github.base_ref }}, plus the edge-case suite.
classify() now catches sneaky shapes that an earlier line-only scan
would miss:
  - multi-line `import { a, b } from "pkg"` and the same shape for
    `export { ... } from "pkg"` / `export * from "pkg"` /
    `export type ... from "pkg"`.
  - JSDoc `@import("pkg")` references.
  - Word-boundary fix so `foo` no longer matches `foobar` (subpath gate:
    after the package name we require closing quote or `/`).
  - Negative-lookbehind on `(?<!@)\bimport\b` so CSS `@import "X"` is
    classified as css_import, not side_effect_import.

find_usage() now feeds an 8-line window (4 above / 4 below the grep
hit) into classify() so multi-line import statements are picked up
even though the initial grep is line-based.

tests/studio/test_frontend_dep_removal.py now exercises three suites:
  - 24 edge cases: subprocess-driven, full-pipeline.
  - 28 classify() unit cases: direct function call against hand-crafted
    snippets. Covers static / side-effect / dynamic / require /
    css_import / html_script / html_link / re_export (4 variants) /
    template_literal / new_url / tsc_triple_slash / jsdoc_import /
    string_literal, plus false-positive guards (substring collision,
    plain-text comments, URL path tails, Python files, markdown).
  - 12 adversarial cases: write synthetic files under
    studio/frontend/src/__dep_check_adversarial__/, run the full
    script, then clean up. Confirms multi-line imports, re-exports,
    JSDoc @import, new URL, dynamic imports all FAIL when the
    underlying package is removed.

Current total: 64 / 64 cases pass.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 231d4f12cf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

working-directory: ${{ github.workspace }}
run: |
python3 scripts/check_frontend_dep_removal.py \
--base "origin/${{ github.base_ref }}"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fetch the PR base before diffing dependencies

On pull_request runs, the checkout at the top of this job uses actions/checkout's default shallow fetch, which only fetches the triggering ref/commit, so origin/${{ github.base_ref }} is not guaranteed to exist in the workspace. In that common PR context, this command makes git show origin/main:studio/frontend/package.json fail inside read_pkg_at(), causing the new safety check to exit 2 and block frontend CI before typecheck/build even run; fetch the base ref/history or use github.event.pull_request.base.sha after fetching it.

Useful? React with 👍 / 👎.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new utility script, scripts/check_frontend_dep_removal.py, and a corresponding test suite to prevent breaking changes when removing npm dependencies from the frontend. The script verifies that removed packages are no longer referenced in the codebase and are not required transitively. Review feedback identifies duplicated logic in the main function regarding lockfile handling, suggests explicitly specifying UTF-8 encoding for file reads to ensure cross-platform consistency, and recommends removing the unused lock_resolvable function.

Comment on lines +440 to +443
for file, lineno, content in rows:
if pkg == "playwright" and PIP_PLAYWRIGHT.search(content):
continue
# Try the single-line classify first.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This block contains duplicated code. The line head_lock = read_pkg_file(Path(args.head_lock)) is executed regardless of the condition. Additionally, the --base-lock argument is defined but its value is never used, which can be misleading.

You can simplify this to a single line. If the --base-lock argument is indeed unnecessary, consider removing it from the argument parser as well.

    head_lock = read_pkg_file(Path(args.head_lock))
References
  1. When a condition or calculated value is used across multiple conditional branches, compute it once and reuse the result to ensure consistency and improve maintainability.

Comment on lines +116 to +119
res = subprocess.run(
cmd,
cwd = cwd or REPO_ROOT,
stdout = subprocess.PIPE,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It's a good practice to explicitly specify the encoding when reading text files to avoid issues on different platforms. Since package.json and package-lock.json files are expected to be UTF-8, please add encoding="utf-8".

Suggested change
res = subprocess.run(
cmd,
cwd = cwd or REPO_ROOT,
stdout = subprocess.PIPE,
def read_pkg_file(path: Path) -> dict:
if not path.exists():
return {}
return json.loads(path.read_text(encoding="utf-8"))

Comment on lines +129 to +139
if not out.strip():
return {}
return json.loads(out)


def read_pkg_file(path: Path) -> dict:
if not path.exists():
return {}
return json.loads(path.read_text())


Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function lock_resolvable appears to be unused. To improve maintainability, it's best to remove dead code.

danielhanchen and others added 2 commits May 16, 2026 05:17
Catches the last common false-negative: removing a package whose
bin is only referenced through `package.json` scripts (e.g. dropping
typescript while `"build": "tsc -b && vite build"` calls tsc).

Cross-checked the patterns Vercel/Next.js, Vite, and TanStack use
in their own manifests; the bin/scripts pairing is the one
consumer-side pattern dep checkers commonly miss.

How it works:
  - Build a bin-to-package map from each lockfile entry's `bin`
    field. The map is global so a stale lockfile still resolves
    bins from packages about to be pruned.
  - Tokenize each script value, splitting on `&&`, `||`, `;`, `|`.
    Strip env-var assignments and `npx / pnpx / yarn / pnpm / bunx`
    prefixes, plus `./node_modules/.bin/` and `node_modules/.bin/`
    path prefixes. Look up the leading token in the bin map.
  - Hits are reported as `script_bin` and feed the same reachability
    gate as source imports. A bin still installed transitively
    (e.g. vite via @vitejs/plugin-react peer) is OK-via-transitive;
    an orphaned bin is FAIL.

Test additions:
  - 5 new edge cases: removing vite, typescript, eslint, @biomejs/biome,
    and (@biomejs/biome + @vitejs/plugin-react) together. Correctly
    flags @biomejs/biome and the combo as FAIL while vite / typescript
    / eslint are kept by peers.
  - 8 new classify() unit cases: TypeScript ambient `declare module`,
    namespace imports, combined default+named, default-as-named,
    re-export default (4 forms), `.then()` dynamic imports without
    await, and TypeScript `import()` in type position.

Current total: 29 edge + 36 classify-unit + 12 adversarial = 77 / 77.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17705256c3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread scripts/check_frontend_dep_removal.py Outdated
Comment on lines +574 to +577
if args.base_lock:
head_lock = read_pkg_file(Path(args.head_lock))
else:
head_lock = read_pkg_file(Path(args.head_lock))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use --base-lock when the override is provided

The --base-lock flag is parsed and documented, but this branch reads args.head_lock in both cases, so providing --base-lock has no effect. That breaks the advertised CLI contract for offline/synthetic comparisons and can silently produce results from the wrong lockfile when users think they are overriding the base lock input.

Useful? React with 👍 / 👎.

Comment on lines +95 to +97
python3 scripts/check_frontend_dep_removal.py \
--base "origin/${{ github.base_ref }}"
python3 tests/studio/test_frontend_dep_removal.py

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include checker files in PR path triggers

This new CI step executes scripts/check_frontend_dep_removal.py and tests/studio/test_frontend_dep_removal.py, but the workflow's pull_request.paths filter only includes studio/frontend/** and the workflow file itself. As a result, PRs that modify either checker file can bypass this job entirely, allowing regressions in the new safety gate to merge without running its own CI.

Useful? React with 👍 / 👎.

danielhanchen and others added 3 commits May 16, 2026 05:25
After surveying package.json patterns in 10+ popular repos (React,
Vue/Svelte/Astro/Next.js, Vite, Storybook, TanStack/Query, Tailwind,
ESLint, TypeScript, Prettier, SvelteKit), several config fields in
package.json itself can reference packages by string. My checker
filtered all of package.json out of the string_literal fallback,
so removing a package that is only referenced from one of these
fields was a false negative.

Now covered (new pkg_json_field kind):
  - overrides / resolutions / pnpm.overrides keys
  - pnpm.patchedDependencies keys
  - peerDependenciesMeta keys
  - prettier: "@my/prettier-config" string
  - eslintConfig.extends (string or array)
  - stylelint.extends / stylelint.plugins
  - babel.presets / babel.plugins
  - jest.preset / jest.setupFiles / jest.transform
  - commitlint.extends
  - renovate.extends
  - remarkConfig.plugins
  - any other tool config field whose strings/keys equal the pkg
    name or `pkg/subpath`

False-positive guards (do not flag string values inside):
  - browserslist (browser queries)
  - keywords (free-form strings)
  - engines / engineStrict / packageManager / volta (version pins)
  - files / directories / publishConfig (paths)
  - workspaces (paths/globs)
  - main / module / browser / types / typings / exports / imports /
    bin / man (author-side fields)
  - scripts (already handled separately via scripts_bin_refs)
  - name / version / description / author / repository / homepage etc.

Test additions: new PkgFieldCase suite with 19 cases covering each
tool config field, subpath references, and the 5 false-positive
guards. Combined with the existing 29 edge / 36 classify / 12
adversarial cases, the suite is 96 / 96.
Adds an opt-in dead-dep enumeration to the existing safety check.
Iterates every package declared in studio/frontend/package.json
(all four dep fields combined) and reports each as one of:

  used               at least one detected reference -- in src/, a
                     config file, package.json scripts (bin), a
                     package.json tool-config field (overrides /
                     prettier / eslintConfig / stylelint / babel /
                     jest / commitlint / renovate / etc.), or
                     tsconfig.compilerOptions.types

  unused             no detected reference anywhere

  type_pkg_kept      @types/X where X is still declared (or X = node,
                     always implicit)

  type_pkg_orphan    @types/X where X is no longer declared --
                     candidate for removal alongside X

Wiring:
  - New CLI flag `--enumerate-dead` (off by default).
  - CI workflow now passes `--enumerate-dead` so the report shows on
    every PR run; the report is informational unless `--strict` is
    also set.
  - With `--strict`, unused / type_pkg_orphan entries fail the run.

Tests:
  - 5 new EnumCase scenarios:
    E01 fake dep with no usage -> reported unused
    E02 fake dep imported by a synthetic src file -> reported used
    E03 fake dep referenced only in overrides -> reported used
    E04 @types/X paired with X (also imported) -> kept
    E05 @types/X without X -> orphan

Running the new flag against the current main reproduces exactly the
11 deps PR #5477 removed, validating the heuristic end to end.

Current total: 29 edge + 36 classify + 12 adversarial + 19 pkg-json
field + 5 enumeration = 101 / 101.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2660f408ec

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +135 to +136
if not path.exists():
return {}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Treat missing head lockfile as invocation error

read_pkg_file() returns {} when the --head-lock path is missing, but main() never validates that case before proceeding. That makes the checker silently run without lock metadata and can report false PASS results for removals that are only referenced through npm script bins (for example, removing @biomejs/biome with a bad --head-lock path returns 0 even though biome scripts break). Since the CLI contract says missing files should be an invocation error, this should exit 2 instead of continuing with an empty lock graph.

Useful? React with 👍 / 👎.

actions/checkout uses fetch-depth: 1 by default, so when the
dependency removal check ran `git show origin/main:.../package.json`
the ref wasn't available locally and the script exited 2 with
"could not read base package.json at origin/main:...".

Fetch the single base commit before invoking the check so the
git-show lookup resolves. --depth=1 keeps the extra fetch cheap.
Five issues flagged across gemini and codex:

  * --base-lock argparse arg was defined and advertised in the
    docstring, but main() always read args.head_lock in both branches
    -- the flag did nothing. Dropped the dead arg and the misleading
    docstring line; the lockfile-reachability analysis only needs the
    head lockfile.

  * lock_resolvable() was defined but never called. Removed.

  * read_pkg_file() did not specify an encoding for read_text().
    Added encoding="utf-8" for cross-platform stability.

  * read_pkg_file() returned {} when the path did not exist, so a
    bad --head-lock value silently bypassed the reachability checks
    (false PASS for removals that resolve through npm script bins).
    main() now exits 2 with a clear message when the head lockfile
    is missing, matching the existing behavior for the head pkg.

  * studio-frontend-ci.yml pull_request paths filter only matched
    studio/frontend/** and the workflow file, so PRs that modified
    the checker script or its test could skip this job. Added both
    files to the trigger.
@danielhanchen

Copy link
Copy Markdown
Member Author

Addressing the codex + gemini review:

Source Issue Resolution
codex P1 (resolved) base ref fetch missing 340f16cgit fetch --no-tags --depth=1 origin "${{ github.base_ref }}" before the check
codex P2, gemini HIGH --base-lock defined but unused; duplicated head_lock = read_pkg_file(...) bdf6f84 — dropped the dead arg, simplified to one read, updated docstring
gemini MEDIUM missing encoding="utf-8" in read_pkg_file bdf6f84 — added
gemini MEDIUM lock_resolvable is dead code bdf6f84 — removed
codex P2 read_pkg_file returns {} for missing path, silent false PASS bdf6f84 — main() now exits 2 when --head-lock path is missing, matching the existing behavior for --head-pkg
codex P2 pull_request.paths filter doesn't include checker script + test bdf6f84 — added scripts/check_frontend_dep_removal.py and tests/studio/test_frontend_dep_removal.py to the trigger

python3 tests/studio/test_frontend_dep_removal.py still passes 101/101 against the trimmed code path.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bdf6f84844

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

print()

reachable_paths = reachable_from_head(head_pkg, head_lock) if head_lock else set()
bin_to_pkg = build_bin_to_pkg(head_lock) if head_lock else {}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Track script bins from the base lockfile

When a PR removes a package that is used only through an npm script and updates package-lock.json at the same time, building bin_to_pkg from the head lockfile drops the removed package's bin metadata before scripts_bin_refs() runs. For example, removing @biomejs/biome from both package.json and the lockfile leaves biome:check/biome:fix pointing at a missing binary, but this checker reports [OK] @biomejs/biome because the head lock no longer contains the biome -> @biomejs/biome mapping. This lets the new safety gate miss broken npm scripts for properly updated removals.

Useful? React with 👍 / 👎.

@danielhanchen

Copy link
Copy Markdown
Member Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new script, check_frontend_dep_removal.py, along with a comprehensive test suite to prevent breaking changes when removing npm dependencies. The script identifies removed packages by diffing package.json and verifies they are no longer referenced in the codebase or remain resolvable transitively. Review feedback suggests several improvements to the script's robustness and efficiency, including expanding the import detection regex to cover dynamic imports and require calls, updating the list of Node.js builtins and internal aliases to prevent false positives, and optimizing the repository scanning process to avoid redundant file system traversals.

Comment thread scripts/check_frontend_dep_removal.py Outdated
for f in DEP_FIELDS:
decl.update((head_pkg.get(f) or {}).keys())
# Also: anything tsconfig path-aliases (just '@/...' here) is internal
pattern = r"(?:from|import)\s+['\"]([^'\"./][^'\"]*)['\"]"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The regex pattern used for the reverse check (finding imports without declarations) only identifies static import and from statements. It misses dynamic import() calls and require() statements, which are correctly handled in the primary removal check. This inconsistency might lead to missing undeclared dependencies that are only used dynamically.

Suggested change
pattern = r"(?:from|import)\s+['\"]([^'\"./][^'\"]*)['\"]"
pattern = r"(?:from|import|require)\s*\(?\s*['\"]([^'\"./][^'\"]*)['\"]"

Comment on lines +622 to +623
continue
if pkg_name in {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The internal alias check only accounts for the exact string "@". Most TypeScript configurations in this project likely use "@/*" for path mapping (as hinted by the comment), resulting in specifiers like "@/components/...". These will currently trigger false positive "missing dependency" warnings because they don't match the exact "@" check.

            # Internal aliases like '@/foo' or starts with builtin names
            if pkg_name == "@" or pkg_name.startswith("@/"):
                continue

Comment on lines +623 to +636
if pkg_name in {
"node:fs",
"node:path",
"fs",
"path",
"url",
"stream",
"crypto",
"buffer",
"util",
"events",
"child_process",
}:
continue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The list of Node.js builtin modules is incomplete. Common modules such as os, http, https, and zlib are missing. Additionally, the check does not account for the node: protocol prefix, which is the modern standard for importing builtins. This can lead to false positive warnings during the hygiene check.

            if pkg_name in {
                "assert", "async_hooks", "buffer", "child_process", "cluster",
                "console", "constants", "crypto", "dgram", "diagnostics_channel",
                "dns", "domain", "events", "fs", "http", "http2", "https",
                "inspector", "module", "net", "os", "path", "perf_hooks",
                "process", "punycode", "querystring", "readline", "repl",
                "stream", "string_decoder", "timers", "tls", "trace_events",
                "tty", "url", "util", "v8", "vm", "wasi", "worker_threads", "zlib",
            } or pkg_name.startswith("node:"):
                continue


failures: list[tuple[str, list[Hit]]] = []
for name in removed:
hits = find_usage(name)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Executing a full repository grep for every removed package is inefficient. If multiple dependencies are removed, the entire codebase is scanned repeatedly. This scales poorly, especially when using the --enumerate-dead flag which triggers a scan for every declared dependency (currently ~67). Batching the search into a single grep execution using a combined regex pattern for all target packages would significantly improve performance.

References
  1. To improve efficiency, avoid redundant data iterations. Combine checks and transformations into a single loop and return computed values for callers to reuse.
  2. To improve efficiency, avoid redundant file traversals.

danielhanchen and others added 2 commits May 16, 2026 10:09
Eight P1s and three P2s surfaced across 10 codex reviewers; this
commit addresses all of them.

P1s:

1. Workflow refspec. `git fetch --depth=1 origin <base_ref>` may only
   create FETCH_HEAD in shallow PR checkouts; the checker then dies
   with `fatal: invalid object name 'origin/main'`. Use the explicit
   refspec `<base>:refs/remotes/origin/<base>` so origin/<base> is
   reliably created.

2. `_deps_of()` was counting optional peer dependencies as reachable.
   npm only installs an optional peer when another package declares
   the same dep, so for "is this removed package still in the tree"
   they cannot keep it alive on their own. Skip entries marked
   `optional: true` in `peerDependenciesMeta`.

3. JS-syntactic classifiers (static_import, side_effect_import,
   dynamic_import, require, re_export, jsdoc_import, template_literal,
   tsc_triple_slash, new_url) now gate on file extension. Previously
   only the final string-literal fallback was gated, so a JS-shaped
   string inside a Python fixture or a Markdown code fence triggered
   a false FAIL. Added U37-U40 covering .py / .md / .sh / .yml.

4. HTML `<script src=>` and `<link href=>` patterns now respect a
   package-name boundary so `/node_modules/foo-extra/...` is not
   treated as a usage of `foo`. Added U41-U43.

5. New `find_command_usage()` detects CLI invocations in .sh / .yml
   / .yaml / .ps1 / .bat / Dockerfile* (npx pkg, bunx pkg, pnpm exec
   pkg, yarn dlx pkg, or a bare pkg --flag). Also covers scoped CLI
   packages exposed by their unscoped tail (@biomejs/biome -> biome).

6. `build_bin_to_pkg(head_lock)` was losing the bin -> package map
   for packages the PR correctly removed from the lockfile, so
   `scripts.biome:check` no longer flagged when @biomejs/biome was
   being dropped. Now also read the base lockfile (via `git show` or
   the new `--base-lock` override) and layer its bin map on top for
   any package in the removed set.

7. `--strict` now runs hygiene checks (lockfile sync, @types
   orphans, undeclared imports, dead-deps) on the no-removal path
   too. Previously the early return at "[OK] no dependencies removed"
   skipped them, so `--strict` silently passed on a tree with
   uncommitted lockfile drift or unused deps.

8. Removed `@types/X` packages are now matched against the runtime
   target name `X`: `/// <reference types="X" />`, tsconfig
   compilerOptions.types entries, AND runtime `import "X"` shapes.
   Handles the npm scope encoding (`@types/foo__bar` -> `@foo/bar`).

P2s:

9. CSS `url(...)` now accepts both quoted and unquoted forms (added
   U44-U45). The previous regex required `/{pkg}/` after a slash,
   missing bare-package urls like `url(katex/fonts/x.woff2)`.

10. `find_imports_without_decl()` now covers all static-import
    shapes: `import "pkg"`, `import Foo from "pkg"`,
    `import { Foo } from "pkg"`, `import type { Foo } from "pkg"`,
    `await import("pkg")`, `require("pkg")`.

11. (Same as #8.) Removed `@types/X` is also linked to runtime
    imports of `X`, not just type-only references.

Test suite expanded from 101 to 110 cases; all pass. Real-world
enumerate-dead still flags the same 11 unused packages on
studio/dep-removal-safety-check (matches PR 5477's removal set).
@danielhanchen

Copy link
Copy Markdown
Member Author

Addressing the 10x reviewer run (commit d0a5d5a). All 8 P1s and 3 P2s land in one focused commit; test suite expanded from 101 to 110 cases, all green.

# Pri Finding Fix
1 P1 workflow git fetch --depth=1 origin <base> may only update FETCH_HEAD explicit refspec <base>:refs/remotes/origin/<base>
2 P1 _deps_of() counted optional peers as reachable skip entries marked optional: true in peerDependenciesMeta
3 P1 JS import classifiers ran on Python/Markdown/shell/YAML gated all JS-syntactic patterns on SCRIPT_LIKE_EXT (added U37-U40)
4 P1 HTML <script> / <link> matched /node_modules/foo-extra/ as usage of foo added path-segment boundary (?:/[^'\"#?]*)?(?=['\"#?]) (U41-U43)
5 P1 npx pkg / bunx pkg / pnpm exec pkg in shell/yml/ps1/Dockerfile ignored new find_command_usage(), includes scoped-CLI tail bin names (@biomejs/biome -> biome)
6 P1 build_bin_to_pkg(head_lock) lost bin->pkg map for packages the PR correctly removed layer build_bin_to_pkg(base_lock) on top for removed packages; new --base-lock override flag (re-introduced and now actually wired)
7 P1 --strict skipped hygiene checks on no-removal path hygiene runs unconditionally before the early no-removal return
8 P1 removed @types/X not matched against runtime X find_types_runtime_usage() checks import "X", /// <reference types="X" />, and compilerOptions.types: ["X"], with npm scope decoding (@types/foo__bar -> @foo/bar)
9 P2 CSS url(...) missed unquoted bare-package paths extended regex (U44-U45)
10 P2 find_imports_without_decl() only matched import "pkg", not import Foo from "pkg" new pattern covers all static-import shapes plus require() and await import()
11 P2 (subsumed by #8) same fix

Local verification:

python3 tests/studio/test_frontend_dep_removal.py
  29/29 edge cases pass
  45/45 classify-unit cases pass
  12/12 adversarial cases pass
  19/19 package.json-field cases pass
  5/5 enumeration cases pass

python3 scripts/check_frontend_dep_removal.py --base origin/main --enumerate-dead
  [OK] no dependencies removed
  unused (11): @assistant-ui/react-markdown, @assistant-ui/react-streamdown,
               @langchain/core, @radix-ui/react-{checkbox,label,select,separator},
               @streamdown/cjk, date-fns, playwright, remark-gfm

Real-world enumeration still flags the exact set PR #5477 was originally dropping, so the new gates don't introduce false positives on this repo.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8d1b21c0ed

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread scripts/check_frontend_dep_removal.py Outdated
Comment on lines +533 to +537
if first in {"npx", "pnpx", "yarn", "pnpm", "bunx"} and idx + 1 < len(
words
):
idx += 1
first = words[idx]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Parse pnpm exec and yarn dlx script wrappers correctly

Update the wrapper parsing in scripts_bin_refs() to skip subcommands before resolving the binary name. Right now the code advances only one token for pnpm/yarn, so pnpm exec biome check is interpreted as binary exec and yarn dlx eslint as binary dlx, which never map back to the removed package. This creates a false PASS when a dependency is only referenced via those script forms. (For example, pnpm help exec shows usage pnpm ... exec <command>, so the executable token is after exec, not immediately after pnpm.)

Useful? React with 👍 / 👎.

danielhanchen and others added 2 commits May 16, 2026 11:05
Three blockers from the parallel Opus review batch:

1. scripts_bin_refs ignored every script that began with a wrapper.
   The original "first non-env token wins" heuristic credited
   cross-env / dotenv / dotenvx / env-cmd as the bin, so a script like
   `cross-env CI=1 biome check` left @biomejs/biome looking unused.
   Rewrote into _next_real_bin(), which peels env prefixes, the
   leading package-manager runner (npx / pnpx / bunx / pnpm exec /
   yarn dlx), and the known wrapper bins (with --/-flag-arg handling)
   before returning the real CLI. shlex tokenization preserves quoted
   env values like `FOO="a b"`.

2. enumerate_dep_usage skipped find_command_usage. The non-enumerate
   path already credited deps used only from CI / Dockerfile / shell
   scripts, but `--enumerate-dead` did not, so packages referenced
   only from a workflow were silently listed as dead. Added the same
   call (gated against @types/* to avoid the unscoped-tail false
   positive).

3. classify multi-line window was ±4 lines. Prettier formats long
   named-import lists one identifier per line, so a 20-import block
   pushed the `import` keyword out of the window and the dep dropped
   to the string-literal fallback (or worse, was missed entirely).
   Widened to ±25 -- still bounded enough to keep false-positives
   negligible, wide enough for the realistic Prettier ceiling.

Tests: added 10 _next_real_bin unit cases + 4 scripts_bin_refs
end-to-end cases (W01-W10 + I01-I04) and a 22-identifier multi-line
import adversarial case (A13). Full suite: 125/125.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 418b63b7cd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +1139 to +1141
# CLI invocations in shell scripts / workflows / Dockerfiles.
hits.extend(find_command_usage(name))
# @types/X is "used" if X is referenced as a type or as a

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Skip command-bin detection for removed @types packages

The removal loop always calls find_command_usage(name) even when name is an @types/* package, but find_command_usage() derives candidate binaries from the package tail (for @types/npm, it scans for npm). That makes unrelated workflow/shell prose or commands containing that word count as a usage and can incorrectly fail safe removals of type-only deps that are not runtime CLIs. In practice this produces false FAIL results for synthetic removals like @types/npm, so dependency cleanup PRs can be blocked even when no TypeScript/runtime reference exists.

Useful? React with 👍 / 👎.

@danielhanchen

Copy link
Copy Markdown
Member Author

Addressing the 4x Opus reviewer batch (commit 6b70e72, pre-commit.ci formatter pass on top in 418b63b).

Per the earlier direction, P1s + P2s only.

# Pri Finding Fix
1 P2 scripts_bin_refs only looked at the first non-env token. A script like cross-env CI=1 biome check credited cross-env instead of biome, so a removed @biomejs/biome looked unused. dotenv / dotenvx / env-cmd had the same hole, and quoted env values (FOO="a b" biome) broke the naive split. Rewrote as _next_real_bin(): peels env-prefix tokens (env-prefix run), the leading package-manager runner (npx / pnpx / bunx / pnpm exec / yarn dlx), and the known wrapper bins (with -- and -flag <arg> handling). shlex.split preserves quoted env values.
2 P2 enumerate_dep_usage did not call find_command_usage, while the non-enumerate path did. A dep referenced only from a workflow / Dockerfile / shell script was silently listed as dead under --enumerate-dead. Mirrored the call (gated against @types/* to avoid the unscoped-tail false positive: a removed @types/foo should not match invocations of foo).
3 P2 Multi-line classify window was ±4 lines. Prettier formats a 20+ named-import block one identifier per line, so import and from "pkg" were >4 lines apart and the dep dropped to the string-literal fallback or was missed. Widened to ±25. Still bounded enough that false-positives are negligible, wide enough for the realistic Prettier ceiling.

Tests: 10 _next_real_bin unit cases + 4 scripts_bin_refs end-to-end cases (W01-W10 + I01-I04) and a 22-identifier multi-line import adversarial case (A13). Full local suite: 125/125.

The three P3 comment-accuracy items about Bun lockfile migration wording on #5479 were left for a follow-up — the defensible policy ("skip Bun when no bun.lock is committed so a release install never silently generates one") is correct even if local Bun 1.3.11 does migrate. That work merged into the bun-removal commit on #5479 (the entire branch is now gone, so the wording becomes moot).

@danielhanchen danielhanchen merged commit 44989ea into main May 16, 2026
23 checks passed
@danielhanchen danielhanchen deleted the studio/dep-removal-safety-check branch May 16, 2026 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant