
refactor(ci): unify release pipeline under publish.yml (closes #1609) #1610

Merged: magyargergo merged 14 commits into main from ci/unify-publish-workflow on May 16, 2026

@magyargergo (Collaborator) commented May 15, 2026

Summary

Collapse release-candidate.yml into publish.yml so there is exactly one workflow that publishes gitnexus to npm, creates GitHub Releases, and triggers Docker builds — for both release candidates and stable releases.

Closes #1609 architecturally: a single publisher cannot race itself. Two-workflow design is gone.

Why draft

This is a high-risk release-infrastructure change. The first real RC after merge is the live-fire test for steps the pre-merge dry_run rehearsal cannot exercise (App-token mint, atomic tag push, real npm publish, GitHub Release creation, Docker invocation). Merge only after the checklist below is green.

Commits on this branch

  1. 36414e0d — initial unification (route → rc-guard → ci → publish → docker; trigger filter; vtag integrity gate; per-job permissions)
  2. 919acab7 — zizmor findings fixed (persist-credentials, inline auth, env-passthrough, explicit docker secrets)
  3. 34f6f0a9 — npm Trusted Publishing + GitHub App tokens + dropped secrets: inherit on ci.yml
  4. 446b9ffd — RC release commit authored as the App (<slug>[bot]), not github-actions[bot]
  5. 820cefae — multi-agent code-review findings (see ledger below)

Architecture in one paragraph

publish.yml listens on three triggers: push to main (RC mode); push of tags matching ['v*', '!v*-rc.*'] (stable mode; the negative glob keeps the workflow from re-triggering on the RC tags it pushes itself); and workflow_dispatch (RC mode, main only, with a dry_run rehearsal input). A first-stage route job classifies the event as rc or stable and fails closed on malformed or unrecognised shapes. RC path: rc-guard (dedup marker + release-PR skip) → ci.yml → publish (App token mint, separate checkout per mode, version resolve, atomic tag push, vtag integrity gate, npm publish --tag rc via OIDC, GitHub prerelease, if: failure() cleanup) → docker.yml. Stable path: same up to ci.yml, then verify the tag version against package.json, npm publish --tag latest via OIDC, and a stable GitHub Release (no Docker build; Docker stays RC-only by design).
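The route job's classification rules can be modelled as a small pure function. The sketch below is a hypothetical Python model, not the shell that runs in publish.yml. It makes three simplifications:

- `classify` and `RouteError` are illustrative names.
- The temporary dry_run input is ignored.
- The strict `vX.Y.Z` stable-tag regex is an assumption. The trigger filter itself only requires `v*` minus `v*-rc.*`.

```python
import re

# Assumption: stable tags are plain vMAJOR.MINOR.PATCH. The real trigger
# filter is the looser glob pair ['v*', '!v*-rc.*'].
STABLE_TAG_REF = re.compile(r"refs/tags/v\d+\.\d+\.\d+")


class RouteError(ValueError):
    """Raised for any event/ref shape the router does not recognise."""


def classify(event_name: str, ref: str) -> str:
    """Map a triggering event to 'rc' or 'stable', failing closed otherwise."""
    if event_name == "push" and ref == "refs/heads/main":
        return "rc"
    if event_name == "push" and STABLE_TAG_REF.fullmatch(ref):
        return "stable"
    if event_name == "workflow_dispatch":
        if ref == "refs/heads/main":
            return "rc"
        raise RouteError(f"workflow_dispatch is only accepted on main, got {ref!r}")
    # Everything else (other branches, rc tags, malformed tags, other events)
    # is rejected instead of being routed to a default mode.
    raise RouteError(f"unrecognised event/ref shape: {event_name} {ref}")
```

The function has no default branch: it either returns a mode or raises. That is what "fails closed" means here. An event shape nobody anticipated stops the run instead of falling through to whichever mode happens to be the fallback.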

What's in the workflow

  • Self-trigger prevention. What: tags: ['v*', '!v*-rc.*'] negative glob. Why: without it, every RC publish double-fires (the #1609 bug). The header comment names the invariant; if a future prerelease channel is added, the glob list MUST be extended in lock-step.
  • Per-mode auth. What: two distinct actions/checkout steps gated by route.mode (no conditional token: expression). Why: a conditional token: expression that can evaluate to an empty string has ill-defined behavior, and a || github.token fallback silently degrades a missing token. The two-step pattern fails loud at checkout.
  • vtag integrity gate. What: a regex check that fails closed on an empty or mode-mismatched vtag before any Release/Docker step. Why: prevents softprops/action-gh-release from creating a Release named main from a github.ref fallback. It also exercises a synthetic vtag in dry-run to catch regex regressions before live-fire.
  • Per-job permissions. What: workflow-level permissions: {} (deny-all) plus per-job minimums. Why: each job declares exactly what it needs; id-token: write appears only on jobs doing OIDC.
  • workflow_dispatch guardrails. What: rejected on non-main refs; dry_run rejected on main. Why: defense in depth against retained-input abuse after merge.
  • npm Trusted Publishing. What: OIDC handshake; no NODE_AUTH_TOKEN. Why: eliminates the NPM_TOKEN secret after the first publish. Provenance is attached automatically.
  • GitHub App token. What: actions/create-github-app-token@v3.2.0, replacing a fine-grained PAT. Why: short-lived (~1h), not tied to a user, with an audit trail via App installation events.
  • Bot-identity attribution. What: the RC release commit is authored as <app-slug>[bot]; the user id is resolved via gh api /users/<slug>[bot] with a 3-attempt retry. Why: the retries absorb propagation delay for a newly installed App and transient 5xx errors. Requires Metadata: read on the App.
  • Partial-failure auto-cleanup. What: an if: failure() step in publish deletes the v-tag and rc-marker. Why: closes the window in which external consumers (Renovate, Dependabot, Releases RSS) could ingest a phantom version.
  • ACTIONS_STEP_DEBUG hardening. What: set +x wraps the auth-header compute. Why: prevents tracing the base64-encoded App token during the one line between computing it and registering it with ::add-mask::.
  • dry_run mechanical merge-blocker. What: .github/scripts/check-no-dry-run-on-main.py, wired into ci-quality.yml. Why: it greps publish.yml for any inputs.dry_run reference and fails CI, enforcing the rehearsal-removal contract instead of relying on memory. This check intentionally fails on this PR until the final cleanup commit lands.
  • Trusted callee contracts. What: docker.yml's workflow_call.secrets: declares DOCKERHUB_USERNAME and DOCKERHUB_TOKEN explicitly; the ci.yml call drops secrets: inherit (the chain uses zero secrets, verified by grep). Why: the callee's secret surface is auditable from the caller, replacing the blanket inherit pattern.
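The vtag integrity gate reduces to a per-mode full-string match that refuses empty input. This is a minimal sketch with the following assumptions:

- The `vX.Y.Z` and `vX.Y.Z-rc.N` patterns are hypothetical. The real regexes live in publish.yml and are not quoted here.
- `check_vtag` and `VtagError` are illustrative names.

```python
import re

# Hypothetical patterns; the real gate's regexes live in publish.yml.
VTAG_PATTERNS = {
    "rc": re.compile(r"v\d+\.\d+\.\d+-rc\.\d+"),
    "stable": re.compile(r"v\d+\.\d+\.\d+"),
}


class VtagError(ValueError):
    """The computed vtag is unusable; no Release or Docker step may run."""


def check_vtag(mode: str, vtag: str) -> str:
    """Return vtag unchanged if it matches the mode's pattern, else raise."""
    pattern = VTAG_PATTERNS.get(mode)
    if pattern is None:
        raise VtagError(f"unknown mode {mode!r}")
    if not vtag:
        # Never fall back to github.ref: that is how a Release named 'main' appears.
        raise VtagError("vtag is empty")
    # fullmatch rejects trailing junk such as a newline, which re.match with '$' would accept.
    if pattern.fullmatch(vtag) is None:
        raise VtagError(f"vtag {vtag!r} does not match mode {mode!r}")
    return vtag
```

Several inputs fail this match: the dry-run sentinel `DRY_RUN_NO_VTAG`, a bare `main`, and a stable-shaped tag in rc mode. Because of that, no such value can reach the Release step.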

Files

  • .github/workflows/publish.yml: full rewrite, then iteratively hardened across the 5 commits above
  • .github/workflows/release-candidate.yml: deleted (commit 1)
  • .github/workflows/docker.yml: workflow_call.secrets: contract declared
  • .github/workflows/ci.yml: comment update (reusable workflow caller renamed)
  • .github/workflows/ci-quality.yml: wires in the dry_run guard script
  • .github/scripts/check-no-dry-run-on-main.py: new; dependency-free Python merge-blocker
  • .github/zizmor.yml: comment update
  • CONTRIBUTING.md: Releases section rewritten for the unified flow; recovery procedures updated (auto-cleanup behavior, release-PR subject pattern, working gh run rerun recovery, GH-Release-failed recovery)
  • README.md: Docker section caller updates

Pre-merge external setup (MERGE-BLOCKERS)

One-time external configuration. Each must be ticked off explicitly.

npm Trusted Publishing

  • On https://www.npmjs.com/package/gitnexus/access → Publishing access → Trusted Publishers → GitHub Actions, register a trusted publisher bound to:
    • Owner: abhigyanpatwari
    • Repository: GitNexus
    • Workflow: publish.yml (the filename, not the workflow display name)
    • Environment: (leave blank)
  • Verify the publisher is listed and active.
  • After the first successful publish via this PR's workflow, delete the NPM_TOKEN repo secret.

GitHub App for tag pushes

  • Create a GitHub App (Settings → Developer settings → GitHub Apps → New GitHub App). Suggested name: gitnexus-release-bot.
  • Permissions: Repository → Contents: Read and write, Repository → Workflows: Read and write, Repository → Metadata: Read (required for the bot-user-id lookup).
  • Install the App on this repository only.
  • Generate and download a private key (PEM).
  • In repo Settings → Secrets and variables → Actions:
    • Add secret RELEASE_APP_ID = the App's numeric ID.
    • Add secret RELEASE_APP_PRIVATE_KEY = the PEM content of the private key.
  • After the first successful RC via this PR's workflow, delete the RELEASE_PUSH_TOKEN PAT and repo secret.

Pre-merge rehearsal checklist

  • actionlint clean. Local run passes; CI confirms.
  • workflow_dispatch with dry_run: true against this branch:
    • mode == 'rc' in route logs
    • rc-guard.should_run == 'true' on a clean SHA
    • App token mint succeeds (requires the App setup above)
    • computed rc_version agrees with manual npm view gitnexus versions --json math
    • vtag integrity gate prints rehearsal: rc regex would accept synthetic vtag ✓
    • no real npm publish, no tag push, no Release, no Docker side effect
  • dry_run cleanup commit lands BEFORE merge. Remove the input and every if: inputs.dry_run != 'true' gate. The check-no-dry-run-on-main.py step in ci-quality.yml is the mechanical gate; CI stays red until this commit lands. Search for DRY_RUN_REMOVE_BEFORE_MERGE in publish.yml to find every site.
  • Branch-protection audit. Repo Settings → Branches → main → required status checks. Verify no check name references the deleted "Release Candidate" workflow or its job names. Update or remove orphaned references.
  • zizmor re-scan clean. PR security tab shows 0 open zizmor alerts on the latest commit.
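The rc_version item above compares the workflow's answer with hand-computed math over `npm view gitnexus versions --json`. One plausible version of that math is sketched below. It rests on these assumptions:

- The base version is passed in. How the workflow derives it is not described here.
- Numbering starts at `rc.0`, which is what semver's prerelease increment produces for a fresh base.
- `parse_versions` and `next_rc` are hypothetical helpers.

```python
import json
import re


def parse_versions(raw: str) -> list[str]:
    """Parse `npm view <pkg> versions --json` output.

    npm can print a bare JSON string instead of an array when a package has
    exactly one published version, so both shapes are normalised here.
    """
    data = json.loads(raw)
    return [data] if isinstance(data, str) else list(data)


def next_rc(base: str, published: list[str]) -> str:
    """Return '<base>-rc.N', one past the highest rc already published for base."""
    pattern = re.compile(re.escape(base) + r"-rc\.(\d+)")
    numbers = [int(m.group(1)) for v in published if (m := pattern.fullmatch(v))]
    # Assumption: numbering starts at rc.0, like semver's prerelease increment.
    return f"{base}-rc.{max(numbers) + 1 if numbers else 0}"
```

The rc numbers are compared as integers. This matters once a base reaches rc.10: sorted as strings, `rc.10` would rank below `rc.9`.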

What dry_run does NOT validate (the live-fire surface)

  • App token used end-to-end against origin (mint succeeds in dry-run; push doesn't happen)
  • Atomic tag-push + marker write
  • npm publish via OIDC trusted publishing (real registry call)
  • softprops/action-gh-release with dynamic tag_name from publish.outputs.vtag
  • docker.yml invocation with explicit DOCKERHUB_* secrets passthrough
  • The !v*-rc.* self-trigger prevention. A tag push runs the publish.yml found in the tagged commit, and RC tags only point at release commits built from main once this PR has merged. So no pre-merge rehearsal pushes a tag against the new filter.
  • The if: failure() cleanup step (only fires on a real partial failure)

Code-review findings ledger

Round 2 of /ce-code-review ran 10 reviewers in parallel: correctness, testing, maintainability, project-standards, agent-native, learnings-researcher (always-on); security, reliability, adversarial, previous-comments (conditional). 20 distinct findings surfaced; 14 fixed inline (commit 820cefae); 6 deferred to follow-ups.

Fixed inline: release-PR regex case-insensitivity; bot user-id gh api retry + Metadata permission doc; broken docker recovery doc snippet; partial-failure cleanup step; KTD/S-ID dangling references; dry_run mechanical merge-blocker; faithful dry-run pack; vtag-gate synthetic-vtag regex check; App-token TTL invariant comment; npm view stderr-grep harmonization; ACTIONS_STEP_DEBUG hardening; GitHub Release recovery doc; curated npx semver errors; dry-run vtag sentinel.

Deferred to follow-up issues (none block this PR), including: future prerelease-channel CI lint; concurrency on head_sha vs ref; Node-heredoc extraction to .github/scripts/ with vitest coverage; ci.yml secrets future-drift lint.

Rollback

git revert of the unification commit (36414e0) restores release-candidate.yml and the old publish.yml. State left on origin and npm by a partial RC publish is now mostly auto-cleaned by the if: failure() step, which deletes the v-tag and rc-marker. Manual fallback steps for the rare cases auto-cleanup cannot reach are in CONTRIBUTING.md → Releases. Important: do NOT delete NPM_TOKEN or RELEASE_PUSH_TOKEN until after the first successful publish via the new workflow, because they are the revert path.

Collapse release-candidate.yml into publish.yml so there is exactly one
workflow that publishes gitnexus to npm, creates GitHub Releases, and
triggers Docker builds — for both release candidates and stable releases.

Closes the double-publish failure mode from #1609 at the architectural
level: a single publisher cannot race itself. The previous design had
release-candidate.yml push v<X.Y.Z>-rc.<N> tags that re-fired publish.yml's
'v*' trigger, causing npm E403 on every RC.

The unified publish.yml routes between two modes via a first-stage 'route'
job. Key design decisions (see plan for full rationale):

  KTD-1 Self-trigger prevention via negative-glob on the 'tags' filter:
        '- v*' followed by '- !v*-rc.*'. Header comment documents the
        invariant; any new prerelease channel must extend the exclusion
        list in lock-step.
  KTD-4 Two distinct actions/checkout steps gated by route.mode. The RC
        step requires RELEASE_PUSH_TOKEN (fails loud and early if unset);
        the stable step omits 'token:' entirely. Eliminates the empty-
        string-token footgun.
  KTD-5 vtag integrity gate runs after publish and before Release / Docker.
        Fails closed on empty or mode-mismatched vtag — no fallback to
        refs/heads/main as Release name.
  KTD-6 Workflow-level deny-all permissions; each job declares its
        minimum. Annotation-injection sanitization on any logged ref.
  KTD-7 workflow_dispatch rejected on non-main refs. Documented authz
        gap re GitHub Environments deferred.
  KTD-8 secrets: inherit retained for docker.yml with explicit security
        rationale (Docker stays RC-only, surface does not expand).
  KTD-9 ci.yml invocation uses secrets: inherit (matching the prior
        release-candidate.yml contract).
  KTD-10 release-candidate.yml deleted in this same commit. No
         deprecation period — keeping both alive re-introduces the race.
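KTD-1's negative glob relies on GitHub evaluating filter patterns in order: the last matching pattern decides whether a ref is included. The sketch below approximates that rule with Python's `fnmatch` for slash-free tag names. It is a model for reasoning about the filter, not GitHub's matcher: GitHub's `*` does not cross `/`, while fnmatch's does. `tag_triggers` is a hypothetical name.

```python
from fnmatch import fnmatchcase

TAG_FILTERS = ["v*", "!v*-rc.*"]


def tag_triggers(tag: str, filters: list[str] = TAG_FILTERS) -> bool:
    """Approximate GitHub's ordered include/exclude evaluation for a tag name.

    Patterns are checked in order and the last one that matches decides:
    a plain pattern includes the tag, a '!'-prefixed pattern excludes it.
    """
    triggered = False
    for pattern in filters:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatchcase(tag, glob):
            triggered = not negated
    return triggered
```

Under this model `v1.2.3-beta.0` still triggers the workflow. That is why any new prerelease channel needs its own exclusion added in lock-step.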

A temporary workflow_dispatch 'dry_run' input is wired into all side-
effect steps for pre-merge rehearsal. It is rejected on refs/heads/main
by the route job; even so, it MUST be removed before final merge — the
PR checklist gates on 'grep -c inputs.dry_run .github/workflows/publish.yml'
returning 0.

Plan: docs/plans/2026-05-15-001-refactor-unify-publish-workflow-plan.md

vercel Bot commented May 15, 2026

The latest updates on your projects:

gitnexus: deployment Ready, updated May 16, 2026 6:19am (UTC)


Five comment threads on .github/workflows/publish.yml, all marked Fixed.

Five zizmor alerts on PR #1610; four fixed in code, one resolved
structurally by tightening the reusable-workflow contract.

  artipacked (3 instances) — actions/checkout was persisting credentials
    in .git/config by default on three checkout steps. The rc-guard and
    stable-mode checkouts now set `persist-credentials: false` (no pushes
    happen from those paths). The RC checkout also sets
    `persist-credentials: false`; the subsequent atomic tag push now
    supplies auth inline via `http.extraheader` (mirroring the pattern
    already in pr-autofix-apply.yml). The base64-encoded header is
    masked alongside the raw token.

  template-injection — the stable-mode "Set vtag" step interpolated
    `${{ github.ref_name }}` directly into the shell source. Routed
    through `env: REF_NAME` instead, eliminating the template-expansion
    path even though git ref names are constrained by naming rules.

  secrets-inherit — replaced `secrets: inherit` on the docker.yml call
    with an explicit secrets passthrough. docker.yml's workflow_call
    block now declares the two secrets it actually consumes
    (DOCKERHUB_USERNAME, DOCKERHUB_TOKEN); GITHUB_TOKEN remains
    implicit. The callee's secret surface is now auditable from the
    caller without enumeration drift.
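The inline-auth pattern works in two steps:

1. The token is base64-encoded into a basic-auth `AUTHORIZATION` header, using the `x-access-token:<token>` shape that actions/checkout uses for GitHub tokens.
2. The header is passed to a single git invocation with `-c http.https://github.com/.extraheader=...`, so nothing is persisted to `.git/config`.

The helper names below are hypothetical, and the exact form used in pr-autofix-apply.yml and publish.yml may differ.

```python
import base64


def basic_auth_value(token: str) -> str:
    """Base64 credential in the shape actions/checkout uses for GitHub tokens."""
    return base64.b64encode(f"x-access-token:{token}".encode()).decode()


def mask_lines(token: str) -> list[str]:
    """Workflow commands that mask both the raw token and its encoded form."""
    return [f"::add-mask::{token}", f"::add-mask::{basic_auth_value(token)}"]


def atomic_push_argv(token: str, remote: str, refs: list[str]) -> list[str]:
    """git argv that supplies auth for one invocation; nothing is written to .git/config."""
    header = f"AUTHORIZATION: basic {basic_auth_value(token)}"
    return [
        "git",
        "-c", f"http.https://github.com/.extraheader={header}",
        "push", "--atomic", remote, *refs,
    ]
```

Keeping the header in a `-c` override rather than `git config` is what makes `persist-credentials: false` meaningful for the push step. `--atomic` makes the push all-or-nothing: either every listed ref updates or none do.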
…ecrets

Three best-practice upgrades surfaced by the release-pipeline audit.

  npm Trusted Publishing (GA 2025-07-31)
    - Drop `env: NODE_AUTH_TOKEN` (empty string would break OIDC fallback;
      the var must be unset, not blanked).
    - Drop the explicit `--provenance` flag (registry auto-attaches it on
      trusted-publisher publishes).
    - `id-token: write` permission retained for the OIDC exchange.
    - Prerequisite: register the package as a trusted publisher on
      npmjs.com bound to this repo + publish.yml. Once configured, the
      NPM_TOKEN repo secret can be deleted entirely.

  GitHub App token replaces RELEASE_PUSH_TOKEN PAT
    - New `actions/create-github-app-token@v3.2.0` step mints a short-lived
      (~1h) installation token before the RC checkout.
    - Token is consumed by `actions/checkout` (with `persist-credentials:
      false`) and by the inline `http.extraheader` at git-push time.
    - Same fine-grained permission surface (Contents: write + Workflows:
      write), not tied to a user seat, organizationally auditable.
    - Prerequisite: create the GitHub App, install on this repo with the
      required permissions, and set `vars.RELEASE_APP_ID` (numeric ID,
      not sensitive) + `secrets.RELEASE_APP_PRIVATE_KEY` (PEM).

  Drop `secrets: inherit` from the ci.yml call
    - Verified by grep: ci.yml and its entire reusable-workflow chain
      (ci-quality, ci-tests, ci-e2e, ci-scope-parity, ci-report) reference
      zero `secrets.*` values. The inherit was passing through nothing.
    - GITHUB_TOKEN is implicit and remains available.
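The "verified by grep" claim can be reproduced with a deliberately coarse scan for `secrets.<NAME>` references. This is a hypothetical sketch. Like a plain grep it may over-report (matches in comments count), which is the safe direction for an audit. `secrets: inherit` is not matched because it uses a colon rather than a dot.

```python
import re
from pathlib import Path

# Coarse, grep-like: any secrets.NAME token anywhere in the file.
SECRET_REF = re.compile(r"\bsecrets\.([A-Za-z_][A-Za-z0-9_]*)")


def secret_refs(text: str) -> set[str]:
    """Return the set of secret names referenced in one workflow's text."""
    return set(SECRET_REF.findall(text))


def scan(paths: list[Path]) -> dict[str, set[str]]:
    """Map each workflow file to the secret names it references (empty sets omitted)."""
    found = {}
    for path in paths:
        names = secret_refs(path.read_text(encoding="utf-8"))
        if names:
            found[str(path)] = names
    return found
```

Running `scan` over ci.yml and the five reusable workflows it calls should return an empty dict if the chain really uses no secrets.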

Refs the audit at PR #1610.
github-actions Bot (Contributor) commented May 15, 2026

CI Report

All checks passed

Pipeline Status

  • ✅ Typecheck: success (tsc --noEmit)
  • ✅ Tests: success (unit tests, 3 platforms)
  • ✅ E2E: success (gitnexus-web changes only)

Test Results

  • Tests: 9081 · Passed: 9080 · Failed: 0 · Skipped: 1 · Duration: 454s

✅ All 9080 tests passed

1 test skipped:
  • buildTypeEnv > known limitations (documented skip tests) > Ruby block parameter: users.each { |user| } — closure param inference, different feature

Code Coverage

Tests

  • Statements: 78.32% (28853/36836) 🟢
  • Branches: 66.72% (18299/27423) 🟢
  • Functions: 83.1% (2882/3468) 🟢
  • Lines: 81.6% (26044/31914) 🟢

Base and Delta: N/A (no baseline).


…tions[bot]

The detached release commit was being authored as github-actions[bot]
(a leftover from the GITHUB_TOKEN era). Now that the App mints the
token and pushes the tag, the commit should carry the App's identity
so PR / release / blame views attribute the action correctly.

Resolves the bot user-id at runtime via `gh api /users/<slug>[bot]`
since actions/create-github-app-token does not expose the numeric ID
directly. Constructs the canonical
  <id>+<slug>[bot]@users.noreply.github.com
noreply email shape.
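The lookup and email construction, combined with the 3-attempt retry that a later commit added, can be sketched as below. This sketch makes three assumptions:

- `lookup_user_id` stands in for `gh api /users/<slug>[bot] --jq .id`. It is injected so the retry logic can be exercised without the network.
- The function names are hypothetical.
- The fixed 2-second delay is an illustrative choice.

```python
import time
from typing import Callable


def bot_identity(
    slug: str,
    lookup_user_id: Callable[[str], int],
    attempts: int = 3,
    delay_seconds: float = 2.0,
) -> tuple[str, str]:
    """Return (name, noreply email) for an App's bot user, retrying the lookup."""
    login = f"{slug}[bot]"
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            user_id = lookup_user_id(login)
            return login, f"{user_id}+{login}@users.noreply.github.com"
        except Exception as exc:  # e.g. a 404 during App propagation, or a 5xx
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise RuntimeError(
        f"could not resolve user id for {login} after {attempts} attempts; "
        "check that the App is installed and has Metadata: read"
    ) from last_error
```

Catching every exception is deliberately broad for a sketch. The commit describes curated errors for three failure classes (App propagation delay, missing Metadata: read, transient 5xx), which a real implementation would distinguish.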
Round 2 of /ce-code-review surfaced 20 findings; applying the 14 with
concrete fix shapes. Six items (KTD-1 future-channel lint, concurrency
on head_sha vs ref, extraction of Node heredocs to .github/scripts/ with
vitest, ci.yml secrets-drift lint, three other defense-in-depth items)
are deferred — they need design decisions or non-trivial new files that
should land in follow-up PRs to keep this PR's blast radius bounded.

Correctness / safety:
  • rc-guard release-PR-skip regex is now case-insensitive via
    `shopt -s nocasematch`. `Chore: Release v1.2.3` (IDE auto-cap)
    would have slipped through and re-opened the #1609 failure class
    under unification.
  • Bot user-id resolution (`gh api /users/<slug>[bot]`) now has a
    3-attempt retry with curated error output naming the three real
    failure classes (newly-installed App propagation, missing
    Metadata: read permission, transient 5xx). App-permissions
    comment block now lists Metadata: read explicitly.
  • App-token TTL is now a documented invariant tied to
    `timeout-minutes`. Comment lives on the publish job declaration.
  • New `if: failure()` cleanup step in the publish job auto-deletes
    the v-tag and rc-marker on post-tag-push failure, eliminating the
    external-consumer phantom-version ingestion window. Cleanup uses
    the same App token + inline http.extraheader auth as the original
    push, so the credential never lands on disk.
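The case-insensitive release-PR skip can be illustrated with a regex compiled with `re.IGNORECASE`, which plays the role of `shopt -s nocasematch`. The subject shape used here (`chore: release vX.Y.Z (#NNNN)`) is an assumption. It is built from the `Chore: Release v1.2.3` example and the `(#NNNN)` suffix mentioned in the docs commit. The authoritative pattern is the one documented in CONTRIBUTING.md, and `is_release_pr_commit` is a hypothetical name.

```python
import re

# Hypothetical subject shape; the exact pattern lives in CONTRIBUTING.md.
RELEASE_PR_SUBJECT = re.compile(
    r"chore: release v\d+\.\d+\.\d+ \(#\d+\)",
    re.IGNORECASE,  # plays the role of `shopt -s nocasematch`
)


def is_release_pr_commit(subject: str) -> bool:
    """True if a commit subject looks like a squashed release PR."""
    return RELEASE_PR_SUBJECT.fullmatch(subject.strip()) is not None
```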

Hardening:
  • Wrapped the inline `http.extraheader` compute in `set +x` /
    conditional re-enable so ACTIONS_STEP_DEBUG can't trace the
    base64-encoded auth header for the one line between compute and
    `::add-mask::` registration.
  • Harmonized the three `npm view` stderr-grep patterns to one shape
    (`grep -qiE 'E404|not found'`). Prevents divergent error
    classification across the three callers.
  • Wrapped both `npx semver -i` invocations in `semver_bump()` with
    stderr capture and a curated error message naming the increment kind
    and the current version. Bare npx errors were opaque on registry failures.
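The harmonised `grep -qiE 'E404|not found'` check amounts to a three-way split of `npm view` outcomes. A hypothetical Python equivalent follows. Treating any other non-zero exit as a hard error is an assumption about how the callers should react; the commit does not spell this out.

```python
import re

# Same alternation, case-insensitively, as `grep -qiE 'E404|not found'`.
NOT_FOUND = re.compile(r"E404|not found", re.IGNORECASE)


def classify_npm_view(returncode: int, stderr: str) -> str:
    """Classify an `npm view` result as 'found', 'missing', or 'error'."""
    if returncode == 0:
        return "found"
    if NOT_FOUND.search(stderr):
        return "missing"  # not published yet: an expected state, not a failure
    return "error"  # network, auth, or registry trouble: should fail the job
```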

Rehearsal fidelity:
  • `Apply rc version in-CI` now runs in dry-run too — the subsequent
    `Dry-run publish` pack reflects the intended rc version instead of
    the un-bumped working tree.
  • vtag integrity gate's dry-run path now exercises the regex against
    a synthetic vtag built from `steps.rc-version.outputs.rc_version`.
    Previously the gate's core check was never run in any rehearsal —
    a regex regression would only surface on the first live RC.
  • vtag output in dry-run is now a sentinel `DRY_RUN_NO_VTAG`
    instead of empty, preventing future composition traps where a
    `vtag != ''` consumer silently succeeds in rehearsal.

Maintainability:
  • Stripped all KTD-N and S-ID references from inline comments. They
    pointed at a local-only plan and at claude-mem observations that
    aren't in the repo — pure dangling references for future readers.
    Each site now carries adjacent prose that explains the WHY.
  • Removed the plan-doc path from the file header. #1609 stays as
    the durable external pointer.
  • Tagged the dry_run input declaration with a `DRY_RUN_REMOVE_BEFORE_MERGE`
    banner so search-and-remove is mechanical.

Mechanical merge-blocker:
  • Added `.github/scripts/check-no-dry-run-on-main.py` (dependency-
    free Python, matches repo convention). Greps publish.yml for any
    `inputs.dry_run` reference and exits 1 with remediation guidance
    if found.
  • Wired into ci-quality.yml's workflow-convention job. The check
    fires on every PR including this one — CI will stay red until the
    final cleanup commit lands, forcing the rehearsal-removal contract
    rather than relying on maintainer memory.
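A dependency-free merge-blocker of the kind described can be very small. The sketch below is hypothetical and is not the deleted script's source. It reports every line of publish.yml that mentions `inputs.dry_run` and returns exit status 1 with remediation guidance.

```python
import sys
from pathlib import Path

NEEDLE = "inputs.dry_run"


def find_refs(text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for every line mentioning the needle."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if NEEDLE in line
    ]


def main(path: str = ".github/workflows/publish.yml") -> int:
    """Print each offending site to stderr; return 1 if any were found, else 0."""
    hits = find_refs(Path(path).read_text(encoding="utf-8"))
    for number, line in hits:
        print(f"{path}:{number}: {line}", file=sys.stderr)
    if hits:
        print(
            f"{len(hits)} '{NEEDLE}' reference(s) found; remove the dry_run input "
            "and every gate that reads it before merging.",
            file=sys.stderr,
        )
        return 1
    return 0
```

A script entry point would call `sys.exit(main())`. Reporting line numbers keeps the output pointing at exact sites, in the same spirit as the `DRY_RUN_REMOVE_BEFORE_MERGE` banner.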

Docs:
  • CONTRIBUTING.md: documented the exact release-PR subject pattern
    the rc-guard recognizes (case-insensitive, with the `(#NNNN)`
    suffix). Replaced the broken `gh workflow run docker.yml` recovery
    snippet with the actual working `gh run rerun <run-id> --failed`.
    Added a recovery snippet for the "npm published but GitHub Release
    failed" partial state.
User stored the App ID as a repo secret (alongside the private key)
rather than as a variable. Minimum-friction fix is to switch the
workflow expression from `vars.RELEASE_APP_ID` to
`secrets.RELEASE_APP_ID`. App IDs are technically not sensitive but
storing as a secret is harmless and avoids mixing storage classes for
the same App.
magyargergo marked this pull request as ready for review May 15, 2026 12:38
The route classifier rejected workflow_dispatch on any ref other than
refs/heads/main, AND the separate 'Reject dry_run against main' step
rejected dry_run=true on main. Net effect: dry_run could never run
successfully anywhere — exactly the contract gap the agent-native
reviewer flagged as AN-4.

Pre-merge rehearsal MUST run on the PR branch because the workflow file
with the new logic doesn't exist on main yet. Updated Classify so:

  dispatch  ref            dry_run  → result
  -------- -------------- -------  ------
  ok       refs/heads/*    true     → mode=rc (rehearsal, all side-effects skipped)
  ok       refs/heads/main false    → mode=rc (real publish)
  reject   refs/heads/main true     → rejected by 'Reject dry_run against main' step
                                       (defense against retained input post-merge)
  reject   refs/heads/*    false    → rejected by Classify (no real publishes off main)
  reject   non-branch ref  any      → rejected by Classify

The two complementary gates close every direction.
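The rehearsal-era matrix can be written as one function, which makes it easy to check that every row of the table is covered. This is a hypothetical model that folds the Classify step and the 'Reject dry_run against main' step into a single `dispatch_gate`. In the workflow they were separate steps, and the final cleanup commit removed both rules.

```python
def dispatch_gate(ref: str, dry_run: bool) -> str:
    """Rehearsal-era workflow_dispatch rules from the table above.

    Returns the route mode, or raises ValueError naming the gate that rejects.
    """
    if not ref.startswith("refs/heads/"):
        raise ValueError(f"Classify: non-branch ref {ref!r}")
    on_main = ref == "refs/heads/main"
    if dry_run and on_main:
        raise ValueError("Reject dry_run against main: retained input post-merge")
    if not dry_run and not on_main:
        raise ValueError("Classify: real publishes only from main")
    # Either a rehearsal off main (all side effects skipped) or a real publish on main.
    return "rc"
```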
First rehearsal run revealed the gate was blocking itself: the
`workflow_dispatch` rehearsal flow invokes ci.yml via publish.yml,
and the unconditional `Block dry_run from merging to main` step
inside ci-quality.yml failed the workflow_dispatch run before the
publish job could exercise its rehearsal logic.

Scope the gate to the events that actually need it:
  pull_request               → fires (merge-blocker, the whole point)
  push to refs/heads/main    → fires (regression catch if anything slipped through)
  workflow_dispatch          → skipped (rehearsal)
  push to refs/tags/v*       → skipped (stable releases, dry_run already removed)

The check still mechanically enforces removal before merge: any PR
containing inputs.dry_run references will have a red required check.
Rehearsal via workflow_dispatch now proceeds end-to-end through the
publish job's dry-run path.
The v3.2.0 release deprecated `app-id` in favor of `client-id`. The
input accepts both the numeric App ID and the OAuth-style Client ID.
Rehearsal run 25919563064 surfaced the deprecation warning; harmless
today, future versions will remove the alias.
The workflow_dispatch dry_run input and every `inputs.dry_run` reference
served as a pre-merge rehearsal affordance for this unification PR. The
rehearsal completed successfully on run 25919563064 — all dry-run-
exercisable steps validated, all side-effect steps correctly skipped.

This commit removes:

  publish.yml
    • The dry_run workflow_dispatch input + DRY_RUN_REMOVE_BEFORE_MERGE
      marker comment block.
    • Route output passthrough and dry_run-aware branch in Classify.
    • The 'Reject dry_run against main' defense-in-depth step (no
      longer needed once the input is gone).
    • Per-step dry_run gates on Apply rc version, Create and push rc
      tags, Publish to npm, Create GitHub Release, Cleanup pushed tags
      on partial failure, and the docker job.
    • The vtag-integrity-gate's synthetic-rehearsal branch + DRY_RUN
      env var. The gate now runs only its real validation path.

  ci-quality.yml
    • The 'Block dry_run from merging to main' step (the mechanical
      merge-blocker), now obsolete.

  .github/scripts/check-no-dry-run-on-main.py
    • Auxiliary guard script, deleted.

Net: 215 lines removed, 11 added. The workflow now contains only the
load-bearing release logic. The first real RC after merge is the
live-fire test for the steps dry_run could not exercise (atomic tag
push, real npm publish via OIDC, GitHub Release creation, docker.yml
invocation under explicit secrets passthrough, the if: failure()
cleanup step).

Closes the rehearsal contract on this PR.
magyargergo merged commit 83fbd4b into main May 16, 2026 (34 checks passed)
magyargergo deleted the ci/unify-publish-workflow branch May 16, 2026 06:46
magyargergo added a commit that referenced this pull request May 16, 2026
First live-fire RC publish after #1610 failed at npm publish with E404. The if: failure() cleanup correctly auto-deleted the partial v-tag and rc-marker, but OIDC never engaged. Root cause: two coordinated upstream bugs.

1. actions/setup-node@v6 with registry-url: writes _authToken into the runner .npmrc AND exports NODE_AUTH_TOKEN from its token: input (defaulting to github.token). npm publish sends GITHUB_TOKEN as the bearer and the registry returns 404. OIDC never tried because npm thinks it already has a credential. See actions/setup-node#1440.

2. The Node 22 runner ships with npm 10.9.x. npm Trusted Publishing OIDC support requires npm >= 11.5.1.

Fix: omit registry-url: from the setup-node step (per the consensus workaround in community discussion #176761), and add npm install -g npm@latest before publish. --provenance flag is NOT added; npm auto-attaches provenance under Trusted Publishing.

Sources:
- actions/setup-node#1440
- https://github.com/orgs/community/discussions/176761
- https://docs.npmjs.com/trusted-publishers/
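A cheap guard against the second root cause is to check the npm version before publishing. An old bundled npm then fails with a clear message instead of an opaque E404. Below is a hypothetical sketch of the comparison against the 11.5.1 floor. Two details matter:

- It compares numeric tuples, because a string comparison would rank `11.10.0` below `11.5.1`.
- It treats a prerelease such as `11.5.1-pre` as `11.5.1`, which is a simplification.

```python
MIN_NPM_FOR_OIDC = (11, 5, 1)


def npm_supports_trusted_publishing(version: str) -> bool:
    """True if an `npm --version` string meets the 11.5.1 floor for OIDC publishing."""
    core = version.strip().split("-", 1)[0]  # simplification: ignore any prerelease suffix
    parts = tuple(int(p) for p in core.split("."))
    return parts >= MIN_NPM_FOR_OIDC
```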
hohaivu pushed a commit to hohaivu/GitNexus that referenced this pull request May 19, 2026
…ri#1610)

Collapse release-candidate.yml into publish.yml so there is exactly one workflow that publishes gitnexus to npm, creates GitHub Releases, and triggers Docker builds — for both release candidates and stable releases. Closes abhigyanpatwari#1609 architecturally.

A first-stage `route` job classifies push-to-main / push-tag / workflow_dispatch into `rc` / `stable` modes and fails closed on malformed shapes. RC path runs rc-guard → ci.yml → publish (mint GitHub App token → checkout with persist-credentials:false → resolve next rc version → atomic v-tag + rc/<SHA> marker push → vtag integrity gate → npm publish via OIDC → GitHub prerelease → if: failure() cleanup) → docker.yml. Stable path verifies package.json matches the tag and publishes to `latest` via OIDC (no docker).

Hardening:

  • Self-trigger prevention via negative-glob `tags: ['v*', '!v*-rc.*']` — the bug class behind abhigyanpatwari#1609 cannot recur.
  • Two distinct actions/checkout steps per mode (no conditional `token:` expression footgun).
  • Workflow-level `permissions: {}` deny-all + per-job grants; `id-token: write` only where OIDC is used.
  • npm Trusted Publishing replaces NPM_TOKEN (delete the secret after the first successful publish).
  • GitHub App installation token (actions/create-github-app-token@v3.2.0) replaces the long-lived RELEASE_PUSH_TOKEN PAT (delete after first successful RC).
  • vtag integrity gate fails closed on empty / mode-mismatched output (prevents Release named `main` from a github.ref fallback).
  • Annotation-injection sanitization on every logged ref.
  • Explicit `secrets:` passthrough on docker.yml (DOCKERHUB_USERNAME, DOCKERHUB_TOKEN); ci.yml no longer inherits anything.
  • `if: failure()` cleanup auto-deletes v-tag + rc-marker on partial failure (eliminates the external-consumer phantom-version ingestion window).
  • ACTIONS_STEP_DEBUG window closed via `set +x` wrap on the inline auth-header compute.
  • Curated retry-loud error handling on `gh api` bot-user-id lookup and `npx semver`.

Pre-merge validation:

  • 10-reviewer multi-agent code-review pass; 14 findings fixed inline (commit 820cefa), 6 deferred to follow-ups.
  • End-to-end dry-run rehearsal via workflow_dispatch (run 25919563064) validated route classification, rc-guard, App token mint, RC checkout, version resolver, vtag synthetic-regex check, and faithful tarball pack at the bumped version.
  • All zizmor findings on the unification commits closed.
  • Branch-protection required checks all green.

Post-merge actions:

  • After the first successful RC, delete the `NPM_TOKEN` and `RELEASE_PUSH_TOKEN` secrets — they are no longer used.
  • The first real RC after merge is the live-fire test for steps dry-run could not exercise (atomic tag push, real npm OIDC handshake, GitHub Release creation, docker.yml under explicit secrets passthrough). The if: failure() cleanup step handles the partial-failure recovery automatically; the Rollback Runbook in CONTRIBUTING.md covers the rare cases auto-cleanup can't reach.
hohaivu pushed a commit to hohaivu/GitNexus that referenced this pull request May 19, 2026
…#1627)

Linked issue (closed by this PR): CI: double-publish of RC tag causes npm E403 in publish.yml