fix(ci): eliminate dev-release tag-vs-downstream race + CI hygiene audit #1827
Dependency Review
✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.
Scanned Files: None
Code Review
This pull request introduces a new pre-commit gate, workflow-tag-lifecycle, implemented via a Python script to prevent race conditions in GitHub Actions workflows where tags are created and subsequently deleted in the same run. It also adds actionlint and zizmor hooks to the pre-commit configuration and updates the project documentation. Feedback focused on improving the robustness of the regular expressions in the new script to better handle multi-line commands and alternative CLI flags.
    _TAG_CREATE_RE = re.compile(
        r"gh\s+api[^\n]*git/refs[^/\w][\s\S]{0,300}?-f\s+ref=[\"']?refs/tags/",
        re.MULTILINE,
    )
The _TAG_CREATE_RE regex is brittle because it uses [^\n]* between gh api and the endpoint. This prevents matching commands split across multiple lines with backslash continuations (e.g., gh api \ followed by git/refs ... on the next line), which are common in workflow files. It also assumes a specific argument order (endpoint before -f ref=). Consider a more flexible pattern that allows newlines and either argument order within a bounded character window.
Suggested change:
    -_TAG_CREATE_RE = re.compile(
    -    r"gh\s+api[^\n]*git/refs[^/\w][\s\S]{0,300}?-f\s+ref=[\"']?refs/tags/",
    -    re.MULTILINE,
    -)
    +_TAG_CREATE_RE = re.compile(
    +    r"gh\s+api[\s\S]{0,500}?(?:git/refs[^/\w][\s\S]{0,300}?-f\s+ref=[\"']?refs/tags/|-f\s+ref=[\"']?refs/tags/[\s\S]{0,300}?git/refs[^/\w])",
    +    re.MULTILINE,
    +)
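To make the difference between the two create patterns concrete, here is a minimal sketch that runs both against three workflow-style commands. The two regexes are copied from the comment above; the sample commands, the `repos/o/r` path, and the tag name are made up for illustration and are not taken from the repository.

```python
import re

# Original pattern from the PR: [^\n]* forces "gh api" and "git/refs" onto one line.
ORIGINAL_CREATE_RE = re.compile(
    r"gh\s+api[^\n]*git/refs[^/\w][\s\S]{0,300}?-f\s+ref=[\"']?refs/tags/",
    re.MULTILINE,
)

# Pattern proposed in the review suggestion (split over two literals for readability).
SUGGESTED_CREATE_RE = re.compile(
    r"gh\s+api[\s\S]{0,500}?(?:git/refs[^/\w][\s\S]{0,300}?-f\s+ref=[\"']?refs/tags/"
    r"|-f\s+ref=[\"']?refs/tags/[\s\S]{0,300}?git/refs[^/\w])",
    re.MULTILINE,
)

# Hypothetical commands: one line, backslash-continued, and reversed argument order.
SINGLE_LINE = 'gh api repos/o/r/git/refs -f ref="refs/tags/v1" -f sha=abc\n'
CONTINUED = 'gh api \\\n  repos/o/r/git/refs \\\n  -f ref="refs/tags/v1" \\\n  -f sha=abc\n'
REVERSED = 'gh api -f ref="refs/tags/v1" -f sha=abc repos/o/r/git/refs\n'


def matches(pattern: re.Pattern[str], text: str) -> bool:
    """Return True when the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for name, sample in [("single", SINGLE_LINE), ("continued", CONTINUED), ("reversed", REVERSED)]:
        print(name, matches(ORIGINAL_CREATE_RE, sample), matches(SUGGESTED_CREATE_RE, sample))
```

Under these assumptions the original pattern only catches the single-line form, while the suggested one catches all three. The bounded windows ({0,500} and {0,300}) are what keep the wider pattern from linking unrelated commands far apart in the same file.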
    _TAG_DELETE_RE = re.compile(
        r"gh\s+api\s+-X\s+DELETE[^\n]*git/refs/tags/",
        re.MULTILINE,
    )
The _TAG_DELETE_RE regex has the same multi-line limitation because of [^\n]*. It also misses the long-form flag --method DELETE and cases where the method flag appears after the endpoint. Furthermore, it does not account for gh release delete --cleanup-tag, which can also trigger the race condition described in #1818.
Suggested change:
    -_TAG_DELETE_RE = re.compile(
    -    r"gh\s+api\s+-X\s+DELETE[^\n]*git/refs/tags/",
    -    re.MULTILINE,
    -)
    +_TAG_DELETE_RE = re.compile(
    +    r"gh\s+(?:api[\s\S]{0,200}?(?:-X\s+DELETE|--method\s+DELETE)[\s\S]{0,200}?git/refs/tags/|release\s+delete[\s\S]{0,100}?--cleanup-tag)",
    +    re.MULTILINE,
    +)
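The same kind of check can be sketched for the delete patterns. Both regexes are copied from the comment above; the sample commands are hypothetical. One detail this sketch surfaces: the suggested pattern still expects the method flag before the endpoint, so the flag-after-endpoint form mentioned in the comment does not appear to be covered by it.

```python
import re

# Original pattern from the PR: short flag only, single line only.
ORIGINAL_DELETE_RE = re.compile(
    r"gh\s+api\s+-X\s+DELETE[^\n]*git/refs/tags/",
    re.MULTILINE,
)

# Pattern proposed in the review suggestion (split over two literals for readability).
SUGGESTED_DELETE_RE = re.compile(
    r"gh\s+(?:api[\s\S]{0,200}?(?:-X\s+DELETE|--method\s+DELETE)[\s\S]{0,200}?git/refs/tags/"
    r"|release\s+delete[\s\S]{0,100}?--cleanup-tag)",
    re.MULTILINE,
)

# Hypothetical commands; the repository path and tag are invented for the example.
SHORT_FLAG = "gh api -X DELETE repos/o/r/git/refs/tags/v1\n"
LONG_FLAG = "gh api --method DELETE repos/o/r/git/refs/tags/v1\n"
CONTINUED = "gh api \\\n  -X DELETE \\\n  repos/o/r/git/refs/tags/v1\n"
CLEANUP_TAG = "gh release delete v1 --cleanup-tag --yes\n"
FLAG_AFTER_ENDPOINT = "gh api repos/o/r/git/refs/tags/v1 -X DELETE\n"


def matches(pattern: re.Pattern[str], text: str) -> bool:
    """Return True when the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    samples = {
        "short flag": SHORT_FLAG,
        "long flag": LONG_FLAG,
        "continued": CONTINUED,
        "cleanup-tag": CLEANUP_TAG,
        "flag after endpoint": FLAG_AFTER_ENDPOINT,
    }
    for name, sample in samples.items():
        print(name, matches(ORIGINAL_DELETE_RE, sample), matches(SUGGESTED_DELETE_RE, sample))
```

If the flag-after-endpoint form matters in practice, a second alternation branch with the endpoint first would be one way to cover it; that is a possible extension, not something the suggestion includes.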
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/finalize-release.yml (1)
149-158: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Split error handling for gh release view to distinguish transient failures from missing releases.
At this point both upstream workflows can already be complete, so a transient auth/rate-limit/API failure takes the same proceed=false branch as a genuine missing draft. The final reporter then leaves the commit pending, and there may be no later workflow_run left to retry publication. GitHub CLI exits non-zero on errors, but distinguishing the failure cause requires parsing stderr.
Suggested error split
    -if ! IS_DRAFT=$(gh release view "$TAG" --repo "$GITHUB_REPOSITORY" \
    -    --json isDraft --jq '.isDraft' 2>/dev/null); then
    -  echo "Release $TAG not found -- nothing to publish."
    -  echo "proceed=false" >> "$GITHUB_OUTPUT"
    -  exit 0
    -fi
    +release_err="$(mktemp)"
    +if IS_DRAFT=$(gh release view "$TAG" --repo "$GITHUB_REPOSITORY" \
    +    --json isDraft --jq '.isDraft' 2>"$release_err"); then
    +  :
    +elif grep -qiE 'not found|could not resolve' "$release_err"; then
    +  echo "Release $TAG not found -- nothing to publish."
    +  echo "proceed=false" >> "$GITHUB_OUTPUT"
    +  exit 0
    +else
    +  echo "::error::gh release view failed for $TAG"
    +  cat "$release_err" >&2
    +  exit 1
    +fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/finalize-release.yml around lines 149 - 158, Capture both stdout and stderr when running gh release view for TAG in GITHUB_REPOSITORY (e.g. IS_DRAFT=$(gh release view ... 2>err) or similar), then if gh exits non-zero inspect the captured stderr: if it contains a “not found”/404/release-not-found indicator treat it as a missing draft and write "proceed=false" to GITHUB_OUTPUT, otherwise treat it as a transient/auth/API error and fail the step (exit 1) so the workflow can be retried; keep references to IS_DRAFT, gh release view, TAG, GITHUB_REPOSITORY and proceed=false when implementing the checks.
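The three-way split in the suggested diff (success, missing release, any other failure) can be sketched as a small pure function. This is only an illustration of the decision logic: the function name and return labels are hypothetical, the regex mirrors the grep -qiE 'not found|could not resolve' test from the diff, and the stderr strings used in the example are invented rather than real gh output.

```python
import re

# Mirrors the case-insensitive grep in the suggested diff.
_NOT_FOUND_RE = re.compile(r"not found|could not resolve", re.IGNORECASE)


def classify_release_view(returncode: int, stderr: str) -> str:
    """Classify one release lookup as 'found', 'missing', or 'error'.

    'missing' corresponds to writing proceed=false and exiting 0;
    'error' corresponds to failing the step with exit 1.
    """
    if returncode == 0:
        return "found"
    if _NOT_FOUND_RE.search(stderr):
        return "missing"
    return "error"


if __name__ == "__main__":
    # Invented stderr samples, for illustration only.
    print(classify_release_view(0, ""))
    print(classify_release_view(1, "release not found"))
    print(classify_release_view(1, "HTTP 502: Bad Gateway"))
```

Matching on stderr text is inherently fragile, which is presumably why the diff keeps the pattern loose and treats everything unrecognized as a hard failure rather than a skip.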
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/dev-release.yml:
- Around line 237-252: The "Verify minted tag survived the run" step currently
only runs on success paths; update its if condition to include always() so the
check runs regardless of prior step failures—i.e., modify the step's if
expression for the step named "Verify minted tag survived the run" to prepend
always() (combined with the existing checks on steps.check-tag.outputs.skip,
steps.version.outputs.skip, and steps.tag-exists.outputs.skip) so the
tag-verification block executes even when the job failed.
In @.pre-commit-config.yaml:
- Around line 226-231: The new pre-commit hook entry with id
"workflow-tag-lifecycle" currently only triggers on changes to workflow YAMLs
(files: ^\.github/workflows/.*\.ya?ml$) which allows changes to the checker or
its config to bypass the gate; update the hook's files pattern to also include
the checker and its configuration (e.g., include
scripts/check_workflow_tag_lifecycle.py and the repo pre-commit config or any
related AST/config paths such as .pre-commit-config.yaml) so edits to
scripts/check_workflow_tag_lifecycle.py or the config will re-run the "entry:
uv" hook and prevent regression. Ensure you reference the existing id
"workflow-tag-lifecycle" when making the change.
In `@docs/reference/claude-reference.md`:
- Around line 101-102: Update the "Dev Release" doc paragraph to reflect the
current release-note format: change the described release title from "Dev build
`#N` toward vX.Y.Z" to use the minted tag variable ($DEV_TAG) and replace the
claim that the full commit body is copied with the current behavior that only
the short SHA and the commit subject are written into the --notes-file;
reference the dev-release.yml workflow, the DEV_TAG variable, the git log -1
usage and the gh release create --notes-file path so the wording matches the
actual implementation.
In `@scripts/check_workflow_tag_lifecycle.py`:
- Around line 77-86: _TAG_DELETE_RE currently only matches single-line "gh api
-X DELETE ...git/refs/tags/..." calls and can be bypassed by backslash-newline
continuations; update the regex used to compile _TAG_DELETE_RE so it tolerates
line continuations and newlines between tokens (e.g., allow backslash+newline or
other whitespace between "gh api -X DELETE" and "git/refs/tags/"), or compile
with a flag/construct that permits matching across lines, ensuring the pattern
still anchors to "gh api", "-X DELETE", and "git/refs/tags/" (reference:
_TAG_DELETE_RE).
---
Outside diff comments:
In @.github/workflows/finalize-release.yml:
- Around line 149-158: Capture both stdout and stderr when running gh release
view for TAG in GITHUB_REPOSITORY (e.g. IS_DRAFT=$(gh release view ... 2>err) or
similar), then if gh exits non-zero inspect the captured stderr: if it contains
a “not found”/404/release-not-found indicator treat it as a missing draft and
write "proceed=false" to GITHUB_OUTPUT, otherwise treat it as a
transient/auth/API error and fail the step (exit 1) so the workflow can be
retried; keep references to IS_DRAFT, gh release view, TAG, GITHUB_REPOSITORY
and proceed=false when implementing the checks.
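The .pre-commit-config.yaml finding above is about which changed paths trigger the hook. pre-commit treats `files` as a Python regular expression searched against each changed path, so the gap can be sketched directly. The first pattern is the one quoted in the finding; the widened pattern is a hypothetical rewrite, not the one the PR ended up with.

```python
import re

# Pattern quoted in the review finding: workflow YAML files only.
WORKFLOWS_ONLY = re.compile(r"^\.github/workflows/.*\.ya?ml$")

# Hypothetical widened pattern that also covers the checker and the hook config.
SELF_PROTECTING = re.compile(
    r"^(\.github/workflows/.*\.ya?ml"
    r"|scripts/check_workflow_tag_lifecycle\.py"
    r"|\.pre-commit-config\.yaml)$"
)


def hook_triggers(pattern: re.Pattern[str], changed_paths: list[str]) -> bool:
    """Return True when at least one changed path matches the hook's files pattern."""
    return any(pattern.search(path) for path in changed_paths)


if __name__ == "__main__":
    checker_only = ["scripts/check_workflow_tag_lifecycle.py"]
    print(hook_triggers(WORKFLOWS_ONLY, checker_only))
    print(hook_triggers(SELF_PROTECTING, checker_only))
```

With the narrow pattern, a commit that only weakens the checker never runs the hook; with the widened one it does. Triggering alone is not sufficient if the hook then receives only non-workflow filenames, which is the follow-up issue raised later in this conversation.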
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5b98f6e6-5bca-4888-b5b3-de3b38c02c88
📒 Files selected for processing (8)
.github/workflows/ci.yml
.github/workflows/cli.yml
.github/workflows/dev-release.yml
.github/workflows/finalize-release.yml
.pre-commit-config.yaml
CLAUDE.md
docs/reference/claude-reference.md
scripts/check_workflow_tag_lifecycle.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Lighthouse Site
- GitHub Check: Dashboard Test
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (2)
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Numeric claims in README.md and public docs (docs/index.md, docs/roadmap/index.md, docs/architecture/decisions.md) about test count, release, Mem0 stars, provider count, subagent count must be sourced from data/runtime_stats.yaml via HTML-comment markers <!--RS:NAME-->display value<!--/RS-->
Static historical counts and illustrative scale numbers may carry per-line opt-out: <!-- lint-allow: doc-numeric-macros -- <reason> --> (reason mandatory)
Every implementation plan must be presented to the user for accept/deny before coding; be critical at every phase; surface improvements as suggestions, not silent changes; prioritize by dependency order, not priority labels
Files:
docs/reference/claude-reference.md
docs/**/*.{md,d2,mermaid}
📄 CodeRabbit inference engine (CLAUDE.md)
Use fenced code blocks with language tags: d2 for architecture/nested containers, mermaid for flowcharts/sequence/pipelines; use markdown tables for tabular data; never use text fences with ASCII box-drawing
Files:
docs/reference/claude-reference.md
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:16:26.335Z
Learning: Opt-in telemetry (off by default); every event property must be in `_ALLOWED_PROPERTIES` keyed by event type; unknown keys raise `PrivacyViolationError` and are dropped; never bypass the scrubber
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:16:26.335Z
Learning: Use Git commits with format `<type>: <description>` (feat/fix/refactor/docs/test/chore/perf/ci); enforced by commitizen
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:16:26.335Z
Learning: Signed commits required on protected refs (GPG/SSH signed or GitHub App-signed via `synthorg-repo-bot`); see [github-environments.md](docs/reference/github-environments.md#release_bot_app_)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:16:26.335Z
Learning: Use branch naming `<type>/<slug>` from main; see `.pre-commit-config.yaml` for pre-commit hooks (ruff, gitleaks, hadolint, no-em-dashes, no-redundant-timeout, check-single-migration-per-pr, check-no-modify-migration, no-release-please-token, workflow-shell-git-commits); eslint-web runs at pre-push only
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:16:26.335Z
Learning: Hookify rules: `block-pr-create` (use `/pre-pr-review`), `block-double-push` (5-min throttle when open PR exists), `enforce-parallel-tests` (`-n 8`), `no-cd-prefix`, `no-local-coverage`
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:16:26.335Z
Learning: Pre-push hooks: mypy + pytest (affected modules) + golangci-lint + go vet + go test (CLI) + eslint-web + conditional `orphan-fixtures`/`setting-to-startup-trace` gates; foundational module changes or conftest edits trigger full runs
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:16:26.335Z
Learning: Use GitHub issue queries via `gh issue list`, NOT MCP `list_issues` (unreliable field data)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:16:26.335Z
Learning: Merge strategy: squash; PR body becomes squash commit message; trailers (`Release-As`, `Closes `#N``) must be in PR body to land
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:16:26.335Z
Learning: Preserve existing `Closes `#NNN`` in PR issue references; never remove unless explicitly asked
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:16:26.335Z
Learning: After finishing an issue: branch (`<type>/<slug>`), commit, push; do NOT auto-create a PR; ALWAYS use `/pre-pr-review` to create PRs (gh pr create is hookify-blocked); trivial/docs-only: `/pre-pr-review quick`; after PR exists, `/aurelio-review-pr` handles external reviewer feedback
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:16:26.335Z
Learning: Fix EVERYTHING valid review agents find (including pre-existing issues in surrounding code, suggestions, and adjacent findings); no deferring, no 'out of scope'
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:16:26.335Z
Learning: Any PR establishing or expanding a project-wide convention (error hierarchies, persistence boundary, mock-spec, regional defaults, typed boundary, settings-to-startup wiring, secret-log redaction, API-DTO `extra="forbid"`, no-magic-numbers, no-em-dashes, etc.) MUST include the AST/script gate preventing regression; PRs proposing convention without enforcement are rejected
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:16:26.335Z
Learning: The machine-readable inventory of every MANDATORY paragraph lives in `scripts/convention_gate_map.yaml`; meta-gate `scripts/check_convention_gate_inventory.py` enforces every MANDATORY has either registered gate or explicit `exempt: { reason }` entry; adding new MANDATORY without updating YAML fails pre-push
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
scripts/check_workflow_tag_lifecycle.py
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #1827 +/- ##
=======================================
Coverage 84.76% 84.76%
=======================================
Files 1798 1798
Lines 104306 104306
Branches 9128 9128
=======================================
+ Hits 88417 88418 +1
+ Misses 13675 13673 -2
- Partials 2214 2215 +1
☔ View full report in Codecov by Sentry.
dev-release.yml: add always() to 'Verify minted tag survived' so the regression guard runs on failure paths where tag loss is most likely. CodeRabbit major.

.pre-commit-config.yaml: self-protect workflow-tag-lifecycle hook by also triggering on edits to the checker script and the pre-commit config itself. CodeRabbit major.

docs/reference/claude-reference.md: update Dev Release paragraph -- release title is the minted tag (DEV_TAG), and notes-file carries only short SHA + commit subject (no full body). CodeRabbit minor.

finalize-release.yml: split gh release view error handling -- 'not found' is a missing draft (proceed=false), every other failure is transient (exit 1) so the next workflow_run retries instead of silently leaving the commit pending. CodeRabbit major outside-diff.

scripts/check_workflow_tag_lifecycle.py: broaden CREATE regex to tolerate backslash-newline continuations and reversed argument order (Gemini). Broaden DELETE regex to tolerate continuations, --method DELETE, and gh release delete --cleanup-tag (CodeRabbit + Gemini). Fix _SHELL_COMMENT_RE: backslash-s-star eats preceding blank-line newlines via cross-line greedy match, switch to character-class for space and tab so newlines are preserved. Add per-line opt-out machinery (# lint-allow: workflow-tag-lifecycle -- reason).

dev-release.yml: per-line opt-out on the bulk cleanup line with rationale (deletes 5+-revision-old tags whose downstream workflows have completed; not the just-minted tag).

tests/unit/scripts/test_check_workflow_tag_lifecycle.py: 18-test suite covering positive matches, negative cases, opt-out machinery, shell-comment scrubber line-number regression, and a repo-level smoke test.
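The _SHELL_COMMENT_RE fix mentioned in the commit message above is easy to misread, so here is a minimal sketch of the failure mode. The real pattern is not shown in this conversation; the two patterns below are hypothetical stand-ins that differ only in \s* versus [ \t]*, which is the change the commit message describes.

```python
import re

# Hypothetical stand-ins for the scrubber pattern before and after the fix.
# With re.MULTILINE, ^ can match at a blank line, and \s* then consumes that
# line's newline plus the indentation of the comment on the next line.
BUGGY_COMMENT_RE = re.compile(r"^\s*#.*$", re.MULTILINE)
FIXED_COMMENT_RE = re.compile(r"^[ \t]*#.*$", re.MULTILINE)

# A shell fragment with a blank line directly above an indented comment.
SCRIPT = "echo a\n\n  # note\necho b\n"


def scrub(pattern: re.Pattern[str], text: str) -> str:
    """Blank out shell comment lines matched by the pattern."""
    return pattern.sub("", text)


def line_of(needle: str, text: str) -> int:
    """Return the 1-based line number of the first line equal to needle."""
    return text.splitlines().index(needle) + 1


if __name__ == "__main__":
    print(line_of("echo b", SCRIPT))
    print(line_of("echo b", scrub(BUGGY_COMMENT_RE, SCRIPT)))
    print(line_of("echo b", scrub(FIXED_COMMENT_RE, SCRIPT)))
```

In this sketch the \s* variant swallows one newline, so everything after the comment shifts up by a line and any reported line number would be off by one. The [ \t]* variant leaves an empty line where the comment was, keeping line numbers stable.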
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/finalize-release.yml:
- Around line 149-166: The current finalize-release step treats a transient "gh
release view" 404 as a permanent missing draft and sets proceed=false (using
variables release_err and IS_DRAFT with gh release view "$TAG"), which can leave
finalization stuck when upstream workflows already succeeded; change this to
perform bounded polling/retries: call gh release view "$TAG" in a loop with a
short sleep and a limited retry count, checking release_err / IS_DRAFT each
attempt, only treat "not found" as a skip after retries are exhausted, and if
still failing due to transient API errors after the retry limit, fail the job
(non-zero exit) so the run is visible and can be retried instead of silently
writing proceed=false.
In @.pre-commit-config.yaml:
- Around line 226-231: The pre-commit hook "workflow-tag-lifecycle" currently
relies on the default pass_filenames behavior which allows checker/config-only
edits to bypass the workflow scan; update the hook configuration for id
"workflow-tag-lifecycle" to include pass_filenames: false so the hook runs
against the full repository (or alternatively modify
scripts/check_workflow_tag_lifecycle.py to detect an empty/only-non-workflow
filename list and fall back to scanning all .github/workflows/*.yml/.yaml
files), ensuring the checker cannot be skipped by edits that only touch the
checker or .pre-commit-config.yaml.
In `@scripts/check_workflow_tag_lifecycle.py`:
- Around line 78-85: The _TAG_CREATE_RE currently only matches lowercase "-f"
flag; update its pattern to accept both "-f" and "-F" by replacing occurrences
of "-f" in the alternation with "-[fF]" (so both branches that contain
"-f\s+ref=..." become "-[fF]\s+ref=...") and run/add unit tests that include
typed-field forms like "gh api -F ref=refs/tags/v1 -F sha=... .../git/refs" to
ensure the regex now catches the typed-field form; the symbol to edit is
_TAG_CREATE_RE and add test cases covering "-F ref=refs/tags/..." usage.
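The second option in the .pre-commit-config.yaml finding above (fall back to scanning every workflow when no workflow file is passed) can be sketched as a target-selection helper. This is a hypothetical illustration of that alternative, not the checker's actual code; the function and constant names are invented.

```python
from pathlib import Path

WORKFLOW_DIR = Path(".github/workflows")
WORKFLOW_SUFFIXES = {".yml", ".yaml"}


def is_workflow(path: Path) -> bool:
    """True for repo-relative paths like .github/workflows/ci.yml."""
    return path.suffix in WORKFLOW_SUFFIXES and WORKFLOW_DIR in path.parents


def select_targets(passed: list[str], repo_root: Path) -> list[Path]:
    """Pick the files to scan.

    If the hook was handed at least one workflow file, scan those. Otherwise
    (empty list, or only the checker/config changed) scan every workflow under
    repo_root so the gate can never pass with zero files inspected.
    """
    explicit = [Path(p) for p in passed if is_workflow(Path(p))]
    if explicit:
        return explicit
    workflow_root = repo_root / WORKFLOW_DIR
    return sorted(p for p in workflow_root.glob("*") if p.suffix in WORKFLOW_SUFFIXES)


if __name__ == "__main__":
    # A checker-only change still results in a full scan of the workflows.
    for target in select_targets(["scripts/check_workflow_tag_lifecycle.py"], Path.cwd()):
        print(target)
```

Setting pass_filenames: false in the hook configuration reaches the same outcome from the configuration side, at the cost of always scanning every workflow.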
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: f0b3ca24-f28d-417e-b576-4683873fc7d5
📒 Files selected for processing (6)
.github/workflows/dev-release.yml
.github/workflows/finalize-release.yml
.pre-commit-config.yaml
docs/reference/claude-reference.md
scripts/check_workflow_tag_lifecycle.py
tests/unit/scripts/test_check_workflow_tag_lifecycle.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Dashboard Test
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Lighthouse Site
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
.pre-commit-config.yaml
📄 CodeRabbit inference engine (CLAUDE.md)
Pre-commit hooks configured in .pre-commit-config.yaml: ruff, gitleaks, hadolint, no-em-dashes, no-redundant-timeout, check-single-migration-per-pr, check-no-modify-migration (bypass SYNTHORG_MIGRATION_SQUASH=1), no-release-please-token, workflow-shell-git-commits; eslint-web runs at pre-push only
Wire each new gate into .pre-commit-config.yaml (pre-commit or pre-push stage) with # lint-allow: <gate-name> -- <reason> per-line opt-outs
Files:
.pre-commit-config.yaml
**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Use fenced code blocks with d2 language tag for architecture/nested container diagrams in documentation
Use mermaid language tag for flowcharts, sequence diagrams, and pipeline diagrams in documentation
Use markdown tables for tabular data in documentation; never use text fences with ASCII box-drawing
D2 diagrams in documentation must use theme 200 (Dark Mauve) with dark-only configuration as specified in mkdocs.yml
Files:
docs/reference/claude-reference.md
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Never use from __future__ import annotations (Python 3.14 has native PEP 649 lazy annotations)
Use PEP 758 except syntax: except A, B: (no parens) when not binding; as exc requires parens
Files:
tests/unit/scripts/test_check_workflow_tag_lifecycle.py
scripts/check_workflow_tag_lifecycle.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Every test must declare spec=ConcreteClass on Mock()/AsyncMock()/MagicMock() (enforced by gate); pre-existing sites frozen in baseline; regenerate via --update flag
Use mock_dispatcher from tests/conftest.py (AsyncMock(spec=NotificationDispatcher)) for shared mocks in tests
Time-driven tests: import FakeClock from tests._shared.fake_clock; inject via clock= parameter; FakeClock.sleep advances virtual time and yields once via asyncio.sleep(0)
Test markers: @pytest.mark.unit/integration/e2e/slow; run with -n 8 parallelism always using --dist=loadfile
Global test timeout: 30s per test in pyproject.toml; non-default like timeout(60) is allowed per-test
Never monkeypatch.setattr(module.logger, "info", spy) (stale bound method caching); use try/finally del proxy.<level> instead; see _logger_info_spy in test examples
Parametrize similar test cases via @pytest.mark.parametrize
Property-based testing via Hypothesis (Python) with 10 deterministic examples in CI (derandomize=True); Hypothesis failures are real bugs: fix the bug and add @example(...) decorator
Never skip/xfail/dismiss flaky tests; fix fundamentally. FakeClock-first when the class accepts clock=. For 'block until cancelled', use asyncio.Event().wait() not asyncio.sleep(large)
Files:
tests/unit/scripts/test_check_workflow_tag_lifecycle.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.
Files:
tests/unit/scripts/test_check_workflow_tag_lifecycle.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: Read the relevant `docs/design/` page before implementing or planning; deviations require explicit user approval and design page updates
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: Every implementation plan must be presented to the user for accept/deny before coding; surface improvements as suggestions, not silent changes; prioritize by dependency order, not priority labels
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: CI generates and injects runtime stats from `data/runtime_stats.yaml` BEFORE `zensical build`; regenerate via `scripts/generate_runtime_stats.py` and `scripts/inject_runtime_stats.py`
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: Minimum 80% code coverage required in CI (benchmarks excluded); coverage enforcement in pytest runs
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: Commit messages follow `<type>: <description>` format (feat/fix/refactor/docs/test/chore/perf/ci); enforced by commitizen
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: Signed commits required on every protected ref: GPG/SSH signed OR GitHub App-signed via `synthorg-repo-bot` (Git Data API with installation token)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: Branch names: `<type>/<slug>` from main
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: Hook rules in `.claude/hookify.*.md`: `block-pr-create` (use `/pre-pr-review`), `block-double-push` (5-min throttle; opt-out via `.claude/state/allow-double-push.flag`), `enforce-parallel-tests` (`-n 8`), `no-cd-prefix`, `no-local-coverage`
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: Pre-push hooks: mypy + pytest (affected) + golangci-lint + go vet + go test (CLI) + eslint-web + orphan-fixtures (opt-in `SYNTHORG_CHECK_ORPHAN_FIXTURES=1`) + setting-to-startup-trace (conditional); foundational module changes or conftest edits trigger full runs
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: GitHub issue queries: use `gh issue list` via Bash, NOT MCP `list_issues` (unreliable field data)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: PR merge strategy: squash; PR body becomes squash commit message; trailers (`Release-As`, `Closes `#N``) must be in PR body to land
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: Preserve existing `Closes `#NNN`` in PR issue references; never remove unless explicitly asked
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: After finishing an issue: branch (`<type>/<slug>`), commit, push; do NOT auto-create a PR
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: Always use `/pre-pr-review` to create PRs (`gh pr create` is hookify-blocked); trivial/docs-only: `/pre-pr-review quick`
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: After PR exists, `/aurelio-review-pr` handles external reviewer feedback; fix EVERYTHING valid review agents find, including pre-existing issues in surrounding code
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: Any PR establishing or expanding a project-wide convention MUST include the AST/script gate preventing regression; PRs proposing conventions without enforcement are rejected; gate catches the SECOND occurrence; audit finds the FIRST
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: When tests fail due to timeout/slowness/contention: NEVER delete, skip, or `xfail`; NEVER `--no-verify`; NEVER edit `tests/baselines/unit_timing.json`; fix source code instead
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-08T22:59:37.235Z
Learning: Isolation regression gate: `scripts/run_affected_tests.py` re-runs affected subset under `pytest-repeat --count 2` after green pass; real failures block gate; scattered native crashes are advisory; opt-out via `SYNTHORG_SKIP_ISOLATION_GATE=1`
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.
Applied to files:
tests/unit/scripts/test_check_workflow_tag_lifecycle.py
scripts/check_workflow_tag_lifecycle.py
finalize-release.yml: replace single-shot 'not found' skip with bounded polling. Both upstream Docker + CLI workflows are already complete by this point so a transient release-page 404 (visibility lag for a brand-new release) needs to be retried before being treated as a legitimate skip; without retries, the run silently writes proceed=false and there is no later workflow_run event to retry on, leaving finalize-release pending. 6 attempts with linear backoff (5s + 10s + 15s + 20s + 25s + 30s = 105s budget) before giving up. Non-404 transient errors still fail-fast non-zero so the run is visible.

.pre-commit-config.yaml: add pass_filenames: false to the workflow-tag-lifecycle hook AND pass --scan-all explicitly so an edit limited to the checker script or to .pre-commit-config.yaml itself can no longer bypass the gate. Without this, the default pass_filenames=true would invoke the script with only the non-workflow path; the script would skip it on the .yml/.yaml suffix filter and exit 0 with zero workflows scanned.

scripts/check_workflow_tag_lifecycle.py: widen _TAG_CREATE_RE to match both -f and -F field flags. gh api -F (typed field) POSTs to the same endpoint as -f (raw string), so a workflow using gh api -F ref=refs/tags/v1 ... git/refs would create a tag identically and bypass the gate. Both alternation branches now use -[fF].

tests: add 2 cases for the typed-field -F form (single-line + reversed argument order). Total now 20 tests.
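The bounded polling described for finalize-release.yml can be sketched as follows. The real step is a shell loop in the workflow, which is not shown here; this Python version only illustrates the control flow and the 105s arithmetic. The function name, the probe callback, and the assumption that a sleep follows every miss (including the sixth, which is what makes the delays sum to 105s) are all hypothetical.

```python
import time
from collections.abc import Callable


def poll_release(
    probe: Callable[[], str],
    attempts: int = 6,
    step_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the release is visible, with linear backoff.

    probe() must return 'found', 'missing' (a 404-style miss), or 'error'
    (any other failure). A miss is retried; an error fails fast.
    """
    for attempt in range(1, attempts + 1):
        state = probe()
        if state == "found":
            return "found"
        if state == "error":
            raise RuntimeError("release lookup failed with a non-404 error")
        # Linear backoff: 5s, 10s, 15s, ... after each miss.
        sleep(step_seconds * attempt)
    # Only now is the miss treated as a legitimate skip (proceed=false).
    return "missing"


if __name__ == "__main__":
    delays: list[float] = []
    result = poll_release(lambda: "missing", sleep=delays.append)
    print(result, delays, sum(delays))
```

Injecting the sleep function keeps the sketch testable without waiting; the same idea applies to the probe, which in the workflow would wrap gh release view.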
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._

### What you'll notice

- Improved error logging and Prometheus instrumentation provide better system monitoring.
- Eliminated race conditions in CI tagging for more reliable development releases.
- Fixed critical configuration access and kill-switch bugs to enhance system stability.
- Enhanced client experience with retry-after headers and better websocket reconnect behavior.

### What's new

- Introduced composite indexes and cursor pagination for faster data queries.
- Added server-sent events rate limiting and Ollama input sanitization for improved security.

### Under the hood

- Centralized workflow error mappings to standardize error handling.
- Refactored API lifecycle fallback to use a configuration snapshot for consistency.
- Tightened startup settings baseline and reduced controller error baseline to zero.
- Replaced flaky contributor-assistant GitHub action with a custom stable step.
- Consolidated Renovate dependency groups to avoid update conflicts.
- Upgraded in-toto-golang dependency to fix security vulnerabilities and dropped unnecessary CVE waivers.
- Extensive lock file maintenance and multiple infrastructure and Python dependency updates.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*

---

## [0.8.2](v0.8.1...v0.8.2) (2026-05-10)

### Features

* close audit gaps in error logging and Prometheus instrumentation ([#1821](#1821)) ([ef00fdc](ef00fdc))

### Bug Fixes

* **ci:** eliminate dev-release tag-vs-downstream race + CI hygiene audit ([#1827](#1827)) ([b7b9a59](b7b9a59))
* **config:** close 6 settings reachability + kill-switch gaps ([#1798](#1798)) ([410cb3b](410cb3b))
* correctness / safety fixes from 2026-05-05 audit (Wave 28) ([#1823](#1823)) ([d01e624](d01e624))

### Performance

* composite indexes + cursor pagination + clock seam + SSE rate-limit + Ollama sanitization + retry-after web client + WS reconnect jitter ([#1822](#1822)) ([d1faf86](d1faf86))

### Refactoring

* **api:** move activities lifecycle-cap fallback to ApiBridgeConfig snapshot ([#1840](#1840)) ([7a56e9c](7a56e9c))
* centralise workflow error mapping and shared error codes ([#1778](#1778) sub-tasks A + E) ([#1843](#1843)) ([11132cd](11132cd))
* drive controller-error baseline to zero ([#1778](#1778) sub-task A tail) ([#1846](#1846)) ([e96ae20](e96ae20))
* slim CLAUDE.md, port pr-review-toolkit agents, sync .opencode parity ([#1833](#1833)) ([e6372b8](e6372b8))
* tighten settings → startup-trace baseline (8 → 0) ([#1847](#1847)) ([3376ee2](3376ee2))

### Documentation

* fix CLAUDE.md inaccuracies and drop drift-prone counts ([#1844](#1844)) ([371925f](371925f))

### Tests

* replace test placeholders with real subsystem wiring ([#1845](#1845)) ([ddbb666](ddbb666))

### CI/CD

* **cla:** replace flaky contributor-assistant action with custom read-path step ([#1819](#1819)) ([11aeafe](11aeafe))
* tidy dev-release notes + stagger renovate lockfile day ([#1824](#1824)) ([ec746a9](ec746a9))

### Maintenance

* cleanup roundup, sub-tasks a/c/d/g/h/j/l/m of [#1781](#1781) ([#1838](#1838)) ([099b871](099b871))
* close remaining 5 sub-tasks of [#1781](#1781) (b/e/f/i/k) ([#1852](#1852)) ([59cf0b2](59cf0b2))
* collapse Renovate dep groups into Python / Web / Infrastructure to remove cross-PR overlap ([#1813](#1813)) ([4cbd857](4cbd857))
* **deps,security:** bump in-toto-golang v0.11.0 + drop two patched CVE waivers ([#1851](#1851)) ([0b8b5bb](0b8b5bb))
* disable Renovate vulnerabilityAlerts so security flows into normal updates ([#1834](#1834)) ([6b7d15f](6b7d15f))
* Lock file maintenance ([#1820](#1820)) ([ccbad73](ccbad73))
* Lock file maintenance ([#1842](#1842)) ([13b68a5](13b68a5))
* Lock file maintenance ([#1853](#1853)) ([db6650b](db6650b))
* Update dhi.io/nats:2.14-debian13 Docker digest to eb768bf ([#1841](#1841)) ([37f84fc](37f84fc))
* Update Infrastructure dependencies ([#1815](#1815)) ([75b12fe](75b12fe))
* Update Infrastructure dependencies ([#1831](#1831)) ([3f3c50b](3f3c50b))
* Update Python dependencies ([#1817](#1817)) ([e11332f](e11332f))
* Update Python dependencies ([#1832](#1832)) ([4515c8e](4515c8e))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
closes #1818
Summary
Fixes the tag-vs-downstream-checkout race in `dev-release.yml` (#1818) and bundles five MED-severity CI hygiene findings that the audit surfaced into the same PR.

What changed
Primary fix (#1818)
- `dev-release.yml`: dropped the `gh api -X DELETE git/refs/tags/$DEV_TAG` cleanup that fired on `gh release create` failure. The tag-create push had already triggered downstream `tags: v*`-listening workflows (`cli.yml`, `docker.yml`); deleting the tag while their `actions/checkout` step was in flight is what produced the `fatal: couldn't find remote ref refs/tags/v0.8.2-dev.2` 404 reported in the issue. A failed release-create now exits 1 with the orphan tag preserved; the existing 5-most-recent dev-pre-release sweeper + `finalize-release.yml`'s stable-release sweep garbage-collect it later.
- `dev-release.yml`: new end-of-job step `Verify minted tag survived the run` re-resolves `refs/tags/$DEV_TAG` and exits 1 with `::error::` if absent, routing through the existing `report-failure` job into the `dev-release regression` tracking issue. Catches any future regression directly instead of debugging from downstream symptoms.
- `scripts/check_workflow_tag_lifecycle.py` (new): static pre-push gate enforcing "no tag CREATE + conditional DELETE within a single workflow file". Modelled on `scripts/check_workflow_shell_git_commits.py`. Wired in `.pre-commit-config.yaml` as `id: workflow-tag-lifecycle`. Per the CLAUDE.md "Convention Rollout (MANDATORY)" rule: every new convention ships its enforcement gate.

Audit-bundled fixes
- `finalize-release.yml`: cleanup loops (dev-release sweep + orphan-tag sweep) replaced the `|| true`-suppressed `while read` shape with explicit-capture-and-check + `mapfile` + per-tag warnings + final exit-on-failure. Pre-PR review surfaced that `mapfile -t arr < <(gh api ...)` does not propagate the inner `gh api` exit code; the second-round fix captures the API output to a variable with an explicit exit check before `mapfile` reads from it. Either failure mode (API down, partial deletes) now fails the job loudly.
- `finalize-release.yml`: dropped `2>/dev/null` on `gh api commits/$HEAD_SHA/pulls` (Highlights PR lookup) so auth / rate-limit / API errors surface in stderr; kept on three legitimate not-found-fallback sites with intent comments. Pre-PR review surfaced that the surrounding `gh pr view ... || true` was conflating "PR genuinely deleted" (legit skip) with "auth failure"; that path now splits into capture + classify, with `::notice::` on not-found and `::warning::` on real errors.
- `cli.yml`: split the fuzz step into `List fuzz targets` (no `continue-on-error`, fails loudly on `go test -list` compile errors) + `Run fuzz tests` (`continue-on-error` retained, fuzzer-found crashes remain advisory via the existing `Report fuzz outcome` warning). Compile failures in fuzz targets were previously invisible.
- `ci.yml`: branch-protection-drift audit kept its job-level `continue-on-error: true` (intentional during the #1555 stabilization window), but a new always-running `Report drift outcome` step now emits `::warning::` annotations whenever the audit detected drift. Real drift is visible in run summaries even while the job stays green.
- `.pre-commit-config.yaml`: wired `actionlint v1.7.12` and `zizmor v1.24.1` as workflow-scoped pre-commit hooks (`files: ^\.github/workflows/.*\.ya?ml$`). actionlint is NOT in the `pre-commit.ci skip:` list (no dedicated CI job covers it); zizmor is added to the skip list (already covered by `zizmor.yml`).

Docs
- `CLAUDE.md`: appended `check_workflow_tag_lifecycle.py` to the existing gate inventory list under "Convention Rollout (MANDATORY)".
- `docs/reference/claude-reference.md`: updated "Dev Release" + "Finalize Release" subsections to reflect orphan-tag preservation, the regression-guard step, the `mapfile`-based cleanup pattern, and the gh-pr-view capture+classify split. Pre-PR review surfaced this as MAJOR doc drift.

Audit confirmation
- […] `tags: v*` push events are `cli.yml:9-10` and `docker.yml:6` (verified via `Grep` over `.github/workflows/`). `finalize-release.yml` uses `workflow_run` (not `tags: v*`), so it is not directly affected by the race.
- […] `gh api ... /git/refs` POST and a `gh api -X DELETE git/refs/tags/...` (verified by running `scripts/check_workflow_tag_lifecycle.py --scan-all`; exit 0).

Test plan
Workflow YAML changes have a small static-test surface; validation done via:
- `actionlint v1.7.12`: clean on all four changed workflows.
- `zizmor v1.24.1`: clean on all four changed workflows.
- `pre-commit run --all-files`: all 60+ hooks pass (including the new `workflow-tag-lifecycle` gate).
- […] `CREATE at file:8` + `DELETE at file:12`.
- […] `main` after merge will exercise both the `Verify minted tag survived the run` regression guard and the hardened `finalize-release.yml` cleanup paths.

Review coverage
Pre-PR review ran 8 specialized agents:
[…] a4ce64b99):

- `mapfile -t arr < <(gh api ...)` silently swallowed inner-process failures (CRITICAL: same silent-failure class as #1818).
- […] `docs/reference/claude-reference.md` (MAJOR).
- `gh pr view ... || true` conflating "not found" with "auth failure" (MEDIUM).

Issue #1818 acceptance criteria all RESOLVED (verified by issue-resolution-verifier at 100% confidence on each).
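The CRITICAL finding is specific to shell process substitution, but the capture-then-check shape of the fix is language-independent. As a rough Python analogue (a sketch only; the workflow itself is shell, and the names `list_lines`, `sweep`, and `ListingError` are hypothetical), the listing command's exit code is checked before its output is split into lines, and per-item delete failures are collected so the caller can fail the job at the end:

```python
import subprocess


class ListingError(RuntimeError):
    """Raised when the listing command itself fails (hypothetical name)."""


def list_lines(cmd):
    """Run cmd, check its exit code, and only then split stdout into lines."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        # Fail loudly instead of treating a failed listing as "nothing to delete".
        raise ListingError(
            f"{cmd[0]} exited {result.returncode}: {result.stderr.strip()}"
        )
    return [line for line in result.stdout.splitlines() if line.strip()]


def sweep(tags, delete):
    """Attempt every delete, warn per failure, and return the tags that failed.

    `delete` is any callable returning True on success; it stands in for the
    real API call, which this sketch does not model.
    """
    failed = []
    for tag in tags:
        if not delete(tag):
            print(f"::warning::failed to delete {tag}")
            failed.append(tag)
    return failed
```

A caller would exit non-zero when `list_lines` raises or when `sweep` returns a non-empty list, which covers both failure modes named in the description (API down, partial deletes).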
Acceptance criteria mapping
- […] `dev-release.yml:189-209` + `230-252`).
- […] `finalize-release.yml` only deletes; `cli.yml` / `docker.yml` only consume tags) + future enforcement via `scripts/check_workflow_tag_lifecycle.py`.
- […] `Verify minted tag survived the run` step, routed through existing `report-failure` tracking-issue lane).
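To illustrate the kind of check the static gate performs, here is a minimal sketch built on the two regexes quoted in the review thread. It is not the merged script: the helper names `line_of`, `find_violation`, and `main` are hypothetical, the report format simply mirrors the `CREATE at file:N` + `DELETE at file:M` shape mentioned in the test plan, and the sketch flags any file containing both patterns without trying to decide whether the delete is conditional.

```python
import re
import sys
from pathlib import Path

# Patterns as quoted in the review thread; the merged script may differ.
TAG_CREATE_RE = re.compile(
    r"gh\s+api[^\n]*git/refs[^/\w][\s\S]{0,300}?-f\s+ref=[\"']?refs/tags/",
    re.MULTILINE,
)
TAG_DELETE_RE = re.compile(
    r"gh\s+api\s+-X\s+DELETE[^\n]*git/refs/tags/",
    re.MULTILINE,
)


def line_of(text, offset):
    """Return the 1-based line number containing the given character offset."""
    return text.count("\n", 0, offset) + 1


def find_violation(name, text):
    """Return a report string if text both creates and deletes a tag, else None."""
    create = TAG_CREATE_RE.search(text)
    delete = TAG_DELETE_RE.search(text)
    if create and delete:
        return (
            f"CREATE at {name}:{line_of(text, create.start())} + "
            f"DELETE at {name}:{line_of(text, delete.start())}"
        )
    return None


def main(paths):
    """Check each workflow file; return 1 if any violates the rule, else 0."""
    violations = []
    for path in paths:
        report = find_violation(path, Path(path).read_text(encoding="utf-8"))
        if report:
            violations.append(report)
    for report in violations:
        print(report, file=sys.stderr)
    return 1 if violations else 0
```

Invoked as `sys.exit(main(sys.argv[1:]))`, this would behave like a pre-commit hook that receives the changed workflow files as arguments. As the review comments point out, single-line patterns of this kind miss backslash-continued commands and `--method DELETE`, so the sketch inherits those blind spots.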