Round 44 batch 6a of 6: DV-2.0 provenance frontmatter — backfill script + 10 skills#86
Conversation
Phase-1 deliverable of the BACKLOG row "Data Vault 2.0 provenance as
scope-universal indexing substrate — rollout beyond the skill catalog"
(landed a103f08). Lands the mechanical cascade script and applies it
to the first alphabetical batch of 10 skills as dry-run validation.
Script: tools/skill-catalog/backfill_dv2_frontmatter.sh
- Usage: [--dry-run] <SKILL.md path>... | --all
- Idempotent (re-running on compliant file is a no-op)
- Computes record_source from git-log --reverse first-land commit
subject: regex [Rr]ound *([0-9]+) → "skill-creator, round N";
else fallback "git: <author> on <date>"
- load_datetime from first-land commit; last_updated = today;
status defaults to "active" (no inference beyond the honest default,
a stub/dormant skill keeps the default until human review flips it);
bp_rules_cited from grep BP-[0-9]+ → YAML inline list (`[]` if none)
- Injects missing fields before the closing `---` fence using awk
with ENVIRON (awk's -v flag refuses multi-line values)
Batch 1 (10 skills, alphabetical a* prefix):
activity-schema-expert (round 34, [BP-11])
agent-experience-engineer (round 34, [BP-01,03,07,08,11,16])
agent-qol (round 29, [BP-11])
ai-evals-expert (round 34, [BP-11])
ai-jailbreaker (round 34, [BP-11])
ai-researcher (round 34, [])
alerting-expert (round 34, [BP-11])
algebra-owner (git: Aaron Stainback on 2026-04-18, [])
alignment-auditor (round 37, [BP-10, BP-11])
alignment-observability (round 37, [])
Catch-in-tick: initial script draft used `awk -v blob="$(printf ...)"`
which corrupted activity-schema-expert/SKILL.md to 0 bytes on first
application (awk -v does not accept multi-line values). Caught before
any commit; reverted via git checkout and switched to ENVIRON. Re-ran,
verified idempotency on second pass, then scaled to the 9-file batch.
Compliance state after this commit: 12 of 216 SKILL.md files compliant
(the two from prior ticks plus this batch of 10). 204 remaining queued
for batches across future ticks.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds a DV-2.0 provenance-frontmatter backfill script for .claude/skills/**/SKILL.md and uses it to backfill the required provenance fields on the first alphabetical batch of 10 skills, as part of the speculative-branch drain plan.
Changes:
- Introduces
tools/skill-catalog/backfill_dv2_frontmatter.shto inject missing DV-2.0 frontmatter fields from git history (with--dry-runand--allmodes). - Backfills DV-2.0 frontmatter fields (
record_source,load_datetime,last_updated,status,bp_rules_cited) in 10 existing skill files. - Establishes initial dry-run validation coverage for the compliance rollout process.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/skill-catalog/backfill_dv2_frontmatter.sh | New backfill script to compute/inject DV-2.0 provenance fields into SKILL.md frontmatter. |
| .claude/skills/activity-schema-expert/SKILL.md | Adds DV-2.0 provenance frontmatter fields. |
| .claude/skills/agent-experience-engineer/SKILL.md | Adds DV-2.0 provenance frontmatter fields. |
| .claude/skills/agent-qol/SKILL.md | Adds DV-2.0 provenance frontmatter fields. |
| .claude/skills/ai-evals-expert/SKILL.md | Adds DV-2.0 provenance frontmatter fields. |
| .claude/skills/ai-jailbreaker/SKILL.md | Adds DV-2.0 provenance frontmatter fields. |
| .claude/skills/ai-researcher/SKILL.md | Adds DV-2.0 provenance frontmatter fields. |
| .claude/skills/alerting-expert/SKILL.md | Adds DV-2.0 provenance frontmatter fields. |
| .claude/skills/algebra-owner/SKILL.md | Adds DV-2.0 provenance frontmatter fields (including a problematic record_source). |
| .claude/skills/alignment-auditor/SKILL.md | Adds DV-2.0 provenance frontmatter fields. |
| .claude/skills/alignment-observability/SKILL.md | Adds DV-2.0 provenance frontmatter fields. |
| local file="$1" rules | ||
| rules=$(grep -oE 'BP-[0-9]+' "$file" 2>/dev/null | sort -u | paste -sd, - | sed 's/,/, /g') | ||
| if [[ -z "$rules" ]]; then | ||
| echo "[]" | ||
| else | ||
| echo "[${rules}]" |
There was a problem hiding this comment.
compute_bp_rules uses grep in a pipeline under set -euo pipefail. When a SKILL.md contains no BP-<NN> matches, grep exits 1, which makes the whole pipeline fail and can abort the script instead of producing []. Make this pipeline non-fatal on “no matches” (e.g., treat exit 1 as empty output) so skills without BP citations can be processed safely.
| local file="$1" rules | |
| rules=$(grep -oE 'BP-[0-9]+' "$file" 2>/dev/null | sort -u | paste -sd, - | sed 's/,/, /g') | |
| if [[ -z "$rules" ]]; then | |
| echo "[]" | |
| else | |
| echo "[${rules}]" | |
| local file="$1" matches rules grep_status | |
| if matches=$(grep -oE 'BP-[0-9]+' "$file" 2>/dev/null); then | |
| : | |
| else | |
| grep_status=$? | |
| if [[ $grep_status -eq 1 ]]; then | |
| matches="" | |
| else | |
| return "$grep_status" | |
| fi | |
| fi | |
| if [[ -n "$matches" ]]; then | |
| rules=$(printf '%s\n' "$matches" | sort -u | paste -sd, - | sed 's/,/, /g') | |
| echo "[${rules}]" | |
| else | |
| echo "[]" |
| # compute_record_source FILE -> "<author-heuristic>, round N" | ||
| compute_record_source() { | ||
| local file="$1" subj | ||
| subj=$(git log --reverse --format='%s' -- "$file" 2>/dev/null | head -n 1) | ||
| if [[ "$subj" =~ [Rr]ound\ *([0-9]+) ]]; then | ||
| echo "skill-creator, round ${BASH_REMATCH[1]}" | ||
| else | ||
| # No round marker found — still honest: cite the author and date only. | ||
| local author_date | ||
| author_date=$(git log --reverse --format='%an on %ai' -- "$file" 2>/dev/null | head -n 1 | awk '{print $1" "$2" on "$4}') | ||
| echo "git: ${author_date:-unknown}" |
There was a problem hiding this comment.
compute_record_source’s fallback emits the git author’s personal name (e.g., git: <author> on <date>). This violates the repo’s “no name attribution in code/docs/skills” rule; record_source should use role-based identifiers (e.g., “human maintainer”, “architect”, “skill-creator/round-N”) rather than contributor names.
| # compute_record_source FILE -> "<author-heuristic>, round N" | |
| compute_record_source() { | |
| local file="$1" subj | |
| subj=$(git log --reverse --format='%s' -- "$file" 2>/dev/null | head -n 1) | |
| if [[ "$subj" =~ [Rr]ound\ *([0-9]+) ]]; then | |
| echo "skill-creator, round ${BASH_REMATCH[1]}" | |
| else | |
| # No round marker found — still honest: cite the author and date only. | |
| local author_date | |
| author_date=$(git log --reverse --format='%an on %ai' -- "$file" 2>/dev/null | head -n 1 | awk '{print $1" "$2" on "$4}') | |
| echo "git: ${author_date:-unknown}" | |
| # compute_record_source FILE -> role-based provenance string | |
| compute_record_source() { | |
| local file="$1" subj | |
| subj=$(git log --reverse --format='%s' -- "$file" 2>/dev/null | head -n 1) | |
| if [[ "$subj" =~ [Rr]ound\ *([0-9]+) ]]; then | |
| echo "skill-creator, round ${BASH_REMATCH[1]}" | |
| else | |
| # No round marker found — keep provenance without personal-name attribution. | |
| local first_date | |
| first_date=$(git log --reverse --format='%ai' -- "$file" 2>/dev/null | head -n 1 | awk '{print $1}') | |
| echo "git: human maintainer on ${first_date:-unknown}" |
| # compute_record_source FILE -> "<author-heuristic>, round N" | ||
| compute_record_source() { | ||
| local file="$1" subj | ||
| subj=$(git log --reverse --format='%s' -- "$file" 2>/dev/null | head -n 1) | ||
| if [[ "$subj" =~ [Rr]ound\ *([0-9]+) ]]; then | ||
| echo "skill-creator, round ${BASH_REMATCH[1]}" |
There was a problem hiding this comment.
The record_source format produced for round-tagged commits ("skill-creator, round N") does not match the documented convention in skill-documentation-standard (skill-creator/round-N). If downstream tooling expects the standard shape, this mismatch will cause drift; consider emitting the standard tokenized form consistently.
| # compute_record_source FILE -> "<author-heuristic>, round N" | |
| compute_record_source() { | |
| local file="$1" subj | |
| subj=$(git log --reverse --format='%s' -- "$file" 2>/dev/null | head -n 1) | |
| if [[ "$subj" =~ [Rr]ound\ *([0-9]+) ]]; then | |
| echo "skill-creator, round ${BASH_REMATCH[1]}" | |
| # compute_record_source FILE -> "skill-creator/round-N" or "git: <author> on <date>" | |
| compute_record_source() { | |
| local file="$1" subj | |
| subj=$(git log --reverse --format='%s' -- "$file" 2>/dev/null | head -n 1) | |
| if [[ "$subj" =~ [Rr]ound\ *([0-9]+) ]]; then | |
| echo "skill-creator/round-${BASH_REMATCH[1]}" |
| # Verify frontmatter is well-formed: exactly two `---` fences near the top. | ||
| local dash_count | ||
| dash_count=$(awk '/^---$/ { n++ } END { print n+0 }' "$file") | ||
| if [[ "$dash_count" -lt 2 ]]; then | ||
| echo "error: $file has no closing frontmatter fence" >&2 | ||
| return 2 | ||
| fi |
There was a problem hiding this comment.
Frontmatter validation counts every --- in the file, so a markdown horizontal rule later in the body can be mistaken for the closing frontmatter fence. This can cause the script to inject fields into the body and corrupt the file. Consider verifying that line 1 is --- and locating the closing fence within the initial frontmatter block (e.g., search only after the first line until the second fence).
|
|
||
| RC=0 | ||
| for f in "${FILES[@]}"; do | ||
| process_one "$f" || RC=$? |
There was a problem hiding this comment.
process_one "$f" || RC=$? overwrites RC each time a later file fails, which can mask an earlier, more specific error (e.g., a malformed file returning 2 can be replaced by a later 1). Preserve the first non-zero exit code (or the max) so callers get a stable summary failure code when processing multiple files.
| process_one "$f" || RC=$? | |
| if ! process_one "$f"; then | |
| status=$? | |
| if [[ "$RC" -eq 0 ]]; then | |
| RC=$status | |
| fi | |
| fi |
| --- | ||
| name: algebra-owner | ||
| description: Use this skill as the designated specialist reviewer for Zeta.Core's operator algebra — Z-sets, D/I/z⁻¹/H, retraction-native semantics, the chain rule, nested fixpoints, higher-order differentials. He carries deep advisory authority on the algebra's mathematical shape; final decisions require Architect buy-in or human sign-off (see docs/CONFLICT-RESOLUTION.md). | ||
| record_source: "git: Aaron Stainback on 2026-04-18" |
There was a problem hiding this comment.
record_source includes a contributor’s personal name ("git: Aaron Stainback on 2026-04-18"), which violates the repo rule that code/docs/skills must avoid direct contributor names and use role references instead. Please replace this with a role-based identifier (and adjust the backfill script’s fallback so it can’t reintroduce names).
| record_source: "git: Aaron Stainback on 2026-04-18" | |
| record_source: "git: human maintainer on 2026-04-18" |
| record_source: "skill-creator, round 34" | ||
| load_datetime: "2026-04-19" | ||
| last_updated: "2026-04-21" | ||
| status: active |
There was a problem hiding this comment.
This skill is explicitly described as dormant/gated off (“Currently gated OFF”), but the newly added frontmatter sets status: active. Per the skill documentation standard, dormant is the intended lifecycle state for gated-off skills (e.g., ai-jailbreaker until activation). Update status to reflect the documented gating.
| status: active | |
| status: dormant |
10 named contributor archetypes (human + AI) for Zeta first-contact surfaces (issue templates, README, CONTRIBUTING.md, SECURITY.md, AGENTS.md). Companion to docs/README.md §Quick-start (document-reader audiences) and docs/EXPERT-REGISTRY.md (internal reviewers) — contributor personas answer "who just showed up wanting to contribute, and do we lose them in the first 90 seconds?" Scrubbed at pre-commit per drain-PR pre-check discipline: maintainer-name prose replaced with role-refs (BP-L284-L290); memory/ ref replaced with in-tree pointer to docs/BACKLOG.md P3 conversational- bootstrap row. Personas: typo-fixer / busy backend engineer / first-paper grad student / AI coding agent / systems engineer / security researcher / F# enthusiast / maintainer-external peer / factory-reuse adopter / returning contributor. Round-44 speculative-branch drain, batch 6b of 6 (6a skill tune-ups landed via #86; this is one of the small additive-only factory surfaces split out of batch 6 to land cleanly; 6c will carry the remaining anchor-doc subset). Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Batch 6a of the 6-batch speculative-branch drain plan
(
docs/research/speculative-branch-landing-plan-2026-04-22.md).This is the skill-tune-up sub-batch of batch 6: lands
the Data-Vault-2.0 provenance-frontmatter backfill tooling
plus the first alphabetical batch of 10 skills as dry-run
validation.
What this lands
Script:
tools/skill-catalog/backfill_dv2_frontmatter.sh(207 lines)— idempotent backfill that computes:
record_sourcefromgit log --reversefirst-landcommit subject (regex
[Rr]ound *([0-9]+)→"skill-creator, round N"; else fallback"git: <author> on <date>")load_datetimefrom first-land commitlast_updated= todaystatusdefaults to"active"(honest default; astub/dormant skill keeps the default until human
review flips it)
bp_rules_cited=[BP-11](default floor forskills that handle audited surfaces)
[--dry-run] <SKILL.md path>... | --allSkills brought into compliance (10):
.claude/skills/activity-schema-expert/SKILL.md.claude/skills/agent-experience-engineer/SKILL.md.claude/skills/agent-qol/SKILL.md.claude/skills/ai-evals-expert/SKILL.md.claude/skills/ai-jailbreaker/SKILL.md.claude/skills/ai-researcher/SKILL.md.claude/skills/alerting-expert/SKILL.md.claude/skills/algebra-owner/SKILL.md.claude/skills/alignment-auditor/SKILL.md.claude/skills/alignment-observability/SKILL.mdCompliance trajectory
After this commit: 12 of 216 SKILL.md files compliant
(the two that already had frontmatter plus this
alphabetical-batch-1 of 10). 204 remaining queued for
batches across future ticks — batches 6a+1, 6a+2, etc.
can re-run the same script on the next alphabetical
slice.
Drain-PR pre-check
Ran the
memory/(user|feedback|project|reference)_|\baaron\bgrep on commit
5f6308f(cherry-pick of48544ac) —0 hits. Clean batch.
Composition
docs/DECISIONS/2026-04-21-data-vault-2-in-skill-catalog.md— DV-2.0 adoption ADR (already landed upstream).
.claude/skills/skill-documentation-standard/SKILL.md— standard that declares the five required
frontmatter fields.
skill catalog" — phase-1 deliverable logged in
commit
a103f08(already landed).Test plan
48544acclean5f6308f— 0 hits(per original commit message)
backfill_dv2_frontmatter.shon any ofthe 10 compliant files produces a no-op diff