110 changes: 110 additions & 0 deletions .cursor/rules/devops/agentic-automation.mdc
@@ -0,0 +1,110 @@
---
description: Agentic automation rules (plan-then-apply for state changes, read-only by default, validation-before-success, provenance, idempotency, fail-stop on ambiguity). Applies to skills, hooks, and any AI-driven DevOps workflow.
globs:
- .cursor/skills/**
- .cursor/rules/**
- .cursor/hooks/**
- .github/workflows/**
- .github/scripts/**
- scripts/**
alwaysApply: false
---

# Agentic Automation

Rules governing agent-driven automation in DevOps surfaces: Cursor skills, hooks, MCPs, AI-invoked scripts, and any workflow where an agent takes action on the user's or repo's behalf.

## Core stance

- **AI-first, human-gated.** Agents propose; humans approve state changes. The agent should always be the one drafting the diff, the command, or the PR body, and never the one quietly running `apply`.

## Read-only default

- **Skills default to read-only.** Mutations to the local repo, the remote (GitHub state, releases, branch protection), or external systems (cloud, k8s, Slack, Asana) require explicit user opt-in per call.
- **The `pr-review` pattern is the reference**: print the exact command, wait for user confirmation, then execute. Never auto-execute.
- **No silent file mutations** to the user's working tree. If a skill writes outside `/tmp/` or its own `_lib/` cache, it announces what and why first.
- **No silent git mutations.** Forbidden without explicit user instruction: `git switch`, `git checkout` (any ref/file), `git reset`, `git restore`, `git stash`, `git pull`, `git merge`, `git rebase`, `git cherry-pick`, `git clean`, `gh pr checkout`. (See `.cursor/skills/pr-review/SKILL.md` "Safety rules" section for the canonical wording.)

## Plan-then-apply for state changes

- Any state-changing operation MUST present a plan/diff and require human confirmation before execution. "State-changing" means anything that takes effort to undo.
- Examples that REQUIRE plan-then-apply:
  - `terraform apply`: show `terraform plan` output first
  - `kubectl apply`: show `kubectl diff -f <file>` (or `kubectl apply --dry-run=server -f <file>`) first
  - `helm install` / `helm upgrade`: show `helm diff upgrade` (via the `helm-diff` plugin) or `--dry-run` output first
  - `gh ruleset edit` / branch protection updates: show before/after
  - Secret rotation: show the rotation steps and rollback before executing
  - Release tag creation, release publication
  - `gh pr merge` (any merge method)
  - Any `gcloud … set-iam-policy` / `aws … put-policy` / equivalent
  - Repository setting changes (`gh repo edit`)
- **Production-impacting ops always require human-in-the-loop**, even if the user has previously approved similar ops. There is no "trusted auto-apply" mode.
- **Confirmation is per-operation, not per-session.** A blanket "yes, do everything" is not accepted.

## Idempotency

- **Skills must be idempotent.** Re-running with identical inputs produces the same effect, or reports "already applied" without making changes.
- **Generated artifacts (NOTICE files, changelog entries, generated docs) sort deterministically** so re-runs produce byte-identical output and clean diffs (the `notice-generate` skill is the reference).
- **Network-dependent skills** cache responses where reasonable (`/tmp/<skill>-<id>.json`) and document staleness.

## Validation before success

- Skills MUST validate their output before reporting "done."
  - File mutations → re-read, parse, lint
  - Git operations → `git status` shows expected state
  - PR operations → `gh pr view` confirms state
  - Workflow file edits → `actionlint` clean
- **No "done" without evidence.** The user should never have to ask "did it work?"
- **Print the verification step** in the chat output, not just "✅ done".

## Fail-stop on ambiguity

- **Stop when state is unexpected.** Examples that trigger fail-stop:
  - Uncommitted local changes when the skill expects a clean tree
  - Branch is not what the skill expected
  - A required env var, secret, or file is missing
  - Two conflicting cursor rules apply
  - A required tool is not on PATH
  - PR data fetched but the SHA does not match the expected one
- **Failure mode is "stop and ask," never "guess and proceed."**
- **Partial success is reported as partial success.** Don't summarize "everything worked" when one of three steps failed.

## Provenance and traceability

- **Agent-authored commits/PRs are labeled.** PR body includes a footer indicating which agent and which skill produced the work, so reviewers know what to audit harder.
- **MCP tool descriptors are read before invocation.** Never call an MCP tool blind. (Reinforced in the system prompt; rule-level for emphasis.)
- **Sensitive operations are logged.** Skills that touch secrets, deploy, or modify protected branches append a redacted summary to a session log under `/tmp/agent-session-<id>.log`. The log path is mentioned in the chat output.
- **Skills declare which secrets they read** in their `SKILL.md` frontmatter description or "Prerequisites" section. No silent secret access.

## Bounded resource use

- **Bound shell-call count per skill.** State an explicit budget in the skill's "Efficiency rules" section (the `pr-review` skill caps at ~5–8 calls; treat that as the ceiling unless justified).
- **Cache fetched data once per session** (`/tmp/<skill>-<id>.json`). Never re-fetch the same PR/run/file in one session.
- **Use dedicated tools, not shell.** `Read` instead of `cat`/`head`/`tail`, `Grep` instead of `grep`/`rg`, `Glob` instead of `find`, `Write`/`StrReplace` instead of `echo >` / heredoc.
- **Prefer scripts in `_lib/<domain>/`** over inline shell pipelines. Scripts are testable, deterministic, and share-able across skills.

## Skill authoring conventions

- **One skill = one capability.** Don't compose unrelated workflows in a single skill.
- **Naming**: kebab-case, prefix-grouped (`devops-*`, `sdk-*`, `addon-*`, `pr-*`).
- **`SKILL.md` body ≤ 500 lines.** Push detail to `references/*.md` and link one level deep.
- **Slash-command discoverability**: include `/<skill-name>` in the description.
- **Auto-invoke vs explicit**: `disable-model-invocation: true` for any skill that mutates state, posts publicly, or runs slow / expensive tooling. Auto-invoke only for read-only, fast, contextually obvious helpers.
- **Description includes WHAT and WHEN** in third person.
- **Co-locate shared logic** under `.cursor/skills/_lib/<domain>/` when two or more skills want the same primitive.
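
As a minimal sketch of the conventions above, the frontmatter of a state-mutating skill might look like the following. The skill name `devops-rotate-secret`, the `name` key, the secret name `DEPLOY_TOKEN`, and the description wording are illustrative assumptions; only `disable-model-invocation` is named by this rule.

```yaml
---
# Hypothetical SKILL.md frontmatter; the skill name and wording are illustrative only.
name: devops-rotate-secret
# WHAT and WHEN, third person, with the slash command for discoverability.
description: >-
  Rotates a single repository secret after presenting the rotation steps and
  rollback plan. Use when the user runs /devops-rotate-secret or asks to rotate
  a credential. Reads the DEPLOY_TOKEN secret (hypothetical name).
# Mutates state, so the agent must not auto-invoke it.
disable-model-invocation: true
---
```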

## Hooks

- **Hooks run on every applicable agent event.** Keep them fast (<2s) and side-effect-free unless the side effect is the whole point.
- **Hooks must not silently modify the user's working tree** (see read-only default).
- **Hooks that block the agent** must say so loudly and explain how to unblock.

## Secrets and external systems

- **Secrets handling defers to `.cursor/rules/devops/secrets-and-credentials.mdc`** β€” every rule there applies.
- **External writes (Slack, Asana, Linear, GitHub) require explicit user confirmation** for the specific message/issue/PR being created.
- **Drafts over publishes**: where the platform supports drafts (PRs, releases, Slack scheduled), prefer creating a draft and surfacing the link to the user.

## When the rails block a useful flow

- If a rail is blocking a flow that's clearly safe and clearly useful, the answer is to relax the rail in this rule (with a PR), not to override it case-by-case in a skill. Code that quietly bypasses these rules is a bug.
137 changes: 137 additions & 0 deletions .cursor/rules/devops/github-actions.mdc
@@ -0,0 +1,137 @@
---
description: GitHub Actions conventions (pinned actions, least-privilege permissions, OIDC, hardened runners, caching, concurrency, environments, reusable workflows)
globs:
- .github/workflows/**
- .github/actions/**
alwaysApply: false
---

# GitHub Actions

Pod scope and high-level principles live in `.cursor/rules/devops/main.mdc`. This file is the deep-dive for workflow and composite-action authoring.

## Action references

- **Third-party actions MUST be pinned to a 40-character commit SHA** with a trailing `# v<version>` comment. Tags are silently mutable.
  - ✅ `uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # 6.0.2` (this repo's current pin)
  - ❌ `uses: actions/checkout@v4`
- **Pin transitively.** If a composite action under `.github/actions/<name>/` uses a third-party action, that nested reference is also pinned to a SHA.
- **First-party actions** in this repo (`.github/actions/<name>/`) may be referenced by relative path or by SHA. Prefer relative path for in-repo usage; SHA when the action is consumed by a workflow that may run on a different ref.
- **Action upgrades land in their own PR**, never bundled with feature changes. The PR body links the action's release notes and includes the SHA diff.
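
Transitive pinning can be sketched with a composite action that wraps a third-party action. The action name `setup-build`, its input, and the choice of `actions/setup-node` are assumptions for illustration; `<40-char-commit-sha>` and `v<version>` are placeholders to be filled from the upstream release, not real values.

```yaml
# .github/actions/setup-build/action.yml (hypothetical composite action)
name: setup-build
description: Prepares the build toolchain for a job.
inputs:
  node-version:
    description: Node.js version to install
    required: true
runs:
  using: composite
  steps:
    # The nested third-party reference is pinned to a SHA, exactly like a top-level one.
    - uses: actions/setup-node@<40-char-commit-sha> # v<version>
      with:
        node-version: ${{ inputs.node-version }}
    - name: Show toolchain
      shell: bash   # run steps in a composite action must declare a shell
      run: node --version
```

A workflow in the same repo would then reference it by relative path (`uses: ./.github/actions/setup-build`) after its checkout step.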

## Permissions

- **Every workflow declares `permissions:` explicitly**, either top-level or per-job. Relying on repository-default permissions is forbidden.
- **Top-level** is preferred when all jobs share the same scope; default to least privilege:
  ```yaml
  permissions:
    contents: read
  ```
- **Per-job** is preferred when scopes differ between jobs (e.g., one job needs `id-token: write` for OIDC, another only needs `contents: read`). Per-job blocks REPLACE the top-level block; they do not merge.
- **Widen per-job, not per-workflow,** when a single job needs more than the rest (e.g., `pull-requests: write` for label automation).
- **`id-token: write` is required for OIDC.** Grant only on the specific job that authenticates to a cloud provider.
- **Never use `permissions: write-all`.** If you think you need it, you don't.
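
The top-level and per-job rules above combine as in the following sketch. The workflow name, job names, and step bodies are hypothetical; the point is that the widened job restates `contents: read`, because its block replaces the top-level one.

```yaml
name: on-merge-example   # hypothetical workflow
on:
  push:
    branches: [main]

# Least-privilege default for every job that does not override it.
permissions:
  contents: read

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # 6.0.2
      - run: echo "read-only job"

  deploy:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    # Replaces the top-level block, so contents access is restated here.
    permissions:
      contents: read
      id-token: write
    steps:
      - run: echo "OIDC authentication would happen here"
```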

## Triggers and untrusted input

- **`pull_request_target` is dangerous.** Use it only when the workflow does not check out PR HEAD code. Common safe uses: labeling, commenting, gating on metadata.
- **Never reference user-controlled `${{ github.event.* }}`, `github.head_ref`, `github.event.pull_request.title`, `body`, or `commits[*].message` directly inside `run:` blocks.** Pipe via `env:` and quote.
  - ✅
    ```yaml
    env:
      PR_TITLE: ${{ github.event.pull_request.title }}
    run: |
      echo "$PR_TITLE"
    ```
  - ❌ `run: echo "${{ github.event.pull_request.title }}"`
- **Token scope on fork PRs**: `pull_request` from forks gets a read-only `GITHUB_TOKEN` and no secrets. Do not work around this; design fork-safe workflows or split into a `pull_request_target` follow-up that does not check out PR code.
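
A safe `pull_request_target` use might look like the sketch below: it acts only on PR metadata and has no checkout step. The workflow name, job name, and comment text are illustrative assumptions.

```yaml
name: on-pr-welcome   # hypothetical workflow
on:
  pull_request_target:
    types: [opened]

permissions:
  pull-requests: write

jobs:
  comment:
    runs-on: ubuntu-latest
    timeout-minutes: 5
    steps:
      # No checkout step: PR HEAD code is never fetched or executed.
      - name: Post a triage comment
        env:
          GH_TOKEN: ${{ github.token }}
          GH_REPO: ${{ github.repository }}
          PR_NUMBER: ${{ github.event.pull_request.number }}
        run: gh pr comment "$PR_NUMBER" --body "Thanks, a maintainer will triage this PR."
```

The PR number reaches the shell through `env:`, so the same quoting rule applies even though a number is low-risk input.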

## OIDC over long-lived credentials

- **Cloud authentication uses OIDC.** No long-lived access keys / service-account JSON in secrets where OIDC is supported.
  - GCP: `google-github-actions/auth` with `workload_identity_provider`
  - AWS: `aws-actions/configure-aws-credentials` with `role-to-assume`
  - Azure: `azure/login` with `client-id` + `tenant-id` + `subscription-id`
- **OIDC role/policy must be scoped per-workflow or per-environment**, never repo-wide read-write.
- **Document the trust relationship** in a comment near the auth step (which IdP, which role, which environment).
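
For AWS, the rules above could be sketched as follows. The role ARN, account ID, region, and environment are made-up values, and `<40-char-commit-sha>` / `v<version>` are placeholders for the real pin.

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    environment: staging
    permissions:
      contents: read
      id-token: write   # required to request the OIDC token
    steps:
      # Trust relationship: GitHub OIDC provider -> hypothetical role
      # example-staging-deploy, restricted to this repo's staging environment.
      - uses: aws-actions/configure-aws-credentials@<40-char-commit-sha> # v<version>
        with:
          role-to-assume: arn:aws:iam::123456789012:role/example-staging-deploy
          aws-region: us-east-1
```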

## Hardened runners

- **Sensitive workflows run `step-security/harden-runner`** as the first step, in `audit` mode initially and `block` mode once egress is characterized.
  - Examples of sensitive: any workflow handling secrets, publishing artifacts, deploying to prod, or with `id-token: write`.
- **Document allowed egress** in the `harden-runner` config; do not silently expand it.
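
A first-step `harden-runner` configuration in `audit` mode might look like this sketch. The endpoint list in the comment is illustrative only, and the `harden-runner` pin is a placeholder.

```yaml
steps:
  - uses: step-security/harden-runner@<40-char-commit-sha> # v<version>
    with:
      egress-policy: audit   # switch to block once egress is characterized
      # In block mode, list every permitted endpoint explicitly, for example:
      # allowed-endpoints: >
      #   github.com:443
      #   api.github.com:443
  - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # 6.0.2
```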

## Caching

- `actions/cache` keys are deterministic and prefixed:
  ```yaml
  key: ${{ runner.os }}-${{ matrix.target }}-${{ hashFiles('**/lockfile') }}
  restore-keys: |
    ${{ runner.os }}-${{ matrix.target }}-
  ```
- **Never write to a shared cache from PR-triggered jobs running on fork code.** Cache poisoning is a real attack. Restrict cache writes to `push` / `workflow_dispatch` / `merge_group` triggers.
- Caches are scoped to a branch by default; that's fine for build artifacts. Cross-branch caches require justification.
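
One way to restrict cache writes, assuming the separate `actions/cache/restore` and `actions/cache/save` sub-actions, is to restore on every trigger and save only on trusted ones. The cache path, build script, and pins are placeholders.

```yaml
steps:
  # Every trigger may read the cache.
  - uses: actions/cache/restore@<40-char-commit-sha> # v<version>
    id: cache
    with:
      path: build/.cache
      key: ${{ runner.os }}-${{ matrix.target }}-${{ hashFiles('**/lockfile') }}
      restore-keys: |
        ${{ runner.os }}-${{ matrix.target }}-
  - run: ./scripts/build.sh   # hypothetical build script
  # Only trusted triggers may write the cache.
  - uses: actions/cache/save@<40-char-commit-sha> # v<version>
    if: contains(fromJSON('["push", "workflow_dispatch", "merge_group"]'), github.event_name)
    with:
      path: build/.cache
      key: ${{ steps.cache.outputs.cache-primary-key }}
```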

## Concurrency

- **Every workflow declares a `concurrency` block** unless concurrent runs are explicitly required.
  ```yaml
  concurrency:
    group: ${{ github.workflow }}-${{ github.ref }}
    cancel-in-progress: true
  ```
- **Release workflows do NOT cancel in progress.** Use `cancel-in-progress: false` on `release-*.yml` and any workflow that mutates external state.
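
For a release workflow, the same block with cancellation disabled would be a reasonable starting point:

```yaml
# release-<pkg>.yml: let the in-progress run finish instead of cancelling it
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: false
```

With this setting a newer run waits for the running one; GitHub keeps at most one pending run per group, so an older pending run is superseded by a newer one.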

## Matrix builds

- `fail-fast: true` is the default. Set `fail-fast: false` only for diagnostic / cross-platform matrices where seeing all failures is more useful than stopping early.
- Matrix dimension names are concrete and stable: `os`, `arch`, `node-version`, `target` β€” not `config`, `variant`, `flavor`.
- `include` and `exclude` are documented in a comment when used; they obscure the matrix.
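
A diagnostic cross-platform matrix following these rules might be sketched as below. The job name, OS list, and Node versions are assumptions chosen for illustration.

```yaml
jobs:
  unit-tests:
    runs-on: ${{ matrix.os }}
    timeout-minutes: 30
    strategy:
      fail-fast: false   # cross-platform matrix: show every failing cell
      matrix:
        os: [ubuntu-latest, macos-latest, windows-latest]
        node-version: [20, 22]
        # include: adds one extra cell for the oldest supported Node, Linux only.
        include:
          - os: ubuntu-latest
            node-version: 18
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # 6.0.2
      - run: echo "testing on ${{ matrix.os }} with Node ${{ matrix.node-version }}"
```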

## Environments

- **Production deployments target a GitHub Environment** with required reviewers configured in the GitHub UI, not in YAML.
- **Environment names are stable**: `production`, `staging`, `preview-<feature>`. Do not invent per-PR environment names that won't be reused.
- **Environment secrets** are scoped to the environment, not the repo. Promote from preview → staging → production.
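
In YAML, targeting an environment is a single key on the job; the approval gate itself lives in the environment's settings. The secret name and deploy script below are hypothetical.

```yaml
jobs:
  deploy-production:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    # Required reviewers are configured on this environment in the GitHub UI.
    environment: production
    steps:
      - name: Deploy
        env:
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}   # hypothetical environment-scoped secret
        run: ./scripts/deploy.sh   # hypothetical deploy script
```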

## Reusable workflows and composite actions

- **Reusable workflow** (`workflow_call`): for multi-step orchestration with its own triggers, jobs, and concurrency.
- **Composite action** (`.github/actions/<name>/action.yml`): for a sequence of steps that runs inside a job. One responsibility per action.
- **Extract when duplicated** across two or more workflows.
- **Reusable workflow inputs and outputs are documented** in the workflow header. No magic strings.
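
A reusable workflow with its inputs and outputs documented in the header could look like this sketch. The file name, input, output, and version string are illustrative assumptions.

```yaml
# reusable-build.yml (hypothetical)
# Inputs:
#   target  - build target passed to the build step (required)
# Outputs:
#   version - version string computed by the build job
on:
  workflow_call:
    inputs:
      target:
        description: Build target
        required: true
        type: string
    outputs:
      version:
        description: Version computed by the build job
        value: ${{ jobs.build.outputs.version }}

permissions:
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    timeout-minutes: 30
    outputs:
      version: ${{ steps.meta.outputs.version }}
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # 6.0.2
      - id: meta
        env:
          TARGET: ${{ inputs.target }}
        run: echo "version=0.0.0-$TARGET" >> "$GITHUB_OUTPUT"
```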

## Status checks

- **Job names are stable.** Branch protection rulesets bind to job names; renaming a job breaks protection silently.
- **Names are descriptive and lowercase-hyphenated** (e.g., `lint`, `unit-tests`, `cpp-tests`). Choose once and treat as part of the public contract.
- **Required checks are declared in the GitHub ruleset**, not in YAML. The ruleset is the source of truth.
- **Renaming a required check requires a coordinated change**: update the ruleset and the workflow in the same change window, or add the new name as required while leaving the old one until merges drain.

## Outputs and step IDs

- **Use `$GITHUB_OUTPUT`**, never the deprecated `::set-output` workflow command.
  ```bash
  echo "version=$VERSION" >> "$GITHUB_OUTPUT"
  ```
- **Every step that produces an output has an `id:`.** Reference as `${{ steps.<id>.outputs.<key> }}`.

## Failure handling

- **`continue-on-error: true` is forbidden on Tier-1 checks** (lint, format, type-check, security scans, tests). Allowed on diagnostic-only steps.
- **`if: always()`** on cleanup steps so artifacts and logs are uploaded even on failure.
- **Timeouts**: every job declares `timeout-minutes`. Default budget is 30 minutes; longer requires a comment explaining why.
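
Taken together, the timeout and cleanup rules might be sketched as follows. The test script, artifact name, log path, and the `upload-artifact` pin are placeholders.

```yaml
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 30   # default budget; a longer value needs an explanatory comment
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # 6.0.2
      - run: ./scripts/test.sh   # hypothetical; a failure here fails the job
      - name: Upload test logs
        if: always()   # runs even when the test step failed
        uses: actions/upload-artifact@<40-char-commit-sha> # v<version>
        with:
          name: test-logs
          path: logs/
```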

## File layout and naming

- Workflow filenames describe the trigger and target. Existing repo conventions:
  - `on-pr-<pkg>.yml`: triggered by PR events on a package surface
  - `on-merge-<pkg>.yml`: triggered when a PR merges to main
  - `on-pr-close-<pkg>.yml`: triggered when a PR closes
  - `release-<pkg>.yml` / `create-github-release-<pkg>.yml`: release pipelines
  - `prebuilds-<pkg>.yml`: prebuilt artifact generation
  - `pr-test-*.yml` / `pr-validation-*.yml` / `pr-checks-*.yml`: PR test/validation pipelines
  - `integration-<scope>-<pkg>.yml`: cross-package integration / e2e suites (e.g., mobile device-farm runs)
  - `reusable-*.yml`: reusable workflow building block (consumed via `workflow_call`)
  - `trigger-reusable-*.yml`: entrypoint workflow whose only job is to call a `reusable-*.yml`
- **One responsibility per workflow file.** Split when triggers diverge.