ci: delete mirror-sync and fork-ci/ (companion to storage SOURCE_REF PR) by entlein · Pull Request #27 · k8sstormcenter/node-agent

entlein · 2026-04-24T12:11:01Z

Summary

Delete the mirror-sync workflow and the empty `fork-ci/` directory. Pairs with k8sstormcenter/storage 'ci: replace mirror-sync with SOURCE_REF input on build.yaml'.

Why

Mirror-sync was going to force-push `test-mirror/**` branches with overlaid fork workflows. GitHub Actions does not allow GITHUB_TOKEN to create or update files under `.github/workflows/` on any ref, so the overlay approach is dead regardless of token. Rather than widen a long-lived PAT to get around it, we're removing the concept entirely.

The replacement lives on storage: `build.yaml` there gains a `SOURCE_REF` workflow_dispatch input, and its `trigger-node-agent` job dispatches this repo's existing `build.yaml` at the matching ref. Node-agent's `build.yaml` is already workflow_dispatch-only with IMAGE_TAG + STORAGE_REF inputs, so no new node-agent workflow is needed.

Deletions

`.github/workflows/mirror-sync.yaml`
`fork-ci/` (README-only placeholder)

Test plan

Merge this + the storage counterpart
Dispatch from storage with `SOURCE_REF=upstream-pr/analyzer-zero-alloc`
Confirm node-agent `build-image` runs on the corresponding upstream-pr ref and its component-tests chain triggers

Paired with k8sstormcenter/storage deleting its mirror-sync in favour of a SOURCE_REF input on build.yaml. Mirror-sync is dead because GITHUB_TOKEN cannot push .github/workflows/** on any ref, and we explicitly don't want to widen a long-lived PAT to work around that. Node-agent's existing build.yaml is already workflow_dispatch-only and takes IMAGE_TAG + STORAGE_REF, so storage's trigger-node-agent step can dispatch it directly at any ref (e.g. upstream-pr/** without those branches ever needing fork-specific workflow files). No new node-agent workflow needed on this side. Deletions: - .github/workflows/mirror-sync.yaml - fork-ci/ (was a placeholder; no overlay concept remains) Signed-off-by: Entlein <eineintlein@gmail.com>

Three CR concerns addressed in one commit (all submitted 2026-05-04 10:09): 1. tamper_alert.go (Major) — only emit R1016 on actual signature mismatch. Previously every error from VerifyObjectAllowUntrusted (hash computation, verifier construction, malformed annotations, signature mismatch) was treated as a tamper event. With EnableSignatureVerification=true that meant a transient operational failure could drop a valid overlay AND emit a false R1016. Fix: - Add signature.ErrSignatureMismatch sentinel in pkg/signature/annotations.go - Wrap the signature-fail return in verify.go with the sentinel (Go 1.20+ multiple-%w form) - Classify in tamper_alert.go via errors.Is(err, ErrSignatureMismatch); operational errors log with a 'NOT tamper' tag, do NOT touch the tamperEmitted dedup map, and do NOT emit R1016. They still honour strict-mode (return !EnableSignatureVerification) so behaviour is conservative. Adds tamper_alert_test.go: pins LoadOrStore semantics, confirms each operational-error variant returns false on errors.Is(ErrSignatureMismatch), smoke-tests the sentinel value. 2. aplint_test.go:221 (Minor) — replace strings.Contains substring check for fixture-type detection with proper YAML-header parse. The substring form silently skipped fixtures with quoted/non-canonical 'kind' values AND would false-positive on nested OwnerReferences with the same substring. Now uses sigs.k8s.io/yaml (already imported) to unmarshal into a {Kind string} struct and branch on the parsed value. 3. aplint_test.go R-AP-12 (Major, 'Heavy lift') — REJECTED with reasoning to be posted as CR reply. Evidence shows R-AP-12 enforces the actual runtime contract: - pkg/rulemanager/cel/libraries/parse/parse.go's getExecPath returns args[0] (the full binary path) - ap.was_executed_with_args(containerID, parse.get_exec_path(event.args, event.comm), event.args) passes the FULL argv to the matcher - Integration test pkg/rulemanager/cel/libraries/applicationprofile/ integration_test.go uses ["/bin/bash", "-c", ...] — full argv - Real fixture tests/resources/known-application-profile.yaml uses args: ["/bin/sleep", "infinity"] with args[0] == path The compare_exec_args_test.go cases that look like flags-only ([-s, ⋯]) are matcher-isolation unit tests, not the wired contract. Removing R-AP-12 would let users author flags-only profiles that silently fail to silence R0040 in production. Bumps storage replace to a7e6234349ab (storage main HEAD post-PR #27).

* test(component): port Test_28 to upstream's unified user-overlay label Upstream PR kubescape#788 (Replace AP and NN cache with CP) collapsed the two legacy workload-keyed caches into a single ContainerProfileCache that reads ONE pod label `kubescape.io/user-defined-profile=<name>` and uses <name> as the lookup key for BOTH the user ApplicationProfile and the user NetworkNeighborhood. The fork's earlier two-label scheme (`user-defined-profile` for AP + separate `user-defined-network` for NN, with potentially different names) is no longer honored — the second label is silently ignored. Port: - tests/resources/nginx-user-defined-deployment.yaml: drop the `user-defined-network` label, point the surviving label at one shared name `curl-28-overlay`. - tests/component_test.go Test_28_UserDefinedNetworkNeighborhood: create both AP and NN under that single shared name. Assertions unchanged — the test still verifies that the user NN's egress restriction (only fusioncore.ai allowed on TCP/80) is enforced once the pod is running. Verified locally: go vet -tags=component ./tests/... clean; go test -tags=component -run='^$' ./tests/... compiles cleanly. * feat(cel): re-port CompareExecArgs hookup onto upstream's CP cache PR #35's wildcard-aware exec arg matching needs reapplication on top of the new ContainerProfileCache (upstream kubescape#788) baseline. The original PR sat on the legacy applicationprofilecache, which has been deleted; the call site now reads cp.Spec.Execs from a ContainerProfile. Same semantic change as PR #35: '⋯' (DynamicIdentifier) — matches exactly one argument position. '*' (WildcardIdentifier) — matches zero or more consecutive args. Wiring: - pkg/rulemanager/cel/libraries/applicationprofile/exec.go: drop slices.Compare exact-equality on the cp.Spec.Execs loop; route through dynamicpathdetector.CompareExecArgs. - go.mod: bump fork storage replace to feat/exec-arg-wildcards tip (3fc287210729) which carries the matcher. - exec_test.go: re-add TestExecWithArgsWildcardInProfile (13 subtests across curl --user ⋯, sh -c *, ls -l ⋯, echo hello *, plus negative literal-anchor / under-consumed-⋯ / mid-profile-* cases). Mirrors the test set that lived on PR #35 before the upstream merge. Verified: full applicationprofile package green (`go test ./pkg/rulemanager/cel/libraries/applicationprofile/`). * feat(rules): R0040 'Unexpected process arguments' + Test_32 e2e R0040 is an additive companion to R0001. It evaluates: ap.was_executed(...) && !ap.was_executed_with_args(..., event.args) so it ONLY fires when the exec'd path IS in the user-defined profile (R0001 stays silent) but the runtime arg vector does not match any profile entry's pattern. With wildcard tokens supported by dynamicpathdetector.CompareExecArgs: '⋯' (DynamicIdentifier) — exactly one argument position. '*' (WildcardIdentifier) — zero or more consecutive args. Use case: profile entry {Path: /bin/sh, Args: [sh, -c, *]} flags 'sh -x ...' as drift while permitting 'sh -c <anything>'. Wiring: - tests/chart/templates/node-agent/default-rules.yaml: new R0040 CEL rule definition immediately after R0001, same MITRE tagging (TA0002/T1059) and same applicationprofile-anomaly tag set. - tests/chart/templates/node-agent/default-rule-binding.yaml: R0040 added to the all-rules-all-pods binding next to R0001. - tests/resources/curl-exec-arg-wildcards-deployment.yaml: new fixture, curl pod labelled with the unified kubescape.io/user-defined-profile=curl-32-overlay label. - tests/component_test.go: Test_32_UnexpectedProcessArguments with 4 subtests: 32a sh_dash_c_matches_wildcard_trailing — sh -c <cmd> matches profile [sh, -c, *] — R0040 silent. 32b sh_dash_x_mismatches_R0040 — sh -x <cmd> mismatches the literal -c anchor — R0040 fires. 32c echo_hello_matches_wildcard_trailing — echo hello world matches [echo, hello, *] — R0040 silent. 32d echo_goodbye_mismatches_R0040 — echo goodbye world mismatches the literal hello anchor — R0040 fires. Verified locally: go vet -tags=component ./tests/... clean; go test -tags=component -run='^$' ./tests/... compiles cleanly. End-to-end alert assertions run in CI. * deps(storage): bump to rebased feat/exec-arg-wildcards tip (0de34ebc) After rebasing storage feat/exec-arg-wildcards onto storage main, the matcher branch now sits on top of the latest fork main commit (352395a3 — Internal-field merge fix). Bump the node-agent storage replace to that new pseudo-version so this branch's tests run against storage main + matcher in one consistent baseline. Verified locally: 47/47 non-eBPF unit packages green; vet clean; the applicationprofile CEL package's TestExecWithArgsWildcardInProfile is 13/13 green; component-tests compile under the component tag. The two failing packages (pkg/containerwatcher/v2/tracers and pkg/validator) fail with the same pre-existing /sys/fs/bpf mount-permission error they have on every recent run — env, not code. * ci(component-tests): add Test_32_UnexpectedProcessArguments to matrix The component-tests workflow uses a hardcoded matrix list, not a dynamic discovery from the test source. Test_32 (added in a613cf6) must be listed explicitly to be picked up — without this entry the test is silently skipped. * fix(containerprofilecache): re-wire R1016 tamper alert + expand Test_31 Upstream PR kubescape#788 (Replace AP and NN cache with CP) deleted the legacy applicationprofilecache where the fork's emitTamperAlert (commit c2d681e 'Feat/tamperalert' #22) lived. After the merge, R1016 alerts no longer fired for tampered user-defined profiles, breaking Test_31_TamperDetectionAlert (passed 3/3 on main, fails on the merged branch — confirmed regression introduced by PR #36). This restores the contract: every cache load of a user-supplied ApplicationProfile or NetworkNeighborhood overlay re-verifies the signature, and emits an R1016 'Signed profile tampered' alert through the rule-alert exporter when the signature is present but no longer valid. Alert shape preserved from the legacy cache so dashboards and component tests keep matching. Implementation: - new file pkg/objectcache/containerprofilecache/tamper_alert.go: verifyUserApplicationProfile / verifyUserNetworkNeighborhood / emitTamperAlert / extractWlidFromContainerID. Self-contained; keeps containerprofilecache.go diff small. - containerprofilecache.go: new tamperAlertExporter field + SetTamperAlertExporter setter + verify hooks immediately after GetApplicationProfile / GetNetworkNeighborhood succeed in the user-overlay branch of addContainer. - cmd/main.go: wire the alert exporter via SetTamperAlertExporter after the cache constructor (kept the constructor signature unchanged to avoid blast radius on tests). The setter is nil-safe: when no exporter is wired, verification still runs and is logged but no alert is emitted — matches the legacy behavior for unit-tests-with-no-exporter. Test_31 expanded from one scenario to four subtests, each in its own namespace to avoid alert cross-contamination: 31a tampered_user_defined_AP_fires_R1016 — original regression case 31b untampered_signed_AP_no_R1016 — negative: clean signature 31c unsigned_AP_no_R1016 — signing is opt-in 31d tampered_user_defined_NN_fires_R1016 — parallel NN code path Verified locally: - go build ./... clean - go test ./pkg/objectcache/containerprofilecache/... green - go test ./pkg/signature/... green - go vet -tags=component ./tests/... clean - go test -tags=component -run='^$' ./tests/... compiles * test(component): Test_32 profile uses full-path argv[0] Empirical finding from CI run 25178930763 — Test_32's positive subtests (32a sh_dash_c_matches, 32c echo_hello_matches) fired R0040 when they should not. Cause: at runtime, the eBPF tracer captures argv[0] as the FULL exec path (e.g. "/bin/sh") rather than the basename ("sh"). My profile entries used basenames, so the matcher's first-position literal compare missed and the cache fell through to 'no exec entry matches' — R0040 fires. Aligns Test_32's profile with the convention already used by Test_27's wildcard_yaml_profile_allowed_opens fixture (known-application-profile-wildcards.yaml predecessor): argv[0] is the full path, subsequent positions are flags/values. Subtest expectations after this fix: 32a sh -c <cmd> → matches [/bin/sh, -c, *] → R0040 silent 32b sh -x <cmd> → -c anchor mismatch → R0040 fires 32c echo hello <…> → matches [/bin/echo, hello, *]→ R0040 silent 32d echo goodbye <…> → hello anchor mismatch → R0040 fires * test: AP-fixture linter (R-AP-* rules) + canonical reference profile Catches the class of bug Test_32 hit on its first CI run (PR #37 run 25178930763): profile entries used basename argv[0] ("sh") while the eBPF tracer captures the full path ("/bin/sh"), so the matcher silently misses and the rule fires when it shouldn't. Without a linter, this kind of fixture drift only surfaces in a 15-minute component-test run on a kind cluster — too late, too expensive. The linter (LintApplicationProfile / LintApplicationProfileYAML in tests/resources/aplint_test.go) is intentionally written as a pure function returning []Violation. Zero testing-package coupling on the hot path so it can be lifted into a future bobctl subcommand `bobctl lint <ap.yaml>` without rewrite — see backlog at ~/biz/sbob-business-plan/state.yaml. Rules: R-AP-01 — kind must be ApplicationProfile R-AP-02 — at least one container R-AP-03 — container.name non-empty R-AP-10 — exec.path absolute (catches relative paths) R-AP-11 — exec.path no wildcards (binary identity is exact) R-AP-12 — exec.args[0] equals exec.path or wildcard token (Test_32-style argv[0] basename trap) R-AP-13 — exec.args wildcard tokens are whole-word (no embedding) R-AP-20 — open.path non-empty + absolute R-AP-21 — open.flags non-empty (real auto-recorded opens always have ≥1) R-AP-22 — open.flags from known O_* set (catches typos) Each rule has a dedicated self-test that constructs a minimal-bad YAML and asserts the rule fires (5 negative tests). One positive test (TestLinter_canonical_AP_passes) parses the fork's reference known-application-profile.yaml — extracted from a real auto-recorded AP for curlimages/curl:8.5.0 in fea3b06 — and asserts zero violations. The reference YAML is restored to tests/resources/ so the canonical shape is in-tree and visible to humans + CI. Why a Go test rather than a shell linter: keeps the rule set in the same language as the storage matcher (`dynamicpathdetector`), so extending CompareExecArgs and the linter together stays cheap. Local-cluster organic learning was the original plan but k3s on OrbStack is currently flapping (LXC-related boot loop). The fea3b06 profile was extracted from real auto-learning at an earlier moment of stability, which is the next-best ground truth. * fix(tamper_alert): accept self-signed profiles, only flag actual tamper Switch verifyUser{ApplicationProfile,NetworkNeighborhood} from strict VerifyObject to VerifyObjectAllowUntrusted. The strict variant requires a Sigstore Fulcio trust chain and rejects locally-signed profiles even when the signature against the embedded cert is valid. That made Test_31b 'untampered_signed_AP_no_R1016' fire R1016 against an untampered AP, and broke Test_30's 'tampered_profile_loaded_without_ enforcement' subtest the same way. The intent is: tamper detection, not trust-chain enforcement. Matches cmd/sign-object/main.go's default verifier. * test(component): make Test_30 30b deterministic by re-execing inside Eventually The single-shot wget exec before Eventually was racy: if the eBPF event landed before the CP cache projected the user-defined AP, the rule manager evaluated against an empty baseline and R0001 never fired within the 60s polling window. Same race Test_29 already documents. Drive the wget exec inside the Eventually loop (10s tick, 120s deadline) so cache-load latency is absorbed by retries. Filter R0001 to comm=wget to make the assertion specific instead of catching any R0001. Drops the blind 15s pre-sleep and the redundant settle-then-recount block. * deps(storage): bump replace to f44fed80 (analyzer trailing-* fix) Picks up the upstream-PR-kubescape#316 review fix: trailing WildcardIdentifier now requires at least one regular-path segment, matching standard glob semantics. Closes the R0002 blind spot where '/etc/*' would silently match the bare '/etc' directory. * deps(storage): bump replace to 4ab95fb8 (PR #25 merged on fork main) Pulls in the full PR-kubescape#316 review fix set that just landed on storage main: proper splitPath-based trailing-* anchoring, DefaultCollapseConfigs() defensive-copy accessor, FindConfigForPath value-return, splitEndpoint defensive guard, plus the BenchmarkCompareDynamic baseline. * test(component): Test_33_AnalyzeOpensWildcardAnchoring End-to-end pin of the storage-side CompareDynamic contract through R0002. Each subtest deploys a fresh nginx pod with a user-defined AP carrying ONE Opens entry, then `cat`s a target path that probes a boundary case from the storage analyzer fixes (kubescape/storage kubescape#316 review by matthyx + entlein): - Anchored trailing `*` matches one OR MORE remaining segments — never zero. So /etc/* matches /etc/passwd but NOT bare /etc. - DynamicIdentifier (⋯) consumes EXACTLY ONE segment. - Mid-path `*` is zero-or-more, so /etc/*/* matches /etc/ssh (inner * consumes zero, trailing * consumes one). - Mixed ⋯/* combinations: ⋯ pins one, * consumes the rest. - splitPath normalises trailing slashes on both sides. 11 subtests covering: trailing_star_matches_immediate_child — basic /etc/* match trailing_star_matches_deep_child — multi-segment under prefix trailing_star_does_not_match_bare_parent — the security fix deep_prefix_trailing_star_does_not_match_parent — same rule, deeper ellipsis_pin_one_segment_then_literal — ⋯ rejects zero ellipsis_then_trailing_star_matches_two_* — ⋯/* combo, 2 levels ellipsis_then_trailing_star_matches_three_* — ⋯/* combo, 3 levels double_trailing_matches_one_child — /*/* mid-zero double_trailing_matches_deep_child — /*/* mid-one double_trailing_does_not_match_parent — /*/* needs ≥1 child trailing_slash_in_profile_normalises_to_literal — splitPath on profile Pinned at component level on TOP of the unit suite in storage/pkg/registry/file/dynamicpathdetector/tests/coverage_test.go. Both layers must agree — a drift in either lights up R0002 with a false positive or false negative. Matrix entry added to component-tests.yaml so the test runs in CI. * test(component): rework Test_33 negative cases to probe under R0002's monitored prefix R0002's CEL ruleExpression has a strict path-prefix filter: event.path.startsWith('/etc/') || event.path.startsWith('/var/log/') || ... All with trailing slash. Bare /etc and /var/log don't pass the filter, so R0002 never evaluates on those events — the matcher's bare-parent anchoring contract stays invisible at runtime even though the storage unit tests pin it. Probe one level deeper instead: /etc/ssl IS under the /etc/ monitored prefix, so the rule CAN see whether a /etc/ssl/* profile entry matches the bare /etc/ssl parent. Same security guarantee, observable layer. Reworked subtests: - trailing_star_does_not_match_bare_parent_under_monitored_prefix: profile /etc/ssl/*, cat /etc/ssl → R0002 fires - deep_prefix_trailing_star_does_not_match_parent: profile /etc/ssl/certs/*, cat /etc/ssl/certs → R0002 fires - ellipsis_requires_one_segment_not_zero: profile /etc/passwd/⋯, cat /etc/passwd → ⋯ requires one more segment - double_trailing_does_not_match_parent_under_monitored_prefix: profile /etc/ssl/*/*, cat /etc/ssl → R0002 fires The 7 positive subtests that already passed are untouched. Added a comment block documenting why we probe at /etc/ssl rather than /etc. * test(component): fix Test_28 + Test_31 31b flakiness Two distinct fixes for what looked like the same intermittent failures across PR #37 runs: Test_31 31b 'untampered_signed_AP_no_R1016' — root cause: storage's PreSave runs DeflateSortString on Syscalls (and Capabilities, Architectures), which sorts + dedupes. The signSignedAP helper signed the AP BEFORE pushing, against unsorted syscalls {socket, connect, read, write, close, openat}. After PreSave the stored AP had sorted {close, connect, openat, read, socket, write}, so the content hash differed from the signature → server-side verify correctly failed → R1016 fired even though the profile was untampered. Test_29 + Test_30 30b had the same fixture but didn't observe the bug because they only assert R0001 counts, never R1016. Pre-sort the syscalls in all three test fixtures so storage's normalization is a no-op on round-trip. Test_28 28a 'allowed_fusioncore_no_alert' — root cause: 15s post-deploy sleep wasn't always enough for the upstream ContainerProfileCache to project the user-defined NN. Failure mode is alert payload `profileMetadata.errorMessage:"waiting for profile update"` — the rule manager evaluated against an unloaded NN and fired R0005/R0011 spuriously. Bumped to 30s with a comment documenting why. A real fix would poll a cache-loaded signal but no such signal is exposed from outside the node-agent today. * test(component): sign-after-roundtrip in Test_31 to defeat content-drift R1016 false positive Test_31 31b 'untampered_signed_AP_no_R1016' kept flaking because the AP's content hash drifted between client-side sign and server-side verify across the K8s/storage roundtrip. Sources of drift include storage's PreSave normalisation (DeflateSortString, DeflateStringer, DeflateRulePolicies), signature/profiles GetContent's nil→empty-map mutation on PolicyByRuleId, and any K8s server-side defaulting of spec/metadata fields. Pre-sorting Syscalls in the previous fix only covered one of these. Sign-after-roundtrip closes the whole class: 1. Push the AP UNSIGNED to storage. PreSave runs, normalises content. 2. Read it back — this is what node-agent will see at verify time. 3. Sign THAT normalised content. 4. Delete the unsigned in-storage copy so deployAndWait can Create the signed version without an AlreadyExists conflict. 5. Strip server-managed metadata (resourceVersion / uid / etc.) from the returned AP so the second Create succeeds cleanly. Second push goes through deflate again. Idempotent on already-normalised content → stored bytes identical to signed bytes → content hash matches → verify succeeds → no R1016 false positive. Tampered subtests (31a, 31d) keep working: signSignedAP returns a known-good signed AP, the test mutates it post-helper, deployAndWait Creates the mutated version, storage round-trip preserves the mutation, and verify correctly detects the divergence. * test(component): bump Test_33 WaitForReady to 180s for cluster-pressure tolerance Test_33 deploys 11 fresh pods sequentially, one per subtest. Later subtests race against an increasingly loaded kind cluster — CP cache reconciler, alertmanager, prometheus all chew CPU at boot. 80s WaitForReady deadline timed out on the post-23ea224 run with 'workload not ready in ns ...' for early subtests once the cluster got busy. 180s gives headroom without changing total runtime regime. * deps(storage): bump replace to 43795bb4 (storage feat/exec-arg-wildcards tip — CRDs + CompareExecArgs) * test(aplint): drop redundant p := p loop var (Go 1.22+, copyloopvar lint) CodeRabbit (PR #38). Loop variable shadowing is no longer required since Go 1.22 made each iteration capture its own variable. The shadow trips golangci-lint's copyloopvar check. * fix(tamper_alert): R1016 dedup + use real WLID CodeRabbit (PR #38) flagged two issues on the refresh-loop tamper path: 1. Dedup R1016 (Major). The cache refresh re-runs verifyUser*() on every reconcile cycle, so a long-lived tampered profile would emit R1016 on every cycle and once per container reference. Track emitted alerts in a sync.Map keyed on (kind|ns/name@resourceVersion). LoadOrStore gives atomic 'first transition to invalid' semantics; a re-tamper at a new RV uses a fresh key and alerts again. Verification passing clears the key so future tampers re-alert. 2. Pass real WLID (Major). emitTamperAlert previously called extractWlidFromContainerID(containerID) — but containerID is a runtime ID like 'containerd://...' which GenericRuleFailure.SetWorkloadDetails parses as a WLID and silently drops kind/name/cluster attribution. Thread sharedData.Wlid (already in scope at the call site in containerprofilecache.go) through verifyUser*() into emitTamperAlert. Drop extractWlidFromContainerID — no longer needed. Test_31_TamperDetectionAlert exercises this path end-to-end; expecting it to keep passing (one alert per tampered profile, with proper workload attribution in Alertmanager). * deps(storage): bump replace to b0d68d3d (empty-Args wildcard match) Picks up storage's CompareExecArgs fix for path-only Execs entries (CodeRabbit PR #38, finding #3). With this bump, R0040 no longer fires on every legit exec of a path that the user-defined profile allowed but didn't constrain by argv. * fix: address CodeRabbit second-review batch on PR #38 Three CR concerns addressed in one commit (all submitted 2026-05-04 10:09): 1. tamper_alert.go (Major) — only emit R1016 on actual signature mismatch. Previously every error from VerifyObjectAllowUntrusted (hash computation, verifier construction, malformed annotations, signature mismatch) was treated as a tamper event. With EnableSignatureVerification=true that meant a transient operational failure could drop a valid overlay AND emit a false R1016. Fix: - Add signature.ErrSignatureMismatch sentinel in pkg/signature/annotations.go - Wrap the signature-fail return in verify.go with the sentinel (Go 1.20+ multiple-%w form) - Classify in tamper_alert.go via errors.Is(err, ErrSignatureMismatch); operational errors log with a 'NOT tamper' tag, do NOT touch the tamperEmitted dedup map, and do NOT emit R1016. They still honour strict-mode (return !EnableSignatureVerification) so behaviour is conservative. Adds tamper_alert_test.go: pins LoadOrStore semantics, confirms each operational-error variant returns false on errors.Is(ErrSignatureMismatch), smoke-tests the sentinel value. 2. aplint_test.go:221 (Minor) — replace strings.Contains substring check for fixture-type detection with proper YAML-header parse. The substring form silently skipped fixtures with quoted/non-canonical 'kind' values AND would false-positive on nested OwnerReferences with the same substring. Now uses sigs.k8s.io/yaml (already imported) to unmarshal into a {Kind string} struct and branch on the parsed value. 3. aplint_test.go R-AP-12 (Major, 'Heavy lift') — REJECTED with reasoning to be posted as CR reply. Evidence shows R-AP-12 enforces the actual runtime contract: - pkg/rulemanager/cel/libraries/parse/parse.go's getExecPath returns args[0] (the full binary path) - ap.was_executed_with_args(containerID, parse.get_exec_path(event.args, event.comm), event.args) passes the FULL argv to the matcher - Integration test pkg/rulemanager/cel/libraries/applicationprofile/ integration_test.go uses ["/bin/bash", "-c", ...] — full argv - Real fixture tests/resources/known-application-profile.yaml uses args: ["/bin/sleep", "infinity"] with args[0] == path The compare_exec_args_test.go cases that look like flags-only ([-s, ⋯]) are matcher-isolation unit tests, not the wired contract. Removing R-AP-12 would let users author flags-only profiles that silently fail to silence R0040 in production. Bumps storage replace to a7e6234349ab (storage main HEAD post-PR #27). * fix: address CodeRabbit third-review batch on PR #38 (0cf4a50) Three new findings on the May 9 review batch: 1. aplint_test.go R-AP-13 (Major) — embedded * was not flagged, only embedded ⋯. An arg like "foo*bar" linted clean, defeating the purpose of the wildcard-must-be-standalone rule. Added a parallel check for embedded wildcardIdentifier mirroring the existing dynamicIdentifier check. 2. aplint_test.go canonical-fixture skip path (Major) — t.Skipf on missing reference fixture silently disabled the gold-standard test if someone deleted/renamed it. Switched to t.Fatalf with a message that explains the file's role. Also switched the YAML-parse-failure path from Skipf to Fatalf — un-parseable YAML in tests/resources/ is a fixture-quality bug, not something to skip past. Kept Skipf only for the kind!=AP path, which correctly handles the legit non-AP fixtures (Deployments, Services etc.) that share the directory. 3. tamper_alert_test.go (Trivial nitpick, 'Low value') — added an integration-style test that exercises the full verifyUserApplicationProfile path: real signature.SignObjectDisableKeyless, tamper the spec, call verifyUserApplicationProfile, assert tamperEmitted carries the key. Then re-sign at a new RV and assert the dedup is cleared. Confirms the wiring 'verifier returns ErrSignatureMismatch → classify as tamper → LoadOrStore in dedup map' actually fires, not just the LoadOrStore call in isolation. Uses the real cosign adapter — no mocking needed. * fix(test): make dedup-clearing assertion non-trivial CodeRabbit nitpick on PR #38 (tamper_alert_test.go:159): the prior version of TestVerifyAP_TamperedProfile_PopulatesDedupMap bumped the profile's ResourceVersion to '43' before the re-sign, then asserted that the key for RV='43' was absent from tamperEmitted. But '43' was never added in the first place — only '42' was added during the tamper phase. The assertion was trivially true. Drop the RV bump: re-sign at the SAME RV='42' so that verifyUserApplicationProfile takes the verify-clean branch and Deletes the existing dedup entry. The assertion now actually exercises the clearing path. Production code unchanged. * fix(ci): drop 'permissions: read-all' from reusable workflows Chore PR #39 added 'permissions: read-all' to every workflow file including the workflow_call reusables. That broke the reusable-call contract: a called workflow's top-level permissions are the FLOOR the caller must grant. Callers of these reusables (pr-created.yaml's pr-created job, pr-merged.yaml's pr-merged job, etc.) only grant a narrow set of scopes (id-token:write, packages:write, security-events: write, pull-requests:write, contents). They do NOT grant the full read-all set (actions:read, artifact-metadata:read, attestations:read, checks:read, deployments:read, discussions:read, issues:read, models:read, vulnerability-alerts:read, packages:read, pages:read, repository-projects:read, statuses:read, id-token:read). Result: every PR opened on this fork since chore #39 merged (May 6) has had pr-created.yaml startup_failure with: Error calling workflow ... incluster-comp-pr-created.yaml@<sha>. The workflow is requesting 'actions: read, ...', but is only allowed [...] Fix: strip top-level 'permissions: read-all' from the four reusables: - benchmark.yaml (also direct-trigger; falls back to per-job perms) - go-basic-tests.yaml - incluster-comp-pr-created.yaml - incluster-comp-pr-merged.yaml Each reusable's per-job 'permissions:' blocks remain intact and correctly request only what the job needs. Callers' existing grants are sufficient. Top-level 'permissions: read-all' stays on the OUTERMOST workflows (pr-created.yaml, pr-merged.yaml, build.yaml, bypass.yaml, component-tests.yaml, sign-object.yaml, scorecard.yml) — they're event-triggered, not workflow_call'd, so the read-all default still hardens GITHUB_TOKEN there. --------- Co-authored-by: Entlein <eineintlein@gmail.com>

entlein merged commit ba008cf into main Apr 24, 2026
1 of 3 checks passed

entlein deleted the ci/drop-mirror-sync branch April 24, 2026 12:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: delete mirror-sync and fork-ci/ (companion to storage SOURCE_REF PR)#27

ci: delete mirror-sync and fork-ci/ (companion to storage SOURCE_REF PR)#27
entlein merged 1 commit intomainfrom
ci/drop-mirror-sync

entlein commented Apr 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

entlein commented Apr 24, 2026

Summary

Why

Deletions

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant