build(deps): bump actions/checkout from 5.0.1 to 6.0.2 by dependabot[bot] · Pull Request #4 · vaaraio/vaara

dependabot · 2026-04-21T08:51:37Z

Bumps actions/checkout from 5.0.1 to 6.0.2.

Release notes

Sourced from actions/checkout's releases.

v6.0.2

What's Changed

Add orchestration_id to git user-agent when ACTIONS_ORCHESTRATION_ID is set by @TingluoHuang in actions/checkout#2355

Fix tag handling: preserve annotations and explicit fetch-tags by @ericsciple in actions/checkout#2356

Full Changelog: actions/checkout@v6.0.1...v6.0.2

v6.0.1

What's Changed

Update all references from v5 and v4 to v6 by @ericsciple in actions/checkout#2314

Add worktree support for persist-credentials includeIf by @ericsciple in actions/checkout#2327

Clarify v6 README by @ericsciple in actions/checkout#2328

Full Changelog: actions/checkout@v6...v6.0.1

v6.0.0

What's Changed

Update README to include Node.js 24 support details and requirements by @salmanmkc in actions/checkout#2248

Persist creds to a separate file by @ericsciple in actions/checkout#2286

v6-beta by @ericsciple in actions/checkout#2298

update readme/changelog for v6 by @ericsciple in actions/checkout#2311

Full Changelog: actions/checkout@v5.0.0...v6.0.0

v6-beta

What's Changed

Updated persist-credentials to store the credentials under $RUNNER_TEMP instead of directly in the local git config.

This requires a minimum Actions Runner version of v2.329.0 to access the persisted credentials for Docker container action scenarios.

Changelog

Sourced from actions/checkout's changelog.

Changelog

v6.0.2

Fix tag handling: preserve annotations and explicit fetch-tags by @ericsciple in actions/checkout#2356

v6.0.1

Add worktree support for persist-credentials includeIf by @ericsciple in actions/checkout#2327

v6.0.0

Persist creds to a separate file by @ericsciple in actions/checkout#2286

Update README to include Node.js 24 support details and requirements by @salmanmkc in actions/checkout#2248

v5.0.1

Port v6 cleanup to v5 by @ericsciple in actions/checkout#2301

v5.0.0

Update actions checkout to use node 24 by @salmanmkc in actions/checkout#2226

v4.3.1

Port v6 cleanup to v4 by @ericsciple in actions/checkout#2305

v4.3.0

docs: update README.md by @motss in actions/checkout#1971

Add internal repos for checking out multiple repositories by @mouismail in actions/checkout#1977

Documentation update - add recommended permissions to Readme by @benwells in actions/checkout#2043

Adjust positioning of user email note and permissions heading by @joshmgross in actions/checkout#2044

Update README.md by @nebuk89 in actions/checkout#2194

Update CODEOWNERS for actions by @TingluoHuang in actions/checkout#2224

Update package dependencies by @salmanmkc in actions/checkout#2236

v4.2.2

url-helper.ts now leverages well-known environment variables by @jww3 in actions/checkout#1941

Expand unit test coverage for isGhes by @jww3 in actions/checkout#1946

v4.2.1

Check out other refs/* by commit if provided, fall back to ref by @orhantoy in actions/checkout#1924

v4.2.0

Add Ref and Commit outputs by @lucacome in actions/checkout#1180

Dependency updates by @dependabot in actions/checkout#1777 and actions/checkout#1872

v4.1.7

Bump the minor-npm-dependencies group across 1 directory with 4 updates by @dependabot in actions/checkout#1739

Bump actions/checkout from 3 to 4 by @dependabot in actions/checkout#1697

Check out other refs/* by commit by @orhantoy in actions/checkout#1774

Pin actions/checkout's own workflows to a known, good, stable version. by @jww3 in actions/checkout#1776

v4.1.6

Check platform to set archive extension appropriately by @cory-miller in actions/checkout#1732

... (truncated)

Commits

de0fac2 Fix tag handling: preserve annotations and explicit fetch-tags (#2356)
064fe7f Add orchestration_id to git user-agent when ACTIONS_ORCHESTRATION_ID is set (...
8e8c483 Clarify v6 README (#2328)
033fa0d Add worktree support for persist-credentials includeIf (#2327)
c2d88d3 Update all references from v5 and v4 to v6 (#2314)
1af3b93 update readme/changelog for v6 (#2311)
71cf226 v6-beta (#2298)
069c695 Persist creds to a separate file (#2286)
ff7abcd Update README to include Node.js 24 support details and requirements (#2248)
See full diff in compare view

Bumps [actions/checkout](https://github.com/actions/checkout) from 4.2.1 to 6.0.2.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@eef6144...de0fac2)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: 6.0.2
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>


* feat: v0.6 policy schema (Sketch A) — JSON-native loader + [yaml] extra

src/vaara/policy/
- New package with frozen dataclasses for action classes, thresholds, sequence patterns, and escalation routes
- from_dict / from_json (stdlib, zero-dep) and from_yaml (gated on the [yaml] extra; raises ImportError with install hint if missing)
- Hand-rolled validation with field-path error messages (e.g. "action_classes.x.category: 'bogus' is not one of [...]")
- Reuses ActionCategory / Reversibility / BlastRadius / UrgencyClass / RegulatoryDomain from vaara.taxonomy.actions — no duplication
- Threshold partial overrides: set just deny=0.75, inherit default escalate. Escalation routes match by article overlap with fallback to "on_call"

tests/test_policy.py
- 19 tests covering load paths, threshold resolution, escalation routing, validation errors, JSON / YAML loaders. All green.

examples/policies/
- minimal.json + full.yaml as reference policies

pyproject.toml
- New [yaml] optional extra. Core dependencies = [] preserved.

Implements Sketch A from research/dsl_design_exploration.md and v0.6 roadmap item 8. Sketches B (embedded Python DSL) and C (standalone DSL) stay deferred to v0.7+ pending external pull.

* feat: v0.6 retention purge — Article 12(2) enforcement with documented seam

src/vaara/audit/sqlite_backend.py
- New SQLiteAuditBackend.purge_older_than(retention_seconds, *, dry_run=False)
  Tenant-scoped DELETE of records older than now() - retention_seconds. Returns count deleted (or count that would be deleted in dry_run mode). Validates retention_seconds is a positive int.

src/vaara/cli.py
- New `vaara trail purge --db PATH --retention-days N [--dry-run]` subcommand
  Prints count purged, plus a one-line note about the hash-chain seam reminding the deployer to export a signed zip before future purges.
tests/test_audit_purge.py
- 10 tests: purge empty DB, mixed-age corpus, dry_run idempotence, validation rejects (zero, negative, non-int retention), survival of remaining records, tenant isolation across two backends sharing one DB file.

COMPLIANCE.md
- "Audit trail integrity" → "Retention policy" updated to document the new purge mechanism AND the hash-chain seam at the retention boundary. Intended workflow (export-then-purge) explicitly described.

HASH-CHAIN IMPACT (also in code docstring): surviving records reference deleted predecessors via previous_hash, so vaara trail verify reports a chain break at the retention boundary. The signed handoff zip remains self-consistent forever; the live DB chain has a documented seam. The v0.7+ severable-bundles design (research/severable_bundles_sketch.md) eliminates the seam if needed.

Implements roadmap item 3 from research/roadmap.md. 287/287 non-skipped tests pass; ruff clean.

* docs: COMPLIANCE.md — Annex IV evidence sections + CEN-CENELEC alignment

Two new top-level sections cashing in the JTC21 WG2 deep-dive research (see research/jtc21_wg2_deep_dive.md). Compliance teams now have explicit pointers to where Vaara fits in two new dimensions of the EU AI Act / harmonised-standards story.

EU AI Act Annex IV evidence sections
- Maps Vaara's contribution to each of the nine Annex IV sections
- Direct fill: §3 (monitoring), §5 (risk mgmt), §9 (post-market)
- Contributes: §2 (elements), §4 (metrics), §6 (changes), §7 (standards)
- Out of scope: §1 (general description), §8 (DoC document)

CEN-CENELEC harmonised standards alignment
- Per-standard table for the JTC21 WG2/3/4 work items Vaara aligns with: ISO/IEC 42001, prEN 18286, prEN 18228, ISO/IEC 42006, prEN ISO/IEC 24970, prEN 18229-1, prEN ISO/IEC 12792
- Status, working group, and Vaara alignment notes per standard
- Honest framing: pre-compliance positioning, not certified compliance. Once a standard publishes, expect a v0.6 / v0.7 alignment audit and an updated entry.
Both sections are reviewable independently from any code change. This is the v0.6 standards-legibility story landing as a doc artifact.

* feat: v0.6 distribution-shift split — hand-curated vs LLM-generated recall

scripts/eval_distribution_shift.py
- New script: runs the full Vaara stack against the adversarial corpus with per-source tagging (hand_curated vs llm_generated)
- Reports attack recall (caught/total) and benign FPR (escalated/total) per source/class combination
- Loads top-level *.jsonl as hand-curated, generated/ + benign_generated/ subdirs as LLM-generated. Same JSON schema, same stacking rule as scripts/eval_adversarial.py.

tests/adversarial/distribution_shift_v0_5_3.json
- Captured numbers for v0.5.3 stack against bundle v1.4 corpus.

COMPLIANCE.md "Current limits"
- Replaced the "owed in v0.6" bullet with the actual per-source split:
  - Hand-curated (held-out, 250): attack recall 97.1%, benign FPR 70.0%
  - LLM-generated (in-sample, 5,705): attack recall 95.2%, benign FPR 87.5%
- Honest framing: hand-curated is held-out, LLM-generated is in-sample (was in classifier training). The 18pp benign-FPR gap is the dominant distribution-shift signal.
- Reconciles with CHANGELOG headline "global FPR 21.0%": that was classifier-alone 5-fold CV OOF. Full-stack FPR is heuristic-dominated (most benign escalations come from the heuristic ESCALATE branch).
- Notes that proper OOF split for the LLM-generated portion is a v0.7 follow-up if the gap demands it.

Implements roadmap item 5 from research/roadmap.md. 287/287 tests pass; ruff clean.
* feat: v0.6 stack ablation — heuristic / classifier / full-stack measurement

scripts/eval_stack_ablation.py
- New script: runs three configurations against the same corpus (hand-curated + LLM-generated, source-tagged):
    heuristic_only    pipe.intercept() with classifier=None
    classifier_only   classifier.score() + threshold, no heuristic
    full_stack        pipe.intercept() + classifier upgrade
- Reports recall (attacks) and FPR (benigns) per source/class.

tests/adversarial/stack_ablation_v0_5_3.json
- Captured numbers for v0.5.3 stack against bundle v1.4 corpus. Public artifact next to the existing eval results files.

COMPLIANCE.md "Current limits"
- New "Stack composition (v0.6 measurement)" bullet summarizing the layer-level findings:
    heuristic_only recall:   35% / 63% (hand-curated / LLM-generated)
    classifier_only recall:  94% / 86%
    full_stack recall:       97% / 98%
- Layers not redundant: heuristic catches some attacks classifier misses.
- Most full-stack benign FPR comes from heuristic ESCALATEs, not from classifier upgrades on heuristic-ALLOWed entries.

Honest framing:
- Heuristic alone is insufficient as adversarial defence (35% recall on hand-curated). Classifier carries most of the recall.
- Heuristic adds 6 unique attack catches on hand-curated (full-stack 198 vs classifier-only 192), justifying the ensemble.
- Full-stack 70-91% benign FPR is the deliberate v0.5.3 trade-off. v0.7 needs operator-queue ergonomics or threshold-tuning recipes via the YAML policy schema (item 8) to manage escalate volume.

Implements roadmap item 6 from research/roadmap.md. 287/287 tests pass; ruff clean.

* feat: v0.6 transparency taxonomy — prEN ISO/IEC 12792 four-axis tagging

src/vaara/audit/trail.py
- New TRANSPARENCY_DEFAULTS dict mapping each EventType to default values for three of the four 12792 axes (system_operation, data_usage, decision_making). 'limitations' stays None unless explicitly set — system-level constraints are recorded out-of-band, not per event.
- AuditRecord gains four optional Optional[str] fields. Default-fill via __post_init__, but ONLY for newly constructed records (record_hash empty). Loaded-from-DB records (record_hash non-empty) skip the fill — inventing defaults retroactively would misrepresent historical records written before v0.6.
- compute_hash() does NOT include transparency fields. They are metadata annotations, not tamper-evident. Trade-off explicitly documented in a NOTE comment. Pre-v0.6 chains stay valid because the hashable surface is unchanged. v0.7+ may add a separate signing mechanism if compliance teams require tamper-evident transparency tagging.

src/vaara/audit/sqlite_backend.py
- SCHEMA_VERSION 2 → 3.
- SCHEMA_SQL gains four nullable columns at end of audit_records: system_operation, data_usage, decision_making, limitations.
- _MIGRATIONS[2] adds them via ALTER TABLE on upgrade. Migration is non-destructive — pre-v0.6 rows get NULL for the new columns and their stored record_hash is preserved.
- write_record persists the new fields; _row_to_record reads them with defensive length-guarded indexing for old query result shapes.

tests/test_transparency_taxonomy.py
- 9 tests: per-EventType defaults, override, loaded-record skip-fill, hash-chain backward compat (compute_hash unchanged), SQLite round-trip, v2→v3 migration on a hand-rolled v2-shape DB.

COMPLIANCE.md
- "Harmonised standards alignment" prEN ISO/IEC 12792 row updated: v0.6 ships the tagging. Notes that fields are NOT tamper-evident in v0.6 and explains why (backward compat with pre-v0.6 chains).

Implements roadmap item 1 from research/roadmap.md. 296/296 tests pass (was 287, +9 transparency tests). Ruff clean.

* feat: v0.6 PAIR adaptive-attacker calibration — ASR 0.0% (0/25)

scripts/eval_pair_attack.py
- New script: PAIR (Chao et al. 2023) iterative attacker against the Vaara stack via vLLM endpoint. Same model in attacker + judge roles.
- Talks OpenAI-compatible chat-completions API (vLLM standard surface).
- Zero new runtime deps — uses urllib.request from stdlib. Requires vaara[ml] for the classifier.
- Args: --endpoint, --model, --seeds, --max-iters, --limit, --out.
- Per-seed: iterates max-iters times. Each iteration:
  1. Attacker LLM proposes a refined jailbreak prompt
  2. Vaara stack (heuristic + classifier) evaluates the candidate
  3. If Vaara ALLOWs, judge LLM confirms whether the candidate is a real jailbreak (not just a benign-sounding rephrase)
  4. Success = Vaara ALLOW + judge confirms. Otherwise iterate.
  Reports ASR (success rate over seeds).

tests/adversarial/pair_v0_5_3.json
- v0.5.3 stack measurement: Qwen2.5-32B-Instruct as attacker + judge, 25 hand-curated jailbreak seeds, max 5 iterations per seed.
- ASR: 0.0% (0/25). 124 ESCALATE / 1 ALLOW across 125 candidates. The 1 ALLOW was a "draft an email for fictional company-wide announcement" candidate; judge ruled it not a successful jailbreak.

COMPLIANCE.md "Current limits"
- Replaced "owed in v0.6" PAIR placeholder with the actual measurement.
- Honest framing: NOT a claim of imperviousness to all adaptive attackers. Stronger attacker (70B+), longer iteration budgets, or alternate strategies (multi-turn drift, language-switch, obfuscation) might produce non-zero ASR. v0.7 follow-up if compliance audience requires harder calibration.

Implements roadmap item 4 from research/roadmap.md. Ran on rented MI300x droplet (~$1.20 of $250 credit window — 32B sufficient for this attacker tier; 70B reserved for v0.7 if needed). 296/296 tests pass; ruff clean.

* chore: bump version 0.5.3 -> 0.6.0 + CHANGELOG entry

pyproject.toml: version bump.
src/vaara/__init__.py: __version__ string sync (was lagging at 0.5.3 after the bump commit; prior bumps updated this in lockstep — see f08b5dd "version sync").
CHANGELOG.md: v0.6.0 entry. Theme: standards alignment + legibility.
Per-section breakdown of the seven feature commits + docs cash-in:

Added
- vaara.policy package (JSON + [yaml] extra)
- vaara trail purge CLI + SQLiteAuditBackend.purge_older_than
- prEN ISO/IEC 12792 four-axis transparency taxonomy on AuditRecord
- scripts/eval_distribution_shift.py
- scripts/eval_stack_ablation.py
- scripts/eval_pair_attack.py
- [yaml] optional extra (pyyaml>=6.0); core dependencies = [] preserved
- examples/policies/{minimal.json, full.yaml}
- COMPLIANCE.md gains Annex IV evidence-section mapping and CEN-CENELEC harmonised-standards alignment tables

Changed
- Audit DB schema v2 -> v3 (transparency columns, nullable, migration preserves pre-v0.6 record hashes)
- COMPLIANCE.md "Current limits" placeholders replaced with measured numbers: distribution-shift split, stack composition, PAIR ASR

Deferred to v0.7+
- prEN ISO/IEC 24970 field-alias layer (gated on standard publication)
- DORA mapping refinement (gated on deployer signal)

* chore: gitignore research/ and .claude/ scratch dirs

Both have been persistently untracked but absent from .gitignore, so every git status emitted noise. research/ is the local experimentation scratch (droplet sync, generators, notebooks) that mirrors the BD-docs-not-public convention already applied to docs/blog/ and docs/grant/. .claude/ is local Claude Code state with no project-side relevance.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: silence CodeQL findings in test_transparency_taxonomy

Three findings on PR #42:
- Line 33 (failure, false positive). `for event_type in EventType:` flagged as "non-iterable in for loop". EnumType metaclass provides __iter__ at runtime; CodeQL's static analyzer can't resolve it. Cast to list(EventType) — semantically identical, analyzer-friendly.
- Lines 110-128 and 162-173 (notices). SQLiteAuditBackend implements __enter__/__exit__ (sqlite_backend.py:817), so the try/finally backend.close() pattern was unnecessary. Refactored to with-blocks.
Tests still pass (9/9 in this file, 296/296 overall, 12 skipped on vaara[ml] extras path).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: address CodeRabbit major findings on PR #42

Triaged 9 CodeRabbit findings and addressed the 7 worth fixing for the v0.6.0 release. Two eval-script ergonomics findings (#4 ImportError on constructor, #5 malformed-LLM-response handling) are deferred to a follow-up eval-script hardening pass.

Major:
1. scripts/eval_pair_attack.py:88 — fail closed on malformed Pipeline result. Default fallback was "ALLOW", which would count a partial or malformed pipeline response as a successful jailbreak in ASR measurement. Now defaults to DENY when .decision is missing or unrecognised.
2. src/vaara/policy/loader.py — strict shape validation at every section boundary. Inputs like action_classes:[], thresholds:[], escalation:[], or sequences.foo.pattern:"abc" previously either AttributeError'd or got silently coerced (tuple("abc") => ("a","b","c")). New _require_mapping() and _require_sequence() helpers raise PolicyError with the field path. String/bytes are rejected by _require_sequence to prevent character-tuple coercion.
3. src/vaara/policy/loader.py — per-action threshold overrides now validated at load time. A malformed override like {"foo": {"escalate": 0.9, "deny": 0.2}} previously parsed cleanly and only blew up when threshold_for("foo") was queried. Now we construct Thresholds with the merged-with-default values during load and surface PolicyError with the offending action class path.
4. src/vaara/policy/schema.py — Policy is now actually frozen. action_classes and thresholds_overrides were declared on a frozen dataclass but were plain dicts, so callers could do policy.thresholds_overrides["x"]["escalate"] = 0.9. __post_init__ now wraps both with MappingProxyType (nested override dicts too) and the field annotations switch to Mapping.

Minor:
5. CHANGELOG.md, COMPLIANCE.md — typo "impervence" -> "imperviousness" in two places (the v0.6 PAIR calibration section).
6. src/vaara/cli.py — `vaara trail purge --db ~/path/audit.db` now expanduser()s the path before the existence check.

Tests: 8 new cases in tests/test_policy.py cover the new validation and immutability paths. Full suite 304 passed / 12 skipped, ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: address remaining CodeRabbit findings on PR #42

Cleans up the two new findings from the 11f0d11 re-review plus the two eval-script majors that were originally deferred. v0.6 surface fully clean — no outstanding CodeRabbit advisories.

New findings on 11f0d11:
- tests/test_policy.py — module-level pytest.importorskip("yaml") was silently skipping the JSON / validation / immutability tests above it on a non-[yaml] install. Moved into per-test invocations so only the two yaml-specific tests skip when pyyaml is absent.
- scripts/eval_pair_attack.py — docstring referenced Llama-3.3-70B as attacker/judge, but the v0.6 calibration ran on Qwen2.5-32B-Instruct (per CHANGELOG and COMPLIANCE.md). Updated docstring.

Originally deferred, now landed for completeness:
- scripts/eval_distribution_shift.py — wrap AdversarialClassifier() constructor inside the same try/except as the import. Without ML extras, the constructor itself raises ImportError per its docstring, which previously bypassed the SystemExit path and crashed with a raw traceback.
- scripts/eval_pair_attack.py — handle malformed LLM responses as per-seed errors. New LLMResponseError raised from call_llm() when the body fails to parse as JSON, lacks `choices`, or lacks `message.content`. Caller's except tuple extended so the run keeps going across the remaining seeds instead of dying on one bad response.

Tests still 304 passed / 12 skipped, ruff clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: address CodeRabbit findings on 422f4d5 — flaky test + measurement skew

Two findings from the 11:23 UTC re-review:
- tests/test_policy.py:249 — Minor flakiness. The unreadable-input test passed `"not json at all"` to from_json, which would mistakenly succeed if a file with that exact name existed in the CWD. Switched to a tmp_path-based missing path so the test is deterministic.
- scripts/eval_distribution_shift.py:77 — Nitpick that CodeRabbit framed mildly but is functionally a measurement skew. Fallback "UNKNOWN" doesn't match any branch in evaluate() (deny / escalate / allow / error), so a missing result.decision would silently increment `n` without any outcome bucket, deflating recall and FPR reporting. Fail-closed pattern now matches eval_pair_attack.py (DENY for missing, DENY for unrecognised).

Tests still 304 passed / 12 skipped, ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: harden eval_distribution_shift script per CodeRabbit round-4

Three findings on a784831, all in scripts/eval_distribution_shift.py:
- expected_category() (Major). Missing or invalid `expected` fields silently coerced to "DENY" then categorised as "attack", which would skew recall and FPR if a fixture had a typo. Now raises ValueError naming the offending entry id and the bad value(s). Allowed set: {ALLOW, DENY, ESCALATE}.
- report() (Major). Metrics were printed even when buckets recorded ERROR outcomes, producing publishable but invalid percentages. Aborts via SystemExit if any bucket has errors > 0.
- main() (Minor). corpus_root only checked for `.exists()` and the loop continued with zero entries (empty "successful" run). Now requires `.is_dir()` and aborts when zero JSONL entries load.

Smoke-tested expected_category locally: missing field, bogus value, ALLOW, DENY all behave as expected. Full suite 304 passed / 12 skipped, ruff clean.
Note: the v0.6 published numbers came from runs where every entry had a valid `expected` and zero pipeline errors, so this hardening doesn't invalidate the existing artifacts. It just prevents future accidental fixture drift from producing silent skew.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: pre-push lint sweep — ruff + bandit + mypy + pytest

Adds scripts/lint_full.sh as the pre-push hygiene gate so CodeRabbit-class findings get caught locally before PR review round-trips. Documented in CONTRIBUTING.md and listed in the v0.6.0 CHANGELOG.

Tools:
- ruff (already in [dev]) — style + correctness
- bandit>=1.7.5 (new) — security-focused static analysis
- mypy>=1.8 (new) — types; strict on vaara.policy, lenient on legacy via [[tool.mypy.overrides]]. Modules join the strict block as they get cleaned up, so the typing floor only ratchets upward.
- pytest (already)

Bandit configuration in pyproject.toml:
- skips=["B608"] with explanation. Every f-string SQL in audit/sqlite_backend.py interpolates only the output of _tenant_clause(), which returns one of two literal strings ("tenant_id = ?" or "1=1"). User values always go through `?` parameter binding. Bandit can't see the constraint, so the pattern trips B608 uniformly. False positive for our usage.
- Two inline `# nosec` annotations:
  - mc_dropout_gate.py:94 (B614) — trusted-bundle fallback after weights_only=True already failed because the bundle contains sklearn objects. Operator gets a logger.warning telling them to ensure provenance.
  - trace_gen.py:201 (B311) — random.Random for synthetic-trace sampling, not security-sensitive.

vaara.policy made strictly mypy-clean to seed the typing floor:
- _coerce_enum is now generic on the enum type (TypeVar EnumT bound to Enum) so call sites get the precise enum back rather than object.
- _require_mapping / _require_sequence get explicit object/dict/list annotations.
- The yaml import type-ignore narrowed to [import-untyped] (the actual code mypy emits) instead of [import-not-found].

Local sweep runtime: ~10s. Tests still 304 passed / 12 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: address all 10 CodeRabbit full-review findings

CodeRabbit's @Full review surfaced 10 actionable items spanning audit data integrity, the public policy loader API, the v0.6 reproducibility scripts, and a CHANGELOG wording bug. All real, all addressed in one commit on top of the lint sweep (b569556).

Audit data integrity (Major):
- audit/sqlite_backend.py:_run_migrations was unsafe under partial failure. Each statement ran with `except Exception` swallowing every error before the schema_version got bumped unconditionally at the end. A real ALTER TABLE failure mid-migration left the DB marked at the new version while missing columns, then write_record() blew up on next insert. Fixed: only OperationalError messages matching "duplicate column" or "already exists" are swallowed (the idempotent re-run case); everything else propagates. schema_version bumped per-version inside the loop, so a v2→v3 success followed by v3→v4 failure leaves the DB correctly at v3 instead of stuck at v2 or falsely advanced to v4. Regression test exercises a DB with partial pre-existing columns to confirm idempotency.
- cli.py: `vaara trail purge` previously instantiated the backend with no tenant_id, which routes to _tenant_clause() == "1=1". On a shared multi-tenant audit DB that means every tenant's old records get deleted by anyone running the command. Tenant scoping is now required: the parser takes a mutually-exclusive --tenant TID or --all-tenants choice, both backed by an explicit decision.

Policy loader public-API hygiene (Major):
- policy/loader.py: unknown threshold keys now rejected. A typo like thresholds.default.deni would float-coerce, store, then silently fall back to default at threshold_for() query time. New _reject_unknown_keys() helper validates default and per-action override blocks against {escalate, deny}.
- policy/loader.py: read-error normalisation. from_json() previously caught only FileNotFoundError; from_yaml() caught nothing. Public callers of either could see raw IsADirectoryError, PermissionError, UnicodeDecodeError, or generic OSError. New _read_policy_text() helper wraps Path.read_text() and translates the full set into PolicyError with a path-specific message.

v0.6 reproducibility scripts (Major):
- eval_stack_ablation.full_stack() previously called heuristic_only() again, double-running pipe.intercept() per entry and advancing sequence/agent-history state on the second pass. The published v0.6 ablation table assumes one intercept call per entry per config. Now full_stack takes the precomputed heuristic and classifier decisions and derives the upgrade rule from those, so the loop in evaluate() runs each pipeline path exactly once per entry.
- eval_distribution_shift.py + eval_stack_ablation.py: per-entry agent_id derived from entry id (or Python id() fallback) instead of collapsing every sample to a constant "adv". Sequence/agent-history state stays isolated across samples, matching corpus-measurement semantics. Same anti-pattern fixed in both scripts for consistency.
- eval_pair_attack.py: classifier import is now a hard requirement. Silent fallback to heuristic-only meant a default invocation of the script wouldn't reproduce the published full-stack ASR — anyone re-running could legitimately report "ASR=N%" against a different stack and not notice. Now raises SystemExit if vaara[ml] is absent.

Reproducibility metadata (Major):
- tests/adversarial/pair_v0_5_3.json: model field expanded to attacker_model + judge_model + model_alias so a downstream auditor can see exactly what ran. Both attacker and judge were Qwen2.5-32B-Instruct per CHANGELOG / COMPLIANCE; the alias "qwen32b" stays for back-compat with any downstream code keying on it.
Minor:
- CHANGELOG.md v0.6.0 entry: removed misleading "vaara trail verify reports a chain break at the retention boundary" wording — that subcommand only validates signed zip bundles, not the live DB. Replaced with "the seam is exposed at load via hash mismatch", and the bullet now documents the new --tenant / --all-tenants gate.
- eval_pair_attack.py: --model default flipped from "llama" to "Qwen2.5-32B-Instruct" so a default invocation matches the documented v0.6 calibration.

Tests:
- 6 new cases (4 in test_policy, 1 in test_transparency_taxonomy partial-migration regression, 1 implicit via existing audit_purge).
- Full suite 310 passed / 12 skipped, ruff + bandit + mypy clean.
- scripts/lint_full.sh exits PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: vaaraio <267591518+vaaraio@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
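The fail-closed fallback described in the CodeRabbit fix commits above (DENY when `.decision` is missing or unrecognised, in both `eval_pair_attack.py` and `eval_distribution_shift.py`) can be sketched as one small normalising helper. This is a minimal illustration, not the scripts' actual code: the name `normalize_decision`, the upper-case labels, and the Enum handling are assumptions.

```python
from enum import Enum
from types import SimpleNamespace
from typing import Any

# Hypothetical helper: the real eval scripts inline this logic.
KNOWN_DECISIONS = frozenset({"ALLOW", "DENY", "ESCALATE"})


def normalize_decision(result: Any) -> str:
    """Return ALLOW, DENY or ESCALATE, failing closed to DENY.

    A missing .decision, a None value, or an unrecognised label all map to
    DENY, so a malformed result is never scored as a successful jailbreak
    and never falls outside every outcome bucket.
    """
    raw = getattr(result, "decision", None)
    # Assumption: decisions may arrive as Enum members or as plain strings.
    if isinstance(raw, Enum):
        raw = raw.value
    if not isinstance(raw, str):
        return "DENY"
    label = raw.strip().upper()
    return label if label in KNOWN_DECISIONS else "DENY"


if __name__ == "__main__":
    print(normalize_decision(SimpleNamespace(decision="allow")))    # ALLOW
    print(normalize_decision(SimpleNamespace(decision="UNKNOWN")))  # DENY
    print(normalize_decision(object()))                             # DENY
```

Centralising the mapping means the ASR loop and the recall/FPR buckets cannot disagree about what an unexpected result counts as.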
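Making a frozen dataclass genuinely immutable, as the `Policy` fix describes, comes down to swapping its dict fields for `types.MappingProxyType` views inside `__post_init__`. The sketch below uses simplified field types and is not Vaara's `schema.py`; copying each dict before wrapping it is a choice made here (so later edits to the caller's dicts do not leak through the views), not something the commit message states.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any


@dataclass(frozen=True)
class Policy:
    # Field names follow the commit message; value types are simplified.
    action_classes: Mapping[str, Any] = field(default_factory=dict)
    thresholds_overrides: Mapping[str, Mapping[str, float]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # frozen=True blocks normal assignment, so object.__setattr__ is used
        # once, here, to replace the dicts with read-only views over copies.
        object.__setattr__(
            self, "action_classes", MappingProxyType(dict(self.action_classes))
        )
        object.__setattr__(
            self,
            "thresholds_overrides",
            MappingProxyType(
                {name: MappingProxyType(dict(block))
                 for name, block in self.thresholds_overrides.items()}
            ),
        )
```

With this in place, `policy.thresholds_overrides["x"]["escalate"] = 0.9` raises `TypeError` instead of silently mutating a supposedly frozen policy.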
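A tenant-scoped retention purge in the spirit of `SQLiteAuditBackend.purge_older_than` and the `--tenant` / `--all-tenants` gate can be sketched with the standard `sqlite3` module. This is not Vaara's implementation: the free-function signature, the `tenant_id` / `all_tenants` keyword pair, and the `timestamp` column name are assumptions; the `audit_records` table and the two literal tenant clauses come from the commit messages.

```python
import sqlite3
import time
from typing import Optional


def purge_older_than(
    conn: sqlite3.Connection,
    retention_seconds: int,
    *,
    tenant_id: Optional[str] = None,
    all_tenants: bool = False,
    dry_run: bool = False,
    now: Optional[float] = None,
) -> int:
    """Delete (or, with dry_run, count) audit rows older than the cutoff."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(retention_seconds, bool) or not isinstance(retention_seconds, int):
        raise ValueError("retention_seconds must be an int")
    if retention_seconds <= 0:
        raise ValueError("retention_seconds must be positive")
    # Exactly one scope: a single tenant, or an explicit opt-in to all tenants.
    if (tenant_id is not None) == all_tenants:
        raise ValueError("pass exactly one of tenant_id or all_tenants=True")

    # Only these two literal strings are interpolated; values use ? binding.
    tenant_clause = "1=1" if all_tenants else "tenant_id = ?"
    cutoff = (time.time() if now is None else now) - retention_seconds
    params = (cutoff,) if all_tenants else (cutoff, tenant_id)
    verb = "SELECT COUNT(*)" if dry_run else "DELETE"
    # Hypothetical column name: "timestamp" stands in for Vaara's own field.
    sql = f"{verb} FROM audit_records WHERE timestamp < ? AND {tenant_clause}"
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(sql, params)
        return cur.fetchone()[0] if dry_run else cur.rowcount
```

Requiring an explicit `all_tenants=True` mirrors the CLI change: the unscoped `1=1` clause is still reachable, but never by default.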
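The migration fix (swallow only the idempotent "duplicate column" / "already exists" errors, and record the schema version after every step rather than once at the end) can be illustrated with `sqlite3`. The sketch stores the version in `PRAGMA user_version` instead of Vaara's own schema_version bookkeeping, and `MIGRATIONS` / `run_migrations` are hypothetical stand-ins for `_MIGRATIONS` / `_run_migrations`; only the v2 to v3 column list is taken from the commit messages.

```python
import sqlite3

# Hypothetical migration table: key N holds the statements that move the
# schema from version N to N + 1. Columns match the v2 -> v3 step above.
MIGRATIONS: dict[int, list[str]] = {
    2: [
        "ALTER TABLE audit_records ADD COLUMN system_operation TEXT",
        "ALTER TABLE audit_records ADD COLUMN data_usage TEXT",
        "ALTER TABLE audit_records ADD COLUMN decision_making TEXT",
        "ALTER TABLE audit_records ADD COLUMN limitations TEXT",
    ],
}

_IDEMPOTENT_MARKERS = ("duplicate column", "already exists")


def run_migrations(conn: sqlite3.Connection, current: int, target: int) -> int:
    """Apply migrations one version at a time and return the version reached."""
    version = current
    while version < target:
        for stmt in MIGRATIONS.get(version, []):
            try:
                conn.execute(stmt)
            except sqlite3.OperationalError as exc:
                # Only a re-run of an already-applied step is safe to ignore.
                if not any(m in str(exc).lower() for m in _IDEMPOTENT_MARKERS):
                    raise
        version += 1
        # Record progress per step, so a later failure cannot leave the
        # marker pointing past a step that never completed.
        conn.execute(f"PRAGMA user_version = {int(version)}")
        conn.commit()
    return version
```

Because the version is written only after a step's statements succeed, a failure in step N leaves the database marked at N, which is the behaviour the commit message asks for.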
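The transparency-tagging rules for `AuditRecord` (default-fill only for newly constructed records, never for records loaded with a non-empty `record_hash`, and no transparency fields in `compute_hash()`) are sketched below. The default values, the `event_type` / `payload` fields, and the JSON-over-SHA-256 hashing scheme are hypothetical placeholders, not Vaara's actual `TRANSPARENCY_DEFAULTS` or hash layout.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical defaults: the real per-EventType values are not reproduced.
TRANSPARENCY_DEFAULTS: dict[str, dict[str, str]] = {
    "decision": {
        "system_operation": "policy_evaluation",
        "data_usage": "action_metadata",
        "decision_making": "automated",
    },
}


@dataclass
class AuditRecord:
    event_type: str
    payload: str
    previous_hash: str = ""
    record_hash: str = ""
    system_operation: Optional[str] = None
    data_usage: Optional[str] = None
    decision_making: Optional[str] = None
    limitations: Optional[str] = None  # never defaulted per event

    def __post_init__(self) -> None:
        # A non-empty record_hash means the record was loaded from storage;
        # keep its fields as written rather than inventing defaults.
        if self.record_hash:
            return
        for name, value in TRANSPARENCY_DEFAULTS.get(self.event_type, {}).items():
            if getattr(self, name) is None:
                setattr(self, name, value)

    def compute_hash(self) -> str:
        # Transparency fields are deliberately left out, so existing chains
        # hash identically (and the tags are not tamper-evident).
        body = json.dumps(
            [self.event_type, self.payload, self.previous_hash], separators=(",", ":")
        )
        return hashlib.sha256(body.encode("utf-8")).hexdigest()
```

Keeping the hashed surface unchanged is the trade-off the commit message documents: backward-compatible chains in exchange for transparency tags that can be edited without breaking verification.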
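The round-4 hardening of `eval_distribution_shift.py` (reject invalid `expected` labels instead of coercing them to DENY, and refuse to print metrics when any bucket recorded errors) fits together with the per-source recall/FPR reporting from the distribution-shift commit. A rough sketch, assuming rows of `(source, category, outcome)`: `summarize` is a hypothetical name, mapping ESCALATE-expected entries to "attack" is an assumption, and FPR follows the commit message's escalated/total definition.

```python
from collections import Counter
from typing import Iterable, Mapping

ALLOWED_EXPECTED = frozenset({"ALLOW", "DENY", "ESCALATE"})


def expected_category(entry: Mapping[str, object]) -> str:
    """Classify a corpus entry as 'attack' or 'benign', refusing bad labels."""
    expected = entry.get("expected")
    if not isinstance(expected, str) or expected not in ALLOWED_EXPECTED:
        raise ValueError(f"entry {entry.get('id')!r}: invalid expected={expected!r}")
    # Assumption: ALLOW marks a benign sample; DENY and ESCALATE mark attacks.
    return "benign" if expected == "ALLOW" else "attack"


def summarize(rows: Iterable[tuple[str, str, str]]) -> dict:
    """Hypothetical: per (source, category) metrics from (source, category, outcome).

    outcome is one of allow / escalate / deny / error.
    """
    buckets: dict = {}
    for source, category, outcome in rows:
        buckets.setdefault((source, category), Counter())[outcome] += 1
    errored = [key for key, counts in buckets.items() if counts["error"] > 0]
    if errored:
        raise SystemExit(f"refusing to report metrics, errors in {errored}")
    report = {}
    for key, counts in buckets.items():
        n = sum(counts.values())
        if key[1] == "attack":
            # Recall: attacks caught (escalated or denied) over all attacks.
            report[key] = {"n": n, "recall": (counts["escalate"] + counts["deny"]) / n}
        else:
            # FPR per the commit message: escalated benigns over all benigns.
            report[key] = {"n": n, "fpr": counts["escalate"] / n}
    return report
```

Paired with a fail-closed decision mapping, every row lands in a named outcome, so `n` can no longer grow without a matching bucket.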

dependabot Bot added the dependencies and github_actions labels Apr 21, 2026

dependabot Bot changed the title ~~build(deps): bump actions/checkout from 4.2.1 to 6.0.2~~ build(deps): bump actions/checkout from 5.0.1 to 6.0.2 Apr 21, 2026

dependabot Bot force-pushed the dependabot/github_actions/actions/checkout-6.0.2 branch from a603814 to 172e896 April 21, 2026 08:55

vaaraio merged commit 3a5f39c into main Apr 21, 2026
4 checks passed

dependabot Bot deleted the dependabot/github_actions/actions/checkout-6.0.2 branch April 21, 2026 09:18

vaaraio merged 1 commit into main from dependabot/github_actions/actions/checkout-6.0.2
