build(deps): bump actions/upload-artifact from 5.0.0 to 7.0.1 by dependabot[bot] · Pull Request #5 · vaaraio/vaara

dependabot · 2026-04-21T08:51:40Z

⚠️ Dependabot is rebasing this PR ⚠️

Rebasing might not happen immediately, so don't worry if this takes some time.

Note: if you make any changes to this PR yourself, they will take precedence over the rebase.

Bumps actions/upload-artifact from 5.0.0 to 7.0.1.

Release notes

Sourced from actions/upload-artifact's releases.

v7.0.1

What's Changed

Update the readme with direct upload details by @danwkennedy in actions/upload-artifact#795

Readme: bump all the example versions to v7 by @danwkennedy in actions/upload-artifact#796

Include changes in typespec/ts-http-runtime 0.3.5 by @yacaovsnc in actions/upload-artifact#797

Full Changelog: actions/upload-artifact@v7...v7.0.1

v7.0.0

v7 What's new

Direct Uploads

Adds support for uploading single files directly (unzipped). Callers can set the new archive parameter to false to skip zipping the file during upload. Right now, we only support single files. The action will fail if the glob passed resolves to multiple files. The name parameter is also ignored with this setting. Instead, the name of the artifact will be the name of the uploaded file.
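As a rough illustration of the direct-upload rule above, the following Python sketch models how the artifact name is chosen. It is not the action's implementation: `plan_upload`, its arguments, and the use of `fnmatch` for globbing are hypothetical, and treating zero matches as an error is an assumption (the notes only mention multiple matches).

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def plan_upload(pattern, workspace_files, name="artifact", archive=True):
    """Model the v7 naming rule described above (hypothetical helper)."""
    matches = [f for f in workspace_files if fnmatch(f, pattern)]
    if archive:
        # Zipped upload: every match goes into one archive under `name`.
        return {"artifact_name": name, "files": matches, "zipped": True}
    if len(matches) != 1:
        # The notes say multiple matches fail; rejecting zero matches too
        # is an assumption of this sketch.
        raise ValueError(f"archive=False needs exactly one file, got {len(matches)}")
    # `name` is ignored: the artifact takes the uploaded file's own name.
    return {
        "artifact_name": PurePosixPath(matches[0]).name,
        "files": matches,
        "zipped": False,
    }


files = ["dist/app.whl", "dist/app.tar.gz", "README.md"]
print(plan_upload("dist/*.whl", files, name="ignored", archive=False))
```

With `archive=False` the `name` argument has no effect, which is the behaviour a workflow author is most likely to trip over when upgrading.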

ESM

To support new versions of the @actions/* packages, we've upgraded the package to ESM.

What's Changed

Add proxy integration test by @Link- in actions/upload-artifact#754

Upgrade the module to ESM and bump dependencies by @danwkennedy in actions/upload-artifact#762

Support direct file uploads by @danwkennedy in actions/upload-artifact#764

New Contributors

@Link- made their first contribution in actions/upload-artifact#754

Full Changelog: actions/upload-artifact@v6...v7.0.0

v6.0.0

v6 - What's new

[!IMPORTANT] actions/upload-artifact@v6 now runs on Node.js 24 (runs.using: node24) and requires a minimum Actions Runner version of 2.327.1. If you are using self-hosted runners, ensure they are updated before upgrading.

Node.js 24

This release updates the runtime to Node.js 24. v5 had preliminary support for Node.js 24 but still ran on Node.js 20 by default. This action now runs on Node.js 24 by default.

What's Changed

Upload Artifact Node 24 support by @salmanmkc in actions/upload-artifact#719

fix: update @actions/artifact for Node.js 24 punycode deprecation by @salmanmkc in actions/upload-artifact#744

prepare release v6.0.0 for Node.js 24 support by @salmanmkc in actions/upload-artifact#745

Full Changelog: actions/upload-artifact@v5.0.0...v6.0.0

Commits

043fb46 Merge pull request #797 from actions/yacaovsnc/update-dependency
634250c Include changes in typespec/ts-http-runtime 0.3.5
e454baa Readme: bump all the example versions to v7 (#796)
74fad66 Update the readme with direct upload details (#795)
bbbca2d Support direct file uploads (#764)
589182c Upgrade the module to ESM and bump dependencies (#762)
47309c9 Merge pull request #754 from actions/Link-/add-proxy-integration-tests
02a8460 Add proxy integration test
b7c566a Merge pull request #745 from actions/upload-artifact-v6-release
e516bc8 docs: correct description of Node.js 24 support in README
Additional commits viewable in compare view

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4.6.2 to 7.0.1.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@ea165f8...043fb46)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: 7.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Triaged 9 CodeRabbit findings and addressed the 7 worth fixing for the v0.6.0 release. Two eval-script ergonomics findings (#4 ImportError on constructor, #5 malformed-LLM-response handling) are deferred to a follow-up eval-script hardening pass.

Major:

1. scripts/eval_pair_attack.py:88: fail closed on malformed Pipeline result. Default fallback was "ALLOW", which would count a partial or malformed pipeline response as a successful jailbreak in ASR measurement. Now defaults to DENY when .decision is missing or unrecognised.
2. src/vaara/policy/loader.py: strict shape validation at every section boundary. Inputs like action_classes:[], thresholds:[], escalation:[], or sequences.foo.pattern:"abc" previously either AttributeError'd or got silently coerced (tuple("abc") => ("a","b","c")). New _require_mapping() and _require_sequence() helpers raise PolicyError with the field path. String/bytes are rejected by _require_sequence to prevent character-tuple coercion.
3. src/vaara/policy/loader.py: per-action threshold overrides now validated at load time. A malformed override like {"foo": {"escalate": 0.9, "deny": 0.2}} previously parsed cleanly and only blew up when threshold_for("foo") was queried. Now we construct Thresholds with the merged-with-default values during load and surface PolicyError with the offending action class path.
4. src/vaara/policy/schema.py: Policy is now actually frozen. action_classes and thresholds_overrides were declared on a frozen dataclass but were plain dicts, so callers could do policy.thresholds_overrides["x"]["escalate"] = 0.9. __post_init__ now wraps both with MappingProxyType (nested override dicts too) and the field annotations switch to Mapping.

Minor:

5. CHANGELOG.md, COMPLIANCE.md: typo "impervence" -> "imperviousness" in two places (the v0.6 PAIR calibration section).
6. src/vaara/cli.py: `vaara trail purge --db ~/path/audit.db` now expanduser()s the path before the existence check.

Tests: 8 new cases in tests/test_policy.py cover the new validation and immutability paths. Full suite 304 passed / 12 skipped, ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
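Items 2 and 4 above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in src/vaara/policy: the helper bodies, their signatures, and the single-field `Policy` class are hypothetical; only the names `PolicyError`, `MappingProxyType`, and `thresholds_overrides` come from the commit message.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Mapping


class PolicyError(ValueError):
    """Validation error that carries the offending field path."""


def require_mapping(value: Any, path: str) -> dict:
    if not isinstance(value, dict):
        raise PolicyError(f"{path}: expected a mapping, got {type(value).__name__}")
    return value


def require_sequence(value: Any, path: str) -> tuple:
    # str and bytes are iterable, so tuple("abc") would silently
    # become ("a", "b", "c"); reject them before converting.
    if isinstance(value, (str, bytes)) or not isinstance(value, (list, tuple)):
        raise PolicyError(f"{path}: expected a list, got {type(value).__name__}")
    return tuple(value)


@dataclass(frozen=True)
class Policy:
    thresholds_overrides: Mapping[str, Mapping[str, float]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # frozen=True only blocks attribute rebinding; wrap the dicts
        # (outer and nested) in read-only views to block item assignment.
        frozen = MappingProxyType(
            {k: MappingProxyType(dict(v)) for k, v in self.thresholds_overrides.items()}
        )
        object.__setattr__(self, "thresholds_overrides", frozen)


policy = Policy(require_mapping({"foo": {"escalate": 0.4, "deny": 0.9}}, "thresholds"))
try:
    policy.thresholds_overrides["foo"]["escalate"] = 0.9
except TypeError as exc:
    print("mutation blocked:", exc)
```

Copying the nested dicts before wrapping them also stops a caller from mutating the policy through a reference to the original input.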


* feat: v0.6 policy schema (Sketch A) - JSON-native loader + [yaml] extra

src/vaara/policy/
- New package with frozen dataclasses for action classes, thresholds, sequence patterns, and escalation routes
- from_dict / from_json (stdlib, zero-dep) and from_yaml (gated on the [yaml] extra; raises ImportError with install hint if missing)
- Hand-rolled validation with field-path error messages (e.g. "action_classes.x.category: 'bogus' is not one of [...]")
- Reuses ActionCategory / Reversibility / BlastRadius / UrgencyClass / RegulatoryDomain from vaara.taxonomy.actions, no duplication
- Threshold partial overrides: set just deny=0.75, inherit default escalate. Escalation routes match by article overlap with fallback to "on_call"

tests/test_policy.py
- 19 tests covering load paths, threshold resolution, escalation routing, validation errors, JSON / YAML loaders. All green.

examples/policies/
- minimal.json + full.yaml as reference policies

pyproject.toml
- New [yaml] optional extra. Core dependencies = [] preserved.

Implements Sketch A from research/dsl_design_exploration.md and v0.6 roadmap item 8. Sketches B (embedded Python DSL) and C (standalone DSL) stay deferred to v0.7+ pending external pull.

* feat: v0.6 retention purge - Article 12(2) enforcement with documented seam

src/vaara/audit/sqlite_backend.py
- New SQLiteAuditBackend.purge_older_than(retention_seconds, *, dry_run=False). Tenant-scoped DELETE of records older than now() - retention_seconds. Returns count deleted (or count that would be deleted in dry_run mode). Validates retention_seconds is a positive int.

src/vaara/cli.py
- New `vaara trail purge --db PATH --retention-days N [--dry-run]` subcommand. Prints count purged, plus a one-line note about the hash-chain seam reminding the deployer to export a signed zip before future purges.

tests/test_audit_purge.py
- 10 tests: purge empty DB, mixed-age corpus, dry_run idempotence, validation rejects (zero, negative, non-int retention), survival of remaining records, tenant isolation across two backends sharing one DB file.

COMPLIANCE.md
- "Audit trail integrity" → "Retention policy" updated to document the new purge mechanism AND the hash-chain seam at the retention boundary. Intended workflow (export-then-purge) explicitly described.

HASH-CHAIN IMPACT (also in code docstring): surviving records reference deleted predecessors via previous_hash, so vaara trail verify reports a chain break at the retention boundary. The signed handoff zip remains self-consistent forever; the live DB chain has a documented seam. The v0.7+ severable-bundles design (research/severable_bundles_sketch.md) eliminates the seam if needed.

Implements roadmap item 3 from research/roadmap.md. 287/287 non-skipped tests pass; ruff clean.

* docs: COMPLIANCE.md - Annex IV evidence sections + CEN-CENELEC alignment

Two new top-level sections cashing in the JTC21 WG2 deep-dive research (see research/jtc21_wg2_deep_dive.md). Compliance teams now have explicit pointers to where Vaara fits in two new dimensions of the EU AI Act / harmonised-standards story.

EU AI Act Annex IV evidence sections
- Maps Vaara's contribution to each of the nine Annex IV sections
- Direct fill: §3 (monitoring), §5 (risk mgmt), §9 (post-market)
- Contributes: §2 (elements), §4 (metrics), §6 (changes), §7 (standards)
- Out of scope: §1 (general description), §8 (DoC document)

CEN-CENELEC harmonised standards alignment
- Per-standard table for the JTC21 WG2/3/4 work items Vaara aligns with: ISO/IEC 42001, prEN 18286, prEN 18228, ISO/IEC 42006, prEN ISO/IEC 24970, prEN 18229-1, prEN ISO/IEC 12792
- Status, working group, and Vaara alignment notes per standard
- Honest framing: pre-compliance positioning, not certified compliance. Once a standard publishes, expect a v0.6 / v0.7 alignment audit and an updated entry.

Both sections are reviewable independently from any code change. This is the v0.6 standards-legibility story landing as a doc artifact.

* feat: v0.6 distribution-shift split - hand-curated vs LLM-generated recall

scripts/eval_distribution_shift.py
- New script: runs the full Vaara stack against the adversarial corpus with per-source tagging (hand_curated vs llm_generated)
- Reports attack recall (caught/total) and benign FPR (escalated/total) per source/class combination
- Loads top-level *.jsonl as hand-curated, generated/ + benign_generated/ subdirs as LLM-generated. Same JSON schema, same stacking rule as scripts/eval_adversarial.py.

tests/adversarial/distribution_shift_v0_5_3.json
- Captured numbers for v0.5.3 stack against bundle v1.4 corpus.

COMPLIANCE.md "Current limits"
- Replaced the "owed in v0.6" bullet with the actual per-source split:
  - Hand-curated (held-out, 250): attack recall 97.1%, benign FPR 70.0%
  - LLM-generated (in-sample, 5,705): attack recall 95.2%, benign FPR 87.5%
- Honest framing: hand-curated is held-out, LLM-generated is in-sample (was in classifier training). The 18pp benign-FPR gap is the dominant distribution-shift signal.
- Reconciles with CHANGELOG headline "global FPR 21.0%": that was classifier-alone 5-fold CV OOF. Full-stack FPR is heuristic-dominated (most benign escalations come from the heuristic ESCALATE branch).
- Notes that proper OOF split for the LLM-generated portion is a v0.7 follow-up if the gap demands it.

Implements roadmap item 5 from research/roadmap.md. 287/287 tests pass; ruff clean.
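The per-source metrics named in the distribution-shift commit above (attack recall as caught/total, benign FPR as escalated/total) reduce to a grouped tally. The Python sketch below shows that arithmetic on toy rows, not on the published corpus. The function name, the row shape, and the choice to count both DENY and ESCALATE as "flagged" are assumptions, since the commit message does not define them.

```python
from collections import defaultdict

# Assumption: a DENY or ESCALATE decision counts as "caught" for attacks
# and as "escalated" for benign entries.
FLAGGED = {"DENY", "ESCALATE"}


def rates_by_bucket(rows):
    """rows: iterable of (source, kind, decision), kind in {'attack', 'benign'}.

    Returns {(source, kind): flagged / total}. For kind == 'attack' the
    ratio is recall; for kind == 'benign' it is the false-positive rate.
    """
    counts = defaultdict(lambda: {"flagged": 0, "total": 0})
    for source, kind, decision in rows:
        bucket = counts[(source, kind)]
        bucket["total"] += 1
        if decision in FLAGGED:
            bucket["flagged"] += 1
    return {key: b["flagged"] / b["total"] for key, b in counts.items()}


# Toy data for illustration only.
toy_rows = [
    ("hand_curated", "attack", "DENY"),
    ("hand_curated", "attack", "ALLOW"),
    ("hand_curated", "benign", "ESCALATE"),
    ("hand_curated", "benign", "ALLOW"),
    ("llm_generated", "attack", "ESCALATE"),
]
for (source, kind), rate in sorted(rates_by_bucket(toy_rows).items()):
    label = "recall" if kind == "attack" else "FPR"
    print(f"{source:14s} {kind:7s} {label}: {rate:.1%}")
```

Keeping the source tag in the bucket key is what lets the held-out and in-sample numbers be reported side by side.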
* feat: v0.6 stack ablation - heuristic / classifier / full-stack measurement

scripts/eval_stack_ablation.py
- New script: runs three configurations against the same corpus (hand-curated + LLM-generated, source-tagged):
  - heuristic_only: pipe.intercept() with classifier=None
  - classifier_only: classifier.score() + threshold, no heuristic
  - full_stack: pipe.intercept() + classifier upgrade
- Reports recall (attacks) and FPR (benigns) per source/class.

tests/adversarial/stack_ablation_v0_5_3.json
- Captured numbers for v0.5.3 stack against bundle v1.4 corpus. Public artifact next to the existing eval results files.

COMPLIANCE.md "Current limits"
- New "Stack composition (v0.6 measurement)" bullet summarizing the layer-level findings:
  - heuristic_only recall: 35% / 63% (hand-curated / LLM-generated)
  - classifier_only recall: 94% / 86%
  - full_stack recall: 97% / 98%
- Layers not redundant: heuristic catches some attacks classifier misses.
- Most full-stack benign FPR comes from heuristic ESCALATEs, not from classifier upgrades on heuristic-ALLOWed entries.

Honest framing:
- Heuristic alone is insufficient as adversarial defence (35% recall on hand-curated). Classifier carries most of the recall.
- Heuristic adds 6 unique attack catches on hand-curated (full-stack 198 vs classifier-only 192), justifying the ensemble.
- Full-stack 70-91% benign FPR is the deliberate v0.5.3 trade-off. v0.7 needs operator-queue ergonomics or threshold-tuning recipes via the YAML policy schema (item 8) to manage escalate volume.

Implements roadmap item 6 from research/roadmap.md. 287/287 tests pass; ruff clean.

* feat: v0.6 transparency taxonomy - prEN ISO/IEC 12792 four-axis tagging

src/vaara/audit/trail.py
- New TRANSPARENCY_DEFAULTS dict mapping each EventType to default values for the four 12792 axes (system_operation, data_usage, decision_making). 'limitations' stays None unless explicitly set: system-level constraints are recorded out-of-band, not per event.
- AuditRecord gains four Optional[str] fields. Default-fill via __post_init__, but ONLY for newly constructed records (record_hash empty). Loaded-from-DB records (record_hash non-empty) skip the fill: inventing defaults retroactively would misrepresent historical records written before v0.6.
- compute_hash() does NOT include transparency fields. They are metadata annotations, not tamper-evident. Trade-off explicitly documented in a NOTE comment. Pre-v0.6 chains stay valid because the hashable surface is unchanged. v0.7+ may add a separate signing mechanism if compliance teams require tamper-evident transparency tagging.

src/vaara/audit/sqlite_backend.py
- SCHEMA_VERSION 2 → 3.
- SCHEMA_SQL gains four nullable columns at end of audit_records: system_operation, data_usage, decision_making, limitations.
- _MIGRATIONS[2] adds them via ALTER TABLE on upgrade. Migration is non-destructive: pre-v0.6 rows get NULL for the new columns and their stored record_hash is preserved.
- write_record persists the new fields; _row_to_record reads them with defensive length-guarded indexing for old query result shapes.

tests/test_transparency_taxonomy.py
- 9 tests: per-EventType defaults, override, loaded-record skip-fill, hash-chain backward compat (compute_hash unchanged), SQLite round-trip, v2→v3 migration on a hand-rolled v2-shape DB.

COMPLIANCE.md
- "Harmonised standards alignment" prEN ISO/IEC 12792 row updated: v0.6 ships the tagging. Notes that fields are NOT tamper-evident in v0.6 and explains why (backward compat with pre-v0.6 chains).

Implements roadmap item 1 from research/roadmap.md. 296/296 tests pass (was 287, +9 transparency tests). Ruff clean.

* feat: v0.6 PAIR adaptive-attacker calibration - ASR 0.0% (0/25)

scripts/eval_pair_attack.py
- New script: PAIR (Chao et al. 2023) iterative attacker against the Vaara stack via vLLM endpoint. Same model in attacker + judge roles.
- Talks OpenAI-compatible chat-completions API (vLLM standard surface).
- Zero new runtime deps: uses urllib.request from stdlib. Requires vaara[ml] for the classifier.
- Args: --endpoint, --model, --seeds, --max-iters, --limit, --out.
- Per-seed: iterates max-iters times. Each iteration:
  1. Attacker LLM proposes a refined jailbreak prompt
  2. Vaara stack (heuristic + classifier) evaluates the candidate
  3. If Vaara ALLOWs, judge LLM confirms whether the candidate is a real jailbreak (not just a benign-sounding rephrase)
  4. Success = Vaara ALLOW + judge confirms. Otherwise iterate.
- Reports ASR (success rate over seeds).

tests/adversarial/pair_v0_5_3.json
- v0.5.3 stack measurement: Qwen2.5-32B-Instruct as attacker + judge, 25 hand-curated jailbreak seeds, max 5 iterations per seed.
- ASR: 0.0% (0/25). 124 ESCALATE / 1 ALLOW across 125 candidates. The 1 ALLOW was a "draft an email for fictional company-wide announcement" candidate; judge ruled it not a successful jailbreak.

COMPLIANCE.md "Current limits"
- Replaced "owed in v0.6" PAIR placeholder with the actual measurement.
- Honest framing: NOT a claim of imperviousness to all adaptive attackers. Stronger attacker (70B+), longer iteration budgets, or alternate strategies (multi-turn drift, language-switch, obfuscation) might produce non-zero ASR. v0.7 follow-up if compliance audience requires harder calibration.

Implements roadmap item 4 from research/roadmap.md. Ran on rented MI300x droplet (~$1.20 of $250 credit window. 32B sufficient for this attacker tier; 70B reserved for v0.7 if needed). 296/296 tests pass; ruff clean.

* chore: bump version 0.5.3 -> 0.6.0 + CHANGELOG entry

pyproject.toml: version bump.

src/vaara/__init__.py: __version__ string sync (was lagging at 0.5.3 after the bump commit; prior bumps updated this in lockstep; see f08b5dd "version sync").

CHANGELOG.md: v0.6.0 entry. Theme: standards alignment + legibility. Per-section breakdown of the seven feature commits + docs cash-in:

Added
- vaara.policy package (JSON + [yaml] extra)
- vaara trail purge CLI + SQLiteAuditBackend.purge_older_than
- prEN ISO/IEC 12792 four-axis transparency taxonomy on AuditRecord
- scripts/eval_distribution_shift.py
- scripts/eval_stack_ablation.py
- scripts/eval_pair_attack.py
- [yaml] optional extra (pyyaml>=6.0); core dependencies = [] preserved
- examples/policies/{minimal.json, full.yaml}
- COMPLIANCE.md gains Annex IV evidence-section mapping and CEN-CENELEC harmonised-standards alignment tables

Changed
- Audit DB schema v2 -> v3 (transparency columns, nullable, migration preserves pre-v0.6 record hashes)
- COMPLIANCE.md "Current limits" placeholders replaced with measured numbers: distribution-shift split, stack composition, PAIR ASR

Deferred to v0.7+
- prEN ISO/IEC 24970 field-alias layer (gated on standard publication)
- DORA mapping refinement (gated on deployer signal)

* chore: gitignore research/ and .claude/ scratch dirs

Both have been persistently untracked but absent from .gitignore, so every git status emitted noise. research/ is the local experimentation scratch (droplet sync, generators, notebooks) that mirrors the BD-docs-not-public convention already applied to docs/blog/ and docs/grant/. .claude/ is local Claude Code state with no project-side relevance.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test: silence CodeQL findings in test_transparency_taxonomy

Three findings on PR #42:
- Line 33 (failure, false positive). `for event_type in EventType:` flagged as "non-iterable in for loop". EnumType metaclass provides __iter__ at runtime; CodeQL's static analyzer can't resolve it. Cast to list(EventType): semantically identical, analyzer-friendly.
- Lines 110-128 and 162-173 (notices). SQLiteAuditBackend implements __enter__/__exit__ (sqlite_backend.py:817), so the try/finally backend.close() pattern was unnecessary. Refactored to with-blocks.

Tests still pass (9/9 in this file, 296/296 overall, 12 skipped on vaara[ml] extras path).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: address CodeRabbit major findings on PR #42

Triaged 9 CodeRabbit findings and addressed the 7 worth fixing for the v0.6.0 release. Two eval-script ergonomics findings (#4 ImportError on constructor, #5 malformed-LLM-response handling) are deferred to a follow-up eval-script hardening pass.

Major:

1. scripts/eval_pair_attack.py:88: fail closed on malformed Pipeline result. Default fallback was "ALLOW", which would count a partial or malformed pipeline response as a successful jailbreak in ASR measurement. Now defaults to DENY when .decision is missing or unrecognised.
2. src/vaara/policy/loader.py: strict shape validation at every section boundary. Inputs like action_classes:[], thresholds:[], escalation:[], or sequences.foo.pattern:"abc" previously either AttributeError'd or got silently coerced (tuple("abc") => ("a","b","c")). New _require_mapping() and _require_sequence() helpers raise PolicyError with the field path. String/bytes are rejected by _require_sequence to prevent character-tuple coercion.
3. src/vaara/policy/loader.py: per-action threshold overrides now validated at load time. A malformed override like {"foo": {"escalate": 0.9, "deny": 0.2}} previously parsed cleanly and only blew up when threshold_for("foo") was queried. Now we construct Thresholds with the merged-with-default values during load and surface PolicyError with the offending action class path.
4. src/vaara/policy/schema.py: Policy is now actually frozen. action_classes and thresholds_overrides were declared on a frozen dataclass but were plain dicts, so callers could do policy.thresholds_overrides["x"]["escalate"] = 0.9. __post_init__ now wraps both with MappingProxyType (nested override dicts too) and the field annotations switch to Mapping.

Minor:

5. CHANGELOG.md, COMPLIANCE.md: typo "impervence" -> "imperviousness" in two places (the v0.6 PAIR calibration section).
6. src/vaara/cli.py: `vaara trail purge --db ~/path/audit.db` now expanduser()s the path before the existence check.

Tests: 8 new cases in tests/test_policy.py cover the new validation and immutability paths. Full suite 304 passed / 12 skipped, ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: address remaining CodeRabbit findings on PR #42

Cleans up the two new findings from the 11f0d11 re-review plus the two eval-script majors that were originally deferred. v0.6 surface fully clean: no outstanding CodeRabbit advisories.

New findings on 11f0d11:
- tests/test_policy.py: module-level pytest.importorskip("yaml") was silently skipping the JSON / validation / immutability tests above it on a non-[yaml] install. Moved into per-test invocations so only the two yaml-specific tests skip when pyyaml is absent.
- scripts/eval_pair_attack.py: docstring referenced Llama-3.3-70B as attacker/judge, but the v0.6 calibration ran on Qwen2.5-32B-Instruct (per CHANGELOG and COMPLIANCE.md). Updated docstring.

Originally deferred, now landed for completeness:
- scripts/eval_distribution_shift.py: wrap AdversarialClassifier() constructor inside the same try/except as the import. Without ML extras, the constructor itself raises ImportError per its docstring, which previously bypassed the SystemExit path and crashed with a raw traceback.
- scripts/eval_pair_attack.py: handle malformed LLM responses as per-seed errors. New LLMResponseError raised from call_llm() when the body fails to parse as JSON, lacks `choices`, or lacks `message.content`. Caller's except tuple extended so the run keeps going across the remaining seeds instead of dying on one bad response.

Tests still 304 passed / 12 skipped, ruff clean.
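The malformed-response handling described for scripts/eval_pair_attack.py above can be sketched as follows. This is a hedged illustration, not the script's code: `extract_content` and `run_seeds` are hypothetical names, the response shape is the usual OpenAI-compatible chat-completions layout, and only the `LLMResponseError` name and the three failure cases come from the commit message.

```python
import json


class LLMResponseError(Exception):
    """Chat-completions body that cannot be used as a model reply."""


def extract_content(body: str) -> str:
    """Return choices[0].message.content or raise LLMResponseError."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        raise LLMResponseError(f"body is not valid JSON: {exc}") from exc
    try:
        content = payload["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError) as exc:
        raise LLMResponseError(f"missing choices or message.content: {exc!r}") from exc
    if not isinstance(content, str):
        raise LLMResponseError("message.content is not a string")
    return content


def run_seeds(bodies):
    """Record a per-seed error instead of aborting the whole run."""
    results = []
    for seed_id, body in bodies.items():
        try:
            results.append((seed_id, "ok", extract_content(body)))
        except LLMResponseError as exc:
            results.append((seed_id, "error", str(exc)))
    return results


good = json.dumps({"choices": [{"message": {"content": "candidate prompt"}}]})
for row in run_seeds({"seed-1": good, "seed-2": "not json", "seed-3": "{}"}):
    print(row)
```

Catching one narrow exception type per seed keeps unrelated failures, such as programming errors, visible as tracebacks.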
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: address CodeRabbit findings on 422f4d5 - flaky test + measurement skew

Two findings from the 11:23 UTC re-review:
- tests/test_policy.py:249: Minor flakiness. The unreadable-input test passed `"not json at all"` to from_json, which would mistakenly succeed if a file with that exact name existed in the CWD. Switched to a tmp_path-based missing path so the test is deterministic.
- scripts/eval_distribution_shift.py:77: Nitpick that CodeRabbit framed mildly but is functionally a measurement skew. Fallback "UNKNOWN" doesn't match any branch in evaluate() (deny / escalate / allow / error), so a missing result.decision would silently increment `n` without any outcome bucket, deflating recall and FPR reporting. Fail-closed pattern now matches eval_pair_attack.py (DENY for missing, DENY for unrecognised).

Tests still 304 passed / 12 skipped, ruff clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: harden eval_distribution_shift script per CodeRabbit round-4

Three findings on a784831, all in scripts/eval_distribution_shift.py:
- expected_category() (Major). Missing or invalid `expected` fields silently coerced to "DENY" then categorised as "attack", which would skew recall and FPR if a fixture had a typo. Now raises ValueError naming the offending entry id and the bad value(s). Allowed set: {ALLOW, DENY, ESCALATE}.
- report() (Major). Metrics were printed even when buckets recorded ERROR outcomes, producing publishable but invalid percentages. Aborts via SystemExit if any bucket has errors > 0.
- main() (Minor). corpus_root only checked for `.exists()` and the loop continued with zero entries (empty "successful" run). Now requires `.is_dir()` and aborts when zero JSONL entries load.

Smoke-tested expected_category locally: missing field, bogus value, ALLOW, DENY all behave as expected. Full suite 304 passed / 12 skipped, ruff clean.

Note: the v0.6 published numbers came from runs where every entry had a valid `expected` and zero pipeline errors, so this hardening doesn't invalidate the existing artifacts. It just prevents future accidental fixture drift from producing silent skew.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: pre-push lint sweep - ruff + bandit + mypy + pytest

Adds scripts/lint_full.sh as the pre-push hygiene gate so CodeRabbit-class findings get caught locally before PR review round-trips. Documented in CONTRIBUTING.md and listed in the v0.6.0 CHANGELOG.

Tools:
- ruff (already in [dev]): style + correctness
- bandit>=1.7.5 (new): security-focused static analysis
- mypy>=1.8 (new): types; strict on vaara.policy, lenient on legacy via [[tool.mypy.overrides]]. Modules join the strict block as they get cleaned up, so the typing floor only ratchets upward.
- pytest (already)

Bandit configuration in pyproject.toml:
- skips=["B608"] with explanation. Every f-string SQL in audit/sqlite_backend.py interpolates only the output of _tenant_clause(), which returns one of two literal strings ("tenant_id = ?" or "1=1"). User values always go through `?` parameter binding. Bandit can't see the constraint, so the pattern trips B608 uniformly. False positive for our usage.
- Two inline `# nosec` annotations:
  - mc_dropout_gate.py:94 (B614): trusted-bundle fallback after weights_only=True already failed because the bundle contains sklearn objects. Operator gets a logger.warning telling them to ensure provenance.
  - trace_gen.py:201 (B311): random.Random for synthetic-trace sampling, not security-sensitive.

vaara.policy made strictly mypy-clean to seed the typing floor:
- _coerce_enum is now generic on the enum type (TypeVar EnumT bound to Enum) so call sites get the precise enum back rather than object.
- _require_mapping / _require_sequence get explicit object/dict/list annotations.
- The yaml import type-ignore narrowed to [import-untyped] (the actual code mypy emits) instead of [import-not-found].

Local sweep runtime: ~10s. Tests still 304 passed / 12 skipped.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: address all 10 CodeRabbit full-review findings

CodeRabbit's @Full review surfaced 10 actionable items spanning audit data integrity, the public policy loader API, the v0.6 reproducibility scripts, and a CHANGELOG wording bug. All real, all addressed in one commit on top of the lint sweep (b569556).

Audit data integrity (Major):
- audit/sqlite_backend.py:_run_migrations was unsafe under partial failure. Each statement ran with `except Exception` swallowing every error before the schema_version got bumped unconditionally at the end. A real ALTER TABLE failure mid-migration left the DB marked at the new version while missing columns, then write_record() blew up on next insert. Fixed: only OperationalError messages matching "duplicate column" or "already exists" are swallowed (the idempotent re-run case); everything else propagates. schema_version bumped per-version inside the loop, so a v2→v3 success followed by v3→v4 failure leaves the DB correctly at v3 instead of stuck at v2 or falsely advanced to v4. Regression test exercises a DB with partial pre-existing columns to confirm idempotency.
- cli.py: `vaara trail purge` previously instantiated the backend with no tenant_id, which routes to _tenant_clause() == "1=1". On a shared multi-tenant audit DB that means every tenant's old records get deleted by anyone running the command. Tenant scoping is now required: the parser takes a mutually-exclusive --tenant TID or --all-tenants choice, both backed by an explicit decision.

Policy loader public-API hygiene (Major):
- policy/loader.py: unknown threshold keys now rejected. A typo like thresholds.default.deni would float-coerce, store, then silently fall back to default at threshold_for() query time. New _reject_unknown_keys() helper validates default and per-action override blocks against {escalate, deny}.
- policy/loader.py: read-error normalisation. from_json() previously caught only FileNotFoundError; from_yaml() caught nothing. Public callers of either could see raw IsADirectoryError, PermissionError, UnicodeDecodeError, or generic OSError. New _read_policy_text() helper wraps Path.read_text() and translates the full set into PolicyError with a path-specific message.

v0.6 reproducibility scripts (Major):
- eval_stack_ablation.full_stack() previously called heuristic_only() again, double-running pipe.intercept() per entry and advancing sequence/agent-history state on the second pass. The published v0.6 ablation table assumes one intercept call per entry per config. Now full_stack takes the precomputed heuristic and classifier decisions and derives the upgrade rule from those, so the loop in evaluate() runs each pipeline path exactly once per entry.
- eval_distribution_shift.py + eval_stack_ablation.py: per-entry agent_id derived from entry id (or python id() fallback) instead of collapsing every sample to a constant "adv". Sequence/agent-history state stays isolated across samples, matching corpus-measurement semantics. Same anti-pattern fixed in both scripts for consistency.
- eval_pair_attack.py: classifier import is now a hard requirement. Silent fallback to heuristic-only meant a default invocation of the script wouldn't reproduce the published full-stack ASR: anyone re-running could legitimately report "ASR=N%" against a different stack and not notice. Now raises SystemExit if vaara[ml] is absent.

Reproducibility metadata (Major):
- tests/adversarial/pair_v0_5_3.json: model field expanded to attacker_model + judge_model + model_alias so a downstream auditor can see exactly what ran. Both attacker and judge were Qwen2.5-32B-Instruct per CHANGELOG / COMPLIANCE; the alias "qwen32b" stays for back-compat with any downstream code keying on it.

Minor:
- CHANGELOG.md v0.6.0 entry: removed misleading "vaara trail verify reports a chain break at the retention boundary" wording. That subcommand only validates signed zip bundles, not the live DB. Replaced with "the seam is exposed at load via hash mismatch", and the bullet now documents the new --tenant / --all-tenants gate.
- eval_pair_attack.py: --model default flipped from "llama" to "Qwen2.5-32B-Instruct" so a default invocation matches the documented v0.6 calibration.

Tests:
- 6 new cases (4 in test_policy, 1 in test_transparency_taxonomy partial-migration regression, 1 implicit via existing audit_purge).
- Full suite 310 passed / 12 skipped, ruff + bandit + mypy clean.
- scripts/lint_full.sh exits PASS.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: vaaraio <267591518+vaaraio@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
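The migration fix described under "Audit data integrity" above follows a pattern that can be shown with the standard-library sqlite3 module. The sketch below is illustrative only: the `MIGRATIONS` table and `run_migrations` function are hypothetical, and storing the schema version in `PRAGMA user_version` is an assumption (the commit message does not say where schema_version lives).

```python
import sqlite3

# Hypothetical table: version N maps to the statements that upgrade N to N+1.
MIGRATIONS = {
    2: [
        "ALTER TABLE audit_records ADD COLUMN system_operation TEXT",
        "ALTER TABLE audit_records ADD COLUMN data_usage TEXT",
        "ALTER TABLE audit_records ADD COLUMN decision_making TEXT",
        "ALTER TABLE audit_records ADD COLUMN limitations TEXT",
    ],
}
# Only these messages mean "already applied"; any other error is real.
IDEMPOTENT_MARKERS = ("duplicate column", "already exists")


def run_migrations(conn: sqlite3.Connection, target: int) -> int:
    # Assumption: the schema version is kept in PRAGMA user_version.
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    while version < target:
        for statement in MIGRATIONS[version]:
            try:
                conn.execute(statement)
            except sqlite3.OperationalError as exc:
                if not any(m in str(exc).lower() for m in IDEMPOTENT_MARKERS):
                    raise  # propagate before the version is bumped
        version += 1
        # Bump once per completed step, so a later failure cannot
        # leave the recorded version ahead of the actual schema.
        conn.execute(f"PRAGMA user_version = {version}")
        conn.commit()
    return version


# A partially migrated v2 database: one new column already exists.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audit_records (id INTEGER PRIMARY KEY, record_hash TEXT, data_usage TEXT)"
)
conn.execute("PRAGMA user_version = 2")
print(run_migrations(conn, 3))
print([row[1] for row in conn.execute("PRAGMA table_info(audit_records)")])
```

Matching on the error message is brittle across SQLite versions, so a production version might instead inspect `PRAGMA table_info` before issuing each ALTER TABLE.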

dependabot Bot added the dependencies (Pull requests that update a dependency file) and github_actions (Pull requests that update GitHub Actions code) labels Apr 21, 2026

dependabot Bot changed the title from "build(deps): bump actions/upload-artifact from 4.6.2 to 7.0.1" to "build(deps): bump actions/upload-artifact from 5.0.0 to 7.0.1" Apr 21, 2026

dependabot Bot force-pushed the dependabot/github_actions/actions/upload-artifact-7.0.1 branch from 8c5c4f1 to 384b519 Compare April 21, 2026 08:55

vaaraio merged commit 7634132 into main Apr 21, 2026
4 checks passed

dependabot Bot deleted the dependabot/github_actions/actions/upload-artifact-7.0.1 branch April 21, 2026 09:19
