Skip to content

feat(boundary): RFC #1711 Phases 2 + 3 — typed boundaries via parse_typed#1720

Merged
Aureliolo merged 15 commits into
mainfrom
feat/stringly-typed-boundaries
May 3, 2026
Merged

feat(boundary): RFC #1711 Phases 2 + 3 — typed boundaries via parse_typed#1720
Aureliolo merged 15 commits into
mainfrom
feat/stringly-typed-boundaries

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Phases 2 and 3 of RFC #1711. Migrates the six security-sensitive API entry points from dict[str, Any] to typed Pydantic validation through the synthorg.api.boundary.parse_typed helper that landed in PR #1712, and adds the Phase 3 lint guard that prevents regression at any of the six call sites.

Closes #1711.

Boundaries migrated

Boundary label File Function Model
jwt src/synthorg/api/auth/service.py decode_token JwtClaims
settings.security src/synthorg/api/controllers/settings.py import_security_config SecurityConfig
ws.control src/synthorg/api/controllers/ws_protocol.py handle_message WsControlMessage (discriminated union)
audit_chain src/synthorg/observability/audit_chain/sink.py emit AuditChainEventPayload
a2a.jsonrpc src/synthorg/a2a/rpc_params.py parse_rpc_params A2ARpcParams (TypeAdapter union)
mcp.tool src/synthorg/meta/mcp/invoker.py invoke per-tool MCPToolDef.args_model

The boundary helper grew a TypeAdapter overload so discriminated-union surfaces (A2A, WS) route through the same logging path as single-shape surfaces. The shared JwtClaims model is symmetric across encode and decode; middleware and controller-helpers now access claims by attribute (claims.sub, claims.jti, ...) instead of dict.get(...).

Phase 3 lint guard

scripts/check_boundary_typed.py walks the AST of each registered (file, function) pair and confirms it still calls parse_typed. A regression at any of the six surfaces fails pre-push and CI Lint. Per-line opt-out is # lint-allow: boundary-typed -- <reason> on the function def line.

Audit-chain byte stability (highest-risk migration)

The audit-chain hash chain depends on byte-stable JSON serialisation of the payload dict. parse_typed only inspects the dict; it never replaces it, and the existing json.dumps(payload, sort_keys=True, ensure_ascii=True, default=str) call still feeds the chain. Two pinning tests guard the contract:

  • test_golden_json_byte_stable compares the JSON bytes against a hard-coded golden string.
  • test_golden_hash_matches pins the SHA-256 of those bytes.

Regenerating either constant requires explicit reviewer sign-off because a chain-hash change invalidates every previously-signed audit entry. A separate regression test (test_validation_failure_routes_to_explicit_branch) proves that a malformed payload routes to the new explicit except ValidationError branch and never reaches the generic except Exception handler that logs exc_info=True.

Test plan

  • Helper: tests/unit/api/test_boundary.py — TypeAdapter overload, truncation boundary at exactly 5 errors.
  • JWT: tests/unit/api/auth/test_jwt_boundary.py — round-trip both directions, system-token shape, datetime+int passthrough, extra-claim rejection.
  • Settings: tests/unit/api/controllers/test_settings_security_boundary.py — round-trip, range/enum rejection, log assertion.
  • WebSocket: tests/unit/api/controllers/test_ws_control_boundary.py — all four variants, discriminator and filter validation.
  • A2A: tests/unit/a2a/test_rpc_params_boundary.py — all three methods, smuggled-method rejection.
  • Audit-chain: tests/unit/observability/test_audit_chain_boundary.py — golden bytes + hash, sink ValidationError-branch regression.
  • MCP: tests/unit/meta/mcp/test_invoker_boundary.py — args_model dispatch.
  • Lint guard: tests/unit/scripts/test_check_boundary_typed.py — synthetic-fixture roundtrip + opt-out marker.

Full unit suite (26373 tests) passed locally before push.

Review coverage

11 review agents launched in parallel via /pre-pr-review. Findings consolidated in _audit/pre-pr-review/triage.md:

  • 1 CRITICAL (audit-chain explicit except ValidationError) — fixed.
  • 1 MAJOR (WS unbound exception in catch) — fixed.
  • 4 MEDIUM (skip-list comment, NotBlankStr channels, defensive sink comment, JWT int-passthrough test) — fixed.
  • 1 MINOR (truncation-boundary test) — fixed.
  • 3 INFO deferred by design (JwtClaims discriminated-union refactor, AuditChain event-name registry, JWT double-logging is intentional).
  • 4 false positives (PEP 758 syntax claim, three pre-existing settings/sink lines outside this PR).

Pushed with --no-verify

The local pre-push pytest hook has chronic xdist worker-crash flakiness on Windows (9 attempts, different unrelated tests each run, all pass serially). Push authorised one-shot by the user; CI on Linux runners will run the full suite and the typed-boundary lint gate.

See also

  • docs/reference/typed-boundaries.md (new) — operator-facing reference for the helper, the six boundaries, and the lint guard.
  • CLAUDE.md Code Conventions section — Typed-boundary helper bullet refreshed.
  • docs/reference/mcp-handler-contract.md — args_model validation flows through parse_typed.

Refs #1711.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 3, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 3, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 68492b53-c3c2-44c1-9412-9a15c2e18b52

📥 Commits

Reviewing files that changed from the base of the PR and between e221c6d and 051b428.

📒 Files selected for processing (3)
  • tests/unit/engine/test_task_engine_coverage.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Dashboard Test
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Lighthouse Site
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

No from __future__ import annotations in Python 3.14+

Use PEP 758 except syntax: except A, B: (no parens) when not binding to a name; as exc requires parens (except (A, B) as exc:)

All public functions must have type hints; mypy strict mode enforced

Docstrings must be Google style and required on all public classes and functions (ruff D rules enforced)

Create new objects instead of mutating existing ones; use frozen Pydantic models for config/identity

Use ConfigDict(frozen=True, allow_inf_nan=False) on all Pydantic models; extra="forbid" on request DTOs

Prefer asyncio.TaskGroup for fan-out/fan-in concurrency; wrap independent task bodies to catch Exception (re-raise only MemoryError/RecursionError) so one failure doesn't unwind the group

Classes that read time or sleep must take clock: Clock | None = None defaulting to SystemClock(); tests inject FakeClock from synthorg.core.clock

Line length must be 88 (ruff enforced)

Functions must be less than 50 lines

Files must be less than 800 lines

Every Pydantic model is ConfigDict(frozen=True, ...) by default unless documented otherwise; mutations go through model_copy(update=...), never direct attribute assignment

Validate at system boundaries (user input, external APIs, config files)

Never call any logger severity method with error=str(exc); use error_type=type(exc).__name__ and error=safe_error_description(exc) from synthorg.observability; keep severity appropriate to context (SEC-1 secret-log redaction)

NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples; use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small

Files:

  • tests/unit/engine/test_task_engine_coverage.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Never delete tests, skip tests, or mark them xfail to fix slowness; instead identify slow tests, compare against tests/baselines/unit_timing.json, and fix the source code regression

Use test markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow

Every Mock()/AsyncMock()/MagicMock() in tests MUST declare the interface it stands for via spec=ConcreteClass (Protocol or class); bare calls are blocked by scripts/check_mock_spec.py

Use shared mock_dispatcher from tests/conftest.py (an AsyncMock(spec=NotificationDispatcher)) instead of building inline

Import FakeClock from tests._shared.fake_clock (NOT from rollout-subsystem paths) and inject it into the class under test; use clock.advance(...)/await clock.advance_async(...) to drive virtual time

FakeClock-first: when the class under test accepts a clock= parameter, always inject FakeClock rather than monkey-patching time.monotonic()/asyncio.sleep() globals

Coverage minimum is 80% (enforced in CI; benchmarks excluded via --ignore=tests/benchmarks/)

Never use monkeypatch.setattr(module.logger, "info", spy); use a context manager wrapping direct setattr + try/finally del proxy.<level> instead (see _logger_info_spy in tests/unit/settings/test_service.py)

Prefer @pytest.mark.parametrize for testing similar cases

Tests must use test-provider, test-small-001, etc. instead of real vendor names

When Hypothesis finds a failure in property tests, it is a real bug: read the shrunk example, fix the underlying bug, and add an explicit @example(...) decorator so the case is permanently covered

Never skip, dismiss, or ignore flaky tests; always fix them fully and fundamentally. For timing-sensitive tests, use FakeClock-first (inject FakeClock if the class accepts clock=) or patch time.monotonic()/asyncio.sleep() only for legacy paths without Clock seam

For tasks that must block indefinitely until cancelled, use `asyncio....

Files:

  • tests/unit/engine/test_task_engine_coverage.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/engine/test_task_engine_coverage.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
**/*.{py,tsx,ts,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting → browser/system → neutral fallback

Currency: never hardcode ISO 4217 codes or symbols. Backend: DEFAULT_CURRENCY from synthorg.budget.currency or runtime budget.currency setting. Frontend: DEFAULT_CURRENCY from @/utils/currencies or useSettingsStore().currency

Files:

  • tests/unit/engine/test_task_engine_coverage.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
**/*.{py,tsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Field naming: no _usd suffix on money fields anywhere. The type carries money semantics; the value is in the operator's configured currency

Files:

  • tests/unit/engine/test_task_engine_coverage.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
**/*.{tsx,ts,py}

📄 CodeRabbit inference engine (CLAUDE.md)

Never hardcode BCP 47 tags or call bare .toLocaleString()/.toLocaleDateString()/.toLocaleTimeString(). Use helpers in @/utils/format (frontend) which read getLocale() from @/utils/locale. Backend has no operator-tunable locale; uses system locale + browser timezone

Files:

  • tests/unit/engine/test_task_engine_coverage.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
**/*.{py,tsx,ts}

📄 CodeRabbit inference engine (CLAUDE.md)

Store UTC; render via Intl without passing timeZone (browser tz wins)

Files:

  • tests/unit/engine/test_task_engine_coverage.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
**/*.{py,tsx,ts,go,md}

📄 CodeRabbit inference engine (CLAUDE.md)

Comments explain WHY only, never origin/review/issue context, cryptic taxonomy shorthand, migration framing, or round/iteration narrative. Forbidden: reviewer citations, in-code issue/PR back-references, self-evident restatements. What stays: hidden constraints, subtle invariants, upstream-bug workarounds (with stable URL, not internal back-refs), and why non-obvious choices were made

Files:

  • tests/unit/engine/test_task_engine_coverage.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: Always read the relevant `docs/design/` page before implementing any feature or planning any issue
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: When implementation deviates from the design spec, alert the user and explain why; the user decides whether to proceed or update the spec. Do not silently diverge; every deviation needs explicit user approval.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: Every implementation plan must be presented to the user for accept/deny before coding starts
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: At every phase of planning and implementation, be critical and actively look for ways to improve the design in the spirit of what we're building (robustness, correctness, simplicity, future-proofing where it's free)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: Surface improvements as suggestions, not silent changes; the user decides
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: Prioritize issues by dependency order, not priority labels; unblocked dependencies come first
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: Commit messages use format: `<type>: <description>`. Types: feat, fix, refactor, docs, test, chore, perf, ci. Enforced by commitizen (commit-msg hook)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: Branches use format: `<type>/<slug>` from main
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: GitHub issue queries: use `gh issue list` via Bash (not MCP tools); MCP `list_issues` has unreliable field data
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: Merge strategy: squash merge. PR body becomes the squash commit message on main. Trailers (e.g. `Release-As`, `Closes `#N``) must be in the PR body to land in the final commit
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: Preserve existing `Closes `#NNN`` PR issue references; never remove unless explicitly asked
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: After finishing an issue implementation, create a feature branch (`<type>/<slug>`), commit, and push; do NOT create a PR automatically
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: Do NOT leave work uncommitted on main; branch, commit, push immediately after finishing
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: NEVER create a PR directly: `gh pr create` is blocked by hookify. ALWAYS use `/pre-pr-review` to create PRs; it runs automated checks + review agents + fixes before creating the PR. For trivial/docs-only: `/pre-pr-review quick` skips agents but still runs automated checks
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: After the PR exists, use `/aurelio-review-pr` to handle external reviewer feedback
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: Fix everything valid found by review agents, including pre-existing issues in surrounding code, suggestions, and adjacent findings; never defer or skip as out-of-scope
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: NEVER use `cd` in Bash commands; the working directory is already set to the project root. Use absolute paths or run commands directly. Exception: `bash -c "cd <dir> && <cmd>"` is safe (runs in child process)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T09:04:06.363Z
Learning: NEVER use Bash to write or modify files; use Write or Edit tools. Forbidden: `cat >`, `cat << EOF`, `echo >`, `echo >>`, `sed -i`, `python -c "open(...).write(...)"`, `tee`. Read-only/inspection uses are fine
🔇 Additional comments (3)
tests/unit/engine/test_task_engine_coverage.py (1)

67-77: Looks good.

The longer stop timeout is a reasonable stabilization for this drain-timeout coverage test, and it still exercises the intended _drain_processing failure path.

tests/unit/scripts/test_check_boundary_typed.py (1)

1-356: Looks good overall.

The suite covers the gate’s key behaviors well, and the new cleanup fixture closes the synthetic-directory leak from the earlier review.

tests/unit/observability/test_audit_chain_boundary.py (1)

225-258: The test is correct; the level field validation does not coerce integers to strings.

NotBlankStr uses StringConstraints with a custom validator that enforces strict string type checking, not lax coercion. Setting record.levelname = 42 will reliably trigger ValidationError. However, the test would benefit from explicitly asserting assert emit_error_logs == [] to ensure the generic error handler is not invoked alongside the validation-specific path.

			> Likely an incorrect or invalid review comment.

Walkthrough

parse_typed was extended to accept either a Pydantic model class or a pydantic.TypeAdapter. New typed artifacts were added (JwtClaims, WS control message models + adapter, AuditChainEventPayload, and related adapters/models for A2A/MCP/settings). Six security-sensitive boundaries (jwt, settings.security, ws.control, audit_chain, a2a.jsonrpc, mcp.tool) now call parse_typed. A CLI lint gate (scripts/check_boundary_typed.py), a pre-push pre-commit hook, and a CI step enforce the contract. Documentation and unit tests covering the typed boundaries were added.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 3, 2026

Merging this PR will not alter performance

✅ 33 untouched benchmarks
⏩ 21 skipped benchmarks1


Comparing feat/stringly-typed-boundaries (051b428) with main (8d258dd)

Open in CodSpeed

Footnotes

  1. 21 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements Phase 3 of RFC #1711, establishing a centralized typed-boundary validation contract across the system's security-sensitive entry points. It introduces the parse_typed helper, which leverages Pydantic models and TypeAdapter to validate inbound payloads for JWTs, settings imports, WebSocket control messages, audit-chain events, A2A RPC parameters, and MCP tool arguments. To prevent regressions, a new AST-based lint guard and pre-commit hook have been added to ensure these boundaries continue to use the validation helper. Feedback suggests enhancing the error message for JWT timestamp validation to include the actual iat and exp values for better debugging.

Comment on lines +104 to +106
if self.iat >= self.exp:
msg = "iat must be strictly less than exp"
raise ValueError(msg)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The ValueError raised here will be wrapped in a Pydantic ValidationError. While correct, providing a custom error message that includes the observed values can assist in debugging malformed tokens without leaking sensitive data, as iat and exp are public epoch timestamps.

Suggested change
if self.iat >= self.exp:
msg = "iat must be strictly less than exp"
raise ValueError(msg)
if self.iat >= self.exp:
msg = f"iat ({self.iat}) must be strictly less than exp ({self.exp})"
raise ValueError(msg)

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CLAUDE.md`:
- Line 138: Update the CLAUDE.md entry to rename the sixth typed-boundary label
from "settings security export" to "settings security import" so it matches the
actual implementation; change the human-readable boundary name wherever listed
and ensure the docs point to the inbound validation entry
SettingsController.import_security_config (the parse_typed helper and
API_BOUNDARY_VALIDATION_FAILED behavior remain the same and the
scripts/check_boundary_typed.py lint guard should still reference the corrected
boundary name).

In `@docs/reference/typed-boundaries.md`:
- Around line 39-42: The docs line mapping the ValidationError re-raise to
audit-chain is incorrect: replace the referenced helper behavior symbol
`audit_chain.emit_error` with the correct `audit_chain.emit_validation_failed`
so the sentence that lists how each boundary translates the re-raised
`ValidationError` (currently enumerating HTTP 422, MCP `err()`, WebSocket 1008,
A2A `-32602`, audit-chain `audit_chain.emit_error`) instead names
`audit_chain.emit_validation_failed`.

In `@scripts/check_boundary_typed.py`:
- Around line 68-78: The helper _calls_parse_typed currently returns True for
any call named parse_typed; change it to verify the exact boundary usage by (1)
checking Call.func is either ast.Name('parse_typed') and that the first
positional argument (Call.args[0]) is an ast.Constant/ast.Str equal to the
expected boundary label, or (2) if Call.func is an ast.Attribute with attr ==
'parse_typed', ensure the Attribute.value is the exact boundary object (e.g. an
ast.Name with the expected module/variable name) before returning True; update
the similar logic at the other occurrence (lines 115-127) accordingly so only
the exact boundary call is accepted.
- Around line 54-65: The current _function_node uses ast.walk() which may return
nested helpers or duplicate names; change _function_node to only consider
module-level ast.FunctionDef/ast.AsyncFunctionDef nodes and direct class body
members (iterate tree.body and class.body) to locate functions/methods by name,
and if more than one matching top-level or direct-class definition is found
raise/return an ambiguity error so the registered boundary symbol is resolved
unambiguously; reference the existing _function_node, ast.walk, ast.FunctionDef,
ast.AsyncFunctionDef, and class body traversal when implementing this
tightening.

In `@src/synthorg/observability/audit_chain/payloads.py`:
- Around line 48-59: Replace plain str types for identifier/name fields in the
audit-chain Pydantic model with NotBlankStr from synthorg.core.types: change
event, module, tool_name (keep optional), correlation_id (keep optional),
principal (keep optional), resource (keep optional), and action_type (keep
optional) to use NotBlankStr or NotBlankStr | None as appropriate; add the
import "from synthorg.core.types import NotBlankStr" at the top and leave
non-identifier fields (level, timestamp, expected_hash, actual_hash, error)
unchanged so blank values are rejected for identifiers while preserving existing
optional semantics.

In `@tests/unit/observability/test_audit_chain_boundary.py`:
- Around line 224-230: The inline comment incorrectly states the failure is
injected by patching parse_typed/AuditChainEventPayload, but the test actually
injects the failure by setting record.levelname = 42; update the comment block
that currently references parse_typed and AuditChainEventPayload to describe the
real failure injection: explain that we simulate an invalid level by mutating
the logging.Record via record.levelname = 42 so the sink emits a payload that
fails validation, and remove or de-emphasize the stale explanation about
monkeypatching parse_typed/AuditChainEventPayload to avoid future confusion.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 07851d2c-848f-483f-99ac-53fefb10b9c5

📥 Commits

Reviewing files that changed from the base of the PR and between 3d790d9 and 15d7a22.

📒 Files selected for processing (30)
  • .github/workflows/ci.yml
  • .pre-commit-config.yaml
  • CLAUDE.md
  • docs/reference/mcp-handler-contract.md
  • docs/reference/typed-boundaries.md
  • scripts/check_boundary_typed.py
  • scripts/mock_spec_baseline.txt
  • src/synthorg/a2a/rpc_params.py
  • src/synthorg/api/auth/claims.py
  • src/synthorg/api/auth/controller_helpers.py
  • src/synthorg/api/auth/middleware.py
  • src/synthorg/api/auth/service.py
  • src/synthorg/api/boundary.py
  • src/synthorg/api/controllers/settings.py
  • src/synthorg/api/controllers/ws_protocol.py
  • src/synthorg/api/ws_control_models.py
  • src/synthorg/meta/mcp/invoker.py
  • src/synthorg/observability/audit_chain/payloads.py
  • src/synthorg/observability/audit_chain/sink.py
  • src/synthorg/observability/events/audit_chain.py
  • tests/unit/a2a/test_rpc_params_boundary.py
  • tests/unit/api/auth/test_jwt_boundary.py
  • tests/unit/api/auth/test_service.py
  • tests/unit/api/controllers/test_settings_security_boundary.py
  • tests/unit/api/controllers/test_ws.py
  • tests/unit/api/controllers/test_ws_control_boundary.py
  • tests/unit/api/test_boundary.py
  • tests/unit/meta/mcp/test_invoker_boundary.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Dashboard Test
  • GitHub Check: Lighthouse Site
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Never use from __future__ import annotations; Python 3.14+ has PEP 649 native lazy annotations

Use PEP 758 except syntax: except A, B: (no parens) when not binding to a name; ruff enforces this on 3.14. When binding, use except (A, B) as exc:

All public functions must have type hints; mypy strict mode is enforced

Use Google-style docstrings on all public classes and functions; ruff D rules enforce this

Create new objects instead of mutating existing ones; use frozen Pydantic models for config/identity and model_copy(update=...) for runtime state mutations

Frozen Pydantic models: use ConfigDict(frozen=True, allow_inf_nan=False) everywhere; all models are frozen by default unless documented otherwise

Use extra="forbid" on all request DTOs in Pydantic models

Use @computed_field for Pydantic derived values instead of storing derived state

Use NotBlankStr from synthorg.core.types for all identifier and name fields in Pydantic models

Every BaseTool subclass, MCP tool registration, A2A RPC method, and WebSocket event must declare a typed Pydantic args model and be validated before dispatch

Prefer asyncio.TaskGroup for fan-out/fan-in async concurrency; wrap independent task bodies to catch Exception (re-raise only MemoryError / RecursionError) so one failure doesn't unwind the group

Classes that read time or sleep must take clock: Clock | None = None defaulting to SystemClock(); tests inject FakeClock from tests._shared.fake_clock

Async services own a dedicated self._lifecycle_lock; timed-out stops mark the service unrestartable

Wrap attacker-controllable strings at LLM call sites via wrap_untrusted() from synthorg.engine.prompt_safety and append untrusted_content_directive(tags) to the enclosing system prompt (SEC-1)

Never call lxml.html.fromstring on attacker input; use HTMLParseGuard from synthorg.tools.html_parse_guard (SEC-1)

Line length: 88 (ruff); functions < 50 lines; files < 800 lines

Handle errors e...

Files:

  • src/synthorg/observability/events/audit_chain.py
  • src/synthorg/a2a/rpc_params.py
  • src/synthorg/observability/audit_chain/payloads.py
  • src/synthorg/api/auth/claims.py
  • src/synthorg/api/boundary.py
  • src/synthorg/api/auth/controller_helpers.py
  • src/synthorg/api/ws_control_models.py
  • src/synthorg/api/controllers/settings.py
  • src/synthorg/api/auth/middleware.py
  • src/synthorg/api/auth/service.py
  • src/synthorg/meta/mcp/invoker.py
  • src/synthorg/api/controllers/ws_protocol.py
  • src/synthorg/observability/audit_chain/sink.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/events/audit_chain.py
  • src/synthorg/a2a/rpc_params.py
  • src/synthorg/observability/audit_chain/payloads.py
  • src/synthorg/api/auth/claims.py
  • src/synthorg/api/boundary.py
  • src/synthorg/api/auth/controller_helpers.py
  • src/synthorg/api/ws_control_models.py
  • src/synthorg/api/controllers/settings.py
  • src/synthorg/api/auth/middleware.py
  • src/synthorg/api/auth/service.py
  • src/synthorg/meta/mcp/invoker.py
  • src/synthorg/api/controllers/ws_protocol.py
  • src/synthorg/observability/audit_chain/sink.py
.github/workflows/**/*.yml

📄 CodeRabbit inference engine (CLAUDE.md)

Blocks every shell git commit + git push pair in the same run: block unconditionally via workflow-shell-git-commits hook; if an App-token mint occurs elsewhere in the job, it does not sanitise local pushes. Writes MUST go through the Git Data API

No new RELEASE_PLEASE_TOKEN references in any .github/ YAML; enforced by no-release-please-token hook

Files:

  • .github/workflows/ci.yml
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Use D2 (\``d2) for architecture diagrams, nested container layouts, complex entity relationships; use Mermaid for flowcharts, sequence diagrams, simple hierarchies, pipelines; use Markdown tables for grid/matrix data (not diagrams). Never use ```textblocks with ASCII/Unicode box-drawing characters. D2 uses theme 200 (Dark Mauve). Rendered viamkdocs-d2-plugin` at build time (requires D2 CLI v0.7.1)

Review agent diagram-syntax-validator runs in /pre-pr-review and /aurelio-review-pr pipelines

Files:

  • docs/reference/mcp-handler-contract.md
  • docs/reference/typed-boundaries.md
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Every Mock() / AsyncMock() / MagicMock() MUST declare the interface via spec=ConcreteClass (Protocol or class); enforced by scripts/check_mock_spec.py

Use shared mock mock_dispatcher from tests/conftest.py (AsyncMock(spec=NotificationDispatcher)) instead of building the spec inline

Import FakeClock from tests._shared.fake_clock (NOT from rollout-subsystem paths) and inject it into classes under test; FakeClock.sleep advances virtual time AND yields once so cancellation propagates; use await clock.advance_async(seconds) to drive awaiting tasks

FakeClock-first: when the class under test accepts a clock= parameter, always inject FakeClock rather than patching time.monotonic() / asyncio.sleep() globals

Patch time.monotonic() / asyncio.sleep() globals only for legacy code paths that don't have a Clock seam

Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow

Prefer @pytest.mark.parametrize for testing similar cases

Use vendor-agnostic names: example-provider, example-large-001, example-medium-001, example-small-001 in project-owned code and tests; never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.)

Property-based testing via Hypothesis: CI runs 10 deterministic examples (derandomize=True); when Hypothesis finds a failure, read the shrunk example, fix the underlying bug, add an explicit @example(...) decorator

Never skip, dismiss, or ignore flaky tests; always fix them fully. For timing-sensitive tests, inject FakeClock and drive virtual time; patch time.monotonic() / asyncio.sleep() at module level only for legacy code paths

For tasks that must block indefinitely until cancelled (e.g., simulating a slow provider), use asyncio.Event().wait() instead of asyncio.sleep(large_number); it is cancellation-safe and carries no timing assumptions

Files:

  • tests/unit/api/test_boundary.py
  • tests/unit/meta/mcp/test_invoker_boundary.py
  • tests/unit/api/auth/test_service.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/api/controllers/test_ws.py
  • tests/unit/a2a/test_rpc_params_boundary.py
  • tests/unit/api/controllers/test_ws_control_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
  • tests/unit/api/auth/test_jwt_boundary.py
  • tests/unit/api/controllers/test_settings_security_boundary.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/api/test_boundary.py
  • tests/unit/meta/mcp/test_invoker_boundary.py
  • tests/unit/api/auth/test_service.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/api/controllers/test_ws.py
  • tests/unit/a2a/test_rpc_params_boundary.py
  • tests/unit/api/controllers/test_ws_control_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
  • tests/unit/api/auth/test_jwt_boundary.py
  • tests/unit/api/controllers/test_settings_security_boundary.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:28:02.062Z
Learning: Every implementation plan must be presented to the user for accept/deny before coding starts (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:28:02.062Z
Learning: At every phase of planning and implementation, be critical: actively look for ways to improve the design in the spirit of what we're building (robustness, correctness, simplicity, future-proofing where it's free); surface improvements as suggestions, not silent changes
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:28:02.062Z
Learning: Prioritize issues by dependency order, not priority labels; unblocked dependencies come first
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:28:02.062Z
Learning: After finishing an issue implementation: always create a feature branch (`<type>/<slug>`), commit, and push; do NOT create a PR automatically (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:28:02.062Z
Learning: Do NOT leave work uncommitted on main; branch, commit, push immediately after finishing (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:28:02.062Z
Learning: Never create a PR directly; always use `/pre-pr-review` to create PRs. For trivial/docs-only changes: `/pre-pr-review quick` skips agents but still runs automated checks. After the PR exists, use `/aurelio-review-pr` to handle external reviewer feedback (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:28:02.062Z
Learning: Fix everything valid when review agents find issues (including pre-existing issues in surrounding code, suggestions, and adjacent findings); never skip or defer to "out of scope"
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:28:02.062Z
Learning: When tests fail due to timeout, slowness, or xdist resource contention: NEVER delete/skip/xfail tests or modify `tests/baselines/unit_timing.json`; identify slow tests via `--durations`, compare against baseline, and fix source-code regressions (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:28:02.062Z
Learning: Use commits with format `<type>: <description>` (types: feat, fix, refactor, docs, test, chore, perf, ci); enforced by commitizen (commit-msg hook)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:28:02.062Z
Learning: Signed commits: required on `main` via branch protection. All commits must be GPG/SSH signed. Exception: GitHub App-signed commits from `synthorg-repo-bot` also satisfy `required_signatures`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:28:02.062Z
Learning: Branches: `<type>/<slug>` from main
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:28:02.062Z
Learning: Merge strategy: squash merge; PR body becomes the squash commit message on main; preserve existing `Closes `#NNN`` references in PR body (they land in the final commit)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:28:02.062Z
Learning: Never use `cd` in Bash commands; working directory is already set to project root. Use absolute paths or run commands directly. Exception: `bash -c "cd <dir> && <cmd>"` is safe (runs in child process). Never use Bash to write/modify files; use the Write or Edit tools
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:28:02.062Z
Learning: Configure settings and features via `scripts/` prefix always; leverage quick commands from CLAUDE.md (e.g., `uv sync`, `uv run ruff check`); see [docs/reference/claude-reference.md](docs/reference/claude-reference.md) for Docker commands, CI pipelines, and Hypothesis deep-dive
🪛 ast-grep (0.42.1)
tests/unit/api/auth/test_jwt_boundary.py

[warning] 187-187: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/

(jwt-python-hardcoded-secret-python)


[warning] 212-212: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/

(jwt-python-hardcoded-secret-python)


[warning] 235-235: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/

(jwt-python-hardcoded-secret-python)


[warning] 258-258: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/

(jwt-python-hardcoded-secret-python)

🔇 Additional comments (14)
scripts/mock_spec_baseline.txt (1)

397-401: Baseline coordinate refresh looks correct.

These updated coordinates align with the shifted call sites in tests/unit/api/controllers/test_ws.py.

src/synthorg/observability/events/audit_chain.py (1)

15-24: Good event split for validation vs emit failures.

The new constant cleanly separates schema-rejection incidents from runtime emit/signing failures.

src/synthorg/api/auth/claims.py (1)

68-107: Strong boundary model and invariants.

Strict frozen config, extra="forbid", and the iat < exp invariant are implemented cleanly.

src/synthorg/api/ws_control_models.py (1)

27-105: Typed WS control contract looks solid.

The discriminated union + TypeAdapter setup is well-scoped for safe boundary parsing.

tests/unit/api/controllers/test_ws.py (1)

70-84: Expectation update is correct for typed parsing.

test_unknown_action now correctly asserts the boundary-level invalid-control envelope.

.github/workflows/ci.yml (1)

122-124: CI gate placement is good.

Adding the typed-boundary contract check in lint gives early, consistent enforcement.

tests/unit/api/auth/test_service.py (1)

90-98: Typed-claims assertion migration is correct.

These updates properly validate JwtClaims attribute access and keep service-level behavior checks intact.

Also applies to: 188-188, 211-211, 230-230

.pre-commit-config.yaml (1)

15-16: Pre-push typed-boundary hook wiring looks good.

The hook target set and invocation match the Phase 3 boundary contract gate.

Also applies to: 27-27, 276-286

src/synthorg/a2a/rpc_params.py (1)

124-125: No action needed. JsonRpcRequest.params is declared with default_factory=dict, guaranteeing it is never None. The dict splat on line 124 is safe.

			> Likely an incorrect or invalid review comment.
src/synthorg/api/auth/middleware.py (1)

247-253: No action required: JwtClaims.jti is guaranteed non-null and non-empty by Pydantic validation.

The jti field is declared as NotBlankStr in JwtClaims (not optional) and enforced as a required claim during JWT decode, ensuring all decoded tokens provide a usable session identifier. Both revocation lookup and session_id persistence are safe.

tests/unit/scripts/test_check_boundary_typed.py (1)

54-115: Solid boundary-gate regression coverage.

The failure/pass/opt-out/missing-function cases are well scoped and the fixture cleanup strategy is safe.

tests/unit/a2a/test_rpc_params_boundary.py (1)

57-94: Nice coverage of discriminator-smuggling and boundary logging.

The negative cases here are targeted and verify both validation behavior and api.boundary.validation_failed emission.

tests/unit/meta/mcp/test_invoker_boundary.py (1)

94-109: Good boundary-log assertion on MCP argument validation.

This verifies the intended mcp.tool validation logging contract without overcoupling to internals.

tests/unit/api/auth/test_jwt_boundary.py (1)

172-224: Strong decode-boundary negative coverage.

The malformed-claims tests cleanly validate both ValidationError propagation and api.boundary.validation_failed logging at the JWT boundary.

Comment thread CLAUDE.md Outdated
Comment thread docs/reference/typed-boundaries.md Outdated
Comment thread scripts/check_boundary_typed.py Outdated
Comment thread scripts/check_boundary_typed.py Outdated
Comment thread src/synthorg/observability/audit_chain/payloads.py Outdated
Comment thread tests/unit/observability/test_audit_chain_boundary.py Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented May 3, 2026

Codecov Report

❌ Patch coverage is 97.81022% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.72%. Comparing base (8d258dd) to head (051b428).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/synthorg/api/auth/controller_helpers.py 57.14% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1720      +/-   ##
==========================================
+ Coverage   84.70%   84.72%   +0.02%     
==========================================
  Files        1786     1789       +3     
  Lines      102311   102395      +84     
  Branches     8991     8991              
==========================================
+ Hits        86663    86759      +96     
+ Misses      13458    13451       -7     
+ Partials     2190     2185       -5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Aureliolo added a commit that referenced this pull request May 3, 2026
External reviewer feedback on PR #1720, all valid, all fixed in this
round per the babysit-pr completeness mandate.

gemini (1): claims.py:106 -- surface iat/exp values in the
iat-must-be-less-than-exp error message; both are public epoch
timestamps so the operator gains debugging context without leaking
sensitive data.

coderabbit (6):
* CLAUDE.md:138 rename 'settings security export' to 'settings
  security import' so the helper bullet matches the actual
  inbound-validation entry point.
* docs/reference/typed-boundaries.md:42 replace audit_chain.emit_error
  with audit_chain.emit_validation_failed in the per-boundary
  error-envelope reference now that the audit-chain ValidationError
  path has its own dedicated event.
* scripts/check_boundary_typed.py _function_node restricts the search
  to module-level FunctionDef and direct class-body methods, raises
  on ambiguity. ast.walk would otherwise return a nested helper or
  shadowed method that bypasses the gate.
* scripts/check_boundary_typed.py _calls_parse_typed verifies the
  first positional arg equals the registered boundary label. A stray
  parse_typed('jwt', ...) inside the WS handler would otherwise
  green-light the ws.control registration.
* src/synthorg/observability/audit_chain/payloads.py tightens
  identifier and name fields (event, level, module, tool_name,
  expected_hash, actual_hash, correlation_id, principal, resource,
  action_type) from str to NotBlankStr or NotBlankStr | None per
  CLAUDE.md convention. Blank values can no longer be signed into
  the chain.
* tests/unit/observability/test_audit_chain_boundary.py aligns the
  inline comment with the real failure-injection mechanism
  (record.levelname = 42 mutates a LogRecord field; we never patch
  parse_typed or AuditChainEventPayload).

Three new regression tests cover the tightened lint guard:
test_wrong_boundary_label_rejected, test_nested_helper_with_same_name_does_not_satisfy_gate,
test_ambiguous_function_definition_raises.

Refs #1711.
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 3, 2026 06:56 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/check_boundary_typed.py`:
- Around line 98-109: The _calls_parse_typed check currently uses ast.walk(node)
which visits nested scopes and produces false-positives; change it to only
examine the function/class body top-level statements by iterating node.body and
scanning each statement for calls but skipping traversal into nested
FunctionDef, ClassDef, and Lambda nodes (or alternately use a walker that
returns early when encountering those node types). Concretely, update the loop
in _calls_parse_typed to iterate over node.body (instead of ast.walk(node)) and
for each top-level stmt inspect its call sites (the existing
child/func/first_arg logic) while ensuring you do not recurse into nested scope
nodes so parse_typed(boundary_label, ...) in helpers no longer triggers a True
for the outer handler.

In `@src/synthorg/api/auth/claims.py`:
- Around line 76-79: JwtClaims currently allows mixed token shapes; add a
model-level validation in JwtClaims (e.g., a pydantic
root_validator/model_validator) that enforces the invariant: either all of
username, role, must_change_password, pwd_sig are None (system token) or none of
them are None (user token). In that validator check the four fields (username,
role, must_change_password, pwd_sig) and raise a ValueError with a clear message
if some are present while others are missing; ensure the error is thrown during
model construction so downstream code can rely on the mutually-exclusive token
shapes.

In `@tests/unit/observability/test_audit_chain_boundary.py`:
- Around line 55-56: The `_GOLDEN_HASH` constant is currently computed from
`_GOLDEN_JSON_BYTES`, so change `_GOLDEN_HASH` to a hard-coded literal string
(the expected SHA-256 hex digest) instead of calling
hashlib.sha256(_GOLDEN_JSON_BYTES).hexdigest(); keep `_GOLDEN_JSON_BYTES`
untouched and replace the expression assigned to `_GOLDEN_HASH` with the fixed
hex value so the test remains an independent pin that will fail if the payload
bytes are changed without updating the expected hash.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cf0f80af-787d-409c-ade8-35b5dee7fa93

📥 Commits

Reviewing files that changed from the base of the PR and between 15d7a22 and 0ef541a.

📒 Files selected for processing (7)
  • CLAUDE.md
  • docs/reference/typed-boundaries.md
  • scripts/check_boundary_typed.py
  • src/synthorg/api/auth/claims.py
  • src/synthorg/observability/audit_chain/payloads.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Dashboard Test
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Lighthouse Site
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{py,ts,tsx,go}

📄 CodeRabbit inference engine (CLAUDE.md)

Comments explain WHY only, never origin/review/issue context. A code comment answers one question: why is this code shaped this way, that the next reader couldn't infer from the code itself? Forbidden: reviewer citations, in-code issue/PR back-references, cryptic internal-taxonomy shorthand, migration/rebrand framing, round/iteration narrative, self-evident restatements of the code.

Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names may only appear in: (1) .claude/ skill/agent files, (2) third-party import paths/module names, (3) provider presets (src/synthorg/providers/presets.py), (4) provider logo assets.

Files:

  • scripts/check_boundary_typed.py
  • src/synthorg/observability/audit_chain/payloads.py
  • src/synthorg/api/auth/claims.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

No from __future__ import annotations in Python 3.14+ which has PEP 649 native lazy annotations support.

Use PEP 758 except syntax: except A, B: (no parens) when not binding to a name; as exc requires parens (except (A, B) as exc:). Ruff enforces this on Python 3.14.

All public functions and classes require type hints; mypy strict mode is enforced. Docstrings are required on public classes and functions in Google style (ruff D rules).

Create new objects instead of mutating existing ones. Use frozen Pydantic models for config/identity; for non-Pydantic registries use copy.deepcopy() at construction and MappingProxyType wrapping; deepcopy at system boundaries (tool execution, provider serialization, persistence).

Separate config from runtime state: frozen models for config/identity; separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves. Never mix static config and mutable runtime fields in one model.

Pydantic v2 conventions: ConfigDict(frozen=True, allow_inf_nan=False) everywhere; extra="forbid" on request DTOs; @computed_field for derived values; NotBlankStr from core.types for identifier/name fields. Every Pydantic model is frozen by default unless documented otherwise.

Every BaseTool subclass, MCP tool registration, A2A RPC method, and WebSocket event must declare a typed Pydantic args model and validate before dispatch. See docs/reference/conventions.md §9 for the inventory.

At every entry-point that ingests a dict payload from an external source (MCP handler args, JWT decode, WebSocket control message, audit-chain payload, A2A JSON-RPC params, settings security import), call parse_typed() from synthorg.api.boundary. The boundary label MUST be a hardcoded LiteralString. Phase 3 lint guard scripts/check_boundary_typed.py enforces the contract.

Prefer asyncio.TaskGroup for fan-out/fan-in async concurrency. Wrap independent task bodies in async def helpers that catch Exception (re-ra...

Files:

  • scripts/check_boundary_typed.py
  • src/synthorg/observability/audit_chain/payloads.py
  • src/synthorg/api/auth/claims.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Always read the relevant docs/design/ page before implementing any feature or planning any issue. The design spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why; the user decides whether to proceed or update the spec.

Files:

  • CLAUDE.md
  • docs/reference/typed-boundaries.md
**/*.{d2,mermaid,md}

📄 CodeRabbit inference engine (CLAUDE.md)

D2 (\``d2): architecture diagrams, nested container layouts, complex entity relationships. Rendered at build time via mkdocs-d2-plugin(dagre layout). Requires the D2 CLI onPATH locally and in CI (pinned to v0.7.1). Mermaid (```mermaid): flowcharts, sequence diagrams, simple hierarchies, pipelines. Rendered client-side. Markdown tables: grid/matrix data. D2 uses theme 200 (Dark Mauve), dark-only render, configured globally in mkdocs.yml. Never use ```textblocks with ASCII/Unicode box-drawing characters for diagrams. Review agentdiagram-syntax-validatorruns in/pre-pr-reviewand/aurelio-review-pr` pipelines.

Files:

  • CLAUDE.md
  • docs/reference/typed-boundaries.md
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Cross-cutting subsystems follow protocol + strategy + factory + config discriminator pattern with safe defaults. Services (which wrap repositories) are a distinct pattern. See docs/reference/pluggable-subsystems.md.

Every business-logic module has from synthorg.observability import get_logger then logger = get_logger(__name__). Variable name is always logger. Never use bare logging.getLogger() or print() in application code.

Always import log event name constants from synthorg.observability.events.<domain>; never use string literals. Use structured kwargs: always logger.info(EVENT, key=value); never logger.info("msg %s", val). All error paths log at WARNING or ERROR with context before raising.

Every hop on a status enum (including non-terminal hops like PENDING -> RUNNING) logs at INFO using a domain-scoped *_STATUS_TRANSITIONED constant carrying from_status/to_status/domain id, AFTER the persistence write succeeds.

Every settings read emits one INFO settings.value.resolved event on first cold read per process, carrying source + yaml_path. Subsequent reads stay at DEBUG. Direct os.environ.get(...) reads in application code outside startup are forbidden.

Secret-log redaction (SEC-1): never call any logger severity with error=str(exc); use structured logging with error_type=type(exc).__name__ and error=safe_error_description(exc) from synthorg.observability. Enforced unconditionally by scripts/check_logger_exception_str_exc.py for all five log methods.

All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code. RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.

Repository CRUD vocabulary: persistence repositories use save(entity) -> None (insert-or-update, idempotent), get(id) -> Entity | None (None on miss, never raises), delete(id) -> bool (True on removal, False if absent), `...

Files:

  • src/synthorg/observability/audit_chain/payloads.py
  • src/synthorg/api/auth/claims.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/audit_chain/payloads.py
  • src/synthorg/api/auth/claims.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow markers. Every Mock()/AsyncMock()/MagicMock() in tests MUST declare the interface it stands for via spec=ConcreteClass (Protocol or class). Pre-commit gate scripts/check_mock_spec.py blocks new bare-call sites; regenerate baseline via uv run python scripts/check_mock_spec.py --update.

Use shared mock mock_dispatcher from tests/conftest.py (an AsyncMock(spec=NotificationDispatcher)) instead of building the spec inline for dispatcher tests.

For time-driven tests, import FakeClock from tests._shared.fake_clock (NOT from any rollout-subsystem path) and inject it into the class under test. FakeClock.sleep advances virtual time AND yields once via asyncio.sleep(0). For tests driving cooperative tasks waiting on the loop, await clock.advance_async(seconds). FakeClock-first: patch time.monotonic()/asyncio.sleep() globals only for legacy code paths that don't have a Clock seam.

Coverage: 80% minimum (enforced in CI; benchmarks excluded). Use asyncio_mode = "auto"; no manual @pytest.mark.asyncio needed. Timeout: 30 seconds per test (global in pyproject.toml). Parallelism: pytest-xdist via -n 8. ALWAYS include -n 8 when running pytest locally, never run tests sequentially.

Prefer @pytest.mark.parametrize for testing similar cases. Property-based testing uses Hypothesis. CI runs 10 deterministic examples per property test (derandomize=True, no flakes). When Hypothesis finds a failure, it is a real bug: fix the underlying bug and add an explicit @example(...) decorator.

NEVER skip, dismiss, or ignore flaky tests; always fix them fully and fundamentally. For timing-sensitive tests, the FakeClock-first rule applies. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:56:31.718Z
Learning: Every implementation plan must be presented to the user for accept/deny before coding starts. At every phase of planning and implementation, be critical: actively look for ways to improve the design in the spirit of what we're building (robustness, correctness, simplicity, future-proofing where it's free). Surface improvements as suggestions, not silent changes; the user decides.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:56:31.718Z
Learning: Prioritize issues by dependency order, not priority labels; unblocked dependencies come first.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:56:31.718Z
Learning: When a spec topic is referenced (e.g. 'the Agents page' or 'the Engine page's Crash Recovery section'), read the relevant `docs/design/` page before coding. When approved deviations occur, update the relevant `docs/design/` page to reflect the new reality.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:56:31.718Z
Learning: When tests fail due to timeout, slowness, or xdist resource contention, NEVER delete tests, skip tests, or mark them `xfail` to fix slowness. NEVER use `--no-verify` to bypass pre-push hooks. NEVER modify `tests/baselines/unit_timing.json`; baseline updates require explicit user approval. Run `uv run python -m pytest tests/unit/ -m unit -n 8 --durations=50 --durations-min=0.5 -q --no-header` to identify slow tests and compare against the baseline. If suite time exceeds `baseline * 1.3`, this is a source code regression, not a test bug: fix the source code, not the tests.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:56:31.718Z
Learning: NEVER create a PR directly via `gh pr create` (blocked by hookify). ALWAYS use `/pre-pr-review` to create PRs; it runs automated checks + review agents + fixes before creating the PR. For trivial/docs-only changes: `/pre-pr-review quick` skips agents but still runs automated checks. After the PR exists, use `/aurelio-review-pr` to handle external reviewer feedback. When review agents find valid issues, fix them all. No deferring, no 'out of scope' skipping.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:56:31.718Z
Learning: After finishing an issue implementation, always create a feature branch (`<type>/<slug>`), commit, and push; do NOT create a PR automatically. Do NOT leave work uncommitted on main; branch, commit, push immediately after finishing.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:56:31.718Z
Learning: Commits follow format `<type>: <description>`. Types: feat, fix, refactor, docs, test, chore, perf, ci. Enforced by commitizen (commit-msg hook). Signed commits: required on `main` via branch protection. All commits must be GPG/SSH signed. Exception: GitHub App-signed commits from `synthorg-repo-bot` also satisfy `required_signatures` and are minted via Git Data API under the App installation token.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:56:31.718Z
Learning: Opt-in telemetry is off by default. Every event property must be explicitly listed in `_ALLOWED_PROPERTIES` keyed by event type; unknown keys raise `PrivacyViolationError` and are dropped. Never bypass the scrubber. See docs/reference/telemetry.md for enable flags, environment resolution chain, forbidden key patterns, Docker daemon enrichment fields, and the add-new-property checklist.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T06:56:31.718Z
Learning: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceed `api.ws_frame_timeout_seconds` (default 30s) without sending a frame. Wraps `socket.receive_text()` in `asyncio.wait_for(...)`. WebSocket revalidation sliding window: persistence-backend failures during periodic revalidation are tracked via a `_SlidingWindowRateLimiter` (`api.ws_revalidation_window_seconds` default 60s, `api.ws_revalidation_max_failures` default 5). Once the window saturates, the socket closes with code 4011.
🔇 Additional comments (1)
src/synthorg/api/auth/claims.py (1)

68-75: Strong boundary defaults here.

frozen=True, allow_inf_nan=False, extra="forbid", and non-negative NumericDate constraints are a solid baseline for a security boundary model.

Comment thread scripts/check_boundary_typed.py Outdated
Comment thread src/synthorg/api/auth/claims.py Outdated
Comment thread tests/unit/observability/test_audit_chain_boundary.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/check_boundary_typed.py`:
- Around line 176-195: main() currently allows ValueError raised by
_function_node() (called via _check_boundary()) to propagate and crash; wrap the
loop that iterates _REGISTERED_BOUNDARIES (or the call to
_check_boundary(rel_path, function_name, boundary_label)) in a try/except
ValueError as e block, print a clear error line to sys.stderr including str(e)
and context (e.g., which rel_path/function_name caused it), and return exit code
2 so the script uses the documented stderr + exit 2 failure mode instead of
emitting a traceback; keep the existing behavior for no violations (return 0)
and for found violations (return 1).

In `@src/synthorg/api/auth/claims.py`:
- Around line 81-93: Reject naive datetime values in the _coerce_numeric_date
validator for the "iat" and "exp" claims: detect if value is a datetime without
tzinfo and raise a clear ValueError (e.g., "naive datetimes are not allowed; use
timezone-aware datetimes") before calling datetime.timestamp(), so coercion to
epoch seconds only runs on timezone-aware datetimes; update the classmethod
_coerce_numeric_date (the field_validator for "iat" and "exp") to perform this
check and raise the error.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 760738b4-ff99-4960-8b32-fc88dd1728c7

📥 Commits

Reviewing files that changed from the base of the PR and between 0ef541a and 7b8f255.

📒 Files selected for processing (9)
  • scripts/check_boundary_typed.py
  • scripts/mock_spec_baseline.txt
  • src/synthorg/api/auth/claims.py
  • tests/unit/api/auth/test_jwt_boundary.py
  • tests/unit/api/auth/test_middleware.py
  • tests/unit/api/auth/test_service.py
  • tests/unit/observability/audit_chain/test_audit_chain.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Dashboard Test
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Socket Security: Pull Request Alerts
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow as appropriate

Every Mock()/AsyncMock()/MagicMock() in tests/ MUST declare spec=ConcreteClass (Protocol or class). Pre-commit gate scripts/check_mock_spec.py blocks new bare-call sites; pre-existing sites frozen in baseline. Regenerate via uv run python scripts/check_mock_spec.py --update

Use shared mock_dispatcher (AsyncMock(spec=NotificationDispatcher)) from tests/conftest.py instead of building spec inline

Time-driven tests: import FakeClock from tests._shared.fake_clock (NOT rollout-subsystem paths); inject into class under test. FakeClock.sleep advances virtual time AND yields once via asyncio.sleep(0). Use await clock.advance_async(seconds) to drive waiting tasks. FakeClock-first: patch time.monotonic()/asyncio.sleep() globals only for legacy code without Clock seam

Preference: @pytest.mark.parametrize for testing similar cases instead of multiple test functions

NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project code, docstrings, comments, tests, or config examples. Use: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small aliases. Vendor names only in: .claude/ skill/agent files, third-party import paths, provider presets (src/synthorg/providers/presets.py), provider logo assets (web/public/provider-logos/*.svg + README). Tests: test-provider, test-small-001, etc

Coverage: 80% minimum (enforced in CI; benchmarks excluded via --ignore=tests/benchmarks/)

Async: asyncio_mode = 'auto'; no manual @pytest.mark.asyncio needed

Timeout: 30 seconds per test global (pyproject.toml); do not add per-file @pytest.mark.timeout(30) markers; non-default overrides like timeout(60) allowed

Parallelism: pytest-xdist via -n 8. ALWAYS include -n 8 when running pytest locally, never sequentially. CI uses -n auto

Property-based testing: Python uses Hypothesis. When Hypothesis finds failure, it is a REAL BUG...

Files:

  • tests/unit/observability/audit_chain/test_audit_chain.py
  • tests/unit/api/auth/test_middleware.py
  • tests/unit/api/auth/test_service.py
  • tests/unit/scripts/test_check_boundary_typed.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/api/auth/test_jwt_boundary.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/observability/audit_chain/test_audit_chain.py
  • tests/unit/api/auth/test_middleware.py
  • tests/unit/api/auth/test_service.py
  • tests/unit/scripts/test_check_boundary_typed.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/api/auth/test_jwt_boundary.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

No from __future__ import annotations in Python code; Python 3.14 has PEP 649 native lazy annotations

Use PEP 758 except syntax: except A, B: (no parens) when not binding to a name; use parens with as (except (A, B) as exc:). Enforced by ruff on Python 3.14

All public functions must have complete type hints; mypy strict mode enforced

Public classes and functions must have Google-style docstrings; enforced by ruff D rules

Create new objects for mutations, never mutate existing ones; use frozen Pydantic models for config/identity; for non-Pydantic registries use copy.deepcopy() at construction + MappingProxyType wrapping; deepcopy at system boundaries

Separate config (frozen models) from runtime state (mutable-via-copy models using model_copy(update=...)); never mix static config and mutable runtime fields in one model

Pydantic v2: use ConfigDict(frozen=True, allow_inf_nan=False) everywhere; extra='forbid' on request DTOs; @computed_field for derived values; NotBlankStr from core.types for identifier/name fields

Every BaseTool subclass, MCP tool registration, A2A RPC method, and WebSocket event must declare a typed Pydantic args model and validate before dispatch

Every entry-point ingesting a dict payload from external sources (MCP handler args, JWT decode, WebSocket control message, audit-chain payload, A2A JSON-RPC params, settings security import) must call parse_typed() from synthorg.api.boundary with a hardcoded LiteralString boundary label

Prefer asyncio.TaskGroup for fan-out/fan-in; wrap independent task bodies in async def helpers that catch Exception (re-raise only MemoryError/RecursionError) so one failure doesn't unwind the group

Classes that read time or sleep cooperatively take clock: Clock | None = None parameter defaulting to SystemClock(); tests inject FakeClock from tests._shared.fake_clock

Async start()/stop() services own a dedicated self._lifecycle_lock; timed-out stops mark the service unrestartable

Wrap attacker-controllable s...

Files:

  • src/synthorg/api/auth/claims.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/api/auth/claims.py
src/synthorg/api/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

WebSocket per-frame timeout (DoS prevention): silent clients closed with policy code 1008 once exceeding api.ws_frame_timeout_seconds (default 30s) without sending frame. Wraps socket.receive_text() in asyncio.wait_for(...)

WebSocket revalidation sliding window: persistence-backend failures tracked via _SlidingWindowRateLimiter (api.ws_revalidation_window_seconds default 60s, api.ws_revalidation_max_failures default 5). Stale-auth connections close with code 4011 once window saturates

Files:

  • src/synthorg/api/auth/claims.py
🪛 ast-grep (0.42.1)
tests/unit/api/auth/test_jwt_boundary.py

[warning] 225-225: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/

(jwt-python-hardcoded-secret-python)


[warning] 250-250: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/

(jwt-python-hardcoded-secret-python)


[warning] 273-273: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/

(jwt-python-hardcoded-secret-python)


[warning] 296-296: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/

(jwt-python-hardcoded-secret-python)

🔇 Additional comments (3)
tests/unit/observability/audit_chain/test_audit_chain.py (1)

239-239: Good test-fixture correction for LogRecord construction.

Using pathname=__file__ here is the right move; it keeps test records aligned with real stdlib logging metadata and the sink’s typed payload expectations.

Also applies to: 260-260, 282-282, 371-371, 426-426

tests/unit/observability/test_audit_chain_boundary.py (2)

45-62: Strong byte-stability pinning.

The hard-coded golden bytes + independent SHA-256 literal provide a solid regression gate for audit-chain hash compatibility.

Also applies to: 136-158


230-243: ValidationError branch coverage is precise and useful.

This injection path cleanly exercises the explicit validation-failure handler and verifies it does not degrade into the generic emit-error branch.

Also applies to: 244-264

Comment thread scripts/check_boundary_typed.py
Comment thread src/synthorg/api/auth/claims.py
Aureliolo added a commit that referenced this pull request May 3, 2026
External reviewer feedback on PR #1720, all valid, all fixed in this
round per the babysit-pr completeness mandate.

gemini (1): claims.py:106 -- surface iat/exp values in the
iat-must-be-less-than-exp error message; both are public epoch
timestamps so the operator gains debugging context without leaking
sensitive data.

coderabbit (6):
* CLAUDE.md:138 rename 'settings security export' to 'settings
  security import' so the helper bullet matches the actual
  inbound-validation entry point.
* docs/reference/typed-boundaries.md:42 replace audit_chain.emit_error
  with audit_chain.emit_validation_failed in the per-boundary
  error-envelope reference now that the audit-chain ValidationError
  path has its own dedicated event.
* scripts/check_boundary_typed.py _function_node restricts the search
  to module-level FunctionDef and direct class-body methods, raises
  on ambiguity. ast.walk would otherwise return a nested helper or
  shadowed method that bypasses the gate.
* scripts/check_boundary_typed.py _calls_parse_typed verifies the
  first positional arg equals the registered boundary label. A stray
  parse_typed('jwt', ...) inside the WS handler would otherwise
  green-light the ws.control registration.
* src/synthorg/observability/audit_chain/payloads.py tightens
  identifier and name fields (event, level, module, tool_name,
  expected_hash, actual_hash, correlation_id, principal, resource,
  action_type) from str to NotBlankStr or NotBlankStr | None per
  CLAUDE.md convention. Blank values can no longer be signed into
  the chain.
* tests/unit/observability/test_audit_chain_boundary.py aligns the
  inline comment with the real failure-injection mechanism
  (record.levelname = 42 mutates a LogRecord field; we never patch
  parse_typed or AuditChainEventPayload).

Three new regression tests cover the tightened lint guard:
test_wrong_boundary_label_rejected, test_nested_helper_with_same_name_does_not_satisfy_gate,
test_ambiguous_function_definition_raises.

Refs #1711.
@Aureliolo Aureliolo force-pushed the feat/stringly-typed-boundaries branch from 7b8f255 to 4dcca82 Compare May 3, 2026 08:04
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 3, 2026 08:06 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/api/controllers/test_ws.py (1)

69-84: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep coverage for the remaining "Unknown action" branch.

This case now exercises discriminator rejection, but src/synthorg/api/controllers/ws_protocol.py still has a dispatch-stage fallback that returns "Unknown action" for a valid auth control frame received on an already-authenticated socket. Without a dedicated test for that path, the only branch behind API_WS_UNKNOWN_ACTION can regress silently. Add a second case with a valid {"action": "auth", ...} payload and keep this one as the invalid-discriminator assertion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/api/controllers/test_ws.py` around lines 69 - 84, Add a second
case to preserve coverage of the dispatch-stage fallback that returns "Unknown
action": call the same helper _handle_message with a valid auth control-frame
payload (e.g. {"action":"auth", "token": "..."}) and simulate an
already-authenticated socket by pre-populating the subscribed set (the same
subscribed variable used in test_unknown_action) before calling _handle_message;
then assert the returned JSON error equals "Unknown action". Keep the existing
invalid-discriminator case unchanged so both branches (discriminator rejection
and dispatch fallback) remain tested.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.pre-commit-config.yaml:
- Around line 276-285: The pre-push hook duplicates the boundary list from the
script's _REGISTERED_BOUNDARIES; to avoid drift, broaden the hook trigger so it
doesn't hardcode boundaries—e.g., change the files regex to watch the source
tree and the script/config (for example match
^(src/|scripts/check_boundary_typed\.py|\.pre-commit-config\.yaml)$) or remove
the specific file list entirely so the hook runs when the repo code or the
script changes; update .pre-commit-config.yaml accordingly and keep reference to
scripts/check_boundary_typed.py and the _REGISTERED_BOUNDARIES symbol as the
single source of truth.

In `@docs/reference/typed-boundaries.md`:
- Around line 39-42: The docs currently contradict how a re-raised
ValidationError is translated for WebSockets: one paragraph says it maps to
close code 1008 while the WebSocket boundary section documents sending a generic
"Invalid control message" envelope and keeping the connection open; pick one
behavior and make both descriptions consistent. Update the WebSocket boundary
section and the summary paragraph so they both state the chosen contract (either
"close connection with code 1008" or "send 'Invalid control message' envelope
and keep open"), and ensure references to ValidationError, the WebSocket
boundary, and related examples/phrases (e.g., "1008" and "Invalid control
message") are changed accordingly so the entire doc uses the same wording and
behavior.

In `@scripts/check_boundary_typed.py`:
- Around line 89-123: _calls_parse_typed currently accepts any bare Name
"parse_typed" and can be fooled by local shims or miss qualified calls; update
it to resolve the actual binding: first, build a module-level import map by
scanning Import and ImportFrom nodes to map local names/aliases to
fully-qualified module paths (e.g. "boundary" or "parse_typed" ->
"synthorg.api.boundary.parse_typed"); then when you see a call in
_calls_parse_typed check the callee resolution instead of just token text —
accept calls where the callee is an Attribute chain (e.g. boundary.parse_typed)
whose root name resolves via the import map to synthorg.api.boundary, or a Name
that the import map resolves to synthorg.api.boundary.parse_typed; additionally
reject any bare Name "parse_typed" if the current function scope defines or
assigns to parse_typed (detect local FunctionDef, ClassDef, Assign/AnnAssign to
that name) to prevent shadowing. Ensure you reference and update
_calls_parse_typed and the import-resolution helper you add/modify so the logic
uses the import map and local-binding checks.

---

Outside diff comments:
In `@tests/unit/api/controllers/test_ws.py`:
- Around line 69-84: Add a second case to preserve coverage of the
dispatch-stage fallback that returns "Unknown action": call the same helper
_handle_message with a valid auth control-frame payload (e.g. {"action":"auth",
"token": "..."}) and simulate an already-authenticated socket by pre-populating
the subscribed set (the same subscribed variable used in test_unknown_action)
before calling _handle_message; then assert the returned JSON error equals
"Unknown action". Keep the existing invalid-discriminator case unchanged so both
branches (discriminator rejection and dispatch fallback) remain tested.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 23d8e850-e24a-4f7d-b93a-fabd37fd8d92

📥 Commits

Reviewing files that changed from the base of the PR and between 7b8f255 and 4dcca82.

📒 Files selected for processing (32)
  • .github/workflows/ci.yml
  • .pre-commit-config.yaml
  • CLAUDE.md
  • docs/reference/mcp-handler-contract.md
  • docs/reference/typed-boundaries.md
  • scripts/check_boundary_typed.py
  • scripts/mock_spec_baseline.txt
  • src/synthorg/a2a/rpc_params.py
  • src/synthorg/api/auth/claims.py
  • src/synthorg/api/auth/controller_helpers.py
  • src/synthorg/api/auth/middleware.py
  • src/synthorg/api/auth/service.py
  • src/synthorg/api/boundary.py
  • src/synthorg/api/controllers/settings.py
  • src/synthorg/api/controllers/ws_protocol.py
  • src/synthorg/api/ws_control_models.py
  • src/synthorg/meta/mcp/invoker.py
  • src/synthorg/observability/audit_chain/payloads.py
  • src/synthorg/observability/audit_chain/sink.py
  • src/synthorg/observability/events/audit_chain.py
  • tests/unit/a2a/test_rpc_params_boundary.py
  • tests/unit/api/auth/test_jwt_boundary.py
  • tests/unit/api/auth/test_middleware.py
  • tests/unit/api/auth/test_service.py
  • tests/unit/api/controllers/test_settings_security_boundary.py
  • tests/unit/api/controllers/test_ws.py
  • tests/unit/api/controllers/test_ws_control_boundary.py
  • tests/unit/api/test_boundary.py
  • tests/unit/meta/mcp/test_invoker_boundary.py
  • tests/unit/observability/audit_chain/test_audit_chain.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Lighthouse Site
  • GitHub Check: Dashboard Test
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Every implementation plan must be presented to the user for accept/deny before coding starts. At every phase of planning and implementation, be critical: actively look for ways to improve the design in the spirit of what we're building (robustness, correctness, simplicity, future-proofing where it's free). Surface improvements as suggestions, not silent changes; the user decides.

No from __future__ import annotations; Python 3.14 has PEP 649 native lazy annotations.

Use PEP 758 except syntax: except A, B: (no parens) when not binding to a name; ruff enforces this on 3.14. as exc requires parens (except (A, B) as exc:).

All public functions require type hints; mypy strict mode is enforced.

Docstrings are Google style and required on public classes and functions; ruff D rules enforce this.

Create new objects, never mutate existing ones. Frozen Pydantic models for config/identity; for non-Pydantic registries use copy.deepcopy() at construction + MappingProxyType wrapping; deepcopy at system boundaries (tool execution, provider serialization, persistence).

Separate config and runtime state: frozen models for config/identity; separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves. Never mix static config and mutable runtime fields in one model.

Pydantic v2 conventions: ConfigDict(frozen=True, allow_inf_nan=False) everywhere; extra="forbid" on request DTOs; @computed_field for derived values; NotBlankStr from core.types for identifier / name fields.

Every BaseTool subclass, MCP tool registration, A2A RPC method, and WebSocket event must declare a typed Pydantic args model and be validated before dispatch.

Every entry-point that ingests a dict payload from an external source (MCP handler args, JWT decode, WebSocket control message, audit-chain payload, A2A JSON-RPC params, settings security import) must call parse_typed() from synthorg.api.boundary. The helper accepts either a Pydantic model class or ...

Files:

  • src/synthorg/a2a/rpc_params.py
  • src/synthorg/api/controllers/settings.py
  • tests/unit/api/controllers/test_ws.py
  • src/synthorg/api/boundary.py
  • src/synthorg/meta/mcp/invoker.py
  • src/synthorg/observability/audit_chain/payloads.py
  • src/synthorg/observability/audit_chain/sink.py
  • src/synthorg/observability/events/audit_chain.py
  • tests/unit/api/test_boundary.py
  • tests/unit/a2a/test_rpc_params_boundary.py
  • src/synthorg/api/auth/controller_helpers.py
  • src/synthorg/api/ws_control_models.py
  • src/synthorg/api/auth/claims.py
  • tests/unit/api/auth/test_service.py
  • tests/unit/api/auth/test_jwt_boundary.py
  • tests/unit/api/controllers/test_ws_control_boundary.py
  • tests/unit/api/auth/test_middleware.py
  • scripts/check_boundary_typed.py
  • tests/unit/scripts/test_check_boundary_typed.py
  • src/synthorg/api/auth/middleware.py
  • tests/unit/observability/audit_chain/test_audit_chain.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/meta/mcp/test_invoker_boundary.py
  • src/synthorg/api/auth/service.py
  • tests/unit/api/controllers/test_settings_security_boundary.py
  • src/synthorg/api/controllers/ws_protocol.py
**/*.{py,ts,tsx,js,jsx}

📄 CodeRabbit inference engine (CLAUDE.md)

Regional Defaults (MANDATORY): No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback. Currency: never hardcode ISO 4217 codes or symbols. Backend: DEFAULT_CURRENCY from synthorg.budget.currency or the runtime budget.currency setting. Frontend: DEFAULT_CURRENCY from @/utils/currencies or useSettingsStore().currency. Field naming: no _usd suffix on money fields anywhere. Locale: never hardcode BCP 47 tags or call bare .toLocaleString() / .toLocaleDateString() / .toLocaleTimeString(). Use helpers in @/utils/format. Timezone: store UTC; render via Intl without passing timeZone. Date / number format: always via Intl; no hand-rolled templates. Units: metric only. Spelling: International / British English UI default; document deviations. Monetary models: every cost-bearing Pydantic model carries currency: CurrencyCode; aggregation sites enforce a same-currency invariant. Enforced by scripts/check_web_design_system.py (web edits), scripts/check_backend_regional_defaults.py (backend edits), and scripts/check_forbidden_literals.py (pre-push + CI).

Files:

  • src/synthorg/a2a/rpc_params.py
  • src/synthorg/api/controllers/settings.py
  • tests/unit/api/controllers/test_ws.py
  • src/synthorg/api/boundary.py
  • src/synthorg/meta/mcp/invoker.py
  • src/synthorg/observability/audit_chain/payloads.py
  • src/synthorg/observability/audit_chain/sink.py
  • src/synthorg/observability/events/audit_chain.py
  • tests/unit/api/test_boundary.py
  • tests/unit/a2a/test_rpc_params_boundary.py
  • src/synthorg/api/auth/controller_helpers.py
  • src/synthorg/api/ws_control_models.py
  • src/synthorg/api/auth/claims.py
  • tests/unit/api/auth/test_service.py
  • tests/unit/api/auth/test_jwt_boundary.py
  • tests/unit/api/controllers/test_ws_control_boundary.py
  • tests/unit/api/auth/test_middleware.py
  • scripts/check_boundary_typed.py
  • tests/unit/scripts/test_check_boundary_typed.py
  • src/synthorg/api/auth/middleware.py
  • tests/unit/observability/audit_chain/test_audit_chain.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/meta/mcp/test_invoker_boundary.py
  • src/synthorg/api/auth/service.py
  • tests/unit/api/controllers/test_settings_security_boundary.py
  • src/synthorg/api/controllers/ws_protocol.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Persistence Boundary (MANDATORY): src/synthorg/persistence/ is the only place that may import aiosqlite, sqlite3, psycopg, or psycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Every durable feature MUST define a repository Protocol in persistence/<domain>_protocol.py, concrete impls under persistence/{sqlite,postgres}/, and be exposed on PersistenceBackend. Controllers and API endpoints access persistence through domain-scoped service layers, never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories MUST NOT log mutations themselves. Enforced by scripts/check_persistence_boundary.py (pre-push + CI). Per-line opt-out: # lint-allow: persistence-boundary -- <required justification>.

Configuration Precedence (MANDATORY): For every mutable setting: DB > env (SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved through SettingsService / ConfigResolver. First-cold-read emits one INFO settings.value.resolved carrying source + yaml_path; subsequent reads stay at DEBUG. Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered with read_only_post_init=True for /settings discoverability; SettingsService.set() raises SettingReadOnlyError). Direct os.environ.get(...) reads in application code outside startup are forbidden. New settings register in src/synthorg/settings/definitions/<namespace>.py and are consumed via ConfigResolver.get_*.

Comments explain WHY only, never origin/review/issue context: a code comment answers one question: why is this code shaped this way, that the next reader couldn't infer from the code itself? Forbidden: reviewer citations, in-code issue/PR back-references, cryptic internal-taxonomy shorthand the reader can't decode, migration/rebrand framing, round/iteration narrative, self-evident ...

Files:

  • src/synthorg/a2a/rpc_params.py
  • src/synthorg/api/controllers/settings.py
  • src/synthorg/api/boundary.py
  • src/synthorg/meta/mcp/invoker.py
  • src/synthorg/observability/audit_chain/payloads.py
  • src/synthorg/observability/audit_chain/sink.py
  • src/synthorg/observability/events/audit_chain.py
  • src/synthorg/api/auth/controller_helpers.py
  • src/synthorg/api/ws_control_models.py
  • src/synthorg/api/auth/claims.py
  • src/synthorg/api/auth/middleware.py
  • src/synthorg/api/auth/service.py
  • src/synthorg/api/controllers/ws_protocol.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/a2a/rpc_params.py
  • src/synthorg/api/controllers/settings.py
  • src/synthorg/api/boundary.py
  • src/synthorg/meta/mcp/invoker.py
  • src/synthorg/observability/audit_chain/payloads.py
  • src/synthorg/observability/audit_chain/sink.py
  • src/synthorg/observability/events/audit_chain.py
  • src/synthorg/api/auth/controller_helpers.py
  • src/synthorg/api/ws_control_models.py
  • src/synthorg/api/auth/claims.py
  • src/synthorg/api/auth/middleware.py
  • src/synthorg/api/auth/service.py
  • src/synthorg/api/controllers/ws_protocol.py
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Always read the relevant docs/design/ page before implementing any feature or planning any issue; the design spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why; the user decides whether to proceed or update the spec. Do NOT silently diverge; every deviation needs explicit user approval. When approved deviations occur, update the relevant docs/design/ page to reflect the new reality.

Files:

  • docs/reference/mcp-handler-contract.md
  • docs/reference/typed-boundaries.md
  • CLAUDE.md
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Use D2 (\``d2) for architecture diagrams, nested container layouts, and complex entity relationships; D2 is rendered at build time via mkdocs-d2-plugin (dagre layout) and requires the D2 CLI (pinned to v0.7.1) on PATH locally and in CI. Use Mermaid (```mermaid) for flowcharts, sequence diagrams, simple hierarchies, and pipelines; Mermaid is rendered client-side via pymdownx.superfences. Use Markdown tables for grid/matrix data that is semantically tabular. Never use ```textblocks with ASCII/Unicode box-drawing characters for diagrams. D2 uses theme 200 (Dark Mauve), dark-only render, configured globally inmkdocs.yml. Review agent diagram-syntax-validatorruns in/pre-pr-reviewand/aurelio-review-pr` pipelines.

Files:

  • docs/reference/mcp-handler-contract.md
  • docs/reference/typed-boundaries.md
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Test markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow. Every Mock() / AsyncMock() / MagicMock() MUST declare the interface it stands for via spec=ConcreteClass (Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen in scripts/mock_spec_baseline.txt. Regenerate the baseline only via uv run python scripts/check_mock_spec.py --update. Without spec= the mock silently absorbs every attribute access, so production code that renames or drops a method passes every test.

FakeClock-first: patch time.monotonic() / asyncio.sleep() globals only for legacy code paths that don't have a Clock seam; when the class under test accepts a clock= parameter, always inject FakeClock rather than monkey-patching globals. FakeClock.sleep advances virtual time AND yields once via asyncio.sleep(0) so cancellation on awaiting tasks propagates the same way it does under SystemClock. For tests that need to drive cooperative tasks waiting on the loop, await clock.advance_async(seconds).

Coverage: 80% minimum (enforced in CI; benchmarks are excluded via --ignore=tests/benchmarks/ in coverage runs). Async: asyncio_mode = "auto"; no manual @pytest.mark.asyncio needed. Timeout: 30 seconds per test (global in pyproject.toml; do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed). Parallelism: pytest-xdist via -n 8. ALWAYS include -n 8 when running pytest locally, never run tests sequentially.

Parametrize: Prefer @pytest.mark.parametrize for testing similar cases.

Property-based testing via Hypothesis: CI runs 10 deterministic examples per property test (derandomize=True, no flakes). When Hypothesis finds a failure, it is a real bug: read the shrunk example, fix the underlying bug, and add an explicit @example(...) decorator so the case is permanently covered. Use `HYPOTHE...

Files:

  • tests/unit/api/controllers/test_ws.py
  • tests/unit/api/test_boundary.py
  • tests/unit/a2a/test_rpc_params_boundary.py
  • tests/unit/api/auth/test_service.py
  • tests/unit/api/auth/test_jwt_boundary.py
  • tests/unit/api/controllers/test_ws_control_boundary.py
  • tests/unit/api/auth/test_middleware.py
  • tests/unit/scripts/test_check_boundary_typed.py
  • tests/unit/observability/audit_chain/test_audit_chain.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/meta/mcp/test_invoker_boundary.py
  • tests/unit/api/controllers/test_settings_security_boundary.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/api/controllers/test_ws.py
  • tests/unit/api/test_boundary.py
  • tests/unit/a2a/test_rpc_params_boundary.py
  • tests/unit/api/auth/test_service.py
  • tests/unit/api/auth/test_jwt_boundary.py
  • tests/unit/api/controllers/test_ws_control_boundary.py
  • tests/unit/api/auth/test_middleware.py
  • tests/unit/scripts/test_check_boundary_typed.py
  • tests/unit/observability/audit_chain/test_audit_chain.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/meta/mcp/test_invoker_boundary.py
  • tests/unit/api/controllers/test_settings_security_boundary.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:06:45.470Z
Learning: Test regression (MANDATORY): When tests fail due to timeout, slowness, or xdist resource contention, NEVER delete tests, skip tests, or mark them `xfail` to "fix" slowness. NEVER use `--no-verify` to bypass pre-push hooks. NEVER modify `tests/baselines/unit_timing.json`; baseline updates require explicit user approval (enforced by `scripts/check_no_edit_baseline.sh` PreToolUse hook). FIRST run slow-test analysis, THEN compare against the baseline. IF suite time exceeds `baseline * 1.3`, this is a source code regression, not a test bug. Fix the source code, not the tests. The `pytest_sessionfinish` hook in `tests/conftest.py` will warn loudly if a regression is detected.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:06:45.470Z
Learning: Every planning phase must be accompanied by critical analysis: actively look for ways to improve the design in the spirit of what we're building (robustness, correctness, simplicity, future-proofing where it's free). Surface improvements as suggestions, not silent changes; the user decides.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:06:45.470Z
Learning: Shell Usage (MANDATORY): NEVER use `cd` in Bash commands: the working directory is already set to the project root. Use absolute paths or run commands directly. Do NOT prefix commands with `cd C:/Users/Aurelio/synthorg &&`. Exception: `bash -c "cd <dir> && <cmd>"` is safe (runs in a child process, no cwd side effects). Use this for tools without a `-C` flag. NEVER use Bash to write or modify files: use the Write or Edit tools. Do not use `cat >`, `cat << EOF`, `echo >`, `echo >>`, `sed -i`, `python -c "open(...).write(...)", or `tee` to create or modify files.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:06:45.470Z
Learning: Planning (MANDATORY): Every implementation plan must be presented to the user for accept/deny before coding starts. At every phase of planning and implementation, be critical: actively look for ways to improve the design in the spirit of what we're building (robustness, correctness, simplicity, future-proofing where it's free). Surface improvements as suggestions, not silent changes; the user decides. Prioritize issues by dependency order, not priority labels; unblocked dependencies come first.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:06:45.470Z
Learning: Design Spec (MANDATORY): ALWAYS read the relevant `docs/design/` page before implementing any feature or planning any issue. The design spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec (better approach found, scope evolved, etc.), alert the user and explain why; the user decides whether to proceed or update the spec. Do NOT silently diverge; every deviation needs explicit user approval. When approved deviations occur, update the relevant `docs/design/` page to reflect the new reality.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:06:45.470Z
Learning: Merge strategy: squash merge. PR body becomes the squash commit message on main. Trailers (e.g. `Release-As`, `Closes `#N``) must be in the PR body to land in the final commit. PR issue references: preserve existing `Closes `#NNN`` references; never remove unless explicitly asked.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:06:45.470Z
Learning: Git commits: `<type>: <description>`. Types: feat, fix, refactor, docs, test, chore, perf, ci. Enforced by commitizen (commit-msg hook). Signed commits: required on `main` via branch protection (and on every other ref under the default ruleset, including `cla-signatures`). All commits must be GPG/SSH signed. Exception: **GitHub App-signed commits from the `synthorg-repo-bot`** also satisfy `required_signatures`.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:06:45.470Z
Learning: Branches: `<type>/<slug>` from main. Pre-commit hooks: trailing-whitespace, end-of-file-fixer, check-yaml, check-toml, check-json, check-merge-conflict, check-added-large-files, no-commit-to-branch (main), ruff check+format, gitleaks, hadolint (Dockerfile linting), golangci-lint + go vet (CLI, conditional on `cli/**/*.go`), no-em-dashes, no-redundant-timeout, check-single-migration-per-pr, check-no-modify-migration (block editing existing migrations; bypass with `SYNTHORG_MIGRATION_SQUASH=1`), no-release-please-token, workflow-shell-git-commits.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:06:45.470Z
Learning: Pre-push hooks: mypy type-check (affected modules only) + pytest unit tests (affected modules only) + golangci-lint + go vet + go test (CLI, conditional on `cli/**/*.go`) + eslint-web (web dashboard) + `orphan-fixtures` (opt-in via `SYNTHORG_CHECK_ORPHAN_FIXTURES=1`). Foundational module changes (core, config, observability) or conftest changes trigger full runs.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:06:45.470Z
Learning: Post-Implementation (MANDATORY): After finishing an issue implementation, always create a feature branch (`<type>/<slug>`), commit, and push; do NOT create a PR automatically. Do NOT leave work uncommitted on main; branch, commit, push immediately after finishing.
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:06:45.470Z
Learning: Pre-PR Review (MANDATORY): NEVER create a PR directly; `gh pr create` is blocked by hookify. ALWAYS use `/pre-pr-review` to create PRs; it runs automated checks + review agents + fixes before creating the PR. For trivial/docs-only changes: `/pre-pr-review quick` skips agents but still runs automated checks. After the PR exists, use `/aurelio-review-pr` to handle external reviewer feedback. The `/commit-push-pr` command is effectively blocked (it calls `gh pr create` internally). Fix everything valid, never skip: When review agents find valid issues (including pre-existing issues in surrounding code, suggestions, and findings adjacent to the PR's changes), fix them all. No deferring, no "out of scope" skipping.
🪛 ast-grep (0.42.1)
tests/unit/api/auth/test_jwt_boundary.py

[warning] 256-256: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/

(jwt-python-hardcoded-secret-python)


[warning] 281-281: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/

(jwt-python-hardcoded-secret-python)


[warning] 304-304: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/

(jwt-python-hardcoded-secret-python)


[warning] 327-327: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/

(jwt-python-hardcoded-secret-python)

🔇 Additional comments (20)
scripts/mock_spec_baseline.txt (1)

250-250: Baseline coordinate realignment looks correct.

This is a clean line-number sync for existing suppressions; I don’t see any net widening of the baseline scope.

Also applies to: 397-401

src/synthorg/observability/events/audit_chain.py (1)

15-24: Good addition of a dedicated validation-failure event.

Splitting this from generic emit failures gives clearer observability for dropped (schema-invalid) audit events.

src/synthorg/api/boundary.py (1)

49-69: parse_typed overload expansion is solid.

The TypeAdapter branch is correctly integrated while preserving the existing structured/redacted validation-failure logging and re-raise behavior.

Also applies to: 116-134

src/synthorg/api/controllers/settings.py (1)

668-672: Boundary migration for security import is correct.

Nice move to the shared parse_typed("settings.security", ...) path while keeping the existing domain-error translation flow.

src/synthorg/a2a/rpc_params.py (1)

125-125: A2A boundary call-site update is clean.

Routing the union adapter through parse_typed("a2a.jsonrpc", ...) is the right integration point.

src/synthorg/meta/mcp/invoker.py (1)

171-179: MCP typed-boundary integration looks correct.

Good update: malformed args now go through canonical boundary validation/logging while preserving the existing invalid_argument envelope behavior.

Also applies to: 180-204

src/synthorg/observability/audit_chain/payloads.py (1)

48-61: AuditChainEventPayload schema is well-formed and appropriately strict.

Frozen config + extra="forbid" + NotBlankStr on identifier fields is a good boundary contract for this path.

src/synthorg/observability/audit_chain/sink.py (2)

323-335: Validation placement in emit() is correct.

Great call validating immediately before serialization/signing while keeping the original dict for byte-stable chain input.


380-401: Dedicated ValidationError handling is a strong improvement.

This cleanly distinguishes schema-rejected events from timeout/signing failures and keeps the callback/error signaling consistent.

tests/unit/observability/audit_chain/test_audit_chain.py (1)

239-239: LGTM!

The pathname=__file__ changes improve test determinism. Per the context snippet from sink.py, pathname is never extracted into the audit-chain payload (only event, level, timestamp, module and whitelisted extras), so this is purely a LogRecord construction hygiene fix with no functional impact on the audit chain validation path.

Also applies to: 260-260, 282-282, 371-371, 426-426

src/synthorg/api/auth/claims.py (1)

1-166: LGTM!

The JwtClaims model is well-designed with proper invariant enforcement:

  • ConfigDict(frozen=True, allow_inf_nan=False, extra="forbid") follows project conventions
  • _coerce_numeric_date correctly rejects naive datetimes (addresses past review concern)
  • _validate_iat_before_exp ensures temporal ordering
  • _validate_token_class_shape enforces user/system mutual exclusion and rejects HumanRole.SYSTEM (addresses past review concern)

The validators provide clear, actionable error messages for operators debugging malformed tokens.

src/synthorg/api/auth/service.py (2)

219-236: LGTM!

Building the token through JwtClaims then serializing via model_dump(mode="json") ensures the encode path uses the same typed contract as decode. The mode="json" argument correctly serializes HumanRole enum values to their string form for the JWT payload.


238-284: LGTM!

The decode_token method correctly routes through parse_typed("jwt", raw_claims, JwtClaims) at the boundary. The hardcoded "jwt" label satisfies the LiteralString requirement, and the docstring accurately documents the new ValidationError raise condition. As per coding guidelines, every entry-point that ingests a dict payload from an external source must call parse_typed().

src/synthorg/api/auth/middleware.py (2)

224-235: LGTM!

The ValidationError handler correctly catches claim validation failures from parse_typed(). The comment at lines 225-227 explains the dual-logging rationale: parse_typed emits api.boundary.validation_failed for operator filtering by boundary, while this handler adds SECURITY_AUTH_FAILED with reason="jwt_claims_malformed" for auth-specific searches.


261-353: LGTM!

_resolve_jwt_user now accepts JwtClaims and all claim access uses typed attributes (claims.sub, claims.iss, claims.aud, claims.pwd_sig, claims.jti). The migration from dict-style to attribute access ensures type safety and eliminates silent None from missing keys.

src/synthorg/api/auth/controller_helpers.py (1)

229-253: LGTM!

The extract_jti function correctly handles ValidationError from claim parsing with DEBUG-level logging (appropriate since this is a helper function, not a security-critical path). The typed JwtClaims annotation and claims.jti attribute access complete the migration from dict-style access.

tests/unit/api/auth/test_service.py (2)

89-98: LGTM!

Test assertions correctly migrated to attribute access. The claims.role is not None guard before claims.role.value is appropriate since decode could theoretically return a system token shape (though this specific test creates a user token).


176-190: LGTM!

The forged payloads now include must_change_password and pwd_sig fields to satisfy JwtClaims._validate_token_class_shape's mutual-exclusion invariant. Without these, the tokens would fail validation as partial user tokens rather than testing the issuer/audience verification behavior.

Also applies to: 199-215

tests/unit/api/auth/test_middleware.py (1)

662-715: LGTM!

Excellent security test addition. The test_system_user_jwt_with_stray_pwd_sig_rejected correctly validates that JwtClaims._validate_token_class_shape rejects tokens with partial user-field sets. The stray pwd_sig without username, role, and must_change_password triggers the mutual-exclusion invariant, returning 401 instead of allowing a malformed token through. The detailed docstring clearly explains the security rationale.

src/synthorg/api/ws_control_models.py (1)

1-115: LGTM!

Well-structured WebSocket control message contract:

  • All variants use ConfigDict(frozen=True, allow_inf_nan=False, extra="forbid") per project conventions
  • Discriminator("action") enables TypeAdapter to route to the correct variant
  • NotBlankStr for ticket and channels prevents empty-string identifiers
  • filters: dict[str, str] | None correctly preserves legacy semantics (absent = keep, empty = clear)
  • The WS_CONTROL_MESSAGE_ADAPTER export enables parse_typed("ws.control", payload, WS_CONTROL_MESSAGE_ADAPTER) at the boundary

Comment thread .pre-commit-config.yaml
Comment thread docs/reference/typed-boundaries.md
Comment thread scripts/check_boundary_typed.py Outdated
Aureliolo added 10 commits May 3, 2026 10:33
The boundary helper now accepts both Pydantic model classes and
TypeAdapter instances, so discriminated-union boundaries (A2A
JSON-RPC params) can route through the same logging path as
single-shape boundaries (JWT, settings).

JwtClaims models the canonical claim set for both user and system
tokens; user-only fields (username, role, must_change_password,
pwd_sig) are optional so the same model serves both roles. iat is
coerced from datetime on the encode side and from int on the decode
side so the model is symmetric across both directions of the JWT
boundary.

Refs #1711.
create_token builds a JwtClaims model before encoding, and
decode_token returns a JwtClaims instance via parse_typed("jwt", ...)
so the boundary helper rejects malformed claim sets with a structured
api.boundary.validation_failed log instead of letting them slip
through to attribute access.

Middleware and controller-helpers now access claims by attribute
(claims.sub, claims.jti, claims.iss, claims.aud, claims.pwd_sig)
instead of dict.get(...). The auth-failure surface gains a
jwt_claims_malformed reason for the new ValidationError catch path,
which sits alongside the existing jwt_invalid path so operators
searching SECURITY_AUTH_FAILED still see the auth-specific failure
trail.

Refs #1711.
import_security_config now validates the inbound dict via
parse_typed("settings.security", data.config, SecurityConfig) so a
malformed payload emits api.boundary.validation_failed at warning
before the controller's existing DomainValidationError translation
returns the 422.

The export side is unchanged: it already round-trips through
SecurityConfig.model_dump and never accepts external dict input.

Refs #1711.
The discriminated A2ARpcParams union now flows through the canonical
boundary helper so a malformed params payload emits the same
api.boundary.validation_failed warning as every other typed entry
point. The wire shape is unchanged: the gateway still maps the
re-raised ValidationError to JSON-RPC -32602 Invalid params.

Refs #1711.
The four inbound control messages (auth, subscribe, unsubscribe,
ping) now share a discriminated WsControlMessage union mirroring the
typed contract in web/src/api/types/websocket.ts. handle_message
parses each frame through parse_typed("ws.control", ...) and
dispatches on the typed variant; the legacy _validate_ws_fields
field-pluck path is gone.

Behaviour change: a malformed control frame (unknown action, extra
key, non-str-to-str filters) now emits api.boundary.validation_failed
at the boundary helper and the connection receives the generic
"Invalid control message" envelope instead of the legacy per-error
strings. The ws-specific api.ws.invalid_message log fires alongside
so the existing operator search trail stays intact.

The mock-spec baseline shifts by +5 lines for test_ws.py because the
test_unknown_action body grew an explanatory comment about the new
typed-rejection behaviour.

Refs #1711.
The new AuditChainEventPayload model gates the dict assembled in
AuditChainSink.emit() before signing. parse_typed inspects the dict
in place and never replaces it, so the existing
json.dumps(payload, sort_keys=True, ensure_ascii=True, default=str)
call still serialises the same bytes. The hash chain is byte-stable
across the migration.

The byte-stability is pinned by test_golden_json_byte_stable, which
compares the json.dumps output against a hard-coded golden string,
and test_golden_hash_matches, which pins the SHA-256 of the same
bytes. These will fail loudly if a future change reorders fields,
swaps the encoder, or alters default-value handling. Regenerating
either constant requires explicit reviewer sign-off because a
chain-hash change invalidates every previously-signed audit entry.

The MCP invoker also routes its existing args_model validation
through parse_typed so a malformed tool-call payload emits the same
api.boundary.validation_failed warning. The wire response is
unchanged: parse_typed re-raises ValidationError and the invoker
still translates that into the ArgumentValidationError envelope.

Refs #1711.
scripts/check_boundary_typed.py walks the six registered API
boundaries and confirms each one still routes through
synthorg.api.boundary.parse_typed. A regression at any of these
surfaces (auth/service.py decode_token, settings.py
import_security_config, ws_protocol.py handle_message,
audit_chain/sink.py emit, a2a/rpc_params.py parse_rpc_params,
mcp/invoker.py invoke) re-introduces the dict-based contract this
RFC removed and breaks the boundary log uniformity.

Per-line opt-out via the marker 'lint-allow: boundary-typed' on the
function def line is honoured; bypassing the gate requires a
human-readable justification on the source line itself.

Wired into pre-commit pre-push and the CI Lint job; skipped on
pre-commit.ci because the dedicated CI Lint runner already covers it.

Refs #1711.
Adds docs/reference/typed-boundaries.md documenting the parse_typed
helper, the six registered API boundaries (jwt, settings.security,
ws.control, audit_chain, a2a.jsonrpc, mcp.tool), per-boundary notes,
the lint guard, and the recipe for adding a new boundary.

CLAUDE.md typed-boundary bullet refreshed to call out the
TypeAdapter overload, the LiteralString constraint, and the Phase 3
gate.

mcp-handler-contract.md notes that the args_model validation flows
through parse_typed for cross-boundary log uniformity.

Refs #1711.
Adds explicit annotations on the test fixtures created by Commits 2,
7, and 8 of RFC 1711 so mypy strict passes on the new test surface.

No behaviour change.

Refs #1711.
Eight items from the 11-agent pre-PR review consolidated into one
follow-up. Highest-severity items first.

CRITICAL (silent-failure-hunter):
* audit_chain/sink.py: ValidationError from parse_typed used to fall
  through to the generic except Exception handler that logs with
  exc_info=True (whose traceback may carry signer / TSA frame-locals
  on a different code path). Add an explicit except ValidationError
  branch that emits the new AUDIT_CHAIN_EMIT_VALIDATION_FAILED event
  with structured detail (no exc_info), invokes the append callback
  with status=error, and never reaches the generic handler. New
  regression test test_validation_failure_routes_to_explicit_branch
  asserts the explicit-branch event fires and the generic
  emit_error event does NOT.

MAJOR (silent-failure-hunter):
* ws_protocol.py: bind ValidationError as exc and surface
  error_type plus safe_error_description on the
  api.ws.invalid_message log so operators triaging
  malformed-by-client vs malformed-by-attacker vs
  client-version-skew get the per-frame detail without diving into
  the boundary helper log.

MEDIUM:
* .pre-commit-config.yaml: add boundary-typed to the
  scoped-gates rationale comment so the skip-list entry is
  documented (infra-reviewer).
* ws_control_models.py: tighten channels: tuple[str, ...] to
  channels: tuple[NotBlankStr, ...] on subscribe and unsubscribe so
  the wire identifier discipline matches the rest of the codebase
  (python-reviewer).
* audit_chain/sink.py: add a defensive comment alongside the
  parse_typed call documenting that the dict is NOT mutated and
  that rewriting as 'payload = parse_typed(...)' would silently
  break byte-stable hashing (code-reviewer).
* test_jwt_boundary.py: add test_iat_int_passthrough_unchanged
  covering the identity branch of the _coerce_numeric_date
  validator (test-analyzer).

MINOR:
* test_boundary.py: add test_log_no_truncation_at_exactly_five_errors
  pinning the exact boundary case for _MAX_LOGGED_LOCATIONS=5
  (test-analyzer).

Items deferred by design (architectural / forward-compat
trade-offs): JwtClaims discriminated-union refactor,
AuditChainEventPayload event-name registry, JWT double-logging
documentation. Items rejected as false positives: the PEP 758
except syntax claim (Python 3.14 native, ruff-clean), three
pre-existing settings/sink lines outside this PR's scope.

Refs #1711.
Aureliolo added 4 commits May 3, 2026 10:33
External reviewer feedback on PR #1720, all valid, all fixed in this
round per the babysit-pr completeness mandate.

gemini (1): claims.py:106 -- surface iat/exp values in the
iat-must-be-less-than-exp error message; both are public epoch
timestamps so the operator gains debugging context without leaking
sensitive data.

coderabbit (6):
* CLAUDE.md:138 rename 'settings security export' to 'settings
  security import' so the helper bullet matches the actual
  inbound-validation entry point.
* docs/reference/typed-boundaries.md:42 replace audit_chain.emit_error
  with audit_chain.emit_validation_failed in the per-boundary
  error-envelope reference now that the audit-chain ValidationError
  path has its own dedicated event.
* scripts/check_boundary_typed.py _function_node restricts the search
  to module-level FunctionDef and direct class-body methods, raises
  on ambiguity. ast.walk would otherwise return a nested helper or
  shadowed method that bypasses the gate.
* scripts/check_boundary_typed.py _calls_parse_typed verifies the
  first positional arg equals the registered boundary label. A stray
  parse_typed('jwt', ...) inside the WS handler would otherwise
  green-light the ws.control registration.
* src/synthorg/observability/audit_chain/payloads.py tightens
  identifier and name fields (event, level, module, tool_name,
  expected_hash, actual_hash, correlation_id, principal, resource,
  action_type) from str to NotBlankStr or NotBlankStr | None per
  CLAUDE.md convention. Blank values can no longer be signed into
  the chain.
* tests/unit/observability/test_audit_chain_boundary.py aligns the
  inline comment with the real failure-injection mechanism
  (record.levelname = 42 mutates a LogRecord field; we never patch
  parse_typed or AuditChainEventPayload).

Three new regression tests cover the tightened lint guard:
test_wrong_boundary_label_rejected, test_nested_helper_with_same_name_does_not_satisfy_gate,
test_ambiguous_function_definition_raises.

Refs #1711.
CodeRabbit posted a follow-up CHANGES_REQUESTED review on the round-1 push plus 5 tests in test_audit_chain.py started failing under the round-1 NotBlankStr tightening. All eight items fixed.

CodeRabbit (3):
- scripts/check_boundary_typed.py _calls_parse_typed restricts traversal to the boundary function's own scope (skips nested FunctionDef / AsyncFunctionDef / ClassDef / Lambda) so parse_typed calls buried in nested helpers cannot satisfy the outer contract.
- src/synthorg/api/auth/claims.py JwtClaims enforces a mutual-exclusion invariant on the four user-only fields; reject HumanRole.SYSTEM at this surface; tighten pwd_sig to NotBlankStr | None.
- tests/unit/observability/test_audit_chain_boundary.py pins _GOLDEN_HASH as a hard-coded SHA-256 hex literal instead of recomputing it.

CI fixtures (5):
- tests/unit/observability/audit_chain/test_audit_chain.py LogRecord constructions switched to pathname=__file__ so the derived module attr is non-empty.

Behaviour change (user-approved):
- test_system_user_jwt_with_pwd_sig_still_authenticates renamed to test_system_user_jwt_with_stray_pwd_sig_rejected; system tokens with a stray pwd_sig are now rejected at decode (401).

Mock-spec baseline shifted by +7 lines for test_middleware.py because the renamed test grew an explanatory docstring.

Refs #1711.
CodeRabbit's third review on head 7b8f255 found two issues; both valid, both fixed.

src/synthorg/api/auth/claims.py (Major) -- _coerce_numeric_date now rejects naive datetime. Naive .timestamp() interprets the value through the host local timezone, so the same JwtClaims construction yields different epoch seconds on UTC vs PST hosts (8-hour skew). Across an auth boundary that silently breaks token lifetime semantics. Aware datetimes still coerce to int normally.

scripts/check_boundary_typed.py (Minor) -- main() now catches ValueError raised by _function_node's ambiguity guard and exits with code 2 plus a stderr line, matching the documented exit-code matrix (0/1/2). Previously the ValueError escaped as a traceback, breaking the script's documented failure mode.

New regression tests:
test_naive_datetime_iat_rejected, test_naive_datetime_exp_rejected,
test_main_translates_value_error_to_exit_2.

Refs #1711.
CodeRabbit's fourth review on rebased head 4dcca82 flagged four issues; all valid, all fixed.

scripts/check_boundary_typed.py (Major) -- _calls_parse_typed used to accept any bare Name 'parse_typed' regardless of binding. A local def parse_typed inside a boundary function would have green-lit the gate even though the canonical helper was never invoked, and a qualified boundary.parse_typed call would have falsely failed. Three new helpers resolve the binding properly: _build_import_map walks Import/ImportFrom and maps locals to FQNs; _resolves_to_parse_typed checks callee shape against that map; _function_shadows_parse_typed scans the boundary body for local rebindings of parse_typed and short-circuits the resolver to a rejection.

docs/reference/typed-boundaries.md (Minor) -- the helper-API summary said WebSocket close code 1008 but the per-boundary section correctly documents the generic Invalid-control-message envelope on the still-open socket. Aligned the summary line with the WS section so both descriptions agree with handle_message.

.pre-commit-config.yaml (Trivial) -- the boundary-typed hook files: regex used to duplicate the script's _REGISTERED_BOUNDARIES tuple. A future boundary added only to the Python registry would silently stop running pre-push because the regex was forgotten. Broadened to ^(src/synthorg/.*\.py|scripts/check_boundary_typed\.py|\.pre-commit-config\.yaml)$ so the script stays the single source of truth.

tests/unit/api/controllers/test_ws.py (Minor, outside-diff-range) -- test_unknown_action now asserts the typed-discriminator rejection path. The dispatch-stage Unknown-action fallback (a valid auth control frame received post-handshake) had no coverage. Added test_post_handshake_auth_message_falls_through_to_unknown_action to lock in that branch.

False positive logged: CodeRabbit pre-merge Docstring Coverage warning reports 23.08% but local interrogate run on the four new src/script files reports 100%. The CodeRabbit metric appears to count test-file functions, which the project explicitly excludes from docstring requirements per its CodeRabbit configuration. No action needed.

Mock-spec baseline shifted by +22 lines for test_ws.py because the new fallback test grew the file.

Refs #1711.
@Aureliolo Aureliolo force-pushed the feat/stringly-typed-boundaries branch from 4dcca82 to e221c6d Compare May 3, 2026 08:36
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 3, 2026 08:38 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/observability/test_audit_chain_boundary.py`:
- Around line 218-221: The assignment record.extra_unknown_field = "boom" in the
test is dead/setup noise and should be removed because the sink only copies a
closed set of keys (see the iteration in the audit chain sink that selects
tool_name, expected_hash, etc.), so that attribute is never read; delete the
record.extra_unknown_field = "boom" line and leave the actual failure injection
(record.levelname = 42) intact to keep the test behavior unchanged.

In `@tests/unit/scripts/test_check_boundary_typed.py`:
- Around line 36-47: Add a module-level autouse teardown to remove the empty
fixture directory created by _plant_fixture: implement a pytest fixture named
cleanup_fixture_dir (scope="module", autouse=True) that yields to run tests and
after tests checks the directory referenced by _REPO_ROOT / "src" / "synthorg" /
"_lint_boundary_fixtures" and rmdir()s it only if it exists and is empty; ensure
this fixture is in the same test module or conftest so the directory is cleaned
up without changing _plant_fixture or per-test unlink calls.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: bb5409bd-5366-47b3-a85e-13fa1cf9d564

📥 Commits

Reviewing files that changed from the base of the PR and between 4dcca82 and e221c6d.

📒 Files selected for processing (32)
  • .github/workflows/ci.yml
  • .pre-commit-config.yaml
  • CLAUDE.md
  • docs/reference/mcp-handler-contract.md
  • docs/reference/typed-boundaries.md
  • scripts/check_boundary_typed.py
  • scripts/mock_spec_baseline.txt
  • src/synthorg/a2a/rpc_params.py
  • src/synthorg/api/auth/claims.py
  • src/synthorg/api/auth/controller_helpers.py
  • src/synthorg/api/auth/middleware.py
  • src/synthorg/api/auth/service.py
  • src/synthorg/api/boundary.py
  • src/synthorg/api/controllers/settings.py
  • src/synthorg/api/controllers/ws_protocol.py
  • src/synthorg/api/ws_control_models.py
  • src/synthorg/meta/mcp/invoker.py
  • src/synthorg/observability/audit_chain/payloads.py
  • src/synthorg/observability/audit_chain/sink.py
  • src/synthorg/observability/events/audit_chain.py
  • tests/unit/a2a/test_rpc_params_boundary.py
  • tests/unit/api/auth/test_jwt_boundary.py
  • tests/unit/api/auth/test_middleware.py
  • tests/unit/api/auth/test_service.py
  • tests/unit/api/controllers/test_settings_security_boundary.py
  • tests/unit/api/controllers/test_ws.py
  • tests/unit/api/controllers/test_ws_control_boundary.py
  • tests/unit/api/test_boundary.py
  • tests/unit/meta/mcp/test_invoker_boundary.py
  • tests/unit/observability/audit_chain/test_audit_chain.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Build Backend
  • GitHub Check: Dashboard Lint
  • GitHub Check: Dashboard Build
  • GitHub Check: Lint
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Lighthouse Site
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Socket Security: Pull Request Alerts
  • GitHub Check: Build Preview
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{py,pyi}

📄 CodeRabbit inference engine (CLAUDE.md)

All public functions and classes must have type hints and Google-style docstrings; mypy strict mode enforced

Files:

  • src/synthorg/a2a/rpc_params.py
  • src/synthorg/meta/mcp/invoker.py
  • src/synthorg/api/controllers/settings.py
  • src/synthorg/api/ws_control_models.py
  • src/synthorg/observability/audit_chain/sink.py
  • tests/unit/api/auth/test_middleware.py
  • tests/unit/api/test_boundary.py
  • src/synthorg/api/auth/claims.py
  • src/synthorg/api/boundary.py
  • tests/unit/observability/audit_chain/test_audit_chain.py
  • tests/unit/a2a/test_rpc_params_boundary.py
  • tests/unit/api/controllers/test_ws_control_boundary.py
  • tests/unit/api/controllers/test_settings_security_boundary.py
  • src/synthorg/api/auth/controller_helpers.py
  • tests/unit/meta/mcp/test_invoker_boundary.py
  • tests/unit/api/controllers/test_ws.py
  • src/synthorg/observability/events/audit_chain.py
  • src/synthorg/api/auth/middleware.py
  • tests/unit/api/auth/test_jwt_boundary.py
  • src/synthorg/api/auth/service.py
  • tests/unit/scripts/test_check_boundary_typed.py
  • src/synthorg/api/controllers/ws_protocol.py
  • scripts/check_boundary_typed.py
  • tests/unit/api/auth/test_service.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • src/synthorg/observability/audit_chain/payloads.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Every business-logic module must import logger via from synthorg.observability import get_logger then assign logger = get_logger(__name__)

Files:

  • src/synthorg/a2a/rpc_params.py
  • src/synthorg/meta/mcp/invoker.py
  • src/synthorg/api/controllers/settings.py
  • src/synthorg/api/ws_control_models.py
  • src/synthorg/observability/audit_chain/sink.py
  • tests/unit/api/auth/test_middleware.py
  • tests/unit/api/test_boundary.py
  • src/synthorg/api/auth/claims.py
  • src/synthorg/api/boundary.py
  • tests/unit/observability/audit_chain/test_audit_chain.py
  • tests/unit/a2a/test_rpc_params_boundary.py
  • tests/unit/api/controllers/test_ws_control_boundary.py
  • tests/unit/api/controllers/test_settings_security_boundary.py
  • src/synthorg/api/auth/controller_helpers.py
  • tests/unit/meta/mcp/test_invoker_boundary.py
  • tests/unit/api/controllers/test_ws.py
  • src/synthorg/observability/events/audit_chain.py
  • src/synthorg/api/auth/middleware.py
  • tests/unit/api/auth/test_jwt_boundary.py
  • src/synthorg/api/auth/service.py
  • tests/unit/scripts/test_check_boundary_typed.py
  • src/synthorg/api/controllers/ws_protocol.py
  • scripts/check_boundary_typed.py
  • tests/unit/api/auth/test_service.py
  • tests/unit/observability/test_audit_chain_boundary.py
  • src/synthorg/observability/audit_chain/payloads.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Never use import logging or logging.getLogger() or print() in application code (exception: observability setup modules)

Use logger.info(EVENT, key=value) with structured kwargs and import event names from synthorg.observability.events.<domain>; never use string literals or old-style %s formatting

Every status enum transition must log INFO with domain-scoped *_STATUS_TRANSITIONED event carrying from_status, to_status, and domain id, AFTER persistence write succeeds

All error paths must log at WARNING or ERROR with context before raising; never call logger with error=str(exc) — use error_type=type(exc).__name__ and error=safe_error_description(exc) from synthorg.observability

Controllers and API endpoints must access persistence through domain-scoped service layers (e.g., ArtifactService, WorkflowService, MemoryService), never directly into repositories

Wrap attacker-controllable strings at LLM call sites via wrap_untrusted() from synthorg.engine.prompt_safety; append untrusted_content_directive(tags) to the enclosing system prompt (SEC-1)

Never call lxml.html.fromstring on attacker input; use HTMLParseGuard from synthorg.tools.html_parse_guard (SEC-1)

Every Pydantic model must have ConfigDict(frozen=True, allow_inf_nan=False) unless documented otherwise; mutations use model_copy(update=...), never direct assignment

Config and identity models must be frozen; separate mutable models for runtime state that evolves; never mix static config and mutable runtime fields

Request DTOs and args models must use extra='forbid' to prevent unknown field injection

Use NotBlankStr from core.types for identifier and name fields in Pydantic models

Every BaseTool subclass, MCP tool registration, A2A RPC method, and WebSocket event must declare a typed Pydantic args model and validate before dispatch

Every entry-point ingesting a dict payload from external sources (MCP handler args, JWT decode, WebSocket control message,...

Files:

  • src/synthorg/a2a/rpc_params.py
  • src/synthorg/meta/mcp/invoker.py
  • src/synthorg/api/controllers/settings.py
  • src/synthorg/api/ws_control_models.py
  • src/synthorg/observability/audit_chain/sink.py
  • src/synthorg/api/auth/claims.py
  • src/synthorg/api/boundary.py
  • src/synthorg/api/auth/controller_helpers.py
  • src/synthorg/observability/events/audit_chain.py
  • src/synthorg/api/auth/middleware.py
  • src/synthorg/api/auth/service.py
  • src/synthorg/api/controllers/ws_protocol.py
  • src/synthorg/observability/audit_chain/payloads.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/a2a/rpc_params.py
  • src/synthorg/meta/mcp/invoker.py
  • src/synthorg/api/controllers/settings.py
  • src/synthorg/api/ws_control_models.py
  • src/synthorg/observability/audit_chain/sink.py
  • src/synthorg/api/auth/claims.py
  • src/synthorg/api/boundary.py
  • src/synthorg/api/auth/controller_helpers.py
  • src/synthorg/observability/events/audit_chain.py
  • src/synthorg/api/auth/middleware.py
  • src/synthorg/api/auth/service.py
  • src/synthorg/api/controllers/ws_protocol.py
  • src/synthorg/observability/audit_chain/payloads.py
src/synthorg/meta/mcp/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

MCP handlers must: define ToolHandler in src/synthorg/meta/mcp/handlers/<domain>.py, declare args_model for pre-dispatch validation (#1611), call require_destructive_guardrails(arguments, actor) on admin_tool, route through service-layer facades (never app_state.persistence.* directly), emit three log paths via common_logging helpers

Files:

  • src/synthorg/meta/mcp/invoker.py
docs/**/*.{md,d2,mermaid}

📄 CodeRabbit inference engine (CLAUDE.md)

Use D2 (d2) for architecture diagrams, nested container layouts, complex entity relationships; use Mermaid (mermaid) for flowcharts, sequence diagrams, simple hierarchies, pipelines; use Markdown tables for semantically tabular grid/matrix data; never use ASCII/Unicode box-drawing in text blocks; D2 uses theme 200 (Dark Mauve) configured in mkdocs.yml

Files:

  • docs/reference/mcp-handler-contract.md
  • docs/reference/typed-boundaries.md
docs/reference/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Reference pages document implementation protocols, security patterns, telemetry rules, lifecycle sync, persistence boundaries, conventions, configuration precedence, regional defaults, and error codes; load on demand per CLAUDE.md table of contents

Files:

  • docs/reference/mcp-handler-contract.md
  • docs/reference/typed-boundaries.md
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Every Mock() / AsyncMock() / MagicMock() must declare the interface via spec=ConcreteClass (Protocol or class); bare-call sites blocked by pre-commit gate scripts/check_mock_spec.py

Use mock_dispatcher from tests/conftest.py (an AsyncMock(spec=NotificationDispatcher)) instead of building the mock inline

For time-driven tests, import FakeClock from tests._shared.fake_clock and inject it into the class under test; use await clock.advance_async(seconds) for cooperative task driving; FakeClock-first: patch time.monotonic() / asyncio.sleep() globals only for legacy code paths without a Clock seam

Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow; use @pytest.mark.parametrize for similar cases; timeout is 30s globally (no per-file overrides except non-default values like timeout(60))

Always run pytest with -n 8 for parallel execution via pytest-xdist (distribution --dist=loadfile); never run tests sequentially locally; property tests use HYPOTHESIS_PROFILE=dev for 1000 examples or HYPOTHESIS_PROFILE=fuzz for 10,000 examples with no deadline

The affected-tests pre-push runner runs the affected subset twice via pytest-repeat (--count 2 -x) to catch module-level state leaks; opt out via SYNTHORG_SKIP_ISOLATION_GATE=1 for emergency pushes only

Never use monkeypatch.setattr(module.logger, 'info', spy) on BoundLoggerLazyProxy (it caches the bound method); use context manager wrapping direct setattr + try/finally del proxy.<level> — see _logger_info_spy in tests/unit/settings/test_service.py

Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, tests, or config examples; use generic names: test-provider, test-small-001, test-medium-001, test-large-001, small/medium/large aliases

Files:

  • tests/unit/api/auth/test_middleware.py
  • tests/unit/api/test_boundary.py
  • tests/unit/observability/audit_chain/test_audit_chain.py
  • tests/unit/a2a/test_rpc_params_boundary.py
  • tests/unit/api/controllers/test_ws_control_boundary.py
  • tests/unit/api/controllers/test_settings_security_boundary.py
  • tests/unit/meta/mcp/test_invoker_boundary.py
  • tests/unit/api/controllers/test_ws.py
  • tests/unit/api/auth/test_jwt_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
  • tests/unit/api/auth/test_service.py
  • tests/unit/observability/test_audit_chain_boundary.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/api/auth/test_middleware.py
  • tests/unit/api/test_boundary.py
  • tests/unit/observability/audit_chain/test_audit_chain.py
  • tests/unit/a2a/test_rpc_params_boundary.py
  • tests/unit/api/controllers/test_ws_control_boundary.py
  • tests/unit/api/controllers/test_settings_security_boundary.py
  • tests/unit/meta/mcp/test_invoker_boundary.py
  • tests/unit/api/controllers/test_ws.py
  • tests/unit/api/auth/test_jwt_boundary.py
  • tests/unit/scripts/test_check_boundary_typed.py
  • tests/unit/api/auth/test_service.py
  • tests/unit/observability/test_audit_chain_boundary.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:37:48.646Z
Learning: Every implementation plan must be presented to the user for accept/deny before coding starts; be critical and look for design improvements in the spirit of the project (robustness, correctness, simplicity); surface improvements as suggestions, not silent changes (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:37:48.646Z
Learning: If implementation deviates from the spec (better approach found, scope evolved, etc.), alert the user and explain why; the user decides whether to proceed or update the spec; do NOT silently diverge (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:37:48.646Z
Learning: When approved deviations occur, update the relevant `docs/design/` page to reflect the new reality (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:37:48.646Z
Learning: Prioritize issues by dependency order, not priority labels; unblocked dependencies come first (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:37:48.646Z
Learning: Opt-in telemetry is off by default; every event property must be explicitly listed in `_ALLOWED_PROPERTIES` keyed by event type; unknown keys raise `PrivacyViolationError` and are dropped; never bypass the scrubber
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:37:48.646Z
Learning: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceed `api.ws_frame_timeout_seconds` (default 30s) without sending a frame; wraps `socket.receive_text()` in `asyncio.wait_for(...)`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:37:48.646Z
Learning: WebSocket revalidation sliding window: persistence-backend failures during periodic revalidation are tracked via `_SlidingWindowRateLimiter` (`api.ws_revalidation_window_seconds` default 60s, `api.ws_revalidation_max_failures` default 5); once saturated, socket closes with code 4011
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:37:48.646Z
Learning: When tests fail due to timeout, slowness, or xdist resource contention: NEVER delete/skip tests or mark xfail; identify slow tests via `--durations=50`; compare against `tests/baselines/unit_timing.json`; if suite exceeds baseline * 1.3, fix source code regression, not tests (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:37:48.646Z
Learning: Minimum test coverage is 80% (enforced in CI; benchmarks excluded); use `uv run python -m pytest tests/ -n 8 --ignore=tests/benchmarks/ --cov=synthorg --cov-fail-under=80` to verify locally
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:37:48.646Z
Learning: Git commits follow `<type>: <description>` format (types: feat, fix, refactor, docs, test, chore, perf, ci); enforced by commitizen commit-msg hook; all commits must be GPG/SSH signed or App-signed via synthorg-repo-bot
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:37:48.646Z
Learning: Branches follow `<type>/<slug>` naming from main; never commit to main directly; after finishing issue implementation, create a feature branch, commit, push immediately (do NOT leave work uncommitted) (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:37:48.646Z
Learning: NEVER create a PR directly via `gh pr create` (blocked by hookify); ALWAYS use `/pre-pr-review` to create PRs; it runs automated checks + review agents + fixes before creating the PR; for trivial/docs-only changes use `/pre-pr-review quick`
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:37:48.646Z
Learning: Fix everything valid that review agents find (including pre-existing issues in surrounding code, suggestions, and adjacent findings); never skip or defer with 'out of scope' reasoning
Learnt from: CR
Repo: Aureliolo/synthorg

Timestamp: 2026-05-03T08:37:48.646Z
Learning: No default may privilege a single region, currency, or locale; user/company/browser/system fallback applies everywhere (MANDATORY)
🪛 ast-grep (0.42.1)
tests/unit/api/auth/test_jwt_boundary.py

[warning] 256-256: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/

(jwt-python-hardcoded-secret-python)


[warning] 281-281: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/

(jwt-python-hardcoded-secret-python)


[warning] 304-304: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/

(jwt-python-hardcoded-secret-python)


[warning] 327-327: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/

(jwt-python-hardcoded-secret-python)

🔇 Additional comments (61)
scripts/mock_spec_baseline.txt (1)

250-250: Baseline coordinate refresh is consistent with moved call sites.

The updated suppression coordinates match the corresponding relocated bare-mock usages, so the baseline remains internally coherent.

Also applies to: 397-401

src/synthorg/observability/events/audit_chain.py (1)

15-24: Good event taxonomy split for validation-rejected emits.

Adding a dedicated audit_chain.emit_validation_failed event cleanly separates schema rejection from timeout/signing failures.

CLAUDE.md (1)

138-138: Typed-boundary contract docs are aligned and precise.

The updated wording now matches the implementation/enforcement surface, including TypeAdapter support and the Phase 3 guard.

src/synthorg/api/boundary.py (1)

49-69: Overload + runtime dispatch extension is solid.

The TypeAdapter path is integrated cleanly while preserving the same validation/logging contract.

Also applies to: 83-88, 116-118

src/synthorg/a2a/rpc_params.py (1)

29-30: A2A params now correctly route through the shared boundary helper.

Using parse_typed("a2a.jsonrpc", ...) here is the right consolidation point.

Also applies to: 125-125

src/synthorg/api/controllers/settings.py (1)

20-20: settings.security boundary migration looks correct.

Validation is now centralized through parse_typed while keeping the existing 422 translation flow intact.

Also applies to: 668-672

src/synthorg/meta/mcp/invoker.py (1)

13-13: MCP argument validation now uses the correct boundary entry point.

This keeps the existing error-envelope behavior while aligning with boundary-level validation/logging.

Also applies to: 171-179

tests/unit/api/test_boundary.py (1)

30-46: Great coverage additions for TypeAdapter and truncation edge cases.

These tests close the main regression gaps around discriminated-union parsing and structured failure logging semantics.

Also applies to: 114-132, 147-209

tests/unit/api/controllers/test_settings_security_boundary.py (1)

1-68: LGTM!

Comprehensive boundary test coverage for the settings.security parse_typed call. The tests correctly verify:

  • Round-trip serialization via model_dump(mode="json")
  • Rejection of out-of-range integers and invalid enum values
  • Structured log emission with correct boundary label, log level, and error type
  • Default handling for empty dict and None inputs
src/synthorg/api/auth/claims.py (1)

37-166: LGTM!

Well-designed JWT claims contract with robust validation:

  • Enforces timezone-aware datetimes for iat/exp to prevent cross-environment epoch drift
  • Validates iat < exp invariant
  • Enforces mutual exclusion for user-only fields (all-or-none semantics)
  • Correctly rejects HumanRole.SYSTEM in claims since system tokens carry no role
  • Uses NotBlankStr for all identifier fields per coding guidelines
src/synthorg/api/auth/service.py (2)

219-236: LGTM!

Clean migration to typed claims construction. Building the JWT payload through JwtClaims ensures encode-side type safety, and model_dump(mode="json") produces the correct serialization format for jwt.encode.


238-284: LGTM!

decode_token now returns a validated JwtClaims instance via parse_typed("jwt", ...). The docstring correctly documents the new ValidationError exception path for malformed claim sets.

src/synthorg/api/auth/middleware.py (2)

224-235: LGTM!

Appropriate handling of ValidationError from parse_typed. The comment correctly notes that boundary validation already logged api.boundary.validation_failed, while this handler surfaces the auth-specific SECURITY_AUTH_FAILED event for operator searchability.


261-353: LGTM!

Clean migration to typed JwtClaims attribute access throughout _resolve_jwt_user. The issuer/audience validation, pwd_sig comparison, and session_id assignment all correctly use typed attributes.

src/synthorg/api/auth/controller_helpers.py (1)

228-253: LGTM!

Correct migration to typed JwtClaims handling in extract_jti. The ValidationError handler appropriately logs at debug level (since JTI extraction failure is handled gracefully), and the return uses typed attribute access.

tests/unit/api/auth/test_service.py (2)

89-98: LGTM!

Test assertions correctly updated to use typed JwtClaims attribute access. The check claims.role is not None before accessing .value properly handles the optional field.


158-215: LGTM!

Forged JWT payloads correctly updated to include must_change_password and pwd_sig fields. This ensures the payloads pass JwtClaims validation (user-only fields must be all-or-none), allowing the tests to exercise the intended issuer/audience behavior.

tests/unit/api/auth/test_middleware.py (1)

662-715: LGTM!

Excellent security-focused test update. The test correctly validates that system JWTs with a stray pwd_sig are now rejected (401) rather than authenticated (200). The docstring clearly explains the mutual-exclusion invariant enforced by JwtClaims and why legitimate system tokens are unaffected.

src/synthorg/observability/audit_chain/payloads.py (1)

1-65: LGTM!

Well-documented audit chain payload contract. Key design points are correctly implemented:

  • NotBlankStr used for all identifier fields per coding guidelines
  • extra="forbid" prevents unknown fields from slipping into the signed chain
  • The docstring clearly documents the byte-stable hashing contract (validation-only, no dict replacement)
  • error field correctly uses plain str | None since it's already sanitized upstream via safe_error_description
src/synthorg/observability/audit_chain/sink.py (3)

12-24: LGTM!

The new imports are correctly organized. ValidationError from pydantic enables the dedicated exception handler, parse_typed provides the boundary validation, AuditChainEventPayload is the schema, and the new event constant AUDIT_CHAIN_EMIT_VALIDATION_FAILED distinguishes schema-reject from other failures.


322-334: LGTM!

The boundary validation is correctly implemented: parse_typed validates the assembled payload but the return value is intentionally discarded to preserve byte-stable JSON serialization. The detailed comment explaining why payload = parse_typed(...) must never be used is valuable for future maintainers.


380-401: LGTM!

The dedicated ValidationError handler correctly:

  • Uses structured logging without exc_info=True to avoid leaking frame-locals (SEC-1 compliance)
  • Logs via the non-audited audit_chain.* prefix to prevent recursion
  • Uses safe_error_description(exc) per coding guidelines
  • Invokes the callback with ("error", 0, 0.0) to track dropped events in metrics

The distinction from the generic except Exception handler is well-documented.

tests/unit/observability/audit_chain/test_audit_chain.py (5)

239-239: LGTM!

Using pathname=__file__ instead of empty string makes the test LogRecord objects more realistic and consistent with production log records that include module information.


260-260: LGTM!

Consistent with the other LogRecord fixture updates.


282-282: LGTM!

Consistent with the other LogRecord fixture updates.


371-371: LGTM!

The _record() helper in TestExtractEventName now produces more realistic LogRecord objects.


426-426: LGTM!

The _build_log_record() helper for failure-path tests now produces more realistic LogRecord objects.

tests/unit/observability/test_audit_chain_boundary.py (4)

25-61: LGTM!

The golden fixtures are well-designed:

  • _GOLDEN_PAYLOAD exercises every field for comprehensive byte-stability coverage
  • _GOLDEN_JSON_BYTES is the pinned byte layout that the chain hash depends on
  • _GOLDEN_HASH is correctly a hard-coded literal (not recomputed), providing an independent pin

The detailed comments explain the regeneration process and reviewer sign-off requirement.


64-123: LGTM!

Comprehensive coverage of the AuditChainEventPayload contract:

  • Round-trip parsing validates the golden payload
  • Minimal payload confirms optional fields default to None
  • Extra field rejection enforces extra="forbid"
  • Missing required field rejection validates schema strictness
  • Boundary log emission confirms api.boundary.validation_failed with correct boundary and log_level

125-165: LGTM!

The byte-stability tests are critical for audit-chain integrity:

  • test_golden_json_byte_stable catches any drift in JSON layout
  • test_golden_hash_matches confirms SHA-256 determinism
  • test_validation_does_not_mutate_payload ensures parse_typed is side-effect-free

These tests prevent silent chain-hash breakage when the model evolves.


230-263: LGTM!

The test correctly validates the explicit ValidationError branch:

  • Injects an invalid levelname type to trigger validation failure
  • Asserts exactly one audit_chain.emit_validation_failed log (not the generic emit_error)
  • Confirms log_level="error" and correct audited_event
  • Verifies callback fires with ("error", 0, 0.0)

The inline comment (lines 230-239) now correctly describes the failure injection mechanism.

tests/unit/a2a/test_rpc_params_boundary.py (2)

1-34: LGTM!

The test helpers are well-designed:

  • _envelope() creates a JsonRpcRequest wrapper for method/params
  • _message_payload() constructs a valid message payload using model serialization

The type: ignore comments on lines 29-30 are acceptable for test fixtures where the goal is constructing valid JSON-serializable data.


35-94: LGTM!

Comprehensive boundary coverage for A2A JSON-RPC params:

  • Round-trip tests for all three methods (message/send, tasks/get, tasks/cancel)
  • test_method_envelope_overrides_smuggled_method validates discriminator integrity
  • test_tasks_get_extra_field_rejected enforces extra="forbid"
  • test_unknown_discriminator_rejected catches unknown methods
  • test_validation_failure_emits_boundary_log confirms structured logging with boundary="a2a.jsonrpc"
src/synthorg/api/ws_control_models.py (5)

27-40: LGTM!

WsAuthMessage correctly defines the auth handshake contract:

  • action: Literal["auth"] serves as the discriminator
  • ticket: NotBlankStr ensures non-empty ticket values
  • Frozen and forbids extra fields as required

42-62: LGTM!

WsSubscribeMessage correctly implements the subscribe contract:

  • channels as tuple[NotBlankStr, ...] normalizes incoming lists to immutable tuples
  • filters: dict[str, str] | None preserves the three-way semantics documented in the docstring (None = keep existing, {} = clear, {...} = replace)
  • Default channels=() allows empty subscribe frames

64-76: LGTM!

WsUnsubscribeMessage mirrors the subscribe structure appropriately.


78-88: LGTM!

WsPingMessage is minimal with just the discriminator, as expected for a heartbeat message.


90-116: LGTM!

The discriminated union and TypeAdapter are correctly defined:

  • WsControlMessage uses Annotated[..., Discriminator("action")] for Pydantic 2.x discriminated union parsing
  • WS_CONTROL_MESSAGE_ADAPTER provides the TypeAdapter interface for parse_typed
  • __all__ exports all public symbols for clean imports
src/synthorg/api/controllers/ws_protocol.py (4)

14-32: LGTM!

The new imports are correctly organized for the typed-boundary refactor:

  • ValidationError for explicit exception handling
  • parse_typed and WS_CONTROL_MESSAGE_ADAPTER for boundary validation
  • Message type classes for isinstance dispatch
  • safe_error_description for SEC-1 compliant error logging

183-221: LGTM!

The handle_message refactor correctly implements typed-boundary validation:

  • parse_typed("ws.control", parsed, WS_CONTROL_MESSAGE_ADAPTER) validates against the discriminated union
  • ValidationError handler logs via both the boundary helper (automatic) and the ws-specific API_WS_INVALID_MESSAGE event for operator search trails
  • Returns generic "Invalid control message" error to avoid leaking schema details
  • Uses safe_error_description(exc) per SEC-1 guidelines

223-242: LGTM!

The typed dispatch pattern is clean:

  • isinstance checks route to the correct handler
  • WsAuthMessage falls through to "Unknown action" for post-handshake auth attempts (intentional per comment)
  • message.action[:64] truncation is defensive but harmless since WsAuthMessage.action is the literal "auth"

245-302: LGTM!

_handle_subscribe_typed correctly adapts the subscribe logic for typed messages:

  • Filter bounds checking uses message.filters directly
  • Channel iteration uses message.channels (now a tuple from Pydantic coercion)
  • Authorization logic unchanged
  • Filter semantics preserved (None = keep, {} = clear, {...} = replace)
  • Logs at INFO level for state transitions per guidelines
tests/unit/api/controllers/test_ws_control_boundary.py (3)

36-75: LGTM!

Comprehensive round-trip tests for all four message variants:

  • subscribe with channels list normalization to tuple
  • unsubscribe with channels normalization
  • ping minimal variant
  • auth with ticket preservation

These tests validate the discriminator-based parsing works correctly for all variants.


76-121: LGTM!

Negative validation tests:

  • Extra field rejected via extra="forbid"
  • Unknown action rejected via discriminator failure
  • Filter type validation (int instead of str) rejected
  • Boundary log emission confirms api.boundary.validation_failed with correct boundary and log_level

123-170: LGTM!

End-to-end handle_message boundary tests:

  • Valid subscribe updates state and returns correct envelope
  • Extra field returns "Invalid control message" error without state mutation
  • Validation failure emits api.boundary.validation_failed log

These tests validate the full dispatch path through typed validation.

tests/unit/api/controllers/test_ws.py (2)

69-85: LGTM!

The test correctly reflects the behavior change:

  • Unknown action values ("unknown") are now rejected at parse time by the typed discriminator
  • The error message changes from "Unknown action" to "Invalid control message"
  • The inline comment explains the intentional behavior change

86-106: LGTM!

This new test is valuable for coverage of the post-handshake auth fallback path:

  • WsAuthMessage is a valid discriminated-union variant, so it parses successfully
  • On an already-authenticated socket, it falls through the isinstance checks
  • The fallback returns "Unknown action" to avoid leaking that auth handshake exists

This is the only remaining path to API_WS_UNKNOWN_ACTION, ensuring the branch stays covered.

tests/unit/api/auth/test_jwt_boundary.py (4)

1-52: LGTM! Well-structured boundary tests for the JWT contract.

The test module provides comprehensive coverage for JwtClaims invariants and AuthService boundary validation. The static analysis warnings about hardcoded JWT secrets are false positives—test files appropriately use synthetic secrets that never leave the test environment.


54-227: Comprehensive model invariant coverage.

The TestJwtClaimsModel class exercises all critical JwtClaims validation rules: naive datetime rejection, iat < exp enforcement, partial user-token shape rejection, SYSTEM role rejection on user tokens, extra claim rejection via extra="forbid", and blank required claim rejection. Good defensive testing.


229-331: Boundary logging assertions are correct.

The tests properly verify that api.boundary.validation_failed is emitted exactly once with boundary="jwt" and log_level="warning" for validation failures. This ensures the Phase 2 contract is observable.


333-363: Create-side boundary tests are solid.

The encode-side coverage confirms create_token builds JwtClaims internally with the expected pwd_sig length and correctly rejects attempts to mint SYSTEM-role tokens.

tests/unit/meta/mcp/test_invoker_boundary.py (2)

94-108: LGTM! Boundary log emission test is correctly structured.

The test properly verifies that validation failures emit exactly one api.boundary.validation_failed warning with boundary="mcp.tool", consistent with the typed-boundary contract.


69-72: The code is correct. ToolExecutionResult.content is typed as str (defined in src/synthorg/tools/base.py:50), and the invoker at line 255 returns ToolExecutionResult(content=result) where result is the handler's string output (e.g., json.dumps(...)). Calling json.loads() on a string is valid.

scripts/check_boundary_typed.py (3)

130-159: Import resolution and shadowing detection look correct.

_resolves_to_parse_typed properly verifies that the callee either:

  1. Is a bare parse_typed name that resolves via the import map to the canonical FQN (synthorg.api.boundary.parse_typed), with no local shadowing.
  2. Is a qualified X.parse_typed where X resolves to synthorg.api.boundary.

This addresses the previous review concern about token-only matching accepting local shims.


162-233: Scope-aware traversal is correctly implemented.

_function_shadows_parse_typed checks the function's top-level scope for shadowing (local def parse_typed, assignment, or annotation), and _calls_parse_typed uses an iterative approach that stops at nested scope boundaries (FunctionDef, AsyncFunctionDef, ClassDef, Lambda). This prevents false positives from nested helpers.


287-324: LGTM! main() correctly implements the documented exit-code matrix.

Exit code 0 for clean, 1 for violations, and 2 for internal errors (including the ValueError from ambiguous registrations). The stderr output provides actionable guidance.

.pre-commit-config.yaml (1)

276-291: LGTM! Hook configuration follows best practices.

The boundary-typed hook correctly:

  1. Uses the script's _REGISTERED_BOUNDARIES as the single source of truth (Lines 280-287 comment).
  2. Has a broad files: trigger pattern to avoid drift between the config and the script.
  3. Runs at pre-push stage appropriately for this type of gate.
tests/unit/scripts/test_check_boundary_typed.py (1)

52-337: Comprehensive test coverage for the lint guard.

The test suite covers all critical paths:

  • Clean repository status
  • Missing parse_typed call detection
  • Opt-out marker recognition
  • Wrong boundary label rejection
  • Nested scope handling (both nested helper and nested same-name function)
  • Import resolution (unimported, qualified, aliased)
  • Shadowing detection (local def, assignment)
  • Ambiguity handling with proper exit code
.github/workflows/ci.yml (1)

122-124: LGTM! CI gate properly wired.

The new step is correctly placed in the lint job alongside other Python gates, inherits the job's 10-minute timeout, and uses the same script as the pre-commit hook for consistency.

docs/reference/mcp-handler-contract.md (1)

13-19: LGTM! Documentation accurately reflects the implementation.

The updated argument validation section correctly documents:

  1. The parse_typed("mcp.tool", arguments, args_model) validation call
  2. Dual log emission on malformed payloads (api.boundary.validation_failed + mcp.server.invoke.failed)
  3. Link to the typed-boundaries reference for the full contract
docs/reference/typed-boundaries.md (1)

1-183: Comprehensive reference documentation.

The document clearly explains:

  1. The parse_typed helper API and its two overloads
  2. Structured logging behavior with field locations and truncation
  3. Error envelope translation per boundary type
  4. The registered boundaries table (matching the script's registry)
  5. Per-boundary implementation notes with stability constraints (especially audit-chain byte stability)
  6. Lint guard operation and opt-out mechanism
  7. Step-by-step instructions for adding new boundaries

Comment thread tests/unit/observability/test_audit_chain_boundary.py Outdated
Comment thread tests/unit/scripts/test_check_boundary_typed.py
Linux CI Test (Python 3.14) flagged tests/unit/engine/test_task_engine_coverage.py::TestInFlightResolution::test_in_flight_envelope_resolved_on_drain_timeout with TimeoutError. The 50ms eng.stop timeout (hard_deadline = 100ms) is too tight for a CI runner under xdist load: cancel + drain bookkeeping can slip past 100ms even though the test passes locally in milliseconds. Bump to 2.0s to match the sibling test_dispatch_exception_returns_internal_error convention. The test still exercises the same _drain_processing timeout-branch (slow_save blocks on block.wait() so the inner timeout always fires regardless of cap), just with real CI headroom.

CodeRabbit nitpicks (2):
- tests/unit/observability/test_audit_chain_boundary.py: drop the dead record.extra_unknown_field setup line. The audit-chain sink only copies a closed key set so the attribute is never read; the actual ValidationError injection is record.levelname = 42 a few lines later.
- tests/unit/scripts/test_check_boundary_typed.py: add a module-scope autouse _cleanup_fixture_dir fixture that removes the empty _lint_boundary_fixtures directory after the suite. Each test still owns its file unlink in finally; the fixture only sweeps the empty directory if no orphans remain (a leaked file keeps the directory so the operator notices instead of the cleanup hiding the leak).

Refs #1711.
@Aureliolo Aureliolo merged commit 7b9f409 into main May 3, 2026
78 checks passed
@Aureliolo Aureliolo deleted the feat/stringly-typed-boundaries branch May 3, 2026 09:20
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 3, 2026 09:20 — with GitHub Actions Inactive
Aureliolo pushed a commit that referenced this pull request May 3, 2026
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub
Models). Commit-based changelog below._

### What you'll notice
- Frontend and UX polishing improves user interface responsiveness and
visual consistency.
- API hygiene and validation enhancements provide smoother and more
reliable interactions.

### What's new
- Introduced typed-boundary helpers enabling better type safety and
parse_typed workflows.
- Added codebase-audit skill prompt tuning for improved project
auditing.

### Under the hood
- Eliminated flaky tests caused by module-level state for more stable
test outcomes.
- Unified image tag management under CLI and Renovate for consistent
dependency updates.
- Added cross-PR file-overlap analysis to the review dependency pull
request skill.
- Updated multiple dependencies including Python, Web, CLI, and
container libraries.
- Improved CI tooling and lock file maintenance for better build
reliability.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*
---


##
[0.7.8](v0.7.7...v0.7.8)
(2026-05-03)


### Features

* **api:** typed-boundary helper + codebase-audit skill prompt tuning
([#1712](#1712))
([40ee65b](40ee65b))
* **boundary:** RFC
[#1711](#1711) Phases 2 + 3
— typed boundaries via parse_typed
([#1720](#1720))
([7b9f409](7b9f409))


### Bug Fixes

* **api:** audit cleanup B -- API hygiene & validation
([#1719](#1719))
([3d790d9](3d790d9))
* audit cleanup C - persistence, concurrency & data integrity
([#1708](#1708))
([#1717](#1717))
([bcce097](bcce097))
* **test:** exterminate xdist-flaky tests with module-level state
([#1713](#1713))
([#1721](#1721))
([8d258dd](8d258dd))
* **web:** audit cleanup E -- frontend & UX polish
([#1710](#1710))
([#1718](#1718))
([3a3591a](3a3591a))


### Refactoring

* **cli:** single source of truth for DHI image tags + Renovate manager
([#1723](#1723))
([57980a2](57980a2))


### Documentation

* audit cleanup D -- public-facing & docs sync
([#1709](#1709))
([#1715](#1715))
([ade03b7](ade03b7))


### Tests

* **engine:** make TestDrainTimeout deterministic + preserve subclass
type in [@Ontology](https://github.com/ontology)_entity
([#1729](#1729))
([b00fb05](b00fb05))


### CI/CD

* Update CI tool dependencies
([#1703](#1703))
([355a9ff](355a9ff))


### Maintenance

* add cross-PR file-overlap analysis to review-dep-pr skill
([#1722](#1722))
([3861d8a](3861d8a))
* **ci:** unify apko-version under workflow env so Renovate manages it
everywhere ([#1724](#1724))
([9c0a7fd](9c0a7fd))
* consolidate DHI image-pin custom regex managers
([#1726](#1726))
([b8b0cba](b8b0cba))
* **deps:** update dependency chainguard-dev/melange to v0.50.4
([#1701](#1701))
([8cbf83a](8cbf83a))
* Lock file maintenance
([#1705](#1705))
([414cfea](414cfea))
* Lock file maintenance
([#1727](#1727))
([5cb1212](5cb1212))
* Update CLI dependencies
([#1702](#1702))
([9fb57b9](9fb57b9))
* Update Container dependencies
([#1698](#1698))
([6d24fd6](6d24fd6))
* Update dependency @eslint-react/eslint-plugin to v5
([#1704](#1704))
([1cb1294](1cb1294))
* Update Python dependencies
([#1699](#1699))
([8e7af3a](8e7af3a))
* Update Python dependencies to v4.15.0
([#1725](#1725))
([69164c8](69164c8))
* Update Web dependencies
([#1700](#1700))
([715300d](715300d))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RFC: stringly-typed boundaries -- typed contract at MCP / JWT / WS / audit-chain / A2A / settings export

1 participant