Merged
24 commits
ed1e649
feat(observability): log before raise and persist transition events
Aureliolo May 15, 2026
740b69c
refactor(comment-hygiene): rewrite migration-framing docstrings; exte…
Aureliolo May 15, 2026
b5adeac
docs(reference): extend typed-boundaries with 3 new pattern sections …
Aureliolo May 15, 2026
bfd0582
feat(cli,tooling): polish CLI Go commands; move slow gates to pre-pus…
Aureliolo May 15, 2026
0fe0749
test: tighten mock spec on parked context; use vi.waitFor for microta…
Aureliolo May 15, 2026
e69885d
docs(monitoring): document audit log fill ratio metric + NATS-py veri…
Aureliolo May 15, 2026
5d4feb1
feat(obs): link active OTel span_id and trace_id on budget hard-stop …
Aureliolo May 15, 2026
11e6edc
docs(guides,reference): add 8 operational guides + approval/workers/R…
Aureliolo May 15, 2026
9646e4e
feat(ceremony): wire optional budget_snapshot callable into CeremonyE…
Aureliolo May 15, 2026
0e6306b
chore: track WP-7 follow-up work in #1925 (refs #1922)
Aureliolo May 15, 2026
c80ae98
feat(workers): real HTTP executor + backend execute endpoint + servic…
Aureliolo May 15, 2026
fb018a6
feat(experiments): A/B test variant registry + deterministic assignme…
Aureliolo May 15, 2026
d29e48c
feat(in-memory): add clear() and size() to MCP installations repo (re…
Aureliolo May 15, 2026
6de9487
feat(observability): WS lifetime + revalidation + active conns + Post…
Aureliolo May 15, 2026
d9cc512
feat(llm,test): ModelPinMetadata + LlmFallbackConfig temp/top_p + lif…
Aureliolo May 15, 2026
d02429a
feat(prompts): explicit top_p on ProceduralMemoryProposer + LlmSemant…
Aureliolo May 15, 2026
d201bfb
feat(persistence,obs,ws): backend.kind discriminator + WS lifetime in…
Aureliolo May 15, 2026
3802c9c
refactor(providers): default CompletionConfig.top_p=1.0 centrally ins…
Aureliolo May 15, 2026
175b2d7
test(conformance): add post-start TCP accept probe before yielding po…
Aureliolo May 15, 2026
fc4c090
fix(wp7): address pre-PR review findings (refs #1922)
Aureliolo May 15, 2026
69540af
fix(wp7): babysit round 2 -- address CodeRabbit + Gemini findings + C…
Aureliolo May 15, 2026
d21d9e7
fix(wp7): babysit round 3 -- address CodeRabbit re-review on round 2
Aureliolo May 15, 2026
c8f2f03
fix(workers): explicit client-level httpx timeout on AsyncClient base…
Aureliolo May 15, 2026
c2fa188
fix(workers): log before raise on TaskExecutionExecutor constructor e…
Aureliolo May 15, 2026
34 changes: 23 additions & 11 deletions .pre-commit-config.yaml
@@ -28,12 +28,16 @@ ci:
   # check-no-modify-migration:
   # rebase / migration state checks meaningful only on the pushing
   # developer's clone, not in an ephemeral cloud runner.
-  # ``no-release-please-token``, ``workflow-shell-git-commits``,
-  # ``no-review-origin-in-code`` and ``no-migration-framing`` are
-  # deliberately NOT skipped: all four are pure-Python, zero-dependency
+  # ``no-release-please-token`` and ``workflow-shell-git-commits`` are
+  # deliberately NOT skipped: both are pure-Python, zero-dependency
   # scripts with no CI counterpart, and letting pre-commit.ci enforce
-  # them closes the gap where a PR from a contributor who skipped local
-  # hooks would otherwise introduce a regression unchecked.
+  # them at the pre-commit stage closes the gap where a PR from a
+  # contributor who skipped local hooks would otherwise introduce a
+  # regression unchecked. ``no-review-origin-in-code`` and
+  # ``no-migration-framing`` run at ``stages: [pre-push]`` (the
+  # local-only pre-push hook surface) so pre-commit.ci does not pick
+  # them up; their CI counterpart is the ``Lint`` job in ``ci.yml``
+  # which runs the same scripts on every PR.
   skip: [commitizen, gitleaks, hadolint-docker, caddy-validate, zizmor, no-em-dashes, no-redundant-timeout, mypy, pytest-unit, golangci-lint, go-vet, go-test, eslint-web, check-push-rebased, check-single-migration-per-pr, check-no-modify-migration, forbidden-literals, persistence-boundary, persistence-protocol-uniformity, dependency-inversion, provider-complete-chokepoint, no-new-logger-exception-str-exc, otlp-span-redaction, orphan-fixtures, doc-drift-counts, boundary-typed, setting-to-startup-trace, long-running-loop-kill-switch, list-pagination, domain-error-hierarchy, dead-api-endpoints, dual-backend-test-parity, schema-drift, no-magic-numbers, convention-gate-inventory, mcp-admin-guardrail, runtime-stats-freshness, dto-types-ts-in-sync]
 
 default_install_hook_types: [pre-commit, commit-msg, pre-push]
@@ -452,7 +456,12 @@ repos:
         # script / baseline) cannot bypass the check.
         files: ^(src/synthorg/.*\.py|scripts/check_domain_error_hierarchy\.py|scripts/domain_error_hierarchy_baseline\.txt|\.pre-commit-config\.yaml)$
         pass_filenames: false
-        stages: [pre-commit, pre-push]
+        # Full-repo AST walk: ~2.7s on every commit becomes noticeable
+        # on focused-edit cycles where a developer commits multiple
+        # times in a row. Move to pre-push so each commit stays
+        # interactive while the gate still fires before code leaves
+        # the machine.
+        stages: [pre-push]
 
       - id: no-controller-response-for-domain-errors
         name: no-controller-response-for-domain-errors gate (raise typed errors, never build local Response envelopes)
@@ -618,10 +627,11 @@ repos:
         # the script's internal allowlist would immediately skip.
         files: ^(src/synthorg/(?!persistence/(?:postgres|sqlite)/revisions/).*\.(py|sql)|tests/.*\.py|scripts/check_no_review_origin_in_code\.py|\.pre-commit-config\.yaml)$
         pass_filenames: false
-        # pre-commit covers pre-commit.ci (catches GitHub-UI edits and
-        # contributors who skip local hooks); pre-push remains the
-        # primary developer-machine enforcement point.
-        stages: [pre-commit, pre-push]
+        # Full-repo scan is ~14s wall-clock; running it on every commit
+        # made focused-edit cycles painful. pre-push still catches
+        # everything before code leaves the machine, and pre-commit.ci
+        # is a separate hook surface (uses its own stage selection).
+        stages: [pre-push]
Comment thread
coderabbitai[bot] marked this conversation as resolved.
 
       - id: no-migration-framing
         name: no migration / origin / phase-N framing in code
@@ -635,4 +645,6 @@ repos:
         # files the script's internal allowlist would immediately skip.
         files: ^(src/synthorg/(?!persistence/(?:postgres|sqlite)/revisions/).*\.(py|sql)|tests/.*\.py|scripts/check_no_migration_framing\.py|\.pre-commit-config\.yaml)$
         pass_filenames: false
-        stages: [pre-commit, pre-push]
+        # ~13s wall-clock on the full repo; same cost trade-off as
+        # no-review-origin-in-code above. Moved to pre-push.
+        stages: [pre-push]
1 change: 1 addition & 0 deletions CLAUDE.md
@@ -23,6 +23,7 @@ Web: see `web/CLAUDE.md`. CLI: see `cli/CLAUDE.md` (use `go -C cli`, never `cd c
```bash
uv sync # all deps
uv sync --group docs # docs toolchain (zensical + D2)
bash scripts/install_cli_tools.sh # one-time per-machine: golangci-lint only (CI installs separately; install d2 via docs/getting_started.md)
uv run ruff check src/ tests/ --fix # lint + auto-fix
uv run ruff format src/ tests/ # format
uv run mypy src/ tests/ # strict type-check
9 changes: 9 additions & 0 deletions cli/cmd/new.go
@@ -35,6 +35,15 @@ Available kinds:
 synthorg new controller ping`,
 	GroupID: "core",
 	Args: cobra.NoArgs,
+	// Render the help text when the user runs ``synthorg new`` with no
+	// subcommand, then exit with the usage error code so the parent
+	// shell can detect that a kind/domain was required. Without the
+	// explicit ExitUsage, the bare ``synthorg new`` would print help
+	// and exit 0, indistinguishable from a successful operation.
+	RunE: func(cmd *cobra.Command, _ []string) error {
+		_ = cmd.Help()
+		return NewExitError(ExitUsage, nil)
+	},
Comment thread
coderabbitai[bot] marked this conversation as resolved.
 }
 
 var newServiceCmd = newKindCmd(scaffold.KindService, "service")
26 changes: 25 additions & 1 deletion cli/cmd/start.go
@@ -83,7 +83,31 @@ func runStart(cmd *cobra.Command, _ []string) error {
 
 	state, err := config.Load(opts.DataDir)
 	if err != nil {
-		return fmt.Errorf("loading config: %w", err)
+		// config.Load(...) returns DefaultState silently when the file
+		// is absent, so a non-nil error here means the file exists but
+		// is unreadable, malformed, or fails schema validation.
+		// Distinguish each shape via typed sentinels so the operator
+		// knows whether to repair the file or check permissions
+		// instead of guessing from a generic ``loading config:``
+		// wrapper.
+		switch {
+		case errors.Is(err, config.ErrParsing):
+			return fmt.Errorf(
+				"config file is malformed (invalid JSON); "+
+					"edit it manually or remove it and re-run "+
+					"'synthorg init': %w", err,
+			)
+		case errors.Is(err, config.ErrReading):
+			return fmt.Errorf(
+				"config file is unreadable (check filesystem "+
+					"permissions): %w", err,
+			)
+		default:
+			// Anything else (validation / DataDir canonicalisation) is
+			// surfaced as-is with a ``config:`` prefix so the operator
+			// reads the wrapped detail directly.
+			return fmt.Errorf("config: %w", err)
+		}
 	}
 	safeDir, err := safeStateDir(state)
 	if err != nil {
16 changes: 14 additions & 2 deletions cli/cmd/update.go
@@ -105,9 +105,11 @@ func runUpdate(cmd *cobra.Command, _ []string) error {
 
 	// CLI update (unless --images-only).
 	if !updateImagesOnly {
-		if err := updateCLI(cmd, state.AutoUpdateCLI); errors.Is(err, errReexec) {
+		err := updateCLI(cmd, state.AutoUpdateCLI)
+		if errors.Is(err, errReexec) {
 			return reexecUpdate(cmd)
-		} else if err != nil {
+		}
+		if err != nil {
 			return fmt.Errorf("updating CLI binary: %w", err)
 		}
 	}
@@ -235,6 +237,16 @@ func downloadAndApplyCLI(ctx context.Context, out *ui.UI, result selfupdate.Chec
 		return nil
 	}
 
+	// Surface a permission error in the install directory before the
+	// download starts; otherwise the user waits through a multi-MB
+	// transfer only to fail at the final ``Replace`` step.
+	if err := selfupdate.ProbeInstallDirWritable(); err != nil {
+		return fmt.Errorf(
+			"cannot update CLI in place; re-run as an administrator "+
+				"or move the binary to a writable directory: %w", err,
+		)
+	}
+
 	out.Step("Downloading...")
 	binary, err := selfupdate.Download(ctx, result.AssetURL, result.ChecksumURL, result.SigstoreBundURL)
 	if err != nil {
17 changes: 15 additions & 2 deletions cli/internal/config/state.go
@@ -18,6 +18,19 @@ import (
 
 const stateFileName = "config.json"
 
+// Sentinel errors for Load failure modes, classified so callers can
+// branch on shape (errors.Is) rather than on error.Error() prefix. The
+// shapes are mutually exclusive: at most one wraps any given Load
+// error.
+var (
+	// ErrReading is wrapped when the persisted config file exists but
+	// cannot be read (filesystem permissions, I/O error, etc.).
+	ErrReading = errors.New("reading config")
+	// ErrParsing is wrapped when the config file is present and
+	// readable but its bytes do not decode as valid JSON.
+	ErrParsing = errors.New("parsing config")
+)
+
 // Fine-tune variant identifiers persisted in State.FineTuningVariant and
 // used to construct image service names (e.g. "synthorg-fine-tune-gpu").
 const (
@@ -267,12 +280,12 @@ func Load(dataDir string) (State, error) {
 			defaults.Sandbox = false
 			return defaults, nil
 		}
-		return State{}, fmt.Errorf("reading config %s: %w", path, err)
+		return State{}, fmt.Errorf("%w %s: %w", ErrReading, path, err)
 	}
 	// Unmarshal onto defaults so missing fields retain default values.
 	s := DefaultState()
 	if err := json.Unmarshal(data, &s); err != nil {
-		return State{}, fmt.Errorf("parsing config %s: %w", path, err)
+		return State{}, fmt.Errorf("%w %s: %w", ErrParsing, path, err)
 	}
 	if err := s.validate(); err != nil {
 		return State{}, fmt.Errorf("config %s: %w", path, err)
37 changes: 37 additions & 0 deletions cli/internal/selfupdate/updater.go
@@ -512,6 +512,43 @@ func Replace(binaryData []byte) error {
 	return ReplaceAt(binaryData, execPath)
 }
 
+// ProbeInstallDirWritable verifies the directory holding the current
+// executable is writable BEFORE a download is started. The probe
+// creates and removes a short-named tempfile so a permission error
+// surfaces in microseconds instead of after the user has already
+// waited through a multi-MB download. Returns nil when writable.
+func ProbeInstallDirWritable() error {
+	execPath, err := os.Executable()
+	if err != nil {
+		return fmt.Errorf("finding executable path: %w", err)
+	}
+	return ProbeInstallDirWritableAt(execPath)
+}
+
+// ProbeInstallDirWritableAt is the testable core of
+// ProbeInstallDirWritable: it accepts the executable path explicitly
+// so unit tests can target an arbitrary directory.
+func ProbeInstallDirWritableAt(execPath string) error {
+	resolved, err := filepath.EvalSymlinks(execPath)
+	if err != nil {
+		return fmt.Errorf("resolving symlinks for write probe: %w", err)
+	}
+	dir := filepath.Dir(resolved)
+	f, err := os.CreateTemp(dir, ".synthorg-write-probe.*.tmp")
+	if err != nil {
+		return fmt.Errorf("install directory %s is not writable: %w", dir, err)
+	}
+	tmpPath := f.Name()
+	if cerr := f.Close(); cerr != nil {
+		_ = os.Remove(tmpPath)
+		return fmt.Errorf("closing write-probe file: %w", cerr)
+	}
+	if rerr := os.Remove(tmpPath); rerr != nil {
+		return fmt.Errorf("removing write-probe file %s: %w", tmpPath, rerr)
+	}
+	return nil
+}
+
 // ReplaceAt swaps the binary at the given path with new content.
 // This is the testable core of Replace.
 func ReplaceAt(binaryData []byte, execPath string) error {
7 changes: 7 additions & 0 deletions docs/architecture/decisions.md
@@ -129,6 +129,13 @@ All significant design and architecture decisions in force today, organized by d

**Mitigation plan:** (1) File upstream PR against `nats-io/nats.py` with the one-line `inspect.iscoroutinefunction` fix; upstream PR status is tracked in the project issue queue (search `nats-py` label); the scoped `filterwarnings` entry in `pyproject.toml` remains the active workaround until a fixed upstream release is available. (2) If upstream is unresponsive by **2026-06-10** (60 days from the 2026-04-11 review), maintain a local monkey-patch in `bus/_nats_compat.py`. (3) Monitor `nats-core` for future JetStream support.

**Verification checkpoint (2026-06-10):** on this date run the checklist below and update this section with the outcome (mark each item Done / Not done / Outcome).

1. Inspect `nats-io/nats.py` open PRs and recent releases on GitHub for the `inspect.iscoroutinefunction` fix.
2. If a fixed release is available: bump the `nats-py` pin in `pyproject.toml`, drop the matching `filterwarnings` entry, run `uv run python -m pytest tests/ -m integration -k nats` to confirm warnings are gone, and replace this checkpoint section with the resolution outcome.
3. If no fixed release exists: implement the local monkey-patch in `src/synthorg/communication/bus/_nats_compat.py` (one-line `nats.aio.client.iscoroutinefunction = inspect.iscoroutinefunction`), import it at bus initialisation, and extend this section with the patch landing date.
4. Re-evaluate `nats-core` JetStream support: a maintained alternative removes the entire mitigation requirement.
Comment on lines +132 to +137

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Checkpoint numerics should follow docs numeric-sourcing policy.

The newly added dated checklist contains hardcoded numerics/date values without <!--RS:...--> markers. Please source these numerics through runtime-stats markers (or move volatile checkpoint timing out of public docs).

As per coding guidelines, “Numerics in README and public docs must be sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/architecture/decisions.md` around lines 132 - 137, The dated checklist
header "Verification checkpoint (2026-06-10)" and any hardcoded numerics in the
new items must be replaced with runtime-stats markers per the docs
numeric-sourcing policy: pull the date/number from data/runtime_stats.yaml and
insert a <!--RS:NAME--> marker instead of the literal "2026-06-10" (or move the
volatile checkpoint into an internal/temporary doc if you prefer). Update the
checklist text in docs/architecture/decisions.md (the "Verification checkpoint"
heading and the numbered items 1–4) to reference the appropriate <!--RS:...-->
keys and ensure the corresponding entries exist in data/runtime_stats.yaml; if
you move the checkpoint out of public docs, note that change here and remove
hardcoded numerics.


## Overarching Pattern

Nearly every decision follows the same architecture: a pluggable protocol interface with one initial implementation shipped, and alternative strategies documented for future extension. This is consistent with the project's protocol-driven design philosophy.
20 changes: 20 additions & 0 deletions docs/getting_started.md
@@ -32,6 +32,26 @@ uv sync

`uv sync` creates a virtual environment in `.venv/` and installs all development dependencies (linters, type checker, test runner, pre-commit, etc.).

## Install external CLI tools (one-time per machine)

Some gates and the docs build rely on external binaries that are not Python packages: `golangci-lint` (Go linter, used by the CLI) and `d2` (architecture diagram renderer).

Install `golangci-lint` once per machine:

```bash
bash scripts/install_cli_tools.sh
```

The script downloads the pinned `golangci-lint` version that matches CI (`.github/workflows/cli.yml`). Re-run only after bumping the pinned version; subsequent `uv sync` invocations do NOT re-run the script. CI uses its own action-based install step, so this is strictly a local-developer convenience.

Install `d2` separately (the docs job pins `v0.7.1`). The fastest path is the upstream installer:

```bash
curl -fsSL https://d2lang.com/install.sh | sh -s -- --version v0.7.1
```

On Windows, install via `winget install Terrastruct.d2` or download the release archive from `https://github.com/terrastruct/d2/releases`. Either way, ensure the resulting `d2` binary is on `PATH`; the docs build invokes it directly.
Comment on lines +35 to +53

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

New numeric doc content needs <!--RS:...--> markers.

This section introduces hardcoded numeric values (including version numerics) without runtime-stats markers. Please source numeric literals via <!--RS:...--> or rewrite to avoid explicit numerics.

As per coding guidelines, “Numerics in README and public docs must be sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers”.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~45-~45: The official name of this software platform is spelled with a capital “H”.
Context: ...golangci-lint version that matches CI (.github/workflows/cli.yml`). Re-run only after ...

(GITHUB)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/getting_started.md` around lines 35 - 53, Replace hardcoded version
numerics in docs/getting_started.md with runtime-stats markers: change the d2
version string "v0.7.1" to a marker like <!--RS:D2_VERSION--> and replace any
golangci-lint pinned version mention with a marker like
<!--RS:GOLANGCI_LINT_VERSION--> that sources values from
data/runtime_stats.yaml; ensure any other explicit numeric literals in this
section are similarly converted to <!--RS:...--> markers and update the install
guidance to reference scripts/install_cli_tools.sh and the CI pin in
.github/workflows/cli.yml so readers know where the marker values originate.


## Verify Installation

Run the smoke tests to confirm everything is working:
112 changes: 112 additions & 0 deletions docs/guides/a2a-federation.md
@@ -0,0 +1,112 @@
---
title: A2A Federation
description: Register a peer SynthOrg deployment, expose JSON-RPC methods, route tasks across the federation.
---

# A2A Federation

The Agent-to-Agent (A2A) bridge lets one SynthOrg deployment delegate tasks to a peer over JSON-RPC. Each side authenticates with a shared JWT credential and the typed boundary at `synthorg.a2a.rpc_params.parse_rpc_params` validates every inbound `params` block. This guide walks through registering a peer, enabling specific RPC methods, and observing a federation round-trip.

## Concepts

- **Peer**: a SynthOrg deployment reachable at an HTTPS URL with a JSON-RPC endpoint mounted at `/a2a`.
- **Method**: a JSON-RPC operation the gateway exposes. The current method set is `message/send`, `tasks/get`, and `tasks/cancel`.
- **Envelope precedence**: the JSON-RPC `method` field on the envelope always wins; a `method` key smuggled inside `params` is rejected at `parse_rpc_params` time.
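
The envelope-precedence rule can be illustrated with a small standard-library sketch. The real check lives in `parse_rpc_params`, whose implementation is not shown in this guide; the `resolve_method` function and `EnvelopeError` class below are hypothetical names used only for illustration, and only the three method names come from the list above.

```python
from typing import Any

# Method names taken from the guide; everything else here is illustrative.
ALLOWED_METHODS = frozenset({"message/send", "tasks/get", "tasks/cancel"})


class EnvelopeError(ValueError):
    """Hypothetical error for an envelope that breaks the precedence rule."""


def resolve_method(envelope: dict[str, Any]) -> str:
    """Return the envelope-level method; reject a `method` key inside `params`."""
    method = envelope.get("method")
    if not isinstance(method, str) or method not in ALLOWED_METHODS:
        raise EnvelopeError(f"unknown or missing method: {method!r}")
    params = envelope.get("params", {})
    # The envelope field always wins, so a nested `method` is never consulted;
    # it is rejected outright rather than silently ignored.
    if isinstance(params, dict) and "method" in params:
        raise EnvelopeError("`method` must not appear inside `params`")
    return method
```

Rejecting the nested key, instead of ignoring it, means a confused or malicious caller gets an explicit error rather than a request that appears to succeed under a different method.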

## Configuration surface

Settings live under the `a2a` namespace. Resolve them via `SettingsService` or set them in the company-template YAML.

| Key | Type | Default | Purpose |
|---|---|---|---|
| `a2a.enabled` | bool | `false` | Master switch for the federation gateway. |
| `a2a.peer_url` | URL | (unset) | Outbound peer endpoint. |
| `a2a.peer_jwt_secret` | secret | (unset) | HMAC key for outbound JWT. |
| `a2a.methods_enabled` | list[str] | `[]` | Allowlist of inbound methods. |
| `a2a.timeout_seconds` | float | `30` | Per-request wall-clock budget. |
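
As a reading aid, the keys and defaults in the table can be mirrored in a plain dataclass. This is a sketch only: `A2ASettings` and `allows_inbound` are hypothetical names, not the project's settings model, and the rule that an inbound method needs both the master switch and an allowlist entry is an assumption drawn from the "Purpose" column.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class A2ASettings:
    """Hypothetical mirror of the `a2a` keys and defaults listed in the table."""

    enabled: bool = False                  # a2a.enabled
    peer_url: Optional[str] = None         # a2a.peer_url (unset by default)
    peer_jwt_secret: Optional[str] = None  # a2a.peer_jwt_secret (unset by default)
    methods_enabled: tuple[str, ...] = ()  # a2a.methods_enabled (empty allowlist)
    timeout_seconds: float = 30.0          # a2a.timeout_seconds

    def allows_inbound(self, method: str) -> bool:
        """Assumed rule: gateway switched on AND method present in the allowlist."""
        return self.enabled and method in self.methods_enabled
```

With the defaults, every inbound method is refused; a callee has to opt in to each method explicitly.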

Comment on lines +20 to +27

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace hard-coded numerics in this public guide with runtime-stat markers.

This guide includes literal numeric values (defaults, ports, status codes) instead of <!--RS:NAME--> marker references required for public docs.

As per coding guidelines: “Numerics in README and public docs must be sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers”.

Also applies to: 30-31, 85-87

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/guides/a2a-federation.md` around lines 20 - 27, The table and other doc
spots contain hard-coded numeric defaults (e.g., the `a2a.timeout_seconds`
default value of 30 and other literal ports/status codes); replace each numeric
literal with the appropriate runtime-stat marker `<!--RS:NAME-->` (e.g., swap
the "30" default for `a2a.timeout_seconds` with a matching `<!--RS:...-->`
marker name taken from data/runtime_stats.yaml), update every occurrence noted
(the table row for `a2a.timeout_seconds`, the default/port/status code
occurrences referenced in the comment) to use the marker, and if a suitable
marker does not exist add a descriptive entry to data/runtime_stats.yaml and
reference it here using `<!--RS:NAME-->`; ensure marker names are descriptive
and consistent with existing markers so the public docs render the numeric
values at runtime.

## Worked example: two-node round-trip

The example uses two local processes on ports `8000` (node A) and `8001` (node B); each side has the other registered as its peer.

### Node B (callee)

```bash
SYNTHORG_DATA_DIR=/tmp/synthorg-b \
SYNTHORG_BACKEND_PORT=8001 \
uv run python -m synthorg.api
```

```yaml
# /tmp/synthorg-b/config.yaml
a2a:
  enabled: true
  methods_enabled:
    - tasks/get
  peer_jwt_secret: "shared-secret-do-not-commit"
```

### Node A (caller)

```yaml
# /tmp/synthorg-a/config.yaml
a2a:
  enabled: true
  peer_url: http://localhost:8001/a2a
  peer_jwt_secret: "shared-secret-do-not-commit"
  methods_enabled: []
```

Call `tasks/get` from node A:

```python
import httpx
import jwt
import uuid

token = jwt.encode({"sub": "synthorg-a", "aud": "synthorg-b"}, "shared-secret-do-not-commit", algorithm="HS256")

payload = {
    "jsonrpc": "2.0",
    "id": str(uuid.uuid4()),
    "method": "tasks/get",
    "params": {"task_id": "task-12345"},
}
resp = httpx.post(
    "http://localhost:8001/a2a",
    json=payload,
    headers={"Authorization": f"Bearer {token}"},
)
print(resp.json())
```
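
To make the shared-secret authentication less opaque, here is a minimal standard-library sketch of what an HS256 token like the one above contains and how a receiver could check its signature. It is not the gateway's verification code: `sign_hs256` and `verify_hs256` are hypothetical helpers, and the sketch deliberately omits checks a real verifier needs (the header `alg`, `aud`, expiry).

```python
import base64
import hashlib
import hmac
import json


def _b64url(raw: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: str) -> str:
    """Build header.payload.signature using HMAC-SHA256 over the first two parts."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    digest = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(digest)}"


def verify_hs256(token: str, secret: str) -> dict:
    """Return the claims if the signature matches; raise ValueError otherwise."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError as exc:
        raise ValueError("token must have three dot-separated parts") from exc
    signing_input = f"{header_b64}.{payload_b64}"
    expected = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(_b64url(expected), signature_b64):
        raise ValueError("signature mismatch")
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

A token signed with one secret fails verification under any other, which is the failure mode behind a peer-secret mismatch.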

Expected outcomes:

- `200` with a `result` block when the task exists.
- `404` (mapped to JSON-RPC error `-32602` with `data.code: "task_not_found"`) when the task is unknown.
- `403` when the bearer JWT does not validate (peer secret mismatch or `aud` claim incorrect).
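
A caller can branch on these three outcomes with a small helper. The sketch below assumes the error body follows the standard JSON-RPC 2.0 error object (`error.code`, `error.data`); the function name `describe_outcome` and the returned labels are hypothetical.

```python
from typing import Any


def describe_outcome(status_code: int, body: dict[str, Any]) -> str:
    """Map an HTTP status plus JSON-RPC body to a short label (illustrative only)."""
    if status_code == 200 and "result" in body:
        return "ok"
    if status_code == 404:
        error = body.get("error") or {}
        data = error.get("data") or {}
        # -32602 with data.code "task_not_found" is the pairing described above.
        if error.get("code") == -32602 and data.get("code") == "task_not_found":
            return "task_not_found"
        return "not_found"
    if status_code == 403:
        return "auth_rejected"
    return "unexpected"
```

In the worked example this would be called as `describe_outcome(resp.status_code, resp.json())`.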

## Observability

Every inbound JSON-RPC call emits these events:

- `a2a.jsonrpc.received`: at envelope decode; carries `peer`, `method`, `id`.
- `api.boundary.validation_failed`: when `parse_rpc_params` rejects a malformed `params` block.
- `a2a.jsonrpc.dispatched`: at successful method dispatch.
- `a2a.jsonrpc.error`: at error path (with `code` and `message`).

The `a2a.dispatch_latency_seconds` histogram has a `method` label so per-RPC latency is easy to chart.

## Threat model + extension

The boundary check is the only validation gate; downstream handlers MUST treat their typed `params` as already-validated.

To add a new method:

1. Define an `A2A<Method>Params` Pydantic model under `src/synthorg/a2a/rpc_params.py`.
2. Add it to the `A2ARpcParams` discriminated union.
3. Register the handler in the gateway registry.
4. Add the method name to the per-peer `a2a.methods_enabled` allowlist.
5. Cover the wire shape in `tests/unit/a2a/test_<method>.py`.
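
The shape of steps 1 to 3 can be sketched without the project's code. The example below swaps in standard-library dataclasses for the Pydantic models the steps call for, so it shows only the per-method dispatch idea and does no field type validation; `TasksGetParams`, `TasksCancelParams`, and `parse_params` are hypothetical names, and the `task_id` field is borrowed from the worked example.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class TasksGetParams:
    task_id: str


@dataclass(frozen=True)
class TasksCancelParams:
    task_id: str


# Stand-in for the discriminated union: the method name selects the model.
RpcParams = Union[TasksGetParams, TasksCancelParams]
_PARAMS_BY_METHOD = {
    "tasks/get": TasksGetParams,
    "tasks/cancel": TasksCancelParams,
}


def parse_params(method: str, raw: dict[str, Any]) -> RpcParams:
    """Build the typed params for `method`, rejecting unknown or missing keys."""
    model = _PARAMS_BY_METHOD.get(method)
    if model is None:
        raise ValueError(f"no params model registered for {method!r}")
    try:
        return model(**raw)
    except TypeError as exc:  # unexpected or missing field
        raise ValueError(f"invalid params for {method!r}: {exc}") from exc
```

Adding a method in this sketch means one new dataclass and one new dictionary entry, which parallels steps 1 and 2; handler registration and the allowlist entry remain separate steps.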

See [docs/reference/typed-boundaries.md](../reference/typed-boundaries.md) for the boundary contract and [docs/design/a2a.md](../design/a2a.md) for the full protocol design.