Skip to content

refactor(memory): per-call processing_kwargs + observability for ST encode#1943

Merged
Aureliolo merged 4 commits into
mainfrom
refactor/st-kwargs-logfire-levels
May 16, 2026
Merged

refactor(memory): per-call processing_kwargs + observability for ST encode#1943
Aureliolo merged 4 commits into
mainfrom
refactor/st-kwargs-logfire-levels

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

Two-stage refactor of src/synthorg/memory/embedding/fine_tune.py to adopt the sentence-transformers 5.5 per-call processing_kwargs override.

Stage 1 (first commit, b978eadc4): the 2 encode calls in mine_hard_negatives and the 4 in evaluate_checkpoint now pass distinct processing_kwargs={"text": {"max_length": N, "truncation": True}} for short queries (_QUERY_MAX_LENGTH = 128) vs long passages (_PASSAGE_MAX_LENGTH = 512) instead of relying on the model tokenizer default. Two new module-level Final[int] constants alongside the existing _DEFAULT_* block. Initial pair of unit tests asserting kwargs forwarding via a SentenceTransformer spy.

Stage 2 (second commit, 3db012f8e): pre-PR review pipeline addressed 14 findings:

  • Silent error masking via truncation=True (resilience-audit, Critical): added 2 event constants (memory.fine_tune.encode_invoked, memory.fine_tune.encode_truncation_likely) and an _encode_with_observability helper that emits a DEBUG log per encode call (role / model / max_length / batch_size) and a WARNING when any text's word count exceeds max_length / _TOKENS_PER_WORD (1.33). Quality regressions are no longer silent.
  • Docs drift (docs-consistency, Critical/Major): documented the new query / passage tokenizer caps in docs/reference/embedding-evaluation.md Stage 2/3 and docs/design/memory.md "Domain-Specific Embedding Fine-Tuning". Evaluation stage now explicitly references the same caps.
  • Test assertion fragility (pr-test-analyzer + test-quality-reviewer, Critical/Major): assertions now compare the full kwargs dict (including show_progress_bar=False), not just the processing_kwargs sub-dict. Order-dependent positional unpacking replaced with role-keyed lookup ((model_name, texts) -> call).
  • Spy resilience (test-quality-reviewer, Medium): _RecordingEncoder now raises if production switches to encode_query() / encode_document(), so the alternate sentence-transformers 3+/5+ API can't silently no-op the spy.
  • Function-length breaches (conventions-enforcer, Major, pre-existing): mine_hard_negatives 68 -> 49 lines; evaluate_checkpoint 78 -> 43 lines. Both now under the 50-line CLAUDE.md limit. Extracted helpers: _encode_with_observability, _encode_query_passage_pair, _load_query_passage_pairs, _persist_triples, _mine_negatives_from_pairs, _run_eval_pipeline, _persist_eval_metrics, _report_progress, _select_hard_negatives.
  • WHAT-not-WHY comments (comment-analyzer, Major): removed two comments that restated what the unpacking line below them already showed.
  • Coverage gap (pr-test-analyzer, Minor): added two truncation-warning tests (test_emits_truncation_warning_for_long_query, test_emits_truncation_warning_for_long_passage) that exercise the new WARNING path.
  • Type-design info (type-design-analyzer): added a _FakeSentenceTransformersModule Protocol for the test fake's return type plus a 2-line comment on the np.eye(max(len(texts), 1), ...)[: len(texts)] empty-input guard.

Track B no-op (logfire level=)

The 2026-05-16 bump also added level= to @logfire.instrument(...). A repo-wide grep finds zero such decorators in src/synthorg/; all observability is OpenTelemetry tracer.start_as_current_span(...) which has no level= kwarg. The kwarg has no applicable target here, so this PR ships Track A only. Detail in the plan file (.claude/plans/polished-twirling-stonebraker.md).

Test plan

  • uv run ruff check src/synthorg/ tests/ -> clean.
  • uv run mypy src/synthorg/ tests/ -> clean (3816 source files).
  • uv run python -m pytest tests/ -m unit -n 8 -> 28957 passed, 18 skipped (logfire extra + POSIX-only paths), 146.8 s.
  • uv run pre-commit run --files <all modified> -> all gates green; no em-dashes, no reviewer citations, doc-count floors satisfied, magic-numbers gate clean.

Review coverage

Pre-reviewed by 16 agents (docs-consistency, code-reviewer, python-reviewer, type-design-analyzer, async-concurrency-reviewer, comment-quality-rot, pr-test-analyzer, comment-analyzer, logging-audit, resilience-audit, conventions-enforcer, test-quality-reviewer, plus the 4 codebase-audit mini-pass agents). Full triage table: _audit/pre-pr-review/triage.md. 14 actionable items + 3 info; 14 fixed, 2 user-skipped (item 14 pre-existing PEP 758 comment in untouched code; item 17 sub-agent's own Bash policy note, not actionable).

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 16, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 16, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: ASSERTIVE

Plan: Pro

Run ID: f2da10bd-6067-44db-bd55-a8bc805d9af9

📥 Commits

Reviewing files that changed from the base of the PR and between b92ec03 and 0c5f90f.

📒 Files selected for processing (7)
  • .gemini/config.yaml
  • .gemini/styleguide.md
  • docs/design/memory.md
  • docs/reference/embedding-evaluation.md
  • src/synthorg/memory/embedding/fine_tune.py
  • src/synthorg/observability/events/memory.py
  • tests/unit/memory/embedding/test_fine_tune.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: Build Fine-Tune (cpu, fine-tune-cpu)
  • GitHub Check: Build Fine-Tune (gpu, fine-tune-gpu)
  • GitHub Check: Build Backend
  • GitHub Check: Lighthouse Site
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Runtime Stats Freshness Gate
  • GitHub Check: Lint
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Type Check
  • GitHub Check: Build Preview
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1); env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env at boot site per docs/reference/configuration-precedence.md
No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value
No hardcoded numeric values; numerics live in settings/definitions/; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal) per check_no_magic_numbers.py
Use PEP 758 exception syntax; no except A, B: without parens unless binding; no from __future__ import annotations (Python 3.14 has PEP 649)
Type hints on public functions; mypy strict mode required
Use Google-style docstrings
Error classes follow <Domain><Condition>Error naming from DomainError; never inherit Exception/RuntimeError directly per check_domain_error_hierarchy.py
Pydantic v2 frozen models with extra="forbid" on API DTOs (Request/Response/Snapshot/Result/Envelope/Status/Info/Summary suffixes); use @computed_field for derived fields; use NotBlankStr for identifiers
Args models at every system boundary; use parse_typed() for every external dict ingestion per check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helper functions catch Exception (re-raise MemoryError/RecursionError)
Clock seam: inject clock: Clock | None = None; tests inject FakeClock for time-dependent code
Lifecycle services own _lifecycle_lock; timed-out stops mark services as unrestartable
Untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML per sec-prompt-safety.md
Repository CRUD: implement save(entity), get(id), delete(id) -> bool, list_items(...), `query(......

Files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/memory/embedding/fine_tune.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/memory/embedding/fine_tune.py
**/*.{md,rst}

📄 CodeRabbit inference engine (CLAUDE.md)

Numerics in README and public docs sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers per data/README.md

Files:

  • docs/design/memory.md
  • docs/reference/embedding-evaluation.md
docs/**/*.{md,d2,mermaid}

📄 CodeRabbit inference engine (CLAUDE.md)

Use D2 (theme 200, Dark Mauve, v0.7.1) for architecture/nested containers; use mermaid for flowcharts/sequence/pipelines; Markdown tables for tabular data

Files:

  • docs/design/memory.md
  • docs/reference/embedding-evaluation.md
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: use @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% minimum
Windows unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back
Test doubles: use FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary blocked by check_mock_spec.py (zero-tolerance)
FakeClock and mock_of import from tests._shared; inject via clock= and helper subscript
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))
Flaky tests: never skip/xfail; fix fundamentally; use asyncio.Event().wait() not sleep(large) for timeouts

Files:

  • tests/unit/memory/embedding/test_fine_tune.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/memory/embedding/test_fine_tune.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/observability/events/memory.py
  • tests/unit/memory/embedding/test_fine_tune.py
  • src/synthorg/memory/embedding/fine_tune.py
🔇 Additional comments (26)
.gemini/config.yaml (1)

1-18: LGTM!

.gemini/styleguide.md (1)

1-148: LGTM!

docs/design/memory.md (1)

268-276: Hardcoded numeric caps remain instead of runtime-stats markers.

The numeric values max_length=512, max_length=128, and the implicit token caps referenced in lines 275-276 should be sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers (e.g., <!--RS:MAX_PASSAGE_LENGTH--> and <!--RS:MAX_QUERY_LENGTH-->).

As per coding guidelines, "Numerics in README and public docs sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers per data/README.md".

docs/reference/embedding-evaluation.md (2)

185-188: Hardcoded numeric caps should use runtime-stats markers.

The literal values _QUERY_MAX_LENGTH = 128 and _PASSAGE_MAX_LENGTH = 512 should be replaced with <!--RS:QUERY_MAX_LENGTH--> and <!--RS:PASSAGE_MAX_LENGTH--> markers to source these values from data/runtime_stats.yaml.

As per coding guidelines, "Numerics in README and public docs sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers per data/README.md".


202-203: Hardcoded token cap references should use runtime-stats markers.

The hardcoded values query (128) / passage (512) should be replaced with runtime-stats markers (e.g., <!--RS:QUERY_MAX_LENGTH--> and <!--RS:PASSAGE_MAX_LENGTH-->).

As per coding guidelines, "Numerics in README and public docs sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers per data/README.md".

src/synthorg/observability/events/memory.py (1)

116-119: LGTM!

src/synthorg/memory/embedding/fine_tune.py (13)

30-31: LGTM!


87-93: LGTM!


95-98: LGTM!


101-134: LGTM!


137-166: LGTM!


169-214: LGTM!


217-225: LGTM!


464-478: LGTM!


481-507: LGTM!


509-551: LGTM!


742-754: LGTM!


757-815: LGTM!


818-852: LGTM!

tests/unit/memory/embedding/test_fine_tune.py (7)

5-6: LGTM!

Also applies to: 9-9, 12-12, 15-16


31-83: LGTM!


86-100: LGTM!


287-322: LGTM!


323-359: LGTM!


361-414: LGTM!


503-576: LGTM!


Walkthrough

This pull request refactors the embedding fine-tuning pipeline (Stages 2 and 4) to enforce per-role token-length constraints and improve observability. It adds two observability events, shared encoding helpers that apply different max-length caps for queries (128 tokens) and passages (512 tokens) with truncation enabled and a truncation-likely warning, refactors hard-negative mining and evaluation to use these helpers and centralized persistence, extends unit tests to assert per-call max_length behavior and warnings, and updates docs and Gemini review configuration.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed Title clearly summarizes the main refactoring work: adopting per-call processing_kwargs for sentence-transformers encode and adding observability instrumentation.
Description check ✅ Passed Description comprehensively covers the two-stage refactor, specific changes made, findings addressed, and test results, with full relevance to the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 53.13% which is sufficient. The required threshold is 40.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 16, 2026 17:19 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces explicit token limits and truncation observability for the embedding fine-tuning pipeline, refactoring the encoding logic to ensure consistency between mining and evaluation. The reviewer identified critical runtime NameError risks caused by type hints for CancellationToken and EvalMetrics being evaluated before their imports. Additionally, the reviewer noted that the cancellation token was bypassed during the encoding phase of hard negative mining, which prevents the process from being interruptible.

model_name: str,
queries: list[str],
passages: list[str],
cancellation: CancellationToken | None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The type hint CancellationToken will cause a NameError at runtime because it is only imported within a TYPE_CHECKING block. Since from __future__ import annotations is not used in this file, type hints in function signatures are evaluated at module definition time. You should use a string forward reference (e.g., "CancellationToken | None") for any type hints that are not available at runtime. This applies to several other new functions in this module as well.

    cancellation: "CancellationToken | None",

passages: list[str],
cancellation: CancellationToken | None,
progress_callback: ProgressCallback | None,
) -> EvalMetrics:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The return type hint EvalMetrics will cause a NameError at runtime because it is not imported at the module level (it is only present in the TYPE_CHECKING block and as a local import inside the function body). Use a string forward reference to avoid this issue.

) -> "EvalMetrics":

Comment on lines +790 to +791
eval_metrics_cls: type[EvalMetrics],
) -> EvalMetrics:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Similar to other occurrences in this refactor, EvalMetrics is not defined at the module level and will cause a NameError when the function signature is evaluated during module import. Use string forward references for both the argument and the return type.

    eval_metrics_cls: type["EvalMetrics"],
) -> "EvalMetrics":

model_name=model_name,
queries=queries,
passages=passages,
cancellation=None,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The cancellation token is explicitly set to None here, which bypasses cancellation checks during the encoding phase of hard negative mining. This is typically the most time-consuming part of the process and should remain interruptible by passing the cancellation token received by the parent function.

        cancellation=cancellation,

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 16, 2026

Merging this PR will not alter performance

✅ 33 untouched benchmarks
⏩ 21 skipped benchmarks1


Comparing refactor/st-kwargs-logfire-levels (0c5f90f) with main (60364ca)

Open in CodSpeed

Footnotes

  1. 21 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/memory.md`:
- Around line 268-271: Replace hardcoded numeric caps in the prose with
runtime-stats markers: substitute "max_length=512" and "max_length=128" with
<!--RS:MAX_PASSAGE_LENGTH--> and <!--RS:MAX_QUERY_LENGTH--> respectively, and
replace any explicit "token cap" numeric mentions on the later lines (275-276)
with an appropriate <!--RS:...--> marker (e.g., <!--RS:TOKEN_CAP-->); keep the
surrounding text and the warning reference
memory.fine_tune.encode_truncation_likely WARNING unchanged, and ensure the
marker names match keys present in data/runtime_stats.yaml.

In `@docs/reference/embedding-evaluation.md`:
- Around line 197-198: The documentation line "Evaluation (NDCG@10, Recall@10)
re-applies the same query (128) / passage (512) token caps with truncation
enabled..." is inconsistently placed under Stage 3 wording; update the sentence
to explicitly place evaluation under Stage 4 (Deploy) to match the pipeline
stage labels — e.g., change the wording to "Stage 4 — Evaluation (NDCG@10,
Recall@10) re-applies..." or otherwise move the evaluation description into the
Stage 4 section so that the phrase "Evaluation (NDCG@10, Recall@10)..." clearly
belongs to Stage 4 and removes any ambiguity about stage ownership.
- Around line 185-188: Replace the hardcoded numeric caps "128" and "512" in the
embedding-evaluation doc with runtime-stats markers: substitute the literal
`_QUERY_MAX_LENGTH = 128` and `_PASSAGE_MAX_LENGTH = 512` occurrences with
`<!--RS:QUERY_MAX_LENGTH-->` and `<!--RS:PASSAGE_MAX_LENGTH-->` respectively
(also update the second occurrence around lines 197-198), keeping the
surrounding wording and the `processing_kwargs={"text": {"max_length": NAME,
"truncation": True}}` example and the
`memory.fine_tune.encode_truncation_likely` log reference unchanged so the doc
pulls values from data/runtime_stats.yaml at build time.

In `@src/synthorg/memory/embedding/fine_tune.py`:
- Around line 446-472: The _mine_negatives_from_pairs function currently
disables cancellation by passing cancellation=None to
_encode_query_passage_pair; update the call to forward the received cancellation
token (use cancellation=cancellation) so encoding respects cancellation checks
like _run_eval_pipeline does; modify the _encode_query_passage_pair invocation
in _mine_negatives_from_pairs to pass through the cancellation parameter and
keep all other args unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4eca8776-9a12-47a6-96af-53b369016b97

📥 Commits

Reviewing files that changed from the base of the PR and between df1f946 and 3db012f.

📒 Files selected for processing (5)
  • docs/design/memory.md
  • docs/reference/embedding-evaluation.md
  • src/synthorg/memory/embedding/fine_tune.py
  • src/synthorg/observability/events/memory.py
  • tests/unit/memory/embedding/test_fine_tune.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Build Backend
  • GitHub Check: Build Fine-Tune (cpu, fine-tune-cpu)
  • GitHub Check: Build Fine-Tune (gpu, fine-tune-gpu)
  • GitHub Check: Lighthouse Site
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Type Check
  • GitHub Check: Build Preview
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1); env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env at boot site per docs/reference/configuration-precedence.md
No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value
No hardcoded numeric values; numerics live in settings/definitions/; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal) per check_no_magic_numbers.py
Use PEP 758 exception syntax; no except A, B: without parens unless binding; no from __future__ import annotations (Python 3.14 has PEP 649)
Type hints on public functions; mypy strict mode required
Use Google-style docstrings
Error classes follow <Domain><Condition>Error naming from DomainError; never inherit Exception/RuntimeError directly per check_domain_error_hierarchy.py
Pydantic v2 frozen models with extra="forbid" on API DTOs (Request/Response/Snapshot/Result/Envelope/Status/Info/Summary suffixes); use @computed_field for derived fields; use NotBlankStr for identifiers
Args models at every system boundary; use parse_typed() for every external dict ingestion per check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helper functions catch Exception (re-raise MemoryError/RecursionError)
Clock seam: inject clock: Clock | None = None; tests inject FakeClock for time-dependent code
Lifecycle services own _lifecycle_lock; timed-out stops mark services as unrestartable
Untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML per sec-prompt-safety.md
Repository CRUD: implement save(entity), get(id), delete(id) -> bool, list_items(...), `query(......

Files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/memory/embedding/fine_tune.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/memory/embedding/fine_tune.py
**/*.{md,rst}

📄 CodeRabbit inference engine (CLAUDE.md)

Numerics in README and public docs sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers per data/README.md

Files:

  • docs/design/memory.md
  • docs/reference/embedding-evaluation.md
docs/**/*.{md,d2,mermaid}

📄 CodeRabbit inference engine (CLAUDE.md)

Use D2 (theme 200, Dark Mauve, v0.7.1) for architecture/nested containers; use mermaid for flowcharts/sequence/pipelines; Markdown tables for tabular data

Files:

  • docs/design/memory.md
  • docs/reference/embedding-evaluation.md
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: use @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% minimum
Windows unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back
Test doubles: use FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary blocked by check_mock_spec.py (zero-tolerance)
FakeClock and mock_of import from tests._shared; inject via clock= and helper subscript
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))
Flaky tests: never skip/xfail; fix fundamentally; use asyncio.Event().wait() not sleep(large) for timeouts

Files:

  • tests/unit/memory/embedding/test_fine_tune.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/memory/embedding/test_fine_tune.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/observability/events/memory.py
  • tests/unit/memory/embedding/test_fine_tune.py
  • src/synthorg/memory/embedding/fine_tune.py
🔇 Additional comments (11)
src/synthorg/observability/events/memory.py (1)

116-119: LGTM!

src/synthorg/memory/embedding/fine_tune.py (7)

30-31: LGTM!


87-93: LGTM!


95-99: LGTM!


101-134: LGTM!


137-164: LGTM!


167-190: LGTM!


475-476: LGTM!

Also applies to: 479-516, 707-780

tests/unit/memory/embedding/test_fine_tune.py (3)

31-83: LGTM!


86-101: LGTM!


287-521: LGTM!

Comment thread docs/design/memory.md
Comment on lines +268 to +271
2. **Hard negative mining**: base model embeds all passages (max_length=512) and queries
(max_length=128) with truncation enabled; top-k semantically similar but non-matching
passages become hard negatives. Inputs that overflow the token cap surface a
`memory.fine_tune.encode_truncation_likely` WARNING so silent quality loss is visible
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use runtime-stats markers for newly added numeric values in public docs.

The newly added numeric caps are hardcoded in prose. Please source these numerics via <!--RS:NAME--> markers to satisfy the docs data-sourcing rule.

As per coding guidelines, "Numerics in README and public docs sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers per data/README.md".

Also applies to: 275-276

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/design/memory.md` around lines 268 - 271, Replace hardcoded numeric caps
in the prose with runtime-stats markers: substitute "max_length=512" and
"max_length=128" with <!--RS:MAX_PASSAGE_LENGTH--> and
<!--RS:MAX_QUERY_LENGTH--> respectively, and replace any explicit "token cap"
numeric mentions on the later lines (275-276) with an appropriate <!--RS:...-->
marker (e.g., <!--RS:TOKEN_CAP-->); keep the surrounding text and the warning
reference memory.fine_tune.encode_truncation_likely WARNING unchanged, and
ensure the marker names match keys present in data/runtime_stats.yaml.

Comment on lines +185 to +188
- Encoding constraints: per-call `processing_kwargs={"text": {"max_length": N, "truncation": True}}`
with `_QUERY_MAX_LENGTH = 128` for queries and `_PASSAGE_MAX_LENGTH = 512` for passages.
Inputs whose word count likely exceeds the token limit emit a
`memory.fine_tune.encode_truncation_likely` WARNING log.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Replace hardcoded numeric caps with runtime-stats markers.

These newly introduced public-doc numerics should be injected via <!--RS:NAME--> markers instead of inline literals.

As per coding guidelines, "Numerics in README and public docs sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers per data/README.md".

Also applies to: 197-198

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/reference/embedding-evaluation.md` around lines 185 - 188, Replace the
hardcoded numeric caps "128" and "512" in the embedding-evaluation doc with
runtime-stats markers: substitute the literal `_QUERY_MAX_LENGTH = 128` and
`_PASSAGE_MAX_LENGTH = 512` occurrences with `<!--RS:QUERY_MAX_LENGTH-->` and
`<!--RS:PASSAGE_MAX_LENGTH-->` respectively (also update the second occurrence
around lines 197-198), keeping the surrounding wording and the
`processing_kwargs={"text": {"max_length": NAME, "truncation": True}}` example
and the `memory.fine_tune.encode_truncation_likely` log reference unchanged so
the doc pulls values from data/runtime_stats.yaml at build time.

Comment thread docs/reference/embedding-evaluation.md Outdated
Comment thread src/synthorg/memory/embedding/fine_tune.py
@codecov
Copy link
Copy Markdown

codecov Bot commented May 16, 2026

Codecov Report

❌ Patch coverage is 90.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.20%. Comparing base (60364ca) to head (0c5f90f).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/synthorg/memory/embedding/fine_tune.py 89.77% 4 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1943      +/-   ##
==========================================
+ Coverage   85.14%   85.20%   +0.06%     
==========================================
  Files        1846     1846              
  Lines      107753   107802      +49     
  Branches     9287     9288       +1     
==========================================
+ Hits        91741    91855     +114     
+ Misses      13763    13694      -69     
- Partials     2249     2253       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Aureliolo Aureliolo force-pushed the refactor/st-kwargs-logfire-levels branch from 3db012f to b92ec03 Compare May 16, 2026 17:34
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 16, 2026 17:36 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/synthorg/memory/embedding/fine_tune.py`:
- Around line 173-179: The code currently assumes every JSONL record has "query"
and "positive_passage" and will raise opaque KeyError; iterate the loaded pairs
and validate each record at the ingestion boundary (use parse_typed() or a small
schema checker) to ensure required fields exist and are the right type, raising
a contextual ValueError identifying the bad record (index and missing/invalid
fields) if validation fails; after validation (or after filtering out invalid
rows if intended) enforce require_non_empty and then build queries = [p["query"]
for p in validated_pairs] and passages = [p["positive_passage"] for p in
validated_pairs]; reference _read_jsonl, pairs, require_non_empty, parse_typed()
and the returned queries/passages to locate where to add this check.
- Around line 146-154: The code currently calls _encode_with_observability to
produce q_embs before honoring a potential pre-cancellation; add an early
cancellation check (call cancellation.check()) immediately before invoking
_encode_with_observability so a cancelled run skips the expensive encode. Update
the block around q_embs = await _encode_with_observability(...) to call
cancellation.check() if cancellation is not None prior to awaiting the encode,
leaving the existing post-encode check in place.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d7b39860-2367-4847-88d8-6f0375d2306b

📥 Commits

Reviewing files that changed from the base of the PR and between 3db012f and b92ec03.

📒 Files selected for processing (7)
  • .gemini/config.yaml
  • .gemini/styleguide.md
  • docs/design/memory.md
  • docs/reference/embedding-evaluation.md
  • src/synthorg/memory/embedding/fine_tune.py
  • src/synthorg/observability/events/memory.py
  • tests/unit/memory/embedding/test_fine_tune.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Build Fine-Tune (gpu, fine-tune-gpu)
  • GitHub Check: Build Fine-Tune (cpu, fine-tune-cpu)
  • GitHub Check: Build Backend
  • GitHub Check: CodSpeed Python benchmarks
  • GitHub Check: Build Web Assets (melange)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Type Check
  • GitHub Check: Lighthouse Site
  • GitHub Check: Build Preview
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Configuration precedence: DB > env > code default via SettingsService/ConfigResolver (Cat-1); env > code default (Cat-2, read_only_post_init); Cat-3 bootstrap secrets pure env at boot site per docs/reference/configuration-precedence.md
No os.environ.get outside startup; pre-init Cat-2 reads use settings.bootstrap_resolver.resolve_init_value
No hardcoded numeric values; numerics live in settings/definitions/; allowlist only 0/1/-1, HTTP codes, hex masks, powers-of-2, and module-level annotated named constants (NAME: int|float|Final|Final[int]|Final[float] = literal) per check_no_magic_numbers.py
Use PEP 758 exception syntax; no except A, B: without parens unless binding; no from __future__ import annotations (Python 3.14 has PEP 649)
Type hints on public functions; mypy strict mode required
Use Google-style docstrings
Error classes follow <Domain><Condition>Error naming from DomainError; never inherit Exception/RuntimeError directly per check_domain_error_hierarchy.py
Pydantic v2 frozen models with extra="forbid" on API DTOs (Request/Response/Snapshot/Result/Envelope/Status/Info/Summary suffixes); use @computed_field for derived fields; use NotBlankStr for identifiers
Args models at every system boundary; use parse_typed() for every external dict ingestion per check_boundary_typed.py
Immutability: use model_copy(update=...) or copy.deepcopy(); deepcopy at system boundaries
Async: use asyncio.TaskGroup for fan-out/fan-in; helper functions catch Exception (re-raise MemoryError/RecursionError)
Clock seam: inject clock: Clock | None = None; tests inject FakeClock for time-dependent code
Lifecycle services own _lifecycle_lock; timed-out stops mark services as unrestartable
Untrusted content (SEC-1): use wrap_untrusted() from engine.prompt_safety; use HTMLParseGuard for HTML per sec-prompt-safety.md
Repository CRUD: implement save(entity), get(id), delete(id) -> bool, list_items(...), `query(......

Files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/memory/embedding/fine_tune.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/memory/embedding/fine_tune.py
**/*.{md,rst}

📄 CodeRabbit inference engine (CLAUDE.md)

Numerics in README and public docs sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers per data/README.md

Files:

  • docs/reference/embedding-evaluation.md
  • docs/design/memory.md
docs/**/*.{md,d2,mermaid}

📄 CodeRabbit inference engine (CLAUDE.md)

Use D2 (theme 200, Dark Mauve, v0.7.1) for architecture/nested containers; use mermaid for flowcharts/sequence/pipelines; Markdown tables for tabular data

Files:

  • docs/reference/embedding-evaluation.md
  • docs/design/memory.md
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: use @pytest.mark.{unit,integration,e2e,slow}; async auto; timeout 30s global; coverage 80% minimum
Windows unit tests use WindowsSelectorEventLoopPolicy (3.14 IOCP teardown race); subprocess tests override back
Test doubles: use FakeClock for Clock seam, mock_of[T](**overrides) for typed-boundary substitutions, SimpleNamespace for attribute-bags; bare MagicMock at typed boundary blocked by check_mock_spec.py (zero-tolerance)
FakeClock and mock_of import from tests._shared; inject via clock= and helper subscript
Hypothesis: 10 deterministic CI examples; failures are real bugs (fix + add @example(...))
Flaky tests: never skip/xfail; fix fundamentally; use asyncio.Event().wait() not sleep(large) for timeouts

Files:

  • tests/unit/memory/embedding/test_fine_tune.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/memory/embedding/test_fine_tune.py
🧠 Learnings (1)
📚 Learning: 2026-05-05T09:04:46.195Z
Learnt from: Aureliolo
Repo: Aureliolo/synthorg PR: 1760
File: scripts/_dual_backend_parity_lib.py:215-216
Timestamp: 2026-05-05T09:04:46.195Z
Learning: This repository targets Python 3.14+ and follows PEP 758. Therefore, reviewer tooling should NOT treat unparenthesized multi-exception `except` clauses written without an `as` clause (e.g., `except MemoryError, RecursionError:`) as syntax errors. Only flag `except`-clause problems when they are genuinely invalid for Python 3.14+.

Applied to files:

  • src/synthorg/observability/events/memory.py
  • tests/unit/memory/embedding/test_fine_tune.py
  • src/synthorg/memory/embedding/fine_tune.py
🔇 Additional comments (4)
.gemini/config.yaml (1)

1-18: LGTM!

.gemini/styleguide.md (1)

1-147: LGTM!

docs/design/memory.md (1)

268-271: Use runtime-stats markers instead of inline numeric caps in public docs.

This still uses hardcoded numeric values in prose for token caps.

As per coding guidelines, "Numerics in README and public docs sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers per data/README.md".

Also applies to: 275-276

docs/reference/embedding-evaluation.md (1)

185-188: Replace hardcoded token-cap numerics with runtime-stats markers.

Public-doc numeric values are still inline literals instead of marker-based sourced values.

As per coding guidelines, "Numerics in README and public docs sourced from data/runtime_stats.yaml via <!--RS:NAME--> markers per data/README.md".

Also applies to: 202-203

Comment thread src/synthorg/memory/embedding/fine_tune.py
Comment thread src/synthorg/memory/embedding/fine_tune.py
Aureliolo added 4 commits May 16, 2026 19:56
…1 reviewer-config)

- Forward cancellation token in _mine_negatives_from_pairs so the
  encoding phase of hard-negative mining is interruptible
  (CodeRabbit + Gemini, fine_tune.py:462).
- Restructure embedding-evaluation.md pipeline stages: split the old
  Stage 3 evaluation bullet into a dedicated Stage 4, renumber Deploy
  to Stage 5 so the doc matches the fine_tune.py module header
  (5-stage pipeline) and removes stage-ownership ambiguity (CodeRabbit).
- Add .gemini/styleguide.md + .gemini/config.yaml so Gemini Code Assist
  stops flagging PEP 649 TYPE_CHECKING annotations as NameError risks
  on every PR, and reads project conventions from CLAUDE.md.

Skipped (factually wrong against current code):
- Gemini #1/#2/#3 (NameError on CancellationToken / EvalMetrics in
  TYPE_CHECKING block): Python 3.14 PEP 649 evaluates annotations
  lazily; existing unquoted CancellationToken | None at fine_tune.py:568
  has been in CI without issue. CLAUDE.md mandates this convention.
- CodeRabbit RS-marker findings on docs/design/memory.md and
  docs/reference/embedding-evaluation.md: out of scope for
  scripts/check_doc_numeric_macros.py, which scopes to README + 4 docs
  (neither file present) and matches digits only adjacent to specific
  stat nouns (tests/providers/agents/stars/releases).
- Add an early cancellation.check() before the query encode in
  _encode_query_passage_pair so a pre-cancelled run skips the
  expensive first encode call (CodeRabbit, fine_tune.py:154).
- Validate JSONL records at the ingestion boundary in
  _load_query_passage_pairs: every record must have the 'query' and
  'positive_passage' string fields. Raise contextual ValueError /
  TypeError identifying the bad record index, and log
  MEMORY_FINE_TUNE_VALIDATION_FAILED so silent data-shape regressions
  surface during fine-tune runs (CodeRabbit, fine_tune.py:179).
- Cover both new failure paths in test_fine_tune.py
  (missing-field + non-string-field records).

Skipped (factually wrong against current code, same as round 1):
- CodeRabbit RS-marker findings on docs/design/memory.md and
  docs/reference/embedding-evaluation.md: both files remain out of
  scope for scripts/check_doc_numeric_macros.py, which scans only
  README + 4 specific docs and only matches digits adjacent to
  tracked stat nouns (tests/providers/agents/stars/releases).
  max_length=N is not a tracked stat noun and the gate would not
  fail on these lines.
@Aureliolo Aureliolo force-pushed the refactor/st-kwargs-logfire-levels branch from b92ec03 to 0c5f90f Compare May 16, 2026 18:00
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 16, 2026 18:01 — with GitHub Actions Inactive
@Aureliolo Aureliolo merged commit 3aa9d20 into main May 16, 2026
79 checks passed
@Aureliolo Aureliolo deleted the refactor/st-kwargs-logfire-levels branch May 16, 2026 18:20
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview May 16, 2026 18:20 — with GitHub Actions Inactive
Aureliolo pushed a commit that referenced this pull request May 17, 2026
<!-- HIGHLIGHTS_START -->
## Highlights

> _AI-generated summary (model: `openai/gpt-4.1-mini` via GitHub
Models). Commit-based changelog below._

### What you'll notice
- Frontend WP-6 update with UX polish improves user interface and
workflow.
- Dashboard and training endpoint improvements enhance observability and
dispatch behavior.
- Web storybook now supports change detection for more responsive UI
interactions.
- Git hooks now isolated per worktree for cleaner repository management.
- Providers automatically detect native streaming support in Litellm
models.

### What's new
- Added a new pipeline to convert Pydantic DTOs to TypeScript for better
front-end compatibility.

### Under the hood
- Refactored settings to three precedence categories, removing YAML tier
for simpler configuration.
- Completed RootConfig mirror coverage for enhanced configuration
consistency.
- Adopted API conventions with better query performance and forbidden
extra fields for stricter validation.
- Improved persistence, layer discipline, and restart safety in core
work packages.
- CI updated with split test jobs and tightened coverage gates for
better test quality.
- Switched to direct Trivy binary for security scans, removing previous
Trivy action dependency.
- Enhanced memory management with per-call processing options and better
observability during speech-to-text encoding.
- Various dependency updates for Python, infrastructure, and lock files
maintain security and stability.
- Removed TypeScript DTO type-tightening overlays to simplify type
management.
- Codebase audit tightened skill sets to prevent false positivity in
class detection by 2026.

<!-- HIGHLIGHTS_END -->

:robot: I have created a release *beep* *boop*
---


##
[0.8.5](v0.8.4...v0.8.5)
(2026-05-17)


### Features

* **codegen:** pydantic-to-typescript DTO pipeline + parity gate (closes
[#1889](#1889))
([#1909](#1909))
([0265ef5](0265ef5))
* **storybook:** enable changeDetection + trim web/CLAUDE.md
([#1939](#1939))
([3b1f4c0](3b1f4c0))
* **web,setup:** WP-6 frontend + UX polish
([#1941](#1941))
([d9ca76d](d9ca76d))


### Bug Fixes

* correct invalid git for-each-ref syntax in post-merge-cleanup skill
([#1946](#1946))
([69a1649](69a1649))
* dashboard polish, training endpoint dispatch, and observability
cleanup ([#1911](#1911))
([b61e9e8](b61e9e8))
* per-worktree git-hook isolation + hookify gate migration + MSW drift
fix ([#1949](#1949))
([e3f8495](e3f8495))
* **providers:** read supports_native_streaming from litellm model info
([#1942](#1942))
([60364ca](60364ca))
* security and audit coverage (closes
[#1883](#1883))
([#1904](#1904))
([d8ebf55](d8ebf55))


### Performance

* **ci:** mypy --num-workers=4 + enable ruff TID255
([#1944](#1944))
([484c1d3](484c1d3))


### Refactoring

* **ci:** drop aquasecurity/trivy-action, use direct trivy binary
([#1940](#1940))
([df1f946](df1f946))
* **memory:** per-call processing_kwargs + observability for ST encode
([#1943](#1943))
([3aa9d20](3aa9d20))
* Phase 7 follow-up — complete RootConfig mirror coverage (closes
[#1907](#1907))
([#1914](#1914))
([605500b](605500b))
* **settings:** collapse precedence to three categories; drop YAML tier
(closes [#1890](#1890))
([#1910](#1910))
([efd54c9](efd54c9))
* WP-3 API conventions + query performance + project-wide extra=forbid
([#1953](#1953))
([504d579](504d579)),
closes [#1918](#1918)
* WP-4 settings + cross-cutting (clock seam, contextvars, dispatch,
plugin surfaces)
([#1954](#1954))
([7207d92](7207d92))
* **wp1:** persistence + layer discipline + restart safety
([#1945](#1945))
([57586fb](57586fb))


### Documentation

* **wp5:** public-facing truth refresh
([#1924](#1924))
([afb5cc5](afb5cc5))


### CI/CD

* split test job by marker with airtight aggregate coverage gate
([#1948](#1948))
([0b818d5](0b818d5)),
closes [#1938](#1938)
[#1937](#1937)


### Maintenance

* **codebase-audit:** tighten skill to prevent 2026-05-15 FP classes
([#1923](#1923))
([9317ed1](9317ed1))
* Lock file maintenance
([#1913](#1913))
([c08a355](c08a355))
* Lock file maintenance
([#1950](#1950))
([8940ab1](8940ab1))
* remove TS DTO type-tightening overlays
([#1915](#1915))
([d296214](d296214)),
closes [#1906](#1906)
* Update Infrastructure dependencies
([#1928](#1928))
([d19fae5](d19fae5))
* Update Python dependencies
([#1929](#1929))
([75cc2c8](75cc2c8))
* **wp7:** hygiene, stubs, test/CI/tooling, doc gaps, boundary patterns
doc ([#1926](#1926))
([c29eb32](c29eb32))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: synthorg-repo-bot[bot] <279117679+synthorg-repo-bot[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant