Merged
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -15,6 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **`TestWriteResponseShapes`** test class — sentinel-scan parametrized over every write tool plus two registry-completeness tests. Catches regressions when a future write tool echoes caller-supplied payload fields. The `ECHO_EXEMPTIONS` registry doubles as the executable spec for what counts as a primary handle vs a payload echo ([#243](https://github.com/cmeans/mcp-awareness/issues/243)).
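
To make the sentinel-scan idea in the entry above concrete, here is a minimal sketch of how such a check could work. It is an illustration under assumptions, not the project's test code: the sentinel value, the `remember` tool name, the response field names, and the shape of `ECHO_EXEMPTIONS` (tool name mapped to a set of exempt field names) are all hypothetical.

```python
import json

# Hypothetical marker planted in every caller-supplied payload field.
SENTINEL = "SENTINEL-7f3a9c"

# Assumed registry shape: tool name -> response fields that may repeat caller
# input because they act as primary handles. The tool name is hypothetical.
ECHO_EXEMPTIONS = {"remember": {"key"}}


def find_echoes(tool_name, response, exemptions=ECHO_EXEMPTIONS):
    """Return top-level response fields that leak the sentinel and are not exempt."""
    allowed = exemptions.get(tool_name, set())
    return sorted(
        field
        for field, value in response.items()
        # Serialising the value lets the scan see into nested lists and dicts.
        if field not in allowed and SENTINEL in json.dumps(value)
    )


# Caller input with the sentinel planted in each field.
payload = {"key": SENTINEL + "-key", "content": SENTINEL + "-body"}

# A lean response returns only a server-derived id and the exempt handle.
lean = {"status": "ok", "id": "abc123", "key": payload["key"]}

# A chatty response also echoes the payload body back to the caller.
chatty = dict(lean, content=payload["content"])

lean_echoes = find_echoes("remember", lean)      # no non-exempt echoes
chatty_echoes = find_echoes("remember", chatty)  # flags the "content" field
```

Parametrizing a check like this over every write tool means a newly added tool is scanned automatically, which is how a test of this kind catches regressions without per-tool assertions.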

### Changed
- **Design doc** — `docs/design/hybrid-retrieval-multilingual.md` — close Phase 1.05 (extension selection) with **option 3 — defer non-Western language support from Layer 1**. Decision recorded 2026-04-11 after the empirical PG17.9 verification ([#257](https://github.com/cmeans/mcp-awareness/pull/257)) returned a definitive negative on pgroonga regconfig integration, ruling out the original "install pgroonga, use the 4 entries" path. The trilemma was resolved in favor of pragmatic Layer 1 shipping: at 1 month into mcp-awareness development with no public users and no signal on multilingual demand, Layer 1 ships with the 28 stock snowball regconfigs and `simple` as the fallback for everything else. CJK + Hebrew + Thai + Khmer support becomes a deliberate follow-up release when actual demand surfaces. The decision tree (per-language parser extensions, branched-pgroonga path, or external search index) is preserved in the design doc for the future evaluation, with the empirical verification status of each option documented in the "Verified empirical results for future reference" subsection — zhparser confirmed via context7 during [#246](https://github.com/cmeans/mcp-awareness/pull/246), pgroonga 4.0.6 empirically ruled out by [#257](https://github.com/cmeans/mcp-awareness/pull/257)'s PG17.9 verification, Typesense 29.0 empirically tested in a 20-operation spike on 2026-04-11 (see awareness `typesense-spike-2026-04-11` and `~/.local/state/mcp-awareness-typesense-spike/test-results-2026-04-11.md` for the full test matrix), and Meilisearch documented per its official documentation reviewed via context7 against `/meilisearch/documentation` on 2026-04-11 but not empirically tested. Phase 3 (non-Western language extension install) is reframed as a wiring-PR follow-on contingent on demand. The managed-Postgres compatibility section is reframed as contingent on Phase 3 reactivation. 
Closes [#249](https://github.com/cmeans/mcp-awareness/issues/249) (gating question answered, mechanism chosen) and [#248](https://github.com/cmeans/mcp-awareness/issues/248) (original premise — measure pgroonga regconfig memory cost — moot since those regconfigs do not exist; surviving stock-snowball measurement scope deferred as below-the-line for current scale).
- **Design doc** — `docs/design/hybrid-retrieval-multilingual.md` — record the empirical PG17.9 verification results for Steps 0 and 1 of the schema verification task ([#249](https://github.com/cmeans/mcp-awareness/issues/249)). **Step 0 (Substantive 3, gating): pgroonga 4.0.6 does not register any regconfigs in `pg_ts_config`** — verified by capturing `SELECT cfgname FROM pg_ts_config` before and after `CREATE EXTENSION pgroonga` against `groonga/pgroonga:latest-alpine-17`; both queries returned the same 29 rows (28 stock snowball + `simple`). `to_tsvector('japanese', '...')` errors with `text search configuration "japanese" does not exist`. The pgroonga extension is functional under its documented integration model (`USING pgroonga` index access method + `&@` operator successfully indexes/queries Japanese and Chinese content); the regconfig absence is by design, not a packaging bug. **Step 1 (Substantive 2, generated-column pattern): works on PG17.9** — `tsv tsvector GENERATED ALWAYS AS (to_tsvector(language, content)) STORED` is accepted at `CREATE TABLE`, populates correctly per row's regconfig, regenerates dynamically when `language` is updated, works with a standard GIN index (`Bitmap Index Scan` confirmed via `EXPLAIN ANALYZE` with `enable_seqscan=off`), and fails at INSERT time when handed a missing regconfig — exactly the case the startup-cache validation is designed to catch. The trigger-based fallback is therefore not needed for the wiring PR (kept in the design doc as documented escape hatch). One Step 1 checkbox remains open: confirming the combined hybrid CTE plan uses both HNSW and GIN indexes (requires a `pgvector` + chosen-non-Western-FTS image, deferred to the wiring PR). Step 2 (#248 memory measurement) and Step 3 (RDS compatibility) remain open and now contingent on Phase 1.05's mechanism choice. 
**Phase 1.05 (extension selection) is now the load-bearing open decision** — the original "install pgroonga, use the 4 entries" path is empirically ruled out, leaving the three documented options: per-language parser extensions like zhparser, pgroonga with a branched query path, or deferral of non-Western support from Layer 1.
- **Design doc** — `docs/design/hybrid-retrieval-multilingual.md` — record the pgroonga regconfig finding from [#246](https://github.com/cmeans/mcp-awareness/pull/246)'s QA cycle (rounds 3–5): pgroonga's documented integration is its own PostgreSQL index access method, not the standard `regconfig` registry the Layer 1 design assumes. Layer 1's verification task (Substantive 2) is now gated on a new Substantive 3 task (Step 0 of the revised verification): does pgroonga even register the assumed regconfigs in `pg_ts_config`? Tracked as [#249](https://github.com/cmeans/mcp-awareness/issues/249); [#248](https://github.com/cmeans/mcp-awareness/issues/248) (Postgres memory cost) is now blocked on [#249](https://github.com/cmeans/mcp-awareness/issues/249). Adds zhparser as a verified counter-example proving the design pattern (regconfig → tsvector → GIN → standard FTS operators) works for non-Western languages with the right extension, but only for Chinese; Japanese / Korean / Hebrew equivalents are not yet verified. Defers non-Western FTS mechanism selection to the wiring PR (new Phase 1.05) with three explicit options: per-language parser extensions, pgroonga with a branched query path, or deferral from Layer 1. Phase 3 (non-Western language extension install) is reframed to cover all three options. Risk section and managed-Postgres compatibility analysis updated to reflect that extension choice is open.
- **perf:** trim echoed input from write-tool responses to reduce token waste ([#243](https://github.com/cmeans/mcp-awareness/issues/243)). Five tools change; eight retain handles or server-derived fields only. Static `action` strings are dropped because they carry zero information on tools whose value is hard-coded.
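
The design-doc entries above describe two behaviors that interact: an INSERT fails when the row's `language` names a regconfig that is not installed, and Layer 1 falls back to `simple` for any language without a stock regconfig. The sketch below shows one way a startup cache could reconcile the two. It is an assumption-laden illustration, not the project's implementation: `resolve_regconfig` and the sample `available` set are hypothetical, and only the two SQL strings are quoted from the entries.

```python
# SQL quoted from the changelog entries; everything else here is illustrative.
LIST_REGCONFIGS_SQL = "SELECT cfgname FROM pg_ts_config"
GENERATED_COLUMN_DDL = (
    "tsv tsvector GENERATED ALWAYS AS (to_tsvector(language, content)) STORED"
)

FALLBACK_REGCONFIG = "simple"


def resolve_regconfig(language, available):
    """Return the requested regconfig if it is installed, else the fallback.

    `available` stands in for the set of names loaded once at startup by
    running LIST_REGCONFIGS_SQL. Resolving before the INSERT keeps a missing
    regconfig from reaching the generated column, where it would raise.
    """
    name = (language or "").strip().lower()
    return name if name in available else FALLBACK_REGCONFIG


# A small sample of stock names; a real server would report the full list.
available = {"simple", "english", "french", "german"}

english = resolve_regconfig("English", available)    # installed, normalised
japanese = resolve_regconfig("japanese", available)  # not installed
missing = resolve_regconfig(None, available)         # no language supplied
```

Resolving in the application trades precision for availability: content in an unsupported language is still indexed, but without stemming, until a language-specific mechanism is chosen.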