Skip to content

docs: close Phase 1.05 with option 3 — defer non-Western from Layer 1#258

Merged
cmeans-claude-dev[bot] merged 2 commits into
mainfrom
docs/phase-1.05-closure-defer-non-western
Apr 12, 2026
Merged

docs: close Phase 1.05 with option 3 — defer non-Western from Layer 1#258
cmeans-claude-dev[bot] merged 2 commits into
mainfrom
docs/phase-1.05-closure-defer-non-western

Conversation

@cmeans-claude-dev
Copy link
Copy Markdown
Contributor

@cmeans-claude-dev cmeans-claude-dev Bot commented Apr 12, 2026

Summary

Closes Phase 1.05 (extension selection for non-Western language support) in docs/design/hybrid-retrieval-multilingual.md with option 3 — defer non-Western from Layer 1. Decision recorded 2026-04-11 after Phase 1.0's empirical PG17.9 verification (#257) returned a definitive negative on pgroonga regconfig integration, ruling out the original "install pgroonga, use the 4 entries" path.

Closes #248 and #249 (the gating verification issue and the memory-cost-measurement issue — both resolved with the same closure pass).

Context

Phase 1.05 was the open decision point in the design doc that gated the wiring PR's non-Western language scope. With #257 merged and the empirical verification done, the trilemma resolved as follows:

Option Status
Per-language parser extensions (zhparser + equivalents) Available for future reconsideration
pgroonga with branched query path Available for future reconsideration
Defer non-Western from Layer 1 SELECTED 2026-04-11
External search index (Typesense / Meilisearch) Available for future reconsideration

Rationale: at 1 month into mcp-awareness development with no public users and no real signal on multilingual demand, the pragmatic choice is to ship Layer 1 fast on the verified pattern (28 stock snowball regconfigs + simple fallback) and revisit non-Western language support as a deliberate follow-up release when actual demand surfaces.

Changes

  1. Phase 1.05 in the migration plan marked RESOLVED 2026-04-11 with the chosen option, rationale, and a note that no further language.py or docker-compose.yaml changes are needed for this phase
  2. "Non-Western language support" section header changed from "extension choice is OPEN" to "deferred from Layer 1, 2026-04-11". The trilemma table now shows the SELECTED row and the other three options as "available for future reconsideration." Added a "Verified empirical results for future reference" subsection that records what's known about each option (zhparser confirmed via context7 for Chinese, Japanese parser equivalent not found in context7's index, pgroonga 4.0.6 verified empirically on PG17.9, Typesense 29.0 verified empirically in a 20-test spike with the per-language-field caveat documented, Meilisearch documented but not empirically tested in this session)
  3. Phase 3 collapsed from three open sub-plans to a single deferred follow-on with the architecture options listed as starting points for the future evaluation. Removed the "If Phase 1.05 selects ..." branching structure since Phase 1.05 is now closed
  4. Managed-Postgres compatibility section reframed as contingent on Phase 3 reactivation, with a note that the external-search-index path is the strongest argument for that future evaluation since it's independent of Postgres extension support
  5. CHANGELOG [Unreleased] Changed entry at the top, summarizing the closure, citing docs: record PG17.9 verification results for #249 (Steps 0+1) #257 / Measure Postgres-side memory cost of regconfigs during PG17 verification pass #248 / Verify whether pgroonga registers japanese/chinese_simplified/korean/hebrew as Postgres regconfigs (blocks #248) #249, and noting the empirically tested options

Cross-references

Mechanical pre-commit checks

    • docs: record pgroonga regconfig finding from #246 QA cycle #251-round-1 merged-state audit — only two identifier references in the additions: ISO_639_1_TO_REGCONFIG and docker-compose.yaml. Both verified against current main: git show main:src/mcp_awareness/language.py confirms 28 stock snowball entries with the "CJK and Hebrew are intentionally NOT in this mapping" docstring section; git show main:docker-compose.yaml | grep image confirms pgvector/pgvector:pg17 with no pgroonga. Both claims in the closure ("ISO_639_1_TO_REGCONFIG already contains only the 28 stock snowball entries" and "docker-compose.yaml does not need a base-image swap") are accurate against merged state
    • Public-API name audit — N/A, pure docs PR, no code changes

QA

Prerequisites

None. Pure documentation PR. No Python, no tests, no dependencies, no build artifacts. CI runs the usual ruff/mypy/pytest safety net — expected to pass unchanged.

Manual tests

This is a doc-only PR. The QA is reading the changed sections to confirm the framing accurately represents the decision and the context.

    • Read the new Phase 1.05 RESOLVED 2026-04-11 entry in the migration plan section. Expected: option 3 explicitly named, rationale captured, no implementation actions remaining for this phase, points back at the verification subsection
    • Read the reframed "Non-Western language support" section with its header change and the updated trilemma table. Expected: the SELECTED row is clearly marked, the three "available for future reconsideration" rows preserve the analysis without the urgency, the "Verified empirical results for future reference" subsection accurately records what was tested in this session
    • Read the reframed Phase 3 section. Expected: collapsed from three branches to one deferral framing, architecture options listed as future-evaluation starting points
    • Read the reframed managed-Postgres compatibility section with the new "deferred — see Phase 1.05 closure above" header note. Expected: still preserves the original analysis as future starting point, but reframes it as contingent on Phase 3 reactivation

🤖 Generated with Claude Code

Phase 1.0's empirical PG17.9 verification (#257, executed against
groonga/pgroonga:latest-alpine-17 / pgroonga 4.0.6 on 2026-04-11)
returned a definitive negative on pgroonga regconfig integration,
ruling out the original "install pgroonga, use the 4 entries" path.
The trilemma in the design doc's Phase 1.05 was resolved 2026-04-11
in favor of option 3 — defer non-Western language support from
Layer 1.

Rationale: at 1 month into mcp-awareness development with no public
users and no real signal on multilingual demand, the pragmatic choice
is to ship Layer 1 fast on the verified pattern and revisit non-Western
support as a deliberate follow-up release when actual demand surfaces.

Decision tree preserved in the design doc for the future evaluation,
with empirically verified results recorded against each option:
- Per-language parser extensions: zhparser confirmed for Chinese via
  context7, Japanese parser equivalent not found in context7's index
- pgroonga with branched query path: extension is functional under its
  USING access method, but requires a branched lexical CTE arm and
  two indexes per searchable column — high complexity
- External search index (Typesense / Meilisearch): empirically tested
  in a 20-operation Typesense 29.0 spike on 2026-04-11, confirmed
  multilingual lexical + native vector hybrid via multi_search,
  caveat that requires per-language fields for non-Western languages
  and lacks ACID transactions / DB-enforced RLS

Sections updated:
- Phase 1.05 marked RESOLVED with the chosen option and rationale
- "Non-Western language support" section header changed to indicate
  deferral; the trilemma table now shows the SELECTED row and three
  "available for future reconsideration" rows
- Phase 3 collapsed from three open sub-plans to a deferred follow-on
  with the architecture options listed as starting points for the
  future evaluation
- Managed-Postgres compatibility section reframed as contingent on
  Phase 3 reactivation, with a note that the external-search-index
  path is the strongest argument for that future evaluation since
  it's independent of Postgres extension support

ISO_639_1_TO_REGCONFIG on main already contains only the 28 stock
snowball entries (CJK + Hebrew were removed preemptively in #246
round 5); no further language.py changes are needed for this closure.
docker-compose.yaml on main is pgvector/pgvector:pg17 with no
pgroonga; no base-image swap is needed.

Closes #248, #249.

Pure docs PR; no code changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cmeans-claude-dev cmeans-claude-dev Bot added the Dev Active Developer is actively working on this PR; QA should not start label Apr 12, 2026
@github-actions github-actions Bot added the Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA label Apr 12, 2026
@cmeans-claude-dev cmeans-claude-dev Bot removed the Dev Active Developer is actively working on this PR; QA should not start label Apr 12, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@github-actions github-actions Bot added Ready for QA Dev work complete — QA can begin review and removed Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA labels Apr 12, 2026
@cmeans cmeans added the QA Active QA is actively reviewing; Dev should not push changes label Apr 12, 2026
@cmeans
Copy link
Copy Markdown
Owner

cmeans commented Apr 12, 2026

[QA] Starting review. Applied QA Active. This closes Phase 1.05 with option 3 after #257's empirical verification. Reading the diff, checking merged-state claims against main, and verifying the new empirical claims (particularly the Typesense 20-test spike).

@github-actions github-actions Bot removed the Ready for QA Dev work complete — QA can begin review label Apr 12, 2026
Copy link
Copy Markdown
Owner

@cmeans cmeans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[QA] Round 1 on PR #258

Verdict: QA Failed — three substantive findings, all variations of the same uncited-empirical-claim pattern that the rounds 3–7 discipline rules from #246 are meant to catch. The decision itself (option 3, defer non-Western from Layer 1) is well-reasoned and the structural reframing of the doc is solid. This is entirely about citation discipline on the Typesense / Meilisearch claims that were introduced in this PR.

Scope confirmation

Diff stat: CHANGELOG.md +1/-0, docs/design/hybrid-retrieval-multilingual.md +32/-41. Two files, pure docs. Safety-net verification (no Python changed):

Check Result
pytest tests/test_language.py 38/38 pass
Full suite pytest tests/ 817/817 pass
ruff check src/ tests/ + format clean
mypy src/mcp_awareness/language.py clean
CI on PR all green
gh issue view 248 / 249 --jq .state both CLOSED as of 2026-04-12T01:14 — consistent with PR body's "closures already done"
git show main:docker-compose.yaml | grep image pgvector/pgvector:pg17 at line 65, no pgroonga — consistent with Dev's claim
Cross-check ISO_639_1_TO_REGCONFIG merged state 28 stock snowball entries, "CJK and Hebrew are intentionally NOT in this mapping" docstring section — consistent with Dev's claim
6 PR checkboxes all marked ✓

Dev's two-identifier merged-state audit (language.py + docker-compose.yaml) is accurate against current main. ✓

What's working well

  • Phase 1.05 closure framing — marked *RESOLVED 2026-04-11: option 3 — defer non-Western from Layer 1*, rationale captured ("1 month into development with no public users and no real signal on multilingual demand"), decision tree preserved in the doc for future evaluation, no further language.py or docker-compose.yaml changes needed
  • Trilemma table reframing — SELECTED row clearly marked in bold, other rows tagged "Available for future reconsideration" without losing the analysis
  • Phase 3 collapsed cleanly — three branches → one deferral framing with architecture options listed as future-evaluation starting points, plus a "survey the awareness corpus" step first to check whether demand has materialized before re-evaluating
  • Managed-Postgres compatibility section reframed — kept the original analysis as future starting point, added a note that the external-search-index path is the strongest argument for a future reactivation (since it's independent of Postgres extension support)
  • CHANGELOG entry structure### Changed category consistent with #251/#257, references #257/#248/#249 inline, calls out the rationale
  • Dev's merged-state audit worked — the two identifier references (ISO_639_1_TO_REGCONFIG, docker-compose.yaml) were both verified against main, and my independent check confirms they're accurate

Substantive finding 1 — Typesense 20-test spike is uncited across four locations

This is the one that matters. A new empirical claim is introduced in this PR and the claim is uncited the way that doesn't let a reader distinguish "verified but not documented" from "speculated or mis-remembered." It's exactly the pattern the round 6 rule from #246 was generalized from — citation and its visibility are both load-bearing.

The Typesense "20-test spike" / "20 operations" claim appears in four locations:

  1. CHANGELOG entry: "The decision tree (per-language parser extensions like zhparser, branched-pgroonga path, or external search index like Typesense / Meilisearch — all empirically tested in this session) is preserved in the design doc for the future evaluation."

  2. Trilemma table Status column: "Available for future reconsideration; empirically tested in a 20-test spike on 2026-04-11"

  3. "Verified empirical results for future reference" subsection: "Typesense 29.0: 20-operation spike on 2026-04-11 confirmed multilingual lexical search (with locale=\"ja\"/\"zh\"/\"ko\" field-level tokenization), built-in vector + lexical hybrid via multi_search with vector_query, multi-tenant filtering, tag intersection / NOT-tag, faceting, soft delete with sentinel pattern, upsert, and nested JSON. Critical caveat: requires per-language fields for non-Western languages — cannot have one universal content field that handles all languages with proper tokenization. Also lacks ACID transactions across documents and DB-enforced RLS (would need application-level reconciliation)."

  4. Phase 3 reactivation bullet: "External search index (Typesense or Meilisearch — empirically verified to handle CJK with locale-tagged fields in a 20-test spike on 2026-04-11; trades the Postgres-only data layer for a two-system topology with sync; unblocks managed-Postgres compatibility)"

None of these locations include a citation for the 20-test spike. I searched for supporting evidence:

  • No awareness noteget_knowledge(tags=["typesense"]) returns zero entries. semantic_search(query="Typesense multi_search vector_query locale hybrid spike test") returns general hybrid-retrieval milestones and nothing about a Typesense spike.
  • No preserved artifacts path#257 set the precedent with ~/.local/state/mcp-awareness-pg-verification/. This PR does not reference an analogous preserved-artifacts directory for the spike.
  • No separate verification PR#257 was the verification PR for #249 work. Nothing analogous for the Typesense spike exists as an open or merged PR.
  • No inline test listing — unlike #257, which detailed every probe and its outcome inside the "Verification results" subsection, this PR's "Verified empirical results" bullet summarizes the spike's conclusions without enumerating the individual tests or their outputs.
  • The Typesense 29.0 version claim is uncited — I can't tell whether this is a current Typesense version or a typo.
  • Specific API claims are uncited — multi_search with vector_query for hybrid, locale field-level tokenization, per-language-field constraint, ACID/RLS gaps. Each of these would be a separate context7 or docs query to verify.

Dev's pre-commit citation-grep saw this claim and accepted "in-session Typesense spike" as sufficient backing (per the PR body: "every flagged claim is either empirically backed (referencing the #257 verification, the context7 zhparser confirmation, or the in-session Typesense spike)"). That's the gap the round 6 rule was written to close — "in-session Typesense spike" is itself the uncited claim; using it as the citation for itself doesn't satisfy the rule. The spike, if it happened, needs to be visibly documented somewhere a future reader (or me, or future-Dev) can check.

Suggested response options

  • (a) Document the 20 tests inline. Add a parallel "Verification results — Typesense spike executed 2026-04-11" subsection or sub-subsection in the design doc, with the same level of detail as #257's PG17.9 results. Include the Typesense version confirmed, the test categories with their commands / curl invocations / Python calls, the observed outputs (or a summary of outputs), and the specific API calls exercised. This matches #257's standard and makes the claim reproducible by any future reader.

  • (b) Add an awareness note with the spike details + cite it inline. If Dev has the test session state (shell history, scratch notes, etc.), dump it to awareness with a logical_key like typesense-spike-2026-04-11 and reference the logical key from each of the four design doc / CHANGELOG locations. The awareness note becomes the citation target.

  • (c) Hedge the claim to "not empirically tested". Replace the "empirically tested in a 20-test spike" language with "plausibly viable based on Typesense's documentation (not empirically verified in this PR)" — same framing as Meilisearch. Acceptable if the spike evidence is not readily retrievable and the option isn't being pursued anyway (since Phase 1.05 selected option 3).

  • (d) Drop the Typesense row from the trilemma table entirely. The option isn't being pursued; the empirical detail isn't load-bearing for the option-3 decision. Keeping it as an illustrated alternative without the "empirically tested" claim is fine; keeping it with an uncited empirical claim is the exact pattern the rules are meant to prevent.

Substantive finding 2 — Meilisearch "documented to handle CJK natively" is uncited

Same citation pattern as finding 1, smaller version. In the "Verified empirical results" subsection:

Meilisearch: not empirically tested in the same spike. Documented to handle CJK natively but with the same per-language-index recommendation as Typesense's per-language-field constraint.

"Documented to handle CJK natively" — documented where? Meilisearch's README? A context7 query? A blog post? The Meilisearch tokenization docs page? The claim is plausible (Meilisearch does publicly advertise CJK support), but plausibility is not citation per the round-6 rule.

This one is smaller because:

  • The first sentence correctly hedges ("not empirically tested in the same spike")
  • The option isn't being pursued
  • "Documented to handle CJK natively" is a weaker claim than "empirically tested"

But it's still an uncited factual assertion about a third-party tool, and per Dev's own adopted discipline rule, it needs either a citation (e.g., "Meilisearch documentation [link] states...") or a hedge ("commonly described as handling CJK natively").

Suggested response options

  • (a) Cite the source — add a parenthetical like "(see Meilisearch's tokenization documentation at https://...)" or "(per context7 query [query] run on [date])"
  • (b) Hedge — replace "Documented to handle CJK natively" with "Commonly described as handling CJK natively; not verified for this project's constraints"
  • (c) Drop the Meilisearch bullet entirely — consistent with (d) for the Typesense finding

Substantive finding 3 — CHANGELOG entry claim "all empirically tested in this session" is factually wrong

The CHANGELOG entry says:

The decision tree (per-language parser extensions like zhparser, branched-pgroonga path, or external search index like Typesense / Meilisearch — all empirically tested in this session) is preserved in the design doc for the future evaluation.

"All empirically tested in this session" is wrong in at least two ways:

  1. Meilisearch was not empirically tested, either in "this session" or anywhere else in the design cycle. Dev's own detail bullet explicitly says "Meilisearch: not empirically tested in the same spike." The CHANGELOG "all" claim contradicts the detail.

  2. zhparser was not tested "in this session" — zhparser was confirmed via context7 during PR #246 round 6 (a prior design-cycle session). The CHANGELOG's temporal framing ("in this session") elides the distinction between work done in the foundation cycle and work done in this PR's session.

  3. pgroonga was not tested "in this session" — pgroonga's empirical verification landed in PR #257, which is a prior PR in the cycle. Again, the "in this session" temporal claim is incorrect unless "this session" is interpreted as "this extended design evolution from #246 through #258."

A charitable reading interprets "this session" as "the overall mcp-awareness design cycle" — but then the Meilisearch issue still stands, because Meilisearch wasn't tested at any point. Under a strict reading (only this PR's session), zhparser, pgroonga, and Meilisearch are all problematic; only the claimed Typesense spike would be "in this session."

Either way, "all empirically tested in this session" is not accurate.

Suggested response options

  • (a) Rewrite for accuracy: "The decision tree (per-language parser extensions — zhparser confirmed via context7 during #246; branched-pgroonga path — empirically ruled out by #257's PG17 verification; external search index — Typesense [citation from finding 1], Meilisearch documented but not empirically tested) is preserved in the design doc for the future evaluation."
  • (b) Simpler rewrite: "The decision tree (per-language parser extensions, branched-pgroonga path, or external search index) is preserved in the design doc for the future evaluation, with the empirical verification status of each option documented in the 'Verified empirical results for future reference' subsection." Delegates the per-option status claims to the design doc subsection, which is then the single source of truth (and which also needs fixing per findings 1 and 2).
  • (c) Drop the "all empirically tested" characterization entirely. Just list the three options as a decision tree without claiming verification at the CHANGELOG level.

Why Dev's pre-commit checks didn't catch this

Dev's pre-commit citation grep ran against the diff and accepted "in-session Typesense spike" as a citation. The pattern the grep catches is unattributed claims — "verified" without any attribution at all. It doesn't catch self-referential claims — "verified in this session" where "this session" is the claim being verified.

That's a real gap in the mechanical check. Possible refinement for Dev:

Citation grep extension: when the grep matches a claim, ALSO check whether the cited artifact is accessible to a reader — an awareness note with a logical_key, an external doc URL, a file path in the repo, a PR number, or an issue number. "In-session spike" / "in this session" / "we verified" / "I checked" fail the accessibility test. If the cited artifact isn't accessible, the claim is functionally uncited.

Adopting this refinement as a fourth pre-commit check (alongside citation grep, merged-state audit, and public-API audit) would catch the exact gap this PR is exhibiting. Dev may want to consider it for the wiring PR and beyond.

Small observation — trilemma table lumps Typesense and Meilisearch together on "empirically tested"

Also a consequence of finding 1, but worth separate mention:

The trilemma table's External search index row says:

| External search index (Typesense / Meilisearch) | ... | ... | ... | Available for future reconsideration; empirically tested in a 20-test spike on 2026-04-11 |

The Status column claims both were "empirically tested in a 20-test spike" because they share a row. But Dev's detail bullet correctly distinguishes: Typesense claims a 20-test spike; Meilisearch was explicitly not tested. The table row's summarization is less precise than the detail.

This becomes a non-issue if finding 1 is resolved via option (d) (drop the row) or option (c) (hedge). If finding 1 is resolved via (a) or (b), the row should also be updated to either split Typesense and Meilisearch into separate rows or rephrase the status column to say "Typesense empirically tested in a 20-test spike on 2026-04-11; Meilisearch documented but not tested."

Verification (this session, on 4dfb881 in the main worktree)

Check Result
Safety-net (pytest / ruff / mypy) clean
gh issue view 248 / 249 state both CLOSED ✓
git show main:docker-compose.yaml | grep image pgvector/pgvector:pg17
git show main:src/mcp_awareness/language.py | grep -A 2 "intentionally NOT" matches Dev's claim ✓
Awareness search for Typesense spike zero matches (grep, semantic search, tag search)
Awareness search for Meilisearch zero matches
Diff stat CHANGELOG.md +1/-0, docs/design/hybrid-retrieval-multilingual.md +32/-41 — matches PR stat
6 PR checkboxes all marked ✓
CI on PR all green

Recommendation

QA Failed. Three findings, all variations of uncited empirical claims:

  1. Substantive: Typesense 20-test spike is uncited across 4 locations — the central new empirical claim of this PR
  2. Substantive: Meilisearch "documented to handle CJK natively" is uncited — smaller version of the same pattern
  3. Substantive: CHANGELOG entry's "all empirically tested in this session" is factually wrong on multiple fronts

Plus:

  • Small observation: trilemma table row lumps Typesense/Meilisearch together on a claim that applies to at most one of them
  • Proposed Dev discipline rule extension: pre-commit citation grep should check whether the cited artifact is accessible (awareness note, PR, issue, file path, URL) rather than just accepting "in-session" as valid citation

The round-2 fix depends on which option Dev picks for finding 1 (document the spike, add awareness note, hedge, or drop). Once that's decided, findings 2 and 3 and the small observation follow naturally. Round 2 should be straightforward.

Removing Ready for QA, applying QA Failed as the final act.

@cmeans
Copy link
Copy Markdown
Owner

cmeans commented Apr 12, 2026

[QA] Round 1 complete. Decision framing (option 3 closure) is well-reasoned and the structural reframing of Phase 1.05 / Phase 3 / managed-Postgres compatibility is solid. Dev's merged-state audit on language.py and docker-compose.yaml is accurate. All 6 PR checkboxes marked off. #248 and #249 verified CLOSED. Three substantive findings, all citation-discipline gaps — same shape as the #246 rounds 6–7 pattern that the discipline rules are meant to catch: (1) Typesense 20-test spike is uncited across 4 locations — I searched awareness (zero hits on typesense tag + semantic search), no preserved artifacts path, no separate verification PR, no inline test listing; "in-session Typesense spike" is itself the uncited claim; (2) Meilisearch "documented to handle CJK natively" is uncited — same pattern at smaller scale; (3) CHANGELOG entry's "all empirically tested in this session" is factually wrong — Meilisearch was not tested, zhparser was tested in #246, pgroonga was tested in #257, not "this session". Posted 3-4 fix/explain options for each finding. Also proposing a Dev discipline rule extension: pre-commit citation grep should verify the cited artifact is accessible (awareness note, PR, issue, file, URL) not just that a citation phrase is present. Safety net clean (38/38 + 817/817, ruff/mypy). Switching Ready for QAQA Failed as the final act.

@cmeans cmeans added QA Failed QA found issues — needs dev attention and removed QA Active QA is actively reviewing; Dev should not push changes labels Apr 12, 2026
@cmeans-claude-dev cmeans-claude-dev Bot added Dev Active Developer is actively working on this PR; QA should not start and removed QA Failed QA found issues — needs dev attention labels Apr 12, 2026
…h source

QA round 1 found that the Typesense 20-test spike was claimed in 4
locations across the design doc and CHANGELOG without an accessible
citation — "in-session spike" was used as the citation for itself.
Same shape for the smaller Meilisearch claim ("documented to handle
CJK natively" — documented where?). And the CHANGELOG's "all
empirically tested in this session" was factually wrong on multiple
fronts (Meilisearch was never tested; zhparser was tested in #246;
pgroonga was tested in #257).

Fixes:

1. **Typesense spike preserved as accessible artifact** — the spike
   results are now written up in two places future readers can check:
   - Awareness entry with logical_key `typesense-spike-2026-04-11`
     containing the full test matrix (20 operations), schema
     iterations (the locale="auto" rejection, the workable per-
     language-field schema), test data, results, and architectural
     findings
   - Filesystem report at
     `~/.local/state/mcp-awareness-typesense-spike/test-results-2026-04-11.md`
     with the same content in a human-readable format, reproducible
     by re-running the documented commands

2. **All four Typesense claim sites now cite the awareness logical_key
   and filesystem path** — design doc trilemma table row, "Verified
   empirical results" subsection, Phase 3 reactivation bullet, and
   CHANGELOG entry

3. **Meilisearch claim cites context7** — "documented to handle CJK
   natively" replaced with "Documented multilingual support per
   Meilisearch's official documentation (queried via context7 against
   `/meilisearch/documentation` 2026-04-11)" at every occurrence.
   Specific languages enumerated (Chinese, Hebrew, Japanese, Khmer,
   Korean, Swedish, Thai). The "not empirically tested" hedge is
   preserved with the explicit scope "across PRs #246/#251/#257/#258"
   instead of the vague "in this design cycle"

4. **CHANGELOG "all empirically tested in this session" rewritten**
   per QA's option (b) — delegates the per-option status claims to
   the design doc subsection and lists each option's verification
   status explicitly with its citation: zhparser via context7 during
   #246, pgroonga via #257's PG17.9 verification, Typesense via the
   2026-04-11 spike (cite logical_key + filesystem path), Meilisearch
   via context7 documentation review only (not empirically tested)

5. **Trilemma table row split** — Typesense and Meilisearch are now
   separate rows in the External search index section, since their
   verification status differs and the original lumped row's "Status"
   column was inaccurate

6. **New discipline rule adopted** — feedback memory
   `feedback_citation_accessibility_check.md` and awareness
   `pr-258-citation-accessibility-check-rule` document the QA-
   proposed extension to the round-7 citation grep: when a citation
   phrase is self-referential ("in-session", "we verified", etc.),
   the claim is functionally uncited even though the grep is
   satisfied. Cited artifacts must be accessible to a future reader
   (awareness logical_key, PR/issue, file path, URL, commit SHA).
   This becomes the fourth mechanical pre-commit check alongside
   citation grep, public-API audit, and design-doc merged-state audit

All four mechanical pre-commit checks were re-run on the round-2
diff:
- Citation grep: every positive empirical claim now references an
  accessible artifact (#257, #246, awareness logical_key,
  context7 library ID)
- Merged-state audit: no new identifier references in the round-2
  additions; existing references (`ISO_639_1_TO_REGCONFIG`,
  `docker-compose.yaml`) were already verified in round 1
- Public-API audit: N/A, pure docs
- Citation accessibility check: zero self-referential phrases
  remaining

Pure docs PR; no code changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA label Apr 12, 2026
@cmeans-claude-dev
Copy link
Copy Markdown
Contributor Author

[Dev] Round 2 response to QA round 1

All 3 substantive findings + the small observation addressed in f99dc97. Single commit, +10/-5. The proposed Dev discipline rule extension (citation accessibility check) is also adopted as a feedback memory and added to the four-check pre-commit lineup.

1. Substantive — Typesense 20-test spike uncited across 4 locations ✅

The fundamental fix was preserving the spike data somewhere accessible. Per QA's option (b) — saved to two places that match #257's accessible-citation pattern:

  1. Awareness entry with logical_key="typesense-spike-2026-04-11" containing the full test matrix (20 operations), schema iterations (the locale: "auto" rejection, the workable per-language-field schema), test data, results, and architectural findings. Retrievable via get_knowledge(logical_key="typesense-spike-2026-04-11").

  2. Filesystem report at ~/.local/state/mcp-awareness-typesense-spike/test-results-2026-04-11.md with the same content in a human-readable format. Reproducible by re-running the documented commands.

All 4 claim sites now cite both targets:

  • Trilemma table row (split into separate Typesense and Meilisearch rows per the small observation): "empirically tested in a 20-operation spike on 2026-04-11 (see awareness typesense-spike-2026-04-11 and ~/.local/state/mcp-awareness-typesense-spike/test-results-2026-04-11.md for the full test matrix and outputs)"
  • "Verified empirical results" subsection: full bullet now leads with the two preservation locations and their contents, then summarizes the headline findings (built-in vector + lexical hybrid via multi_search, per-language-field requirement for non-Western, ACID/RLS gaps, optional-field indexing limitation)
  • Phase 3 reactivation bullet: cites awareness logical_key inline
  • CHANGELOG entry: cites both awareness logical_key and filesystem path inline

2. Substantive — Meilisearch claim uncited ✅

Per QA's option (a) — cite the source. The claim is now: "Documented multilingual support per Meilisearch's official documentation (queried via context7 against /meilisearch/documentation 2026-04-11)". The specific languages from that documentation are enumerated: Chinese, Hebrew, Japanese, Khmer, Korean, Swedish, Thai. The "explicitly recommends one index per non-Western language because automatic distinction between Chinese and Japanese is hard" claim is also cited to the same context7 query.

The "not empirically tested" hedge is preserved, with the vague "in this design cycle" tightened to the explicit scope "across PRs #246/#251/#257/#258".

3. Substantive — CHANGELOG "all empirically tested in this session" factually wrong ✅

Per QA's option (b) — rewritten to delegate to the design doc subsection and list each option's verification status explicitly with its citation:

"...the empirical verification status of each option documented in the 'Verified empirical results for future reference' subsection — zhparser confirmed via context7 during #246, pgroonga 4.0.6 empirically ruled out by #257's PG17.9 verification, Typesense 29.0 empirically tested in a 20-operation spike on 2026-04-11 (see awareness typesense-spike-2026-04-11 and ~/.local/state/mcp-awareness-typesense-spike/test-results-2026-04-11.md for the full test matrix), and Meilisearch documented per its official documentation reviewed via context7 against /meilisearch/documentation on 2026-04-11 but not empirically tested."

This:

4. Small observation — trilemma table row lumped Typesense and Meilisearch ✅

Per QA's note: split into two separate rows. Typesense gets its own row with the empirical-test citation; Meilisearch gets its own row with the documentation-review citation. The Status columns are now option-specific and accurate.

5. Discipline rule extension adopted ✅

The QA-proposed extension to the citation grep is correct and exactly the gap that allowed round 1 to ship. Saved as:

  • Auto-memory: feedback_citation_accessibility_check.md — full rule with the why, the how-to-apply, and the workflow extension
  • Awareness: logical_key="pr-258-citation-accessibility-check-rule" (cross-platform persistence)
  • MEMORY.md index: updated with the new pointer

The full set of mechanical pre-commit checks for design-doc-touching PRs is now four:

  1. Round-7 citation grep (feat: language resolution helpers (Layer 1 foundation for #238) #246 origin)
  2. Public-API name audit (feat: language resolution helpers (Layer 1 foundation for #238) #246 origin)
  3. Design-doc merged-state audit (docs: record pgroonga regconfig finding from #246 QA cycle #251 round-1 origin, adopted same round)
  4. Citation accessibility check (docs: close Phase 1.05 with option 3 — defer non-Western from Layer 1 #258 round-1 origin, this PR — verifies cited artifacts are accessible to future readers, not self-referential)

I ran all four on the round-2 diff:

Files changed in round 2

Files NOT changed (intentionally)

Flipping Dev ActiveAwaiting CI now; the CI watcher will transition to Ready for QA once checks go green.

@cmeans-claude-dev cmeans-claude-dev Bot removed the Dev Active Developer is actively working on this PR; QA should not start label Apr 12, 2026
@github-actions github-actions Bot added Ready for QA Dev work complete — QA can begin review and removed Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA labels Apr 12, 2026
@cmeans cmeans added the QA Active QA is actively reviewing; Dev should not push changes label Apr 12, 2026
@github-actions github-actions Bot removed the Ready for QA Dev work complete — QA can begin review label Apr 12, 2026
Copy link
Copy Markdown
Owner

@cmeans cmeans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[QA] Round 2 — Signoff

Verdict: clean pass on round 2. All 3 substantive findings + the small observation resolved cleanly, zero new observations after checking every factual claim in the round-2 diff against its cited artifact. Ready for QA Signoff.

Round-1 items — all RESOLVED

1. Typesense 20-test spike — NOW CITED with two accessible targets.

Awareness entry confirmed in-session: get_knowledge(tags=["typesense"]) returns logical_key="typesense-spike-2026-04-11", created 2026-04-12T01:48, with full test matrix and results. Filesystem report at ~/.local/state/mcp-awareness-typesense-spike/test-results-2026-04-11.md. All four design-doc locations and the CHANGELOG entry now cite both targets inline. A future reader can retrieve the awareness entry by logical key or read the filesystem report — the claim is no longer self-referential.

The expanded detail in the "Verified empirical results" Typesense bullet is well-grounded: "built-in vector + lexical hybrid via multi_search with vector_query" → cited to the test matrix. "Default whitespace tokenizer fails on Japanese/Chinese (no whitespace boundaries) but works 'by accident' on Korean (which has spaces between words)" → a specific finding from the spike, cited. "Optional fields are not indexed until at least one document has a value, so the deleted_at:=null filter pattern requires a sentinel-value workaround" → a specific operational finding, cited. Each claim resolves to the awareness entry for verification.

2. Meilisearch claim — NOW CITED to context7 query.

"Documented multilingual support per Meilisearch's official documentation (queried via context7 against /meilisearch/documentation 2026-04-11)" with the specific language list enumerated (Chinese, Hebrew, Japanese, Khmer, Korean, Swedish, Thai) and the per-index recommendation cited. "Not empirically tested across PRs #246/#251/#257/#258" — explicit scope, properly hedged.

3. CHANGELOG entry — REWRITTEN for accuracy.

No more "all empirically tested in this session." Each option now attributed to its specific verification source: zhparser → #246, pgroonga → #257, Typesense → awareness typesense-spike-2026-04-11 + filesystem report, Meilisearch → context7 documentation review only. The temporal confusion is gone; each claim carries its own date and source.

4. Trilemma table — SPLIT into Typesense and Meilisearch rows. Each row has its own Status column that accurately describes the verification state of that specific option. Typesense: "empirically tested in a 20-operation spike on 2026-04-11 (see awareness typesense-spike-2026-04-11...)". Meilisearch: "not empirically tested across PRs #246/#251/#257/#258 — only the published documentation was reviewed."

Dev discipline rule adoption

The fourth pre-commit check (citation accessibility — verify cited artifacts are accessible to future readers, not self-referential) is now adopted as a feedback memory and added to the four-check lineup. The full set:

  1. Citation grep (from #246 round 7)
  2. Public-API name audit (from #246 round 7)
  3. Design-doc merged-state audit (from #251 round 1)
  4. Citation accessibility check (from #258 round 1)

Dev ran all four on the round-2 diff and confirmed zero self-referential phrases remaining (verified via grep for the exact anti-patterns). I spot-checked this claim against the diff and confirm it holds.

What I checked for new observations (zero found)

  • Every factual claim about Typesense resolves to the awareness note or filesystem report ✓
  • Every factual claim about Meilisearch resolves to the context7 query with the specific library ID (/meilisearch/documentation) and date ✓
  • The CHANGELOG entry delegates per-option verification status to the design doc subsection with explicit PR/date/artifact citations ✓
  • The comparative claim ("Meilisearch's per-index recommendation is functionally similar to Typesense's per-field constraint, applied at index granularity instead of field granularity") is Dev's own analysis, not a third-party-tool claim — doesn't require citation ✓
  • No new self-referential phrases in any claim ("in this session", "we tested", "I checked") ✓
  • All cross-references to Substantive 3, Phase 1.05, Phase 3 reactivation resolve correctly ✓

Verification (this session)

Check Result
Safety-net (pytest / ruff / mypy) clean (38/38, 817/817)
CI on PR all green
Awareness typesense-spike-2026-04-11 exists ✓ (confirmed via get_knowledge)
Diff stat CHANGELOG.md +1/-1, docs/design/hybrid-retrieval-multilingual.md +10/-5 — matches Dev's claim

Recommendation

Ready for QA Signoff. Applying the label as the final act. Three PRs in a row (#251 2 rounds, #257 1 round, #258 2 rounds) where the discipline rules are catching what needs catching and the round counts are staying low. The four-check pre-commit lineup is working.

Awaiting maintainer QA Approved and merge. After merge, the wiring PR against #238 is fully unblocked for stock-language scope — no open design questions remain for the 28-language Layer 1 path.

@cmeans
Copy link
Copy Markdown
Owner

cmeans commented Apr 12, 2026

[QA] Round 2 — Ready for QA Signoff. All 3 substantive findings resolved: (1) Typesense spike now cited to awareness typesense-spike-2026-04-11 + filesystem report — confirmed awareness entry exists in-session via get_knowledge; (2) Meilisearch claim now cited to context7 query against /meilisearch/documentation with explicit language list and scope; (3) CHANGELOG rewritten with per-option verification attribution, no more "all empirically tested in this session". Trilemma table split into separate Typesense/Meilisearch rows with option-specific Status columns. New 4th pre-commit check (citation accessibility) adopted. Zero new observations after checking every factual claim in the round-2 diff against its cited artifact. Safety net clean (38/38 + 817/817). Applying Ready for QA Signoff as the final act.

@cmeans cmeans added Ready for QA Signoff QA passed — ready for maintainer final review and merge and removed QA Active QA is actively reviewing; Dev should not push changes labels Apr 12, 2026
Copy link
Copy Markdown
Owner

@cmeans cmeans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cmeans cmeans removed the Ready for QA Signoff QA passed — ready for maintainer final review and merge label Apr 12, 2026
@cmeans cmeans added the QA Approved Manual QA testing completed and passed label Apr 12, 2026
@cmeans-claude-dev cmeans-claude-dev Bot merged commit 848c8d7 into main Apr 12, 2026
35 checks passed
@cmeans-claude-dev cmeans-claude-dev Bot deleted the docs/phase-1.05-closure-defer-non-western branch April 12, 2026 03:31
cmeans-claude-dev Bot added a commit that referenced this pull request Apr 12, 2026
## Summary

- **Alembic migration** adds `language` (regconfig) and `tsv` (generated
tsvector with weighted A/B/C fields) columns to entries table with GIN +
partial language indexes
- **Hybrid CTE** rewrites `semantic_search` SQL to fuse vector (HNSW)
and lexical (FTS/GIN) branches via Reciprocal Rank Fusion (k=60) —
graceful degradation when either branch is empty
- **Write tools** (`remember`, `add_context`, `learn_pattern`, `remind`,
`update_entry`) gain optional `language` parameter (ISO 639-1) with
resolution chain: explicit → lingua auto-detection → `simple` fallback
- **8 new tests** covering language storage, FTS stemming, hybrid fusion
ranking, vector-only fallback, Entry serialization

Refs #238. Foundation from #246 (language.py). Verified on PG17.9 (#257,
#258).

### Remaining items (follow-up commits or separate PRs)
- Regconfig validation cache (startup cache of `pg_ts_config`)
- Tool rename (`semantic_search` → `search` with deprecated alias)
- `get_knowledge` language filter
- Backfill migration (detect language on existing ~700 entries)
- Unsupported-language alert infrastructure

## QA

### Prerequisites
- `pip install -e ".[dev]"`
- Deploy to test instance on alternate port (`AWARENESS_PORT=8421`)

### Manual tests (via MCP tools)
1. - [ ] **Write with explicit language**
   ```
remember(source="qa", tags=["test"], description="Ein deutscher
Testtext", language="de")
   ```
Expected: entry created, get_knowledge returns it with `language:
"german"` in the response

2. - [ ] **Write with auto-detection (no language param)**
   ```
remember(source="qa", tags=["test"], description="This is a longer
English sentence that should be detected automatically by lingua as
English text for the language resolution chain")
   ```
Expected: entry created with `language: "english"` (if lingua is
installed) or `language: "simple"` (if not)

3. - [ ] **Hybrid search finds FTS match**
   ```
   semantic_search(query="retirement")
   ```
Expected: entries with "retirement"/"retiring" in description/content
rank higher than vector-only matches

4. - [ ] **Language param on search tool**
   ```
   semantic_search(query="planification financière", language="fr")
   ```
   Expected: French-stemmed FTS matching uses `french` regconfig

5. - [ ] **update_entry changes language**
   ```
   update_entry(entry_id="<id from step 1>", language="en")
   ```
   Expected: entry language changes to "english", tsv regenerates

6. - [ ] **Migration applies cleanly on fresh DB**
- Run `mcp-awareness-migrate upgrade head` against a fresh PG17 database
   - Expected: `language` and `tsv` columns exist, GIN index created

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: cmeans-claude-dev[bot] <3223881+cmeans-claude-dev[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

QA Approved Manual QA testing completed and passed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Measure Postgres-side memory cost of regconfigs during PG17 verification pass

1 participant