docs: add Schema + Record and Language support guides #300

Merged: cmeans-claude-dev[bot] merged 6 commits into main from docs/schema-record-guide on Apr 16, 2026

cmeans-claude-dev[bot] (Contributor) commented Apr 16, 2026

Summary

Adds user-facing how-to guides for two major features that shipped without user docs: schema/record (v0.18.0, PR #287) and language support (v0.17.0, PR #259 et al).

Schema + Record guide (docs/schema-record-guide.md, ~470 lines)

Language support guide (docs/language-guide.md, ~250 lines)

  • How it works — the resolution chain: explicit language parameter → lingua auto-detection → simple fallback.
  • Supported languages — table of 28 Postgres snowball regconfigs mapped from ISO 639-1 codes.
  • Writing in a specific language — explicit, auto-detected, and override-on-update examples.
  • Querying by language: get_knowledge language filter + how hybrid search handles cross-language queries (vector branch is language-agnostic, FTS uses per-entry regconfig).
  • Unsupported-language alerts — what they mean, how they fire, where to find them.
  • Deployment notes — lingua install, backfill migration on v0.17.0 upgrade, regconfig validation cache.
  • What's next — Phase 2 (cross-lingual embedding model, feat: evaluate intfloat/multilingual-e5-large as Layer 2 embedding upgrade #239), Phase 3 (non-Western languages), data sovereignty.
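
The resolution chain in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the project's code: `resolve_regconfig`, `SAMPLE_REGCONFIGS`, and the injected `detect` callable (standing in for lingua) are hypothetical names, the mapping is a three-entry sample rather than the 28-entry table, and sending an unsupported explicit code straight to `simple` is an assumption.

```python
from typing import Callable, Optional

# Illustrative sample only; the guide's table maps 28 ISO 639-1 codes.
SAMPLE_REGCONFIGS = {"en": "english", "fr": "french", "de": "german"}


def resolve_regconfig(
    text: str,
    language: Optional[str] = None,
    detect: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Pick a Postgres text-search config: explicit, then detected, then 'simple'."""
    # 1. An explicit language parameter wins when it is given.
    if language is not None:
        # Assumption: an unsupported explicit code falls back to 'simple'.
        return SAMPLE_REGCONFIGS.get(language.lower(), "simple")
    # 2. Otherwise ask the detector (a stand-in for lingua auto-detection).
    if detect is not None:
        code = detect(text)
        if code is not None and code in SAMPLE_REGCONFIGS:
            return SAMPLE_REGCONFIGS[code]
    # 3. Nothing usable: fall back to the language-neutral config.
    return "simple"
```

For example, `resolve_regconfig("Le serveur NAS est dans le placard du sous-sol.", language="fr")` takes the explicit branch, while a call with no `language` and no `detect` ends at the fallback.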

Other changes

  • README.md — CLI tools bullet adds mcp-awareness-register-schema; Design docs section adds links to both guides.
  • CHANGELOG.md — two entries under [Unreleased].

Sequencing

This PR must merge before release PR #299 (v0.18.0). After merge, #299 will be rebased so the docs CHANGELOG entries land in the [0.18.0] section. Both guides ship with the release.

QA

Review checklist

    • Schema + Record guide reads cleanly end-to-end. Scan docs/schema-record-guide.md — narrative flows: why → who → walk-through → more use cases → guarantees → CLI → what's next → reference.
    • Language guide reads cleanly end-to-end. Scan docs/language-guide.md — narrative flows: how it works → supported languages → writing → querying → alerts → deployment → what's next → reference.
    • Tool-call examples are accurate. Spot-check register_schema, create_record, update_entry, remember, get_knowledge parameter names and shapes against src/mcp_awareness/tools.py. (QA round 2: all 4 accuracy issues fixed)
    • Language table is accurate. The 28 supported languages match ISO_639_1_TO_REGCONFIG in src/mcp_awareness/language.py.
    • Collapsible sections render. Open a couple <details> blocks on GitHub and confirm the schema examples inside read correctly. (Maintainer confirmed 2026-04-16.)
    • Tag Taxonomy framing is accurate. The "not wired in yet, but schema/record is the foundation" claim matches the current design stance.
    • Links resolve. All internal links (data-dictionary, design specs, GitHub issues, lingua-py, JSON Schema spec) point to the right place. (QA round 2: dead # anchors replaced with plain text)
    • README updates make sense. CLI tools list, Design docs links.
    • CHANGELOG entries under [Unreleased] reflect what actually landed.

Not in scope

  • Runtime/code changes — docs + README links only.
  • End-to-end testing of schema/record or language features (covered in their respective feature PRs).

New docs/schema-record-guide.md covers why typed data matters
(framing against free-form knowledge tools), who the feature is for
(personal collections + team/integration use), and a full worked
example: registering an album schema, creating a record, what a
validation failure looks like, update with re-validation, and why
schema deletion is blocked when live records reference it.

Extends the walk-through into the Tag Taxonomy tie-in — the Layer C
design for user-defined tag vocabularies will consume records
validated against a tag-definition schema, so schema/record doubles
as the foundation under that upcoming feature.

Six additional use cases in collapsible sections: reading list,
recipes, home inventory (with purchase/receipt URLs), subscriptions,
edge provider manifests, and meeting/bug templates. Each names the
kind of future edge provider that would naturally extend the schema
(Goodreads for books, recipe APIs for recipes, etc.) without making
the doc depend on any specific service being available.

Closes the "What's next" section with links to the REST API + schema
marketplace roadmap idea (awareness logical_key
design-schema-marketplace-import), Tag Taxonomy Layer C, and the
open P2/P3 follow-ups on main (#290, #291, #292, #293).

README updates: adds mcp-awareness-register-schema to the CLI tools
bullet; links the new guide from Design docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-actions[bot] added the "Awaiting CI" label (Dev complete, waiting for CI/Codecov to pass before QA) Apr 16, 2026
codecov[bot] commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

github-actions[bot] added "Ready for QA" (Dev work complete; QA can begin review) and removed "Awaiting CI" (Dev complete, waiting for CI/Codecov to pass before QA) Apr 16, 2026
New docs/language-guide.md covers how mcp-awareness handles
multilingual content: per-entry language detection (explicit ISO 639-1
parameter or auto-detection via lingua-py), the 28 supported Postgres
snowball regconfigs, querying by language with get_knowledge, how
hybrid search handles cross-language queries (vector branch is
language-agnostic, FTS branch uses per-entry regconfig), and
unsupported-language alerts as a demand signal for Phase 3 non-Western
language support.

Includes deployment notes: lingua install, the one-time language
backfill migration on v0.17.0 upgrade, and the regconfig validation
cache that prevents INSERT failures from invalid language values.

README links the new guide from the Design docs section.
CHANGELOG entry under [Unreleased].

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-actions[bot] added "Awaiting CI" (Dev complete, waiting for CI/Codecov to pass before QA) and removed "Ready for QA" (Dev work complete; QA can begin review) Apr 16, 2026
cmeans-claude-dev[bot] changed the title from "docs: add Schema + Record user guide" to "docs: add Schema + Record and Language support guides" Apr 16, 2026
github-actions[bot] added "Ready for QA" (Dev work complete; QA can begin review) and removed "Awaiting CI" (Dev complete, waiting for CI/Codecov to pass before QA) Apr 16, 2026
cmeans added "QA Active" (QA is actively reviewing; Dev should not push changes) and removed "Ready for QA" (Dev work complete; QA can begin review) Apr 16, 2026
cmeans (Owner) commented Apr 16, 2026

Adding QA Active — starting review with focus on how the guides present to potential users.

cmeans (Owner) left a comment

QA Review

User-facing quality assessment

These guides are aimed at people evaluating whether mcp-awareness is worth adopting. The writing is strong on framing and narrative:

  • "Why typed data?" is the right opening. The inconsistent status vs state vs missing-field problem is immediately relatable. Anyone who's worked with unstructured data recognizes it. This will land.
  • Music-collection walk-through is well-paced: register, create, fail, update, delete-blocked. Each step teaches one concept. The progression gives a reader confidence that they understand the feature by the end.
  • "Future collector" notes in collapsible sections are smart. They show the growth path without overselling. A reader evaluating the project sees "this is useful now, and the team has a plan for more." That's the right signal.
  • Language guide resolution chain (explicit, auto-detect, fallback) is clear and practical. The deployment notes are operator-friendly.
  • Collapsible use cases prevent wall-of-text fatigue while showing breadth. Good editorial choice.

However, there are accuracy issues in the tool-call examples that will burn a user's first impression if they try to follow along. A user who copies the walk-through and gets parameter-missing errors on the first call will bounce. These need to be fixed before this ships.

Findings

1. Substantive — register_schema examples missing required parameters. Every register_schema call in the guide (lines 80, 202, 246, 274, 303, 333) omits the required source and tags parameters. The actual signature is register_schema(source, tags, description, family, version, schema, ...). A user (or an LLM agent following the guide) who copies these examples will get a missing-parameter error on the first try.

Fix options:

  • Preferred: Add source and tags to every example so they're copy-pasteable. E.g., the album example becomes register_schema(source="personal", tags=["music", "schema"], description="...", family="album", version="1", schema={...}).
  • Acceptable: Add a visible note near the top of the walk-through (not buried in a footnote) explaining that administrative parameters (source, tags, learned_from) are omitted for clarity and will be filled in by the agent contextually. I'd still prefer complete examples for the primary walk-through and only abbreviate the collapsible sections.

2. Substantive — create_record examples missing required parameters. Every create_record call (lines 114, 224) omits source, description, and logical_key. All three are required. logical_key is especially important because it drives upsert behavior, and description is what makes the record discoverable via get_knowledge. Same fix options as finding 1.

3. Substantive — update_entry parameter name wrong. Line 161 uses id= but the actual parameter is entry_id=. Direct runtime error.

4. Substantive — get_alerts(tags=...) doesn't exist. Language guide line 180 shows get_alerts(tags=["language", "unsupported"]). The get_alerts tool has no tags parameter. Its signature is get_alerts(source, since, mode, limit, offset). This will produce a runtime error. The correct approach is either get_alerts() with manual filtering, or a search(query="unsupported language") call.

5. Substantive — Dead # anchor links. Schema-record guide lines 38 and 232 link [Tag Taxonomy v2 design](#) and [Tag Taxonomy Layer C](#) to #, which is a self-anchor that navigates to the top of the current page. For a user clicking through, this looks broken. Either link to the actual design doc (if it exists outside this repo), use a plain-text reference instead of a link, or link to the relevant GitHub issue.
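
A check along these lines could catch findings 1 to 4 before a guide ships. It is a sketch under stated assumptions: the parameter sets below are copied from the names quoted in this review rather than read from tools.py, `check_call` is a hypothetical helper, treating entry_id as required for update_entry is an assumption, and only get_alerts gets an unknown-parameter check because it is the only tool whose full parameter list is quoted here.

```python
import ast

# Required parameters as named in this review (assumed, not read from tools.py).
REQUIRED = {
    "register_schema": {"source", "tags", "description"},
    "create_record": {"source", "tags", "description", "logical_key"},
    "update_entry": {"entry_id"},
}

# Full keyword list is only quoted for get_alerts, so only it is checked
# for unknown parameters.
ALLOWED = {"get_alerts": {"source", "since", "mode", "limit", "offset"}}


def check_call(snippet: str) -> list:
    """Return a list of problems found in one documented tool call."""
    call = ast.parse(snippet, mode="eval").body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        raise ValueError("expected a single call like name(key=value, ...)")
    name = call.func.id
    # Keyword names used in the example (ignores **kwargs, where arg is None).
    given = {kw.arg for kw in call.keywords if kw.arg is not None}
    problems = [
        f"{name}: missing required parameter '{param}'"
        for param in sorted(REQUIRED.get(name, set()) - given)
    ]
    if name in ALLOWED:
        problems += [
            f"{name}: unknown parameter '{param}'"
            for param in sorted(given - ALLOWED[name])
        ]
    return problems
```

Run against the original examples, `check_call('update_entry(id="abc")')` reports the missing entry_id, and `check_call('get_alerts(tags=["language", "unsupported"])')` reports the unknown tags parameter.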

What verified clean

| Check | Status |
| --- | --- |
| Narrative flow (both guides) | Clear, well-paced, good for the target audience |
| Language table (28 entries) | Exact match against ISO_639_1_TO_REGCONFIG in language.py |
| Internal file links | data-dictionary.md, hybrid-retrieval-multilingual.md, schema-record-entry-types-design.md all exist |
| GitHub issue refs | #288, #239, #290, #291, #292, #293 all exist and are open |
| External links | JSON Schema spec, lingua-py, Postgres docs |
| Tag Taxonomy framing | "Not wired in yet, but foundation" matches current state |
| README updates | CLI tools list and Design docs links are accurate |
| CHANGELOG entries | Both entries under [Unreleased], correctly placed |
| CI | All green: lint, test, typecheck, codecov, license/cla |

Verdict

Applying QA Failed. The guides are well-written and the framing will serve the project well, but 5 accuracy issues in tool-call examples will break the walk-through for anyone who tries to follow along. All are straightforward fixes.

Classification:

  • Findings 1-4: needs-PR-before-ship (accuracy errors in runnable examples)
  • Finding 5: needs-PR-before-ship (dead links in user-facing docs)

cmeans (Owner) commented Apr 16, 2026

Applying QA Failed — 5 substantive findings, all in tool-call example accuracy. The writing and framing are strong; the examples need parameter corrections so users can actually follow along. Details in the review above.

cmeans added "QA Failed" (QA found issues; needs dev attention) and removed "QA Active" (QA is actively reviewing; Dev should not push changes) Apr 16, 2026
Schema + Record guide:
- Fix all register_schema examples: add required source, tags,
  description params to primary walk-through examples
- Fix all create_record examples: add required source, tags,
  description, logical_key params
- Fix update_entry: id= → entry_id= (correct param name)
- Fix dead # anchor links for Tag Taxonomy references
- Add brevity note before collapsible sections explaining that
  abbreviated examples omit required admin params shown above
- Add lifecycle Mermaid diagram (register → create → validate →
  update → delete-blocked)

Language guide:
- Fix get_alerts(tags=...) → search(query=..., entry_type="alert")
  since get_alerts has no tags parameter
- Add resolution-chain Mermaid diagram (explicit → lingua → simple)
- Add hybrid-search two-lane Mermaid diagram (vector + FTS → RRF)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-actions[bot] added "Awaiting CI" (Dev complete, waiting for CI/Codecov to pass before QA) and removed "QA Failed" (QA found issues; needs dev attention) Apr 16, 2026

cmeans-claude-dev[bot] (Contributor, Author) commented

Dev response to QA findings

All 5 findings addressed in commit 34a89f4.

Finding 1 — register_schema missing required params. Fixed in primary walk-through: album schema and tag-definition schema examples now include source, tags, description in correct positions. Collapsible use-case sections use abbreviated examples (schema body only) with a visible note above them explaining the omission: "every register_schema call also requires source, tags, and description (as shown in the primary walk-through above)."

Finding 2 — create_record missing required params. Fixed in both primary examples: OK Computer and Kid A now include source, tags, description, logical_key. The tag-definition record example also has all required params. Same brevity note covers the collapsible sections.

Finding 3: update_entry param name wrong. id= → entry_id=. Fixed.

Finding 4 — get_alerts(tags=...) doesn't exist. Replaced with search(query="unsupported language", entry_type="alert") + a plain-text note about browsing get_alerts() directly.

Finding 5 — Dead # anchor links. Replaced both [Tag Taxonomy v2 design](#) and [Tag Taxonomy Layer C](#) with plain text (no link) since the design doc lives in awareness, not in a repo-accessible file.

Bonus — Mermaid diagrams added:

  • Schema/record guide: lifecycle flowchart (register → create → validate → update → delete-blocked)
  • Language guide: resolution chain flowchart (explicit → lingua → simple fallback) + hybrid search two-lane diagram (vector + FTS → RRF)

Re-requesting QA.

github-actions[bot] added "Ready for QA" (Dev work complete; QA can begin review) and removed "Awaiting CI" (Dev complete, waiting for CI/Codecov to pass before QA) Apr 16, 2026
cmeans added "QA Active" (QA is actively reviewing; Dev should not push changes) and removed "Ready for QA" (Dev work complete; QA can begin review) Apr 16, 2026

cmeans (Owner) commented Apr 16, 2026

Adding QA Active — starting re-review of fixes for 5 substantive findings.

cmeans (Owner) left a comment

QA Re-Review — Round 2

All 5 findings from round 1 are resolved.

Finding 1 (register_schema missing params): Fixed

Primary walk-through examples now include source, tags, description. Collapsible "More use cases" section has a visible note explaining abbreviated examples inherit the same required params shown in the walk-through. Good editorial choice — complete where it counts, abbreviated where it's clearly supporting material.

Finding 2 (create_record missing params): Fixed

Both primary examples now include source, tags, description, logical_key. The logical_key values (album-ok-computer, album-kid-a, tag-music-genre-rock-alternative) are well-chosen: human-readable, unique, and demonstrate the upsert convention.

Finding 3 (update_entry id → entry_id): Fixed

Line 184: entry_id="<the record id>". Matches actual signature.

Finding 4 (get_alerts tags param): Fixed

Replaced with search(query="unsupported language", entry_type="alert"). Verified: search accepts entry_type and "alert" is a valid EntryType value. Also added get_alerts() as a browse alternative for users without an embedding provider. Both paths are accurate.

Finding 5 (dead # anchor links): Fixed

Both occurrences replaced with plain text ("Tag Taxonomy v2 design" and "Tag Taxonomy Layer C") — no link, no dead anchor. Clean.

Bonus: Mermaid diagrams

Three new Mermaid diagrams added:

  • Lifecycle (schema-record guide) — register → create → valid? → stored/rejected + delete protection. Clear visual of the feature's guarantees.
  • Language resolution (language guide) — flowchart of explicit → auto-detect → fallback chain. Matches the prose exactly.
  • Hybrid search (language guide) — vector + FTS → RRF → merged results.

These are solid additions for the target audience. GitHub renders `mermaid` fenced code blocks natively.
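
For readers unfamiliar with the last step of the hybrid search diagram, reciprocal rank fusion (RRF) can be sketched as below. This is a generic illustration of the technique, not the project's implementation: `rrf_merge` is a hypothetical name, and `k=60` is the constant commonly used with RRF, assumed here because the review does not say which value the project uses.

```python
def rrf_merge(rankings, k=60):
    """Fuse several ranked lists of ids with reciprocal rank fusion.

    Each id scores the sum of 1 / (k + rank) over the lists it appears in,
    with rank starting at 1. Higher total score ranks first.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# One list per search branch, e.g. vector similarity and full-text search.
vector_hits = ["a", "b", "c"]
fts_hits = ["b", "d"]
merged = rrf_merge([vector_hits, fts_hits])
```

An entry found by both branches ("b" here) accumulates two contributions and moves ahead of entries found by only one branch.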

Remaining checkbox

Step 5 (collapsible sections render) requires a browser check — I can't render GitHub <details> blocks from CLI. Flagging for the maintainer to verify before applying QA Approved.

Re-verification

  • Tool params re-checked against tools.py: register_schema, create_record, update_entry, search all match
  • EntryType.ALERT confirmed in schema.py, so search(entry_type="alert") is valid
  • All internal links still resolve (no regressions from the fix commit)
  • CI all green: lint, test 3.10/3.11/3.12, typecheck, codecov, license/cla

Verdict

Applying Ready for QA Signoff. All accuracy issues fixed, examples are now copy-pasteable, Mermaid diagrams are a nice addition. One checkbox deferred to maintainer (collapsible section rendering).

cmeans (Owner) commented Apr 16, 2026

Applying Ready for QA Signoff — all 5 round-1 findings fixed, tool-call examples now match actual signatures, Mermaid diagrams are a nice addition. One checkbox (collapsible section rendering) needs a browser check from the maintainer. Over to you for QA Approved.

cmeans added the "Ready for QA Signoff" label (QA passed; ready for maintainer final review and merge) Apr 16, 2026

cmeans-claude-dev[bot] (Contributor, Author) commented

Dev response to round 4 QA findings

Three inline findings on the embedding-languages section addressed in commit 0b738f0.

Line 116 — "terrible table, 3 columns with the same heading". Fair. Replaced the space-saving 3-column grid with a proper alphabetical bullet list of the 12 Granite training languages. Cleaner and actually scannable.

Line 118 — "link to outside providers, libraries we rely on, they deserve love too". Rewrote the Reference section as "Reference and credits" with explicit credit and links to every upstream project multilingual support rests on:

  • IBM Granite (the embedding model we default to) — model card, paper, HuggingFace
  • Meta AI FAIR team (XLM-RoBERTa) — paper, HuggingFace, CC-100 context
  • Ollama (local serving)
  • lingua-py (Peter M. Stahl — language detection)
  • Hugging Face (model hosting)
  • PostgreSQL (FTS infrastructure)
  • pgvector (Andrew Kane — vector index)
  • Snowball (Martin Porter et al. — the actual stemmers behind the 28 FTS regconfigs)

Line 121 — "what are those other languages? No listing". Added the full list of ~100 XLM-RoBERTa languages in a collapsible <details> block, sourced from the fairseq XLM-R docs. Kept collapsed by default so the section stays scannable, but anyone curious about coverage for a specific language can expand it.

Also reframed the section to explain why we picked Granite (open-weight, enterprise-licensed, 768 dims for our HNSW index, runs on modest hardware) so readers understand the choice, not just the result.

Re-requesting QA.

github-actions[bot] added "Ready for QA" (Dev work complete; QA can begin review) and removed "Awaiting CI" (Dev complete, waiting for CI/Codecov to pass before QA) Apr 16, 2026

cmeans (Owner) left a comment

A few minor issues, and at least one new Issue to create to try to return better error messages.

Comment thread docs/language-guide.md
Comment on lines +198 to +203
remember(
description="Le serveur NAS est dans le placard du sous-sol.",
source="personal",
tags=["infra", "nas"],
language="fr"
)
Owner

Wouldn't this example be better if the source value and tags were also in french?

I realize we'd need a fr sensitive tools version for the tool and parameters to be in French as well, but until then...

Owner

Maybe there's value in showing both... Show that an English user can enter data in French or German or whatever, and that a French user can do so also.

"error": "validation_failed",
"schema_ref": "album:1",
"validation_errors": [
{"path": "/year", "message": "'2000' is not of type 'integer'"}
Owner

This is not as helpful an error message as it looks. Some users may not understand that the quotes around the number change its format. Our messages should also suggest how to fix the error if it's obvious. An AI would understand the message, however, it would also avoid the problem entirely if it knew to quickly pull the schema to validate client-side.

Can we provide more structured errors (as we have with our native tools)? I realize the scope is larger here, but it seems at least that there may be some patterns we can be helpful with at least.

I'd suggest creating an Issue so we can handle this better.

- [Schema/Record design doc](superpowers/specs/2026-04-13-schema-record-entry-types-design.md)
— the design this implementation shipped from.
- [JSON Schema Draft 2020-12](https://json-schema.org/draft/2020-12)
— the external spec schemas conform to.
Owner

All documentation pages should have our standard Awareness ecosystem copyright notice etc., yes?

cmeans added "QA Failed" (QA found issues; needs dev attention) and removed "Ready for QA" (Dev work complete; QA can begin review) Apr 16, 2026
Four findings from QA:

1. Missing ecosystem copyright footer on both guides. Added the
   standard "Part of the Awareness ecosystem" footer (matching the
   pattern used in data-dictionary.md, case-studies.md, vision.md,
   deployment-guide.md).

2. French "remember" example showed localized description but kept
   source/tags in English — feels half-committed. Reworked the
   section to show BOTH realistic scenarios explicitly:
   (a) primarily-English user writing a single French entry
       (English source/tags, French description),
   (b) primarily-French user writing in French natively
       (French source/tags/description).
   Also added a symmetric note that a French user could write in
   English when convenient, and a heads-up that MCP tool names and
   parameters are themselves English-only today.

3. Validation-error example in schema-record guide overstated how
   helpful the jsonschema-sourced message is. Added a "Heads-up"
   callout acknowledging:
   - The quoted-vs-unquoted distinction is subtle for non-experts
   - The message doesn't suggest a fix
   - Tracked as #301 for richer typed envelopes
   - Agents can pre-validate client-side by fetching the schema via
     get_knowledge(tags=["schema"]) to avoid the round-trip
   Also added #301 to the "What's next" section.

4. Filed new issue #301 (feat: structured, actionable validation
   error envelopes for create_record / update_entry) with the design
   sketch, scope, acceptance criteria, and references the QA
   comment that surfaced it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-actions[bot] added "Awaiting CI" (Dev complete, waiting for CI/Codecov to pass before QA) and removed "QA Failed" (QA found issues; needs dev attention) Apr 16, 2026

cmeans-claude-dev[bot] (Contributor, Author) commented

Dev response to round 5 QA findings

All 4 items addressed in commit 6bae9e0.

1. Ecosystem copyright footer. Added to both guides, matching the standard pattern used in data-dictionary.md, case-studies.md, vision.md, and deployment-guide.md:

Part of the [Awareness](https://github.com/cmeans/mcp-awareness) ecosystem. © 2026 Chris Means

2. French example — show both directions + localized source/tags. Reworked the "Explicit language" section to show both realistic scenarios explicitly:

  • Primarily English-speaking user writing a single French entry (English source/tags, French description) — the "I own a property in France" scenario
  • Primarily French-speaking user keeping notes in French (French source/tags/description) — since those values are just labels you'll search by later

Plus a note that a French user could write in English when convenient (symmetry), and a heads-up that the MCP tool names and parameter names are still English-only today (future i18n pass out of scope).

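3. Validation error message honesty. The jsonschema-sourced message is accurate but not as actionable as it looks. Added a "Heads-up" callout after the error block acknowledging:

  • The quoted-vs-unquoted distinction is subtle for non-experts
  • The message doesn't suggest a fix
  • Tracked as #301 for richer typed envelopes
  • Agents can pre-validate client-side by fetching the schema via get_knowledge(tags=["schema"]) to avoid the round-trip

Also added #301 to the "What's next" section.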

4. New issue filed: #301, "feat: structured, actionable validation error envelopes for create_record / update_entry". Includes:

  • Background (with the QA comment as the trigger)
  • Proposal: typed per-keyword envelope with expected_type, actual_type, actual_value, optional suggestion, preserving raw jsonschema message for debug
  • Scope: 6 keyword patterns to cover (type, required, enum, min/max, pattern, additionalProperties)
  • Helper location (src/mcp_awareness/validation.py)
  • Agent-side improvement suggestion (tool description nudge to fetch schema before writing)
  • Acceptance criteria + references
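
To make the proposal concrete, here is one possible shape for the `type` keyword envelope. It is only a sketch of the idea in #301, not its design: `type_error_envelope` and the `keyword` field are hypothetical, the field names expected_type, actual_type, actual_value, and suggestion come from the proposal summary above, and the suggestion wording is invented for illustration.

```python
from typing import Any

# Python types mapped to JSON Schema type names.
_JSON_TYPE_NAMES = {
    bool: "boolean", int: "integer", float: "number", str: "string",
    list: "array", dict: "object", type(None): "null",
}


def type_error_envelope(
    path: str, expected_type: str, actual_value: Any, raw_message: str
) -> dict:
    """Build a structured envelope for a failed `type` check."""
    actual_type = _JSON_TYPE_NAMES.get(
        type(actual_value), type(actual_value).__name__
    )
    envelope = {
        "path": path,
        "keyword": "type",  # hypothetical field naming the failed keyword
        "expected_type": expected_type,
        "actual_type": actual_type,
        "actual_value": actual_value,
        "message": raw_message,  # raw validator message kept for debugging
    }
    # Obvious fix: a quoted number where an integer was expected.
    if expected_type == "integer" and isinstance(actual_value, str):
        try:
            as_int = int(actual_value)
        except ValueError:
            return envelope
        envelope["suggestion"] = f"Send {as_int} as a number, without quotes."
    return envelope


env = type_error_envelope(
    "/year", "integer", "2000", "'2000' is not of type 'integer'"
)
```

For the album example, `env` carries `actual_type` "string" alongside `expected_type` "integer", which names the quoted-number problem directly instead of leaving the reader to infer it from the quotes.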

Re-requesting QA.

github-actions[bot] added "Ready for QA" (Dev work complete; QA can begin review) and removed "Awaiting CI" (Dev complete, waiting for CI/Codecov to pass before QA) Apr 16, 2026

cmeans (Owner) left a comment

LGTM (One observation, but I'll let it pass as it was temporary.)

Comment on lines +72 to +83
```mermaid
flowchart LR
A["Register Schema\n(immutable)"] --> B["Create Record\n(validated)"]
B --> C{Valid?}
C -- Yes --> D["Stored ✓"]
C -- No --> E["Rejected ✗\n(all errors returned)"]
D --> F["Update Record"]
F --> C
G["Delete Schema"] --> H{Records\nexist?}
H -- Yes --> I["Blocked ✗"]
H -- No --> J["Deleted ✓"]
```
Owner

This Mermaid chart was blank for me for some reason.

Owner

A page refresh resolved the issue...so maybe this is a non-issue...but please check to confirm there's not something in the entry that'll make a difference here.
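
The lifecycle in the flowchart under discussion can also be read as a small in-memory model. This is a sketch for orientation only, not the server's behavior: `SchemaRegistry`, `write_record`, the `schema_in_use` error name, and the toy validator are all hypothetical, and treating "immutable" as "re-registering the same family and version is refused" is an assumption. The `family:version` reference format and the `validation_failed` error shape follow the error example quoted earlier in this thread.

```python
class SchemaRegistry:
    """In-memory stand-in for the register / write / delete lifecycle."""

    def __init__(self, validate):
        self._validate = validate  # callable(schema, content) -> list of errors
        self._schemas = {}         # "family:version" -> schema
        self._records = {}         # logical_key -> (schema_ref, content)

    def register_schema(self, family, version, schema):
        ref = f"{family}:{version}"
        if ref in self._schemas:
            raise ValueError(f"{ref} is already registered and cannot be changed")
        self._schemas[ref] = schema
        return ref

    def write_record(self, schema_ref, logical_key, content):
        # Create and update share one path: both are validated before storing.
        errors = self._validate(self._schemas[schema_ref], content)
        if errors:
            return {"error": "validation_failed", "schema_ref": schema_ref,
                    "validation_errors": errors}
        self._records[logical_key] = (schema_ref, content)
        return {"status": "stored", "logical_key": logical_key}

    def delete_schema(self, schema_ref):
        in_use = sorted(key for key, (ref, _) in self._records.items()
                        if ref == schema_ref)
        if in_use:
            return {"error": "schema_in_use", "records": in_use}
        self._schemas.pop(schema_ref, None)
        return {"status": "deleted"}


def year_must_be_int(schema, content):
    # Toy validator standing in for real JSON Schema validation.
    year = content.get("year")
    if isinstance(year, int):
        return []
    return [{"path": "/year", "message": f"{year!r} is not of type 'integer'"}]


registry = SchemaRegistry(year_must_be_int)
ref = registry.register_schema("album", "1", {"type": "object"})
rejected = registry.write_record(ref, "album-kid-a", {"year": "2000"})
stored = registry.write_record(ref, "album-kid-a", {"year": 2000})
blocked = registry.delete_schema(ref)
```

The four calls at the end walk the chart left to right: a rejected write, a stored write, and a delete that is blocked because a record still references the schema.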

cmeans added "QA Active" (QA is actively reviewing; Dev should not push changes) and removed "Ready for QA" (Dev work complete; QA can begin review) Apr 16, 2026

cmeans (Owner) commented Apr 16, 2026

Adding QA Active — starting round 6 re-review of fixes for rounds 4 and 5 findings (ecosystem footer, French example directionality, validation error callout, XLM-R list, upstream credits, bulleted language list).

cmeans (Owner) left a comment

QA Re-Review — Round 6

Rounds 4 and 5 fixes (commits 0b738f0 + 6bae9e0) verified. Everything clean.

What landed

Ecosystem footer — added to both guides, byte-identical to the pattern in data-dictionary.md, case-studies.md, vision.md, etc. Logo path ../docs/branding/awareness-logo-32.svg resolves correctly (the file exists at docs/branding/awareness-logo-32.svg); the ../docs/ prefix is a project-wide convention, not a per-guide bug.

French example directionality (language guide). The "Explicit language" section now shows two realistic scenarios:

  • English-speaking user writing French content (English source/tags, French description)
  • Primarily French user with localized labels (source="personnel", French tags, French description)

Plus a note acknowledging tool names and parameters are still English-only. This is exactly the kind of content that signals the project takes multilingual adoption seriously.

Validation error callout (schema-record guide, lines 183-192) — after the '2000' is not of type 'integer' example, a > Heads-up: callout names the rough edge directly:

  • The quote-vs-no-quote distinction is subtle for non-experts
  • The message comes unmodified from the jsonschema library
  • Agent-side workaround: pre-validate by fetching the schema first
  • Tracked at #301

Verified #301 is open (feat: structured, actionable validation error envelopes). Referenced from the "What's next" section too. This kind of honest documentation — acknowledging rough edges while pointing at the fix — builds trust.

Round 4 fixes (0b738f0):

  • 12-language grid table → alphabetical bulleted list. Cleaner.
  • Full ~100-language XLM-RoBERTa list in a collapsible <details> block, sourced from fairseq docs.
  • Reference section expanded into "Reference and credits" with proper attribution to IBM Granite, Meta AI FAIR, Ollama, lingua-py, Hugging Face, PostgreSQL, pgvector, and Snowball. Good open-source hygiene and a better look for evaluators checking whether the project is well-kept.
  • Rationale added for why Granite was chosen (open-weight, enterprise-licensed, 768-dim fits HNSW, runs on modest hardware).

Verification

| Check | Result |
| --- | --- |
| Default embedding model | granite-embedding:278m in server.py:112 and embeddings.py:106 |
| HNSW / GIN / ts_rank_cd / 500-char cap | All still match code (no regressions) |
| Tool-call examples (from earlier rounds) | entry_id, source/tags/description/logical_key, search(entry_type="alert") all still correct |
| #301 referenced | Issue exists and is open |
| TOC anchors | Reference and credits → #reference-and-credits matches |
| Logo path | docs/branding/awareness-logo-32.svg exists; pattern matches rest of docs |
| All internal links | Resolve (design/hybrid-retrieval-multilingual.md, data-dictionary.md) |
| CI | All green: lint, test 3.10/3.11/3.12, typecheck, codecov, license/cla |

Non-blocking observation (pre-existing, not introduced by this PR)

The callout's suggested get_knowledge(tags=["schema"]) works via tag convention, but a cleaner path would be get_knowledge(entry_type="schema") since schema is a valid EntryType. However, the get_knowledge tool's own docstring at tools.py:203 is stale — it only lists 'pattern', 'context', 'preference', 'note' and doesn't mention schema/record/alert/etc. That's a tool docstring bug, not a guide bug. Worth filing separately so the tool's self-documentation matches _VALID_ENTRY_TYPES.

Verdict

Applying Ready for QA Signoff. Rounds 4 and 5 are both clean. The guides have grown from "accurate" (round 2) to "useful" (round 3) to "welcoming and trustworthy" (rounds 4-5). The upstream credits and validation-error honesty are exactly the kind of touches that help a project earn a reputation for being well-maintained.

Step 5 (collapsible rendering) still needs a browser check from the maintainer before QA Approved.

@cmeans
Owner

cmeans commented Apr 16, 2026

Applying Ready for QA Signoff — rounds 4 and 5 fixes all clean. Upstream credits and validation-error honesty are nice touches. One non-blocking observation: get_knowledge tool's docstring is stale (doesn't list new entry types); worth a separate ticket. Step 5 (collapsible rendering) still needs a browser check.

@cmeans cmeans added Ready for QA Signoff QA passed — ready for maintainer final review and merge and removed QA Active QA is actively reviewing; Dev should not push changes labels Apr 16, 2026
@cmeans
Owner

cmeans commented Apr 16, 2026

Step 5 (collapsible rendering) confirmed by maintainer — all checkboxes now green.

Non-blocking observation from round 6 filed as #302 (get_knowledge docstring lists stale entry_type values; audit all tools). Not a blocker for this PR.

@cmeans cmeans added QA Approved Manual QA testing completed and passed and removed Ready for QA Signoff QA passed — ready for maintainer final review and merge labels Apr 16, 2026
@cmeans-claude-dev cmeans-claude-dev Bot merged commit b720791 into main Apr 16, 2026
36 checks passed
@cmeans-claude-dev cmeans-claude-dev Bot deleted the docs/schema-record-guide branch April 16, 2026 20:15
cmeans-claude-dev Bot added a commit that referenced this pull request Apr 16, 2026
…guides (#299)

## Summary

Version stamp for **v0.18.0**. No code changes — everything under this
release was already tested and QA-approved in the feature PRs that land
under `[Unreleased]`:

- **#287** — schema + record entry types with JSON Schema validation
- **#295** — non-superuser RLS test harness (closed #289)
- **#298** — CLA Assistant bot installation
- **#300** — Schema + Record user guide, Language support guide (closed
#285)

### Changes in this PR

- `pyproject.toml`: `0.17.0` → `0.18.0`
- `CHANGELOG.md`: rename `[Unreleased]` section to `[0.18.0] -
2026-04-16`; add comparison link
- `README.md`: `16 releases` → `17 releases`; `30 tools` → `32 tools`;
added schema/record to the "Current status > Knowledge store" feature
list; links to the two new guides

### Known gap (already in CHANGELOG)

- **#288** — bulk `delete_entry` paths (by tags / by source) don't
consult `schema_in_use`. Single-id path is protected; bulk is explicitly
flagged in code. P2 medium follow-up.

## QA

Per repo convention, release PRs don't need a manual QA checklist — all
code under `[0.18.0]` was already QA-approved in its feature PR.
Lightweight review only:

1. - [ ] CHANGELOG `[0.18.0]` section matches what was actually merged
since v0.17.0 (PRs #287, #295, #298, #300 visible; no stragglers).
2. - [ ] `pyproject.toml` version matches the intended tag.
3. - [ ] Comparison links at the bottom of CHANGELOG resolve cleanly.
4. - [ ] CI is green.

## After merge

Tag and push:

```
git tag -a v0.18.0 -m "v0.18.0 — schema/record entries, CLA bot, RLS harness, schema/record + language guides"
git push origin v0.18.0
```

Docker images rebuild off `:latest` on tag push; no
`docker-compose.yaml` update needed.

---------

Co-authored-by: cmeans-claude-dev[bot] <3223881+cmeans-claude-dev[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
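
The second item in the release checklist above (the `pyproject.toml` version matches the intended tag) could be scripted roughly as follows. This is a sketch under assumptions: the helper names and the sample file contents are hypothetical, and a regex is used instead of `tomllib` because that module only exists on Python 3.11+, while CI also runs 3.10.

```python
import re


def project_version(pyproject_text: str) -> str:
    """Extract the first line-initial `version = "..."` value from pyproject.toml text."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no version field found")
    return match.group(1)


def tag_matches(pyproject_text: str, tag: str) -> bool:
    """True when the tag is the pyproject version with a leading 'v'."""
    return tag == f"v{project_version(pyproject_text)}"


# Hypothetical file contents; in practice this would be read from pyproject.toml.
sample = '[project]\nname = "example"\nversion = "0.18.0"\n'
print(tag_matches(sample, "v0.18.0"))  # tag agrees with the version field
print(tag_matches(sample, "v0.17.0"))  # stale tag does not
```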
