docs: add Schema + Record and Language support guides #300
New docs/schema-record-guide.md covers why typed data matters (framing against free-form knowledge tools), who the feature is for (personal collections + team/integration use), and a full worked example: registering an album schema, creating a record, what a validation failure looks like, update with re-validation, and why schema deletion is blocked when live records reference it. Extends the walk-through into the Tag Taxonomy tie-in — the Layer C design for user-defined tag vocabularies will consume records validated against a tag-definition schema, so schema/record doubles as the foundation under that upcoming feature. Six additional use cases in collapsible sections: reading list, recipes, home inventory (with purchase/receipt URLs), subscriptions, edge provider manifests, and meeting/bug templates. Each names the kind of future edge provider that would naturally extend the schema (Goodreads for books, recipe APIs for recipes, etc.) without making the doc depend on any specific service being available. Closes the "What's next" section with links to the REST API + schema marketplace roadmap idea (awareness logical_key design-schema-marketplace-import), Tag Taxonomy Layer C, and the open P2/P3 follow-ups on main (#290, #291, #292, #293). README updates: adds mcp-awareness-register-schema to the CLI tools bullet; links the new guide from Design docs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report ✅ All modified and coverable lines are covered by tests.
New docs/language-guide.md covers how mcp-awareness handles multilingual content: per-entry language detection (explicit ISO 639-1 parameter or auto-detection via lingua-py), the 28 supported Postgres snowball regconfigs, querying by language with get_knowledge, how hybrid search handles cross-language queries (vector branch is language-agnostic, FTS branch uses per-entry regconfig), and unsupported-language alerts as a demand signal for Phase 3 non-Western language support. Includes deployment notes: lingua install, the one-time language backfill migration on v0.17.0 upgrade, and the regconfig validation cache that prevents INSERT failures from invalid language values. README links the new guide from the Design docs section. CHANGELOG entry under [Unreleased]. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
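The resolution order described in this commit (explicit parameter, then auto-detection, then a `simple` fallback) can be sketched as follows. This is a minimal illustration, not the project's code: `resolve_regconfig`, the `detect` callable, and the three-entry mapping are hypothetical stand-ins, and falling back to `simple` for an unsupported explicit code is an assumption.

```python
from typing import Callable, Optional

# Illustrative subset only; the guide's table lists 28 entries.
ISO_TO_REGCONFIG = {"en": "english", "fr": "french", "de": "german"}


def resolve_regconfig(
    text: str,
    language: Optional[str] = None,
    detect: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Resolve a Postgres regconfig: explicit code, then detection, then 'simple'."""
    code = language
    if code is None and detect is not None:
        # Hypothetical detector: returns an ISO 639-1 code, or None if unsure.
        code = detect(text)
    if code is None:
        return "simple"
    # Assumption: unknown codes fall back instead of failing the INSERT.
    return ISO_TO_REGCONFIG.get(code.lower(), "simple")
```

Passing the detector in as a callable keeps the sketch free of any particular detection library.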
Adding QA Active: starting review with a focus on how the guides present to potential users.
cmeans
left a comment
QA Review
User-facing quality assessment
These guides are aimed at people evaluating whether mcp-awareness is worth adopting. The writing is strong on framing and narrative:
- "Why typed data?" is the right opening. The inconsistent `status` vs `state` vs missing-field problem is immediately relatable. Anyone who's worked with unstructured data recognizes it. This will land.
- Music-collection walk-through is well-paced: register, create, fail, update, delete-blocked. Each step teaches one concept. The progression gives a reader confidence that they understand the feature by the end.
- "Future collector" notes in collapsible sections are smart. They show the growth path without overselling. A reader evaluating the project sees "this is useful now, and the team has a plan for more." That's the right signal.
- Language guide resolution chain (explicit, auto-detect, fallback) is clear and practical. The deployment notes are operator-friendly.
- Collapsible use cases prevent wall-of-text fatigue while showing breadth. Good editorial choice.
However, there are accuracy issues in the tool-call examples that will burn a user's first impression if they try to follow along. A user who copies the walk-through and gets parameter-missing errors on the first call will bounce. These need to be fixed before this ships.
Findings
1. Substantive — register_schema examples missing required parameters. Every register_schema call in the guide (lines 80, 202, 246, 274, 303, 333) omits the required source and tags parameters. The actual signature is register_schema(source, tags, description, family, version, schema, ...). A user (or an LLM agent following the guide) who copies these examples will get a missing-parameter error on the first try.
Fix options:
- Preferred: Add `source` and `tags` to every example so they're copy-pasteable. E.g., the album example becomes `register_schema(source="personal", tags=["music", "schema"], description="...", family="album", version="1", schema={...})`.
- Acceptable: Add a visible note near the top of the walk-through (not buried in a footnote) explaining that administrative parameters (`source`, `tags`, `learned_from`) are omitted for clarity and will be filled in by the agent contextually. I'd still prefer complete examples for the primary walk-through and only abbreviate the collapsible sections.
2. Substantive — create_record examples missing required parameters. Every create_record call (lines 114, 224) omits source, description, and logical_key. All three are required. logical_key is especially important because it drives upsert behavior, and description is what makes the record discoverable via get_knowledge. Same fix options as finding 1.
3. Substantive — update_entry parameter name wrong. Line 161 uses id= but the actual parameter is entry_id=. Direct runtime error.
4. Substantive — get_alerts(tags=...) doesn't exist. Language guide line 180 shows get_alerts(tags=["language", "unsupported"]). The get_alerts tool has no tags parameter. Its signature is get_alerts(source, since, mode, limit, offset). This will produce a runtime error. The correct approach is either get_alerts() with manual filtering, or a search(query="unsupported language") call.
5. Substantive — Dead # anchor links. Schema-record guide lines 38 and 232 link [Tag Taxonomy v2 design](#) and [Tag Taxonomy Layer C](#) to #, which is a self-anchor that navigates to the top of the current page. For a user clicking through, this looks broken. Either link to the actual design doc (if it exists outside this repo), use a plain-text reference instead of a link, or link to the relevant GitHub issue.
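Findings 1 through 4 are all signature mismatches, so a check like the one below could catch them before a guide ships. A minimal sketch, assuming stub functions whose parameter names mirror the signatures quoted in the findings; the stubs, their defaults, the `**extra`/`**changes` catch-alls, and `check_call` are hypothetical and are not code from `tools.py`.

```python
import inspect


# Hypothetical stubs: parameter names follow the signatures quoted in this
# review; defaults and the catch-all keyword arguments are assumptions.
def register_schema(source, tags, description, family, version, schema, **extra):
    return "ok"


def update_entry(entry_id, **changes):
    return "ok"


def get_alerts(source=None, since=None, mode=None, limit=None, offset=None):
    return []


def check_call(func, **kwargs):
    """Return None if kwargs bind to func's signature, else the error text."""
    try:
        inspect.signature(func).bind(**kwargs)
    except TypeError as exc:
        return str(exc)
    return None


# A documented example that omits source and tags fails the check.
problem = check_call(
    register_schema, description="Albums", family="album", version="1", schema={}
)
```

Running every documented call through `check_call` flags missing required parameters, misnamed parameters such as `id=` for `entry_id=`, and parameters that don't exist such as `get_alerts(tags=...)`.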
What verified clean
| Check | Status |
|---|---|
| Narrative flow (both guides) | Clear, well-paced, good for the target audience |
| Language table (28 entries) | Exact match against ISO_639_1_TO_REGCONFIG in language.py |
| Internal file links | data-dictionary.md, hybrid-retrieval-multilingual.md, schema-record-entry-types-design.md all exist |
| GitHub issue refs | #288, #239, #290, #291, #292, #293 all exist and are open |
| External links | JSON Schema spec, lingua-py, Postgres docs |
| Tag Taxonomy framing | "Not wired in yet, but foundation" matches current state |
| README updates | CLI tools list and Design docs links are accurate |
| CHANGELOG entries | Both entries under [Unreleased], correctly placed |
| CI | All green: lint, test, typecheck, codecov, license/cla |
Verdict
Applying QA Failed. The guides are well-written and the framing will serve the project well, but 5 accuracy issues in tool-call examples will break the walk-through for anyone who tries to follow along. All are straightforward fixes.
Classification:
- Findings 1-4: needs-PR-before-ship (accuracy errors in runnable examples)
- Finding 5: needs-PR-before-ship (dead links in user-facing docs)
Applying QA Failed: 5 substantive findings, all in tool-call example accuracy. The writing and framing are strong; the examples need parameter corrections so users can actually follow along. Details in the review above.
Schema + Record guide:
- Fix all register_schema examples: add required source, tags, description params to primary walk-through examples
- Fix all create_record examples: add required source, tags, description, logical_key params
- Fix update_entry: id= → entry_id= (correct param name)
- Fix dead # anchor links for Tag Taxonomy references
- Add brevity note before collapsible sections explaining that abbreviated examples omit required admin params shown above
- Add lifecycle Mermaid diagram (register → create → validate → update → delete-blocked)

Language guide:
- Fix get_alerts(tags=...) → search(query=..., entry_type="alert") since get_alerts has no tags parameter
- Add resolution-chain Mermaid diagram (explicit → lingua → simple)
- Add hybrid-search two-lane Mermaid diagram (vector + FTS → RRF)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dev response to QA findings
All 5 findings addressed in commit 34a89f4, including the dead `#` anchor links from Finding 5. Bonus: Mermaid diagrams added.
Re-requesting QA.
Adding QA Active: starting re-review of fixes for 5 substantive findings.
cmeans
left a comment
QA Re-Review — Round 2
All 5 findings from round 1 are resolved.
Finding 1 (register_schema missing params): Fixed
Primary walk-through examples now include source, tags, description. Collapsible "More use cases" section has a visible note explaining abbreviated examples inherit the same required params shown in the walk-through. Good editorial choice — complete where it counts, abbreviated where it's clearly supporting material.
Finding 2 (create_record missing params): Fixed
Both primary examples now include source, tags, description, logical_key. The logical_key values (album-ok-computer, album-kid-a, tag-music-genre-rock-alternative) are well-chosen: human-readable, unique, and demonstrate the upsert convention.
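To illustrate why a stable `logical_key` matters for upserts, here is a minimal in-memory sketch. The `upsert_record` helper and the choice to key on `(source, logical_key)` are assumptions made for illustration; the real store's keying may differ.

```python
from typing import Dict, Tuple

# Hypothetical in-memory stand-in for the store; keying on
# (source, logical_key) is an assumption, not confirmed behavior.
Store = Dict[Tuple[str, str], dict]


def upsert_record(store: Store, source: str, logical_key: str, data: dict) -> str:
    """Insert a new record or replace the one with the same key."""
    key = (source, logical_key)
    action = "updated" if key in store else "created"
    store[key] = data
    return action


store: Store = {}
first = upsert_record(store, "personal", "album-kid-a", {"year": 2000})
second = upsert_record(store, "personal", "album-kid-a", {"year": 2000, "title": "Kid A"})
```

Under these assumptions the second call replaces the first record instead of creating a duplicate, which is the behavior a human-readable, unique key is meant to enable.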
Finding 3 (update_entry id → entry_id): Fixed
Line 184: entry_id="<the record id>". Matches actual signature.
Finding 4 (get_alerts tags param): Fixed
Replaced with search(query="unsupported language", entry_type="alert"). Verified: search accepts entry_type and "alert" is a valid EntryType value. Also added get_alerts() as a browse alternative for users without an embedding provider. Both paths are accurate.
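For the browse path, the manual filtering mentioned in finding 4 could look like the sketch below. It assumes each alert comes back as a dict with an optional `tags` list; that shape and the `filter_alerts_by_tags` helper are hypothetical.

```python
from typing import Iterable, List


def filter_alerts_by_tags(alerts: Iterable[dict], required: Iterable[str]) -> List[dict]:
    """Keep alerts whose 'tags' list contains every required tag."""
    wanted = set(required)
    return [alert for alert in alerts if wanted.issubset(alert.get("tags", []))]


# Sample data standing in for a get_alerts() result (assumed shape).
sample = [
    {"id": 1, "tags": ["language", "unsupported"]},
    {"id": 2, "tags": ["infra"]},
    {"id": 3},
]
matches = filter_alerts_by_tags(sample, ["language", "unsupported"])
```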
Finding 5 (dead # anchor links): Fixed
Both occurrences replaced with plain text ("Tag Taxonomy v2 design" and "Tag Taxonomy Layer C") — no link, no dead anchor. Clean.
Bonus: Mermaid diagrams
Three new Mermaid diagrams added:
- Lifecycle (schema-record guide) — register → create → valid? → stored/rejected + delete protection. Clear visual of the feature's guarantees.
- Language resolution (language guide) — flowchart of explicit → auto-detect → fallback chain. Matches the prose exactly.
- Hybrid search (language guide) — vector + FTS → RRF → merged results.
These are solid additions for the target audience. GitHub renders `mermaid` fenced blocks natively.
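For readers unfamiliar with the RRF step in the hybrid-search diagram, reciprocal rank fusion can be sketched in a few lines. This is the generic textbook form, not the project's implementation: `rrf_merge` is a hypothetical name, and `k = 60` is the commonly used constant, which may not match the project's setting.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_merge(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked ID lists; each list adds 1 / (k + rank) to an ID's score."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, entry_id in enumerate(ranking, start=1):
            scores[entry_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by ID so output is deterministic.
    return sorted(scores, key=lambda entry_id: (-scores[entry_id], entry_id))


vector_hits = ["a", "b", "c"]  # language-agnostic branch
fts_hits = ["b", "d"]          # full-text branch
merged = rrf_merge([vector_hits, fts_hits])
```

An entry found by both branches accumulates two contributions, so `b` outranks entries that appear in only one list.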
Remaining checkbox
Step 5 (collapsible sections render) requires a browser check — I can't render GitHub <details> blocks from CLI. Flagging for the maintainer to verify before applying QA Approved.
Re-verification
- Tool params re-checked against `tools.py`: `register_schema`, `create_record`, `update_entry`, `search` all match
- `EntryType.ALERT` confirmed in `schema.py`; `search(entry_type="alert")` is valid
- All internal links still resolve (no regressions from the fix commit)
- CI all green: lint, test 3.10/3.11/3.12, typecheck, codecov, license/cla
Verdict
Applying Ready for QA Signoff. All accuracy issues fixed, examples are now copy-pasteable, Mermaid diagrams are a nice addition. One checkbox deferred to maintainer (collapsible section rendering).
Applying Ready for QA Signoff: all 5 round-1 findings fixed, tool-call examples now match actual signatures, Mermaid diagrams are a nice addition. One checkbox (collapsible section rendering) needs a browser check from the maintainer. Over to you for QA Approved.
Dev response to round 4 QA findings
Three inline findings on the embedding-languages section addressed in commit 0b738f0.
Line 116: "terrible table, 3 columns with the same heading". Fair. Replaced the space-saving 3-column grid with a proper alphabetical bullet list of the 12 Granite training languages. Cleaner and actually scannable.
Line 118: "link to outside providers, libraries we rely on, they deserve love too". Rewrote the Reference section as "Reference and credits" with explicit credit and links to every upstream project multilingual support rests on.
Line 121: "what are those other languages? No listing". Added the full list of ~100 XLM-RoBERTa languages in a collapsible `<details>` block.
Also reframed the section to explain why we picked Granite (open-weight, enterprise-licensed, 768 dims for our HNSW index, runs on modest hardware) so readers understand the choice, not just the result.
Re-requesting QA.
cmeans
left a comment
A few minor issues, and at least one new Issue to create to try to return better error messages.
    remember(
        description="Le serveur NAS est dans le placard du sous-sol.",
        source="personal",
        tags=["infra", "nas"],
        language="fr"
    )
Wouldn't this example be better if the source value and tags were also in French?
I realize we'd need an `fr`-sensitive version of the tools for the tool and parameter names to be in French as well, but until then...
Maybe there's value in showing both... Show that an English user can enter data in French or German or whatever, and that a French user can do so also.
    "error": "validation_failed",
    "schema_ref": "album:1",
    "validation_errors": [
      {"path": "/year", "message": "'2000' is not of type 'integer'"}
This is not as helpful an error message as it looks. Some users may not understand that the quotes around the number change its type. Our messages should also suggest how to fix the error when the fix is obvious. An AI would understand the message; it would also avoid the problem entirely if it knew to quickly pull the schema and validate client-side.
Can we provide more structured errors (as we have with our native tools)? I realize the scope is larger here, but there seem to be at least some patterns where we can be helpful.
I'd suggest creating an Issue so we can handle this better.
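As one example of a pattern we could help with, a quoted number sent where an integer is expected can be detected from the message text alone. A rough sketch under stated assumptions: `suggest_fix` and its hint wording are hypothetical, and it only recognizes messages shaped like the one in the excerpt above.

```python
import re
from typing import Optional

# Matches messages shaped like: '2000' is not of type 'integer'
_TYPE_ERROR = re.compile(r"^'(?P<value>.*)' is not of type '(?P<expected>\w+)'$")


def suggest_fix(path: str, message: str) -> Optional[str]:
    """Return a plain-language hint for a quoted-number type error, else None."""
    match = _TYPE_ERROR.match(message)
    if match is None:
        return None
    value, expected = match["value"], match["expected"]
    # Only handle the obvious case: a string of digits where an integer belongs.
    if expected == "integer" and re.fullmatch(r"-?\d+", value):
        return f'{path}: send {value} as a number, without quotes (got the string "{value}").'
    return None


hint = suggest_fix("/year", "'2000' is not of type 'integer'")
```

Returning `None` for anything unrecognized keeps the original message as the fallback instead of guessing at a fix.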
| - [Schema/Record design doc](superpowers/specs/2026-04-13-schema-record-entry-types-design.md) | ||
| — the design this implementation shipped from. | ||
| - [JSON Schema Draft 2020-12](https://json-schema.org/draft/2020-12) | ||
| — the external spec schemas conform to. |
All documentation pages should have our standard Awareness ecosystem copyright notice etc., yes?
Four findings from QA:
1. Missing ecosystem copyright footer on both guides. Added the
standard "Part of the Awareness ecosystem" footer (matching the
pattern used in data-dictionary.md, case-studies.md, vision.md,
deployment-guide.md).
2. French "remember" example showed localized description but kept
source/tags in English — feels half-committed. Reworked the
section to show BOTH realistic scenarios explicitly:
(a) primarily-English user writing a single French entry
(English source/tags, French description),
(b) primarily-French user writing in French natively
(French source/tags/description).
Also added a symmetric note that a French user could write in
English when convenient, and a heads-up that MCP tool names and
parameters are themselves English-only today.
3. Validation-error example in schema-record guide overstated how
helpful the jsonschema-sourced message is. Added a "Heads-up"
callout acknowledging:
- The quoted-vs-unquoted distinction is subtle for non-experts
- The message doesn't suggest a fix
- Tracked as #301 for richer typed envelopes
- Agents can pre-validate client-side by fetching the schema via
get_knowledge(tags=["schema"]) to avoid the round-trip
Also added #301 to the "What's next" section.
4. Filed new issue #301 (feat: structured, actionable validation
error envelopes for create_record / update_entry) with the design
sketch, scope, acceptance criteria, and references the QA
comment that surfaced it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
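Dev response to round 5 QA findings
All 4 items addressed in commit 6bae9e0.
1. Ecosystem copyright footer. Added to both guides, matching the standard pattern used in data-dictionary.md, case-studies.md, vision.md, and deployment-guide.md.
2. French example: show both directions + localized source/tags. Reworked the "Explicit language" section to show both realistic scenarios explicitly: a primarily-English user writing a single French entry (English source/tags, French description), and a primarily-French user writing in French natively (French source/tags/description). Plus a note that a French user could write in English when convenient (symmetry), and a heads-up that the MCP tool names and parameter names are still English-only today (future i18n pass out of scope).
3. Validation error message honesty. The validation-error example now carries a "Heads-up" callout acknowledging that the quoted-vs-unquoted distinction is subtle for non-experts and that the message doesn't suggest a fix. Also added #301 to the "What's next" section.
4. New issue filed: #301 (feat: structured, actionable validation error envelopes for create_record / update_entry). Includes the design sketch, scope, acceptance criteria, and a reference to the QA comment that surfaced it.
Re-requesting QA.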
cmeans
left a comment
LGTM (One observation, but I'll let it pass as it was temporary.)
```mermaid
flowchart LR
    A["Register Schema\n(immutable)"] --> B["Create Record\n(validated)"]
    B --> C{Valid?}
    C -- Yes --> D["Stored ✓"]
    C -- No --> E["Rejected ✗\n(all errors returned)"]
    D --> F["Update Record"]
    F --> C
    G["Delete Schema"] --> H{Records\nexist?}
    H -- Yes --> I["Blocked ✗"]
    H -- No --> J["Deleted ✓"]
```
A page refresh resolved the issue...so maybe this is a non-issue...but please check to confirm there's not something in the entry that'll make a difference here.
Adding QA Active: starting round 6 re-review of fixes for rounds 4 and 5 findings (ecosystem footer, French example directionality, validation error callout, XLM-R list, upstream credits, bulleted language list).
cmeans
left a comment
QA Re-Review — Round 6
Rounds 4 and 5 fixes (commits 0b738f0 + 6bae9e0) verified. Everything clean.
What landed
Ecosystem footer — added to both guides, byte-identical to the pattern in data-dictionary.md, case-studies.md, vision.md, etc. Logo path ../docs/branding/awareness-logo-32.svg resolves correctly (the file exists at docs/branding/awareness-logo-32.svg); the ../docs/ prefix is a project-wide convention, not a per-guide bug.
French example directionality (schema-record guide) — "Explicit language" section now shows two realistic scenarios:
- English-speaking user writing French content (English `source`/`tags`, French `description`)
- Primarily French user with localized labels (`source="personnel"`, French `tags`, French `description`)
Plus a note acknowledging tool names and parameters are still English-only. This is exactly the kind of content that signals the project takes multilingual adoption seriously.
Validation error callout (schema-record guide, lines 183-192) — after the '2000' is not of type 'integer' example, a > Heads-up: callout names the rough edge directly:
- The quote-vs-no-quote distinction is subtle for non-experts
- The message comes unmodified from the `jsonschema` library
- Agent-side workaround: pre-validate by fetching the schema first
- Tracked at #301
Verified #301 is open (feat: structured, actionable validation error envelopes). Referenced from the "What's next" section too. This kind of honest documentation — acknowledging rough edges while pointing at the fix — builds trust.
Round 4 fixes (0b738f0):
- 12-language grid table → alphabetical bulleted list. Cleaner.
- Full ~100-language XLM-RoBERTa list in a collapsible `<details>` block, sourced from fairseq docs.
- Reference section expanded into "Reference and credits" with proper attribution to IBM Granite, Meta AI FAIR, Ollama, lingua-py, Hugging Face, PostgreSQL, pgvector, and Snowball. Good open-source hygiene and a better look for evaluators checking whether the project is well-kept.
- Rationale added for why Granite was chosen (open-weight, enterprise-licensed, 768-dim fits HNSW, runs on modest hardware).
Verification
| Check | Result |
|---|---|
| Default embedding model | granite-embedding:278m in server.py:112 and embeddings.py:106 ✓ |
| HNSW / GIN / ts_rank_cd / 500-char cap | All still match code (no regressions) |
| Tool-call examples (from earlier rounds) | entry_id, source/tags/description/logical_key, search(entry_type="alert") all still correct |
| #301 referenced | Issue exists and is open |
| TOC anchors | Reference and credits → #reference-and-credits matches |
| Logo path | docs/branding/awareness-logo-32.svg exists; pattern matches rest of docs |
| All internal links | Resolve (design/hybrid-retrieval-multilingual.md, data-dictionary.md) |
| CI | All green: lint, test 3.10/3.11/3.12, typecheck, codecov, license/cla |
Non-blocking observation (pre-existing, not introduced by this PR)
The callout's suggested get_knowledge(tags=["schema"]) works via tag convention, but a cleaner path would be get_knowledge(entry_type="schema") since schema is a valid EntryType. However, the get_knowledge tool's own docstring at tools.py:203 is stale — it only lists 'pattern', 'context', 'preference', 'note' and doesn't mention schema/record/alert/etc. That's a tool docstring bug, not a guide bug. Worth filing separately so the tool's self-documentation matches _VALID_ENTRY_TYPES.
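A drift check along these lines could keep the docstring and the valid-type set in step. This is a sketch only: `missing_from_docstring` is a hypothetical helper, and the type set below is an illustrative subset built from the names mentioned in this review, not the actual contents of `_VALID_ENTRY_TYPES`.

```python
from typing import Iterable, List


def missing_from_docstring(docstring: str, valid_types: Iterable[str]) -> List[str]:
    """Return the entry types that never appear quoted in the docstring, sorted."""
    return sorted(t for t in valid_types if f"'{t}'" not in docstring)


# Illustrative inputs: the docstring text and type set are assumptions.
stale_doc = "entry_type: one of 'pattern', 'context', 'preference', 'note'"
assumed_types = {"pattern", "context", "preference", "note", "schema", "record", "alert"}
gaps = missing_from_docstring(stale_doc, assumed_types)
```

Wired into a unit test that asserts the result is empty, the same check would fail whenever a new entry type is added without updating the tool's self-documentation.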
Verdict
Applying Ready for QA Signoff. Rounds 4 and 5 are both clean. The guides have grown from "accurate" (round 2) to "useful" (round 3) to "welcoming and trustworthy" (rounds 4-5). The upstream credits and validation-error honesty are exactly the kind of touches that help a project earn a reputation for being well-maintained.
Step 5 (collapsible rendering) still needs a browser check from the maintainer before QA Approved.
Applying Ready for QA Signoff — rounds 4 and 5 fixes all clean. Upstream credits and validation-error honesty are nice touches. One non-blocking observation: |
Step 5 (collapsible rendering) confirmed by maintainer — all checkboxes now green. Non-blocking observation from round 6 filed as #302 ( |
…guides (#299)

## Summary

Version stamp for **v0.18.0**. No code changes: everything under this release was already tested and QA-approved in the feature PRs that land under `[Unreleased]`:

- **#287**: schema + record entry types with JSON Schema validation
- **#295**: non-superuser RLS test harness (closed #289)
- **#298**: CLA Assistant bot installation
- **#300**: Schema + Record user guide, Language support guide (closed #285)

### Changes in this PR

- `pyproject.toml`: `0.17.0` → `0.18.0`
- `CHANGELOG.md`: rename `[Unreleased]` section to `[0.18.0] - 2026-04-16`; add comparison link
- `README.md`: `16 releases` → `17 releases`; `30 tools` → `32 tools`; added schema/record to the "Current status > Knowledge store" feature list; links to the two new guides

### Known gap (already in CHANGELOG)

- **#288**: bulk `delete_entry` paths (by tags / by source) don't consult `schema_in_use`. Single-id path is protected; bulk is explicitly flagged in code. P2 medium follow-up.

## QA

Per repo convention, release PRs don't need a manual QA checklist: all code under `[0.18.0]` was already QA-approved in its feature PR. Lightweight review only:

1. - [ ] CHANGELOG `[0.18.0]` section matches what was actually merged since v0.17.0 (PRs #287, #295, #298, #300 visible; no stragglers).
2. - [ ] `pyproject.toml` version matches the intended tag.
3. - [ ] Comparison links at the bottom of CHANGELOG resolve cleanly.
4. - [ ] CI is green.

## After merge

Tag and push:

```
git tag -a v0.18.0 -m "v0.18.0 — schema/record entries, CLA bot, RLS harness, schema/record + language guides"
git push origin v0.18.0
```

Docker images rebuild off `:latest` on tag push; no `docker-compose.yaml` update needed.

---------

Co-authored-by: cmeans-claude-dev[bot] <3223881+cmeans-claude-dev[bot]@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Summary
Adds user-facing how-to guides for two major features that shipped without user docs: schema/record (v0.18.0, PR #287) and language support (v0.17.0, PR #259 et al).
Schema + Record guide (`docs/schema-record-guide.md`, ~470 lines)
- `mcp-awareness-register-schema` for `_system` schemas.

Language support guide (`docs/language-guide.md`, ~250 lines)
- `language` parameter → lingua auto-detection → `simple` fallback.
- `get_knowledge` language filter + how hybrid search handles cross-language queries (vector branch is language-agnostic, FTS uses per-entry regconfig).

Other changes
- `README.md`: CLI tools bullet adds `mcp-awareness-register-schema`; Design docs section adds links to both guides.
- `CHANGELOG.md`: two entries under `[Unreleased]`.

Sequencing
This PR must merge before release PR #299 (v0.18.0). After merge, #299 will be rebased so the docs CHANGELOG entries land in the `[0.18.0]` section. Both guides ship with the release.

QA
Review checklist
- `docs/schema-record-guide.md`: narrative flows: why → who → walk-through → more use cases → guarantees → CLI → what's next → reference.
- `docs/language-guide.md`: narrative flows: how it works → supported languages → writing → querying → alerts → deployment → what's next → reference.
- `register_schema`, `create_record`, `update_entry`, `remember`, `get_knowledge` parameter names and shapes against `src/mcp_awareness/tools.py`. (QA round 2: all 4 accuracy issues fixed)
- `ISO_639_1_TO_REGCONFIG` in `src/mcp_awareness/language.py`.
- `<details>` blocks on GitHub and confirm the schema examples inside read correctly. (Maintainer confirmed 2026-04-16.)
- `#` anchors replaced with plain text)
- `[Unreleased]` reflect what actually landed.

Not in scope