feat: unsupported-language alerts for detected-but-unmapped languages#269
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
cmeans
left a comment
There was a problem hiding this comment.
[QA] Round 1 on PR #269 — QA Failed
Verdict: one small observation (performance, not correctness). Code is well-structured, test coverage is thorough (836/836 pass, +10 new), CI green, CHANGELOG accurate.
What is working well
- Clean separation of concerns.
detect_language_iso()inlanguage.pyreturns the raw ISO code for unmapped languages — exactly what the alert infrastructure needs._check_unsupported_language()intools.pyis a clean helper with the right guards: only fires when resolved=="simple" AND lingua detected a non-mapped ISO. - Upsert semantics for alerts. One alert per unsupported language (via
upsert_alertwithalert_id=f"unsupported-language-{iso}"), not per entry. So 100 Japanese entries produce oneunsupported-language-jaalert, not 100. - Alert failure does not break writes.
try/exceptwrapper withlogger.debugon failure. The testtest_alert_failure_does_not_break_writeexplicitly verifies this by makingupsert_alertraise and confirming theremembercall still succeeds. ✓ - Correct guard logic.
_check_unsupported_languageonly fires whenresolved != "simple"fails (i.e., resolved IS simple) ANDdetect_language_isoreturns a non-None, non-mapped ISO. Theiso in ISO_639_1_TO_REGCONFIGguard is defense-in-depth for the edge case where an explicit unknown ISO code overrides detection but the text happens to be in a mapped language — reachable but rare, correctly prevents a false alert. - All 4 write tools wired.
learn_pattern,remember,add_context,remindall call_check_unsupported_language. ✓ - Test coverage. 6 language tests (not 5 as the PR body says) + 4 server tests (not 3):
TestDetectLanguageIso: short/empty/unavailable/mapped/unmapped/uncertain — all edge cases coveredTestUnsupportedLanguageAlert: fires for unmapped / no alert for mapped / failure does not break write / no alert with explicit language
- CHANGELOG:
### Added, accurate, refs #264 + #238 ✓
Small observation — double lingua detection on the same text
_check_unsupported_language(text, resolved) calls detect_language_iso(text), which runs _get_detector().detect_language_of(text). But resolve_language(text_for_detection=text) — called just before — already ran detect_language(text), which also calls _get_detector().detect_language_of(text).
So lingua's detector runs twice on the same text for every write-tool call where resolved == "simple". lingua's detection is CPU-only (no I/O), deterministic for the same input, and fast for short text — so this is a performance observation, not a correctness issue. The two calls will always agree.
Suggested fix options:
-
(a) Accept the double call. lingua detection is fast (~ms for typical awareness-length text), and the simplicity of keeping
resolve_languageand_check_unsupported_languageas independent functions has value. The "simple" path is the minority case (most entries are in mapped languages), so the double detection only fires for unmapped-language writes. -
(b) Return the raw ISO code from
resolve_languagealongside the regconfig. Refactorsresolve_languageto return(regconfig, iso_or_none)— but this changes the foundation layer that went through 8 QA rounds, and every caller would need updating. -
(c) Cache the last detection result in module state.
_get_detector().detect_language_of(text)result cached by text hash, returned immediately on repeat. Adds complexity for marginal gain.
I lean toward (a) — accept the double call. The performance cost is negligible for the typical use case, and the code clarity of having two independent functions (one for regconfig resolution, one for demand signaling) is worth the minor redundancy. If you pick (a), just add a brief comment in _check_unsupported_language noting the intentional double detection so a future optimizer understands it was considered.
Verification
| Check | Result |
|---|---|
| Full suite | 836/836 pass (+10 from baseline 826) |
ruff check src/ tests/ + format |
clean |
| CI on PR | all green |
| CHANGELOG | ### Added, accurate, refs #264 + #238 |
| New test count | 6 language + 4 server = 10 new tests (PR body says "5 + 3" — minor count discrepancy) |
Applying QA Failed as the final act.
|
[QA] Round 1 — QA Failed. Small observation: double lingua detection on the same text — |
f62c178 to
0c2a571
Compare
|
[Dev] QA round 1 fix: added docstring comment noting the intentional double lingua detection (lingua caches internally, cost negligible, simpler than threading ISO through resolve_language API). Rebased on main. Will label Ready for QA after CI passes. |
…#264) When lingua detects a language not in the regconfig mapping, write tools fire an info-level structural alert (unsupported-language-{iso}). One alert per language via upsert. Signals demand for Phase 3 non-Western support. New detect_language_iso() in language.py returns raw ISO code even for unmapped languages. 8 new tests (5 language, 3 server). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ecov gap) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d 1) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
0c2a571 to
b26239a
Compare
cmeans
left a comment
There was a problem hiding this comment.
Missed opportunity to leverage const SIMPLE.
| raw ISO code through resolve_language would complicate its API for | ||
| a rare-path optimization. | ||
| """ | ||
| if resolved != "simple": |
There was a problem hiding this comment.
Shouldn't we be using the const we have?
There was a problem hiding this comment.
[Dev] Fixed — using SIMPLE constant from language.py. Pushed 759b244.
…eview) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| """If alert firing fails, the write still succeeds.""" | ||
| import mcp_awareness.tools as tools_mod | ||
|
|
||
| monkeypatch.setattr(tools_mod, "resolve_language", lambda **kwargs: "simple") |
There was a problem hiding this comment.
Still using "simple", thought we had a solution for test as well.
…ing literals Replace all remaining "simple" literals across src/ and tests/ with the SIMPLE constant from language.py. schema.py, store.py, tools.py, test_server.py, and test_store.py updated. Only the definition in language.py retains the literal. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
cmeans
left a comment
There was a problem hiding this comment.
[QA] Round 3 — Signoff
Comprehensive SIMPLE constant cleanup per maintainer review. All "simple" string literals eliminated across 5 files:
schema.py—Entrydefault,to_dict()check,from_dict()default (3 replacements,SIMPLEimported fromlanguage.py) ✓store.py—query_languagedefault in Protocol (1 replacement,SIMPLEimported) ✓tools.py—_check_unsupported_languageguard,get_knowledgehint path,searchtool fallback (3 replacements) ✓test_server.py— 4 replacements,SIMPLEimported frommcp_awareness.language✓test_store.py— localSIMPLE = "simple"constant removed, replaced with import frommcp_awareness.language✓
Zero remaining "simple" string literals in any key file (verified via grep). 848/848 pass. Zero new observations.
Ready for QA Signoff.
|
[QA] Round 3 — Ready for QA Signoff. All |
Summary
unsupported-language-{iso}) when lingua detects a language not in the regconfig mappingdetect_language_iso()inlanguage.pyreturns raw ISO code even for unmapped languagesCloses #264. Refs #238.
QA
Prerequisites
pip install -e ".[dev]"AWARENESS_PORT=8421)Manual tests (via MCP tools)
Expected: entry created with
language: "simple", andget_alertsshows anunsupported-language-jainfo alertExpected: entry created with
language: "english", no new unsupported-language alertWrite another Japanese entry — expected: same alert updated, not a second alert
🤖 Generated with Claude Code