
feat: unsupported-language alerts for detected-but-unmapped languages#269

Merged
cmeans-claude-dev[bot] merged 5 commits into
mainfrom
feat/unsupported-language-alert
Apr 12, 2026


@cmeans-claude-dev
Contributor

@cmeans-claude-dev cmeans-claude-dev Bot commented Apr 12, 2026

Summary

  • Write tools fire an info-level structural alert (unsupported-language-{iso}) when lingua detects a language not in the regconfig mapping
  • One alert per language via upsert (not per entry)
  • New detect_language_iso() in language.py returns raw ISO code even for unmapped languages
  • 5 new language tests + 3 new server tests

Closes #264. Refs #238.
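The following minimal Python sketch makes the split between regconfig resolution and raw detection concrete. It is not the code in language.py. The detector is injected as a plain callable instead of calling lingua, and the mapping subset, the SIMPLE fallback name, and the MIN_DETECTION_LENGTH threshold are all assumptions made for illustration.

```python
from typing import Callable, Optional

# Hypothetical subset of the ISO 639-1 -> PostgreSQL regconfig mapping;
# the real ISO_639_1_TO_REGCONFIG in language.py may differ.
ISO_639_1_TO_REGCONFIG = {"en": "english", "fr": "french", "de": "german"}
SIMPLE = "simple"
MIN_DETECTION_LENGTH = 20  # assumed threshold, not taken from the PR

# Stand-in for lingua: takes text, returns an ISO 639-1 code or None.
Detector = Callable[[str], Optional[str]]


def detect_language_iso(text: str, detect: Detector) -> Optional[str]:
    """Return the raw ISO code, even when it has no regconfig mapping."""
    if not text or len(text.strip()) < MIN_DETECTION_LENGTH:
        return None  # too short to detect reliably
    return detect(text)


def detect_language(text: str, detect: Detector) -> str:
    """Return a regconfig, falling back to SIMPLE for unmapped languages."""
    iso = detect_language_iso(text, detect)
    return ISO_639_1_TO_REGCONFIG.get(iso, SIMPLE) if iso else SIMPLE
```

With a detector that reports "ja", detect_language() returns SIMPLE while detect_language_iso() still reports "ja". That raw code is the signal the alert needs.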

QA

Prerequisites

  • pip install -e ".[dev]"
  • Deploy to test instance on alternate port (AWARENESS_PORT=8421)

Manual tests (via MCP tools)

    • Write Japanese text — alert fires
    remember(source="qa", tags=["test"], description="日本語のサンプルテキストです。これは十分に長いテストです。テキスト検出のために書いています。")
    

    Expected: entry created with language: "simple", and get_alerts shows an unsupported-language-ja info alert

    • Write English text — no alert
    remember(source="qa", tags=["test"], description="This is a normal English sentence that should be detected as English by lingua")
    

    Expected: entry created with language: "english", no new unsupported-language alert

    • Alert is upserted, not duplicated
      Write another Japanese entry — expected: same alert updated, not a second alert
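The upsert behaviour exercised by the last manual test can be sketched with a hypothetical in-memory store keyed by alert_id. InMemoryAlertStore, Alert, and the count field are illustrative assumptions, not the project's upsert_alert implementation. The point is that a deterministic alert ID per language makes a repeat write update the existing alert instead of adding a new one.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    alert_id: str
    level: str
    message: str
    count: int = 1  # hypothetical: how many writes hit this alert


@dataclass
class InMemoryAlertStore:
    """Hypothetical stand-in for the real alert store, keyed by alert_id."""

    alerts: dict[str, Alert] = field(default_factory=dict)

    def upsert_alert(self, alert_id: str, level: str, message: str) -> Alert:
        existing = self.alerts.get(alert_id)
        if existing is None:
            self.alerts[alert_id] = Alert(alert_id, level, message)
        else:
            # Same language again: update in place instead of adding a new alert.
            existing.message = message
            existing.count += 1
        return self.alerts[alert_id]


def record_unsupported(store: InMemoryAlertStore, iso: str) -> Alert:
    # One alert ID per language, so repeat writes converge on one alert.
    return store.upsert_alert(
        alert_id=f"unsupported-language-{iso}",
        level="info",
        message=f"Detected unsupported language '{iso}'",
    )
```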

🤖 Generated with Claude Code

@cmeans-claude-dev cmeans-claude-dev Bot added the Dev Active Developer is actively working on this PR; QA should not start label Apr 12, 2026
@github-actions github-actions Bot added the Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA label Apr 12, 2026
@codecov

codecov Bot commented Apr 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@cmeans-claude-dev cmeans-claude-dev Bot added Ready for QA Dev work complete — QA can begin review and removed Dev Active Developer is actively working on this PR; QA should not start labels Apr 12, 2026
@github-actions github-actions Bot removed the Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA label Apr 12, 2026
@cmeans cmeans added the QA Active QA is actively reviewing; Dev should not push changes label Apr 12, 2026
Owner

@cmeans cmeans left a comment


[QA] Round 1 on PR #269 — QA Failed

Verdict: one small observation (performance, not correctness). Code is well-structured, test coverage is thorough (836/836 pass, +10 new), CI green, CHANGELOG accurate.

What is working well

  • Clean separation of concerns. detect_language_iso() in language.py returns the raw ISO code for unmapped languages — exactly what the alert infrastructure needs. _check_unsupported_language() in tools.py is a clean helper with the right guards: only fires when resolved=="simple" AND lingua detected a non-mapped ISO.
  • Upsert semantics for alerts. One alert per unsupported language (via upsert_alert with alert_id=f"unsupported-language-{iso}"), not per entry. So 100 Japanese entries produce one unsupported-language-ja alert, not 100.
  • Alert failure does not break writes. try/except wrapper with logger.debug on failure. The test test_alert_failure_does_not_break_write explicitly verifies this by making upsert_alert raise and confirming the remember call still succeeds. ✓
  • Correct guard logic. _check_unsupported_language returns early unless resolved is "simple". After that, it fires only when detect_language_iso returns a non-None ISO code that is not in the mapping. The iso in ISO_639_1_TO_REGCONFIG guard is defense-in-depth for the edge case where an explicit unknown ISO code overrides detection but the text happens to be in a mapped language. That case is reachable but rare, and the guard correctly prevents a false alert.
  • All 4 write tools wired. learn_pattern, remember, add_context, remind all call _check_unsupported_language. ✓
  • Test coverage. 6 language tests (not 5 as the PR body says) + 4 server tests (not 3):
    • TestDetectLanguageIso: short/empty/unavailable/mapped/unmapped/uncertain — all edge cases covered
    • TestUnsupportedLanguageAlert: fires for unmapped / no alert for mapped / failure does not break write / no alert with explicit language
  • CHANGELOG: ### Added, accurate, refs #264 + #238
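Below is a minimal sketch of the guard order and failure isolation described above. It assumes the detector and upsert_alert are passed in as callables. The real _check_unsupported_language calls module functions directly, and its exact signature is not shown in this PR. The mapping subset is hypothetical.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

SIMPLE = "simple"
ISO_639_1_TO_REGCONFIG = {"en": "english", "fr": "french"}  # hypothetical subset


def check_unsupported_language(
    text: str,
    resolved: str,
    detect_iso: Callable[[str], Optional[str]],
    upsert_alert: Callable[..., object],
) -> bool:
    """Return True if an alert was upserted. Never raises."""
    if resolved != SIMPLE:
        return False  # a mapped regconfig was resolved; nothing to report
    try:
        iso = detect_iso(text)
        if iso is None or iso in ISO_639_1_TO_REGCONFIG:
            # Defense in depth: an explicit unknown ISO may have forced SIMPLE
            # while the text itself is in a mapped language.
            return False
        upsert_alert(alert_id=f"unsupported-language-{iso}", level="info")
        return True
    except Exception:
        # Alerting is best-effort; the write must still succeed.
        logger.debug("unsupported-language alert failed", exc_info=True)
        return False
```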

Small observation — double lingua detection on the same text

_check_unsupported_language(text, resolved) calls detect_language_iso(text), which runs _get_detector().detect_language_of(text). But resolve_language(text_for_detection=text) — called just before — already ran detect_language(text), which also calls _get_detector().detect_language_of(text).

So lingua's detector runs twice on the same text for every write-tool call where resolved == "simple". lingua's detection is CPU-only (no I/O), deterministic for the same input, and fast for short text — so this is a performance observation, not a correctness issue. The two calls will always agree.
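The redundancy can be reproduced with a counting stand-in for the detector. CountingDetector, resolve_language, and write are simplified hypothetical versions, not the project's functions. They show only that a second detection happens on the SIMPLE path and not on the mapped path.

```python
from typing import Optional

SIMPLE = "simple"
MAPPED = {"en": "english"}  # hypothetical subset of ISO_639_1_TO_REGCONFIG


class CountingDetector:
    """Stand-in for lingua's detector that counts how often it runs."""

    def __init__(self, iso: Optional[str]) -> None:
        self.iso = iso
        self.calls = 0

    def detect(self, text: str) -> Optional[str]:
        self.calls += 1
        return self.iso


def resolve_language(text: str, detector: CountingDetector) -> str:
    iso = detector.detect(text)  # first detection
    return MAPPED.get(iso, SIMPLE) if iso else SIMPLE


def write(text: str, detector: CountingDetector) -> str:
    resolved = resolve_language(text, detector)
    if resolved == SIMPLE:
        detector.detect(text)  # second detection, as in the alert check
    return resolved
```

A Japanese write runs the detector twice and an English write runs it once. This matches QA's point that the extra cost is confined to unmapped-language writes.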

Suggested fix options:

  • (a) Accept the double call. lingua detection is fast (~ms for typical awareness-length text), and the simplicity of keeping resolve_language and _check_unsupported_language as independent functions has value. The "simple" path is the minority case (most entries are in mapped languages), so the double detection only fires for unmapped-language writes.

  • (b) Return the raw ISO code from resolve_language alongside the regconfig. Refactors resolve_language to return (regconfig, iso_or_none) — but this changes the foundation layer that went through 8 QA rounds, and every caller would need updating.

  • (c) Cache the last detection result in module state. _get_detector().detect_language_of(text) result cached by text hash, returned immediately on repeat. Adds complexity for marginal gain.

I lean toward (a) — accept the double call. The performance cost is negligible for the typical use case, and the code clarity of having two independent functions (one for regconfig resolution, one for demand signaling) is worth the minor redundancy. If you pick (a), just add a brief comment in _check_unsupported_language noting the intentional double detection so a future optimizer understands it was considered.

Verification

  • Full suite: 836/836 pass (+10 from baseline 826)
  • ruff check src/ tests/ + format: clean
  • CI on PR: all green
  • CHANGELOG: ### Added, accurate, refs #264 + #238
  • New test count: 6 language + 4 server = 10 new tests (PR body says "5 + 3", a minor count discrepancy)

Applying QA Failed as the final act.

@cmeans
Owner

cmeans commented Apr 12, 2026

[QA] Round 1: QA Failed. Small observation: double lingua detection on the same text. _check_unsupported_language() calls detect_language_iso(text), which re-runs lingua detection after resolve_language() has already run it. Detection is CPU-only, deterministic, and fast, so this is a performance observation, not a correctness issue. I lean toward option (a): accept the double call and add a comment noting it was considered. The code is otherwise clean and well-tested, with 836/836 passing (+10 new tests), a clean alert-failure guard, upsert semantics, and all 4 write tools wired. Switching Ready for QA to QA Failed as the final act.

@cmeans cmeans added QA Failed QA found issues — needs dev attention and removed Ready for QA Dev work complete — QA can begin review QA Active QA is actively reviewing; Dev should not push changes labels Apr 12, 2026
@cmeans cmeans force-pushed the feat/unsupported-language-alert branch from f62c178 to 0c2a571 April 12, 2026 18:33
@github-actions github-actions Bot added Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA Ready for QA Dev work complete — QA can begin review and removed QA Failed QA found issues — needs dev attention Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA labels Apr 12, 2026
@cmeans-claude-dev cmeans-claude-dev Bot added the Dev Active Developer is actively working on this PR; QA should not start label Apr 12, 2026
@cmeans-claude-dev
Contributor Author

[Dev] QA round 1 fix: added docstring comment noting the intentional double lingua detection (lingua caches internally, cost negligible, simpler than threading ISO through resolve_language API). Rebased on main. Will label Ready for QA after CI passes.

cmeans-claude-dev[bot] and others added 3 commits April 12, 2026 14:19
…#264)

When lingua detects a language not in the regconfig mapping, write tools
fire an info-level structural alert (unsupported-language-{iso}). One
alert per language via upsert. Signals demand for Phase 3 non-Western
support.

New detect_language_iso() in language.py returns raw ISO code even for
unmapped languages. 8 new tests (5 language, 3 server).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ecov gap)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d 1)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@cmeans-claude-dev cmeans-claude-dev Bot removed the Dev Active Developer is actively working on this PR; QA should not start label Apr 12, 2026
@cmeans cmeans force-pushed the feat/unsupported-language-alert branch from 0c2a571 to b26239a April 12, 2026 19:20
@github-actions github-actions Bot added Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA Ready for QA Dev work complete — QA can begin review and removed Ready for QA Dev work complete — QA can begin review Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA labels Apr 12, 2026
@cmeans cmeans added the QA Active QA is actively reviewing; Dev should not push changes label Apr 12, 2026
@cmeans cmeans added Ready for QA Signoff QA passed — ready for maintainer final review and merge and removed QA Active QA is actively reviewing; Dev should not push changes labels Apr 12, 2026
Owner

@cmeans cmeans left a comment


Missed opportunity to use the SIMPLE constant.

Comment thread src/mcp_awareness/tools.py Outdated
raw ISO code through resolve_language would complicate its API for
a rare-path optimization.
"""
if resolved != "simple":
Owner


Shouldn't we be using the const we have?

Contributor Author


[Dev] Fixed — using SIMPLE constant from language.py. Pushed 759b244.

@cmeans cmeans added QA Failed QA found issues — needs dev attention and removed Ready for QA Signoff QA passed — ready for maintainer final review and merge labels Apr 12, 2026
…eview)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA Ready for QA Dev work complete — QA can begin review and removed QA Failed QA found issues — needs dev attention Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA labels Apr 12, 2026
Owner

@cmeans cmeans left a comment


Still finding the same shit.

Comment thread tests/test_server.py Outdated
"""If alert firing fails, the write still succeeds."""
import mcp_awareness.tools as tools_mod

monkeypatch.setattr(tools_mod, "resolve_language", lambda **kwargs: "simple")
Owner


Still using "simple"; I thought we had a solution for the tests as well.
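When applied to the test, the fix is to patch with the shared constant instead of a literal. This sketch uses unittest.mock.patch.object on a hypothetical tools_mod namespace so that it runs without pytest. In the real test, the constant would come from mcp_awareness.language and pytest's monkeypatch would do the patching.

```python
import types
from unittest import mock

SIMPLE = "simple"  # in the real test: from mcp_awareness.language import SIMPLE

# Hypothetical stand-in for the tools module under test.
tools_mod = types.SimpleNamespace(resolve_language=lambda **kwargs: "english")


def resolved_language_for(text: str) -> str:
    return tools_mod.resolve_language(text_for_detection=text)


def test_forces_simple_path() -> None:
    # Patch with the shared constant instead of repeating the "simple" literal.
    with mock.patch.object(tools_mod, "resolve_language", lambda **kwargs: SIMPLE):
        assert resolved_language_for("anything") == SIMPLE
    # The original attribute is restored when the context manager exits.
    assert resolved_language_for("anything") == "english"
```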

@cmeans cmeans added QA Failed QA found issues — needs dev attention and removed Ready for QA Dev work complete — QA can begin review labels Apr 12, 2026
…ing literals

Replace all remaining "simple" literals across src/ and tests/ with the
SIMPLE constant from language.py. schema.py, store.py, tools.py,
test_server.py, and test_store.py updated. Only the definition in
language.py retains the literal.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA Ready for QA Dev work complete — QA can begin review and removed QA Failed QA found issues — needs dev attention Awaiting CI Dev complete, waiting for CI/Codecov to pass before QA labels Apr 12, 2026
@cmeans cmeans added the QA Active QA is actively reviewing; Dev should not push changes label Apr 12, 2026
Owner

@cmeans cmeans left a comment


[QA] Round 3 — Signoff

Comprehensive SIMPLE constant cleanup per maintainer review. All "simple" string literals eliminated across 5 files:

  • schema.py: Entry default, to_dict() check, from_dict() default (3 replacements, SIMPLE imported from language.py) ✓
  • store.py: query_language default in Protocol (1 replacement, SIMPLE imported) ✓
  • tools.py: _check_unsupported_language guard, get_knowledge hint path, search tool fallback (3 replacements) ✓
  • test_server.py: 4 replacements, SIMPLE imported from mcp_awareness.language
  • test_store.py: local SIMPLE = "simple" constant removed, replaced with import from mcp_awareness.language

Zero remaining "simple" string literals in any key file (verified via grep). 848/848 pass. Zero new observations.
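QA verified the cleanup with grep. A hypothetical alternative is an AST scan, which matches only exact string constants and so ignores comments and longer strings that merely contain the word. The language.py exemption mirrors the commit message above. stray_literals and find_literal are illustrative names, not project tooling.

```python
import ast

LITERAL = "simple"


def find_literal(source: str, filename: str) -> list[int]:
    """Return line numbers where the exact string constant LITERAL appears."""
    tree = ast.parse(source, filename=filename)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Constant) and node.value == LITERAL
    )


def stray_literals(sources: dict[str, str]) -> dict[str, list[int]]:
    """Flag every file except language.py that still spells out the literal."""
    report = {}
    for name, src in sources.items():
        if name.endswith("language.py"):
            continue  # the one permitted definition site
        hits = find_literal(src, name)
        if hits:
            report[name] = hits
    return report
```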

Ready for QA Signoff.

@cmeans
Owner

cmeans commented Apr 12, 2026

[QA] Round 3 — Ready for QA Signoff. All "simple" literals eliminated across 5 files, now using canonical SIMPLE constant from language.py everywhere. Zero remaining literals (grep verified). 848/848 pass. Applying label as final act.

@cmeans cmeans added Ready for QA Signoff QA passed — ready for maintainer final review and merge and removed Ready for QA Dev work complete — QA can begin review QA Active QA is actively reviewing; Dev should not push changes labels Apr 12, 2026
Owner

@cmeans cmeans left a comment


LGTM

@cmeans cmeans added QA Approved Manual QA testing completed and passed and removed Ready for QA Signoff QA passed — ready for maintainer final review and merge labels Apr 12, 2026
@cmeans-claude-dev cmeans-claude-dev Bot merged commit 901c5d2 into main Apr 12, 2026
35 checks passed
@cmeans-claude-dev cmeans-claude-dev Bot deleted the feat/unsupported-language-alert branch April 12, 2026 20:05

Labels

QA Approved Manual QA testing completed and passed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: unsupported-language alert when lingua detects a language outside ISO_639_1_TO_REGCONFIG

1 participant