cmeans · Mar 21, 2026
@@ -43,7 +43,7 @@ src/mcp_awareness/
 - Pattern matching uses word-overlap between effect string and alert fields (hyphens/dashes normalized); hour ranges handle overnight wraparound
 - Soft delete: `delete_entry` moves to trash (30-day retention), `restore_entry` recovers, `get_deleted` lists trash. Bulk deletes require `confirm=True` (dry-run by default). Auto-purged by existing `_cleanup_expired`.
 - Resource descriptions carry behavioral hints — duplicate guidance in both server instructions and docstrings
-- Store uses threading.Lock on writes for async safety; _cleanup_expired is debounced (10s interval)
+- Store uses threading.Lock on writes for async safety; _cleanup_expired spawns a background daemon thread (never blocks the caller), debounced (10s interval), only triggered by writes
 - Transport: stdio (default) or streamable-http via AWARENESS_TRANSPORT env var; HTTP on AWARENESS_HOST:AWARENESS_PORT/mcp
 - Secret path auth: `AWARENESS_MOUNT_PATH` env var (e.g., `/my-secret`) rewrites `/my-secret/mcp` → `/mcp`, returns 404 for all other paths. Used with Cloudflare WAF to block unauthenticated traffic at the edge.
 
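The pattern-matching note names two behaviours: word overlap with hyphens and dashes normalized, and hour ranges that wrap past midnight. Here is a minimal Python sketch of both. The function names `effect_matches` and `hour_in_range` are hypothetical. Two semantics are assumptions, not necessarily what the collator does: an effect matches only when all of its words appear in the alert fields, and the end hour is exclusive.

```python
import re

# Hyphen, en dash and em dash are all treated as word separators.
_DASHES = re.compile("[-\u2013\u2014]")
_NON_WORD = re.compile(r"\W+")


def _words(text: str) -> set[str]:
    """Lowercase, turn dashes into spaces, and split into a set of words."""
    normalized = _DASHES.sub(" ", text.lower())
    return {w for w in _NON_WORD.split(normalized) if w}


def effect_matches(effect: str, alert_fields: list[str]) -> bool:
    """Assumed rule: every word of the effect must appear somewhere in the alert fields."""
    effect_words = _words(effect)
    if not effect_words:
        return False
    alert_words: set[str] = set()
    for field in alert_fields:
        alert_words |= _words(field)
    return effect_words <= alert_words


def hour_in_range(hour: int, start: int, end: int) -> bool:
    """Check hour against [start, end); start > end means the window wraps past midnight."""
    if start <= end:
        return start <= hour < end
    # Overnight window, e.g. 22 -> 6 covers 22, 23, 0, 1, ..., 5.
    return hour >= start or hour < end
```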

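`AWARENESS_MOUNT_PATH` is described only by its behaviour. One way to get that behaviour is a small ASGI middleware wrapped around the HTTP app. The sketch assumes the streamable-http transport is served as an ASGI app. The class name `SecretPathMiddleware` is hypothetical. The middleware passes lifespan events through and strips the secret prefix from `/my-secret/mcp` and its sub-paths. Every other HTTP request gets a 404 without ever reaching the wrapped app.

```python
from typing import Any, Awaitable, Callable

Scope = dict[str, Any]
Message = dict[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]
ASGIApp = Callable[[Scope, Receive, Send], Awaitable[None]]


class SecretPathMiddleware:
    """Hypothetical middleware: only <mount_path>/mcp reaches the app, rewritten to /mcp."""

    def __init__(self, app: ASGIApp, mount_path: str) -> None:
        self.app = app
        self.prefix = mount_path.rstrip("/")

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"] == "lifespan":
            # Startup/shutdown events are not requests; let them through.
            await self.app(scope, receive, send)
            return
        if scope["type"] != "http":
            return  # this sketch serves HTTP only; other scope types are not handed to the app

        path: str = scope["path"]
        target = self.prefix + "/mcp"
        if path == target or path.startswith(target + "/"):
            new_path = path[len(self.prefix):]
            # Copy the scope so the caller's dict is not mutated.
            new_scope = {**scope, "path": new_path}
            if "raw_path" in scope:
                new_scope["raw_path"] = new_path.encode()
            await self.app(new_scope, receive, send)
            return

        # Anything else is treated as a probe: reveal nothing beyond a 404.
        await send({
            "type": "http.response.start",
            "status": 404,
            "headers": [(b"content-type", b"text/plain")],
        })
        await send({"type": "http.response.body", "body": b"Not Found"})
```

Wiring it up would look like `SecretPathMiddleware(app, os.environ["AWARENESS_MOUNT_PATH"])`, where `app` is whatever ASGI app the server builds for the streamable-http transport.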
@@ -7,22 +7,22 @@
 
 ## What this is
 
+<img src="docs/images/android-briefing-demo.png" alt="Claude on Android surfacing an infrastructure alert during an unrelated conversation" width="220" align="right">
+
 `mcp-awareness` is a portable knowledge and awareness layer for AI agents. It gives any MCP-compatible AI assistant — Claude, ChatGPT, Cursor, or whatever comes next — access to a shared store of knowledge, system status, and operational context that *you* own and control.
 
 **The problem it solves:** Every AI platform has its own memory silo. Knowledge you build up in Claude doesn't exist in ChatGPT. Context from your desktop assistant doesn't follow you to mobile. If you switch platforms, you start over. Your AI knows you — but only within its walled garden.
 
 **What `mcp-awareness` does:** It externalizes that knowledge into a self-hosted service that any agent can read from and write to, using the open [Model Context Protocol](https://modelcontextprotocol.io/) (MCP). Tell one agent about your infrastructure, your projects, your preferences — and every agent knows it. Permanently, portably, privately.
 
+<br clear="both">
+
 ### What it looks like in practice
 
 In a single prompt — *"save your knowledge about me to awareness"* — Claude.ai wrote 39 tagged, searchable knowledge entries covering infrastructure, projects, family, health, finances, and operational patterns. Those entries are immediately accessible from Claude Code, Claude Desktop, or any other MCP client. The knowledge doesn't belong to Claude anymore. It belongs to the system.
 
 That same store also provides ambient system awareness: edge processes report status and alerts, a collation engine applies suppressions and patterns, and agents receive a pre-computed briefing (~200 tokens) at conversation start. If something needs attention, the agent mentions it. If not, silence.
 
-<p align="center">
-  <img src="docs/images/android-briefing-demo.png" alt="Claude on Android surfacing an infrastructure alert during an unrelated conversation" width="300">
-</p>
-
 ## How it started
 
 This project began with a single memory instruction in Claude.ai:
@@ -167,6 +167,41 @@ docker compose up -d
 docker compose --profile quick up -d mcp-awareness tunnel-quick
 ```
 
+## Tools
+
+The server exposes 14 MCP tools. Clients that support MCP resources also get 6 read-only resources. Because many clients (including Claude.ai) surface only tools, every resource also has a tool mirror.
+
+### Read tools
+
+| Tool | Description |
+|------|-------------|
+| `get_briefing` | Compact awareness summary (~200 tokens all-clear, ~500 with issues). Call at conversation start. Pre-filtered through patterns and suppressions. |
+| `get_alerts` | Active alerts, optionally filtered by source. Drill-down from briefing. |
+| `get_status` | Full status for a specific source including metrics and inventory. |
+| `get_knowledge` | All knowledge entries: learned patterns, historical context, preferences. |
+| `get_suppressions` | Active alert suppressions with expiry times and escalation settings. |
+
+### Write tools
+
+| Tool | Description |
+|------|-------------|
+| `report_status` | Report system status. Called periodically by edge processes. Upserts one entry per source; stale if TTL expires without refresh. |
+| `report_alert` | Report or resolve an alert. Captures diagnostics at detection time. Levels: `warning`, `critical`. Types: `threshold`, `structural`, `baseline`. |
+| `learn_pattern` | Record permanent knowledge from conversation. Tagged and searchable. Any agent writes; any agent reads. Set `learned_from` to your platform. |
+| `add_context` | Record time-limited knowledge (default 30 days). Use for events, temporary situations, or facts that lose relevance. |
+| `set_preference` | Set a portable presentation preference (e.g., `alert_verbosity`, `check_frequency`). Upserts by key + scope. |
+| `suppress_alert` | Suppress alerts by source/tags/metric. Time-limited with escalation override — critical alerts can break through. |
+
+### Data management tools
+
+| Tool | Description |
+|------|-------------|
+| `delete_entry` | Soft-delete entries (30-day trash). By ID, by source + type, or by source. Bulk deletes require `confirm=True` (dry-run by default). |
+| `restore_entry` | Restore a soft-deleted entry from trash. |
+| `get_deleted` | List all entries in trash with IDs for restore. |
+
+See the [Data Dictionary](docs/data-dictionary.md) for full schema documentation.
+
 ## Security
 
 The awareness store may contain personal information. Securing the endpoint is not optional. The current approach uses two layers:

@@ -107,16 +107,12 @@ Written by agents via `set_preference`. Keyed by `key` + `scope` (upserted). Por
 
 - **Upsert behavior:** `status` entries are upserted by `source`. `alert` entries by `source` + `alert_id`. `preference` entries by `key` + `scope`. Other types always insert new rows.
 - **Soft delete:** `delete_entry` sets the `deleted` timestamp. Entry remains in the database for 30 days, recoverable via `restore_entry`. Bulk deletes require `confirm=True` (dry-run by default).
-- **Auto-purge:** Expired entries (`expires < now`) and old soft-deleted entries (`deleted` > 30 days ago) are cleaned up by `_cleanup_expired`, which runs piggyback on store operations (reads and writes), debounced to at most every 10 seconds. There is no background scheduler — if the server receives no traffic, expired entries remain in the database until the next interaction. **Note:** auto-purge performs a hard `DELETE`, not a soft delete. Expired entries bypass the trash entirely — once past their expiry, they are permanently removed on the next cleanup pass.
+- **Auto-purge:** Expired entries (`expires <= now`) and old soft-deleted entries (`deleted` > 30 days ago) are cleaned up by `_cleanup_expired`. Only write operations trigger it, at most once every 10 seconds (debounced). The actual `DELETE` runs on a background thread with its own SQLite connection, so cleanup never blocks the request that triggers it; the debounce check itself is instant. Read operations do not trigger cleanup. If the server receives no write traffic, expired entries remain in the database until the next write. **Note:** auto-purge performs a hard `DELETE`, not a soft delete. Expired entries bypass the trash entirely: once past their expiry, they are permanently removed on the next cleanup pass.
 - **Staleness:** Status entries with `ttl_sec` are marked stale in the briefing if no update arrives within the TTL window. The entry itself is not deleted — it remains as the last known state.
 - **Hard deletes:** The API only performs soft deletes. If you delete the SQLite database file or run manual SQL `DELETE` statements, that data is gone permanently — there is no recovery mechanism beyond your own backups. Back up `awareness.db` regularly.
 
-### Known limitation: cleanup blocks requests
-
-The `_cleanup_expired` pass runs synchronously inside the write lock, meaning it blocks the request that triggers it. The 10-second debounce limits frequency, but when it fires, the caller waits for the `DELETE` to complete before getting their response. For a small single-user database this is negligible, but it will not scale. A future improvement should either move cleanup to a background task or filter expired entries at query time and purge asynchronously.
-
 ## SQLite configuration
 
 - **WAL mode** enabled for concurrent read/write safety
 - **Thread safety:** Write operations are protected by `threading.Lock` for async compatibility
-- **Cleanup debouncing:** `_cleanup_expired` runs at most every 10 seconds to avoid overhead on frequent reads
+- **Background cleanup:** `_cleanup_expired` spawns a daemon thread with its own SQLite connection. It is triggered only by writes and debounced to run at most once every 10 seconds.
@@ -65,7 +65,7 @@ def is_suppressed(alert: Entry, suppressions: list[Entry]) -> bool:
         # Tag match — check against alert tags AND alert content (alert_id, message)
         # so that a suppression tagged "qbittorrent" matches an alert about qbittorrent
         # even if the alert's structural tags are ["infra", "nas", "docker"]
-        s_tags = s_data.get("tags")
+        s_tags = s.tags
         if s_tags and not _suppression_tags_match(s_tags, alert):
             continue
 

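The collator comment explains why a suppression tag is checked against the alert's content as well as its tags. `_suppression_tags_match` itself is not in this diff, so the following only sketches that idea, using stand-in types. It assumes a suppression matches when any of its tags equals an alert tag or a whole word in `alert_id` or `message`. Splitting on hyphens means `qbittorrent-stopped` yields `qbittorrent`. Whole-word matching keeps a tag like `nas` from matching `nasty`. The real helper may use different rules.

```python
import re
from dataclasses import dataclass, field
from typing import Any

_SPLIT = re.compile(r"[^a-z0-9_]+")


@dataclass
class Alert:
    """Stand-in for an alert entry: structural tags plus a data payload."""
    tags: list[str] = field(default_factory=list)
    data: dict[str, Any] = field(default_factory=dict)


def suppression_tags_match(s_tags: list[str], alert: Alert) -> bool:
    """True if any suppression tag hits an alert tag or a word in alert_id/message."""
    alert_tags = {t.lower() for t in alert.tags}
    content = " ".join(str(alert.data.get(k, "")) for k in ("alert_id", "message"))
    content_words = {w for w in _SPLIT.split(content.lower()) if w}
    return any(t.lower() in alert_tags or t.lower() in content_words for t in s_tags)
```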
@@ -122,15 +122,19 @@ async def get_briefing() -> str:
     Returns a compact summary (~200 tokens all-clear, ~500 with issues).
     If attention_needed is true, mention the suggested_mention or compose
     your own from the source headlines. If false, nothing to report.
-    Pre-filtered through patterns and suppressions — no further processing needed."""
+    Pre-filtered through patterns and suppressions — no further processing needed.
+    This tool always returns structured JSON. If you receive an unstructured
+    error, the failure is in the transport or platform layer, not in awareness."""
     return json.dumps(generate_briefing(store), indent=2)
 
 
 @mcp.tool()
 async def get_alerts(source: str | None = None) -> str:
     """Get active alerts, optionally filtered by source.
     Drill-down from briefing — call when briefing shows attention_needed
-    and you want alert details. Returns full alert entries with diagnostics."""
+    and you want alert details. Returns full alert entries with diagnostics.
+    This tool always returns structured JSON. If you receive an unstructured
+    error, the failure is in the transport or platform layer, not in awareness."""
     alerts = store.get_active_alerts(source)
     return json.dumps([a.to_dict() for a in alerts], indent=2)
 
@@ -139,19 +143,36 @@ async def get_alerts(source: str | None = None) -> str:
 async def get_status(source: str) -> str:
     """Get full status for a specific source including metrics and inventory.
     Call when the briefing indicates issues with a source or user asks
-    about a specific system."""
+    about a specific system. This tool always returns structured JSON.
+    If you receive an unstructured error, the failure is in the transport
+    or platform layer, not in awareness."""
     entry = store.get_latest_status(source)
     if entry:
         return json.dumps(entry.to_dict(), indent=2)
     return json.dumps({"error": f"No status found for source: {source}"})
 
 
 @mcp.tool()
-async def get_knowledge() -> str:
-    """Get all knowledge entries: learned patterns, historical context, preferences.
+async def get_knowledge(
+    source: str | None = None,
+    tags: list[str] | None = None,
+    entry_type: str | None = None,
+) -> str:
+    """Get knowledge entries: learned patterns, historical context, preferences.
     Knowledge belongs to the system, not any specific agent. Call when you need
-    context about a system's normal behavior or operational patterns."""
-    entries = store.get_knowledge()
+    context about a system's normal behavior or operational patterns.
+    Filter by source, tags, and/or entry_type to reduce response size.
+    Valid entry_type values: 'pattern', 'context', 'preference'.
+    This tool always returns JSON with a status field or an entry list.
+    If you receive an unstructured error, the failure is in the transport
+    or platform layer, not in awareness."""
+    if entry_type:
+        et = EntryType(entry_type)
+        entries = store.get_entries(entry_type=et, source=source, tags=tags)
+    else:
+        entries = store.get_knowledge(tags=tags)
+        if source:
+            entries = [e for e in entries if e.source == source]
     return json.dumps([e.to_dict() for e in entries], indent=2)
 
 
@@ -232,7 +253,9 @@ async def learn_pattern(
     Any agent can write; any agent can read. Knowledge is portable across platforms.
     Use this when you learn something about a system's normal behavior —
     e.g., 'qBittorrent sometimes stopped for maintenance on Fridays'.
-    Do NOT use agent memory for this — use this tool so all agents benefit."""
+    Do NOT use agent memory for this — use this tool so all agents benefit.
+    Returns JSON with status and entry id. If you receive an unstructured
+    error, the failure is in the transport or platform layer, not in awareness."""
     now = now_iso()
     entry = Entry(
         id=make_id(),
@@ -283,7 +306,6 @@ async def suppress_alert(
             "suppress_level": level,
             "escalation_override": escalation_override,
             "reason": reason,
-            "tags": tags,
         },
     )
     store.add(entry)
@@ -351,7 +373,9 @@ async def delete_entry(
     For bulk deletes (by source), set confirm=True. Without it, a dry-run count
     is returned so the user can verify before committing.
     Use when the user says 'forget that', 'delete the pattern about X',
-    or 'remove everything about Y'. Entries auto-purge after 30 days."""
+    or 'remove everything about Y'. Entries auto-purge after 30 days.
+    Returns JSON with status and count. If you receive an unstructured
+    error, the failure is in the transport or platform layer, not in awareness."""
     if entry_id:
         trashed = store.soft_delete_by_id(entry_id)
         return json.dumps(

@@ -138,16 +138,31 @@ def _insert_entry(self, entry: Entry) -> None:
         )
 
     def _cleanup_expired(self) -> None:
-        """Delete entries whose expires timestamp is in the past (debounced)."""
-        if time.monotonic() - self._last_cleanup < self._cleanup_interval:
+        """Schedule cleanup of expired entries on a background thread (debounced).
+
+        Never blocks the calling request. The actual DELETE runs in a
+        separate thread with its own SQLite connection.
+        """
+        now = time.monotonic()
+        if now - self._last_cleanup < self._cleanup_interval:
             return
-        now = datetime.now(timezone.utc).isoformat()
-        self._conn.execute(
-            "DELETE FROM entries WHERE expires IS NOT NULL AND expires <= ?",
-            (now,),
-        )
-        self._conn.commit()
-        self._last_cleanup = time.monotonic()
+        self._last_cleanup = now  # claim the slot immediately to prevent races
+        thread = threading.Thread(target=self._do_cleanup, name="awareness-cleanup", daemon=True)
+        thread.start()
+
+    def _do_cleanup(self) -> None:
+        """Run the actual DELETE on a dedicated connection (background thread)."""
+        try:
+            conn = sqlite3.connect(str(self.path))
+            now = datetime.now(timezone.utc).isoformat()
+            conn.execute(
+                "DELETE FROM entries WHERE expires IS NOT NULL AND expires <= ?",
+                (now,),
+            )
+            conn.commit()
+            conn.close()
+        except Exception:
+            pass  # best-effort cleanup — next debounce window will retry
 
     # Base filter for all normal reads — excludes soft-deleted entries
     _ACTIVE = "deleted IS NULL"
@@ -272,7 +287,6 @@ def get_entries(
         source: str | None = None,
         tags: list[str] | None = None,
     ) -> list[Entry]:
-        self._cleanup_expired()
         clauses: list[str] = []
         params: list[str] = []
         if entry_type is not None:
@@ -305,7 +319,6 @@ def get_latest_status(self, source: str) -> Entry | None:
         return self._row_to_entry(row) if row else None
 
     def get_active_alerts(self, source: str | None = None) -> list[Entry]:
-        self._cleanup_expired()
         clauses = ["type = ?"]
         params: list[str] = [EntryType.ALERT.value]
         if source:
@@ -316,7 +329,6 @@ def get_active_alerts(self, source: str | None = None) -> list[Entry]:
         return [a for a in alerts if not a.data.get("resolved")]
 
     def get_active_suppressions(self, source: str | None = None) -> list[Entry]:
-        self._cleanup_expired()
         entries = self._query_entries("type = ?", (EntryType.SUPPRESSION.value,))
         if source:
             entries = [s for s in entries if s.source == source or s.source == ""]
@@ -331,7 +343,6 @@ def get_patterns(self, source: str | None = None) -> list[Entry]:
         return self._query_entries("type = ?", (EntryType.PATTERN.value,))
 
     def count_active_suppressions(self) -> int:
-        self._cleanup_expired()
         cur = self._conn.execute(
             f"SELECT COUNT(*) FROM entries WHERE type = ? AND {self._ACTIVE}",
             (EntryType.SUPPRESSION.value,),
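Reads no longer trigger cleanup, so a server with no write traffic can hold expired rows indefinitely, as the data dictionary notes. The removed "Known limitation" section suggested filtering expired entries at query time. `_query_entries` is not shown in this diff. If it does not already filter expired rows, the extra predicate could look like this sketch. `query_live_ids` and the column set in the test are hypothetical. `deleted IS NULL` mirrors the existing `_ACTIVE` filter, and `where` is assumed to be trusted SQL built by the store, never user input.

```python
import sqlite3
from datetime import datetime, timezone

ACTIVE = "deleted IS NULL"                        # same idea as Store._ACTIVE
NOT_EXPIRED = "(expires IS NULL OR expires > ?)"  # hide rows still awaiting purge


def query_live_ids(conn: sqlite3.Connection, where: str, params: tuple[str, ...]) -> list[str]:
    """Return ids matching `where` that are neither soft-deleted nor expired."""
    now = datetime.now(timezone.utc).isoformat()
    sql = f"SELECT id FROM entries WHERE {where} AND {ACTIVE} AND {NOT_EXPIRED} ORDER BY id"
    return [row[0] for row in conn.execute(sql, (*params, now))]
```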

@@ -71,7 +71,6 @@ def _make_suppression(
             "metric": metric,
             "suppress_level": suppress_level,
             "escalation_override": escalation_override,
-            "tags": tags,
         },
     )