Skip to content

refactor(mcp): make canonical/LLM-facing tool name boundary type-enforced#1153

Merged
Aaronontheweb merged 3 commits into
netclaw-dev:devfrom
Aaronontheweb:mcp-name-drift-cleanup
May 22, 2026
Merged

refactor(mcp): make canonical/LLM-facing tool name boundary type-enforced#1153
Aaronontheweb merged 3 commits into
netclaw-dev:devfrom
Aaronontheweb:mcp-name-drift-cleanup

Conversation

@Aaronontheweb

Copy link
Copy Markdown
Collaborator

Summary

PR #1134 introduced an Anthropic-safe sanitized alias (server__tool) for MCP tool names but stored both forms as bare strings. PR #1147 fixed one acute symptom (approval grant lookup mismatch) but the broader drift remained — audit log emitted sanitized while approval grants stored canonical, the LLM saw canonical in search_tools output but emitted sanitized in tool_use, and several touched-but-correct sites were one refactor away from re-introducing the same class of bug.

This PR moves the canonical-vs-LLM-facing distinction to a type boundary so future drift becomes a compile error.

Three commits, each independently reviewable:

  1. FoundationLlmFacingToolName value object with FromCanonical factory that sanitizes /__ and asserts the result satisfies Anthropic's ^[a-zA-Z0-9_-]{1,128}$. INetclawTool.LlmFacingName is now a required member; the source generator emits it for first-party tools (failing at type-init for any future tool whose name can't be made LLM-safe); McpToolAdapter replaces its SanitizedName string with the value-typed form. No behavior change — every call site that previously read SanitizedName now reads LlmFacingName.Value and gets the same string.

  2. LLM-actor boundary seamLlmSessionActor.HandleToolCallResponse rewrites incoming FunctionCallContent.Name from the sanitized alias to canonical before any internal consumer reads it. Persisted history carries canonical. ChatMessageConverter.ToAiMessages gains an optional toolNameResolver Func threaded from ContextAssemblyInput, which the session actor wires to _toolRegistry.ToLlmFacingName so the outbound LLM request gets the sanitized form back on the wire. Internal re-drive paths (RedriveToolBatchForApproval) leave the resolver null and round-trip canonical, exactly what their two-form registry lookup wants. Result: audit log, ToolCallOutput event stream, duplicate-tool nudge, and memory ledger all now record notion/notion-create-pages instead of notion__notion-create-pages — matching the Slack approval prompt and the persisted approval grant.

  3. LLM-facing surface alignmentsearch_tools, load_tool, and discovered-tool activation logging all surfaced canonical names while the LLM emits sanitized in tool_use. The model saw the same tool under two distinct strings within a single turn. Now those surfaces emit LlmFacingName consistently; TryActivateDiscoveredTool still uses the registry's two-form lookup (no behavior change for older LLM strings) but caches and logs under the canonical name. Operator-facing surfaces from commit (2) and LLM-facing surfaces from this commit are now consistent within their respective audiences.

Test plan

  • dotnet build — clean
  • dotnet test src/Netclaw.Actors.Tests — 1956 tests pass
  • New unit tests:
    • LlmFacingToolNameTests — sanitization, idempotence, regex enforcement, length cap
    • ToolRegistryTestsToCanonicalName / ToLlmFacingName round-trip for MCP and first-party tools
    • ChatMessageConverterTests — resolver applied / not applied paths
    • McpToolAdapterTestsLlmFacingName replaces the prior SanitizedName getter
  • dotnet slopwatch analyze — no new violations
  • ./scripts/Add-FileHeaders.ps1 -Verify — all files have headers

Out of scope (potential further follow-up)

  • Configuration / CLI surfaces (ToolApprovalConfig.TryGetExplicitMode, netclaw approvals revoke --tool, ToolAudienceProfilesDoctorCheck) still match on exact strings. An operator who writes the sanitized alias into a config key would silently get no effect. Worth either teaching those surfaces to normalize via the registry, or adding a doctor check / schema-fix resolver to catch the misuse. Independent of this PR.

PR netclaw-dev#1134 created two name forms for MCP tools (canonical `server/tool`
for everything internal, sanitized `server__tool` for the LLM wire)
but stored both as bare strings. The follow-on bug (netclaw-dev#1147) showed
how easy it is to grab the wrong one — the approval lookup keyed on
sanitized while the grant store keyed on canonical, no compiler signal.

Make the LLM-facing form a type so misuse is a build error, not a
runtime regression:

- New `LlmFacingToolName` value object in Netclaw.Tools.Abstractions.
  `FromCanonical(string)` sanitizes `/` -> `__` and asserts the result
  satisfies Anthropic's tool-name regex; throws for anything that
  can't be expressed (spaces, dots, colons, >128 chars). No implicit
  string conversion — per the constitution rule on value objects.
- `INetclawTool.LlmFacingName` becomes a required member. Renames
  the documented intent of `Name` to "canonical, operator-facing"
  (the prior doc lied for MCP tools).
- NetclawToolGenerator emits a static readonly `_generatedLlmFacingName`
  via `FromCanonical`, so first-party tools with malformed names fail
  at type-init (i.e. at registry warm-up on daemon start) rather than
  at the Anthropic API boundary on the user's first call.
- McpToolAdapter swaps its `SanitizedName` string for `LlmFacingName`.
  `SanitizedAIFunction` is constructed from `LlmFacingName.Value`.
- ToolRegistry's inner `AIToolAdapter` and the two hand-written test
  fakes (FakeNetclawTool, RecordingContextTool) implement the new
  member; the registry's two-form `FindRegistration` now matches
  against `LlmFacingName.Value`.

No behavioral changes — every call site that previously read
`SanitizedName` now reads `LlmFacingName.Value` and gets the same
string. This is foundation for the follow-up cleanup that pushes the
canonical-vs-LLM-facing distinction out to the actual boundary
(LlmSessionActor ingress/egress) instead of having every consumer
guess which form to use.

586 Tools/Approval/Mcp tests pass; slopwatch clean; headers verified.
Adds the inbound/outbound seam that pushes the canonical-vs-LLM-facing
distinction out to the actor boundary, so internal consumers
(audit log, ToolCallOutput events, duplicate-tool fingerprint,
persisted assistant message, memory ledger) stop guessing which form
to use.

Inbound (LlmSessionActor.HandleToolCallResponse): rewrites every
FunctionCallContent in the incoming AiChatMessage and the toolCalls
list from the LLM-facing alias (`server__tool`) to the canonical
name (`server/tool`) before any consumer reads `tc.Name`. The
persisted SerializableChatMessage that flows into history therefore
carries the canonical form. Tools that don't resolve (unknown name)
pass through unchanged and the executor rejects them downstream as
it did before.

Outbound (SessionMessageAssembler.Assemble -> ChatMessageConverter.
ToAiMessages): an optional `toolNameResolver` Func is threaded into
the converter so it maps the persisted canonical name back to the
LLM-facing alias when reconstructing FunctionCallContent for the
wire. Without this hop Anthropic returns 400 on the `/` character.
The session actor passes `_toolRegistry.ToLlmFacingName`; tests and
internal re-drive paths leave the resolver null and get the persisted
name back unchanged.

Two new ToolRegistry helpers — `ToCanonicalName` and
`ToLlmFacingName` — wrap the registry's existing two-form lookup.
First-party tool names round-trip as identity in both directions
(name is already LLM-safe by construction).

Tests:
- ToolRegistry: round-trip and idempotence for MCP and first-party.
- ChatMessageConverter: resolver applied / not applied paths.
- Full Actors test suite (1956 tests) still green.

Behavioral consequence: for MCP tools the audit log, ToolCallOutput
event stream, duplicate-tool nudge, and memory ledger now record
`notion/notion-create-pages` instead of `notion__notion-create-pages`
— matching what the operator sees in the Slack approval prompt and
the persisted approval grant. No change for first-party tools.
search_tools / load_tool / discovered-tool activation all surfaced
the canonical name (server/tool) while the LLM emits and receives
the sanitized alias (server__tool) in tool definitions and tool_use
callbacks. The model saw the same tool under two distinct strings
within a single turn — confusing for weaker models, and a cross-
reference hazard between the system-prompt index, the search_tools
suggestion list, the load_tool result, and the tool definitions
Anthropic actually returns.

- SearchToolsTool: emit `tool.LlmFacingName` in both the exact-match
  list and the "did you mean" suggestion list.
- LoadToolTool: return `tool.LlmFacingName.Value` as the confirmation
  string the model echoes back.
- LlmSessionActor.TryActivateDiscoveredTool: resolve via the
  registry's two-form lookup (no change in behavior — the registry
  already accepted either form) and then cache + log under the
  canonical name so per-tool lease accounting and audit lines are
  unambiguous regardless of which form the LLM sent.

Operator-facing surfaces (audit, ToolCallOutput, Slack approval
prompt) are still canonical via the inbound seam from the previous
commit. The LLM and the operator now have separate consistent
identifiers, one form per audience.

Test expectations updated for SearchToolsToolTests and
McpToolAudienceGrantsTests where they pinned the prior (canonical)
LLM-facing emission. 1956 Actors tests pass; slopwatch clean.
@Aaronontheweb Aaronontheweb force-pushed the mcp-name-drift-cleanup branch from 15ac306 to 7c4f201 Compare May 22, 2026 22:47
@Aaronontheweb Aaronontheweb merged commit 36203cc into netclaw-dev:dev May 22, 2026
14 checks passed
Aaronontheweb added a commit that referenced this pull request May 23, 2026
)

PR #1153 split the canonical/LLM-facing tool name distinction along
the runtime audience boundary, but three operator-facing surfaces
still matched the exact stored string and would silently ignore a
mis-typed form:

- `ToolApprovalConfig.TryGetExplicitMode` (runtime override lookup):
  if the operator pasted `notion__create-pages` into ToolOverrides
  (the LLM-facing alias seen in audit logs or transcripts), the
  runtime queried with the canonical `notion/create-pages`, missed
  the entry, and the tool fell through to DefaultMode — a silent
  security misconfiguration.
- `netclaw approvals revoke --tool <name>` (and `list --tool`, and
  `trust-verb --tool`): the on-disk store is keyed by canonical
  names. A `--tool notion__create-pages` flag matched no grant and
  the CLI reported "no approvals found" despite the grant being
  there.
- `ToolAudienceProfilesDoctorCheck`: the missing-approval-default
  scan only recognized per-tool overrides with the `{server}/`
  prefix. An operator with `notion__create-pages` overrides was
  warned to add an approval default they had already configured —
  except the entries themselves were dead (see TryGetExplicitMode
  above).

Fix: introduce `LlmFacingToolName.TryReverseSanitizedToCanonical` —
a pure-string heuristic that maps `{server}__{tool}` back to
`{server}/{tool}`. By convention first-party tool names never
contain `__`, so the heuristic is unambiguous in practice; a future
first-party tool that legitimately needs `__` would require the
helper to grow a registry-aware overload (documented in remarks).

- ToolApprovalConfig.TryGetExplicitMode inlines the equivalent
  reverse-resolution (no Tools.Abstractions dep from Configuration)
  so the runtime override lookup tries both forms.
- ApprovalsCommand normalizes the `--tool` flag through
  TryReverseSanitizedToCanonical for all four call sites: list
  filter, revoke single-pattern filter, revoke-all store call, and
  trust-verb persistence. The trust-verb path persists under the
  canonical name so the runtime gate finds the grant.
- ToolAudienceProfilesDoctorCheck accepts both `{server}/` and
  `{server}__` prefixes when detecting per-tool overrides.

Tests added:
- `LlmFacingToolName.TryReverseSanitizedToCanonical_*` — sanitized
  → canonical mapping, identity for first-party names and already-
  canonical inputs, null for malformed `__foo` / `foo__`.
- `TryGetExplicitMode_finds_override_written_with_LlmFacing_key` —
  the runtime lookup honors the alias key.
- `TryGetExplicitMode_canonical_override_still_wins_when_both_forms_present`
  — exact match always takes precedence.
- `Revoke_all_resolves_LlmFacing_alias_to_canonical_stored_key` and
  `List_filter_by_LlmFacing_alias_matches_canonical_stored_key`.
- `TrustVerb_persists_under_canonical_name_when_passed_LlmFacing_alias`
  — operators get a grant that the runtime gate can actually use.
- `McpServerWithPerToolOverrideUnderLlmFacingKey_DoesNotTriggerWarning`
  — doctor matches the same shape the runtime accepts.

1968 Actors + 743 Cli + 333 Configuration tests pass; slopwatch
clean; headers verified.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant