fix(memory): add composite-score floor to recall ranker (#582) #585
Merged
Automatic recall was unconditionally injecting the top-N candidates on every turn, regardless of how weak the match was, so sparse or eval-seeded DBs polluted unrelated sessions with operational trivia.

Three related fixes land together:

1. Apply a minimum composite-score floor in SQLiteMemoryRecallCoordinator so recall can legitimately return zero items when nothing clears the bar. Default floor is 10.0 (rejects single-lexical-only matches; accepts two-lexical, lexical+facet, or lexical+anchor matches). Power-user override: Session.Tuning.MinimumRecallCompositeScore (nullable, no value written in config by default).

2. Filter sentence-start capitalized stopwords out of the planner's anchor-hint regex. Words like "How", "Can", "Which", "The", "My" were being treated as semantic anchors worth +8 selector points plus +3.5 soft-scope points, pulling unrelated ops/eval docs into recall on any English question. Adds a planner-local AnchorHintStopWords set that is broader than the tokenizer's stopword list. The tokenizer's list stays narrow so noun meanings of "can" and "will" remain lexically matchable.

3. Disable DomainAffinityWeight in DeterministicCandidateSelector. The concept is half-implemented (#584): Protocol.SessionId.ToMemoryDomain() unconditionally returns project:default, so affinity was a coin flip rather than a real "prefer same-project memories" signal, and the +5 boost made the floor unable to discriminate in-domain single-lexical collisions from legitimate two-lexical cross-domain matches.

New test coverage in Netclaw.Actors.Tests:

- MemoryRecallScenarioTests: 17-scenario theory driven by a 16-memory corpus that mirrors the #582 pollution shape (ops/eval band + two topical bands). Seeds into project:default so it actually reproduces the production coordinator's hard-scope normalization, not a cross-domain regime that silently measures different math.
- DeterministicCandidateSelectorTests: four score-geometry cases documenting the lexical/facet/anchor gradient as ordered inequalities so the floor can be tuned against stable reference points.

The memory_retrieval_final log now reports filteredByFloor and appliedFloor so the floor's behavior is observable in daemon logs.

Closes #582.
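The anchor-hint fix in item 2 can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's C# planner code: the regex, the function name `extract_anchor_hints`, and the exact contents of the stop set are assumptions. Only the words "How", "Can", "Which", "The", and "My" are named in this PR; "will" is inferred from the note about noun meanings. The sketch also filters stopwords at any position, whereas the PR describes them as sentence-start words.

```python
import re

# Hypothetical stand-in for the planner-local AnchorHintStopWords set.
# "how", "can", "which", "the", "my" are named in the PR; "will" is inferred.
ANCHOR_HINT_STOP_WORDS = frozenset({"how", "can", "which", "the", "my", "will"})

# Assumed pattern: an uppercase letter followed by at least one more
# alphanumeric character. The real planner regex is not shown in the PR.
CAPITALIZED_WORD = re.compile(r"\b[A-Z][A-Za-z0-9]+\b")


def extract_anchor_hints(text: str) -> list[str]:
    """Return capitalized words that are not anchor-hint stopwords."""
    hints: list[str] = []
    for match in CAPITALIZED_WORD.finditer(text):
        word = match.group(0)
        # Skip question words and determiners that are capitalized only
        # because they start a sentence.
        if word.lower() in ANCHOR_HINT_STOP_WORDS:
            continue
        # Keep first-seen order and drop duplicates.
        if word not in hints:
            hints.append(word)
    return hints
```

With this filter, a question such as "How do I configure Akka clustering?" yields only `Akka` as an anchor hint, so the leading "How" no longer earns anchor points. Because the stop set lives in the planner rather than the tokenizer, a lowercase "can" or "will" in a memory's text can still be matched lexically.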
Summary
- Apply a minimum composite-score floor in `SQLiteMemoryRecallCoordinator` so automatic recall can return zero items when nothing is a strong enough match (default 10.0; nullable `Session.Tuning.MinimumRecallCompositeScore` override for power users).
- Filter sentence-start capitalized stopwords (`How`, `Can`, `Which`, `The`, `My`, etc.) out of the planner's anchor-hint regex. They were being treated as semantic anchors worth +8 selector points plus +3.5 soft-scope points, pulling unrelated ops docs into recall on any English question.
- Disable `DomainAffinityWeight` in `DeterministicCandidateSelector`. The domain scoping concept is half-implemented (tracked in #584, "Memory domain scoping is half-implemented: all sessions collapse to project:default": `Protocol.SessionId.ToMemoryDomain()` always returns `project:default`), and the +5 affinity boost was making the floor unable to discriminate in-domain single-lexical collisions from legitimate two-lexical cross-domain matches.

New test coverage:

- `MemoryRecallScenarioTests` (new): 17-scenario theory against a 16-memory corpus modeled on the #582 pollution shape ("Memory recall unconditionally injects top-N items with no score floor, polluting unrelated sessions"): ops/eval trivia + two topical bands. Seeded in `project:default` so it actually reproduces the production coordinator's hard-scope normalization.
- `DeterministicCandidateSelectorTests`: four score-geometry cases documenting the lexical/facet/anchor gradient as ordered inequalities.

The `memory_retrieval_final` log now reports `filteredByFloor={Count} appliedFloor={Floor:F1}` so the floor's behavior is observable in daemon logs.

Closes #582. Opens #584 for the domain-scoping follow-up.
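The floor behavior described above can be sketched as follows. This is a Python illustration under stated assumptions, not the coordinator's C# implementation: `Candidate`, `apply_floor`, and `format_log_fields` are hypothetical names, the example scores are invented, and treating a score exactly equal to the floor as passing (`>=`) is an assumption the PR does not confirm. The default of 10.0, the nullable override, and the two log fields come from the PR text.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_MIN_COMPOSITE_SCORE = 10.0  # default floor stated in this PR


@dataclass(frozen=True)
class Candidate:
    """Hypothetical recall candidate with an already computed composite score."""
    memory_id: str
    composite_score: float


def apply_floor(candidates: list[Candidate], top_n: int,
                override: Optional[float] = None):
    """Return (kept, filtered_by_floor, applied_floor).

    `override` plays the role of the nullable
    Session.Tuning.MinimumRecallCompositeScore setting: None means
    "use the default floor".
    """
    floor = DEFAULT_MIN_COMPOSITE_SCORE if override is None else override
    # Assumption: a score equal to the floor clears the bar.
    passing = [c for c in candidates if c.composite_score >= floor]
    filtered_by_floor = len(candidates) - len(passing)
    # Rank the survivors and cap at top-N; the result may be empty.
    ranked = sorted(passing, key=lambda c: c.composite_score, reverse=True)
    return ranked[:top_n], filtered_by_floor, floor


def format_log_fields(filtered_by_floor: int, applied_floor: float) -> str:
    """Mimic the filteredByFloor/appliedFloor fields, floor to one decimal."""
    return f"filteredByFloor={filtered_by_floor} appliedFloor={applied_floor:.1f}"
```

The point of the design is that an empty result is a valid outcome: if every candidate scores below the floor, nothing is injected into the session, and the log fields record how many candidates were dropped and which floor was in effect.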
Test plan
- `dotnet test src/Netclaw.Actors.Tests/Netclaw.Actors.Tests.csproj`: 843 passed / 0 failed
- `dotnet slopwatch analyze`: 0 issues