fix(auth-profiles): bound agentOwner / agentOrg caches at 1024 entries #855

Merged
buremba merged 1 commit into main from fix/auth-profiles-cache-sweep
May 18, 2026
@buremba buremba commented May 18, 2026

Summary

Hardening pass on the auth-profile resolver caches in AuthProfilesManager, motivated by #782's leak hunt.

Root cause

AuthProfilesManager has two short-lived caches:

private readonly agentOwnerCache = new Map<string, { ownerUserId; expiresAt }>();
private readonly agentOrgCache   = new Map<string, { organizationId; expiresAt }>();

Both use "lazy refresh on read": when an entry is looked up and found expired, the resolver is called again and set() overwrites the entry. But for agentIds that are looked up once and never re-queried, the entry stays in the Map forever. The 60-second TTL only controls when a value is refreshed; it never removes an entry, so it does not bound the Map's size.
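The growth pattern described above can be reproduced in isolation. This is a minimal sketch, not the code from AuthProfilesManager: `ownerCache`, `lookupOwner`, `resolveOwner`, and the `string` type for `ownerUserId` are assumptions made for illustration.

```typescript
// Hypothetical stand-in for one of the resolver caches (names are illustrative).
type OwnerEntry = { ownerUserId: string; expiresAt: number };

const TTL_MS = 60000; // the 60-second TTL mentioned in the PR
const ownerCache = new Map<string, OwnerEntry>();

// Assumed resolver; a real one would presumably query a store or an API.
function lookupOwner(agentId: string): string {
  return `owner-of-${agentId}`;
}

// Lazy refresh on read: a missing or expired entry is re-resolved and overwritten.
// Note that no code path ever deletes an entry.
function resolveOwner(agentId: string, now: number): string {
  const hit = ownerCache.get(agentId);
  if (hit && hit.expiresAt > now) return hit.ownerUserId;
  const ownerUserId = lookupOwner(agentId);
  ownerCache.set(agentId, { ownerUserId, expiresAt: now + TTL_MS });
  return ownerUserId;
}

// 2048 one-shot lookups at time 0.
for (let i = 0; i < 2048; i++) resolveOwner(`agent-${i}`, 0);

// Long after every entry has expired, one agent is looked up again.
resolveOwner("agent-0", 10 * TTL_MS);

// Only agent-0 was refreshed; the other 2047 expired entries are still held.
const sizeAfterExpiry = ownerCache.size;
```

Expiry here changes what a read returns, not what the Map holds, which is why the size tracks the number of distinct agentIds ever seen.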

In practice the growth is small (~200 bytes per distinct agentId, hundreds to thousands of agents per day). It is almost certainly not the cause of the 1 Gi OOM in #782, but it is a genuinely unbounded Map that should clean up after itself.
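A back-of-the-envelope calculation supports that reading. The figure of 5000 new agentIds per day is an assumption picked from the "hundreds to thousands" range, and "1 Gi" is read here as 1 GiB; the 200-byte entry size is the estimate quoted above.

```typescript
// Rough estimate only; inputs are the PR's approximations plus one assumption.
const BYTES_PER_ENTRY = 200;       // per distinct agentId, from the PR text
const NEW_AGENTS_PER_DAY = 5000;   // assumption: upper end of "hundreds to thousands"
const CACHE_COUNT = 2;             // agentOwnerCache and agentOrgCache
const ONE_GIB = 1024 * 1024 * 1024;

// Combined growth of both caches per day: 2,000,000 bytes (about 2 MB).
const bytesPerDay = BYTES_PER_ENTRY * NEW_AGENTS_PER_DAY * CACHE_COUNT;

// Days of uninterrupted pod uptime needed to reach 1 GiB: roughly 537.
const daysToOneGiB = ONE_GIB / bytesPerDay;
```

Under these assumptions a pod would need to run for well over a year before the two caches alone approached the memory limit.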

Fix

Adds a tiny cacheSet(map, key, value) helper that enforces a hard cap of 1024 entries. When set() would push the size over the cap, the helper first evicts the oldest insertion (Maps iterate in insertion order, so map.keys().next().value followed by map.delete() is O(1)).

Both call sites updated to use the helper.
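A helper along these lines would behave as described. This is a sketch reconstructed from the description, not the merged code: the optional `maxEntries` parameter is a hypothetical addition so the cap can be varied, and the `!map.has(key)` guard is assumed so that overwriting an existing key does not evict anything.

```typescript
const AGENT_CACHE_MAX_ENTRIES = 1024;

// Insert or overwrite `key`, evicting the oldest insertion first if a new key
// would push the Map past the cap. `maxEntries` is a hypothetical parameter.
function cacheSet<K, V>(
  map: Map<K, V>,
  key: K,
  value: V,
  maxEntries: number = AGENT_CACHE_MAX_ENTRIES,
): void {
  if (!map.has(key) && map.size >= maxEntries) {
    // Maps iterate in insertion order, so the first key is the oldest one.
    const oldest = map.keys().next();
    if (!oldest.done) map.delete(oldest.value);
  }
  map.set(key, value);
}

// Same shape as the reproducer: 2048 distinct keys against a 1024 cap.
const demo = new Map<string, number>();
for (let i = 0; i < 2048; i++) cacheSet(demo, `agent-${i}`, i);

const demoSize = demo.size;                       // 1024
const firstSurvivor = demo.keys().next().value;   // "agent-1024"
```

Checking `done` on the iterator result, rather than testing the value against undefined, keeps the helper correct for key types that may themselves be undefined.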

Reproducer (e2e gate per AGENTS.md)

$ bun test packages/server/src/gateway/auth/settings/__tests__/auth-profiles-manager-cache-cap.test.ts

Before fix (RED)

(fail) AuthProfilesManager: bounded auth-resolver caches > agentOwner cache stays bounded under many distinct one-shot lookups
 0 pass | 1 fail | 1 expect() calls

The test performs 2048 distinct lookups and asserts cache.size <= 1024. Without the cap, size reaches 2048.

After fix (GREEN)

 1 pass | 0 fail | 2 expect() calls
Ran 1 test across 1 file. [91.00ms]

Refs

Summary by CodeRabbit

  • Tests

    • Added test verification for bounded cache sizes.
  • Chores

    • Implemented fixed maximum entry count for internal caches with automatic eviction of oldest entries when limit exceeded.

Both caches were "lazy refresh on read" — they update an entry's
expiresAt when the same agentId is looked up again, but never delete
entries for agentIds that are never re-queried. Net: cache size
grows monotonically with distinct agentIds the gateway has ever
seen, bounded only by the pod's lifetime.

In practice the growth rate is small (~200 bytes per distinct
agentId, hundreds-to-thousands of agents per day) and almost
certainly NOT the cause of the 1 Gi OOM that prompted #782 — but
it's still a genuinely unbounded Map that ideally cleans up after
itself.

Adds a tiny cacheSet helper with a 1024-entry cap that evicts the
oldest insertion (Maps iterate in insertion order, so peeking at the
first key and deleting it is O(1)). Test exercises 2048 distinct
lookups and asserts both caches stay <= 1024.

Refs #782 — hardening, not root-cause fix. SSE keepalive teardown
(#833) and in-memory pending-interactions removal (#834) remain
the most likely actual OOM fixes.

coderabbitai Bot commented May 18, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e3b32b9c-5e68-4ceb-a9a6-85f449b90528

📥 Commits

Reviewing files that changed from the base of the PR and between be50b20 and 323a4e5.

📒 Files selected for processing (2)
  • packages/server/src/gateway/auth/settings/__tests__/auth-profiles-manager-cache-cap.test.ts
  • packages/server/src/gateway/auth/settings/auth-profiles-manager.ts

📝 Walkthrough

This PR adds bounded caching to AuthProfilesManager's internal agent resolver caches. A new cacheSet helper enforces a maximum entry count with FIFO eviction, which both the owner and organization resolver methods now use. A test verifies the caches remain bounded when resolving many distinct agent IDs.

Changes

Bounded Agent Cache for AuthProfilesManager

| Layer / File(s) | Summary |
|---|---|
| Bounded cache mechanism (packages/server/src/gateway/auth/settings/auth-profiles-manager.ts) | Introduces AGENT_CACHE_MAX_ENTRIES constant and a cacheSet helper that evicts the oldest Map entry when the cache reaches 1024 entries and a new key is being added. |
| Resolver integration (packages/server/src/gateway/auth/settings/auth-profiles-manager.ts) | resolveAgentOwnerUserId and resolveAgentOrgId methods now write resolved values through the bounded cacheSet helper instead of direct Map.set. |
| Cache cap test (packages/server/src/gateway/auth/settings/__tests__/auth-profiles-manager-cache-cap.test.ts) | New test suite instantiates AuthProfilesManager and resolves 2048 distinct agent IDs, asserting both internal resolver caches remain bounded at ≤ 1024 entries. |
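One consequence of FIFO eviction on a plain Map is worth spelling out: overwriting an existing key changes its value but not its position in the iteration order. The snippet below illustrates that standard Map behaviour; the delete-then-set variant is shown only for contrast and is not something the PR claims to do.

```typescript
const order = new Map<string, number>();
order.set("a", 1);
order.set("b", 2);

// Overwriting "a" updates the value but keeps its original insertion slot.
order.set("a", 3);
const afterOverwrite = Array.from(order.keys()); // ["a", "b"]

// For contrast: deleting and re-inserting moves the key to the end,
// which is the usual building block of an LRU policy.
order.delete("a");
order.set("a", 3);
const afterReinsert = Array.from(order.keys()); // ["b", "a"]
```

So under FIFO, an agentId whose entry is refreshed frequently is still evicted according to when it was first inserted. With a 60-second TTL and a cheap re-resolve, that trade-off is likely acceptable for a hardening change.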

🎯 2 (Simple) | ⏱️ ~10 minutes

🐰 A bounded cache bounds so tight,
No endless growth in memory's night,
FIFO evicts with graceful ease,
Two resolvers cache memories please,
Tests confirm the cap holds strong and right! 🔒

@buremba buremba merged commit 7b0c819 into main May 18, 2026
16 of 18 checks passed
@buremba buremba deleted the fix/auth-profiles-cache-sweep branch May 18, 2026 02:42
@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.

2 participants