fix: resolve multiple memory leaks causing unbounded growth #16695

Open
binarydoubling wants to merge 6 commits into anomalyco:dev from binarydoubling:fix/memory-leaks

binarydoubling commented Mar 9, 2026

Issue for this PR

Closes #16697

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Fixes multiple sources of unbounded memory growth across the TUI, core subsystems, and server-side components. These leaks compound during extended usage, causing RAM to climb monotonically.

TUI event listener leaks:

  • app.tsx: 6 sdk.event.on() calls had no cleanup — listeners accumulated on re-render. Now collected and unsubscribed via onCleanup.
  • session/index.tsx: message.part.updated listener accumulated every time a session was opened. Added onCleanup.
  • prompt/index.tsx: PromptAppend listener leaked on prompt remount. Added onCleanup.
  • theme.tsx: process.on("SIGUSR2") handler never removed. Now cleaned up via onCleanup.
  • keybind.tsx: Leader mode timeout never cleared on provider unmount. Added onCleanup.
  • sdk.tsx: Event queue not cleared on context cleanup. Now zeroed out.
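The listener fixes above share one pattern: keep the unsubscribe handle returned by each subscription and release it when the owning scope is torn down. A minimal sketch of that pattern follows. It assumes `on()` returns an unsubscribe function. `Emitter`, `Scope`, and `mount` are hypothetical stand-ins written for this example (`Scope.onCleanup` plays the role of a framework cleanup hook such as Solid's `onCleanup`); none of this is the PR's actual code.

```typescript
type Unsubscribe = () => void

// Hypothetical emitter whose on() returns an unsubscribe function (assumed shape).
class Emitter<T> {
  private listeners = new Set<(value: T) => void>()

  on(fn: (value: T) => void): Unsubscribe {
    this.listeners.add(fn)
    return () => {
      this.listeners.delete(fn)
    }
  }

  emit(value: T): void {
    // Copy first so listeners may unsubscribe while being notified.
    for (const fn of [...this.listeners]) fn(value)
  }

  get size(): number {
    return this.listeners.size
  }
}

// Hypothetical stand-in for a component scope with a cleanup hook.
class Scope {
  private cleanups: Array<() => void> = []

  onCleanup(fn: () => void): void {
    this.cleanups.push(fn)
  }

  dispose(): void {
    // splice(0) empties the list, so a second dispose() is a no-op.
    for (const fn of this.cleanups.splice(0)) fn()
  }
}

// Subscribe on mount, collect the handles, release all of them on cleanup.
function mount(events: Emitter<string>, scope: Scope, log: string[]): void {
  const unsubs: Unsubscribe[] = [
    events.on((e) => {
      log.push(`a:${e}`)
    }),
    events.on((e) => {
      log.push(`b:${e}`)
    }),
  ]
  scope.onCleanup(() => {
    for (const off of unsubs) off()
  })
}
```

With this shape, mounting and disposing a scope any number of times leaves the emitter's listener count unchanged, which is the property the fixes above aim for.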

Core memory leaks:

  • sdk.tsx: Event queue grew unbounded — capped at 1000 with oldest-event eviction.
  • sync.tsx: Message trimming only removed 1 message at a time and leaked associated parts. Now removes all excess and cleans up parts. Session switching now frees previous session's data.
  • lsp/client.ts: Diagnostics map grew without bound — added 200-file cap with LRU eviction, early return on empty diagnostics, and clear on shutdown.
  • session/prompt.ts: Pending callbacks not rejected on cancel — now rejected to prevent promise/closure leaks.
  • project/state.ts: Warning timeout in disposal never cleared — now cleared after disposal completes.
  • provider/models.ts: Module-level setInterval for model refresh never cleared — now cleared on process exit.
  • share/share-next.ts: Added dispose() to clear pending sync timeouts.
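Two of the fixes above bound a collection: the event queue is capped with oldest-event eviction, and the diagnostics map is capped with LRU eviction. The sketch below shows one way to implement both. `pushCapped` and `LruMap` are hypothetical names for this example, and the real code in sdk.tsx and lsp/client.ts may be structured differently. The LRU relies on `Map` iterating keys in insertion order.

```typescript
// Append to a queue but keep at most `cap` entries, evicting the oldest first.
function pushCapped<T>(queue: T[], item: T, cap: number): void {
  queue.push(item)
  if (queue.length > cap) queue.splice(0, queue.length - cap)
}

// Size-capped map with least-recently-used eviction.
// The first key in iteration order is always the least recently used one.
class LruMap<K, V> {
  private map = new Map<K, V>()
  private readonly cap: number

  constructor(cap: number) {
    this.cap = cap
  }

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined
    const value = this.map.get(key) as V
    // Re-insert so the key moves to the most-recently-used position.
    this.map.delete(key)
    this.map.set(key, value)
    return value
  }

  set(key: K, value: V): void {
    this.map.delete(key)
    this.map.set(key, value)
    while (this.map.size > this.cap) {
      const oldest = this.map.keys().next().value as K
      this.map.delete(oldest)
    }
  }

  has(key: K): boolean {
    return this.map.has(key)
  }

  get size(): number {
    return this.map.size
  }

  // Intended for shutdown, mirroring the "clear on shutdown" fix.
  clear(): void {
    this.map.clear()
  }
}
```

For the caps named above, usage would look like `pushCapped(queue, event, 1000)` and `new LruMap<string, unknown>(200)`. The value type of the diagnostics map is left as `unknown` here because the source does not show it.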

Collection cleanup:

  • bus/index.ts: Empty subscription arrays left as tombstones after unsubscribe — now deleted.
  • util/rpc.ts: Empty listener Sets left in map after handler removal — now deleted.
  • pty/index.ts: Session buffer not cleared on removal — now zeroed to free memory immediately.
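The tombstone fix for bus/index.ts and util/rpc.ts comes down to deleting the map key once its last handler is removed. The sketch below illustrates that with a hypothetical `Bus` class written for this example. The real module's API and internals are not shown in this PR description, so treat the names and shapes as assumptions.

```typescript
type Handler = (payload: unknown) => void

// Hypothetical pub/sub registry keyed by event type.
class Bus {
  private subscriptions = new Map<string, Handler[]>()

  subscribe(type: string, handler: Handler): () => void {
    const list = this.subscriptions.get(type) ?? []
    list.push(handler)
    this.subscriptions.set(type, list)

    return () => {
      const current = this.subscriptions.get(type)
      if (!current) return
      const index = current.indexOf(handler)
      if (index !== -1) current.splice(index, 1)
      // Delete the key so an empty array is not left behind as a tombstone.
      if (current.length === 0) this.subscriptions.delete(type)
    }
  }

  publish(type: string, payload: unknown): void {
    // Copy first so handlers may unsubscribe during delivery.
    for (const handler of [...(this.subscriptions.get(type) ?? [])]) handler(payload)
  }

  // Number of event types that still hold an entry in the map.
  get keyCount(): number {
    return this.subscriptions.size
  }
}
```

Without the final `delete`, a process that subscribes to many short-lived event types would keep one empty array per type for its whole lifetime.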

How did you verify your code works?

Ran the TUI against a large monorepo, switched between sessions repeatedly, and monitored RSS over ~30 minutes. Memory stabilized instead of growing monotonically. Typechecks pass cleanly.

Screenshots / recordings

N/A — not a UI change

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

- Cap SDK event queue to prevent unbounded growth during high event throughput
- Clean up message parts when trimming excess messages in sync store
- Evict previous session data from memory when switching sessions
- Bound LSP diagnostics map with LRU eviction and clear on shutdown
- Reject pending callbacks on session cancel to prevent promise/closure leaks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions bot commented Mar 9, 2026

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

github-actions bot added the needs:compliance label Mar 9, 2026

github-actions bot commented Mar 9, 2026

The following comment was made by an LLM, it may be inaccurate:

Based on my search, I found several related PRs that address memory leaks:

Potentially Related PRs:

  1. PR #16346 - fix(opencode): unbounded memory growth during active usage

    • Directly addresses unbounded memory growth similar to this PR
  2. PR #16241 - fix(lsp): ignore stderr on LSP server spawn to prevent unbounded memory growth

    • Addresses memory growth in LSP specifically
  3. PR #7050 - fix: add LRU eviction to LSP client file and diagnostics tracking

    • Implements LRU eviction for LSP diagnostics (similar to the diagnostics map bounding in this PR)
  4. PR #16628 - feat: add startup cleanup and maintenance system

    • Related to cleanup/maintenance similar to the cleanup work in this PR
  5. PR #13514 - fix: resolve multiple memory leaks causing unbounded RAM growth

    • Very similar title suggesting it addresses related memory leak issues
  6. PR #16616 - fix: add idle-timeout Instance disposal for serve mode memory

    • Related to instance disposal and memory management

These PRs may be addressing overlapping memory leak issues. You should verify if PR #16695 duplicates work from any of these, particularly #16346, #16241, #7050, or #13514.

github-actions bot removed the needs:compliance label Mar 9, 2026

github-actions bot commented Mar 9, 2026

Thanks for updating your PR! It now meets our contributing guidelines. 👍

- Add onCleanup for all sdk.event.on() listeners in app.tsx, session route,
  and prompt component to prevent listener accumulation on re-render
- Add onCleanup for process SIGUSR2 handler in theme provider
- Add onCleanup for leader timeout in keybind provider
- Clear event queue on SDK context cleanup
- Remove empty subscription arrays in Bus and RPC listener maps
- Clear warning timeout in state disposal after completion
- Clear models refresh interval on process exit
- Add dispose() to ShareNext to clear pending sync timeouts
- Clear PTY session buffer on removal to free memory immediately

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Addresses the remaining memory leaks identified in anomalyco#16697 by
consolidating the best fixes from 23+ open community PRs into
a single coherent changeset.

Fixes consolidated from PRs: anomalyco#16695, anomalyco#16346, anomalyco#14650, anomalyco#15646,
anomalyco#13186, anomalyco#10392, anomalyco#7914, anomalyco#9145, anomalyco#9146, anomalyco#7049, anomalyco#16616, anomalyco#16241

- Plugin subscriber stacking: unsub before re-subscribing in init()
- Subagent deallocation: Session.remove() after task completion
- SSE stream cleanup: centralized cleanup with done guard (3 endpoints)
- Compaction data trimming: clear output/attachments on prune
- Process exit cleanup: Instance.disposeAll() with 5s timeout
- Serve cmd: graceful shutdown instead of blocking forever
- Bash tool: ring buffer with 10MB cap instead of O(n²) concat
- LSP index teardown: clear clients/broken/spawning on dispose
- LSP open-files cap: evict oldest when >1000 tracked files
- Format subscription: store and cleanup unsub handle
- Permission/Question clearSession: reject pending on session delete
- Session.remove() cleanup chain: FileTime, Permission, Question
- ShareNext subscription cleanup: store unsub handles, cleanup on dispose
- OAuth transport: close existing before replacing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
binarydoubling (Author) commented

Update: Consolidated fixes from 23+ community PRs

I've reviewed every open memory leak PR referenced in #16697 and consolidated the best version of each fix into this branch. The latest commit (b395d19) adds 17 more files on top of the original 15, bringing the total to 30 files changed.

New fixes added (from community PRs)

| Fix | Source PRs | Impact |
| --- | --- | --- |
| Plugin subscriber stacking: Bus.subscribeAll() unsub before re-init | #16346, #7914 | Prevents permanent wildcard listener accumulation |
| Subagent deallocation: Session.remove() after task completion | #14650 | GB-scale savings in long sessions with many subagents |
| SSE stream cleanup: centralized cleanup() + done guard on 3 endpoints | #15646 | Prevents leaked handlers/intervals on ungraceful disconnect |
| Compaction data trimming: clear output and attachments on prune | #14650, #7049 | Frees 200KB+ per compacted tool call |
| Process exit cleanup: Instance.disposeAll() with 5s timeout in finally block | #13186, #15646 | Prevents zombie LSP/MCP/PTY processes |
| Serve command: graceful shutdown instead of await new Promise(() => {}) | #13186 | Fixes unreachable cleanup code in serve mode |
| Bash tool ring buffer: chunks array with 10MB cap instead of output += chunk | #13186, #7914 | Eliminates O(n²) string growth for long commands |
| LSP index teardown: clear clients/broken/spawning on dispose | #16346 | Prevents stale client references after teardown |
| LSP open-files cap: evict oldest when >1000 tracked files | #14650 | Prevents unbounded files object growth |
| Format subscription cleanup: store and cleanup unsub handle on re-init | #10392, #7914 | Prevents listener stacking |
| Permission clearSession(): reject pending promises on session delete | #14650, #10392 | Prevents leaked promises holding closures |
| Question clearSession(): reject pending promises on session delete | #10392, #13186 | Same as above |
| Session.remove() cleanup chain: calls FileTime, Permission, Question cleanup | #10392 | Per-session state freed on deletion |
| ShareNext subscription cleanup: store unsub handles, cleanup on dispose/re-init | #10392, #7914 | Prevents 4 Bus listeners stacking per init cycle |
| OAuth transport close: close existing transport before replacing | #9145 | Prevents abandoned HTTP transports accumulating |
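To illustrate the "Bash tool ring buffer" row, here is a rough sketch of collecting output as a chunk array under a size cap instead of growing one string with `output += chunk`. `CappedOutput` is a hypothetical class for this example. It assumes the oldest output is what gets dropped (the source only says "ring buffer"), and it counts UTF-16 code units rather than bytes, so a 10MB byte cap would need a byte-length measure instead.

```typescript
// Keeps roughly the most recent `cap` characters of appended output.
class CappedOutput {
  private chunks: string[] = []
  private length = 0
  private readonly cap: number

  constructor(cap: number) {
    this.cap = cap
  }

  append(chunk: string): void {
    this.chunks.push(chunk)
    this.length += chunk.length

    // Evict whole chunks from the front while over the cap.
    while (this.length > this.cap && this.chunks.length > 1) {
      const dropped = this.chunks.shift() as string
      this.length -= dropped.length
    }

    // A single chunk larger than the cap is trimmed to its tail.
    if (this.length > this.cap) {
      const last = this.chunks[0] as string
      this.chunks[0] = last.slice(last.length - this.cap)
      this.length = this.cap
    }
  }

  // Join once, when the output is actually needed.
  toString(): string {
    return this.chunks.join("")
  }

  get size(): number {
    return this.length
  }
}
```

Because whole chunks are evicted, the retained text can be somewhat shorter than the cap. That trade keeps `append` cheap and avoids re-slicing on every write.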

What's NOT in this PR (and why)

Some fixes from the community PRs were intentionally excluded:

Verification

  • tsgo --noEmit passes cleanly across all 19 packages
  • Pre-commit typecheck hook passed on push

binarydoubling and others added 3 commits March 9, 2026 12:23
Port robust process exit detection from PR anomalyco#15757 to fix zombie/stuck
child processes in containers where Bun fails to deliver exit events.

- Add polling watchdog to bash tool and Process.spawn that detects
  process exit via kill(pid, 0) when event-loop events are missed
- Add process registry (active map) with stale/reap exports for
  server-level watchdog to detect and clean up stuck bash processes
- Improve Shell.killTree with alive() helper and proper SIGKILL
  escalation after SIGTERM timeout
- Add session-level watchdog interval in prompt loop to periodically
  reap stale bash processes

Based on the work in anomalyco#15757.

Co-Authored-By: Nacho F. Lizaur <NachoFLizaur@users.noreply.github.com>
Complete the port of PR anomalyco#15757 with remaining pieces:

- Add stdio end event redundancy as third fallback for exit detection
  (fires when pipe file descriptors close, independent of exit events)
- Add diagnostic log.info calls at spawn, abort, timeout, and each
  exit detection path for debugging container issues
- Add comprehensive tests: defensive patterns, polling watchdog
  isolation, Shell.killTree, server-level watchdog (stale/reap),
  stdio end events, and Process.spawn defensive patterns
- Skip truncation tests on Windows (matching upstream)

Co-Authored-By: Nacho F. Lizaur <NachoFLizaur@users.noreply.github.com>
On Windows, stdio pipe end events can fire before the exit event
populates proc.exitCode, causing it to be null in the result metadata.
Fall back to 0 (or 1 if signalCode is set) when exitCode is null,
matching the same pattern used in Process.spawn.

Co-Authored-By: Nacho F. Lizaur <NachoFLizaur@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Multiple memory leaks cause unbounded RAM growth during extended TUI usage
