perf(server): SIGUSR2 writes V8 heap snapshot by buremba · Pull Request #768 · lobu-ai/lobu

buremba · 2026-05-16T16:53:25Z

Why

Post-incident measurement (see lobu#767): with the queue healthy again, the app pod still grows from baseline toward the 1Gi limit. Without an inspector port we can't see what's allocating.

Sample taken from `summaries-app-lobu-app-77756ccdd7-dkh2l`:

| T+ | RSS | cgroup.memory.current |
|---|---|---|
| 90 min | 649 MB | 690 MB |
| 90.5 min | 652 MB | 693 MB |

That's ~3 MB / 30s of slow growth. Pre-fix the same pod hit 1Gi in 70 min from the same baseline (driven by the schema-mismatch error pile-up). Post-fix it's slower but not zero — there's a residual leak.
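As a rough sanity check on those numbers, the sketch below extrapolates linearly from the two cgroup samples to the limit. It assumes the "MB" readings and the 1Gi limit are in comparable units (1Gi taken as 1024), and that growth stays linear, which two samples 30 seconds apart cannot really establish. The `Sample` type and both function names are illustrative, not part of this PR.

```typescript
// Linear extrapolation from two memory samples to a limit.
// Assumption: readings and limit use the same unit (1Gi treated as 1024 MB).
interface Sample {
  minutes: number; // time since the pod started
  mb: number; // cgroup memory.current reading
}

function growthPerMinute(a: Sample, b: Sample): number {
  return (b.mb - a.mb) / (b.minutes - a.minutes);
}

function minutesUntilLimit(a: Sample, b: Sample, limitMb: number): number {
  const rate = growthPerMinute(a, b);
  if (rate <= 0) return Infinity; // not growing: the limit is never reached
  return (limitMb - b.mb) / rate;
}

// The two samples from the table above.
const first: Sample = { minutes: 90, mb: 690 };
const second: Sample = { minutes: 90.5, mb: 693 };

console.log(growthPerMinute(first, second)); // MB per minute
console.log(minutesUntilLimit(first, second, 1024)); // minutes of headroom left
```

With these inputs the rate works out to 6 MB/min and roughly 55 minutes of headroom from the second sample, under the linear-growth assumption.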

What

`process.on('SIGUSR2', () => v8.writeHeapSnapshot('/tmp/...'))` so we can dump on demand:

```
POD=$(kubectl get pod -n summaries-prod -l app.kubernetes.io/component=api -o name | head -1)
kubectl exec -n summaries-prod $POD -- kill -USR2 1

# wait for "snapshot written" log line, then:

kubectl cp summaries-prod/$(basename $POD):/tmp/<file>.heapsnapshot ./lobu.heapsnapshot

# open in Chrome DevTools → Memory → Load

```

Notes

- `writeHeapSnapshot` is synchronous and blocks the event loop for several seconds, in proportion to heap size. Only trigger it manually when investigating.
- SIGUSR2 is free on Node (SIGUSR1 is the one Node reserves for the inspector).
- No security risk: triggering requires `kubectl exec` access.
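A minimal sketch of what such a handler could look like, not the code from this PR. The path pattern `/tmp/lobu-<pid>-<ts>.heapsnapshot` comes from the commit message; using `Date.now()` for `<ts>` and the helper name `snapshotPath` are assumptions made for illustration.

```typescript
import { writeHeapSnapshot } from "node:v8";

// Hypothetical helper: fills in the <pid> and <ts> parts of the path pattern.
function snapshotPath(pid: number, ts: number): string {
  return `/tmp/lobu-${pid}-${ts}.heapsnapshot`;
}

process.on("SIGUSR2", () => {
  const target = snapshotPath(process.pid, Date.now());
  try {
    // Synchronous: the event loop is blocked until the file is fully written.
    const written = writeHeapSnapshot(target);
    console.log(`snapshot written: ${written}`);
  } catch (err) {
    console.error("snapshot failed", err);
  }
});
```

`writeHeapSnapshot` returns the filename it wrote, which is what the log line reports, so the operator knows which file to copy out.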

Test plan

`make typecheck` clean
`make build-packages` builds
After merge: deploy, send SIGUSR2 once, confirm a .heapsnapshot lands in /tmp and is openable in DevTools.

Summary by CodeRabbit

Chores
- Server now supports on-demand heap snapshot diagnostics when enabled.
- Snapshots are written to a temporary location and logged on success or failure.
- Additional snapshot requests are ignored while a snapshot is in progress to avoid conflicts.

Adds a `process.on('SIGUSR2', ...)` handler that calls `v8.writeHeapSnapshot('/tmp/lobu-<pid>-<ts>.heapsnapshot')`. Lets us profile the leak that survived lobu#767: post-fix the app pod still grows ~3 MB/30s toward the 1Gi limit even with the queue healthy.

Usage:

    POD=$(kubectl get pod -n summaries-prod \
      -l app.kubernetes.io/component=api -o name | head -1)
    kubectl exec -n summaries-prod $POD -- kill -USR2 1
    # wait for "snapshot written" log line, then copy out:
    kubectl cp summaries-prod/$(basename $POD):/tmp/<file>.heapsnapshot \
      ./lobu.heapsnapshot
    # open in Chrome DevTools → Memory → Load

Notes:

* writeHeapSnapshot is synchronous and blocks the event loop for several seconds proportional to heap size. Only trigger manually when investigating, never wire to an automated source.
* SIGUSR2 is free on Node (SIGUSR1 is the one reserved for the inspector).
* Snapshot goes to /tmp which is the container's writable tmpfs.

coderabbitai · 2026-05-16T16:53:39Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 02f009a9-bfab-4cb4-be54-59a51350676c

📥 Commits

Reviewing files that changed from the base of the PR and between 5ebb98b and 4614db3.

📒 Files selected for processing (1)

packages/server/src/server.ts

Walkthrough

Adds an environment-gated SIGUSR2 handler (when ALLOW_HEAP_SNAPSHOT=1) that imports Node's v8 and writes a heap snapshot to /tmp/lobu.heapsnapshot, with logging and a guard to prevent concurrent snapshots.

Changes

Heap Snapshot Signal Handler

| Layer / File(s) | Summary |
|---|---|
| Import `v8` and SIGUSR2 handler (`packages/server/src/server.ts`) | Adds ESM import of `node:v8` and registers a `SIGUSR2` handler (gated by `ALLOW_HEAP_SNAPSHOT=1`) that writes a heap snapshot to the fixed path `/tmp/lobu.heapsnapshot`, prevents concurrent writes with an `inProgress` flag, and logs received/ignored/start/success/error events. |
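The behaviour summarised above can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: `makeSnapshotHandler`, `Writer`, `Log`, and the exact log wording (other than "snapshot written") are hypothetical, and the writer and logger are injected only so the guard can be exercised without producing a real snapshot.

```typescript
import { writeHeapSnapshot } from "node:v8";

// Fixed path from the walkthrough: overwritten on every capture.
const SNAPSHOT_PATH = "/tmp/lobu.heapsnapshot";

type Writer = (path: string) => string;
type Log = (msg: string) => void;

// Returns a handler that drops calls arriving while a write is in flight.
function makeSnapshotHandler(write: Writer, log: Log): () => void {
  let inProgress = false;
  return () => {
    if (inProgress) {
      log("heap snapshot already in progress, ignoring signal");
      return;
    }
    inProgress = true;
    log("heap snapshot started");
    try {
      log(`snapshot written: ${write(SNAPSHOT_PATH)}`);
    } catch (err) {
      log(`heap snapshot failed: ${String(err)}`);
    } finally {
      inProgress = false; // always release the flag, even on failure
    }
  };
}

// Opt-in only: without ALLOW_HEAP_SNAPSHOT=1 no handler is registered.
if (process.env.ALLOW_HEAP_SNAPSHOT === "1") {
  process.on("SIGUSR2", makeSnapshotHandler(writeHeapSnapshot, console.log));
}
```

Releasing the flag in `finally` is one way to keep a failed write from leaving the handler permanently disabled; whether the merged handler does the same is not shown on this page.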

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A whistle in the server night,
SIGUSR2 brings memory light,
I press my paw, the snapshot's cast,
/tmp/lobu.heapsnapshot holds the past 🥕📸

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a SIGUSR2 signal handler to write V8 heap snapshots, which is the primary focus of the PR. |
| Description check | ✅ Passed | The description includes all required sections: a comprehensive 'Why' explaining the memory growth issue, a 'What' detailing the implementation with usage instructions, and a partially completed test plan. Two test items are checked (typecheck and build-packages), though post-merge deployment testing is noted as pending. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


codecov-commenter · 2026-05-16T16:54:55Z

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.


Three findings from pi on PR #768; all addressed:

1. **Secrets in snapshots**: gate the SIGUSR2 handler behind ALLOW_HEAP_SNAPSHOT=1. Default off in prod. Operator must explicitly opt the pod in, capture, then unset and roll. Workers run under the same UID (Dockerfile sets no separate USER), so on-disk snapshots aren't isolated from a same-UID exec path.
2. **No rate limit / cleanup**: single-flight via an in-progress flag; subsequent SIGUSR2s during a write are dropped with a log line. Use a single rolling path /tmp/lobu.heapsnapshot so a stuck-on flag can't fill the writable layer.
3. **Probe interaction**: documented in the handler comment: trigger needs cgroup-limit headroom (writeHeapSnapshot allocates ~heap size while running) and blocks /health/ready (DB SELECT 1). Caller-side; nothing programmatic to fix without an already-multi-replica deploy.

buremba · 2026-05-16T16:57:56Z

pi review — addressed

Three findings, all in 4614db3:

1. Secrets in snapshots: handler now gated on `ALLOW_HEAP_SNAPSHOT=1`. Default off in prod. Operator opts the pod in, captures, copies out, then unsets and rolls. Workers run as the same UID (Dockerfile sets no separate `USER`), so on-disk snapshots aren't isolated from same-UID processes; gating is the only mitigation that holds.
2. No rate limit / cleanup: single-flight (in-progress flag drops subsequent signals with a log line) + fixed rolling path `/tmp/lobu.heapsnapshot` (overwrite each time, no growth). Stuck-on flag can't fill the writable layer.
3. Probe interaction: documented in the handler comment. `writeHeapSnapshot` needs ~heap-size extra memory while running and blocks `/health/ready` (the new DB `SELECT 1`). Caller-side concern; only programmatic fix would be temporarily marking unready, which requires multi-replica to avoid 503-on-the-service. Today's prod is 1-replica, so the operator playbook is: bump memory headroom / scale to 2 replicas first, then dump.
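The headroom concern can be made concrete with a small preflight check. This is a hypothetical sketch, not something this PR adds: `hasSnapshotHeadroom` is an invented name, and it simply takes the "needs about one extra heap-size of memory" rule of thumb from the comment above at face value. The current and limit figures could be read from the cgroup v2 files `/sys/fs/cgroup/memory.current` and `/sys/fs/cgroup/memory.max`; here they are passed in as plain numbers.

```typescript
import { getHeapStatistics } from "node:v8";

const MiB = 1024 * 1024;

// Hypothetical preflight: is there room for roughly one more heap-size of
// memory below the cgroup limit? Uses the "~heap size extra" rule of thumb.
function hasSnapshotHeadroom(
  heapUsedBytes: number,
  currentBytes: number,
  limitBytes: number,
): boolean {
  return currentBytes + heapUsedBytes <= limitBytes;
}

// Live heap size of this process; 693 and 1024 mirror the PR's sample and
// limit, treated as MiB (an assumption about the units).
const heapUsed = getHeapStatistics().used_heap_size;
console.log(hasSnapshotHeadroom(heapUsed, 693 * MiB, 1024 * MiB));
```

If the check fails, the playbook above applies: raise the memory limit or scale to 2 replicas before sending the signal.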

codex-approver

Auto-approved: Codex left a 👍 reaction (no suggestions).

codex-approver Bot approved these changes May 16, 2026


buremba merged commit e5c93a3 into main May 16, 2026
23 of 24 checks passed

buremba deleted the perf/heap-snapshot-on-sigusr2 branch May 16, 2026 17:36

buremba mentioned this pull request May 16, 2026

chore(main): release lobu 7.1.0 #724

Merged

buremba merged 2 commits into main from perf/heap-snapshot-on-sigusr2
