feat: supervisor session memory monitoring with respawn detection (t264) #1040

marcusquinn · 2026-02-11T01:19:24Z

Summary

Add per-process memory footprint measurement using macOS footprint(1) / Linux /proc/PID/status
Add Phase 11 to supervisor pulse cycle: detects when the parent OpenCode session exceeds memory threshold and writes a respawn marker
Add supervisor-helper.sh mem-check CLI command for manual inspection
Document in AGENTS.md
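
For the Linux side, the summary names /proc/PID/status but the thread never shows which field is read. The following is a minimal sketch, assuming VmRSS (resident set size, reported in kB) is an acceptable stand-in for the footprint. The name proc_rss_mb is hypothetical and is not the PR's get_process_footprint_mb.

```shell
#!/bin/sh
# Hypothetical helper: resident memory of a PID in whole MB (Linux only).
# Assumption: VmRSS is the field of interest; the PR does not say which one it uses.
proc_rss_mb() {
    local pid="$1"
    local status_file="/proc/${pid}/status"
    local rss_kb

    # No readable status file: process is gone or this is not Linux.
    [ -r "$status_file" ] || { echo "0"; return 1; }

    # The line looks like "VmRSS:     123456 kB"; field 2 is the number.
    rss_kb=$(awk '/^VmRSS:/ {print $2; exit}' "$status_file")

    # Kernel threads have no VmRSS line; treat anything non-numeric as 0.
    case "$rss_kb" in
        ''|*[!0-9]*) echo "0"; return 1 ;;
    esac

    echo "$((rss_kb / 1024))"
}
```

Calling proc_rss_mb "$$" prints the current shell's resident size in MB. Integer division floors the result, so a process under 1 MB reports 0.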

Problem

OpenCode/Bun processes accumulate WebKit malloc dirty pages that are never returned to the OS. A single session observed at 25GB+ phys_footprint (30GB total). Workers are short-lived and already cleaned up by Phase 4, but the long-running supervisor session needs monitoring.

Design Decisions

Workers are NOT killed for memory — they may legitimately need more for large tasks, and Phase 4 already handles dead/hung workers
Concurrency is NOT throttled by worker memory — existing system-wide memory_pressure detection already handles this
Focus is solely on the supervisor session — the one long-lived process that accumulates WebKit malloc over hours
Respawn marker file (respawn-recommended) enables external tooling to detect and act on the recommendation
SUPERVISOR_SELF_MEM_LIMIT env var (default 8192MB) controls the threshold
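
How external tooling consumes the marker is left open by the PR. As a sketch only, assuming the marker keeps the key=value layout written in the reviewed hunk (pid, footprint_mb, threshold_mb, timestamp, reason), a consumer could read single fields without sourcing the file. The helper names marker_value and respawn_pending are hypothetical.

```shell
#!/bin/sh
# Hypothetical consumer of the respawn-recommended marker.
# Assumption: one key=value pair per line, as in the reviewed hunk.

# Usage: marker_value <marker-file> <key>
# Prints the value and returns 0 if the key exists, returns 1 otherwise.
marker_value() {
    local file="$1"
    local key="$2"
    [ -r "$file" ] || return 1
    # Parse with awk instead of sourcing, so marker content is never executed.
    awk -F= -v k="$key" '
        $1 == k { print substr($0, length(k) + 2); found = 1; exit }
        END     { exit !found }
    ' "$file"
}

# Usage: respawn_pending <marker-file>
# Returns 0 when the marker records a footprint above its recorded threshold.
respawn_pending() {
    local file="$1"
    local footprint threshold
    footprint=$(marker_value "$file" footprint_mb) || return 1
    threshold=$(marker_value "$file" threshold_mb) || return 1
    [ "$footprint" -gt "$threshold" ]
}
```

A wrapper script might call respawn_pending "$SUPERVISOR_DIR/respawn-recommended" between sessions and restart OpenCode when it returns 0. Whether the marker should then be deleted is a policy choice the thread does not settle.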

Testing

get_process_footprint_mb verified against known 29GB OpenCode process (returned 30406MB, matching Activity Monitor)
mem-check command tested — correctly detects parent OpenCode session, shows footprint and uptime
ShellCheck clean (zero new warnings)

Closes t264

Summary by CodeRabbit

Release Notes

New Features
- Added automatic memory monitoring for long-running sessions to detect when memory usage exceeds thresholds
- New memory inspection capability to check session memory footprint and receive respawn recommendations
- Configurable memory limit threshold for supervisor sessions
Documentation
- Added documentation explaining session memory behavior, monitoring mechanisms, and how to use memory inspection utilities

… (t264)

OpenCode/Bun processes accumulate WebKit malloc dirty pages that are never returned to the OS. A single session can grow to 25GB+ phys_footprint. Workers are short-lived and already cleaned up by Phase 4, but the long-running supervisor session needs monitoring.

Changes:
- get_process_footprint_mb(): cross-platform function using macOS footprint(1) or Linux /proc/PID/status to get true physical memory footprint
- check_supervisor_memory(): walks process tree to find parent OpenCode session, writes respawn-recommended marker when threshold exceeded
- Phase 11 in cmd_pulse(): checks supervisor memory each pulse cycle
- cmd_mem_check: CLI command to inspect supervisor session memory
- Respawn marker file for external tooling to detect and act on

Design: concurrency is already managed by existing system memory pressure detection. Workers are already short-lived. This focuses solely on the supervisor session that accumulates memory over hours of operation.

Discovered via OpenCode issue anomalyco/opencode#13041.
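
The commit message says check_supervisor_memory() walks the process tree, but only the tail of that loop appears in this thread. A rough sketch of such a walk follows, assuming a bounded depth and ps -o ppid= as the lookup. The names parent_of and list_ancestors are hypothetical, and the /proc fallback is an addition for systems without a usable ps.

```shell
#!/bin/sh
# Hypothetical sketch of a bounded walk up the process tree.

# Print the parent PID of $1, or nothing if it cannot be determined.
parent_of() {
    local pid="$1"
    local ppid
    # "ppid=" suppresses the header; tr strips the column padding.
    ppid=$(ps -o ppid= -p "$pid" 2>/dev/null | tr -d ' ') || ppid=""
    case "$ppid" in
        ''|*[!0-9]*)
            # Assumption: fall back to /proc when ps is missing or unhelpful (Linux only).
            ppid=""
            if [ -r "/proc/${pid}/status" ]; then
                ppid=$(awk '/^PPid:/ {print $2; exit}' "/proc/${pid}/status")
            fi
            ;;
    esac
    echo "$ppid"
}

# Usage: list_ancestors <pid> [max_depth]
# Prints ancestor PIDs, nearest first, one per line.
list_ancestors() {
    local pid="$1"
    local max_depth="${2:-10}"
    local depth=0
    local parent
    while [ "$depth" -lt "$max_depth" ]; do
        parent=$(parent_of "$pid")
        # Stop when the lookup fails or we reach init/launchd (PID 1) or the kernel (0).
        case "$parent" in
            ''|0|1) break ;;
        esac
        echo "$parent"
        pid="$parent"
        depth=$((depth + 1))
    done
}
```

The depth cap keeps the loop finite even if a PID is reused mid-walk. The value 10 is an arbitrary default for this sketch, not a number taken from the PR.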

gemini-code-assist · 2026-02-11T01:19:28Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

coderabbitai · 2026-02-11T01:19:42Z

Walkthrough

This PR adds session memory monitoring capabilities to the supervisor helper script, introducing a new mem-check command that computes the supervisor process footprint, compares it against a configurable threshold (SUPERVISOR_SELF_MEM_LIMIT, default 8192 MB), and logs respawn recommendations when thresholds are exceeded.
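
The threshold handling described above can be sketched as follows. The default of 8192 MB comes from the PR. Falling back to the default when the variable holds a non-numeric or zero value is an assumption of this sketch, and resolve_mem_limit_mb and exceeds_limit are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers around SUPERVISOR_SELF_MEM_LIMIT (value in MB).

# Print the effective threshold in MB.
# Assumption: invalid or zero values fall back to the documented default of 8192.
resolve_mem_limit_mb() {
    local limit="${SUPERVISOR_SELF_MEM_LIMIT:-8192}"
    case "$limit" in
        ''|*[!0-9]*) limit=8192 ;;
    esac
    [ "$limit" -gt 0 ] || limit=8192
    echo "$limit"
}

# Usage: exceeds_limit <footprint_mb>
# Returns 0 only when the footprint is strictly greater than the threshold,
# mirroring the -gt comparison in the reviewed hunk.
exceeds_limit() {
    [ "$1" -gt "$(resolve_mem_limit_mb)" ]
}
```

With the default in place, the 30406 MB session mentioned in the Testing section would exceed the limit, while a session at exactly 8192 MB would not.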

Changes

Cohort / File(s)	Summary
Documentation - Memory Monitoring `.agents/AGENTS.md`	Adds documentation describing session memory monitoring (t264) for long-running OpenCode/Bun sessions, Phase 11 pulse cycle memory checks, respawn marker behavior, and inspection via supervisor-helper.sh mem-check.
CLI Extension - Memory Check Command `.agents/scripts/supervisor-helper.sh`	Adds new `cmd_mem_check()` function to monitor supervisor memory footprint against threshold, introduces `SUPERVISOR_SELF_MEM_LIMIT` environment variable (default 8192 MB), and extends CLI dispatcher to recognize and route `mem-check` subcommand.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

fix: supervisor integration testing fixes (t128.7) #384: Modifies supervisor-helper.sh's CLI dispatcher and command routing surface, extending public command patterns.
perf: supervisor auth caching and auto model tier routing #794: Adds new commands and functions to supervisor-helper.sh while extending the CLI dispatcher infrastructure.

Poem

📊 When sessions grow fat with malloc's delight,
A helper arrives in the dead of the night,
Checks footprints with care, knows the mem-check command,
Respawns when needed—automation so grand! 🚀

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and specifically describes the main change: adding supervisor session memory monitoring with respawn detection, matching the core functionality added in both documentation and the helper script.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


github-actions · 2026-02-11T01:20:01Z

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 46 code smells

[INFO] Recent monitoring activity:
Wed Feb 11 01:19:56 UTC 2026: Code review monitoring started
Wed Feb 11 01:19:57 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 46

📈 Current Quality Metrics

BUGS: 0
CODE SMELLS: 46
VULNERABILITIES: 0

Generated on: Wed Feb 11 01:20:00 UTC 2026

Generated by AI DevOps Framework Code Review Monitoring

sonarqubecloud · 2026-02-11T01:20:31Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In @.agents/scripts/supervisor-helper.sh:
- Around line 820-846: The parent-process detection using ps -o comm= (assigned
to parent_cmd) misses wrapped launches like "bun opencode" or "node opencode";
update the assignment of parent_cmd to prefer the full command line (ps -o
args=) with a fallback to ps -o comm= so pattern checks like [[ "$parent_cmd" ==
*"opencode"* ]] succeed; apply the same change in the other occurrence around
the block that uses check_pid/depth (~4876) so both parent_cmd usages detect
wrapped Bun/Node invocations and still fall back safely when args isn’t
available.
- Around line 726-752: The fallback parsing of the Footprint line (variables
fp_line, fp_val, fp_unit) only extracts integer digits with grep -oE '[0-9]+'
which truncates decimal values (e.g., "1.5 GB"); update the fallback to extract
the numeric value including decimals and perform unit conversion using a
float-capable tool like awk (or bc) instead of bash integer arithmetic so values
like 1.5 GB are converted to ~1536 MB; specifically, change the extraction of
fp_val to capture decimals (e.g., via awk or grep -oE '[0-9]+(\.[0-9]+)?'),
compute MB by multiplying/dividing as needed in awk (GB*1024, MB*1, KB/1024) and
echo the integer MB result (rounded or floored consistently) while keeping the
existing fp_line, fp_unit and return behavior.

🧹 Nitpick comments (1)

.agents/AGENTS.md (1)
262-263: Consider adding cross-reference to detailed documentation.

While the documentation references supervisor-helper.sh mem-check, it would be helpful to confirm that the "Full docs" reference at line 264 includes detailed coverage of the mem-check command, SUPERVISOR_SELF_MEM_LIMIT configuration, and the Phase 11 behavior. Additionally, consider briefly explaining what "Phase 11" means for readers unfamiliar with the pulse cycle phases.
💡 Optional enhancement for clarity
-**Session memory monitoring** (t264): Long-running OpenCode/Bun sessions accumulate WebKit malloc dirty pages that are never returned to the OS (25GB+ observed). Phase 11 of the pulse cycle checks the parent OpenCode session's `phys_footprint` and writes a respawn marker when it exceeds `SUPERVISOR_SELF_MEM_LIMIT` (default: 8192MB). Cron-based pulses are already fresh processes. Use `supervisor-helper.sh mem-check` to inspect session memory.
+**Session memory monitoring** (t264): Long-running OpenCode/Bun sessions accumulate WebKit malloc dirty pages that are never returned to the OS (25GB+ observed). Phase 11 of the pulse cycle checks the parent OpenCode session's `phys_footprint` and writes a respawn marker when it exceeds `SUPERVISOR_SELF_MEM_LIMIT` (default: 8192MB). Cron-based pulses are already fresh processes. Use `supervisor-helper.sh mem-check` to inspect session memory. See supervisor docs for Phase 11 details.
Alternatively, verify that the "Full docs" reference at line 264 already covers these details comprehensively.

coderabbitai · 2026-02-11T01:25:43Z

.agents/scripts/supervisor-helper.sh

+    if [[ "$(uname)" == "Darwin" ]]; then
+        # macOS: footprint --pid gives phys_footprint (dirty + swapped + compressed)
+        # This matches what Activity Monitor displays
+        local fp_output
+        fp_output=$(footprint --pid "$pid" -f bytes --noCategories 2>/dev/null || echo "")
+        if [[ -n "$fp_output" ]]; then
+            local fp_bytes
+            fp_bytes=$(echo "$fp_output" | grep -oE 'phys_footprint: [0-9]+' | grep -oE '[0-9]+' || echo "")
+            if [[ -n "$fp_bytes" && "$fp_bytes" -gt 0 ]] 2>/dev/null; then
+                echo "$((fp_bytes / 1048576))"
+                return 0
+            fi
+            # Fallback: parse the Footprint line (e.g., "Footprint: 30 GB" or "Footprint: 500 MB")
+            local fp_line
+            fp_line=$(echo "$fp_output" | grep -E 'Footprint:' | head -1)
+            if [[ -n "$fp_line" ]]; then
+                local fp_val fp_unit
+                fp_val=$(echo "$fp_line" | grep -oE '[0-9]+' | head -1)
+                fp_unit=$(echo "$fp_line" | grep -oE '(GB|MB|KB)' | head -1)
+                case "$fp_unit" in
+                    GB) echo "$((fp_val * 1024))" ;;
+                    MB) echo "$fp_val" ;;
+                    KB) echo "$((fp_val / 1024))" ;;
+                    *) echo "0" ;;
+                esac
+                return 0
+            fi


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Read the supervisor-helper.sh file around lines 726-752 to verify the code
head -755 .agents/scripts/supervisor-helper.sh | tail -35 | cat -n

Repository: marcusquinn/aidevops

Length of output: 1849

🏁 Script executed:

# Also get a wider context to see line numbers
sed -n '720,755p' .agents/scripts/supervisor-helper.sh | cat -n

Repository: marcusquinn/aidevops

Length of output: 1884

🏁 Script executed:

# Check the ShellCheck compliance for the file to see if there are any existing issues
if command -v shellcheck >/dev/null 2>&1; then
    shellcheck .agents/scripts/supervisor-helper.sh 2>&1 | head -50
else
    echo "ShellCheck not available in sandbox"
fi

Repository: marcusquinn/aidevops

Length of output: 2223

Decimal values in fallback footprint parsing are truncated to integers.

Line 24 of the cat -n excerpt (the fp_val extraction) uses grep -oE '[0-9]+', which captures only the integer part. For output like Footprint: 1.5 GB, this extracts 1 and reports 1024 MB instead of 1536 MB, so a session whose true footprint is above the threshold can be reported as below it.

The primary path (lines 13-16) parses phys_footprint in bytes and is unaffected. The fallback triggers only when that structured output is unavailable. Bash arithmetic $(()) cannot handle decimals, so parse decimals with awk for both extraction and float calculations:

Suggested fix (decimal-safe parsing)

-    fp_val=$(echo "$fp_line" | grep -oE '[0-9]+' | head -1)
+    fp_val=$(echo "$fp_line" | awk '{for (i=1;i<=NF;i++) if ($i ~ /^[0-9]+(\.[0-9]+)?$/) {print $i; exit}}')
     fp_unit=$(echo "$fp_line" | grep -oE '(GB|MB|KB)' | head -1)
     case "$fp_unit" in
-        GB) echo "$((fp_val * 1024))" ;;
-        MB) echo "$fp_val" ;;
-        KB) echo "$((fp_val / 1024))" ;;
+        GB) awk -v v="$fp_val" 'BEGIN {printf "%d", v * 1024}' ;;
+        MB) awk -v v="$fp_val" 'BEGIN {printf "%d", v}' ;;
+        KB) awk -v v="$fp_val" 'BEGIN {printf "%d", v / 1024}' ;;
         *) echo "0" ;;
     esac
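
To make the decimal handling easy to check in isolation, the same idea can be folded into one function. This is a sketch, not the merged code: footprint_line_to_mb is a hypothetical name, and the "Footprint: <number> <unit>" layout is assumed from the comment in the reviewed hunk rather than from footprint(1) documentation.

```shell
#!/bin/sh
# Hypothetical helper: convert a "Footprint: <number> <unit>" line to whole MB.
# Assumption: the number and the unit (KB, MB or GB) are separate, adjacent fields.
footprint_line_to_mb() {
    printf '%s\n' "$1" | awk '
        {
            for (i = 1; i < NF; i++) {
                # A number, optionally with a fraction, followed by a unit token.
                if ($i ~ /^[0-9]+(\.[0-9]+)?$/ && $(i + 1) ~ /^(KB|MB|GB)$/) {
                    mb = $i
                    if ($(i + 1) == "GB") mb = $i * 1024
                    if ($(i + 1) == "KB") mb = $i / 1024
                    # %d truncates toward zero, so results are floored.
                    printf "%d\n", mb
                    found = 1
                    exit
                }
            }
        }
        # Nothing parseable: report 0, as the original fallback does for unknown units.
        END { if (!found) print 0 }
    '
}
```

For example, footprint_line_to_mb "Footprint: 1.5 GB" prints 1536, where the integer-only parsing would have produced 1024.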


coderabbitai · 2026-02-11T01:25:43Z

.agents/scripts/supervisor-helper.sh

+        local parent_cmd
+        parent_cmd=$(ps -o comm= -p "$parent_pid" 2>/dev/null || echo "")
+        if [[ "$parent_cmd" == *"opencode"* ]]; then
+            local parent_footprint
+            parent_footprint=$(get_process_footprint_mb "$parent_pid")
+            if [[ "$parent_footprint" -gt "$threshold_mb" ]] 2>/dev/null; then
+                log_warn "Parent OpenCode session (PID $parent_pid) footprint ${parent_footprint}MB exceeds ${threshold_mb}MB"
+                log_warn "WebKit/Bun malloc accumulates dirty pages that are never freed"
+                log_warn "Recommendation: save session state and restart OpenCode to reclaim ${parent_footprint}MB"
+
+                # Write a respawn marker file for external tooling to detect
+                local respawn_marker="${SUPERVISOR_DIR}/respawn-recommended"
+                {
+                    echo "pid=$parent_pid"
+                    echo "footprint_mb=$parent_footprint"
+                    echo "threshold_mb=$threshold_mb"
+                    echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
+                    echo "reason=webkit_malloc_accumulation"
+                } > "$respawn_marker"
+
+                return 1
+            fi
+        fi
+
+        check_pid="$parent_pid"
+        depth=$((depth + 1))
+    done


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

wc -l .agents/scripts/supervisor-helper.sh

Repository: marcusquinn/aidevops

Length of output: 106

🏁 Script executed:

# First, let's check the file and see the specific sections
sed -n '820,846p' .agents/scripts/supervisor-helper.sh

Repository: marcusquinn/aidevops

Length of output: 1351

🏁 Script executed:

# Check the second location mentioned
sed -n '4873,4899p' .agents/scripts/supervisor-helper.sh

Repository: marcusquinn/aidevops

Length of output: 1298

🏁 Script executed:

# Let's also run ShellCheck to see if there are any violations
shellcheck -f json .agents/scripts/supervisor-helper.sh | head -100

Repository: marcusquinn/aidevops

Length of output: 11250

Parent OpenCode detection may miss Bun/Node process wrappers.

Lines 821 and 4876 use ps -o comm=, which returns only the executable name. When OpenCode is launched via bun opencode or node opencode, the comm field shows bun or node, causing the pattern check to fail silently.

Use ps -o args= (with fallback to ps -o comm=) to capture the full command line and detect wrapped launches:

Suggested implementation

-    parent_cmd=$(ps -o comm= -p "$parent_pid" 2>/dev/null || echo "")
+    parent_cmd=$(ps -o args= -p "$parent_pid" 2>/dev/null || echo "")
+    [[ -z "$parent_cmd" ]] && parent_cmd=$(ps -o comm= -p "$parent_pid" 2>/dev/null || echo "")
     if [[ "$parent_cmd" == *"opencode"* ]]; then

Apply to both locations (lines ~821 and ~4876).
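
The difference between comm and args can be illustrated by separating the lookup from the match, which also makes the match testable without live processes. This is a sketch under that assumption. process_cmdline and is_opencode_cmdline are hypothetical names, not functions from supervisor-helper.sh.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the args-then-comm lookup.

# Print the command line of a PID, preferring full args over the bare executable name.
process_cmdline() {
    local pid="$1"
    local cmd
    cmd=$(ps -o args= -p "$pid" 2>/dev/null) || cmd=""
    if [ -z "$cmd" ]; then
        # Fallback: comm gives only the executable name (e.g. "bun").
        cmd=$(ps -o comm= -p "$pid" 2>/dev/null) || cmd=""
    fi
    echo "$cmd"
}

# Pure string check: succeeds when "opencode" appears anywhere in the command line.
is_opencode_cmdline() {
    case "$1" in
        *opencode*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A wrapped launch such as "bun /usr/local/bin/opencode" matches here, whereas the comm value "bun" alone does not. One trade-off worth weighing is that substring matching on the full command line may also match unrelated processes that merely carry "opencode" in an argument, such as an editor opening a path with that name.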

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

        local parent_cmd
        parent_cmd=$(ps -o args= -p "$parent_pid" 2>/dev/null || echo "")
        [[ -z "$parent_cmd" ]] && parent_cmd=$(ps -o comm= -p "$parent_pid" 2>/dev/null || echo "")
        if [[ "$parent_cmd" == *"opencode"* ]]; then
            local parent_footprint
            parent_footprint=$(get_process_footprint_mb "$parent_pid")
            if [[ "$parent_footprint" -gt "$threshold_mb" ]] 2>/dev/null; then
                log_warn "Parent OpenCode session (PID $parent_pid) footprint ${parent_footprint}MB exceeds ${threshold_mb}MB"
                log_warn "WebKit/Bun malloc accumulates dirty pages that are never freed"
                log_warn "Recommendation: save session state and restart OpenCode to reclaim ${parent_footprint}MB"

                # Write a respawn marker file for external tooling to detect
                local respawn_marker="${SUPERVISOR_DIR}/respawn-recommended"
                {
                    echo "pid=$parent_pid"
                    echo "footprint_mb=$parent_footprint"
                    echo "threshold_mb=$threshold_mb"
                    echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
                    echo "reason=webkit_malloc_accumulation"
                } > "$respawn_marker"

                return 1
            fi
        fi

        check_pid="$parent_pid"
        depth=$((depth + 1))
    done


marcusquinn added 2 commits February 11, 2026 01:19

docs: add session memory monitoring to AGENTS.md supervisor section

dc4fe2e

marcusquinn merged commit 202d6bf into main Feb 11, 2026
8 of 10 checks passed

marcusquinn added a commit that referenced this pull request Feb 11, 2026

chore: mark t264 complete (pr:#1040 merged)

a7e1616

coderabbitai bot requested changes Feb 11, 2026
