@marcusquinn
Owner

@marcusquinn marcusquinn commented Feb 11, 2026

Summary

  • Add per-process memory footprint measurement using macOS footprint(1) / Linux /proc/PID/status
  • Add Phase 11 to supervisor pulse cycle: detects when the parent OpenCode session exceeds memory threshold and writes a respawn marker
  • Add supervisor-helper.sh mem-check CLI command for manual inspection
  • Document in AGENTS.md

Problem

OpenCode/Bun processes accumulate WebKit malloc dirty pages that are never returned to the OS. A single session was observed at 25GB+ phys_footprint (30GB total). Workers are short-lived and already cleaned up by Phase 4, but the long-running supervisor session needs monitoring.

Design Decisions

  • Workers are NOT killed for memory — they may legitimately need more for large tasks, and Phase 4 already handles dead/hung workers
  • Concurrency is NOT throttled by worker memory — existing system-wide memory_pressure detection already handles this
  • Focus is solely on the supervisor session — the one long-lived process that accumulates WebKit malloc over hours
  • Respawn marker file (respawn-recommended) enables external tooling to detect and act on the recommendation
  • SUPERVISOR_SELF_MEM_LIMIT env var (default 8192MB) controls the threshold
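
To make the threshold rule concrete, here is a minimal sketch of the comparison. The function name exceeds_mem_limit is hypothetical and not part of supervisor-helper.sh; only the SUPERVISOR_SELF_MEM_LIMIT variable and its 8192MB default come from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): returns 0 when the given footprint
# in MB is above the configured limit, 1 when it is not, 2 on bad input.
exceeds_mem_limit() {
    local footprint_mb="$1"
    # Default taken from the PR description; override via the environment.
    local limit_mb="${SUPERVISOR_SELF_MEM_LIMIT:-8192}"

    # Reject non-numeric input so a failed measurement never looks like a breach.
    [[ "$footprint_mb" =~ ^[0-9]+$ && "$limit_mb" =~ ^[0-9]+$ ]] || return 2

    # Force base 10 so values with leading zeros are not read as octal.
    (( 10#$footprint_mb > 10#$limit_mb ))
}
```

The comparison is strict, so a footprint exactly at the limit does not count as exceeding it. That mirrors the -gt test in the quoted check_supervisor_memory code.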

Testing

  • get_process_footprint_mb verified against known 29GB OpenCode process (returned 30406MB, matching Activity Monitor)
  • mem-check command tested — correctly detects parent OpenCode session, shows footprint and uptime
  • ShellCheck clean (zero new warnings)

Closes t264

Summary by CodeRabbit

Release Notes

  • New Features

    • Added automatic memory monitoring for long-running sessions to detect when memory usage exceeds thresholds
    • New memory inspection capability to check session memory footprint and receive respawn recommendations
    • Configurable memory limit threshold for supervisor sessions
  • Documentation

    • Added documentation explaining session memory behavior, monitoring mechanisms, and how to use memory inspection utilities

… (t264)

OpenCode/Bun processes accumulate WebKit malloc dirty pages that are never
returned to the OS. A single session can grow to 25GB+ phys_footprint.
Workers are short-lived and already cleaned up by Phase 4, but the
long-running supervisor session needs monitoring.

Changes:
- get_process_footprint_mb(): cross-platform function using macOS footprint(1)
  or Linux /proc/PID/status to get true physical memory footprint
- check_supervisor_memory(): walks process tree to find parent OpenCode
  session, writes respawn-recommended marker when threshold exceeded
- Phase 11 in cmd_pulse(): checks supervisor memory each pulse cycle
- cmd_mem_check: CLI command to inspect supervisor session memory
- Respawn marker file for external tooling to detect and act on
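
For the Linux path, a rough sketch of reading a footprint from a /proc/PID/status-style file could look like the following. The commit message does not say which field get_process_footprint_mb() reads, so the use of VmRSS here is an assumption, and status_rss_mb is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): print resident memory in MB from a
# /proc/<pid>/status-style file. Assumes the VmRSS field, which the kernel
# reports in kB.
status_rss_mb() {
    local status_file="$1"
    [[ -r "$status_file" ]] || { echo "0"; return 1; }

    # Example line: "VmRSS:     524288 kB". Kernel threads have no VmRSS
    # line, so print 0 when the field is absent.
    awk '/^VmRSS:/ { printf "%d\n", $2 / 1024; found = 1; exit }
         END { if (!found) print 0 }' "$status_file"
}
```

Typical usage would be status_rss_mb "/proc/$pid/status". Taking the path as an argument keeps the parser testable against a fixture file.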

Design: concurrency is already managed by existing system memory pressure
detection. Workers are already short-lived. This focuses solely on the
supervisor session that accumulates memory over hours of operation.

Discovered via OpenCode issue anomalyco/opencode#13041.
@coderabbitai
Contributor

coderabbitai bot commented Feb 11, 2026

Walkthrough

This PR adds session memory monitoring capabilities to the supervisor helper script, introducing a new mem-check command that computes the supervisor process footprint, compares it against a configurable threshold (SUPERVISOR_SELF_MEM_LIMIT, default 8192 MB), and logs respawn recommendations when thresholds are exceeded.

Changes

Cohort / File(s) | Summary
Documentation - Memory Monitoring (.agents/AGENTS.md) | Adds documentation describing session memory monitoring (t264) for long-running OpenCode/Bun sessions, Phase 11 pulse cycle memory checks, respawn marker behavior, and inspection via supervisor-helper.sh mem-check.
CLI Extension - Memory Check Command (.agents/scripts/supervisor-helper.sh) | Adds new cmd_mem_check() function to monitor supervisor memory footprint against threshold, introduces SUPERVISOR_SELF_MEM_LIMIT environment variable (default 8192 MB), and extends CLI dispatcher to recognize and route mem-check subcommand.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Poem

📊 When sessions grow fat with malloc's delight,
A helper arrives in the dead of the night,
Checks footprints with care, knows the mem-check command,
Respawns when needed—automation so grand! 🚀

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title accurately and specifically describes the main change: adding supervisor session memory monitoring with respawn detection, matching the core functionality added in both documentation and the helper script.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 46 code smells

[INFO] Recent monitoring activity:
Wed Feb 11 01:19:56 UTC 2026: Code review monitoring started
Wed Feb 11 01:19:57 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 46

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 46
  • VULNERABILITIES: 0

Generated on: Wed Feb 11 01:20:00 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@marcusquinn marcusquinn merged commit 202d6bf into main Feb 11, 2026
8 of 10 checks passed
marcusquinn added a commit that referenced this pull request Feb 11, 2026
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @.agents/scripts/supervisor-helper.sh:
- Around line 820-846: The parent-process detection using ps -o comm= (assigned
to parent_cmd) misses wrapped launches like "bun opencode" or "node opencode";
update the assignment of parent_cmd to prefer the full command line (ps -o
args=) with a fallback to ps -o comm= so pattern checks like [[ "$parent_cmd" ==
*"opencode"* ]] succeed; apply the same change in the other occurrence around
the block that uses check_pid/depth (~4876) so both parent_cmd usages detect
wrapped Bun/Node invocations and still fall back safely when args isn’t
available.
- Around line 726-752: The fallback parsing of the Footprint line (variables
fp_line, fp_val, fp_unit) only extracts integer digits with grep -oE '[0-9]+'
which truncates decimal values (e.g., "1.5 GB"); update the fallback to extract
the numeric value including decimals and perform unit conversion using a
float-capable tool like awk (or bc) instead of bash integer arithmetic so values
like 1.5 GB are converted to ~1536 MB; specifically, change the extraction of
fp_val to capture decimals (e.g., via awk or grep -oE '[0-9]+(\.[0-9]+)?'),
compute MB by multiplying/dividing as needed in awk (GB*1024, MB*1, KB/1024) and
echo the integer MB result (rounded or floored consistently) while keeping the
existing fp_line, fp_unit and return behavior.
🧹 Nitpick comments (1)
.agents/AGENTS.md (1)

262-263: Consider adding cross-reference to detailed documentation.

While the documentation references supervisor-helper.sh mem-check, it would be helpful to confirm that the "Full docs" reference at line 264 includes detailed coverage of the mem-check command, SUPERVISOR_SELF_MEM_LIMIT configuration, and the Phase 11 behavior. Additionally, consider briefly explaining what "Phase 11" means for readers unfamiliar with the pulse cycle phases.

💡 Optional enhancement for clarity
-**Session memory monitoring** (t264): Long-running OpenCode/Bun sessions accumulate WebKit malloc dirty pages that are never returned to the OS (25GB+ observed). Phase 11 of the pulse cycle checks the parent OpenCode session's `phys_footprint` and writes a respawn marker when it exceeds `SUPERVISOR_SELF_MEM_LIMIT` (default: 8192MB). Cron-based pulses are already fresh processes. Use `supervisor-helper.sh mem-check` to inspect session memory.
+**Session memory monitoring** (t264): Long-running OpenCode/Bun sessions accumulate WebKit malloc dirty pages that are never returned to the OS (25GB+ observed). Phase 11 of the pulse cycle checks the parent OpenCode session's `phys_footprint` and writes a respawn marker when it exceeds `SUPERVISOR_SELF_MEM_LIMIT` (default: 8192MB). Cron-based pulses are already fresh processes. Use `supervisor-helper.sh mem-check` to inspect session memory. See supervisor docs for Phase 11 details.

Alternatively, verify that the "Full docs" reference at line 264 already covers these details comprehensively.

Comment on lines +726 to +752
if [[ "$(uname)" == "Darwin" ]]; then
    # macOS: footprint --pid gives phys_footprint (dirty + swapped + compressed)
    # This matches what Activity Monitor displays
    local fp_output
    fp_output=$(footprint --pid "$pid" -f bytes --noCategories 2>/dev/null || echo "")
    if [[ -n "$fp_output" ]]; then
        local fp_bytes
        fp_bytes=$(echo "$fp_output" | grep -oE 'phys_footprint: [0-9]+' | grep -oE '[0-9]+' || echo "")
        if [[ -n "$fp_bytes" && "$fp_bytes" -gt 0 ]] 2>/dev/null; then
            echo "$((fp_bytes / 1048576))"
            return 0
        fi
        # Fallback: parse the Footprint line (e.g., "Footprint: 30 GB" or "Footprint: 500 MB")
        local fp_line
        fp_line=$(echo "$fp_output" | grep -E 'Footprint:' | head -1)
        if [[ -n "$fp_line" ]]; then
            local fp_val fp_unit
            fp_val=$(echo "$fp_line" | grep -oE '[0-9]+' | head -1)
            fp_unit=$(echo "$fp_line" | grep -oE '(GB|MB|KB)' | head -1)
            case "$fp_unit" in
                GB) echo "$((fp_val * 1024))" ;;
                MB) echo "$fp_val" ;;
                KB) echo "$((fp_val / 1024))" ;;
                *) echo "0" ;;
            esac
            return 0
        fi
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Read the supervisor-helper.sh file around lines 726-752 to verify the code
head -755 .agents/scripts/supervisor-helper.sh | tail -35 | cat -n

Repository: marcusquinn/aidevops

Length of output: 1849


🏁 Script executed:

# Also get a wider context to see line numbers
sed -n '720,755p' .agents/scripts/supervisor-helper.sh | cat -n

Repository: marcusquinn/aidevops

Length of output: 1884


🏁 Script executed:

# Check the ShellCheck compliance for the file to see if there are any existing issues
if command -v shellcheck >/dev/null 2>&1; then
  shellcheck .agents/scripts/supervisor-helper.sh 2>&1 | head -50
else
  echo "ShellCheck not available in sandbox"
fi

Repository: marcusquinn/aidevops

Length of output: 2223


Decimal values in fallback footprint parsing are truncated to integers.

Line 743 of supervisor-helper.sh (line 24 of the numbered excerpt) uses grep -oE '[0-9]+', which captures only integer digits. For output like Footprint: 1.5 GB, this extracts 1 and reports 1024 MB instead of ~1536 MB, so the footprint is under-reported and a threshold breach can be missed.

The primary path (lines 732-735 of the file, 13-16 of the excerpt) parses phys_footprint in bytes and is unaffected. The fallback triggers only when that structured output is unavailable. Bash arithmetic $(()) cannot handle decimals, so use awk for both the extraction and the floating-point conversion:

Suggested fix (decimal-safe parsing)
-            fp_val=$(echo "$fp_line" | grep -oE '[0-9]+' | head -1)
+            fp_val=$(echo "$fp_line" | awk '{for (i=1;i<=NF;i++) if ($i ~ /^[0-9]+(\.[0-9]+)?$/) {print $i; exit}}')
             fp_unit=$(echo "$fp_line" | grep -oE '(GB|MB|KB)' | head -1)
             case "$fp_unit" in
-                GB) echo "$((fp_val * 1024))" ;;
-                MB) echo "$fp_val" ;;
-                KB) echo "$((fp_val / 1024))" ;;
+                GB) awk -v v="$fp_val" 'BEGIN {printf "%d", v * 1024}' ;;
+                MB) awk -v v="$fp_val" 'BEGIN {printf "%d", v}' ;;
+                KB) awk -v v="$fp_val" 'BEGIN {printf "%d", v / 1024}' ;;
                 *) echo "0" ;;
             esac
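
As a standalone illustration of the decimal-safe conversion described above, the sketch below wraps the same awk arithmetic in a function. The name footprint_to_mb and the input validation are assumptions added for the example; they are not part of the suggested patch.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): convert "<value> <unit>" to whole MB,
# flooring any fractional remainder. Accepts decimals such as 1.5.
footprint_to_mb() {
    local value="$1" unit="$2"

    # Only plain non-negative numbers, with an optional fractional part.
    [[ "$value" =~ ^[0-9]+(\.[0-9]+)?$ ]] || { echo "0"; return 1; }

    case "$unit" in
        GB) awk -v v="$value" 'BEGIN { printf "%d\n", v * 1024 }' ;;
        MB) awk -v v="$value" 'BEGIN { printf "%d\n", v }' ;;
        KB) awk -v v="$value" 'BEGIN { printf "%d\n", v / 1024 }' ;;
        *)  echo "0"; return 1 ;;
    esac
}
```

With integer-only parsing, 1.5 GB would come out as 1024 MB. Here the multiplication happens in awk before truncation, so the same input yields 1536.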

Comment on lines +820 to +846
    local parent_cmd
    parent_cmd=$(ps -o comm= -p "$parent_pid" 2>/dev/null || echo "")
    if [[ "$parent_cmd" == *"opencode"* ]]; then
        local parent_footprint
        parent_footprint=$(get_process_footprint_mb "$parent_pid")
        if [[ "$parent_footprint" -gt "$threshold_mb" ]] 2>/dev/null; then
            log_warn "Parent OpenCode session (PID $parent_pid) footprint ${parent_footprint}MB exceeds ${threshold_mb}MB"
            log_warn "WebKit/Bun malloc accumulates dirty pages that are never freed"
            log_warn "Recommendation: save session state and restart OpenCode to reclaim ${parent_footprint}MB"

            # Write a respawn marker file for external tooling to detect
            local respawn_marker="${SUPERVISOR_DIR}/respawn-recommended"
            {
                echo "pid=$parent_pid"
                echo "footprint_mb=$parent_footprint"
                echo "threshold_mb=$threshold_mb"
                echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
                echo "reason=webkit_malloc_accumulation"
            } > "$respawn_marker"

            return 1
        fi
    fi

    check_pid="$parent_pid"
    depth=$((depth + 1))
done
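
The marker written above is a flat key=value file, so external tooling can read it without sourcing it. The following is a sketch of one possible consumer. The function name read_respawn_marker is hypothetical, and the PR does not ship such a reader.

```shell
#!/usr/bin/env bash
# Hypothetical consumer (not from the PR): print "<pid> <footprint_mb>" when
# the marker exists and both fields are numeric; return 1 otherwise.
read_respawn_marker() {
    local marker="$1" key value pid="" footprint=""
    [[ -r "$marker" ]] || return 1

    # Parse line by line instead of sourcing, so the file is never executed.
    while IFS='=' read -r key value; do
        case "$key" in
            pid)          pid="$value" ;;
            footprint_mb) footprint="$value" ;;
        esac
    done < "$marker"

    [[ "$pid" =~ ^[0-9]+$ && "$footprint" =~ ^[0-9]+$ ]] || return 1
    echo "$pid $footprint"
}
```

A caller would pass "${SUPERVISOR_DIR}/respawn-recommended" and decide what to do with the reported PID. Whether to restart automatically or only notify is left to that tooling.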
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

wc -l .agents/scripts/supervisor-helper.sh

Repository: marcusquinn/aidevops

Length of output: 106


🏁 Script executed:

# First, let's check the file and see the specific sections
sed -n '820,846p' .agents/scripts/supervisor-helper.sh

Repository: marcusquinn/aidevops

Length of output: 1351


🏁 Script executed:

# Check the second location mentioned
sed -n '4873,4899p' .agents/scripts/supervisor-helper.sh

Repository: marcusquinn/aidevops

Length of output: 1298


🏁 Script executed:

# Let's also run ShellCheck to see if there are any violations
shellcheck -f json .agents/scripts/supervisor-helper.sh | head -100

Repository: marcusquinn/aidevops

Length of output: 11250


Parent OpenCode detection may miss Bun/Node process wrappers.

Lines 821 and 4876 use ps -o comm=, which returns only the executable name. When OpenCode is launched via bun opencode or node opencode, the comm field shows bun or node, causing the pattern check to fail silently.

Use ps -o args= (with fallback to ps -o comm=) to capture the full command line and detect wrapped launches:

Suggested implementation
-        parent_cmd=$(ps -o comm= -p "$parent_pid" 2>/dev/null || echo "")
+        parent_cmd=$(ps -o args= -p "$parent_pid" 2>/dev/null || echo "")
+        [[ -z "$parent_cmd" ]] && parent_cmd=$(ps -o comm= -p "$parent_pid" 2>/dev/null || echo "")
         if [[ "$parent_cmd" == *"opencode"* ]]; then

Apply to both locations (lines ~821 and ~4876).
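
To show why the full command line matters, here is a small sketch that separates the lookup from the pattern check. Both function names are hypothetical. The ps -o args= and ps -o comm= calls are the ones named in the suggestion above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): true when a command line mentions
# opencode anywhere, including as an argument to bun or node.
cmdline_is_opencode() {
    [[ "$1" == *"opencode"* ]]
}

# Hypothetical helper: prefer the full argument vector and fall back to the
# short command name when args is empty or unavailable.
process_cmdline() {
    local pid="$1" cmd
    cmd=$(ps -o args= -p "$pid" 2>/dev/null || echo "")
    if [[ -z "$cmd" ]]; then
        cmd=$(ps -o comm= -p "$pid" 2>/dev/null || echo "")
    fi
    echo "$cmd"
}
```

With comm= alone, a session started as bun opencode is reported as bun and the check fails. With args=, the string passed to cmdline_is_opencode contains the script name and the match succeeds.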

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    local parent_cmd
    parent_cmd=$(ps -o args= -p "$parent_pid" 2>/dev/null || echo "")
    [[ -z "$parent_cmd" ]] && parent_cmd=$(ps -o comm= -p "$parent_pid" 2>/dev/null || echo "")
    if [[ "$parent_cmd" == *"opencode"* ]]; then
        local parent_footprint
        parent_footprint=$(get_process_footprint_mb "$parent_pid")
        if [[ "$parent_footprint" -gt "$threshold_mb" ]] 2>/dev/null; then
            log_warn "Parent OpenCode session (PID $parent_pid) footprint ${parent_footprint}MB exceeds ${threshold_mb}MB"
            log_warn "WebKit/Bun malloc accumulates dirty pages that are never freed"
            log_warn "Recommendation: save session state and restart OpenCode to reclaim ${parent_footprint}MB"
            # Write a respawn marker file for external tooling to detect
            local respawn_marker="${SUPERVISOR_DIR}/respawn-recommended"
            {
                echo "pid=$parent_pid"
                echo "footprint_mb=$parent_footprint"
                echo "threshold_mb=$threshold_mb"
                echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
                echo "reason=webkit_malloc_accumulation"
            } > "$respawn_marker"
            return 1
        fi
    fi
    check_pid="$parent_pid"
    depth=$((depth + 1))
done