feat: supervisor session memory monitoring with respawn detection (t264) #1040
… (t264)
OpenCode/Bun processes accumulate WebKit malloc dirty pages that are never returned to the OS. A single session can grow to 25GB+ phys_footprint. Workers are short-lived and already cleaned up by Phase 4, but the long-running supervisor session needs monitoring.

Changes:
- get_process_footprint_mb(): cross-platform function using macOS footprint(1) or Linux /proc/PID/status to get true physical memory footprint
- check_supervisor_memory(): walks process tree to find parent OpenCode session, writes respawn-recommended marker when threshold exceeded
- Phase 11 in cmd_pulse(): checks supervisor memory each pulse cycle
- cmd_mem_check: CLI command to inspect supervisor session memory
- Respawn marker file for external tooling to detect and act on

Design: concurrency is already managed by existing system memory pressure detection. Workers are already short-lived. This focuses solely on the supervisor session that accumulates memory over hours of operation.

Discovered via OpenCode issue anomalyco/opencode#13041.
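The Linux branch of get_process_footprint_mb() is not shown in this conversation, so the following is only a minimal sketch of how a /proc/PID/status reading might be turned into megabytes. It assumes the VmRSS field is the one of interest (the PR does not say which field it reads), and the function names status_rss_mb and linux_footprint_mb are hypothetical. VmRSS is resident set size, which is not the same quantity as macOS phys_footprint.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): read /proc/PID/status-style text on
# stdin and print the VmRSS value in whole MB. The kernel reports VmRSS in kB.
# Prints 0 when no VmRSS line is present.
status_rss_mb() {
    awk '$1 == "VmRSS:" { printf "%d\n", $2 / 1024; found = 1; exit }
         END { if (!found) print 0 }'
}

# Hypothetical wrapper: look up a live PID, or print 0 if /proc is unavailable
# (for example on macOS, or when the process has already exited).
linux_footprint_mb() {
    local pid="$1"
    if [[ -r "/proc/$pid/status" ]]; then
        status_rss_mb < "/proc/$pid/status"
    else
        echo 0
    fi
}
```

Keeping the parser separate from the /proc lookup means it can be exercised with canned text on any platform.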
Walkthrough
This PR adds session memory monitoring capabilities to the supervisor helper script.
🔍 Code Quality Report
[MONITOR] Code Review Monitoring Report
[INFO] Latest Quality Status:
[INFO] Recent monitoring activity:
📈 Current Quality Metrics
Generated on: Wed Feb 11 01:20:00 UTC 2026
Generated by AI DevOps Framework Code Review Monitoring
Actionable comments posted: 2
🧹 Nitpick comments (1)
.agents/AGENTS.md (1)
262-263: Consider adding cross-reference to detailed documentation.
While the documentation references supervisor-helper.sh mem-check, it would be helpful to confirm that the "Full docs" reference at line 264 includes detailed coverage of the mem-check command, SUPERVISOR_SELF_MEM_LIMIT configuration, and the Phase 11 behavior. Additionally, consider briefly explaining what "Phase 11" means for readers unfamiliar with the pulse cycle phases.
💡 Optional enhancement for clarity
-**Session memory monitoring** (t264): Long-running OpenCode/Bun sessions accumulate WebKit malloc dirty pages that are never returned to the OS (25GB+ observed). Phase 11 of the pulse cycle checks the parent OpenCode session's `phys_footprint` and writes a respawn marker when it exceeds `SUPERVISOR_SELF_MEM_LIMIT` (default: 8192MB). Cron-based pulses are already fresh processes. Use `supervisor-helper.sh mem-check` to inspect session memory.
+**Session memory monitoring** (t264): Long-running OpenCode/Bun sessions accumulate WebKit malloc dirty pages that are never returned to the OS (25GB+ observed). Phase 11 of the pulse cycle checks the parent OpenCode session's `phys_footprint` and writes a respawn marker when it exceeds `SUPERVISOR_SELF_MEM_LIMIT` (default: 8192MB). Cron-based pulses are already fresh processes. Use `supervisor-helper.sh mem-check` to inspect session memory. See supervisor docs for Phase 11 details.
Alternatively, verify that the "Full docs" reference at line 264 already covers these details comprehensively.
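As a rough illustration of the threshold rule described in that documentation line, the sketch below resolves SUPERVISOR_SELF_MEM_LIMIT with its documented default of 8192 MB and compares a footprint against it. The function name exceeds_self_mem_limit and the return code 2 for non-numeric input are assumptions made for this example, not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): succeed (exit 0) when the given
# footprint in MB is strictly greater than SUPERVISOR_SELF_MEM_LIMIT,
# which defaults to 8192 MB as documented in AGENTS.md.
exceeds_self_mem_limit() {
    local footprint_mb="$1"
    local limit_mb="${SUPERVISOR_SELF_MEM_LIMIT:-8192}"

    # Reject empty or non-numeric input instead of silently comparing garbage.
    [[ "$footprint_mb" =~ ^[0-9]+$ && "$limit_mb" =~ ^[0-9]+$ ]] || return 2

    # 10# forces base 10 so a value such as "08" is not parsed as octal.
    (( 10#$footprint_mb > 10#$limit_mb ))
}
```

A caller could write `if exceeds_self_mem_limit "$footprint"; then ...` and override the limit for one invocation with `SUPERVISOR_SELF_MEM_LIMIT=4096`.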
if [[ "$(uname)" == "Darwin" ]]; then
    # macOS: footprint --pid gives phys_footprint (dirty + swapped + compressed)
    # This matches what Activity Monitor displays
    local fp_output
    fp_output=$(footprint --pid "$pid" -f bytes --noCategories 2>/dev/null || echo "")
    if [[ -n "$fp_output" ]]; then
        local fp_bytes
        fp_bytes=$(echo "$fp_output" | grep -oE 'phys_footprint: [0-9]+' | grep -oE '[0-9]+' || echo "")
        if [[ -n "$fp_bytes" && "$fp_bytes" -gt 0 ]] 2>/dev/null; then
            echo "$((fp_bytes / 1048576))"
            return 0
        fi
        # Fallback: parse the Footprint line (e.g., "Footprint: 30 GB" or "Footprint: 500 MB")
        local fp_line
        fp_line=$(echo "$fp_output" | grep -E 'Footprint:' | head -1)
        if [[ -n "$fp_line" ]]; then
            local fp_val fp_unit
            fp_val=$(echo "$fp_line" | grep -oE '[0-9]+' | head -1)
            fp_unit=$(echo "$fp_line" | grep -oE '(GB|MB|KB)' | head -1)
            case "$fp_unit" in
                GB) echo "$((fp_val * 1024))" ;;
                MB) echo "$fp_val" ;;
                KB) echo "$((fp_val / 1024))" ;;
                *) echo "0" ;;
            esac
            return 0
        fi
🧩 Analysis chain
🏁 Script executed:
# Read the supervisor-helper.sh file around lines 726-752 to verify the code
head -755 .agents/scripts/supervisor-helper.sh | tail -35 | cat -nRepository: marcusquinn/aidevops
Length of output: 1849
🏁 Script executed:
# Also get a wider context to see line numbers
sed -n '720,755p' .agents/scripts/supervisor-helper.sh | cat -nRepository: marcusquinn/aidevops
Length of output: 1884
🏁 Script executed:
# Check the ShellCheck compliance for the file to see if there are any existing issues
if command -v shellcheck >/dev/null 2>&1; then
shellcheck .agents/scripts/supervisor-helper.sh 2>&1 | head -50
else
echo "ShellCheck not available in sandbox"
fiRepository: marcusquinn/aidevops
Length of output: 2223
Decimal values in fallback footprint parsing are truncated to integers.
Line 24 uses grep -oE '[0-9]+', which captures only the first run of integer digits. For output like Footprint: 1.5 GB, this extracts 1 and reports 1024 MB instead of 1536 MB, so the footprint is under-reported and a threshold breach can go undetected.
The primary path (lines 13-16) parses phys_footprint in bytes and is unaffected. The fallback triggers only when that structured output is unavailable. Bash arithmetic $(()) cannot handle decimals, so parse decimals with awk for both extraction and float calculations:
Suggested fix (decimal-safe parsing)
-    fp_val=$(echo "$fp_line" | grep -oE '[0-9]+' | head -1)
+    fp_val=$(echo "$fp_line" | awk '{for (i=1;i<=NF;i++) if ($i ~ /^[0-9]+(\.[0-9]+)?$/) {print $i; exit}}')
     fp_unit=$(echo "$fp_line" | grep -oE '(GB|MB|KB)' | head -1)
     case "$fp_unit" in
-        GB) echo "$((fp_val * 1024))" ;;
-        MB) echo "$fp_val" ;;
-        KB) echo "$((fp_val / 1024))" ;;
+        GB) awk -v v="$fp_val" 'BEGIN {printf "%d", v * 1024}' ;;
+        MB) awk -v v="$fp_val" 'BEGIN {printf "%d", v}' ;;
+        KB) awk -v v="$fp_val" 'BEGIN {printf "%d", v / 1024}' ;;
         *) echo "0" ;;
     esac
🤖 Prompt for AI Agents
In @.agents/scripts/supervisor-helper.sh around lines 726 - 752, The fallback
parsing of the Footprint line (variables fp_line, fp_val, fp_unit) only extracts
integer digits with grep -oE '[0-9]+' which truncates decimal values (e.g., "1.5
GB"); update the fallback to extract the numeric value including decimals and
perform unit conversion using a float-capable tool like awk (or bc) instead of
bash integer arithmetic so values like 1.5 GB are converted to ~1536 MB;
specifically, change the extraction of fp_val to capture decimals (e.g., via awk
or grep -oE '[0-9]+(\.[0-9]+)?'), compute MB by multiplying/dividing as needed
in awk (GB*1024, MB*1, KB/1024) and echo the integer MB result (rounded or
floored consistently) while keeping the existing fp_line, fp_unit and return
behavior.
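To make the arithmetic in this comment concrete, here is a minimal standalone sketch of a decimal-safe conversion. It is a variant of the suggested fix, not the code merged in the PR: the function name footprint_line_to_mb is hypothetical, it does extraction and conversion in a single awk call, and it floors the result to whole MB.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): convert a line such as
# "Footprint: 1.5 GB" to whole megabytes, keeping the decimal part
# until after the unit conversion. Prints 0 if no "<number> <unit>"
# pair with a GB, MB, or KB unit is found.
footprint_line_to_mb() {
    local line="$1"
    awk -v line="$line" 'BEGIN {
        n = split(line, f, /[ \t]+/)
        for (i = 1; i < n; i++) {
            # First token that is an integer or a decimal number
            if (f[i] ~ /^[0-9]+(\.[0-9]+)?$/) {
                v = f[i]; u = f[i + 1]
                if (u == "GB")      mb = v * 1024
                else if (u == "MB") mb = v
                else if (u == "KB") mb = v / 1024
                else                mb = 0
                printf "%d\n", mb   # %d truncates toward zero
                exit
            }
        }
        print 0
    }'
}

footprint_line_to_mb "Footprint: 1.5 GB"   # prints 1536
footprint_line_to_mb "Footprint: 500 MB"   # prints 500
```

With the integer-only grep, the first call would have produced 1024, which is the 512 MB shortfall the comment describes.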
local parent_cmd
parent_cmd=$(ps -o comm= -p "$parent_pid" 2>/dev/null || echo "")
if [[ "$parent_cmd" == *"opencode"* ]]; then
    local parent_footprint
    parent_footprint=$(get_process_footprint_mb "$parent_pid")
    if [[ "$parent_footprint" -gt "$threshold_mb" ]] 2>/dev/null; then
        log_warn "Parent OpenCode session (PID $parent_pid) footprint ${parent_footprint}MB exceeds ${threshold_mb}MB"
        log_warn "WebKit/Bun malloc accumulates dirty pages that are never freed"
        log_warn "Recommendation: save session state and restart OpenCode to reclaim ${parent_footprint}MB"

        # Write a respawn marker file for external tooling to detect
        local respawn_marker="${SUPERVISOR_DIR}/respawn-recommended"
        {
            echo "pid=$parent_pid"
            echo "footprint_mb=$parent_footprint"
            echo "threshold_mb=$threshold_mb"
            echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
            echo "reason=webkit_malloc_accumulation"
        } > "$respawn_marker"

        return 1
    fi
fi

check_pid="$parent_pid"
depth=$((depth + 1))
done
🧩 Analysis chain
🏁 Script executed:
wc -l .agents/scripts/supervisor-helper.sh
Repository: marcusquinn/aidevops
Length of output: 106
🏁 Script executed:
# First, let's check the file and see the specific sections
sed -n '820,846p' .agents/scripts/supervisor-helper.sh
Repository: marcusquinn/aidevops
Length of output: 1351
🏁 Script executed:
# Check the second location mentioned
sed -n '4873,4899p' .agents/scripts/supervisor-helper.sh
Repository: marcusquinn/aidevops
Length of output: 1298
🏁 Script executed:
# Let's also run ShellCheck to see if there are any violations
shellcheck -f json .agents/scripts/supervisor-helper.sh | head -100
Repository: marcusquinn/aidevops
Length of output: 11250
Parent OpenCode detection may miss Bun/Node process wrappers.
Lines 821 and 4876 use ps -o comm=, which returns only the executable name. When OpenCode is launched via bun opencode or node opencode, the comm field shows bun or node, causing the pattern check to fail silently.
Use ps -o args= (with fallback to ps -o comm=) to capture the full command line and detect wrapped launches:
Suggested implementation
-    parent_cmd=$(ps -o comm= -p "$parent_pid" 2>/dev/null || echo "")
+    parent_cmd=$(ps -o args= -p "$parent_pid" 2>/dev/null || echo "")
+    [[ -z "$parent_cmd" ]] && parent_cmd=$(ps -o comm= -p "$parent_pid" 2>/dev/null || echo "")
     if [[ "$parent_cmd" == *"opencode"* ]]; then
Apply to both locations (lines ~821 and ~4876).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
local parent_cmd
parent_cmd=$(ps -o args= -p "$parent_pid" 2>/dev/null || echo "")
[[ -z "$parent_cmd" ]] && parent_cmd=$(ps -o comm= -p "$parent_pid" 2>/dev/null || echo "")
if [[ "$parent_cmd" == *"opencode"* ]]; then
    local parent_footprint
    parent_footprint=$(get_process_footprint_mb "$parent_pid")
    if [[ "$parent_footprint" -gt "$threshold_mb" ]] 2>/dev/null; then
        log_warn "Parent OpenCode session (PID $parent_pid) footprint ${parent_footprint}MB exceeds ${threshold_mb}MB"
        log_warn "WebKit/Bun malloc accumulates dirty pages that are never freed"
        log_warn "Recommendation: save session state and restart OpenCode to reclaim ${parent_footprint}MB"
        # Write a respawn marker file for external tooling to detect
        local respawn_marker="${SUPERVISOR_DIR}/respawn-recommended"
        {
            echo "pid=$parent_pid"
            echo "footprint_mb=$parent_footprint"
            echo "threshold_mb=$threshold_mb"
            echo "timestamp=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
            echo "reason=webkit_malloc_accumulation"
        } > "$respawn_marker"
        return 1
    fi
fi
check_pid="$parent_pid"
depth=$((depth + 1))
done
🤖 Prompt for AI Agents
In @.agents/scripts/supervisor-helper.sh around lines 820 - 846, The
parent-process detection using ps -o comm= (assigned to parent_cmd) misses
wrapped launches like "bun opencode" or "node opencode"; update the assignment
of parent_cmd to prefer the full command line (ps -o args=) with a fallback to
ps -o comm= so pattern checks like [[ "$parent_cmd" == *"opencode"* ]] succeed;
apply the same change in the other occurrence around the block that uses
check_pid/depth (~4876) so both parent_cmd usages detect wrapped Bun/Node
invocations and still fall back safely when args isn’t available.
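The args-with-fallback lookup and the ancestor walk discussed in this comment could be factored roughly as below. This is a sketch under assumptions, not the PR's check_supervisor_memory(): the names is_opencode_cmd, pid_cmdline and find_opencode_ancestor are hypothetical, and the depth limit of 10 is a guess, since the actual limit is not visible in the excerpt.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the PR).

# Pure matcher: does a command line mention opencode anywhere?
is_opencode_cmd() {
    [[ "$1" == *"opencode"* ]]
}

# Full command line for a PID, falling back to the executable name
# when args is empty or unavailable.
pid_cmdline() {
    local pid="$1" cmd
    cmd=$(ps -o args= -p "$pid" 2>/dev/null || true)
    [[ -z "$cmd" ]] && cmd=$(ps -o comm= -p "$pid" 2>/dev/null || true)
    echo "$cmd"
}

# Walk up from a PID (default: this shell) and print the first ancestor
# whose command line matches. Returns 1 if none is found within max_depth.
find_opencode_ancestor() {
    local pid="${1:-$$}" max_depth="${2:-10}" depth=0 parent
    while (( depth < max_depth )); do
        parent=$(ps -o ppid= -p "$pid" 2>/dev/null | tr -d ' ')
        # Stop at init/launchd or when the parent cannot be resolved.
        [[ -z "$parent" || "$parent" -le 1 ]] && return 1
        if is_opencode_cmd "$(pid_cmdline "$parent")"; then
            echo "$parent"
            return 0
        fi
        pid="$parent"
        depth=$((depth + 1))
    done
    return 1
}
```

One trade-off of matching on args: any ancestor whose arguments merely contain the string opencode, such as an editor opened on a path with that name, would also match, so a stricter pattern may be worth considering.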



Summary
- footprint(1) / Linux /proc/PID/status
- supervisor-helper.sh mem-check CLI command for manual inspection

Problem
OpenCode/Bun processes accumulate WebKit malloc dirty pages that are never returned to the OS. A single session observed at 25GB+ phys_footprint (30GB total). Workers are short-lived and already cleaned up by Phase 4, but the long-running supervisor session needs monitoring.

Design Decisions
- memory_pressure detection already handles this
- (respawn-recommended) enables external tooling to detect and act on the recommendation
- SUPERVISOR_SELF_MEM_LIMIT env var (default 8192MB) controls the threshold

Testing
- get_process_footprint_mb verified against known 29GB OpenCode process (returned 30406MB, matching Activity Monitor)
- mem-check command tested: correctly detects parent OpenCode session, shows footprint and uptime

Closes t264
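Since the respawn-recommended marker is meant for external tooling, here is a minimal sketch of how such tooling might read it. The key=value layout matches the fields written in the review excerpt (pid, footprint_mb, threshold_mb, timestamp, reason); the function name read_respawn_marker_field is hypothetical, and what a consumer does after detecting the marker is outside what this PR describes.

```shell
#!/usr/bin/env bash
# Hypothetical consumer (not from the PR): print the value for one key from
# a key=value marker file. The file is parsed line by line rather than
# sourced, so its contents are never executed as shell code.
read_respawn_marker_field() {
    local file="$1" key="$2" k v
    [[ -r "$file" ]] || return 1
    while IFS='=' read -r k v; do
        if [[ "$k" == "$key" ]]; then
            printf '%s\n' "$v"
            return 0
        fi
    done < "$file"
    return 1   # key not present
}
```

A watcher could, for example, call `read_respawn_marker_field "$SUPERVISOR_DIR/respawn-recommended" footprint_mb` and surface the value in a notification.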