t1398: Fix runaway memory consumption — process guards, ShellCheck limits, session awareness#2855
…398) Root-cause fix for runaway memory consumption (March 3 kernel panic):
1. Process guard in pulse-wrapper.sh: guard_child_processes() runs every 60s in the watchdog loop, killing any child process exceeding configurable RSS (2GB default) or runtime (10min default) limits. ShellCheck gets stricter limits (1GB/5min) due to known exponential expansion with --external-sources.
2. ShellCheck hardened: linters-local.sh now runs per-file with a timeout (30s each) instead of batch mode. This prevents a single file with recursive source directives from consuming 5+ GB RAM. The quality sweep in pulse-wrapper.sh also gets a per-file timeout.
3. Session count awareness: check_session_count() warns when more than 5 concurrent interactive sessions are open. The count is displayed in the health issue dashboard with a threshold warning.
4. Standalone process-guard-helper.sh: scan, kill-runaways, sessions, and status commands for manual/cron use. Monitors the right signals (individual RSS, runtime, process count) and replaces the PR #2792 concept.
Closes #2854
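The limit selection in point 1 can be sketched as a pure decision function, kept separate from the `ps` scanning and the kill itself. This is a hypothetical helper (`child_violation` does not appear in the PR); it assumes RSS in KB and process age in seconds, uses the defaults quoted above, and expects the limit variables to have been validated as integers first, since they end up in arithmetic contexts.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: decide whether a child process breaches its limits.
# Defaults mirror the commit message: 2 GB / 600 s in general, 1 GB / 300 s for ShellCheck.
CHILD_RSS_LIMIT_KB="${CHILD_RSS_LIMIT_KB:-2097152}"
CHILD_RUNTIME_LIMIT="${CHILD_RUNTIME_LIMIT:-600}"
SHELLCHECK_RSS_LIMIT_KB="${SHELLCHECK_RSS_LIMIT_KB:-1048576}"
SHELLCHECK_RUNTIME_LIMIT="${SHELLCHECK_RUNTIME_LIMIT:-300}"

# Usage: child_violation <command-name> <rss-kb> <age-seconds>
# Prints "rss" or "runtime" and returns 0 when a limit is exceeded; otherwise prints nothing and returns 1.
child_violation() {
  local comm="$1" rss_kb="$2" age_s="$3"
  local rss_limit="$CHILD_RSS_LIMIT_KB" runtime_limit="$CHILD_RUNTIME_LIMIT"
  if [[ "$comm" == shellcheck ]]; then
    # Stricter limits for ShellCheck because of --external-sources expansion.
    rss_limit="$SHELLCHECK_RSS_LIMIT_KB"
    runtime_limit="$SHELLCHECK_RUNTIME_LIMIT"
  fi
  if ((rss_kb > rss_limit)); then
    echo rss
  elif ((age_s > runtime_limit)); then
    echo runtime
  else
    return 1
  fi
}
```

Keeping the decision pure makes it testable without real processes; a guard loop would call it once per `ps` row and act only when it prints a reason.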
Summary of Changes
This pull request addresses critical memory consumption issues and kernel panics by implementing robust process monitoring and resource management. It introduces proactive measures to identify and terminate runaway processes, particularly those with high memory usage or long execution times, and improves system stability by preventing resource exhaustion. The changes also improve visibility into system health by tracking interactive session counts.
Walkthrough
Adds a standalone process-guard utility and integrates child-process/resource guards and session-count checks into the pulse watchdog; per-file ShellCheck runs are capped with a timeout and results aggregated, with kill/runaway logic for processes exceeding RSS or runtime limits.
Sequence Diagram
sequenceDiagram
autonumber
participant Watchdog as "Pulse Watchdog"
participant GuardHelper as "process-guard-helper.sh"
participant GuardFn as "guard_child_processes()"
participant Child as "Child Processes\n(shellcheck, node, opencode)"
participant Dashboard as "Health Dashboard"
Watchdog->>GuardFn: invoke (periodic / startup)
GuardFn->>GuardHelper: cmd_scan()
GuardHelper->>Child: query RSS & runtime
Child-->>GuardHelper: return metrics
GuardHelper->>GuardHelper: evaluate limits (RSS, runtime)
alt violation detected
GuardHelper->>Child: kill violating PID(s)
GuardHelper->>GuardHelper: log event
end
GuardHelper-->>GuardFn: scan results
GuardFn->>Dashboard: update Interactive Sessions & guard status
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
🔍 Code Quality Report: [MONITOR] Code Review Monitoring Report; [INFO] Latest Quality Status; [INFO] Recent monitoring activity; 📈 Current Quality Metrics
Generated on: Thu Mar 5 01:42:31 UTC 2026 by AI DevOps Framework Code Review Monitoring
Code Review
This pull request introduces robust measures to prevent runaway memory consumption by implementing process guards, hardening ShellCheck invocations with timeouts, and adding awareness of high session counts. However, a critical command injection vulnerability was identified in the new process-guard-helper.sh script: environment variables used in Bash arithmetic contexts are not validated. It is recommended to apply the same level of protection as in pulse-wrapper.sh. The review also covers script efficiency, adherence to the project's error-handling guidelines, and a potential bug in process discovery.
    CHILD_RSS_LIMIT_KB="${CHILD_RSS_LIMIT_KB:-2097152}"
    CHILD_RUNTIME_LIMIT="${CHILD_RUNTIME_LIMIT:-600}"
    SHELLCHECK_RSS_LIMIT_KB="${SHELLCHECK_RSS_LIMIT_KB:-1048576}"
    SHELLCHECK_RUNTIME_LIMIT="${SHELLCHECK_RUNTIME_LIMIT:-300}"
    SESSION_COUNT_WARN="${SESSION_COUNT_WARN:-5}"
The script uses several environment variables (CHILD_RSS_LIMIT_KB, CHILD_RUNTIME_LIMIT, SHELLCHECK_RSS_LIMIT_KB, SHELLCHECK_RUNTIME_LIMIT, SESSION_COUNT_WARN) directly in Bash arithmetic expansion $(( ... )) and arithmetic comparisons [[ ... -gt ... ]] without validation. In Bash, these constructs evaluate the contents of the variables as arithmetic expressions, and an array subscript inside such a value (for example a[$(cmd)] or a[`cmd`]) is expanded, which runs the embedded command substitution. An attacker who can influence these environment variables can therefore achieve arbitrary command execution.
This is a known risk that is correctly mitigated in the sibling script pulse-wrapper.sh using _validate_int, but it was missed here in the new process-guard-helper.sh script.
Suggested change:
    -CHILD_RSS_LIMIT_KB="${CHILD_RSS_LIMIT_KB:-2097152}"
    -CHILD_RUNTIME_LIMIT="${CHILD_RUNTIME_LIMIT:-600}"
    -SHELLCHECK_RSS_LIMIT_KB="${SHELLCHECK_RSS_LIMIT_KB:-1048576}"
    -SHELLCHECK_RUNTIME_LIMIT="${SHELLCHECK_RUNTIME_LIMIT:-300}"
    -SESSION_COUNT_WARN="${SESSION_COUNT_WARN:-5}"
    +CHILD_RSS_LIMIT_KB="${CHILD_RSS_LIMIT_KB:-2097152}"
    +CHILD_RUNTIME_LIMIT="${CHILD_RUNTIME_LIMIT:-600}"
    +SHELLCHECK_RSS_LIMIT_KB="${SHELLCHECK_RSS_LIMIT_KB:-1048576}"
    +SHELLCHECK_RUNTIME_LIMIT="${SHELLCHECK_RUNTIME_LIMIT:-300}"
    +SESSION_COUNT_WARN="${SESSION_COUNT_WARN:-5}"
    +# Validate numeric configuration to prevent command injection via arithmetic expansion
    +for var in CHILD_RSS_LIMIT_KB CHILD_RUNTIME_LIMIT SHELLCHECK_RSS_LIMIT_KB SHELLCHECK_RUNTIME_LIMIT SESSION_COUNT_WARN; do
    +  if [[ ! "${!var}" =~ ^[0-9]+$ ]]; then
    +    echo "Error: $var must be a positive integer. Got: ${!var}" >&2
    +    exit 1
    +  fi
    +done
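CodeRabbit later in this thread suggests resetting invalid values to their defaults rather than exiting. A minimal sketch of that variant, using a hypothetical `_sanitize_int` helper (not the `_validate_int` from pulse-wrapper.sh): a value such as `a[$(cmd)]` is rejected before it can reach an arithmetic context. It also tightens the regex, since `^[0-9]+$` accepts values like `08`, which Bash arithmetic then rejects as an invalid octal constant.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: sanitize env-driven integer limits before any $(( )) or [[ -gt ]] use.
# Invalid values fall back to a default (with a warning) instead of aborting the script.

# Usage: _sanitize_int <variable-name> <default>
_sanitize_int() {
  local name="$1" default="$2"
  local value="${!name-}"
  # Plain decimal without leading zeros, so "08"/"09" are never parsed as octal later.
  local re='^(0|[1-9][0-9]*)$'
  if [[ ! "$value" =~ $re ]]; then
    echo "WARN: $name='$value' is not a valid integer; using $default" >&2
    printf -v "$name" '%s' "$default"
  fi
}

CHILD_RSS_LIMIT_KB="${CHILD_RSS_LIMIT_KB:-2097152}"
CHILD_RUNTIME_LIMIT="${CHILD_RUNTIME_LIMIT:-600}"
_sanitize_int CHILD_RSS_LIMIT_KB 2097152
_sanitize_int CHILD_RUNTIME_LIMIT 600
```

Falling back instead of exiting suits a helper run from cron, where a hard exit would silently disable the guard; exiting is the safer choice when a misconfiguration should be noticed immediately.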
        fi

        printf "%-8s %-6s %-10s %-12s %-8s %s\n" "$pid" "${rss_mb}MB" "$etime" "$comm" "$status" "$detail"
    done < <(ps axo pid,rss,etime,comm 2>/dev/null | grep -E 'opencode|shellcheck|node.*opencode' | grep -v grep || true)
The ps command uses the comm field, which provides only the command name (e.g., node). However, your grep pattern node.*opencode suggests an intent to match against the full command line. With comm, this grep will likely fail to find the intended node processes. Consider using the command field instead of comm to match against the full command line. Note that this change will require adjusting the line parsing logic in the while loop, as the command output can contain spaces.
Suggested change:
    -done < <(ps axo pid,rss,etime,comm 2>/dev/null | grep -E 'opencode|shellcheck|node.*opencode' | grep -v grep || true)
    +done < <(ps axo pid,rss,etime,command 2>/dev/null | grep -E 'opencode|shellcheck|node.*opencode' | grep -v grep || true)
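The distinction matters because `comm` holds only the executable name (for a Node.js process, `node` or its path), so only the full argument list can satisfy `node.*opencode`. A small sketch of matching the full command line in Bash itself, which also makes `grep -v grep` unnecessary because no grep process is spawned; `is_watched_command` and `WATCH_RE` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: match against the full command line (ps "args"/"command"),
# not the executable name (ps "comm"), so wrapped invocations like "node .../opencode" match.
# Mirrors the grep -E pattern from the review; the node.*opencode alternative is
# already covered by the plain "opencode" alternative and is kept only for parity.
WATCH_RE='opencode|shellcheck|node.*opencode'

# Usage: is_watched_command "<full command line>"; returns 0 on a match.
is_watched_command() {
  [[ "$1" =~ $WATCH_RE ]]
}

# Example source of full command lines (POSIX ps: -A all processes, "=" drops the header):
#   ps -A -o pid=,rss=,etime=,args=
```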
    SESSION_COUNT_WARN="${SESSION_COUNT_WARN:-5}"
    LOGFILE="${HOME}/.aidevops/logs/process-guard.log"

    mkdir -p "$(dirname "$LOGFILE")" 2>/dev/null || true
Suppressing stderr with 2>/dev/null can hide important errors, such as permission issues when creating the directory. This is contrary to the project's general rules which advise against blanket error suppression. The || true is sufficient to prevent the script from exiting on failure, while still allowing underlying errors to be visible for debugging.
Suggested change:
    -mkdir -p "$(dirname "$LOGFILE")" 2>/dev/null || true
    +mkdir -p "$(dirname "$LOGFILE")" || true
References
- Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
- In shell scripts with 'set -e' enabled, use '|| true' to prevent the script from exiting when a command like 'jq' fails on an optional lookup. Do not suppress stderr with '2>/dev/null' so that actual syntax or system errors remain visible for debugging.
    pid=$(echo "$line" | awk '{print $1}')
    rss=$(echo "$line" | awk '{print $2}')
    etime=$(echo "$line" | awk '{print $3}')
    comm=$(echo "$line" | awk '{print $4}')
Using multiple awk and echo commands inside this while loop is inefficient, as it spawns several new processes for each line of input. You can parse the fields more efficiently using the read built-in. This same pattern appears in cmd_kill_runaways and cmd_status and should be updated there as well.
Suggested change:
    -pid=$(echo "$line" | awk '{print $1}')
    -rss=$(echo "$line" | awk '{print $2}')
    -etime=$(echo "$line" | awk '{print $3}')
    -comm=$(echo "$line" | awk '{print $4}')
    +read -r pid rss etime comm _ <<<"$line"
References
- When parsing multiple key-value pairs from a single source in shell scripts, use a single 'while IFS='=' read -r key value' loop instead of repeated 'grep | cut' pipelines to improve efficiency and readability.
- In shell scripts, use pure Bash parameter expansion for string manipulations like calculating indentation within loops to avoid spawning subshells and improve performance.
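A sketch of the `read`-based parsing suggested above: one builtin call splits the line on whitespace (leading spaces are dropped), and the trailing `_` absorbs any extra columns. `parse_ps_line`, the upper-case result variables, and the sample line are illustrative assumptions. When the last column is the full command line (`args`/`command`), drop `_` so the final variable receives the rest of the line, spaces included.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: split one ps output line with the read builtin (no subshells).
# Assumed columns: pid rss etime comm, followed by anything else (absorbed by _).
parse_ps_line() {
  read -r PID RSS ETIME COMM _ <<<"$1"
}

parse_ps_line "  4242 1048576 01:02:03 shellcheck --external-sources"
printf '%s|%s|%s|%s\n' "$PID" "$RSS" "$ETIME" "$COMM"
# prints: 4242|1048576|01:02:03|shellcheck
```

Four `$(echo "$line" | awk ...)` substitutions fork several processes per input line; `read` forks none, which adds up when the loop runs every 60 seconds over every matching process.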
.agents/scripts/pulse-wrapper.sh (outdated)
    pid=$(echo "$line" | awk '{print $1}')
    rss=$(echo "$line" | awk '{print $3}')
    etime=$(echo "$line" | awk '{print $4}')
    comm=$(echo "$line" | awk '{print $5}')
This loop spawns multiple awk and echo processes for each line, which is inefficient. You can parse the line more efficiently using the read built-in.
Suggested change:
    -pid=$(echo "$line" | awk '{print $1}')
    -rss=$(echo "$line" | awk '{print $3}')
    -etime=$(echo "$line" | awk '{print $4}')
    -comm=$(echo "$line" | awk '{print $5}')
    +read -r pid ppid rss etime comm _ <<<"$line"
References
- When parsing multiple key-value pairs from a single source in shell scripts, use a single 'while IFS='=' read -r key value' loop instead of repeated 'grep | cut' pipelines to improve efficiency and readability.
- In shell scripts, use pure Bash parameter expansion for string manipulations like calculating indentation within loops to avoid spawning subshells and improve performance.
.agents/scripts/pulse-wrapper.sh (outdated)
     local result
    -result=$(shellcheck -f gcc "$shfile" 2>/dev/null || true)
    +# t1398: timeout each shellcheck invocation to prevent exponential expansion
    +result=$($sc_timeout_cmd shellcheck -f gcc "$shfile" 2>/dev/null || true)
Suppressing stderr with 2>/dev/null can hide important errors, such as issues with the timeout command itself or from shellcheck if it fails for reasons other than finding lint issues (e.g., invalid arguments). The || true already prevents the script from exiting on failure. It's better to allow potential errors to be visible for debugging, in line with project guidelines.
Suggested change:
    -result=$($sc_timeout_cmd shellcheck -f gcc "$shfile" 2>/dev/null || true)
    +result=$($sc_timeout_cmd shellcheck -f gcc "$shfile" || true)
References
- Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
- In shell scripts with 'set -e' enabled, use '|| true' to prevent the script from exiting when a command like 'jq' fails on an optional lookup. Do not suppress stderr with '2>/dev/null' so that actual syntax or system errors remain visible for debugging.
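Dropping `2>/dev/null` keeps ShellCheck's own errors visible, but the caller may still want to tell findings apart from a timeout. A hedged sketch: it assumes GNU coreutils `timeout` (installed as `gtimeout` via Homebrew on macOS), which exits with status 124 when the limit is hit, and ShellCheck's convention of exiting with 1 when it reports findings. `lint_file` and `classify_sc_exit` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: run ShellCheck per file under a timeout and classify the outcome.

# Map an exit status to a label.
classify_sc_exit() {
  case "$1" in
    0) echo clean ;;     # no findings
    1) echo findings ;;  # ShellCheck reported issues
    124) echo timeout ;; # timeout/gtimeout stopped the run
    *) echo error ;;     # anything else: bad options, unreadable file, ...
  esac
}

# Usage: lint_file <timeout|gtimeout> <file>
# Prints the gcc-format report (if any), then "<file>: <label>".
# stderr is deliberately not redirected so real errors stay visible.
lint_file() {
  local timeout_cmd="$1" file="$2" rc=0 result
  result=$("$timeout_cmd" 30 shellcheck -f gcc "$file") || rc=$?
  [[ -n "$result" ]] && printf '%s\n' "$result"
  printf '%s: %s\n' "$file" "$(classify_sc_exit "$rc")"
}
```

`|| rc=$?` plays the same role as `|| true` under `set -e`, but keeps the status instead of discarding it.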
…e subshells (t1398)
- Security: add _validate_int() to process-guard-helper.sh to prevent command injection via arithmetic expansion of env vars (CHILD_RSS_LIMIT_KB, etc.). Mirrors the pattern already used in pulse-wrapper.sh.
- Bug: change the ps 'comm' field to 'command' so the grep pattern 'node.*opencode' can match the full command line instead of just the executable name.
- Efficiency: replace awk/echo subshell spawning in while loops with the read builtin for field parsing in both process-guard-helper.sh and pulse-wrapper.sh.
- Style: remove 2>/dev/null from the mkdir -p and shellcheck invocations; the || true is sufficient to prevent set -e exits while keeping errors visible.
🔍 Code Quality Report: [MONITOR] Code Review Monitoring Report; [INFO] Latest Quality Status; [INFO] Recent monitoring activity; 📈 Current Quality Metrics
Generated on: Thu Mar 5 01:51:14 UTC 2026 by AI DevOps Framework Code Review Monitoring
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/scripts/linters-local.sh:
- Around line 349-351: The fallback branch currently runs shellcheck without a
timeout (the line assigning file_result from shellcheck -x -P SCRIPTDIR ...),
risking long-running expansions; replace that direct call by invoking a small
portable timeout wrapper (e.g., start shellcheck in the background, capture its
PID, launch a watcher that sleeps a configured TIMEOUT and kills the PID if
still running) and assign its stdout/stderr to file_result the same way; ensure
the wrapper is used only when neither timeout nor gtimeout is available and
references the same shellcheck invocation (the shellcheck -x -P SCRIPTDIR
--severity=warning --format=gcc "$file") and preserves exit handling (|| true).
In @.agents/scripts/process-guard-helper.sh:
- Line 153: The ps invocation uses the comm field so the 'node.*opencode'
pattern never matches; update the three occurrences (the ps calls at the shown
snippet and the other two similar sites) to use the full command/args column
(e.g., replace "ps axo pid,rss,etime,comm" with "ps axo pid,rss,etime,args" or
"ps -eo pid,rss,etime,cmd" depending on portability) so grep
'opencode|shellcheck|node.*opencode' can match node processes wrapping opencode;
keep the same grep logic and ensure all three occurrences are changed
consistently.
- Around line 41-45: The env-driven numeric limits (CHILD_RSS_LIMIT_KB,
CHILD_RUNTIME_LIMIT, SHELLCHECK_RSS_LIMIT_KB, SHELLCHECK_RUNTIME_LIMIT,
SESSION_COUNT_WARN) are used directly and may contain non-numeric values; add a
small sanitizer that validates each value is an integer (e.g. regex check like
^[0-9]+$ or use arithmetic-safe parsing) and if invalid or empty, resets the
variable to its default fallback; implement a helper function (e.g.
sanitize_numeric_env) and call it for each of the five variables before any
arithmetic/comparisons to ensure safe numeric usage and avoid set -e failures.
- Around line 183-223: The process-guard currently doesn't check whether a
process is attached to a TTY and so may kill interactive user sessions; modify
the loop that reads ps output (where pid, rss, comm are parsed and
_get_process_age is called) to also capture the TTY column (add tty to the ps
axo fields), and skip any PID whose TTY is not "?" (or matches pts/*/tty
devices) before applying CHILD_RSS_LIMIT_KB or CHILD_RUNTIME_LIMIT (and the
SHELLCHECK_* overrides); only evaluate and potentially kill processes with no
controlling terminal. Ensure you reference the same variables (pid, comm, rss,
age_seconds, violation) and leave killing logic unchanged for non-interactive
processes.
In @.agents/scripts/pulse-wrapper.sh:
- Around line 899-914: The runtime-kill branch currently applies
CHILD_RUNTIME_LIMIT to all processes (except shellcheck) causing the primary
pulse process to be killed around 10 minutes; update the logic to skip applying
CHILD_RUNTIME_LIMIT for the main pulse process by checking the process id (pid)
against the primary pulse PID variable (PULSE_PID) and instead allow it to be
governed by PULSE_STALE_THRESHOLD (or skip runtime violation for pid ==
PULSE_PID), i.e., add a condition around the age_seconds > runtime_limit check
that ignores the CHILD_RUNTIME_LIMIT when pid == "$PULSE_PID" (while keeping the
existing comm, rss, rss_limit, SHELLCHECK_* handling intact).
- Around line 1847-1859: The loop unconditionally runs shellcheck with
"$sc_timeout_cmd" which is empty when neither timeout nor gtimeout exist,
leaving an unbounded execution; update the while-loop around the shellcheck
invocation in the same block (references: sc_timeout_cmd, result, the while IFS=
read -r shfile loop and the shellcheck -f gcc invocation) to check if
sc_timeout_cmd is non-empty before executing shellcheck and otherwise skip or
mark the file as unchecked (e.g., log a warning and continue), so that
shellcheck is never invoked without a timeout tool present.
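The portable fallback requested for linters-local.sh (background process plus watcher) can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `run_with_deadline` is a hypothetical name, it polls once per second, it signals only the direct child (the follow-up commit at the end of this thread switches to a process-group kill, `kill -- -PID`), and it reports a watcher kill as 124 to mimic `timeout`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: bound a command's runtime when neither timeout nor gtimeout exists.
# Usage: run_with_deadline <seconds> <command> [args...]
# Returns the command's own exit status, or 124 if the watcher had to terminate it.
run_with_deadline() {
  local secs="$1"
  shift
  local cmd_pid watcher_pid rc=0 i

  "$@" &
  cmd_pid=$!

  # Watcher: poll once per second and stop early once the command has exited.
  # Its stdout goes to /dev/null so it cannot hold a $(...) capture pipe open.
  (
    for ((i = 0; i < secs; i++)); do
      sleep 1
      kill -0 "$cmd_pid" 2>/dev/null || exit 0
    done
    kill -TERM "$cmd_pid" 2>/dev/null
  ) >/dev/null &
  watcher_pid=$!

  wait "$cmd_pid" || rc=$?

  # The watcher may already have exited; "no such process" is expected here, so it is silenced.
  kill "$watcher_pid" 2>/dev/null || true
  wait "$watcher_pid" 2>/dev/null || true

  # 143 = 128 + SIGTERM; report it as 124, the status timeout(1) uses.
  if ((rc == 143)); then
    rc=124
  fi
  return "$rc"
}
```

In the fallback branch the call would look like `file_result=$(run_with_deadline 30 shellcheck -x -P SCRIPTDIR --severity=warning --format=gcc "$file") || true`. Polling in one-second steps means that when the watcher is killed early, at most a one-second `sleep` is left behind rather than one lasting the whole deadline.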
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 67c1ab7d-28aa-4716-98ad-018b7c5be5f3
📒 Files selected for processing (3)
.agents/scripts/linters-local.sh
.agents/scripts/process-guard-helper.sh
.agents/scripts/pulse-wrapper.sh
Worker PID 16612 killed by supervisor pulse: struggle_ratio 10620 (10620 messages, 0 commits) after 3h27m. The worker was thrashing with no output. Re-dispatching with a fresh worker to address the CodeRabbit CHANGES_REQUESTED feedback. Issue #2854 reset to available for re-dispatch.
Worker PID 41971 killed by supervisor pulse: struggle_ratio 10649 (10649 messages, 0 commits, 4h13m runtime). This indicates the worker was thrashing without making progress. Re-dispatching with a fresh worker on the same branch. Resolves #2854.
…idation (t1398)
- linters-local.sh: portable background+watcher timeout for the shellcheck fallback when neither timeout nor gtimeout is available
- process-guard-helper.sh: add the tty field to ps and exclude TTY-attached (interactive) processes from the kill logic; _validate_int() sanitizes all numeric env vars before arithmetic expansion
- pulse-wrapper.sh: skip CHILD_RUNTIME_LIMIT for the primary pulse PID (governed by PULSE_STALE_THRESHOLD); skip shellcheck entirely when no timeout utility is available, to prevent runaway expansion
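The two exemptions in this commit reduce to two small predicates. A sketch under assumptions: `is_headless_tty` and `runtime_limit_applies` are hypothetical names, the TTY value is taken from the `tty` column of `ps` (where processes without a controlling terminal show `?` on Linux and `??` on macOS), and `PULSE_PID` is assumed to hold the primary pulse process ID.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the two exemptions described in this commit.

# Returns 0 when the ps "tty" column shows no controlling terminal:
# "?" on Linux, "??" on macOS. Only such (headless) processes are eligible to be killed.
# The right-hand sides are quoted so "?" is compared literally, not as a glob.
is_headless_tty() {
  [[ "$1" == "?" || "$1" == "??" ]]
}

# Returns 0 when CHILD_RUNTIME_LIMIT should be checked for this PID.
# The primary pulse process is exempt (PULSE_STALE_THRESHOLD governs it instead);
# RSS limits still apply to it.
runtime_limit_applies() {
  [[ -z "${PULSE_PID:-}" || "$1" != "$PULSE_PID" ]]
}
```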
🔍 Code Quality Report: [MONITOR] Code Review Monitoring Report; [INFO] Latest Quality Status; [INFO] Recent monitoring activity; 📈 Current Quality Metrics
Generated on: Thu Mar 5 02:11:53 UTC 2026 by AI DevOps Framework Code Review Monitoring
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/scripts/process-guard-helper.sh:
- Around line 196-200: The session counting currently excludes only macOS '??'
TTYs and still counts Linux headless '?' TTYs; update the exclusion to filter
both single '?' and '??' TTY entries by replacing the grep -v '??' usage with a
regex-based exclusion such as grep -v -E '(^|[[:space:]])\?{1,2}([[:space:]]|$)'
wherever session_count is computed (reference the session_count assignment) and
apply the same change in the cmd_sessions() and cmd_status() code paths (and the
other occurrence noted around line 335) so non-interactive '?' and '??' entries
are correctly ignored.
In @.agents/scripts/pulse-wrapper.sh:
- Around line 953-957: The check_session_count() pipeline misclassifies headless
Linux processes because it only excludes the macOS '??' tty; update the pipeline
that computes interactive_count to exclude both '?' and '??' ttys (or better,
filter by the TTY column using awk to only count rows where the TTY is not "?"
and not "??") so headless processes aren't counted as interactive; apply this
change to the interactive_count computation that currently uses ps axo
pid,tty,command and the grep chain.
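The awk-based filter suggested above can be sketched as follows, assuming input from `ps -A -o pid=,tty=,args=` (PID, TTY, full command line). `count_interactive` is a hypothetical name, and counting only `opencode` lines is an assumption about what a "session" is. The `[o]pencode` bracket trick stops awk's own command line, which contains the pattern text, from matching itself.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: count interactive opencode sessions from ps output.
# Expected input columns: pid tty args...; headless rows have tty "?" (Linux) or "??" (macOS).
count_interactive() {
  awk '$2 != "?" && $2 != "??" && /[o]pencode/ { n++ } END { print n + 0 }'
}

# Typical use (not run here): ps -A -o pid=,tty=,args= | count_interactive
```

A single awk process replaces the `grep | grep -v | wc -l` chain, which is also what ShellCheck's SC2126 nudges toward.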
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c2eab22e-6741-4468-81eb-5418d0d4a017
📒 Files selected for processing (3)
.agents/scripts/linters-local.sh
.agents/scripts/process-guard-helper.sh
.agents/scripts/pulse-wrapper.sh
- Replace grep | wc -l session counting with awk that excludes both '?' (Linux headless) and '??' (macOS headless) TTYs, fixing the misclassification of headless processes as interactive sessions on Linux
- Eliminates SC2126 (grep | wc -l) violations flagged by CodeFactor
- Applied consistently across all 4 session counting locations: process-guard-helper.sh (cmd_scan, cmd_sessions, cmd_status) and pulse-wrapper.sh (check_session_count)
Addresses the second CodeRabbit review and the CodeFactor CI failure.
🔍 Code Quality Report: [MONITOR] Code Review Monitoring Report; [INFO] Latest Quality Status; [INFO] Recent monitoring activity; 📈 Current Quality Metrics
Generated on: Thu Mar 5 02:25:35 UTC 2026 by AI DevOps Framework Code Review Monitoring
🔍 Code Quality Report: [MONITOR] Code Review Monitoring Report; [INFO] Latest Quality Status; [INFO] Recent monitoring activity; 📈 Current Quality Metrics
Generated on: Thu Mar 5 02:27:55 UTC 2026 by AI DevOps Framework Code Review Monitoring
Worker PID 92634 killed by supervisor pulse: thrashing detected (struggle_ratio: 23450, 7h19m uptime, 0 commits). Re-dispatching Issue #2854 for a fresh attempt.
@coderabbitai review
All 8 findings from both review rounds have been addressed:
Please re-review to verify and clear the CHANGES_REQUESTED status.
🔍 Code Quality Report: [MONITOR] Code Review Monitoring Report; [INFO] Latest Quality Status; [INFO] Recent monitoring activity; 📈 Current Quality Metrics
Generated on: Thu Mar 5 02:49:20 UTC 2026 by AI DevOps Framework Code Review Monitoring
✅ Actions performed: Full review triggered.
@coderabbitai review
All findings from both review rounds have been addressed:
Round 1 (6 actionable comments):
Round 2 (2 actionable comments):
Additional improvements (Gemini Code Assist):
Verification:
✅ Actions performed: Full review triggered.
…ell reduction (t1398)
- Fix session counting to filter both '?' (Linux) and '??' (macOS) headless TTY entries using awk instead of grep -v '??' (4 locations across process-guard-helper.sh and pulse-wrapper.sh check_session_count)
- Remove blanket 2>/dev/null on ps commands in process-guard while loops and pulse-wrapper guard_child_processes; keep || true for error handling
- Replace the echo|tr|wc subshell pipeline with parameter expansion for colon counting in _get_process_age (zero subshell overhead)
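The zero-subshell colon counting mentioned in the last item can be sketched as follows. `etime_to_seconds` is a hypothetical stand-in for `_get_process_age`. It assumes well-formed input in the `[[dd-]hh:]mm:ss` format that `ps -o etime` produces, and uses `10#` so fields such as `08` are not treated as octal.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: convert ps "etime" ([[dd-]hh:]mm:ss) to seconds without subshells.
etime_to_seconds() {
  local etime="$1" days=0 rest h=0 m=0 s=0 colons
  if [[ "$etime" == *-* ]]; then
    days="${etime%%-*}" # text before the dash
    rest="${etime#*-}"  # text after the dash
  else
    rest="$etime"
  fi
  # Count colons by deleting every other character (instead of echo | tr | wc).
  colons="${rest//[^:]/}"
  case "${#colons}" in
    2) IFS=: read -r h m s <<<"$rest" ;;
    1) IFS=: read -r m s <<<"$rest" ;;
    *) s="$rest" ;;
  esac
  # 10# forces base 10 so fields such as "08" are not parsed as octal.
  echo $((10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s))
}
```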
@coderabbitai review
✅ Actions performed: Review triggered.
🔍 Code Quality Report: [MONITOR] Code Review Monitoring Report; [INFO] Latest Quality Status; [INFO] Recent monitoring activity; 📈 Current Quality Metrics
Generated on: Thu Mar 5 02:54:15 UTC 2026 by AI DevOps Framework Code Review Monitoring
@coderabbitai resolve
✅ Actions performed: Comments resolved and changes approved.
…<, remove 2>/dev/null (t1398)
pulse-wrapper.sh:
- guard_child_processes: use the 'command' field instead of 'comm' for full command line matching (the node.*opencode pattern now works)
- Extract cmd_base from the full command path for limit selection
- Replace echo|grep -c and echo|head with <<<$result herestrings in the quality sweep loop (avoids spawning subshells per iteration)
- Remove blanket 2>/dev/null from ps axo commands in prefetch_active_workers and _update_health_issue_for_repo (keep || true for pipeline failure tolerance)
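Two of the techniques in this commit, sketched in isolation. `cmd_base_of` and `count_findings` are hypothetical names. The first assumes the executable is the first space-separated word of the `command` column (an executable path containing spaces would break it). The second assumes ShellCheck's gcc output format, which prints one finding per line.

```bash
#!/usr/bin/env bash
# Hypothetical sketch 1: reduce a full command line to its executable's base name
# (e.g. "/usr/local/bin/shellcheck -x a.sh" -> "shellcheck") using parameter expansion only.
cmd_base_of() {
  local first="${1%% *}"        # drop everything after the first space
  printf '%s\n' "${first##*/}"  # drop the directory part
}

# Hypothetical sketch 2: count gcc-format findings with a here-string instead of echo | grep -c.
# grep -c prints 0 but exits 1 when nothing matches, hence || true under set -e.
count_findings() {
  local result="$1" n
  n=$(grep -c . <<<"$result" || true)
  printf '%s\n' "$n"
}
```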
🔍 Code Quality Report: [MONITOR] Code Review Monitoring Report; [INFO] Latest Quality Status; [INFO] Recent monitoring activity; 📈 Current Quality Metrics
Generated on: Thu Mar 5 02:59:19 UTC 2026 by AI DevOps Framework Code Review Monitoring
…arning
When no timeout/gtimeout utility is available, the portable fallback wrapper now:
1. Prints a one-time warning that degraded protection is active
2. Uses a process group kill (kill -- -PID) to clean up ShellCheck and all its child processes, falling back to a single-process kill if the subshell isn't a process group leader
3. Simplifies the per-file timeout warning (the redundant 'no timeout utility' detail is already covered by the one-time warning)
Addresses CodeRabbit review feedback from PR #2855 about the fallback path reopening the runaway ShellCheck risk. Closes #2900
…arning (#2923): same commit message as above (Closes #2900).



Summary
Root-cause fix for the March 3 kernel panic and ongoing memory pressure caused by aidevops itself. Replaces the concept from PR #2792 (external contributor), which monitored the wrong signals.
- pulse-wrapper.sh: guard_child_processes() runs every 60s in the watchdog loop, killing any child process exceeding configurable RSS (2GB) or runtime (10min) limits. ShellCheck gets stricter limits (1GB/5min) due to known exponential expansion
- linters-local.sh now runs per-file with a 30s timeout instead of batch mode, preventing a single file from consuming 5+ GB RAM. The quality sweep in pulse-wrapper.sh also gets a per-file timeout
- check_session_count() warns when more than 5 concurrent interactive sessions are open, displayed in the health issue dashboard
- process-guard-helper.sh: scan, kill-runaways, sessions, and status commands for manual/cron use; monitors individual RSS, runtime, and process count (the right signals)
Acceptance Criteria
- … pulse-wrapper.sh that kills child processes exceeding RSS/runtime limits
- … process-guard-helper.sh) monitoring the right signals
Verification
- bash -n syntax check passes on all 3 modified/new files
- shellcheck --severity=warning passes on all files (without -x, which itself causes the exponential expansion)
- process-guard-helper.sh help/status/sessions/scan all produce correct output
- … pulse-wrapper.sh is sourced
Closes #2854