Skip to content

t1398: Fix runaway memory consumption — process guards, ShellCheck limits, session awareness#2855

Merged
marcusquinn merged 8 commits intomainfrom
bugfix/t1398-memory-guards
Mar 5, 2026
Merged

t1398: Fix runaway memory consumption — process guards, ShellCheck limits, session awareness#2855
marcusquinn merged 8 commits intomainfrom
bugfix/t1398-memory-guards

Conversation

@alex-solovyev
Copy link
Collaborator

@alex-solovyev alex-solovyev commented Mar 5, 2026

Summary

Root-cause fix for the March 3 kernel panic and ongoing memory pressure caused by aidevops itself. Replaces the concept from PR #2792 (external contributor) which monitored the wrong signals.

  • Process guard in pulse-wrapper.sh: guard_child_processes() runs every 60s in the watchdog loop, killing any child process exceeding configurable RSS (2GB) or runtime (10min) limits. ShellCheck gets stricter limits (1GB/5min) due to known exponential expansion
  • ShellCheck hardened: linters-local.sh now runs per-file with 30s timeout instead of batch mode, preventing a single file from consuming 5+ GB RAM. Quality sweep in pulse-wrapper.sh also gets per-file timeout
  • Session count awareness: check_session_count() warns when >5 concurrent interactive sessions are open, displayed in the health issue dashboard
  • Standalone process-guard-helper.sh: scan, kill-runaways, sessions, and status commands for manual/cron use — monitors individual RSS, runtime, and process count (the right signals)

Acceptance Criteria

  • Process guard function in pulse-wrapper.sh that kills child processes exceeding RSS/runtime limits
  • ShellCheck invocation hardened against exponential expansion (per-file timeout)
  • Zombie pulse self-watchdog enhanced with process guard in watchdog loop
  • Session count awareness (warning when >5 concurrent sessions detected)
  • Memory pressure monitor script (process-guard-helper.sh) monitoring the right signals

Verification

  • bash -n syntax check passes on all 3 modified/new files
  • shellcheck --severity=warning passes on all files (without -x which itself causes the exponential expansion)
  • process-guard-helper.sh help/status/sessions/scan all produce correct output
  • Functions properly exported when pulse-wrapper.sh is sourced

Closes #2854

Summary by CodeRabbit

  • Improvements

    • Per-file linting with timeouts to avoid hangs and surface partial results.
    • Automated process-guarding to limit runaway memory/runtime and reduce system overload.
    • Health/dashboard now shows interactive session counts with threshold warnings.
  • New Features

    • Watchdog integration to enforce child-process limits and optionally skip linting when timeout tooling is unavailable.

…398)

Root-cause fix for runaway memory consumption (March 3 kernel panic):

1. Process guard in pulse-wrapper.sh: guard_child_processes() runs every
   60s in the watchdog loop, killing any child process exceeding
   configurable RSS (2GB default) or runtime (10min default) limits.
   ShellCheck gets stricter limits (1GB/5min) due to known exponential
   expansion with --external-sources.

2. ShellCheck hardened: linters-local.sh now runs per-file with timeout
   (30s each) instead of batch mode. Prevents a single file with
   recursive source directives from consuming 5+ GB RAM. Quality sweep
   in pulse-wrapper.sh also gets per-file timeout.

3. Session count awareness: check_session_count() warns when >5
   concurrent interactive sessions are open. Count displayed in the
   health issue dashboard with threshold warning.

4. Standalone process-guard-helper.sh: scan, kill-runaways, sessions,
   and status commands for manual/cron use. Monitors the right signals
   (individual RSS, runtime, process count) — replaces PR #2792 concept.

Closes #2854
@github-actions github-actions bot added the bug Auto-created from TODO.md tag label Mar 5, 2026
@gemini-code-assist
Copy link

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical memory consumption issues and kernel panics by implementing robust process monitoring and resource management. It introduces proactive measures to identify and terminate runaway processes, particularly those with high memory usage or long execution times, and enhances the stability of the system by preventing resource exhaustion. The changes also improve visibility into system health by tracking interactive session counts.

Highlights

  • Process Guard Implementation: Introduced a process guard in pulse-wrapper.sh that runs every 60 seconds. It monitors child processes, killing any that exceed configurable RSS (2GB default) or runtime (10min default) limits. ShellCheck processes are given stricter limits (1GB/5min) due to their potential for exponential expansion.
  • ShellCheck Hardening: Modified linters-local.sh to run ShellCheck in a per-file mode with a 30-second timeout, preventing a single problematic file from consuming excessive resources. The quality sweep in pulse-wrapper.sh also received per-file timeouts for ShellCheck.
  • Session Count Awareness: Added a mechanism to pulse-wrapper.sh to check and warn if more than 5 concurrent interactive sessions are open, displaying this information in the health issue dashboard.
  • Standalone Process Guard Helper: Created a new script, process-guard-helper.sh, which provides standalone commands for scanning processes, killing runaways, reporting session counts, and providing a full status report in JSON format. This script can be used manually or via cron.
Changelog
  • .agents/scripts/linters-local.sh
    • Refactored ShellCheck execution from batch mode to per-file mode.
    • Implemented a 30-second timeout for each ShellCheck invocation to prevent resource exhaustion.
    • Added logic to detect and report files that cause ShellCheck to time out.
  • .agents/scripts/process-guard-helper.sh
    • Added a new script to provide standalone process monitoring and control functionalities.
    • Included functions to get process age, scan processes for violations, kill runaway processes, report interactive session counts, and provide a JSON status report.
    • Defined configurable limits for child process RSS and runtime, with specific overrides for ShellCheck.
  • .agents/scripts/pulse-wrapper.sh
    • Added new configuration variables for process guard limits (RSS, runtime) and session count warnings.
    • Implemented guard_child_processes function to monitor and kill child processes exceeding defined resource limits.
    • Implemented check_session_count function to count interactive sessions and log warnings.
    • Integrated guard_child_processes into the main watchdog loop, executing every 60 seconds.
    • Updated the health dashboard to include interactive session count and warnings.
    • Added a 30-second timeout to ShellCheck invocations within the quality sweep function.
    • Ensured new configuration variables are validated.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 5, 2026

Warning

Rate limit exceeded

@alex-solovyev has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 5 minutes and 47 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ca1dd81f-91a0-4d72-a8c3-03e1405be4c4

📥 Commits

Reviewing files that changed from the base of the PR and between 959e12d and 28b9e02.

📒 Files selected for processing (3)
  • .agents/scripts/linters-local.sh
  • .agents/scripts/process-guard-helper.sh
  • .agents/scripts/pulse-wrapper.sh

Walkthrough

Adds a standalone process-guard utility and integrates child-process/resource guards and session-count checks into the pulse watchdog; per-file ShellCheck runs are capped with a timeout and results aggregated, with kill/runaway logic for processes exceeding RSS or runtime limits.

Changes

Cohort / File(s) Summary
Process Guard Utility
.agents/scripts/process-guard-helper.sh
New helper script providing cmd_scan(), cmd_kill_runaways(), cmd_sessions(), cmd_status(), and cmd_help(); configurable RSS/runtime/session thresholds, cross-platform process-age calculation, JSON status output, and logging of kill events.
Pulse Watchdog Integration
.agents/scripts/pulse-wrapper.sh
Adds guard_child_processes() and check_session_count(); calls guard in watchdog loop and startup; integrates session count into health dashboard; selects timeout/gtimeout and wraps shellcheck calls when available; numeric validation for new config vars.
ShellCheck Hardening Script
.agents/scripts/linters-local.sh
Switches from a single batch ShellCheck run to per-file checks guarded by a timeout (default 30s); detects and reports per-file timeouts/killed processes, caps/aggregates findings, and adjusts pass/fail messaging.

Sequence Diagram

sequenceDiagram
    autonumber
    participant Watchdog as "Pulse Watchdog"
    participant GuardHelper as "process-guard-helper.sh"
    participant GuardFn as "guard_child_processes()"
    participant Child as "Child Processes\n(shellcheck, node, opencode)"
    participant Dashboard as "Health Dashboard"

    Watchdog->>GuardFn: invoke (periodic / startup)
    GuardFn->>GuardHelper: cmd_scan()
    GuardHelper->>Child: query RSS & runtime
    Child-->>GuardHelper: return metrics
    GuardHelper->>GuardHelper: evaluate limits (RSS, runtime)
    alt violation detected
        GuardHelper->>Child: kill violating PID(s)
        GuardHelper->>GuardHelper: log event
    end
    GuardHelper-->>GuardFn: scan results
    GuardFn->>Dashboard: update Interactive Sessions & guard status
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🛡️ ShellCheck trimmed, the watchdog wakes,
Tiny guards deployed for runaway fates,
Sessions counted, limits set in rows,
Memory tamed where wild usage grows,
Logs keep watch while calm repose.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: fixing runaway memory consumption through three complementary fixes (process guards, ShellCheck limits, session awareness).
Linked Issues check ✅ Passed All four coding requirements from #2854 are fully addressed: process guard kills child processes exceeding RSS/runtime [✓], ShellCheck hardened with per-file timeouts [✓], session count awareness with warnings [✓], memory pressure monitor script provided [✓].
Out of Scope Changes check ✅ Passed All changes are directly scoped to #2854: guard_child_processes() and check_session_count() in pulse-wrapper.sh, hardened linters-local.sh with timeouts, new process-guard-helper.sh monitor, and safety/portability fixes supporting the main objectives.
Docstring Coverage ✅ Passed Docstring coverage is 93.75% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch bugfix/t1398-memory-guards

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link

github-actions bot commented Mar 5, 2026

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 107 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Thu Mar 5 01:42:28 UTC 2026: Code review monitoring started
Thu Mar 5 01:42:28 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 107

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 107
  • VULNERABILITIES: 0

Generated on: Thu Mar 5 01:42:31 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces robust measures to prevent runaway memory consumption by implementing process guards, hardening ShellCheck invocations with timeouts, and adding awareness for high session counts. However, a critical command injection vulnerability was identified in the new process-guard-helper.sh script due to the lack of validation for environment variables used in Bash arithmetic contexts; it is recommended to apply the same level of protection as in pulse-wrapper.sh. Additionally, the review also focuses on improving script efficiency, ensuring adherence to project error handling guidelines, and fixing a potential bug in process discovery.

Comment on lines +41 to +45
CHILD_RSS_LIMIT_KB="${CHILD_RSS_LIMIT_KB:-2097152}"
CHILD_RUNTIME_LIMIT="${CHILD_RUNTIME_LIMIT:-600}"
SHELLCHECK_RSS_LIMIT_KB="${SHELLCHECK_RSS_LIMIT_KB:-1048576}"
SHELLCHECK_RUNTIME_LIMIT="${SHELLCHECK_RUNTIME_LIMIT:-300}"
SESSION_COUNT_WARN="${SESSION_COUNT_WARN:-5}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The script uses several environment variables (CHILD_RSS_LIMIT_KB, CHILD_RUNTIME_LIMIT, SHELLCHECK_RSS_LIMIT_KB, SHELLCHECK_RUNTIME_LIMIT, SESSION_COUNT_WARN) directly in Bash arithmetic expansion $(( ... )) and arithmetic comparisons [[ ... -gt ... ]] without validation. In Bash, these constructs evaluate the contents of the variables as expressions, which can include command execution via $(...) or `...`. An attacker who can influence these environment variables can achieve arbitrary command execution.

This is a known risk that is correctly mitigated in the sibling script pulse-wrapper.sh using _validate_int, but it was missed here in the new process-guard-helper.sh script.

Suggested change
CHILD_RSS_LIMIT_KB="${CHILD_RSS_LIMIT_KB:-2097152}"
CHILD_RUNTIME_LIMIT="${CHILD_RUNTIME_LIMIT:-600}"
SHELLCHECK_RSS_LIMIT_KB="${SHELLCHECK_RSS_LIMIT_KB:-1048576}"
SHELLCHECK_RUNTIME_LIMIT="${SHELLCHECK_RUNTIME_LIMIT:-300}"
SESSION_COUNT_WARN="${SESSION_COUNT_WARN:-5}"
CHILD_RSS_LIMIT_KB="${CHILD_RSS_LIMIT_KB:-2097152}"
CHILD_RUNTIME_LIMIT="${CHILD_RUNTIME_LIMIT:-600}"
SHELLCHECK_RSS_LIMIT_KB="${SHELLCHECK_RSS_LIMIT_KB:-1048576}"
SHELLCHECK_RUNTIME_LIMIT="${SHELLCHECK_RUNTIME_LIMIT:-300}"
SESSION_COUNT_WARN="${SESSION_COUNT_WARN:-5}"
# Validate numeric configuration to prevent command injection via arithmetic expansion
for var in CHILD_RSS_LIMIT_KB CHILD_RUNTIME_LIMIT SHELLCHECK_RSS_LIMIT_KB SHELLCHECK_RUNTIME_LIMIT SESSION_COUNT_WARN; do
if [[ ! "${!var}" =~ ^[0-9]+$ ]]; then
echo "Error: $var must be a positive integer. Got: ${!var}" >&2
exit 1
fi
done

fi

printf "%-8s %-6s %-10s %-12s %-8s %s\n" "$pid" "${rss_mb}MB" "$etime" "$comm" "$status" "$detail"
done < <(ps axo pid,rss,etime,comm 2>/dev/null | grep -E 'opencode|shellcheck|node.*opencode' | grep -v grep || true)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The ps command uses the comm field, which provides only the command name (e.g., node). However, your grep pattern node.*opencode suggests an intent to match against the full command line. With comm, this grep will likely fail to find the intended node processes. Consider using the command field instead of comm to match against the full command line. Note that this change will require adjusting the line parsing logic in the while loop, as the command output can contain spaces.

Suggested change
done < <(ps axo pid,rss,etime,comm 2>/dev/null | grep -E 'opencode|shellcheck|node.*opencode' | grep -v grep || true)
done < <(ps axo pid,rss,etime,command 2>/dev/null | grep -E 'opencode|shellcheck|node.*opencode' | grep -v grep || true)

SESSION_COUNT_WARN="${SESSION_COUNT_WARN:-5}"
LOGFILE="${HOME}/.aidevops/logs/process-guard.log"

mkdir -p "$(dirname "$LOGFILE")" 2>/dev/null || true

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Suppressing stderr with 2>/dev/null can hide important errors, such as permission issues when creating the directory. This is contrary to the project's general rules which advise against blanket error suppression. The || true is sufficient to prevent the script from exiting on failure, while still allowing underlying errors to be visible for debugging.

Suggested change
mkdir -p "$(dirname "$LOGFILE")" 2>/dev/null || true
mkdir -p "$(dirname "$LOGFILE")" || true
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
  2. In shell scripts with 'set -e' enabled, use '|| true' to prevent the script from exiting when a command like 'jq' fails on an optional lookup. Do not suppress stderr with '2>/dev/null' so that actual syntax or system errors remain visible for debugging.

Comment on lines +118 to +121
pid=$(echo "$line" | awk '{print $1}')
rss=$(echo "$line" | awk '{print $2}')
etime=$(echo "$line" | awk '{print $3}')
comm=$(echo "$line" | awk '{print $4}')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using multiple awk and echo commands inside this while loop is inefficient, as it spawns several new processes for each line of input. You can parse the fields more efficiently using the read built-in. This same pattern appears in cmd_kill_runaways and cmd_status and should be updated there as well.

Suggested change
pid=$(echo "$line" | awk '{print $1}')
rss=$(echo "$line" | awk '{print $2}')
etime=$(echo "$line" | awk '{print $3}')
comm=$(echo "$line" | awk '{print $4}')
read -r pid rss etime comm _ <<<"$line"
References
  1. When parsing multiple key-value pairs from a single source in shell scripts, use a single 'while IFS='=' read -r key value' loop instead of repeated 'grep | cut' pipelines to improve efficiency and readability.
  2. In shell scripts, use pure Bash parameter expansion for string manipulations like calculating indentation within loops to avoid spawning subshells and improve performance.

Comment on lines +887 to +890
pid=$(echo "$line" | awk '{print $1}')
rss=$(echo "$line" | awk '{print $3}')
etime=$(echo "$line" | awk '{print $4}')
comm=$(echo "$line" | awk '{print $5}')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This loop spawns multiple awk and echo processes for each line, which is inefficient. You can parse the line more efficiently using the read built-in.

Suggested change
pid=$(echo "$line" | awk '{print $1}')
rss=$(echo "$line" | awk '{print $3}')
etime=$(echo "$line" | awk '{print $4}')
comm=$(echo "$line" | awk '{print $5}')
read -r pid ppid rss etime comm _ <<<"$line"
References
  1. When parsing multiple key-value pairs from a single source in shell scripts, use a single 'while IFS='=' read -r key value' loop instead of repeated 'grep | cut' pipelines to improve efficiency and readability.
  2. In shell scripts, use pure Bash parameter expansion for string manipulations like calculating indentation within loops to avoid spawning subshells and improve performance.

local result
result=$(shellcheck -f gcc "$shfile" 2>/dev/null || true)
# t1398: timeout each shellcheck invocation to prevent exponential expansion
result=$($sc_timeout_cmd shellcheck -f gcc "$shfile" 2>/dev/null || true)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Suppressing stderr with 2>/dev/null can hide important errors, such as issues with the timeout command itself or from shellcheck if it fails for reasons other than finding lint issues (e.g., invalid arguments). The || true already prevents the script from exiting on failure. It's better to allow potential errors to be visible for debugging, in line with project guidelines.

Suggested change
result=$($sc_timeout_cmd shellcheck -f gcc "$shfile" 2>/dev/null || true)
result=$($sc_timeout_cmd shellcheck -f gcc "$shfile" || true)
References
  1. Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
  2. In shell scripts with 'set -e' enabled, use '|| true' to prevent the script from exiting when a command like 'jq' fails on an optional lookup. Do not suppress stderr with '2>/dev/null' so that actual syntax or system errors remain visible for debugging.

…e subshells (t1398)

Security: add _validate_int() to process-guard-helper.sh to prevent command
injection via arithmetic expansion of env vars (CHILD_RSS_LIMIT_KB, etc.).
Mirrors the same pattern already used in pulse-wrapper.sh.

Bug: change ps 'comm' field to 'command' so grep pattern 'node.*opencode'
can match the full command line instead of just the executable name.

Efficiency: replace awk/echo subshell spawning in while loops with read
builtin for field parsing in both process-guard-helper.sh and pulse-wrapper.sh.

Style: remove 2>/dev/null from mkdir -p and shellcheck invocations — the
|| true is sufficient to prevent set -e exits while keeping errors visible.
@github-actions
Copy link

github-actions bot commented Mar 5, 2026

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 107 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Thu Mar 5 01:51:12 UTC 2026: Code review monitoring started
Thu Mar 5 01:51:12 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 107

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 107
  • VULNERABILITIES: 0

Generated on: Thu Mar 5 01:51:14 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/linters-local.sh:
- Around line 349-351: The fallback branch currently runs shellcheck without a
timeout (the line assigning file_result from shellcheck -x -P SCRIPTDIR ...),
risking long-running expansions; replace that direct call by invoking a small
portable timeout wrapper (e.g., start shellcheck in the background, capture its
PID, launch a watcher that sleeps a configured TIMEOUT and kills the PID if
still running) and assign its stdout/stderr to file_result the same way; ensure
the wrapper is used only when neither timeout nor gtimeout is available and
references the same shellcheck invocation (the shellcheck -x -P SCRIPTDIR
--severity=warning --format=gcc "$file") and preserves exit handling (|| true).

In @.agents/scripts/process-guard-helper.sh:
- Line 153: The ps invocation uses the comm field so the 'node.*opencode'
pattern never matches; update the three occurrences (the ps calls at the shown
snippet and the other two similar sites) to use the full command/args column
(e.g., replace "ps axo pid,rss,etime,comm" with "ps axo pid,rss,etime,args" or
"ps -eo pid,rss,etime,cmd" depending on portability) so grep
'opencode|shellcheck|node.*opencode' can match node processes wrapping opencode;
keep the same grep logic and ensure all three occurrences are changed
consistently.
- Around line 41-45: The env-driven numeric limits (CHILD_RSS_LIMIT_KB,
CHILD_RUNTIME_LIMIT, SHELLCHECK_RSS_LIMIT_KB, SHELLCHECK_RUNTIME_LIMIT,
SESSION_COUNT_WARN) are used directly and may contain non-numeric values; add a
small sanitizer that validates each value is an integer (e.g. regex check like
^[0-9]+$ or use arithmetic-safe parsing) and if invalid or empty, resets the
variable to its default fallback; implement a helper function (e.g.
sanitize_numeric_env) and call it for each of the five variables before any
arithmetic/comparisons to ensure safe numeric usage and avoid set -e failures.
- Around line 183-223: The process-guard currently doesn't check whether a
process is attached to a TTY and so may kill interactive user sessions; modify
the loop that reads ps output (where pid, rss, comm are parsed and
_get_process_age is called) to also capture the TTY column (add tty to the ps
axo fields), and skip any PID whose TTY is not "?" (or matches pts/*/tty
devices) before applying CHILD_RSS_LIMIT_KB or CHILD_RUNTIME_LIMIT (and the
SHELLCHECK_* overrides); only evaluate and potentially kill processes with no
controlling terminal. Ensure you reference the same variables (pid, comm, rss,
age_seconds, violation) and leave killing logic unchanged for non-interactive
processes.

In @.agents/scripts/pulse-wrapper.sh:
- Around line 899-914: The runtime-kill branch currently applies
CHILD_RUNTIME_LIMIT to all processes (except shellcheck) causing the primary
pulse process to be killed around 10 minutes; update the logic to skip applying
CHILD_RUNTIME_LIMIT for the main pulse process by checking the process id (pid)
against the primary pulse PID variable (PULSE_PID) and instead allow it to be
governed by PULSE_STALE_THRESHOLD (or skip runtime violation for pid ==
PULSE_PID), i.e., add a condition around the age_seconds > runtime_limit check
that ignores the CHILD_RUNTIME_LIMIT when pid == "$PULSE_PID" (while keeping the
existing comm, rss, rss_limit, SHELLCHECK_* handling intact).
- Around line 1847-1859: The loop unconditionally runs shellcheck with
"$sc_timeout_cmd" which is empty when neither timeout nor gtimeout exist,
leaving an unbounded execution; update the while-loop around the shellcheck
invocation in the same block (references: sc_timeout_cmd, result, the while IFS=
read -r shfile loop and the shellcheck -f gcc invocation) to check if
sc_timeout_cmd is non-empty before executing shellcheck and otherwise skip or
mark the file as unchecked (e.g., log a warning and continue), so that
shellcheck is never invoked without a timeout tool present.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 67c1ab7d-28aa-4716-98ad-018b7c5be5f3

📥 Commits

Reviewing files that changed from the base of the PR and between fcfc842 and 3e23e67.

📒 Files selected for processing (3)
  • .agents/scripts/linters-local.sh
  • .agents/scripts/process-guard-helper.sh
  • .agents/scripts/pulse-wrapper.sh

@alex-solovyev
Copy link
Collaborator Author

Worker PID 16612 killed by supervisor pulse — struggle_ratio 10620 (10620 messages, 0 commits) after 3h27m. The worker was thrashing with no output. Re-dispatching with a fresh worker to address the CodeRabbit CHANGES_REQUESTED feedback. Issue #2854 reset to available for re-dispatch.

@alex-solovyev
Copy link
Collaborator Author

Worker PID 41971 killed by supervisor pulse — struggle_ratio 10649 (10649 messages, 0 commits, 4h13m runtime). This indicates the worker was thrashing without making progress. Re-dispatching with a fresh worker on the same branch. Resolves #2854.

…idation (t1398)

- linters-local.sh: portable background+watcher timeout for shellcheck
  fallback when neither timeout nor gtimeout is available
- process-guard-helper.sh: add tty field to ps, skip TTY-attached
  (interactive) processes from kill logic; _validate_int() sanitizes
  all numeric env vars before arithmetic expansion
- pulse-wrapper.sh: skip CHILD_RUNTIME_LIMIT for primary pulse PID
  (governed by PULSE_STALE_THRESHOLD); skip shellcheck entirely when
  no timeout utility available to prevent runaway expansion
@github-actions
Copy link

github-actions bot commented Mar 5, 2026

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 107 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Thu Mar 5 02:11:51 UTC 2026: Code review monitoring started
Thu Mar 5 02:11:51 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 107

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 107
  • VULNERABILITIES: 0

Generated on: Thu Mar 5 02:11:53 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/scripts/process-guard-helper.sh:
- Around line 196-200: The session counting currently excludes only macOS '??'
TTYs and still counts Linux headless '?' TTYs; update the exclusion to filter
both single '?' and '??' TTY entries by replacing the grep -v '??' usage with a
regex-based exclusion such as grep -v -E '(^|[[:space:]])\?{1,2}([[:space:]]|$)'
wherever session_count is computed (reference the session_count assignment) and
apply the same change in the cmd_sessions() and cmd_status() code paths (and the
other occurrence noted around line 335) so non-interactive '?' and '??' entries
are correctly ignored.

In @.agents/scripts/pulse-wrapper.sh:
- Around line 953-957: The check_session_count() pipeline misclassifies headless
Linux processes because it only excludes the macOS '??' tty; update the pipeline
that computes interactive_count to exclude both '?' and '??' ttys (or better,
filter by the TTY column using awk to only count rows where the TTY is not "?"
and not "??") so headless processes aren't counted as interactive; apply this
change to the interactive_count computation that currently uses ps axo
pid,tty,command and the grep chain.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c2eab22e-6741-4468-81eb-5418d0d4a017

📥 Commits

Reviewing files that changed from the base of the PR and between 3e23e67 and 959e12d.

📒 Files selected for processing (3)
  • .agents/scripts/linters-local.sh
  • .agents/scripts/process-guard-helper.sh
  • .agents/scripts/pulse-wrapper.sh

- Replace grep|wc-l session counting with awk that excludes both '?' (Linux
  headless) and '??' (macOS headless) TTYs — fixes misclassification of
  headless processes as interactive sessions on Linux
- Eliminates SC2126 (grep|wc-l) violations flagged by CodeFactor
- Applied consistently across all 4 session counting locations:
  process-guard-helper.sh (cmd_scan, cmd_sessions, cmd_status) and
  pulse-wrapper.sh (check_session_count)

Addresses CodeRabbit second review and CodeFactor CI failure.
@github-actions
Copy link

github-actions bot commented Mar 5, 2026

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 107 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Thu Mar 5 02:25:32 UTC 2026: Code review monitoring started
Thu Mar 5 02:25:32 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 107

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 107
  • VULNERABILITIES: 0

Generated on: Thu Mar 5 02:25:35 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@github-actions
Copy link

github-actions bot commented Mar 5, 2026

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 107 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Thu Mar 5 02:27:53 UTC 2026: Code review monitoring started
Thu Mar 5 02:27:53 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 107

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 107
  • VULNERABILITIES: 0

Generated on: Thu Mar 5 02:27:55 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@marcusquinn
Copy link
Owner

Worker PID 92634 killed by supervisor pulse — thrashing detected (struggle_ratio: 23450, 7h19m uptime, 0 commits). Re-dispatching Issue #2854 for a fresh attempt.

@marcusquinn
Copy link
Owner

@coderabbitai review

All 8 findings from both review rounds have been addressed:

  1. _validate_int() in process-guard-helper.sh — Added (commit db8a514)
  2. ps command field instead of comm — Fixed (commit db8a514)
  3. TTY guard for interactive sessions — Added (commit 959e12d)
  4. Portable timeout fallback in linters-local.sh — Added (commit 959e12d)
  5. Pulse PID exemption from CHILD_RUNTIME_LIMIT — Added (commit 959e12d)
  6. ShellCheck skip when no timeout utility — Added (commit 959e12d)
  7. Linux ? TTY exclusion in session counting — Fixed (commit a6ddf5b)
  8. SC2034 ppid_unused renamed to _ppid — Fixed (commit 2399497)

Please re-review to verify and clear the CHANGES_REQUESTED status.

@github-actions
Copy link

github-actions bot commented Mar 5, 2026

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 107 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Thu Mar 5 02:49:17 UTC 2026: Code review monitoring started
Thu Mar 5 02:49:17 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 107

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 107
  • VULNERABILITIES: 0

Generated on: Thu Mar 5 02:49:20 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 5, 2026

@marcusquinn: Thank you for the thorough summary — solid engineering discipline tracking all 8 findings to resolution across targeted commits. Triggering a full re-review now to verify each fix and assess the complete state of the PR.


🧠 Learnings used
Learnt from: marcusquinn
Repo: marcusquinn/aidevops PR: 0
File: :0-0
Timestamp: 2026-03-03T16:33:18.313Z
Learning: For the marcusquinn/aidevops repository: The daily quality sweep system (pulse-wrapper.sh) should only mention coderabbitai for full codebase review when there are significant changes (Quality Gate failure, +10 issues in one day, etc.), not on every automated sweep with stable metrics.

Learnt from: alex-solovyev
Repo: marcusquinn/aidevops PR: 0
File: :0-0
Timestamp: 2026-03-04T05:50:49.615Z
Learning: In the marcusquinn/aidevops repository: As of 2026-03-04, the chrome-webstore-helper.sh SC2119/SC2120 (load_credentials argument mismatch) and stash-audit-helper.sh SC2181 (indirect exit code checking) ShellCheck issues have been resolved. ShellCheck now reports 0 errors and 0 warnings for these files.
✅ Actions performed

Full review triggered.

@marcusquinn
Copy link
Owner

@coderabbitai review

All findings from both review rounds have been addressed:

Round 1 (6 actionable comments):

  • process-guard-helper.sh: Added _validate_int() for numeric env var sanitization (security/injection) — commit db8a514
  • process-guard-helper.sh: Changed ps comm to ps command in all 3 occurrences for node.*opencode matching — commit db8a514
  • process-guard-helper.sh: Added TTY check to skip interactive sessions in kill-runaways (critical) — commit 959e12d
  • pulse-wrapper.sh: Protected primary pulse PID from child runtime kill rule — commit 959e12d
  • pulse-wrapper.sh: Skip ShellCheck in quality sweep when no timeout tool available — commit 959e12d
  • linters-local.sh: Portable timeout wrapper for fallback path — commit 959e12d

Round 2 (2 actionable comments):

  • process-guard-helper.sh: Fixed session counting to exclude both ? and ?? TTYs (Linux compat) — commit a6ddf5b
  • pulse-wrapper.sh: Fixed check_session_count() to exclude both ? and ?? TTYs — commit a6ddf5b

Additional improvements (Gemini Code Assist):

  • Removed blanket 2>/dev/null from mkdir -p and shellcheck invocations
  • Replaced awk field parsing with read builtin in all process-scanning loops
  • Linux TTY compat applied consistently across cleanup_orphans(), health issue worker listing, and prefetch_active_workers()

Verification:

  • bash -n passes on all 3 files
  • shellcheck --severity=warning passes with zero violations on all 3 files
  • process-guard-helper.sh help/status/sessions/scan all produce correct output

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 5, 2026

@marcusquinn: Excellent work — that's a thorough and well-documented resolution across all 8 findings from both review rounds, plus the additional quality improvements from Gemini Code Assist. The progressive hardening across _validate_int(), TTY guards, portable timeout fallbacks, pulse PID exemption, and the read-builtin refactor all align well with the zero-technical-debt standard for this repo. Triggering a full re-review now to verify each fix and assess the final state of the PR.


🧠 Learnings used
Learnt from: marcusquinn
Repo: marcusquinn/aidevops PR: 0
File: :0-0
Timestamp: 2026-03-03T16:33:18.313Z
Learning: For the marcusquinn/aidevops repository: The daily quality sweep system (pulse-wrapper.sh) should only mention coderabbitai for full codebase review when there are significant changes (Quality Gate failure, +10 issues in one day, etc.), not on every automated sweep with stable metrics.

Learnt from: alex-solovyev
Repo: marcusquinn/aidevops PR: 0
File: :0-0
Timestamp: 2026-03-04T05:50:49.615Z
Learning: In the marcusquinn/aidevops repository: As of 2026-03-04, the chrome-webstore-helper.sh SC2119/SC2120 (load_credentials argument mismatch) and stash-audit-helper.sh SC2181 (indirect exit code checking) ShellCheck issues have been resolved. ShellCheck now reports 0 errors and 0 warnings for these files.
✅ Actions performed

Full review triggered.

…ell reduction (t1398)

- Fix session counting to filter both '?' (Linux) and '??' (macOS) headless
  TTY entries using awk instead of grep -v '??' (4 locations across
  process-guard-helper.sh and pulse-wrapper.sh check_session_count)
- Remove blanket 2>/dev/null on ps commands in process-guard while loops
  and pulse-wrapper guard_child_processes; keep || true for error handling
- Replace echo|tr|wc subshell pipeline with parameter expansion for
  colon counting in _get_process_age (zero subshell overhead)
@marcusquinn
Copy link
Owner

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 5, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@github-actions
Copy link

github-actions bot commented Mar 5, 2026

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 107 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Thu Mar 5 02:54:12 UTC 2026: Code review monitoring started
Thu Mar 5 02:54:12 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 107

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 107
  • VULNERABILITIES: 0

Generated on: Thu Mar 5 02:54:15 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@marcusquinn
Copy link
Owner

@coderabbitai resolve

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 5, 2026

✅ Actions performed

Comments resolved and changes approved.

…<, remove 2>/dev/null (t1398)

pulse-wrapper.sh:
- guard_child_processes: use 'command' field instead of 'comm' for full
  command line matching (node.*opencode pattern now works)
- Extract cmd_base from full command path for limit selection
- Replace echo|grep -c and echo|head with <<<$result herestrings
  in quality sweep loop (avoids spawning subshells per iteration)
- Remove blanket 2>/dev/null from ps axo commands in
  prefetch_active_workers and _update_health_issue_for_repo
  (keep || true for pipeline failure tolerance)
@marcusquinn marcusquinn merged commit ea9bada into main Mar 5, 2026
@marcusquinn marcusquinn deleted the bugfix/t1398-memory-guards branch March 5, 2026 02:58
@github-actions
Copy link

github-actions bot commented Mar 5, 2026

🔍 Code Quality Report

�[0;35m[MONITOR]�[0m Code Review Monitoring Report

�[0;34m[INFO]�[0m Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 107 code smells

�[0;34m[INFO]�[0m Recent monitoring activity:
Thu Mar 5 02:59:16 UTC 2026: Code review monitoring started
Thu Mar 5 02:59:17 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 107

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 107
  • VULNERABILITIES: 0

Generated on: Thu Mar 5 02:59:19 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@sonarqubecloud
Copy link

sonarqubecloud bot commented Mar 5, 2026

marcusquinn added a commit that referenced this pull request Mar 5, 2026
…arning

When no timeout/gtimeout utility is available, the portable fallback
wrapper now:

1. Prints a one-time warning that degraded protection is active
2. Uses process group kill (kill -- -PID) to clean up ShellCheck and
   all its child processes, falling back to single-process kill if the
   subshell isn't a process group leader
3. Simplifies the per-file timeout warning (redundant 'no timeout
   utility' detail already covered by the one-time warning)

Addresses CodeRabbit review feedback from PR #2855 about the fallback
path reopening the runaway ShellCheck risk.

Closes #2900
marcusquinn added a commit that referenced this pull request Mar 5, 2026
…arning (#2923)

When no timeout/gtimeout utility is available, the portable fallback
wrapper now:

1. Prints a one-time warning that degraded protection is active
2. Uses process group kill (kill -- -PID) to clean up ShellCheck and
   all its child processes, falling back to single-process kill if the
   subshell isn't a process group leader
3. Simplifies the per-file timeout warning (redundant 'no timeout
   utility' detail already covered by the one-time warning)

Addresses CodeRabbit review feedback from PR #2855 about the fallback
path reopening the runaway ShellCheck risk.

Closes #2900
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Auto-created from TODO.md tag

Projects

None yet

Development

Successfully merging this pull request may close these issues.

t1398: fix: Runaway memory consumption

2 participants