fix: add process guard to setup.sh and enable pulse RunAtLoad for reboot survival#2940
Conversation
…oot survival Two changes to prevent kernel panics from runaway processes: 1. Pulse plist: RunAtLoad=true so the supervisor pulse restarts after reboot/panic. Previously RunAtLoad=false meant the pulse never restarted, leaving the system unmonitored after crashes. 2. Process guard: Added to setup.sh so all users get it on install/update. Previously the guard plist was manually created and not managed by setup. Now installs automatically: - macOS: launchd plist, 30s interval (was 120s), RunAtLoad=true - Linux: cron entry, every minute (cron minimum granularity) - No consent needed (safety net, not autonomous action) - Kills ShellCheck processes >512MB RSS or >120s runtime - Kills other AI processes >8GB RSS or >2h runtime Context: 5 kernel panics in one day from watchdog timeouts caused by ShellCheck memory bloat (2GB+ per instance) exhausting the compressor segment limit. The shellcheck-wrapper.sh (PR #2918) fixes the root cause; this ensures the guard catches any remaining edge cases.
WalkthroughThe setup.sh script adds a periodic process-guard that kills runaway processes on macOS (LaunchAgent every 30s) and Linux (cron every minute), writes logs, and enables RunAtLoad for the supervisor pulse LaunchAgent. The guard install logic is duplicated in the script. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant setup.sh as "setup.sh (installer)"
participant OS as "OS (macOS / Linux)"
participant Guard as "process-guard-helper.sh"
participant Scheduler as "LaunchAgent / cron"
User->>setup.sh: run installer
setup.sh->>OS: create plist or cron entry (with env limits, log path)
setup.sh->>Scheduler: unload/load or add cron entry
Scheduler->>Guard: invoke guard periodically
Guard->>OS: inspect processes, kill runaway PIDs, append to log
Guard->>Scheduler: exit
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances system stability by ensuring critical background processes, specifically the 'pulse' supervisor and a newly integrated process guard, automatically restart after system reboots or kernel panics. It integrates a cross-platform process guard into the Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
🧹 Nitpick comments (3)
setup.sh (3)
1015-1027: Consider addingHOMEenv var and using dynamicPATHfor consistency.The supervisor pulse plist (lines 889-898) exports both
HOMEand the user's dynamicPATH. This guard plist hardcodes a specificPATHand omitsHOME. While launchd typically providesHOMEfor user agents, explicitly setting it ensures parity with the pulse plist and avoids surprises if the helper script needs it for fallback paths.♻️ Proposed fix for environment parity
<key>EnvironmentVariables</key> <dict> <key>PATH</key> - <string>/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string> + <string>${PATH}</string> + <key>HOME</key> + <string>${HOME}</string> <key>SHELLCHECK_RSS_LIMIT_KB</key> <string>524288</string>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@setup.sh` around lines 1015 - 1027, The plist EnvironmentVariables block hardcodes PATH and omits HOME; update the EnvironmentVariables dict to include a HOME entry (set to the install user's home variable, e.g. $HOME or /Users/youruser) and make PATH dynamic instead of fixed by constructing it from the current environment (use the existing dynamic PATH approach used in the supervisor pulse plist) so keys like PATH and HOME are consistent with the supervisor's plist; modify the EnvironmentVariables block (referencing EnvironmentVariables, PATH, and HOME) to mirror the supervisor pulse pattern and ensure SHELLCHECK_* and CHILD_* keys remain unchanged.
1043-1048: Minor: Redundant removal guard in cron logic.Line 1043 checks if the entry doesn't exist, then line 1045 removes any existing entry anyway. This is defensive but logically redundant — if line 1043's grep succeeds (entry exists), the inner block never runs.
This is harmless (defensive coding), but you could simplify by always replacing:
♻️ Simplified cron replacement pattern
- if ! crontab -l 2>/dev/null | grep -qF "aidevops: process-guard" 2>/dev/null; then - ( - crontab -l 2>/dev/null | grep -v 'aidevops: process-guard' - echo "* * * * * SHELLCHECK_RSS_LIMIT_KB=524288 SHELLCHECK_RUNTIME_LIMIT=120 CHILD_RSS_LIMIT_KB=8388608 CHILD_RUNTIME_LIMIT=7200 /bin/bash ${guard_script} kill-runaways >> \$HOME/.aidevops/logs/process-guard.log 2>&1 # aidevops: process-guard" - ) | crontab - 2>/dev/null || true - fi + # Always replace to pick up any config changes (matches macOS behavior) + ( + crontab -l 2>/dev/null | grep -v 'aidevops: process-guard' + echo "* * * * * SHELLCHECK_RSS_LIMIT_KB=524288 SHELLCHECK_RUNTIME_LIMIT=120 CHILD_RSS_LIMIT_KB=8388608 CHILD_RUNTIME_LIMIT=7200 /bin/bash ${guard_script} kill-runaways >> \$HOME/.aidevops/logs/process-guard.log 2>&1 # aidevops: process-guard" + ) | crontab - 2>/dev/null || trueThis also creates parity with macOS behavior (lines 993-994) where the plist is always regenerated to pick up config changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@setup.sh` around lines 1043 - 1048, Remove the redundant existence check and always regenerate the crontab entry for the process-guard: drop the outer if ! crontab -l ... grep -qF "aidevops: process-guard" guard and instead run the block that filters out any existing "aidevops: process-guard" lines and appends the new cron line referencing ${guard_script} (the echo line containing "aidevops: process-guard"), so the script always replaces the cron entry and mirrors the macOS plist regeneration behavior.
1009-1010: No concurrency protection in process-guard-helper.sh, but low-risk given function simplicity.The
cmd_kill_runawaysfunction contains no flock, lockfile, or PID checks. With a 30-secondStartInterval, overlapping executions are possible if the script runs longer than 30 seconds. However, the function is lightweight—it only scanspsoutput and issues kill commands—so concurrent instances would simply attempt duplicate kills on the same processes, which is harmless since kill operations on already-terminated processes fail gracefully.Risk remains low unless process termination becomes a bottleneck. Consider adding a simple PID lockfile only if future expansion introduces complex operations or file modifications that genuinely require mutual exclusion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@setup.sh` around lines 1009 - 1010, The cmd_kill_runaways function currently has no concurrency protection; add a simple PID lockfile or use flock to prevent overlapping runs: at the start of cmd_kill_runaways (in process-guard-helper.sh) attempt to acquire an exclusive lock (e.g., create a lockfile or use flock on a dedicated lock descriptor) and if the lock cannot be acquired, exit early; ensure the lock is removed or released on normal exit and in traps for SIGINT/SIGTERM/EXIT so the lock isn't left stale; keep StartInterval behavior unchanged but guard against concurrent executions by wrapping cmd_kill_runaways with this lock acquisition/release logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@setup.sh`:
- Around line 1015-1027: The plist EnvironmentVariables block hardcodes PATH and
omits HOME; update the EnvironmentVariables dict to include a HOME entry (set to
the install user's home variable, e.g. $HOME or /Users/youruser) and make PATH
dynamic instead of fixed by constructing it from the current environment (use
the existing dynamic PATH approach used in the supervisor pulse plist) so keys
like PATH and HOME are consistent with the supervisor's plist; modify the
EnvironmentVariables block (referencing EnvironmentVariables, PATH, and HOME) to
mirror the supervisor pulse pattern and ensure SHELLCHECK_* and CHILD_* keys
remain unchanged.
- Around line 1043-1048: Remove the redundant existence check and always
regenerate the crontab entry for the process-guard: drop the outer if ! crontab
-l ... grep -qF "aidevops: process-guard" guard and instead run the block that
filters out any existing "aidevops: process-guard" lines and appends the new
cron line referencing ${guard_script} (the echo line containing "aidevops:
process-guard"), so the script always replaces the cron entry and mirrors the
macOS plist regeneration behavior.
- Around line 1009-1010: The cmd_kill_runaways function currently has no
concurrency protection; add a simple PID lockfile or use flock to prevent
overlapping runs: at the start of cmd_kill_runaways (in process-guard-helper.sh)
attempt to acquire an exclusive lock (e.g., create a lockfile or use flock on a
dedicated lock descriptor) and if the lock cannot be acquired, exit early;
ensure the lock is removed or released on normal exit and in traps for
SIGINT/SIGTERM/EXIT so the lock isn't left stale; keep StartInterval behavior
unchanged but guard against concurrent executions by wrapping cmd_kill_runaways
with this lock acquisition/release logic.
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Thu Mar 5 23:19:56 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
There was a problem hiding this comment.
Code Review
This pull request enhances system stability by ensuring the supervisor pulse survives reboots and by adding a new process guard to prevent runaway processes from causing kernel panics. However, the implementation in setup.sh introduces several security vulnerabilities due to insecure command and configuration generation. Specifically, unquoted variables in crontab entries and unsanitized input in launchd plist generation can lead to command injection or malformed configurations. Beyond security, there are opportunities to improve consistency and debuggability by adjusting PATH handling and removing unnecessary error suppression, aligning with project conventions. Addressing these points will improve both the security and maintainability of the new process guard.
setup.sh
Outdated
| if ! crontab -l 2>/dev/null | grep -qF "aidevops: process-guard" 2>/dev/null; then | ||
| ( | ||
| crontab -l 2>/dev/null | grep -v 'aidevops: process-guard' | ||
| echo "* * * * * SHELLCHECK_RSS_LIMIT_KB=524288 SHELLCHECK_RUNTIME_LIMIT=120 CHILD_RSS_LIMIT_KB=8388608 CHILD_RUNTIME_LIMIT=7200 /bin/bash ${guard_script} kill-runaways >> \$HOME/.aidevops/logs/process-guard.log 2>&1 # aidevops: process-guard" |
There was a problem hiding this comment.
This crontab entry generation on line 1046 is vulnerable to command injection. The unquoted variable ${guard_script} and the redirection path \$HOME/.aidevops/logs/process-guard.log can lead to command injection or a malformed crontab entry if paths contain spaces or shell metacharacters. Beyond this specific line, the overall cron job setup lacks a defined PATH, potentially causing 'command not found' errors, and suppresses stderr with 2>/dev/null, hindering debuggability. For improved security, consistency with the macOS launchd agent, and better reliability, ensure all variables are properly quoted, a PATH is explicitly set, and error suppression is reviewed.
| echo "* * * * * SHELLCHECK_RSS_LIMIT_KB=524288 SHELLCHECK_RUNTIME_LIMIT=120 CHILD_RSS_LIMIT_KB=8388608 CHILD_RUNTIME_LIMIT=7200 /bin/bash ${guard_script} kill-runaways >> \$HOME/.aidevops/logs/process-guard.log 2>&1 # aidevops: process-guard" | |
| echo "* * * * * SHELLCHECK_RSS_LIMIT_KB=524288 SHELLCHECK_RUNTIME_LIMIT=120 CHILD_RSS_LIMIT_KB=8388608 CHILD_RUNTIME_LIMIT=7200 /bin/bash \"${guard_script}\" kill-runaways >> \"\$HOME/.aidevops/logs/process-guard.log\" 2>&1 # aidevops: process-guard" |
| <string>${guard_label}</string> | ||
| <key>ProgramArguments</key> | ||
| <array> | ||
| <string>${guard_script}</string> |
There was a problem hiding this comment.
The script generates a launchd plist file for the process guard by inserting the ${guard_script} variable directly into an XML template. If the variable contains XML special characters like </string>, it can lead to plist injection. An attacker who can control the directory name where the application is installed could potentially modify the plist structure to execute arbitrary commands via launchd.
setup.sh
Outdated
|
|
||
| # Unload old plist if upgrading | ||
| if _launchd_has_agent "$guard_label"; then | ||
| launchctl unload "$guard_plist" 2>/dev/null || true |
There was a problem hiding this comment.
Suppressing stderr with 2>/dev/null can hide important error messages from launchctl, making debugging difficult if unloading the agent fails for reasons other than it not being loaded. The || true already prevents the script from exiting on failure. Removing the redirection aligns with the project's general rule to avoid blanket error suppression.
| launchctl unload "$guard_plist" 2>/dev/null || true | |
| launchctl unload "$guard_plist" || true |
References
- Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
setup.sh
Outdated
| <key>EnvironmentVariables</key> | ||
| <dict> | ||
| <key>PATH</key> | ||
| <string>/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string> |
There was a problem hiding this comment.
The PATH is hardcoded in the launchd plist. This can be brittle if tools are installed in non-standard locations. Other plists in this script, like the supervisor pulse agent, use ${PATH} to capture the user's environment at setup time. For consistency and robustness, it would be better to use ${PATH} here as well.
| <string>/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin</string> | |
| <string>${PATH}</string> |
setup.sh
Outdated
| </plist> | ||
| GUARD_PLIST | ||
|
|
||
| if launchctl load "$guard_plist" 2>/dev/null; then |
There was a problem hiding this comment.
Suppressing stderr with 2>/dev/null here prevents you from seeing why launchctl load might have failed. The if statement will still correctly detect failure, but the diagnostic message explaining the cause will be lost. Removing the redirection will make debugging much easier.
| if launchctl load "$guard_plist" 2>/dev/null; then | |
| if launchctl load "$guard_plist"; then |
References
- Avoid using '2>/dev/null' for blanket suppression of command errors in shell scripts to ensure that authentication, syntax, or system issues remain visible for debugging.
…ths, remove stderr suppression
- Use dynamic ${PATH} and add ${HOME} to guard plist env vars for
consistency with the pulse plist pattern (CodeRabbit, Gemini)
- Remove 2>/dev/null from launchctl unload/load — || true handles
errors, stderr aids debugging (Gemini)
- Quote paths in cron entry to handle spaces in paths (Gemini)
- Always regenerate cron entry to pick up config changes, matching
macOS plist behavior (CodeRabbit)
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Thu Mar 5 23:56:42 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
|
There was a problem hiding this comment.
🧹 Nitpick comments (2)
setup.sh (2)
1004-1008: Consider using explicit/bin/bashwrapper for consistency.The pulse plist (lines 879-882) uses
/bin/bashfollowed by the script path, but the guard plist invokes the script directly. While this works if the script is executable with a valid shebang, using/bin/bashexplicitly is more robust (e.g., survives accidental permission changes).♻️ Optional: Match pulse pattern
<key>ProgramArguments</key> <array> + <string>/bin/bash</string> <string>${guard_script}</string> <string>kill-runaways</string> </array>🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@setup.sh` around lines 1004 - 1008, The ProgramArguments array for the guard plist currently invokes the script directly; modify it to call /bin/bash explicitly so it matches the pulse pattern and is robust to permission/shebang issues: update the ProgramArguments entries for the guard plist (the array that uses ${guard_script} and "kill-runaways") to instead use "/bin/bash" as the first argument, then the ${guard_script} path, then "kill-runaways" so the helper always runs under /bin/bash.
980-986: Consider adding a feature toggle for the process guard.The guard is unconditionally installed when the script exists. While the PR rationale (safety net, not autonomous) is sound, power users may want to disable it in specific scenarios (e.g., intentionally long-running processes, different memory constraints).
For consistency with other schedulers (auto-update, repo-sync, pulse), consider:
if [[ -x "$guard_script" ]] && is_feature_enabled process_guard 2>/dev/null; thenThis would allow
aidevops config set safety.process_guard falseto opt out.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@setup.sh` around lines 980 - 986, The process-guard is installed unconditionally; change the conditional around guard_script and guard_label to also check the feature toggle by calling is_feature_enabled process_guard so power users can opt out (e.g., via `aidevops config set safety.process_guard false`); update the `if [[ -x "$guard_script" ]]` test to `if [[ -x "$guard_script" ]] && is_feature_enabled process_guard` (or equivalent) and ensure any downstream logic that references guard_label/guard_script remains unchanged when the feature is disabled.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@setup.sh`:
- Around line 1004-1008: The ProgramArguments array for the guard plist
currently invokes the script directly; modify it to call /bin/bash explicitly so
it matches the pulse pattern and is robust to permission/shebang issues: update
the ProgramArguments entries for the guard plist (the array that uses
${guard_script} and "kill-runaways") to instead use "/bin/bash" as the first
argument, then the ${guard_script} path, then "kill-runaways" so the helper
always runs under /bin/bash.
- Around line 980-986: The process-guard is installed unconditionally; change
the conditional around guard_script and guard_label to also check the feature
toggle by calling is_feature_enabled process_guard so power users can opt out
(e.g., via `aidevops config set safety.process_guard false`); update the `if [[
-x "$guard_script" ]]` test to `if [[ -x "$guard_script" ]] &&
is_feature_enabled process_guard` (or equivalent) and ensure any downstream
logic that references guard_label/guard_script remains unchanged when the
feature is disabled.
…ove debuggability Address PR #2940 review feedback (GH#2947): - Add _xml_escape() helper to sanitize paths for XML plist embedding - XML-escape all path variables in both pulse and guard plist heredocs - Add PATH to cron entries for reliable command resolution - Quote all variables in cron entries to prevent word splitting - Remove 2>/dev/null from launchctl load/unload — || true handles failure, stderr now visible for debugging - Keep 2>/dev/null on crontab -l (expected when no crontab exists) and pkill (expected when no process matches)
…ove debuggability Address all HIGH and MEDIUM severity findings from PR #2940 review: - Add _xml_escape() to sanitize paths for XML/plist embedding, preventing plist injection if paths contain &, <, >, or quote characters - Apply XML escaping to all variable interpolations in pulse and guard plists - Remove blanket 2>/dev/null from launchctl load/unload calls so diagnostic errors remain visible (|| true still prevents script abort) - Quote all variables in cron entries and add explicit PATH to prevent command-not-found errors in minimal cron environments - Remove 2>/dev/null from crontab pipe writes for debuggability - Add /bin/bash wrapper to guard plist ProgramArguments for consistency with pulse plist pattern Closes #2947
…ove debuggability Address all HIGH and MEDIUM severity findings from PR #2940 review: - Add _xml_escape() to sanitize paths for XML/plist embedding, preventing plist injection if paths contain &, <, >, or quote characters - Apply XML escaping to all variable interpolations in pulse and guard plists - Remove blanket 2>/dev/null from launchctl load/unload calls so diagnostic errors remain visible (|| true still prevents script abort) - Quote all variables in cron entries and add explicit PATH to prevent command-not-found errors in minimal cron environments - Remove 2>/dev/null from crontab pipe writes for debuggability - Add /bin/bash wrapper to guard plist ProgramArguments for consistency with pulse plist pattern Closes #2947
…improve debuggability (#2996) * fix: harden setup.sh plist/cron generation against injection and improve debuggability Address all HIGH and MEDIUM severity findings from PR #2940 review: - Add _xml_escape() to sanitize paths for XML/plist embedding, preventing plist injection if paths contain &, <, >, or quote characters - Apply XML escaping to all variable interpolations in pulse and guard plists - Remove blanket 2>/dev/null from launchctl load/unload calls so diagnostic errors remain visible (|| true still prevents script abort) - Quote all variables in cron entries and add explicit PATH to prevent command-not-found errors in minimal cron environments - Remove 2>/dev/null from crontab pipe writes for debuggability - Add /bin/bash wrapper to guard plist ProgramArguments for consistency with pulse plist pattern Closes #2947 * fix: harden cron entries against shell injection and complete XML escaping Address Gemini review feedback on PR #2996: - Add _cron_escape() to wrap cron entry paths in single quotes, preventing command injection via $(…) or backticks in paths (HIGH severity) - XML-escape $PATH in both pulse and guard launchd plists — PATH can contain & which breaks XML structure (MEDIUM severity) - Remove unnecessary 2>/dev/null from grep after crontab -l pipes — crontab -l already handles the no-crontab case, and suppressing grep stderr hides diagnostic errors (MEDIUM severity)



Summary
RunAtLoadfromfalsetotrueso the supervisor pulse restarts automatically after reboot/kernel panicsetup.sh(was previously a manually-created plist not managed by setup)RunAtLoad=trueContext
5 kernel panics in one day, all the same signature:
The shellcheck-wrapper.sh (PR #2918) fixes the root cause. This PR ensures:
RunAtLoad=true)setup.shrunLive fix applied
Both plists updated on the local system immediately — no need to wait for merge.
Summary by CodeRabbit
New Features
Chores