
fix: prevent ShellCheck memory explosion with RSS watchdog, rate limiting, and PATH fix #3146

Merged
marcusquinn merged 4 commits into main from bugfix/shellcheck-memory-watchdog
Mar 8, 2026

Conversation

@marcusquinn
Owner

Summary

Rebased version of #3105 — all CodeRabbit and Gemini Code Assist review feedback addressed. Rebased onto current main to eliminate stale version/planning file diffs.

Fixes the recurring ShellCheck memory explosion that caused a system crash on Mar 7 (20 GB -> 48 GB+ in minutes, 18.5 GB single shellcheck process).

Root cause analysis:

  • ulimit -v is a complete no-op on macOS ARM (setrlimit failed: invalid argument) — the wrapper's memory cap was doing nothing
  • The memory-pressure monitor polls every 60s, but shellcheck grows from 0 to 18 GB in under 60s
  • bash-language-server immediately respawns killed shellcheck processes, creating a kill-respawn-grow cycle
  • PATH shim was at the end of PATH in some contexts, so the real shellcheck was found first

Three-layer fix:

1. shellcheck-wrapper.sh — RSS watchdog + rate limiter

  • Background RSS watchdog replaces broken ulimit -v: polls child process RSS every 2s, kills at 1 GB
  • Respawn rate limiter with exponential backoff (5s -> 10s -> 20s -> ... -> 300s max) prevents kill-respawn-grow cycles
  • Hard timeout of 120s as additional safety net
  • Input validation (_validate_int) prevents tight loops from invalid env vars
  • Atomic lock in _record_kill prevents race conditions from concurrent kills
  • Preserves .real fast-path from main for binary discovery
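The backoff schedule named in the rate-limiter bullet can be sketched as below. This is a minimal illustration, not the wrapper's code: the function name `backoff_delay` and its argument order are hypothetical, and only the 5s base, the doubling, and the 300s cap come from the description above.

```shell
# Hypothetical helper: seconds to wait after the Nth consecutive kill.
# Only the 5s base, the doubling, and the 300s cap are taken from the PR text.
backoff_delay() {
  local kills="$1" base="${2:-5}" max="${3:-300}"
  local delay i
  # No kills recorded yet: no delay needed.
  if [ "$kills" -le 0 ]; then
    echo 0
    return 0
  fi
  delay="$base"
  i=1
  # Double once per kill after the first, stopping once the cap is reached.
  while [ "$i" -lt "$kills" ] && [ "$delay" -lt "$max" ]; do
    delay=$(( delay * 2 ))
    i=$(( i + 1 ))
  done
  if [ "$delay" -gt "$max" ]; then
    delay="$max"
  fi
  echo "$delay"
}
```

A caller would presumably compare the time since the last recorded kill against `backoff_delay "$kill_count"` and refuse to start shellcheck until that much time has passed; how the kill count is stored and reset is not shown here.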

2. memory-pressure-monitor.sh v2.1.0 — faster detection

  • Lower thresholds: warn at 1 GB, kill at 2 GB (was 2/4 GB) — secondary defense behind wrapper's 1 GB limit
  • Faster polling: 30s launchd interval (was 60s), adaptive 10s when shellcheck detected
  • Fix zsh false positives: match patterns against command basename only
  • Lower shellcheck runtime: 5 min max (was 10 min)
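The basename-only matching that removes the zsh false positives might look roughly like this. It is a sketch under assumptions: `matches_monitored` is a hypothetical name, the pattern list is invented for illustration, and exact string equality is assumed where the real monitor may use a different comparison.

```shell
# Hypothetical pattern list; the monitor's real patterns are not shown in this PR.
MONITORED_PATTERNS="shellcheck node"

# Return 0 if the executable name (not its arguments) equals a monitored pattern.
matches_monitored() {
  local cmdline="$1" exe name pattern
  exe="${cmdline%% *}"   # first word of the command line: the executable path
  name="${exe##*/}"      # strip directories, leaving the basename
  for pattern in $MONITORED_PATTERNS; do
    if [ "$name" = "$pattern" ]; then
      return 0
    fi
  done
  return 1
}
```

With this shape, `/bin/zsh -c shellcheck-wrapper.sh` is not matched because only `zsh` is compared, whereas matching against the full command line would have flagged it. Executable paths that contain spaces would need more careful splitting than the first-word extraction used here.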

3. setup-modules/shell-env.sh — PATH ordering fix

  • Strip existing ~/.aidevops/bin from PATH before prepending, guaranteeing first position
  • Sticky for all shells: sanitize-and-prepend logic in .zshenv, all rc files, and fish config
  • Security: empty-PATH guard prevents trailing colon (PATH injection vector)
  • Old case guard entries cleaned up on upgrade
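The sanitize-and-prepend step with its empty-PATH guard can be illustrated as follows. This is a sketch, not the code in shell-env.sh: `prepend_unique` is a hypothetical helper, and dropping every empty PATH component (not only a trailing one) is an assumption made here for safety.

```shell
# Hypothetical helper: print PATH-like string $2 with every copy of dir $1
# removed and $1 placed first. Never emits an empty component, since an
# empty PATH entry resolves to the current directory.
prepend_unique() {
  local dir="$1" path="$2" clean="" entry rest
  rest="$path"
  while [ -n "$rest" ]; do
    case "$rest" in
      *:*) entry="${rest%%:*}"; rest="${rest#*:}" ;;
      *)   entry="$rest"; rest="" ;;
    esac
    # Skip the shim dir itself and any empty components.
    if [ -z "$entry" ] || [ "$entry" = "$dir" ]; then
      continue
    fi
    clean="${clean:+$clean:}$entry"
  done
  # Empty-PATH guard: only add the separator when something follows it.
  if [ -n "$clean" ]; then
    echo "$dir:$clean"
  else
    echo "$dir"
  fi
}
```

Usage would be along the lines of `PATH="$(prepend_unique "$HOME/.aidevops/bin" "$PATH")"`. Because the existing entry is stripped before prepending, running the line repeatedly (for example from both `.zshenv` and an rc file) leaves the shim directory first and does not duplicate it.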

Review feedback addressed (from #3105)

  1. Validate watchdog tunables: _validate_int() with min bounds (128/1/10)
  2. Race condition in _record_kill: mkdir-based atomic lock
  3. PATH trailing colon security — empty-PATH guard
  4. Remove 2>/dev/null from mkdir -p — surface permission errors
  5. PATH sticky for rc-file layers — sanitize-and-prepend in all shell configs
  6. Docstring coverage — all functions documented

Testing

  • ShellCheck: zero violations on all 3 files (info level)
  • Rebased cleanly onto current main (v2.154.3)

Closes #2915

AI DevOps and others added 4 commits March 7, 2026 18:06
…ting, and PATH fix

Three defenses against the Mar 7 crash (18.5 GB shellcheck, 31 GB total):

shellcheck-wrapper.sh:
- Replace broken ulimit -v (no-op on macOS ARM — EINVAL) with background
  RSS watchdog that polls every 2s and kills at 1 GB RSS
- Add respawn rate limiter with exponential backoff (5s, 10s, 20s... 300s)
  to prevent kill-respawn-grow cycles from bash-language-server
- Add 120s hard timeout as additional safety net
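A background RSS watchdog of the kind described above could be sketched like this. The helper names `rss_mb` and `watch_rss` are hypothetical, and the sketch assumes a `ps` that accepts `-o rss= -p PID` and reports kilobytes (true of the macOS and procps implementations); it does not cover the 120s hard timeout.

```shell
# Hypothetical helper: resident set size of PID $1 in MB.
# Assumes `ps -o rss= -p PID` prints RSS in KB (macOS and procps ps do).
rss_mb() {
  local kb
  kb="$(ps -o rss= -p "$1" 2>/dev/null | tr -d ' ')"
  [ -n "$kb" ] || return 1
  echo $(( kb / 1024 ))
}

# Poll PID $1 every $3 seconds and SIGKILL it once its RSS exceeds $2 MB.
# Returns 1 if the watchdog killed the process, 0 if the process went away first.
watch_rss() {
  local pid="$1" limit_mb="$2" interval="$3" rss
  while kill -0 "$pid" 2>/dev/null; do
    rss="$(rss_mb "$pid")" || return 0
    if [ "$rss" -gt "$limit_mb" ]; then
      kill -KILL "$pid" 2>/dev/null
      return 1
    fi
    sleep "$interval"
  done
  return 0
}
```

In a wrapper, the real binary would be started in the background, `watch_rss "$child" 1024 2 &` launched alongside it, and the main shell would `wait` on the child so that it is reaped promptly. Polling means the process can overshoot the limit by whatever it allocates within one interval, which is presumably why the interval is kept as short as 2s.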

memory-pressure-monitor.sh (v2.1.0):
- Lower thresholds: warn at 1 GB, kill at 2 GB (was 2/4 GB)
- Lower shellcheck runtime: 5 min (was 10 min)
- Fix zsh false positives: match MONITORED_PATTERNS against command
  basename only, not full command line with arguments
- Adaptive polling: 10s when shellcheck detected, 30s normal (was 60s)
- Launchd plist interval: 30s (was 60s)

setup-modules/shell-env.sh:
- Fix PATH ordering: strip existing shim_dir from PATH before prepending
  to guarantee ~/.aidevops/bin is FIRST, not appended at end
- Document ulimit -v macOS ARM limitation
… fix PATH security

Address CHANGES_REQUESTED review from CodeRabbit and Gemini Code Assist:

shellcheck-wrapper.sh:
- Add _validate_int() to sanitize RSS_LIMIT_MB, WATCHDOG_INTERVAL, HARD_TIMEOUT
  before use in arithmetic/sleep (prevents tight loops from typos like 'abc')
- Fix race condition in _record_kill() with mkdir-based atomic lock
- Remove blanket 2>/dev/null from mkdir -p (surface permission errors)
- Add explicit return 0 to _record_kill()
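The two wrapper fixes above, integer validation and the mkdir-based lock, can be illustrated with a short sketch. Both functions are hypothetical stand-ins: `validate_int` guesses at a signature (value, default, minimum) and a fall-back-to-default behaviour for `_validate_int`, and `with_lock` shows the general mkdir locking pattern with an assumed retry budget of about one second, not the actual body of `_record_kill`.

```shell
# Hypothetical stand-in for _validate_int: echo $1 if it is a non-negative
# integer >= $3, otherwise echo the default $2.
validate_int() {
  local value="$1" default="$2" min="$3"
  case "$value" in
    ''|*[!0-9]*) echo "$default"; return 0 ;;
  esac
  if [ "$value" -lt "$min" ]; then
    echo "$default"
  else
    echo "$value"
  fi
}

# mkdir is atomic: of several concurrent callers, exactly one creates the directory.
# Run the given command while holding that lock; give up after 10 attempts.
with_lock() {
  local lock_dir="$1" tries=0 rc
  shift
  # Errors are silenced only here, where "File exists" is the expected outcome.
  until mkdir "$lock_dir" 2>/dev/null; do
    tries=$(( tries + 1 ))
    [ "$tries" -lt 10 ] || return 1
    sleep 0.1
  done
  "$@"
  rc=$?
  rmdir "$lock_dir"
  return "$rc"
}
```

Without validation, a typo such as `WATCHDOG_INTERVAL=abc` would reach `sleep` or an arithmetic test as-is; `sleep abc` fails immediately, so a polling loop built on it would spin with no delay. A call like `validate_int "$WATCHDOG_INTERVAL" 2 1` would instead yield `2`.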

setup-modules/shell-env.sh:
- Fix PATH trailing colon security issue in launchctl setenv (empty clean_path
  produced 'shim_dir:' which resolves '.' — PATH injection vector)
- Replace presence-only case guard with sanitize-and-prepend logic that strips
  existing shim_dir from PATH before prepending (fixes upgrade path where shim
  was appended at end, bypassing wrapper in new shells)
- Clean up stale old-form case-guard entries from .zshenv and rc files
- Update fish shell PATH line to use same sanitize-and-prepend pattern

CI failures (SonarCloud, Label PR, Monitor) are all fork permission issues
(403 'Resource not accessible by integration') — expected for external PRs.
@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions github-actions bot added the bug Auto-created from TODO.md tag label Mar 7, 2026
@coderabbitai
Contributor

coderabbitai bot commented Mar 7, 2026

Warning

Rate limit exceeded

@marcusquinn has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 29 minutes and 5 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 400b144c-57d8-4a8c-a001-7751458a6a78

📥 Commits

Reviewing files that changed from the base of the PR and between 31b0b6a and 25290b2.

📒 Files selected for processing (3)
  • .agents/scripts/memory-pressure-monitor.sh
  • .agents/scripts/shellcheck-wrapper.sh
  • setup-modules/shell-env.sh

@github-actions

github-actions bot commented Mar 7, 2026

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 125 code smells

[INFO] Recent monitoring activity:
Sat Mar 7 18:08:47 UTC 2026: Code review monitoring started
Sat Mar 7 18:08:47 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 125

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 125
  • VULNERABILITIES: 0

Generated on: Sat Mar 7 18:08:50 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@sonarqubecloud

sonarqubecloud bot commented Mar 7, 2026

@marcusquinn
Copy link
Owner Author

@coderabbitai review

Review bots were rate-limited when this PR was created (affected: coderabbitai gemini-code-assist). Requesting a review retry.


@coderabbitai
Contributor

coderabbitai bot commented Mar 7, 2026

@marcusquinn Sure! Triggering a full review of PR #3146 now. I'll analyse the three-layer fix across shellcheck-wrapper.sh, memory-pressure-monitor.sh, and setup-modules/shell-env.sh and provide detailed feedback shortly.


🧠 Learnings used
Learnt from: marcusquinn
Repo: marcusquinn/aidevops PR: 0
File: :0-0
Timestamp: 2026-03-03T16:33:18.313Z
Learning: For the marcusquinn/aidevops repository: The daily quality sweep system (pulse-wrapper.sh) should only mention coderabbitai for full codebase review when there are significant changes (Quality Gate failure, +10 issues in one day, etc.), not on every automated sweep with stable metrics.

Learnt from: alex-solovyev
Repo: marcusquinn/aidevops PR: 0
File: :0-0
Timestamp: 2026-03-04T05:50:49.615Z
Learning: In the marcusquinn/aidevops repository: As of 2026-03-04, the chrome-webstore-helper.sh SC2119/SC2120 (load_credentials argument mismatch) and stash-audit-helper.sh SC2181 (indirect exit code checking) ShellCheck issues have been resolved. ShellCheck now reports 0 errors and 0 warnings for these files.
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@marcusquinn
Owner Author

This PR appears orphaned — no active worker process found and no activity for 25+ hours. Flagging for re-dispatch. If work is still in progress, remove the status:orphaned label.

@marcusquinn
Owner Author

Merging via pulse supervisor.

@marcusquinn marcusquinn merged commit c575fba into main Mar 8, 2026
19 checks passed
@marcusquinn marcusquinn deleted the bugfix/shellcheck-memory-watchdog branch March 8, 2026 23:41

Labels

bug (Auto-created from TODO.md tag), status:orphaned

Development

Successfully merging this pull request may close these issues.

bug: bash language server spawns unbounded ShellCheck processes (11 GB RAM, crash risk)
