t4231: harden critical stats-wrapper dedup age validation#4232
t4231: harden critical stats-wrapper dedup age validation#4232alex-solovyev merged 3 commits intomainfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness of the stats-wrapper script's process deduplication mechanism. It addresses potential issues where corrupted or missing epoch data in PID files could lead to the incorrect termination of active processes. By introducing rigorous validation for epoch timestamps and a portable method for determining process elapsed time, the changes ensure that the script reliably identifies and manages stale processes across different operating systems, improving overall system stability. Highlights
Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughRefactored process age calculation in Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Thu Mar 12 18:59:14 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
There was a problem hiding this comment.
Code Review
This pull request significantly hardens the process deduplication logic in stats-wrapper.sh. The added validation for the process start epoch from the pidfile is robust and correctly prevents the erroneous termination of healthy processes due to corrupt data. The introduction of a portable fallback mechanism using ps to calculate the process's elapsed time is an excellent improvement for cross-platform reliability. The new _stats_process_elapsed_seconds function is well-implemented with a sensible fallback from etimes to parsing etime. I have one minor suggestion to make the awk script within this new function more concise and idiomatically robust.
.agents/scripts/stats-wrapper.sh
Outdated
| elapsed=$(awk -v value="$etime" ' | ||
| BEGIN { | ||
| n = split(value, parts, /[-:]/) | ||
| if (index(value, "-") > 0) { | ||
| if (n != 4) { | ||
| print "" | ||
| exit | ||
| } | ||
| total = (parts[1] * 86400) + (parts[2] * 3600) + (parts[3] * 60) + parts[4] | ||
| } else if (n == 3) { | ||
| total = (parts[1] * 3600) + (parts[2] * 60) + parts[3] | ||
| } else if (n == 2) { | ||
| total = (parts[1] * 60) + parts[2] | ||
| } else { | ||
| print "" | ||
| exit | ||
| } | ||
| print total | ||
| } | ||
| ') |
There was a problem hiding this comment.
To improve maintainability and ensure the awk command is independently robust, using exit 1 on invalid input is a cleaner way to signal an error than print ""; exit. This aligns with the guideline to make awk commands robust and handle potential errors, even when outer logic provides safeguards. This also allows for removing duplicated error-handling blocks, improving overall maintainability.
| elapsed=$(awk -v value="$etime" ' | |
| BEGIN { | |
| n = split(value, parts, /[-:]/) | |
| if (index(value, "-") > 0) { | |
| if (n != 4) { | |
| print "" | |
| exit | |
| } | |
| total = (parts[1] * 86400) + (parts[2] * 3600) + (parts[3] * 60) + parts[4] | |
| } else if (n == 3) { | |
| total = (parts[1] * 3600) + (parts[2] * 60) + parts[3] | |
| } else if (n == 2) { | |
| total = (parts[1] * 60) + parts[2] | |
| } else { | |
| print "" | |
| exit | |
| } | |
| print total | |
| } | |
| ') | |
| elapsed=$(awk -v value="$etime" ' | |
| BEGIN { | |
| n = split(value, parts, /[-:]/) | |
| if (index(value, "-") > 0) { | |
| if (n != 4) { exit 1 } | |
| total = (parts[1] * 86400) + (parts[2] * 3600) + (parts[3] * 60) + parts[4] | |
| } else if (n == 3) { | |
| total = (parts[1] * 3600) + (parts[2] * 60) + parts[3] | |
| } else if (n == 2) { | |
| total = (parts[1] * 60) + parts[2] | |
| } else { | |
| exit 1 | |
| } | |
| print total | |
| } | |
| ') |
References
- To improve maintainability, ensure that commands like
awkare independently robust and handle potential errors such as division-by-zero, even when outer logic already provides a safeguard.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/scripts/stats-wrapper.sh:
- Around line 116-118: The current error path in the stats rotation logic
deletes STATS_PIDFILE whenever _stats_process_elapsed_seconds "$old_pid" fails,
which can falsely remove the pidfile for live processes when the elapsed-time
lookup fails; change the failure handling so you do NOT rm -f "$STATS_PIDFILE"
immediately — instead, on _stats_process_elapsed_seconds failure check process
liveness (e.g. test with kill -0 "$old_pid" or a portable ps check) and only
remove STATS_PIDFILE if the process is confirmed not running; otherwise preserve
the pidfile and return a non-destructive error/exit so the existing stats worker
isn’t restarted unnecessarily.
- Around line 43-52: The two ps probes that assign to elapsed and etime can fail
under set -euo pipefail and must be guarded so the script can continue to the
fallback; update the assignments to use a safe form like elapsed=$(ps -p "$pid"
-o etimes= 2>/dev/null || true) and etime=$(ps -p "$pid" -o etime= 2>/dev/null
|| true) (or wrap each ps call in an if/command substitution check) so a
non-zero exit from ps does not abort the function and the subsequent etime
fallback and PID-file cleanup still run.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fc3203d9-2338-4f2a-8140-3f4cda81d7d5
📒 Files selected for processing (1)
.agents/scripts/stats-wrapper.sh
|
Supervisor relaunch completed after launch-validation mismatch on the prior worker session.\n\n- Model: gpt-5.3-codex (openai/gpt-5.3-codex)\n- Action: replaced stale worker process with a fresh PR-advancement worker on the same branch\n- Reason: previous run showed no checkpoint progress and failed launch validation heuristics\n- Next: worker will address requested changes and post a fix summary in this PR |
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Thu Mar 12 20:20:42 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
|
Addressed review feedback in 80ee81f.
Local verification run:
CI status after push: all previously pending checks are now passing. |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai review Both CHANGES_REQUESTED issues have been addressed in commit 80ee81f:
Please re-review. |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 5305
Fix 1 — Fix 2 — non-destructive skip on live-but-unmeasurable PIDs (lines 116–122):
Additional correctness notes observed:
All previously requested changes are addressed. The implementation is portable, shellcheck-clean, and correct. [review] 🧠 Learnings used |
|
@coderabbitai review Both findings from your CHANGES_REQUESTED review are addressed in commit 80ee81f (already on this branch):
Please re-review to clear the CHANGES_REQUESTED state. |
|
Triggering a fresh review pass to update the PR state. 🧠 Learnings used✅ Actions performedReview triggered.
|
- (1) awk now uses exit 1 for invalid etime input (already present); add || true guard on the awk command substitution so set -euo pipefail cannot abort the function on parse failure before the ^[0-9]+$ guard runs - (2)+(3) ps fallback already distinguishes process-not-found from lookup-failure via kill -0 gate (commit 80ee81f); pidfile only deleted when process is confirmed gone, preserved with return 1 when live
🔍 Code Quality Report�[0;35m[MONITOR]�[0m Code Review Monitoring Report �[0;34m[INFO]�[0m Latest Quality Status: �[0;34m[INFO]�[0m Recent monitoring activity: 📈 Current Quality Metrics
Generated on: Thu Mar 12 22:18:32 UTC 2026 Generated by AI DevOps Framework Code Review Monitoring |
|
|
CodeRabbit findings addressed: (1) awk now uses exit 1 for invalid input (already present in 80ee81f); added |
|
@coderabbitai review All three CodeRabbit CHANGES_REQUESTED findings are now addressed in commit 268ad84:
Please re-review to clear the CHANGES_REQUESTED state. |
🧩 Analysis chain🏁 Script executed: Length of output: 5313
Finding 1 — awk Findings 2 + 3 —
Both The implementation is portable across macOS/Linux, shellcheck-clean, and fully addresses the original PR 🧠 Learnings used✅ Actions performedReview triggered.
|
…per.sh All 3 PR #4232 review findings are confirmed addressed: - CRITICAL (coderabbit): ps probes guarded with || true so set -euo pipefail cannot abort before etime fallback runs (commit 80ee81f) - HIGH (coderabbit): pidfile preserved for live PIDs via kill -0 check; only removed when process is confirmed gone (commit 80ee81f) - MEDIUM (gemini): awk uses exit 1 for invalid input; || true on command substitution is intentional to prevent set -e abort on parse failure (commit 268ad84) Adds inline comment explaining the || true design rationale so future reviewers understand it is not a suppression but a deliberate guard.
…er.sh (#4288) * fix: document GH#4271 quality-debt findings as verified in stats-wrapper.sh All 3 PR #4232 review findings are confirmed addressed: - CRITICAL (coderabbit): ps probes guarded with || true so set -euo pipefail cannot abort before etime fallback runs (commit 80ee81f) - HIGH (coderabbit): pidfile preserved for live PIDs via kill -0 check; only removed when process is confirmed gone (commit 80ee81f) - MEDIUM (gemini): awk uses exit 1 for invalid input; || true on command substitution is intentional to prevent set -e abort on parse failure (commit 268ad84) Adds inline comment explaining the || true design rationale so future reviewers understand it is not a suppression but a deliberate guard. * refactor: replace history-referencing comments with logic-focused comments in stats-wrapper.sh Replace GH issue/PR references in robustness notes with evergreen comments that explain the technical why (set -euo pipefail guard behaviour, awk exit semantics, empty-output handling) rather than the development history.



Summary
pselapsed-time lookup (etimesfirst, then parsedetime) so dedup checks remain reliable on macOS and LinuxValidation
shellcheck .agents/scripts/stats-wrapper.shbash -lc 'source ".agents/scripts/stats-wrapper.sh"; tmpdir=$(mktemp -d); STATS_PIDFILE="$tmpdir/stats.pid"; STATS_LOGFILE="$tmpdir/stats.log"; STATS_TIMEOUT=600; sleep 30 & pid=$!; printf "%s 0\n" "$pid" >"$STATS_PIDFILE"; if check_stats_dedup; then rc=$?; else rc=$?; fi; alive=0; if kill -0 "$pid" 2>/dev/null; then alive=1; fi; kill "$pid" 2>/dev/null || true; wait "$pid" 2>/dev/null || true; printf "invalid_epoch rc=%s alive=%s\n" "$rc" "$alive"'bash -lc 'source ".agents/scripts/stats-wrapper.sh"; tmpdir=$(mktemp -d); STATS_PIDFILE="$tmpdir/stats.pid"; STATS_LOGFILE="$tmpdir/stats.log"; STATS_TIMEOUT=2; sleep 30 & pid=$!; old_epoch=$(( $(date +%s) - 600 )); printf "%s %s\n" "$pid" "$old_epoch" >"$STATS_PIDFILE"; if check_stats_dedup; then rc=$?; else rc=$?; fi; alive=0; if kill -0 "$pid" 2>/dev/null; then alive=1; fi; kill "$pid" 2>/dev/null || true; wait "$pid" 2>/dev/null || true; printf "stale_epoch rc=%s alive=%s\n" "$rc" "$alive"'Closes #4231
Summary by CodeRabbit