@marcusquinn marcusquinn commented Feb 7, 2026

Summary

  • Workers launched by cron pulse were dying after ~1-2 minutes because background subshells get killed when the parent cron script exits. Fixed with nohup + disown.
  • Dispatch loop had SC2319 bug: $? after if cmd_dispatch captured the if-condition exit code, not the actual dispatch exit code. Refactored to cmd || exit=$? pattern.
  • Added pulse-level health check flag so the 8-second health probe only runs once per pulse invocation, not once per task.
  • Dispatch failures are now logged instead of swallowed by 2>/dev/null.
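The SC2319 item can be reproduced in isolation. The sketch below is illustrative only: `fake_dispatch` is a hypothetical stand-in for `cmd_dispatch`, and the negated `if` form is an assumption about how the original condition was written, since the PR does not quote it.

```shell
#!/usr/bin/env bash
# fake_dispatch is a hypothetical stand-in for cmd_dispatch; it always
# fails with exit code 2 (used here to mean "concurrency limit").
fake_dispatch() { return 2; }

# Buggy pattern (what SC2319 warns about): inside the branch, $? holds
# the status of the condition `! fake_dispatch`, which is 0, not 2.
buggy_exit=""
if ! fake_dispatch; then
    buggy_exit=$?
fi

# Fixed pattern from this PR's description: cmd || exit=$?
fixed_exit=0
fake_dispatch || fixed_exit=$?

echo "buggy=$buggy_exit fixed=$fixed_exit"   # prints: buggy=0 fixed=2
```

With the buggy form, every failure looks like exit code 0 to any later `case` on the captured value, so the caller cannot tell a concurrency limit from an unavailable provider.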

Before: 23 queued, 0 running, 0 dispatched (complete deadlock)
After: 21 queued, 3 running, 8 deployed

Resolves dispatch deadlock in batch quality-hardening-8h.

Summary by CodeRabbit

  • Chores
    • Internal infrastructure optimizations to improve system reliability and performance under scheduled execution environments.

…eadlock

- Use nohup + disown for worker processes so they survive parent (cron) exit
- Fix SC2319: capture cmd_dispatch exit code correctly (was using $? after if)
- Add pulse-level health check caching to avoid 8s probe per task per pulse
- Log dispatch failures instead of swallowing with 2>/dev/null
- Apply same nohup fix to cmd_reprompt worker launch

Root cause: workers launched as background subshells were killed when the
2-minute cron pulse script exited, causing all tasks to die at ~1-2 minutes.
The dispatch loop also had a bug where $? captured the if-condition exit
code instead of cmd_dispatch's actual exit code.
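
A minimal sketch of the nohup + disown launch described above, under stated assumptions: `launch_detached` and `LAUNCHED_PID` are hypothetical names for illustration, not identifiers from supervisor-helper.sh, and the worker command is a placeholder.

```shell
#!/usr/bin/env bash
# launch_detached LOG_FILE CMD [ARGS...]
# Hypothetical helper: starts CMD in the background so that it ignores
# SIGHUP (nohup) and is removed from this shell's job table (disown).
launch_detached() {
    local log_file="$1"
    shift
    # stdin comes from /dev/null so nohup has no terminal to complain about.
    nohup "$@" </dev/null >"$log_file" 2>&1 &
    LAUNCHED_PID=$!
    # disown is a bash builtin; tolerate shells that lack it.
    disown "$LAUNCHED_PID" 2>/dev/null || true
}

# Placeholder worker: sleeps briefly, then writes a marker line.
log_file="$(mktemp)"
launch_detached "$log_file" sh -c 'sleep 1; echo "worker finished"'
echo "launched worker pid=$LAUNCHED_PID"
```

The function sets a global instead of echoing the PID so that callers do not need a command substitution, which would run the launch inside a subshell.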
@marcusquinn marcusquinn merged commit 5901e1d into main Feb 7, 2026
5 of 6 checks passed
@gemini-code-assist

Summary of Changes

Hello @marcusquinn, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses critical stability and performance issues within the supervisor script. It resolves a deadlock in the task dispatching mechanism and prevents premature termination of worker processes launched by cron. Additionally, it optimizes model health checks to reduce overhead and enhances error logging for better debugging. These changes significantly improve the reliability and efficiency of task execution, as demonstrated by the transition from a complete deadlock to successful task deployment.

Highlights

  • Worker Process Longevity: Implemented nohup and disown for background worker processes launched by cron, preventing them from being terminated prematurely when the parent cron script exits.
  • Dispatch Deadlock Resolution: Corrected a shell scripting bug (SC2319) where the exit code of cmd_dispatch was not correctly captured, leading to a dispatch deadlock. The logic now accurately captures and handles dispatch outcomes, including specific logging for concurrency limits and unavailable providers.
  • Optimized Health Checks: Introduced a pulse-level health verification flag (_PULSE_HEALTH_VERIFIED) to ensure that the 8-second model health probe runs only once per supervisor pulse invocation, rather than for every individual task, significantly reducing overhead.
  • Improved Error Logging: Dispatch failures are no longer silently suppressed (2>/dev/null) and are now explicitly logged, providing better visibility into issues and aiding in debugging.
Changelog
  • .agents/scripts/supervisor-helper.sh
    • Implemented nohup and disown for background worker processes to ensure they persist after the parent cron script exits.
    • Refactored cmd_dispatch calls to correctly capture and handle its exit status, resolving a dispatch deadlock.
    • Introduced _PULSE_HEALTH_VERIFIED flag to optimize model health checks, preventing redundant 8-second probes within a single pulse.
    • Removed 2>/dev/null from cmd_dispatch calls, enabling logging of dispatch failures.
    • Added specific log messages for concurrency limits, unavailable providers, and general dispatch failures.
  • TODO.md
    • Updated the BLOCKED status for task t140 to include ambiguous_ai_unavailable, reflecting a change in dispatch failure reasons.
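
The pulse-level flag can be sketched as follows. Only `_PULSE_HEALTH_VERIFIED` and `check_model_health` are names taken from this PR; `probe_provider` and `PROBE_COUNT` are hypothetical, and the real probe is replaced by a counter so the example runs instantly.

```shell
#!/usr/bin/env bash
# Reset at the start of every pulse (the PR resets the flag at pulse init).
_PULSE_HEALTH_VERIFIED=""
PROBE_COUNT=0   # illustration only: counts how often the probe runs

# Hypothetical stand-in for the 8-second provider health probe.
probe_provider() {
    PROBE_COUNT=$((PROBE_COUNT + 1))
    return 0
}

check_model_health() {
    # Fast path: a probe already succeeded during this pulse.
    if [ -n "$_PULSE_HEALTH_VERIFIED" ]; then
        return 0
    fi
    probe_provider || return 1
    _PULSE_HEALTH_VERIFIED="true"
    return 0
}

# Three tasks in one pulse trigger a single probe.
for task in t1 t2 t3; do
    check_model_health || echo "health check failed before $task"
done
echo "probes run: $PROBE_COUNT"   # prints: probes run: 1
```

The flag only persists if `check_model_health` runs in the current shell; calling it inside `$(...)` or a pipeline would set the flag in a subshell and lose it.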
Activity
  • No human activity has been recorded for this pull request yet.
Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

coderabbitai bot commented Feb 7, 2026

Caution

Review failed

The pull request is closed.

Walkthrough

This PR optimizes supervisor health checks using a pulse-scoped cache flag to prevent redundant probes within single pulses, replaces direct background task invocation with nohup and disown for robust process lifecycle management, and introduces explicit exit-code handling in dispatch loops to properly respond to concurrency limits and provider unavailability.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Supervisor health & dispatch optimization: .agents/scripts/supervisor-helper.sh | Introduces _PULSE_HEALTH_VERIFIED flag for per-pulse health check caching to eliminate redundant probes; replaces direct background task invocation with nohup bash -c ... + disown for robust PID handling; adds explicit exit-code handling in dispatch loops (0=success, 2=concurrency limit, 3=provider unavailable) with appropriate control flow; resets cache flag at pulse initialization. |
| Backlog documentation update: TODO.md | Appends additional BLOCKED reason to item t140 noting an observed error mode ambiguous_ai_unavailable alongside existing backend_infrastructure_error. |
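
The exit-code handling summarized above (0=success, 2=concurrency limit, 3=provider unavailable) might look roughly like this. It is a sketch, not the code from supervisor-helper.sh: `fake_dispatch`, `dispatch_all`, and `DISPATCHED` are hypothetical, and continuing after a generic failure is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for cmd_dispatch: exit status depends on the task id.
fake_dispatch() {
    case "$1" in
        ok*)   return 0 ;;   # dispatched
        full*) return 2 ;;   # concurrency limit
        down*) return 3 ;;   # provider unavailable
        *)     return 1 ;;   # any other failure
    esac
}

dispatch_all() {
    local task dispatch_exit
    DISPATCHED=0
    for task in "$@"; do
        dispatch_exit=0
        # cmd || var=$? keeps the real exit code (avoids SC2319).
        fake_dispatch "$task" || dispatch_exit=$?
        case "$dispatch_exit" in
            0) DISPATCHED=$((DISPATCHED + 1)) ;;
            2) echo "concurrency limit reached at $task, stopping"; break ;;
            3) echo "provider unavailable at $task, stopping"; break ;;
            *) echo "dispatch failed for $task (exit $dispatch_exit)" ;;
        esac
    done
}

dispatch_all ok1 bad1 ok2 full1 ok3
echo "dispatched=$DISPATCHED"   # prints: dispatched=2
```

Codes 2 and 3 break out of the loop because retrying further tasks in the same pulse cannot succeed; any other failure is logged and the loop moves on to the next task.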

Sequence Diagram(s)

sequenceDiagram
    participant Pulse as Pulse Executor
    participant Health as Health Checker
    participant Dispatch as Dispatch Handler
    participant Worker as Background Worker
    participant Provider as Model Provider

    Pulse->>Pulse: Reset _PULSE_HEALTH_VERIFIED flag
    Pulse->>Health: check_model_health()
    
    alt _PULSE_HEALTH_VERIFIED set
        Health-->>Pulse: Return success (cached)
    else Health not verified
        Health->>Health: Check cache (8-second probe)
        alt Cache miss or stale
            Health->>Provider: Probe health endpoint
            Provider-->>Health: Health status
        end
        Health->>Health: Set _PULSE_HEALTH_VERIFIED
        Health-->>Pulse: Return success/failure
    end
    
    alt Health check passed
        Pulse->>Dispatch: Execute dispatch loop
        Dispatch->>Dispatch: Start nohup bash -c worker command
        Dispatch->>Dispatch: Capture dispatch_exit code
        
        alt dispatch_exit == 0
            Dispatch->>Dispatch: Success, continue
        else dispatch_exit == 2
            Dispatch->>Dispatch: Concurrency limit, stop
        else dispatch_exit == 3
            Dispatch->>Dispatch: Provider unavailable, stop
        else other
            Dispatch->>Dispatch: Dispatch failure, log & handle
        end
        
        Dispatch->>Worker: disown process (background survival)
        Worker-->>Pulse: Process runs independently
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🧬 Pulse-Scoped Serenity

Health checks cached within each beat,
No probes repeated, crisp and sweet.
Background workers now disown with grace,
Exit codes guide the dispatch race.
Cron survives what once would fall—
Robust and ready, standing tall. 🚀

github-actions bot commented Feb 7, 2026

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 25 code smells

[INFO] Recent monitoring activity:
Sat Feb 7 14:39:02 UTC 2026: Code review monitoring started
Sat Feb 7 14:39:03 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 25
Sat Feb 7 14:39:03 UTC 2026: Qlty - 0 issues found, auto-formatting applied
Sat Feb 7 14:39:05 UTC 2026: Codacy analysis completed with auto-fixes

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 25
  • VULNERABILITIES: 0

Generated on: Sat Feb 7 14:39:08 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces several important fixes and improvements to the supervisor script. The use of nohup and disown to prevent worker processes from dying when launched from cron is a solid solution. The refactoring of the dispatch loop to correctly capture exit codes and improve logging is a significant enhancement to the script's robustness and debuggability. Additionally, the introduction of a pulse-level health check flag is a smart optimization that will reduce unnecessary overhead. The code changes are well-implemented, and there is one suggestion regarding code duplication to improve maintainability.

Comment on lines 1969 to 1977
  # Also start background process as fallback (Tabby may not support OSC 1337)
- (cd "$worktree_path" && "${cmd_parts[@]}" > "$log_file" 2>&1; echo "EXIT:$?" >> "$log_file") &
+ # Use nohup + disown to survive parent (cron) exit
+ nohup bash -c "cd '${worktree_path}' && $(printf '%q ' "${cmd_parts[@]}") > '${log_file}' 2>&1; echo \"EXIT:\$?\" >> '${log_file}'" &>/dev/null &
  else
  # Headless: background process
- (cd "$worktree_path" && "${cmd_parts[@]}" > "$log_file" 2>&1; echo "EXIT:$?" >> "$log_file") &
+ # Use nohup + disown to survive parent (cron) exit — without this,
+ # workers die after ~2 minutes when the cron pulse script exits
+ nohup bash -c "cd '${worktree_path}' && $(printf '%q ' "${cmd_parts[@]}") > '${log_file}' 2>&1; echo \"EXIT:\$?\" >> '${log_file}'" &>/dev/null &
  fi

medium

The command to launch the background worker is duplicated in both the if (line 1971) and else (line 1976) branches. A very similar command is also used in the cmd_reprompt function (line 2560). This makes it harder to maintain if the dispatch logic needs to be changed in the future.

Consider extracting this complex command into a local helper function to avoid repetition and improve readability.

For example, you could create a helper function like this:

_dispatch_background_worker() {
    local work_path="$1"
    local log_path="$2"
    shift 2
    local -a command_parts=("$@")

    nohup bash -c "cd '${work_path}' && $(printf '%q ' "${command_parts[@]}") > '${log_path}' 2>&1; echo \"EXIT:\$?\" >> '${log_path}'" &>/dev/null &
}

And then call it from both branches, which would simplify the main function's logic.
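
One possible variant of the helper, offered as a sketch and not as what was merged: it passes the working directory, log path, and command to `bash -c` as positional arguments instead of interpolating them into the command string, so paths containing single quotes or spaces need no manual escaping. The function name is reused from the suggestion above; the `disown` call and the demo invocation are assumptions.

```shell
#!/usr/bin/env bash
# _dispatch_background_worker WORK_PATH LOG_PATH CMD [ARGS...]
_dispatch_background_worker() {
    local work_path="$1"
    local log_path="$2"
    shift 2
    # The inner script is a fixed single-quoted string; all variable data
    # arrives as $1, $2 and "$@", so no quoting of paths is required.
    nohup bash -c '
        work_path="$1"; log_path="$2"; shift 2
        cd "$work_path" && "$@" > "$log_path" 2>&1
        echo "EXIT:$?" >> "$log_path"
    ' _ "$work_path" "$log_path" "$@" </dev/null >/dev/null 2>&1 &
    # Remove the job from this shell so it is not tied to the parent.
    disown "$!" 2>/dev/null || true
}

# Demo with a placeholder worker that prints a line and exits with 7.
demo_dir="$(mktemp -d)"
demo_log="$demo_dir/worker.log"
_dispatch_background_worker "$demo_dir" "$demo_log" sh -c 'echo hi; exit 7'
```

After the worker finishes, the log should contain the worker's output followed by an `EXIT:7` line, matching the `EXIT:$?` convention used in the diff.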
