t247: Reduce clean_exit_no_signal retries #1038

Merged
marcusquinn merged 1 commit into main from feature/t247
Feb 11, 2026

@marcusquinn
Owner

Summary

Reduces clean_exit_no_signal retries through a three-tier defense that detects approaching context exhaustion, preserves worker output, and eliminates unnecessary retries. This meta-task combines t247.1 and t247.2 to address the root cause of 16 retry incidents (all 16 eventually succeeded, but each retry wasted ~300s).

Problem

Workers that exhaust their context window exit cleanly (EXIT:0) without emitting completion signals. The supervisor sees:

  • No FULL_LOOP_COMPLETE or TASK_COMPLETE signal
  • No PR URL in log or DB
  • Clean exit code (0)

Result: retry:clean_exit_no_signal → full re-dispatch → ~300s wasted per retry.
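
To make the failure condition concrete, here is a minimal sketch of the three checks above. The function name `sketch_is_clean_exit_no_signal`, its parameters, and the `other` label are hypothetical and are not taken from supervisor-helper.sh; the PR URL pattern is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: decide whether a finished worker matches the
# "clean exit, no signal" pattern. Not the real evaluate_worker logic.
sketch_is_clean_exit_no_signal() {
    local exit_code="$1" log_file="$2" db_pr_url="${3:-}"

    # A completion signal in the log means this is not the problem case.
    if grep -Eq 'FULL_LOOP_COMPLETE|TASK_COMPLETE' "$log_file"; then
        echo "other"
        return 0
    fi

    # A PR URL in the DB or in the log also rules it out.
    # (Assumed URL shape: https://github.com/<owner>/<repo>/pull/<n>.)
    if [[ -n "$db_pr_url" ]] ||
        grep -Eq 'https://github\.com/[^/ ]+/[^/ ]+/pull/[0-9]+' "$log_file"; then
        echo "other"
        return 0
    fi

    # No signal, no PR: only a clean exit code yields the retry verdict.
    if [[ "$exit_code" -eq 0 ]]; then
        echo "retry:clean_exit_no_signal"
    else
        echo "other"
    fi
}
```

Calling `sketch_is_clean_exit_no_signal 0 worker.log` on a log with no signal and no PR URL prints `retry:clean_exit_no_signal`.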

Solution (Three-Tier Defense)

1. Context-Remaining Guard (t247.1 - PR #987)

Location: .agents/scripts/loop-common.sh, ralph-loop-helper.sh

Mechanism: Proactive detection before context exhaustion:

  • Iteration threshold: Triggers at 80% of max_iterations
  • Explicit markers: Regex match for "context limit" / "token limit" in output
  • Empty output: Tool produced <100 bytes after first iteration
  • Output shrinkage: Output drops below 20% of rolling average (3+ iterations baseline)
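
The four triggers above could be combined roughly as follows. This is a sketch under assumptions, not the code from PR #987: `sketch_context_guard_reason`, its parameters, and the reason labels are hypothetical, "after first iteration" is read as iteration > 1, and the caller is assumed to pass the rolling window of previous output sizes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the four detection triggers.
# Usage: sketch_context_guard_reason ITERATION MAX_ITERATIONS OUTPUT_FILE [PREV_SIZE...]
# Prints the trigger name and returns 0 when the guard should fire, else returns 1.
sketch_context_guard_reason() {
    local iteration="$1" max_iterations="$2" output_file="$3"
    shift 3
    local -a previous_sizes=("$@")   # byte sizes of earlier iteration outputs

    # 1. Iteration threshold: fire at 80% of max_iterations (integer math).
    if (( iteration * 100 >= max_iterations * 80 )); then
        echo "iteration_threshold"; return 0
    fi

    # 2. Explicit markers in the tool output (case-insensitive).
    if grep -Eiq 'context limit|token limit' "$output_file"; then
        echo "explicit_marker"; return 0
    fi

    local size
    size=$(wc -c < "$output_file")
    size=$(( size ))   # normalise padding that some wc implementations add

    # 3. Empty output: under 100 bytes once past the first iteration.
    if (( iteration > 1 && size < 100 )); then
        echo "empty_output"; return 0
    fi

    # 4. Output shrinkage: below 20% of the average, with a 3+ sample baseline.
    local count=${#previous_sizes[@]}
    if (( count >= 3 )); then
        local sum=0 s
        for s in "${previous_sizes[@]}"; do sum=$(( sum + s )); done
        # size < 0.2 * (sum / count) is equivalent to size * 5 * count < sum
        if (( size * 5 * count < sum )); then
            echo "output_shrinkage"; return 0
        fi
    fi

    return 1
}
```

The checks are ordered cheapest first, so the iteration counter short-circuits before any file is read.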

Action: When triggered:

  1. loop_emergency_push() — Commits and pushes any uncommitted work
  2. loop_emit_completion_signal() — Emits <promise>FULL_LOOP_COMPLETE</promise> to stdout
  3. Clean exit before context runs out

Result: Worker signals completion before dying; the supervisor sees FULL_LOOP_COMPLETE → no retry.
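
The three actions can be sketched as below. The `sketch_*` names and bodies are illustrative assumptions, not the merged loop_emergency_push() or loop_emit_completion_signal(); the commit message and the `origin` remote are also assumed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the guard's exit path.

# Commit any uncommitted work and push it; every step is best effort.
sketch_emergency_push() {
    local message="${1:-wip: emergency commit before context exhaustion}"

    # Nothing to do outside a git work tree (or when git is missing).
    git rev-parse --is-inside-work-tree >/dev/null 2>&1 || return 0

    # Commit only when the tree is dirty; keep git chatter off stdout
    # so it cannot be confused with the completion signal.
    if [[ -n "$(git status --porcelain)" ]]; then
        git add -A >&2
        git commit -q -m "$message" >&2 || return 1
    fi

    # A failed push must not block the completion signal.
    git push -q -u origin HEAD >/dev/null 2>&1 || echo "warning: push failed" >&2
    return 0
}

# Emit the marker the supervisor looks for on stdout.
sketch_emit_completion_signal() {
    printf '<promise>FULL_LOOP_COMPLETE</promise>\n'
}

# Push, signal, then exit cleanly before the context runs out.
sketch_guard_exit() {
    sketch_emergency_push || true
    sketch_emit_completion_signal
    exit 0
}
```

The signal is emitted even if the commit or push fails, so the supervisor still has a chance to recover the work through its git heuristic.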

2. Supervisor Auto-PR Creation (t247.2 - PR #988)

Location: .agents/scripts/supervisor-helper.sh:5303 (new function auto_create_pr_for_task())

Mechanism: Git heuristic Tier 2.5 enhancement (line 6134):

if [[ "$branch_commits" -gt 0 ]]; then
    if [[ -n "$meta_pr_url" ]]; then
        echo "complete:${meta_pr_url}"
    else
        # NEW: Auto-create PR instead of returning task_only
        auto_pr_url=$(auto_create_pr_for_task "$task_id" "$git_dir" "$task_branch" "$repo_slug_detect")
        if [[ -n "$auto_pr_url" ]]; then
            echo "complete:${auto_pr_url}"
        else
            echo "complete:task_only"  # Graceful fallback
        fi
    fi
fi

Action: When worker has commits on branch but no PR:

  1. Push branch to remote if needed
  2. Create draft PR via gh pr create with commit summary
  3. Persist URL via link_pr_to_task() (t232)
  4. Return complete:<PR_URL> → normal PR lifecycle (review, merge, deploy)

Result: Preserves worker output for review instead of retrying. Saves ~300s per occurrence.
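
A rough sketch of the four steps above follows. It is not the merged auto_create_pr_for_task(): the body, the `main` base branch, the PR title, and the argument order passed to link_pr_to_task are all assumptions. Only `gh pr create` with `--repo`, `--head`, `--base`, `--draft`, `--title`, and `--body` is taken from the gh CLI itself.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: create a draft PR for a branch that has commits but no PR.
# Prints the PR URL on success and nothing on failure, so the caller can
# fall back to "complete:task_only".
sketch_auto_create_pr() {
    local task_id="$1" git_dir="$2" branch="$3" repo_slug="$4"

    # Graceful fallback when gh is unavailable.
    command -v gh >/dev/null 2>&1 || return 0

    # 1. Push the branch to the remote (assumed to be "origin").
    git -C "$git_dir" push -q -u origin "$branch" >/dev/null 2>&1 || return 0

    # 2. Build a commit summary and create the draft PR.
    local summary pr_url
    summary=$(git -C "$git_dir" log --oneline "main..${branch}" 2>/dev/null || true)
    pr_url=$(gh pr create --repo "$repo_slug" --head "$branch" --base main \
        --draft --title "${task_id}: auto-created PR" \
        --body "$summary" 2>/dev/null || true)
    [[ -n "$pr_url" ]] || return 0

    # 3. Persist the URL if the t232 helper is loaded (argument order assumed).
    if declare -F link_pr_to_task >/dev/null; then
        link_pr_to_task "$task_id" "$pr_url" >/dev/null || true
    fi

    # 4. Hand the URL back to the caller.
    printf '%s\n' "$pr_url"
}
```

Every failure path prints nothing and returns 0, which matches the `[[ -n "$auto_pr_url" ]]` check in the Tier 2.5 snippet.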

3. Git Heuristic Coverage (Already Comprehensive)

Location: .agents/scripts/supervisor-helper.sh:6093-6153 (Tier 2.5)

Current Coverage:

  • ✅ Commits + PR URL → complete:<PR_URL>
  • ✅ Commits + no PR → Auto-create PR (NEW in t247.2) → complete:<PR_URL>
  • ✅ No commits + uncommitted changes → retry:work_in_progress
  • ✅ No commits + no changes → Falls through to AI eval (appropriate)

Result: Catches all concrete evidence of work before expensive AI eval. Only truly ambiguous cases reach Tier 3.
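
The coverage matrix above reduces to a small decision function. This sketch is illustrative only: `sketch_git_heuristic`, its parameters, and the `auto_create_pr` and `fallthrough:ai_eval` labels are hypothetical stand-ins for what Tier 2.5 does at each branch.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the Tier 2.5 decision matrix.
# Usage: sketch_git_heuristic BRANCH_COMMITS PR_URL HAS_UNCOMMITTED(true|false)
sketch_git_heuristic() {
    local branch_commits="$1" pr_url="$2" has_uncommitted="$3"

    if (( branch_commits > 0 )); then
        if [[ -n "$pr_url" ]]; then
            echo "complete:${pr_url}"        # commits + PR URL
        else
            echo "auto_create_pr"            # commits + no PR (t247.2 path)
        fi
    elif [[ "$has_uncommitted" == "true" ]]; then
        echo "retry:work_in_progress"        # no commits + uncommitted changes
    else
        echo "fallthrough:ai_eval"           # no evidence: defer to Tier 3
    fi
}
```

For example, `sketch_git_heuristic 0 "" true` prints `retry:work_in_progress`.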

Flow Comparison

Before (t247)

Worker exhausts context → exits cleanly (EXIT:0)
→ No signal, no PR in log
→ evaluate_worker: "retry:clean_exit_no_signal"
→ Full re-dispatch (~300s)
→ Retry succeeds (100% success rate, but wasteful)

After (t247.1 + t247.2)

Worker approaching context limit
→ loop_context_guard() triggers
→ Commits work, pushes, emits FULL_LOOP_COMPLETE
→ evaluate_worker: "complete:<PR_URL>"
→ Normal PR lifecycle (no retry)

OR (if guard missed):

Worker exhausts context → exits cleanly
→ No signal, but commits exist on branch
→ evaluate_worker Tier 2.5: auto_create_pr_for_task()
→ Draft PR created, URL persisted
→ evaluate_worker: "complete:<PR_URL>"
→ Normal PR lifecycle (no retry)

Impact

  • Retry reduction: 16 incidents → 0 expected (context guard prevents, auto-PR catches stragglers)
  • Time saved: ~300s per avoided retry × 16 incidents = ~80 minutes saved
  • Worker efficiency: Uncommitted work preserved instead of lost
  • Supervisor efficiency: Fewer AI eval calls (Tier 2.5 resolves most cases)
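
The time-saved figure can be checked with shell arithmetic, using the incident count and the per-retry cost quoted above (variable names are illustrative):

```shell
#!/usr/bin/env bash
# Check: 16 avoided retries at ~300s each.
incidents=16
seconds_per_retry=300
total_seconds=$(( incidents * seconds_per_retry ))
total_minutes=$(( total_seconds / 60 ))
echo "${total_seconds}s = ${total_minutes} min"   # prints "4800s = 80 min"
```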

Testing

  • t247.1: ShellCheck zero violations, bash syntax verified, backward compatible
  • t247.2: ShellCheck zero violations, graceful fallback if gh unavailable
  • Integration: Both PRs merged and deployed, no regressions observed

Related

  • t175: Git heuristic signals (foundation for Tier 2.5)
  • t232: Centralized PR-to-task linking (used by auto-PR)
  • t171: PR URL seeding from DB (reduces gh API calls)
  • t161: PR discovery fallback (validates PR belongs to task)

Task

Closes t247. Combines t247.1 (PR #987) and t247.2 (PR #988).

@github-actions

🔍 Code Quality Report

[MONITOR] Code Review Monitoring Report

[INFO] Latest Quality Status:
SonarCloud: 0 bugs, 0 vulnerabilities, 46 code smells

[INFO] Recent monitoring activity:
Wed Feb 11 00:46:57 UTC 2026: Code review monitoring started
Wed Feb 11 00:46:58 UTC 2026: SonarCloud - Bugs: 0, Vulnerabilities: 0, Code Smells: 46

📈 Current Quality Metrics

  • BUGS: 0
  • CODE SMELLS: 46
  • VULNERABILITIES: 0

Generated on: Wed Feb 11 00:47:01 UTC 2026


Generated by AI DevOps Framework Code Review Monitoring

@marcusquinn marcusquinn merged commit 2b04d95 into main Feb 11, 2026
10 checks passed
@marcusquinn marcusquinn deleted the feature/t247 branch February 11, 2026 00:48