Conversation
Automated sync from stranske/Workflows. Template hash: 0331641bc992. Changes synced from sync-manifest.yml.
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
🤖 Keepalive Loop Status
PR #140 | Agent: Codex | Iteration 0/5

Current State

🔍 Failure Classification
- Error type: infrastructure
- Status: ✅ no new diagnostics
Pull request overview
This PR syncs workflow templates from the central stranske/Workflows repository, adding quality metrics and evidence-based decision making to the keepalive loop system. The changes introduce BS detection capabilities, enhanced productivity tracking with multi-signal analysis, and early stopping logic based on diminishing returns patterns.
Key changes:
- Added quality metric parameters (raw confidence, quality warnings, data quality, effort score, analysis text length) to track analysis session health
- Enhanced productivity scoring system that considers file changes, task completion trends, and historical activity
- Implemented early stopping logic to detect diminishing returns patterns and prevent unnecessary iterations
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| .github/workflows/agents-keepalive-loop.yml | Added 5 new quality metric parameters passed to the keepalive loop summary script for BS detection and evidence-based decisions |
| .github/scripts/keepalive_loop.js | Implemented multi-signal productivity scoring, diminishing returns detection, quality metrics display with warnings, and data loss alerting |
```js
if (lastFilesChanged > 0) productivityScore += Math.min(40, lastFilesChanged * 10);
if (tasksCompletedSinceLastRound > 0) productivityScore += Math.min(40, tasksCompletedSinceLastRound * 20);
if (prevFilesChanged > 0 && iteration > 1) productivityScore += 10; // Recent historical activity
if (!hasRecentFailures) productivityScore += 10; // No failures is a positive signal
```
The productivity score is documented as 0-100, but the code never enforces that ceiling. With the current weights the maximum is exactly 100 (40 for file changes via Math.min(40, lastFilesChanged * 10), 40 for completed tasks via Math.min(40, tasksCompletedSinceLastRound * 20), 10 for historical activity, 10 for no failures), so the range holds only by coincidence of the chosen weights: any future adjustment to a weight or a new additive signal could silently push the score past 100. Consider capping the final score with Math.min(100, productivityScore) after all additions so the documented range is enforced explicitly.
Suggested change:

```js
if (!hasRecentFailures) productivityScore += 10; // No failures is a positive signal
// Enforce the documented 0-100 range for the productivity score
productivityScore = Math.min(100, productivityScore);
```
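The scoring logic plus the suggested cap can be checked in isolation; a minimal sketch (the helper name `computeProductivityScore` is illustrative, not part of this PR — variable names mirror the snippet above):

```js
// Multi-signal productivity score with the suggested 0-100 cap
// applied after all additive signals.
function computeProductivityScore({
  lastFilesChanged,
  tasksCompletedSinceLastRound,
  prevFilesChanged,
  iteration,
  hasRecentFailures,
}) {
  let score = 0;
  if (lastFilesChanged > 0) score += Math.min(40, lastFilesChanged * 10);
  if (tasksCompletedSinceLastRound > 0) score += Math.min(40, tasksCompletedSinceLastRound * 20);
  if (prevFilesChanged > 0 && iteration > 1) score += 10; // recent historical activity
  if (!hasRecentFailures) score += 10; // no failures is a positive signal
  return Math.min(100, score); // enforce the documented 0-100 range
}
```

With all four signals maxed out the result is exactly 100, and the final `Math.min` keeps it there even if the weights are later increased.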
```js
// Continue past max if productive work is happening
// But stop earlier if we detect diminishing returns pattern
const shouldStopForMaxIterations = iteration >= maxIterations && !isProductive;
const shouldStopEarly = diminishingReturns && iteration >= Math.ceil(maxIterations * 0.6);
```
The early stopping condition for diminishing returns uses Math.ceil(maxIterations * 0.6) which means it can trigger at 60% of max iterations. However, this doesn't account for the case where maxIterations is very small (e.g., 1 or 2). When maxIterations is 1, this evaluates to Math.ceil(0.6) = 1, which means it could stop on the first iteration if there's a diminishing returns pattern. Consider adding a minimum iteration threshold (e.g., iteration >= Math.max(2, Math.ceil(maxIterations * 0.6))) to prevent premature stopping.
Suggested change:

```js
const shouldStopEarly =
  diminishingReturns &&
  iteration >= Math.max(2, Math.ceil(maxIterations * 0.6));
```
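The effect of the suggested floor can be tabulated with a tiny helper (the name `earlyStopThreshold` is hypothetical, introduced only to exercise the formula):

```js
// Earliest iteration at which the diminishing-returns early stop may
// trigger, with the suggested minimum of 2 iterations.
function earlyStopThreshold(maxIterations) {
  return Math.max(2, Math.ceil(maxIterations * 0.6));
}
```

For `maxIterations` of 1 or 2 the threshold is 2 (so the loop can never stop on iteration 1), while for 5 it is 3 and for 10 it is 6 — matching the original 60% behavior for larger budgets.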
```js
}

// Analysis data health check
if (analysisTextLength > 0 && analysisTextLength < 200 && agentFilesChanged > 0) {
```
The data loss alert uses a hardcoded threshold of 200 characters, but this threshold isn't documented or justified. Different types of analysis might naturally have different text lengths, and 200 characters seems arbitrary. Consider making this threshold configurable or adding a comment explaining why 200 characters is the chosen threshold for detecting data loss.
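One way to make the threshold configurable is an environment override that falls back to the current value; a sketch, assuming an environment-variable approach (the `KEEPALIVE_MIN_ANALYSIS_CHARS` name is hypothetical, not part of this PR):

```js
// Hypothetical configurable threshold for the data-loss check.
// 200 characters (roughly two short sentences of analysis text)
// remains the default when no override is provided.
const DEFAULT_MIN_ANALYSIS_CHARS = 200;

function minAnalysisChars(env = process.env) {
  const raw = Number.parseInt(env.KEEPALIVE_MIN_ANALYSIS_CHARS ?? '', 10);
  return Number.isFinite(raw) && raw > 0 ? raw : DEFAULT_MIN_ANALYSIS_CHARS;
}
```

The health check would then read `analysisTextLength < minAnalysisChars()`, and invalid or missing overrides fall back to the documented default.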
```js
// Quality metrics for BS detection and evidence-based decisions
const llmRawConfidence = toNumber(inputs.llm_raw_confidence ?? inputs.llmRawConfidence, llmConfidence);
const llmConfidenceAdjusted = toBool(inputs.llm_confidence_adjusted ?? inputs.llmConfidenceAdjusted, false);
```
The llmConfidenceAdjusted parameter is extracted from inputs here but is never passed from the workflow file. The workflow only passes quality-related outputs like llm_raw_confidence, llm_quality_warnings, etc., but doesn't include llm_confidence_adjusted or llm-confidence-adjusted. This means the confidence adjustment warning at line 1289 will never be displayed since llmConfidenceAdjusted will always be false. Either add the missing input parameter in the workflow file or remove this unused variable.
Suggested change:

```js
const llmConfidenceAdjusted =
  Number.isFinite(llmRawConfidence) &&
  Number.isFinite(llmConfidence) &&
  llmRawConfidence !== llmConfidence;
```
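The derived flag can be sanity-checked standalone; a minimal sketch of the suggestion's logic (the function name `confidenceAdjusted` is illustrative):

```js
// Derive the "adjusted" flag by comparing the raw and final confidence
// values, instead of relying on a workflow input that is never passed.
function confidenceAdjusted(rawConfidence, finalConfidence) {
  return (
    Number.isFinite(rawConfidence) &&
    Number.isFinite(finalConfidence) &&
    rawConfidence !== finalConfidence
  );
}
```

The `Number.isFinite` guards ensure a missing or unparseable value (NaN) never reports a spurious adjustment.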
```js
// Parse warnings (could be JSON array or comma-separated)
let warnings = [];
try {
  warnings = JSON.parse(llmQualityWarnings);
} catch {
  warnings = llmQualityWarnings.split(';').filter(w => w.trim());
}
for (const warning of warnings) {
  summaryLines.push(`- ⚠️ ${warning.trim()}`);
```
The warning parsing logic assumes warnings are semicolon-delimited when JSON parsing fails, but there's no guarantee that the warnings string format uses semicolons. If warnings come in a different format (comma-separated, newline-separated, or plain text), this will fail to parse them correctly. Consider adding more robust parsing that handles multiple common delimiters or documenting the expected format more explicitly.
Suggested change:

```js
// Parse warnings (could be JSON array or delimited string)
let warnings = [];
let rawWarnings = String(llmQualityWarnings);
try {
  const parsed = JSON.parse(llmQualityWarnings);
  if (Array.isArray(parsed)) {
    warnings = parsed;
  } else if (typeof parsed === 'string') {
    rawWarnings = parsed;
  } else if (parsed && typeof parsed === 'object') {
    rawWarnings = Object.values(parsed).join('\n');
  }
} catch {
  // Fall back to treating llmQualityWarnings as a raw delimited string
}
if (!warnings.length && rawWarnings) {
  // Support multiple common delimiters: semicolon, comma, newline
  warnings = String(rawWarnings)
    .split(/[\n;,]/)
    .map(w => w.trim())
    .filter(Boolean);
}
for (const warning of warnings) {
  summaryLines.push(`- ⚠️ ${String(warning).trim()}`);
```
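A self-contained version of the suggested parsing logic, for quick verification of the JSON and delimited-string fallback paths (the function name `parseWarnings` is illustrative):

```js
// Parse a warnings payload that may be a JSON array, a JSON string,
// a JSON object, or a plain delimited string (semicolon/comma/newline).
function parseWarnings(llmQualityWarnings) {
  let warnings = [];
  let rawWarnings = String(llmQualityWarnings);
  try {
    const parsed = JSON.parse(llmQualityWarnings);
    if (Array.isArray(parsed)) {
      warnings = parsed;
    } else if (typeof parsed === 'string') {
      rawWarnings = parsed;
    } else if (parsed && typeof parsed === 'object') {
      rawWarnings = Object.values(parsed).join('\n');
    }
  } catch {
    // fall back to treating the input as a raw delimited string
  }
  if (!warnings.length && rawWarnings) {
    warnings = rawWarnings
      .split(/[\n;,]/)
      .map(w => w.trim())
      .filter(Boolean);
  }
  return warnings.map(w => String(w).trim());
}
```

A JSON array passes through unchanged, while `'low confidence; short analysis'` and newline-separated plain text both split into individual warnings.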
Sync Summary
Files Updated
Files Skipped
Review Checklist
Source: stranske/Workflows
Manifest: .github/sync-manifest.yml