Merged
7 changes: 5 additions & 2 deletions scripts/langchain/capability_check.py
@@ -3,7 +3,8 @@
Capability classification for agent issue intake.

Run with:
python scripts/langchain/capability_check.py --tasks-file tasks.md --acceptance-file acceptance.md
python scripts/langchain/capability_check.py \
--tasks-file tasks.md --acceptance-file acceptance.md
"""

from __future__ import annotations
@@ -240,7 +241,9 @@ def _fallback_classify(
{
"task": task,
"reason": "Requires external service credentials or configuration",
"suggested_action": "Provide credentials or have a human set up the external service.",
"suggested_action": (
"Provide credentials or have a human set up " "the external service."
Copilot AI Jan 12, 2026

Adjacent string literals are concatenated at compile time, so "Provide credentials or have a human set up " "the external service." already produces the intended sentence "Provide credentials or have a human set up the external service." The concern is purely stylistic: the wrap splits the phrase "set up the external service" mid-phrase. If the line fits, merging the two literals into one is clearer.

Suggested change
"Provide credentials or have a human set up " "the external service."
"Provide credentials or have a human set up the external service."

),
}
)
human_actions.append(f"External dependency setup required: {task}")
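The review comment above turns on Python's implicit concatenation of adjacent string literals. A minimal sketch (message text taken from the diff) showing that the wrapped and single-literal forms produce the same string:

```python
# Adjacent string literals are joined at compile time, so wrapping a long
# message across literals does not change the resulting string.
wrapped = (
    "Provide credentials or have a human set up "
    "the external service."
)
single = "Provide credentials or have a human set up the external service."
assert wrapped == single
```

The choice between the two forms is therefore about readability of the source, not about the runtime value.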
23 changes: 16 additions & 7 deletions scripts/langchain/followup_issue_generator.py
@@ -69,7 +69,8 @@
Analyze what went wrong and what SPECIFICALLY needs to change. Focus on:

1. **Which original acceptance criteria are actually still unmet?**
- Don't assume all criteria need rework. Only include criteria that the verification shows are genuinely incomplete.
- Don't assume all criteria need rework. Only include criteria that the
verification shows are genuinely incomplete.
- Rewrite criteria that were unclear or unmeasurable as clear, testable statements.

2. **What concrete code changes are needed?**
@@ -79,7 +80,8 @@

3. **Did previous iterations reveal blockers the next agent should avoid?**
- Only include if there's specific, detailed information about what didn't work and why
- "Iteration 3 failed" is NOT useful. "Iteration 3 attempted X approach but failed because Y, so try Z instead" IS useful
- "Iteration 3 failed" is NOT useful. "Iteration 3 attempted X approach but
failed because Y, so try Z instead" IS useful

Output JSON:
{{
@@ -115,11 +117,13 @@
- Sized appropriately: not too big ("fix everything") or too small ("add a comma")

**Tasks MUST NOT be:**
- Verification concerns restated as tasks (e.g., "The safety rules section is incomplete" is NOT a task)
- Verification concerns restated as tasks (e.g., "The safety rules section
is incomplete" is NOT a task)
- Original acceptance criteria restated as tasks
- Vague actions like "improve", "ensure", "address concerns about"

**Deferred items:** Anything requiring credentials, external APIs, manual testing, or human decisions
**Deferred items:** Anything requiring credentials, external APIs, manual
testing, or human decisions

Output JSON:
{{
@@ -217,11 +221,13 @@
```

## Critical Rules
1. Do NOT include "Remaining Unchecked Items" or "Iteration Details" sections unless they contain specific, useful failure context
1. Do NOT include "Remaining Unchecked Items" or "Iteration Details" sections
unless they contain specific, useful failure context
2. Tasks should be concrete actions, not verification concerns restated
3. Acceptance criteria must be testable (not "all concerns addressed")
4. Keep the main body focused - hide background/history in the collapsible section
5. Do NOT include the entire analysis object - only include specific failure contexts from `blockers_to_avoid`
5. Do NOT include the entire analysis object - only include specific failure
contexts from `blockers_to_avoid`

Output the complete markdown issue body.
""".strip()
@@ -548,7 +554,10 @@ def _prepare_iteration_details(codex_log: str) -> str:
useful_lines.append(context_block)

if not useful_lines:
return "Previous iterations completed without recorded failures. No specific blockers to avoid."
return (
"Previous iterations completed without recorded failures. "
"No specific blockers to avoid."
)

# Deduplicate and limit length
unique_blocks = list(dict.fromkeys(useful_lines))[:5] # Max 5 failure contexts
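The `dict.fromkeys` idiom in the hunk above deduplicates while preserving first-seen order (dicts keep insertion order in Python 3.7+), and the slice caps the result at five entries. A small sketch with made-up failure blocks:

```python
# dict.fromkeys keeps only the first occurrence of each block, in order;
# the [:5] slice then limits how many failure contexts are kept.
useful_lines = ["err A", "err B", "err A", "err C", "err B", "err D", "err E", "err F"]
unique_blocks = list(dict.fromkeys(useful_lines))[:5]
assert unique_blocks == ["err A", "err B", "err C", "err D", "err E"]
```

Compared with `set()`, this keeps the earliest occurrence and the original ordering, which matters when the blocks are chronological iteration logs.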
3 changes: 2 additions & 1 deletion scripts/langchain/issue_optimizer.py
@@ -792,7 +792,8 @@ def apply_suggestions(
# If GitHub Models hit token limit, retry with OpenAI API
if _is_token_limit_error(e) and provider == "github-models":
print(
"GitHub Models token limit hit in apply_suggestions, retrying with OpenAI API...",
"GitHub Models token limit hit in apply_suggestions, "
"retrying with OpenAI API...",
file=sys.stderr,
)
openai_client_info = _get_llm_client(force_openai=True)
22 changes: 14 additions & 8 deletions scripts/langchain/pr_verifier.py
@@ -21,7 +21,8 @@
from pydantic import BaseModel, Field, ValidationError

PR_EVALUATION_PROMPT = """
You are reviewing a **merged** pull request to evaluate whether the code changes meet the documented acceptance criteria.
You are reviewing a **merged** pull request to evaluate whether the code
changes meet the documented acceptance criteria.

**IMPORTANT: This verification runs AFTER the PR has been merged.** Therefore:
- Do NOT evaluate CI status, workflow runs, or pending checks - these are irrelevant post-merge
@@ -484,8 +485,10 @@ def evaluate_pr(
Args:
context: The PR context markdown (issue body, PR description, etc.)
diff: Optional PR diff or summary
model: Optional model name (e.g., 'gpt-4o', 'gpt-5.2', 'o1-mini'). Uses default if not specified.
provider: Optional provider ('openai' or 'github-models'). Auto-selects if not specified.
model: Optional model name (e.g., 'gpt-4o', 'gpt-5.2', 'o1-mini').
Uses default if not specified.
provider: Optional provider ('openai' or 'github-models').
Auto-selects if not specified.

Returns:
EvaluationResult with verdict, scores, and concerns.
Expand Down Expand Up @@ -525,7 +528,8 @@ def evaluate_pr(
return result
except Exception as fallback_exc:
return _fallback_evaluation(
f"Primary ({provider_name}): {exc}; Fallback ({fallback_provider_name}): {fallback_exc}"
f"Primary ({provider_name}): {exc}; "
f"Fallback ({fallback_provider_name}): {fallback_exc}"
)
return _fallback_evaluation(f"LLM invocation failed: {exc}")
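The hunk above wraps the tail of a retry chain: try the primary provider, fall back to a second one, and if both fail return a fallback result that records each provider's error. The pattern can be sketched generically; `evaluate_with_fallback`, `invoke`, and the dict-shaped result below are hypothetical stand-ins, not the module's actual API:

```python
def evaluate_with_fallback(invoke, providers):
    # Try each provider in order; if all fail, return a fallback result
    # that combines every error, mirroring the "Primary: ...; Fallback: ..."
    # message in the diff above.
    errors = []
    for name in providers:
        try:
            return invoke(name)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    return {"verdict": "inconclusive", "reason": "; ".join(errors)}
```

Collecting the errors into one string means the caller sees why every provider failed, not just the last one.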

@@ -622,9 +626,8 @@ def format_comparison_report(results: list[EvaluationResult]) -> str:
summary_source = result.summary or result.raw_content or ""
summary = _compact_text(summary_source, limit=200) if summary_source else "N/A"
model_name = result.model or "N/A"
lines.append(
f"| {labels[index]} | {model_name} | {result.verdict} | {_format_confidence(result.confidence)} | {summary} |"
)
conf = _format_confidence(result.confidence)
lines.append(f"| {labels[index]} | {model_name} | {result.verdict} | {conf} | {summary} |")
lines.append("")

# Add expandable full details for each provider
@@ -748,7 +751,10 @@ def main() -> None:
parser.add_argument(
"--provider",
choices=["openai", "github-models"],
help="LLM provider: 'openai' (requires OPENAI_API_KEY) or 'github-models' (uses GITHUB_TOKEN).",
help=(
"LLM provider: 'openai' (requires OPENAI_API_KEY) or "
"'github-models' (uses GITHUB_TOKEN)."
),
)
parser.add_argument(
"--model2",